The Cochrane Database of Systematic Reviews. 2023 Sep 8;2023(9):CD013606. doi: 10.1002/14651858.CD013606.pub2

Prognostic models for predicting clinical disease progression, worsening and activity in people with multiple sclerosis

Kelly Reeve 1, Begum Irmak On 2, Joachim Havla 3, Jacob Burns 2,4, Martina A Gosteli-Peter 5, Albraa Alabsawi 2, Zoheir Alayash 2,6, Andrea Götschi 1, Heidi Seibold 7, Ulrich Mansmann 2,4, Ulrike Held 1,
Editor: Cochrane Multiple Sclerosis and Rare Diseases of the CNS Group
PMCID: PMC10486189  PMID: 37681561

Abstract

Background

Multiple sclerosis (MS) is a chronic inflammatory disease of the central nervous system that affects millions of people worldwide. The disease course varies greatly across individuals and many disease‐modifying treatments with different safety and efficacy profiles have been developed recently. Prognostic models evaluated and shown to be valid in different settings have the potential to support people with MS and their physicians during the decision‐making process for treatment or disease/life management, allow stratified and more precise interpretation of interventional trials, and provide insights into disease mechanisms. Many researchers have turned to prognostic models to help predict clinical outcomes in people with MS; however, to our knowledge, no widely accepted prognostic model for MS is being used in clinical practice yet.

Objectives

To identify and summarise multivariable prognostic models, and their validation studies for quantifying the risk of clinical disease progression, worsening, and activity in adults with MS.

Search methods

We searched MEDLINE, Embase, and the Cochrane Database of Systematic Reviews from January 1996 until July 2021. We also screened the reference lists of included studies and relevant reviews, and references citing the included studies.

Selection criteria

We included all statistically developed multivariable prognostic models aiming to predict clinical disease progression, worsening, and activity, as measured by disability, relapse, conversion to definite MS, conversion to progressive MS, or a composite of these in adult individuals with MS. We also included any studies evaluating the performance of (i.e. validating) these models. There were no restrictions based on language, data source, timing of prognostication, or timing of outcome.

Data collection and analysis

Pairs of review authors independently screened titles/abstracts and full texts, extracted data using a piloted form based on the Checklist for Critical Appraisal and Data Extraction for Systematic Reviews of Prediction Modelling Studies (CHARMS), assessed risk of bias using the Prediction Model Risk Of Bias Assessment Tool (PROBAST), and assessed reporting deficiencies based on the checklist items in Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD). The characteristics of the included models and their validations are described narratively. We planned to meta‐analyse the discrimination and calibration of models with at least three external validations outside the model development study but no model met this criterion. We summarised between‐study heterogeneity narratively but again could not perform the planned meta‐regression.

Main results

We included 57 studies, from which we identified 75 model developments, 15 external validations corresponding to only 12 (16%) of the models, and six author‐reported validations. Only two models were externally validated multiple times. None of the identified external validations were performed by researchers independent of those who developed the model. The outcome was related to disease progression in 39 (41%), relapses in 8 (8%), conversion to definite MS in 17 (18%), and conversion to progressive MS in 27 (28%) of the 96 models or validations. The disease and treatment‐related characteristics of included participants, and definitions of considered predictors and outcome, were highly heterogeneous amongst the studies. Over time (by publication year), we observed an increase in the percentage of participants on treatment, diversification of the diagnostic criteria used, increased consideration of biomarkers or treatment as predictors, and increased use of machine learning methods.

Usability and reproducibility

All identified models contained at least one predictor requiring the skills of a medical specialist for measurement or assessment. Most of the models (44; 59%) contained predictors that require specialist equipment likely to be absent from primary care or standard hospital settings. Over half (52%) of the developed models were not accompanied by model coefficients, tools, or instructions, which hinders their application, independent validation or reproduction. The data used in model developments were made publicly available or reported to be available on request only in a few studies (two and six, respectively).

Risk of bias

We rated all but one of the model developments or validations as having high overall risk of bias. The main reason for this was the statistical methods used for the development or evaluation of prognostic models; we rated all but two of the included model developments or validations as having high risk of bias in the analysis domain. None of the model developments that were externally validated or these models' external validations had low risk of bias. There were concerns related to applicability of the models to our research question in over one‐third (38%) of the models or their validations.

Reporting deficiencies

Reporting was poor overall and there was no observable increase in the quality of reporting over time. The items that were unclearly reported or not reported at all for most of the included models or validations were related to sample size justification, blinding of outcome assessors, details of the full model or how to obtain predictions from it, amount of missing data, and treatments received by the participants. Reporting of preferred model performance measures of discrimination and calibration was suboptimal.

Authors' conclusions

The current evidence is not sufficient to recommend the use of any of the published prognostic prediction models for people with MS in routine clinical practice today, owing to the lack of independent external validations. The MS prognostic research community should adhere to the current reporting and methodological guidelines and conduct many more state‐of‐the‐art external validation studies for the existing or newly developed models.

Keywords: Adult, Humans, Disease Progression, Multiple Sclerosis, Prognosis, Reproducibility of Results, Systematic Reviews as Topic

Plain language summary

Which models exist for prediction of future disease outcomes in people with multiple sclerosis?

Why is it important to study multiple sclerosis?

Multiple sclerosis (MS) is a chronic disease of the brain, spine, and nerves. Millions of people worldwide suffer from this disease, but the disease and how it progresses can be very different from person to person. Although MS cannot be cured, different treatments are available that can help reduce symptoms and slow the worsening of the disease. These treatments work differently, with some having more severe side effects than others. Understanding the severity of an individual’s MS is important to patients and medical professionals.

Why are prognostic models important in the context of multiple sclerosis?

Prognostic models help patients and medical professionals understand how sick an individual is and will become. This understanding can support patients when making life and treatment choices. Prognostic models can also help medical professionals make decisions about how best to treat an individual, better understand the disease, or develop treatments. Prognostic models for MS might involve combining a range of different pieces of information about an individual to predict how their MS will continue to develop. Important pieces of information to include in a prognostic model could be, for example, information on personal characteristics (such as age, sex, body mass index), information on their behaviour (such as whether they smoke), and information about their MS (such as how long they have had the disease). Other clinical features or measurements may also be important.

What did we want to find out?

We wanted to search for and find all prognostic models that combine multiple pieces of information to predict how MS will continue to develop and worsen in adults.

What did we do?

We used different techniques to search for all studies that described prognostic models, which combine multiple pieces of information, developed in the context of MS. We were interested in studies showing how these prognostic models were developed, as well as studies evaluating how well they actually work in practice. Once we found all relevant studies, we summarised them and evaluated how well they reported their results and how well they were conducted.

What did we find?

We found 57 studies that described prognostic models combining multiple pieces of information to predict how MS will continue to develop and worsen in adults. These studies described the development of 75 different prognostic models. There were 15 instances in which the performance of specific prognostic models was evaluated.

We found that prognostic models focus on different outcomes; 41% looked at disease progression, 8% at relapses, 18% at moving from a first attack to definite MS, and 28% at moving from the early stages of MS to progressive MS. The prognostic models we found were very different from one another in many ways. The patients they used to develop the models, for example, were very different in terms of treatments. In addition, the pieces of information they used to predict the course of MS were very different from one another. We found that prognostic models have changed over time, for example in the criteria used to diagnose MS, the growing use of treatments, the kinds of information collected with newer measurement techniques, and the modelling approaches applied. We also found that using these prognostic models requires information about the individual that can only be gathered by a medical specialist, often with specialist equipment, both of which may not be available in many clinics and hospitals.

What are the limitations of the evidence?

We found problems with most studies, meaning that we may not be able to trust their results. Common problems involved data and statistical methods used across studies. Additionally, many of the studies report results that may be very different if the prognostic models are applied to a new set of people with MS. We also found that the studies did a poor job of describing their methods and reporting their findings.

What does this mean?

The studies we found show that the evidence on prognostic models for predicting how MS will continue to develop and worsen in adults is not yet well‐developed. New research is needed that focusses on using methods recommended in guidelines to develop prognostic models and evaluate their performance. This research should also focus on describing their methods and results well, so that other researchers and medical professionals can use them for research and clinical practice.

Summary of findings

Summary of findings 1. Summary of findings.

Population: adults with relapsing‐remitting multiple sclerosis
Setting: specialty clinical care
Model: models with more than one external validation
Outcome: conversion to progressive multiple sclerosis
Timing: prediction of time to outcome at disease onset
For each model, the external validations are listed with the validating cohort (and study, if different from the development study), number of participants, reported performance measure, and overall risk of bias assessment.

Manouchehrinia 2019
  • British Columbia cohort: 3967 participants; c‐statistic 0.77 (95% CI 0.76 to 0.78); high risk of bias due to use of a predictor measured at a time point after time of model use, and lack of calibration assessment
  • ACROSS trial: 175 participants; c‐statistic 0.77 (95% CI 0.70 to 0.85); risk of bias as above
  • FREEDOMS and FREEDOMS II trial extensions: 2355 participants; c‐statistic 0.87 (95% CI 0.84 to 0.89); risk of bias as above

Bayesian Risk Estimate for Multiple Sclerosis score
  • Italian cohort (Bergamaschi 2007): 535 participants; at cutoff 95%: sensitivity 0.17, specificity 0.99; high risk of bias due to lack of discrimination or calibration assessment
  • MSBase registry (Bergamaschi 2015): 1131 participants; at cutoff 50%: sensitivity 0.35, specificity 0.80; risk of bias as above

ACROSS: A CROSS‐Sectional Long‐term Follow‐up of Fingolimod Phase II Study Patients
CI: confidence interval
FREEDOMS: FTY720 Research Evaluating Effects of Daily Oral therapy in Multiple Sclerosis

Background

Description of the health condition and context

Multiple sclerosis (MS) is a chronic inflammatory disease of the central nervous system (CNS) that usually begins in young adulthood and affects 2.8 million people worldwide (Adelman 2013; Thompson 2018a; Walton 2020). The course of MS varies greatly and is characterised by clinical, radiological, genetic, and pathological heterogeneity. The exact aetiology of MS is still unclear, even though there are convincing arguments for an (auto‐)immunopathogenesis (Hohlfeld 2016a; Hohlfeld 2016b), triggered or driven by exposure to environmental risk factors (Attfield 2022). These arguments include the neuropathological findings, the various analogies to the autoimmune animal models, and, above all, the response of MS to various immunosuppressive therapies (Thompson 2018a). Genetic research also supports the (auto‐)immunopathogenesis theory, implicating peripheral immune cells and microglia in susceptibility (Attfield 2022; Patsopoulos 2019; Sawcer 2011). Environmental factors such as vitamin D deficiency or Epstein‐Barr virus infection have been shown to have an influence on the development (Attfield 2022; Belbasis 2015; Bjornevik 2023) and course (Hempel 2017) of the disease. Modern imaging techniques, such as diffusion tensor imaging, as well as neuropathological investigations have shown that, in addition to demyelination in MS, significant damage occurs to axons (Thompson 2018a).

The current diagnosis of MS is based on the modified ‘McDonald criteria’ (Thompson 2018b), and further differentiation of disease course into subtypes is described by Lublin 2014. In relapsing MS, the disease may initially present as clinically isolated syndrome (CIS), a first disease attack of at least 24 hours with patient‐reported symptoms reflecting an inflammatory demyelinating event in the CNS without fever or infection. According to current diagnostic criteria, the first attack may already be definite relapsing‐remitting MS (RRMS) if there is temporal and spatial dissemination at the time of initial manifestation, as evidenced by magnetic resonance imaging (MRI), cerebrospinal fluid (CSF) diagnosis, and/or clinical presentation. RRMS is characterised by relapses and periods of remission with stable neurological disability (Thompson 2018b). According to natural history studies, 30% to 50% of untreated RRMS patients convert to secondary progressive MS (SPMS) within 10 to 15 years after disease onset (Weinshenker 1989a). Progressive MS is defined as a steadily increasing neurological disability without unequivocal recovery (Lorscheider 2016). About 15% of people with MS, however, have progressive disease from the start, the primary progressive MS (PPMS) subtype (Reich 2018). This classification was made at a time when few biomarkers were available and is still used in clinical practice, especially for communication with patients and the definition of study cohorts. However, in the current understanding of MS pathophysiology, both peripherally initiated and CNS compartmentalised inflammation processes are assumed to contribute to the disease progression. In addition, signs of neurodegeneration can be detected early in the disease. From a clinical point of view, this means that even people with relapsing MS may have a gradual progression independent of relapse activity in addition to accumulation of residual disability from relapses (relapse‐associated worsening). Similarly, people with progressive MS may continue experiencing relapses (Lublin 2014).

Although MS is still incurable, pharmacological treatment for MS, particularly RRMS, has developed with increasing speed since the introduction of the first interferon‐beta preparation more than 25 years ago. The arsenal of MS therapeutics includes various substances with different mechanisms of action. The main goals of treatment are reduction in relapse rate, delaying onset, and slowing or stopping confirmed disability progression (Wingerchuk 2016). The availability of highly effective therapeutic options has led to the expectation of no evidence of disease activity (NEDA) under immunotherapy. This is defined by the absence of relapses, disability progression, and active MRI lesions (Thompson 2018a). There are two established treatment strategies: the use of mild to moderately effective but safe medications from disease onset with escalation strategies as needed ('stepwise escalation'), or the use of higher efficacy medications from disease onset, which may be associated with higher risk of adverse events ('hit hard and early' concept) (Ontaneda 2019). Overtreatment and undertreatment should be avoided and the risk‐benefit balance should be considered; however, refraining from treatment is also an option.

The current guidelines usually classify the available therapies as first‐, second‐, or third‐line according to their efficacy and safety profiles and recommend selection of a therapy based on the patient’s disease activity and preferences, reserving efficacious but high‐risk second‐line medications for highly active disease (Hemmer 2021; Montalban 2018; Rae‐Grant 2018). The definition of highly active disease varies across the literature, however (Diaz 2019; Freedman 2016), and how to define benign MS is unclear (Correale 2012). With its broad spectrum of clinical manifestations and an armamentarium of therapeutic approaches with different risk profiles, MS is a prime example of a disease that requires individualised medicine.

Description of the prognostic models

Many potential prognostic factors have been identified for predicting disease progression, worsening, and activity in people with MS. These include but are not limited to age, sex, body mass index, smoking history, and disease duration (Briggs 2019). Various biomarkers for MS have also been proposed, with those measured by MRI being the most commonly investigated (Rotstein 2019). However, prediction typically requires a combination of prognostic factors (Steyerberg 2013), especially for multifactorial diseases such as MS. Researchers in this clinical field have noted the strong focus on prognostic factors as opposed to prognostic modelling and have expressed the need for models for estimation of ‘individualised’ risk (de Groot 2009; Wottschel 2015).

A prognostic model is an empirical model that combines the effects of two or more predictors in order to estimate the risk of future clinical outcomes in individual patients within a specified length of time (Steyerberg 2013; Steyerberg 2019). As with prognosis research more generally, these models can serve many purposes, including improving the study design and analysis of randomised clinical trials (Hernández 2004; Roozenbeek 2009). For instance, Sormani and colleagues suggest the use of their model for participant selection in MS clinical trials (Sormani 2007). Adjusting for baseline risk in network meta‐analyses (Chalkou 2021) and health service research (Jarman 2010) are other application areas.

Ideally, prognostic models are developed using large high‐quality datasets, with subjects representative of the population to which the model should later be applied. Large samples may generally be required for more complex modelling tasks, such as model development including data‐driven predictor selection from a large set of candidate predictors. Sufficiently large datasets reduce the potential for overfitting and ensure that the overall risk can be precisely estimated (Riley 2019). Outcomes and their timing should be important to people with the health condition of interest and, along with predictors, be well‐defined prior to their assessment. When selecting predictors to consider, basic variables known to be related to prognosis, such as disease duration and sex, should always be included, in addition to novel biomarkers that may provide added value (Steyerberg 2013).

Before a prognostic model is used in practice, it must be appropriately evaluated. This evaluation ideally has two components. One component, discrimination, assesses how well a prognostic model ranks individuals who experience the event above those who do not. The second component, calibration, assesses the prognostic model’s ability to estimate event probabilities that are close to those actually observed. Good discriminative power is important to all prognostic model applications and may even be sufficient for some applications (Justice 1999), such as patient stratification in randomised controlled trials and adjustment in comparative healthcare research. However, people with MS and the clinicians advising them are interested in the absolute probability of outcomes in these individuals, as opposed to comparing risks with other people; hence, model calibration is very important in this setting.
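To make these two components concrete, the following minimal sketch (an illustration only, not drawn from the review; the outcome and predicted‐risk vectors are hypothetical, and scikit‐learn and statsmodels are assumed to be available) computes the c‐statistic for discrimination and a logistic recalibration slope and intercept for calibration.

```python
# Minimal sketch (hypothetical data): quantifying discrimination and calibration
# of predicted risks against observed binary outcomes.
import numpy as np
import statsmodels.api as sm
from sklearn.metrics import roc_auc_score

y_true = np.array([0, 1, 0, 0, 1, 1, 0, 1, 0, 1])                                # hypothetical observed outcomes
y_pred = np.array([0.15, 0.40, 0.30, 0.55, 0.70, 0.45, 0.20, 0.85, 0.60, 0.35])  # hypothetical predicted risks

# Discrimination: c-statistic, equivalent to the area under the ROC curve (AUC)
c_statistic = roc_auc_score(y_true, y_pred)

# Calibration: logistic recalibration of the outcome on the log-odds of the predicted risk;
# a slope near 1 and an intercept near 0 indicate good calibration
logit_pred = np.log(y_pred / (1 - y_pred))
recalibration = sm.Logit(y_true, sm.add_constant(logit_pred)).fit(disp=0)
intercept, slope = recalibration.params

print(f"c-statistic: {c_statistic:.2f}")
print(f"calibration intercept: {intercept:.2f}, calibration slope: {slope:.2f}")
```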

The data used for evaluation determine its usefulness and generalisability, i.e. how the model is expected to perform in new patients (Justice 1999). Internal validation is the evaluation of a model in the sample in which it was developed. If the internal validation is performed directly in the development sample without any resampling techniques (apparent validation), the model accuracy is expected to be overestimated, i.e. overoptimistic (Harrell 2001; Moons 2019; Steyerberg 2019). Resampling techniques, such as cross‐validation and bootstrapping, allow us to assess overfitting and account for overoptimism. However, even with correct internal validation procedures, we only learn about the accuracy of the model as applied to people from an identical underlying population. Therefore, a further prerequisite before use of a prognostic model in practice is external validation, i.e. its evaluation in a group of patients independent of those used in the model’s development. Such independence may be based on many qualities, such as time, location, and participant spectrum (patients with different disease severities or belonging to different disease subtypes). In MS, historical transportability is important, for example, because disease severity is likely to have changed over time with changes in diagnostic criteria. It is important to assess whether models developed under older diagnostic criteria are still accurate when applied to patients today. Before any clinical application, a prognostic model needs to have good discrimination and calibration in many different external validations, preferably by researchers independent of those who developed the model.
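As an illustration of resampling‐based internal validation (a sketch on simulated data under assumed settings, not a method prescribed by any included study), the following outlines bootstrap optimism correction: the model is refitted in each bootstrap sample, the difference between its performance in the bootstrap sample and in the original sample estimates the optimism, and the average optimism is subtracted from the apparent performance.

```python
# Minimal sketch (simulated data): bootstrap-based internal validation with
# optimism correction of the apparent AUC.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 300
X = rng.normal(size=(n, 3))                        # hypothetical predictors
y = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))    # hypothetical binary outcome

# Apparent validation: evaluating the model on the data it was fitted to (overoptimistic)
model = LogisticRegression().fit(X, y)
apparent_auc = roc_auc_score(y, model.predict_proba(X)[:, 1])

# Bootstrap optimism: refit in each bootstrap sample and compare performance in the
# bootstrap sample with performance of the same refitted model in the original sample
optimism = []
for _ in range(200):
    idx = rng.integers(0, n, size=n)
    refit = LogisticRegression().fit(X[idx], y[idx])
    auc_boot = roc_auc_score(y[idx], refit.predict_proba(X[idx])[:, 1])
    auc_orig = roc_auc_score(y, refit.predict_proba(X)[:, 1])
    optimism.append(auc_boot - auc_orig)

corrected_auc = apparent_auc - np.mean(optimism)
print(f"apparent AUC: {apparent_auc:.3f}, optimism-corrected AUC: {corrected_auc:.3f}")
```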

To our knowledge, no widely accepted prognostic model for MS is being used in clinical practice yet. A systematic review is needed to understand the state of the MS prognostic modelling literature as a whole and whether any models are on the way towards translation into practice. In order to address this goal, the scope of our review will be broad in terms of outcomes, predictors, timing, and setting, as well as the form of MS addressed.

Health outcomes

According to a survey conducted by Day and colleagues, disability progression and relapses are the most important outcomes related to disease course for people with MS and clinical experts alike (Day 2018). Disease progression is characterised by a relapse‐independent accumulation of neurological deficits and usually manifests as a decrease in walking ability that occurs over varying time spans (Warnke 2019). Disease progression is most commonly measured by the ambulation functional score of the Expanded Disability Status Scale (EDSS). Neurological disability has also been operationalised by the Multiple Sclerosis Functional Composite (MSFC). The International Advisory Committee on Clinical Trials of MS has suggested that the term ‘progression’ should only be used for the progressive subtypes of the disease and that relapse‐related increases in disability be referred to as disease 'worsening' (Lublin 2014). Although a more consistent application of the Lublin Criteria and of the 'progression independent of relapse activity' and 'relapse‐associated worsening' terminology has become apparent in recent years, the terminology used in the literature may not exactly match these definitions. For the purposes of this review, increase in disability, either dependent or independent of relapses, is relevant, as it is ranked as the highest priority outcome by people with MS (Day 2018).

Relapses, another high‐priority clinical outcome indicative of disease activity, manifest as acute and transient episodes of neurological symptoms. Subacute episodes can lead to different neurological symptoms, which may remit completely in the course of the disease, but may also be accompanied by residual disability. Despite the fact that relapse rate is the primary outcome in most confirmatory clinical trials leading to market approval of RRMS therapies, it is not yet clear whether a reduction in relapse rate is associated with a better overall prognosis. For example, the extent to which reducing the relapse rate prevents long‐term disability progression remains controversial (Cree 2019).

Diagnostic transition to a more advanced disease stage, indicative of worsening and active disease, is also of interest to prognostic research in this field. For example, people initially diagnosed with CIS can meet the criteria of clinically definite MS by experiencing another relapse. The ability to predict whether or when the conversion to definite MS will occur might have substantial clinical impact on decisions to start or abstain from early treatment in people with CIS. Patients initially diagnosed with RRMS can be considered to have converted to a progressive course, SPMS, by retrospective assessment of sustained progression independent of relapses over a period of time, for example one year (Thompson 2018b).

As MS is a lifelong condition, we find the aforementioned outcomes to be relevant not only at various time points of prognostication during the disease course, but also for various prediction horizons. We also expect outcome definitions, timing, and measurement methods for clinical disease progression, worsening, and activity to be highly heterogeneous in the literature.

Why it is important to do this review

While there are more than 50 published Cochrane Reviews on interventions for MS or associated symptoms and more than 20 are ongoing, this is the first Cochrane Review of prognostic studies in MS (Cochrane 2021). Independent of the Cochrane network, Hempel and colleagues reviewed 59 studies of single modifiable prognostic factors in MS progression, such as vitamin D levels and smoking status (Hempel 2017). Also, Río and Ruiz‐Peña reviewed 45 studies that predict long‐term treatment response by short‐term response criteria, including both single factors and multivariable expert‐based algorithms (Río 2016). Both reviews found a wide variety of methods, timing, and outcome and prognostic factor definitions.

We aimed to conduct a systematic Cochrane Review of multivariable prognostic models for predicting future clinical outcomes indicative of disease progression, worsening, and activity in people with MS at any time point following diagnosis. The results from this review will provide a long‐sought comprehensive summary and assessment of the evidence base for all disease subtypes (not just RRMS) and across all statistical methodologies (not just machine learning (ML) studies). We aimed to thereby enhance the knowledge base described by the many non‐systematic reviews (Derfuss 2012; Gafson 2017; Miller 2008; Rotstein 2019) and focused systematic reviews (Brown 2020; Havas 2020; Seccia 2021) reported thus far. Identified models could potentially provide people with MS and their physicians with informative and clinically relevant tools for making decisions on disease management.

No review thus far presents changes in prognostic factors and methods over time, nor assessment of reporting deficiencies of the models in the literature. We also summarised the readiness of the models for translation into clinical practice in terms of external validation evidence, thereby identifying models that require further external validation or clinical impact assessment. This review forms a solid basis from which to make recommendations for future prognosis research in MS.

Objectives

To identify and summarise multivariable prognostic models for quantifying the risk of clinical disease progression, worsening, and activity in MS.

To this end we aimed to:

  • describe the characteristics of the identified multivariable prognostic models, including prognostic factors considered and evaluation measures used;

  • describe changes in outcome definitions, time frames, prognostic factors, and statistical methods over time;

  • summarise the validation performance of the models;

  • summarise model performance and synthesise across external validation studies via meta‐analysis, where possible;

  • investigate sources of heterogeneity between studies;

  • assess the risk of bias in the models;

  • evaluate moderating effects on model performance by meta‐regression, where possible; and

  • make recommendations for future MS prognostic research.

Methods

Criteria for considering studies for this review

Defining the eligibility criteria was an iterative process, which involved multiple discussions within the review team based on our previous knowledge, as well as several studies we knew should be included, and borderline cases that we knew should be excluded. These criteria are described by the PICOTS table below and in the following sections.

Population: Adults with MS, including all subtypes (CIS, RRMS, SPMS, PPMS)
Intervention: All multivariable prognostic models and their validation studies
Comparator: There are no comparators in this review
Outcome: Clinical disease progression, worsening, and activity, which are measured based on disability, relapses, conversion to a more advanced disease subtype (clinically definite, progressive), or a composite of these
Timing: The models are to be used any time following diagnosis for predicting future disease course
Setting: Any clinical setting where people with MS receive medical care
CIS: clinically isolated syndrome; MS: multiple sclerosis; PPMS: primary progressive MS; RRMS: relapsing‐remitting MS; SPMS: secondary progressive MS

Types of studies

We included studies that aimed to develop, validate, extend, or update multivariable prognostic models of future disease outcomes in people with MS.

  • Study design: We included prognostic modelling studies that used data collected retrospectively or prospectively from the following sources: routine care, disease/patient registries, cohort studies, case‐control studies, and randomised controlled trials.

  • Data source and setting: We included studies based on both primary and secondary use of data. We included models intended for use in any clinical setting where people with MS receive medical care. We excluded studies that did not contain prediction of future outcomes in individuals.

  • Statistical methods: We included models developed with either traditional statistical methods or machine learning (ML). For the purpose of this review, a method is considered ML if it has at least one tuning parameter, excluding Bayesian priors, for controlling its architecture and, as a result, its performance.

  • Validation: We included studies that evaluated a previously reported prognostic model in a different set of participants by reporting discrimination, calibration, or classification measures based on predictions from that model, even if the term ‘validation’ was not explicitly used. We also included studies that reported validation of a previously reported prognostic model, even if what was done did not constitute an external validation in its strictest sense (see 'Terms used for reporting' for details). Studies that did not meet the search or inclusion criteria themselves but described the development of models evaluated or validated in future eligible studies were also included in order to extract data from and assess the risk of bias in the model development.

Targeted population

We included studies on adult individuals, 18 years old or over, with a diagnosis of MS, irrespective of the subtype or treatment status. We included studies that did not specify the disease subtype of their sample and studies that included people with one or more MS subtypes of CIS, relapsing, progressive, or any other categories. When a study included people with a single episode of optic neuritis, we considered the event to comprise a CIS and considered the study eligible.

Types of prognostic models

Determining whether a study reporting a multivariable model is a prognostic model study for predicting future disease outcomes in individuals can be difficult (Kreuzberger 2020). In this review, a study was considered to develop a multivariable prognostic model if the aims, results, and discussion report on the model itself, and not just the individual predictors comprising the model or the methodology used. For example, we excluded studies that reported only adjusted predictor effect measures from a multivariable model and discussed these, but neither evaluated the predictive performance of the model using discrimination, calibration, or classification measures nor discussed the model as a whole.

Studies were not limited by their modelling method; i.e. inclusion did not depend on whether traditional statistical methods or ML methods were used for development. We excluded studies predicting outcomes only based on single prognostic factors. We also excluded studies reporting on models that aimed to predict treatment response, either beneficial or harmful. The use of treatment as a predictor in the model was by itself not considered to determine the aim of treatment response prediction. Rather, the reported aim of the study was the determining factor.

Types of outcomes to be predicted

We included clinical outcomes indicating disease progression, worsening, and activity. We accepted author definitions based on any of the following:

  • disability progression/worsening;

  • relapse/attack;

  • conversion to a more advanced disease subtype:

    • to definite MS; or

    • to progressive MS;

  • composite outcomes containing at least one of the above (such as NEDA).

We included studies with any of the above outcomes, including models validated for a different outcome than originally developed. We did not exclude studies based on the data type of the outcome, even though prognosis is usually interpreted as referring to the risk of an event, i.e. necessitating a binary outcome. We excluded models that predict only paraclinical outcomes, such as laboratory measurements or image findings, because their translation to patient‐relevant outcomes at the individual level is unclear and they are not prioritised by people with MS (Day 2018). We also excluded studies predicting only quality of life outcomes, due to the difficulty in interpreting their clinical meaning. We considered cognitive disability to constitute a domain of disability. Fatigue, depression, and falls did not fit any of the aforementioned outcome categories, and we considered them out of scope for this review, which aims to be relevant to clinical practice.

We did not exclude any studies based on time point of prognostication or the time horizon for which the prognostic models apply because our preliminary review of the prognostic literature in MS indicated very liberally defined (in years) and heterogeneous time points of prognostication, both in relation to diagnosis and start of treatment. Defining a time horizon was considered too restrictive for the review objective. For clinically meaningful outcomes, however, we expected disability progression/worsening and conversion from RRMS to SPMS to be measured in years. Relapses and conversion from CIS to RRMS were expected to be measured in months to a couple of years.

Search methods for identification of studies

Electronic searches

To identify eligible studies, we searched the following databases on 2 July 2021 (Appendix 1).

  • MEDLINE (Ovid SP) (1996 to 1 July 2021)

  • Embase (embase.com) (1996 to 2 July 2021)

  • Cochrane Database of Systematic Reviews (CDSR 2021, Issue 6) (searched 2 July 2021, via www.cochranelibrary.com)

  • Cochrane Central Register of Controlled Trials (CENTRAL 2021, Issue 6) (searched 2 July 2021, via www.cochranelibrary.com)

The Embase search above included conference proceedings from the following organisations.

  • European Committee for Treatment and Research in Multiple Sclerosis (ECTRIMS)

  • Americas Committee for Treatment and Research in Multiple Sclerosis (ACTRIMS)

  • American Academy of Neurology (AAN)

  • European Academy of Neurology (EAN)

We restricted the search to studies published since 1996, the year of publication of an important tutorial on multivariable prognostic models in Statistics in Medicine (Harrell 1996). Before this time, methods were rapidly being developed but at the same time concerns over the misuse of statistical modelling for prediction of health outcomes were being raised (Chatfield 1995; Concato 1993; Diamond 1989). We considered Harrell 1996 to be a turning point, after which many papers (Altman 2000), textbooks (Harrell 2001; Steyerberg 2019), and guidelines (the Enhancing the QUAlity and Transparency Of health Research (EQUATOR) network and the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) statement) addressing proper analysis and reporting became readily available (Collins 2015; Simera 2008). We did not impose language restrictions on the search.

We used a search strategy for systematic reviews of prognostic models based on that of Geersing 2012 and further refined for this review. We validated this strategy for our specific question by determining whether it could identify a list of 11 a priori defined studies of interest: six inclusions (Bejarano 2011; Bergamaschi 2007; Margaritella 2012; Pellegrini 2019; Tousignant 2019; Wottschel 2015) and five excluded studies (borderline studies) that required assessment at the full‐text stage (Bovis 2019; Cree 2016; Gauthier 2007; Kalincik 2017; Runmarker 1994). Also, we randomly selected 120 titles and abstracts from less stringent search criteria and screened them to prevent missing any relevant studies.

We split this modified filter into three sub‐searches: search terms specific for prediction or prognostic models (2a); terms for general models (2b); and the statistical search terms (2c).

The search comprised two main parts, with each combining either two or three main concepts, as follows:

  • MS (1) and specific prognostic models (2a); or

  • MS (1) and general models or general statistical terms (2b or 2c) and clinical outcomes (3).

The search strategy used combinations of thesaurus search terms (MeSH or Emtree) and free‐text search terms including synonyms in the title and abstract. Animal studies and studies in children were excluded from the search.

Searching other resources

We performed backward reference searching of all included studies and all MS prognosis reviews identified during screening using Web of Science. We tracked citations of all the included studies (forward reference searching) via Web of Science. We performed the search in Web of Science between 13 October 2020 and 25 October 2020 for the studies/reviews from the initial database search, and on 16 August 2021 for the studies/reviews from the update to the database search. We also contacted authors of all included studies for further information on unpublished or ongoing studies.

Data collection

Selection of studies

Aiming to refine the eligibility criteria and ensure a common understanding amongst the review authors, we conducted a pilot title and abstract screening with a random subset of 200 results produced by the draft search strategy. This was followed by full‐text screening of the eight titles marked for inclusion. We selected eligible studies from the search results using the criteria outlined in 'Criteria for considering studies for this review' via the Rayyan web application (Ouzzani 2016). We used the same platform to document the exclusion reasons at the full‐text screening stage.

Pairs of review authors (BIO, KAR, AA, ZA, AG) performed title and abstract screening independently, and we included all titles marked for inclusion by at least one author at this stage in the full‐text screening. We also performed, independently and in duplicate, assessment of full texts for their inclusion in the review or reasons for their exclusion. When the record corresponded to a conference abstract, we searched its title and/or authors online (www.google.com, onlinelibrary.ectrims‐congress.eu/ectrims; accessed between 22 April 2022 and 29 October 2021) for any related articles, poster, or video presentations and, if available, considered these additional sources of information during our assessment. In addition, if the full text did not meet all the inclusion criteria but also could not be excluded with the available reported information, the authors of the respective studies were contacted for clarification. We resolved disagreements by involving a third review author (BIO, KAR) and, when necessary, through group discussion.

If a conference abstract meeting our inclusion criteria did not have an associated publication like a peer‐reviewed or preprint article, the data needed to inform the review and risk of bias assessment could not be extracted. As stated by Kreuzberger and colleagues, this complicates assessment of both inclusion/exclusion and risk of bias (Kreuzberger 2020). Consultation with study authors would only provide sufficient information if an associated publication could also be supplied. Hence, we considered conference abstracts to be awaiting classification until a report with more information on them becomes available.

For the assessment of non‐English titles/abstracts, we used online translators (translate.google.com, www.deepl.com/en/translator) and included any record that seemed to be relevant for full‐text screening. At full‐text stage, we (BIO and KAR) consulted the assessment of native speakers of that language and retrieved the translation of the full‐text for our independent assessment in duplicate.

We summarised the study selection process with a flowchart adapted from the PRISMA statement (Page 2021), showing the number of records we identified, the number of reports we excluded with reasons, and the total number of studies included.

Details regarding selection of studies

Due to the recency of the relevant reporting guidelines (Collins 2015), poor labelling of prognostic prediction studies, and the novelty of this review type (Kreuzberger 2020), we had regular meetings to clarify the boundaries and application of the selection criteria both at the title/abstract and full‐text screening levels. For transparency, we report the details of the recurrent themes and the decisions below.

  • The distinction between reports of multivariable prognostic prediction models and reports assessing the value of a single prognostic factor or searching for independent prognostic factors by multivariable modelling was not always clear (Kreuzberger 2020). We included records if there was any hint of individual‐level predictions either verbally or by the measures they reported for the multivariable models. We considered mentioning the overall model performance measures (e.g. R2, Brier score), discrimination measures (e.g. Harrell’s c‐index, area under the receiver operating characteristic curve (AUC)), classification measures (e.g. sensitivity, accuracy), or the terms calibration or validation in the context of prognosis sufficient for being taken forward to full‐text screening. We excluded records that only reported effect estimates (e.g. hazard ratio, odds ratio), or performance measures for single factors or univariable models at the title/abstract level.

  • We applied exclusion based on the eligibility criterion of aiming to develop or validate prognostic prediction models only at the full‐text screening level in order to take into account the totality of reporting.

  • We excluded multivariable combinations other than statistically developed prognostic prediction models, such as diagnostic criteria or expert scoring rules, even when they were used for individual prognostic prediction. Despite their potential usefulness, the intentions behind their development are different.

  • Expecting prognostic prediction models to be based on statistical theory, we also excluded scores based on counts of prognostic factors selected via or simplified from multivariable models unless the full‐text report provided a reason for the simplification (e.g. all effect estimates being similar) or compared the prediction performance of the count score to the multivariable model generating it.

  • Our search also picked up records that reported prediction of treatment response either by multivariable models or scoring rules. We were aware that some of these reports were making static or dynamic prognostic predictions conditional on treatment (e.g. Kalincik 2017; Sormani 2013), rather than treatment response predictions (Kalincik 2018; Sormani 2017; Steyerberg 2018). We decided not to reinterpret the stated objective or the presented results and excluded such reports.

  • In order to assign a single agreed upon exclusion justification to a full‐text report that may fulfil multiple criteria, we used a hierarchy based on convenience of assessment. Higher‐level exclusion reasons were based on the headings, and we evaluated a study’s eligibility in the following order: study type or objective, population, outcome, model (intervention), and timing.

Data extraction and management

Pairs of review authors (KAR, BIO, AA, AG) independently extracted data from the included studies into a predefined, piloted electronic spreadsheet (see Appendix 2) based on the Critical Appraisal and Data Extraction for Systematic Reviews of Prediction Modelling Studies (CHARMS) checklist (Moons 2014) and TRIPOD guidelines (Collins 2015) and open disagreements were resolved jointly. If a study was associated with multiple reports and the data in them were inconsistent, we preferred to collect the data from:

  • journal article over other types of reports;

  • more recent journal article over an older one; and

  • main text of a journal article over its supplements/appendices.

Our data extraction form included the following items, with further explanation available in Appendix 3:

  • Article information (title, author, year, publication type).

  • Data sources (e.g. use of randomised trial/cohort/registry/case‐control data, primary/secondary data use).

  • Participants (e.g. inclusion/exclusion criteria, recruitment method, country, number of centres, setting, participant description, treatments received, MS subtype).

  • Outcomes (e.g. definitions and methods of measurement, categorisation into disability/relapse/conversion to clinically definite MS/conversion to SPMS/composite, duration of follow‐up or time of outcome assessment, blinding).

  • Candidate predictors (e.g. predictor definitions and method/timing of measurement, handling/transformations, categorisation into the following domains: demographics, symptoms, scores, CSF, imaging, electrophysiological, omics, environmental, non‐CSF samples, disease type, treatment, or other).

  • Sample size (e.g. number of participants, number of events, number of events per predictor).

  • Missing data (e.g. number of participants with missing predictor or outcome data, handling of missing data).

  • Model development (e.g. type of model, method for predictor consideration, model/predictor selection method, predictor selection criteria, tuning parameter details, data leakage prevention steps, shrinkage).

  • Model performance and evaluation (e.g. discrimination, calibration, and classification measures with standard errors or confidence intervals, internal or external validation).

  • Model presentation and interpretation (e.g. final models, alternative presentations, exploratory versus confirmatory research, comparison with other studies, generalisability, strengths, and limitations).

  • Factors related to model usability and reproducibility (sufficient explanation to allow for further use, skill and equipment specialisation required for predictor assessment, whether model/tool, code, and/or data were provided, whether absolute risks could be computed).

Assessment of reporting deficiencies

Deficiencies in methods and reporting in prognostic modelling studies are well‐known (Bouwmeester 2012; Brown 2020; Havas 2020; Kreuzberger 2020; Peat 2014). We described deficiencies in the MS prognostic modelling literature using Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) guideline items (Collins 2015). We assessed 20 items from the Methods (source of data, participants, outcome, predictors (only for developments), sample size, missing data, statistical analysis methods (only for developments)), Results (participants, model development, model specification (only for developments), model performance), and Discussion (limitations) sections of the checklist provided in the guideline. We used the categories of reported, not reported, and unclearly/somehow reported. During this assessment, we only took into account the text present in the publications themselves or the publications explicitly referenced in them for data source, definitions, or methods (referenced as auxiliary references in Characteristics of included studies) and ignored the information provided by the study authors during follow‐up correspondence.

Assessment of risk of bias in included studies

We performed risk of bias assessments independently and in duplicate (KAR, BIO, AA, AG) using the Prediction Model Risk of Bias Assessment Tool (PROBAST) (Wolff 2019). The tool consists of signalling questions in four domains (participants, predictors, outcome, analysis), covering sources of possible bias due to data sources, definition or measurements of predictors and outcomes, sample size and analysis sets, model development, and model performance evaluation. We graded each domain as having low, high, or unclear risk of bias, which formed the basis for the overall risk of bias assessment (as described in Moons 2019). A third review author (KAR or BIO) reconciled the duplicate assessments and resolved any remaining disagreements at the item and model/validation level by joint discussion with the respective raters. When insufficient information was reported to allow for clear assessment, resulting in an unclear rating at the domain or study level, we contacted the study authors via email to request further information. In order to develop a common understanding of the form, two review authors (KAR, BIO) piloted the tool, discussed discrepancies in use, and agreed on rules for further use.

When multiple models were developed in a single study or development and external validation of a model were included in the same study, we assessed the quality of each model or external validation separately. We presented the risk of bias primarily at the analysis level in the Results and Discussion. However, in the Characteristics of included studies we presented the risk of bias at the study level for each domain. When domain‐level assessments differed across analyses within a single study, we assigned the most favourable rating amongst these analyses to that domain at the study level and noted the differences per analysis in the support for judgement.

In order to assess risk of bias, we needed to further refine the interpretation of the PROBAST items. The topics that required further refinement were related to the reporting in the literature, specifics of the disease area, and the application of the tool to studies employing non‐traditional prognostic modelling methods ‐ including ML and non‐binary outcomes. These issues were jointly discussed amongst review authors (BIO, KAR, HS, UM, UH, JH, JB) until consensus was reached. Additionally, this review was designed broadly in order to identify all prognostic prediction models of clinical outcomes in MS. This meant that studies may have been included that were not considered to be exactly applicable to the aim of the review, even though they met the selection criteria. We report our decisions regarding PROBAST interpretation and the assessment of applicability in Appendix 4.

Measures of association or predictive performance measures to be extracted

In our protocol we stated that predictor effect measures would be collected and standardised in order to describe changes in prognostic factors in the models over time (On Seker 2020). We extracted effect measures where possible; however, the variety of predictors and their definitions, in addition to the use of ML models for which effect measure reporting is unclear, made comparison of predictors based on effect measures impossible. Instead, we reported categories of predictors, both considered and included in the final models, and described changes in these categories over time (Differences between protocol and review).

We primarily extracted performance measures for discrimination and calibration, as well as their measures of uncertainty. Discrimination (e.g. c‐statistic, AUC, Harrell’s c‐index, Gonen and Heller’s concordance index, Royston and Sauerbrei’s D‐statistic) refers to a model’s ability to distinguish between participants developing and not developing the outcome of interest. We expected the c‐statistic (or equivalently the AUC) to be the most frequently reported measure of discrimination. It gives the proportion of randomly chosen pairs from the sample (one participant with the outcome and one without) in which the participant with the outcome has the higher predicted score/risk. A c‐statistic of 0.5 means that the model’s discriminative performance is no better than chance, while a value of 1.0 is considered perfect discrimination.
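The pairwise definition above can be illustrated with a short sketch on hypothetical data: concordant pairs are counted across all (outcome, non‐outcome) pairs, with ties counted as one half.

```python
# Illustrative sketch (hypothetical data): the c-statistic as the proportion of
# outcome/non-outcome pairs in which the participant with the outcome has the
# higher predicted risk; ties count as one half.
import numpy as np

def c_statistic(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    risks_with_outcome = y_pred[y_true == 1]
    risks_without_outcome = y_pred[y_true == 0]
    concordant = tied = 0
    for r1 in risks_with_outcome:
        for r0 in risks_without_outcome:
            if r1 > r0:
                concordant += 1
            elif r1 == r0:
                tied += 1
    n_pairs = len(risks_with_outcome) * len(risks_without_outcome)
    return (concordant + 0.5 * tied) / n_pairs

print(c_statistic([1, 0, 1, 0, 0], [0.9, 0.2, 0.6, 0.6, 0.1]))  # approximately 0.917
```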

Calibration (e.g. calibration slope, calibration‐in‐the‐large, observed‐to‐expected (O:E) ratio) refers to the extent to which the expected outcomes and observed outcomes agree. We expected calibration to be reported infrequently and therefore focused on possible extraction of the O:E ratio, which is strongly related to calibration‐in‐the‐large and is an average across the range of predicted risks (Debray 2017). Values close to 1 indicate a well‐calibrated model overall; however, this does not rule out poor calibration in some subgroups.
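As a hypothetical illustration (example data assumed, not taken from any included study), the O:E ratio is simply the number of observed events divided by the sum of the predicted risks:

```python
# Minimal sketch (hypothetical data): observed-to-expected (O:E) ratio as an
# overall summary of calibration; values near 1 suggest agreement on average.
import numpy as np

y_true = np.array([0, 1, 0, 0, 1, 1, 0, 0, 1, 0])                       # observed outcomes
y_pred = np.array([0.2, 0.7, 0.1, 0.3, 0.6, 0.8, 0.2, 0.4, 0.5, 0.3])   # predicted risks

observed = y_true.sum()        # number of observed events
expected = y_pred.sum()        # sum of predicted risks = expected number of events
oe_ratio = observed / expected

print(f"O:E ratio = {observed}/{expected:.1f} = {oe_ratio:.2f}")
# O:E > 1 suggests the model under-predicts risk on average; O:E < 1 suggests over-prediction.
```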

We also extracted data on classification measures like sensitivity, specificity, predictive values, or accuracy. Such classification measures are based on categorisation of predicted probabilities at some cutoff. A cutoff may be predetermined based on clinical relevance, arbitrarily defined as the middle (0.5) of the theoretical range of probability (0–1), or calculated post hoc in a data‐driven manner.
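A brief hypothetical sketch of such cutoff‐based classification measures, dichotomising predicted risks at the arbitrary midpoint of 0.5:

```python
# Minimal sketch (hypothetical data): classification measures obtained by
# dichotomising predicted risks at a cutoff, here the arbitrary midpoint 0.5.
import numpy as np

y_true = np.array([0, 1, 0, 0, 1, 1, 0, 1])
y_pred = np.array([0.2, 0.7, 0.4, 0.6, 0.8, 0.3, 0.1, 0.9])

cutoff = 0.5
y_class = (y_pred >= cutoff).astype(int)

tp = np.sum((y_class == 1) & (y_true == 1))   # true positives
tn = np.sum((y_class == 0) & (y_true == 0))   # true negatives
fp = np.sum((y_class == 1) & (y_true == 0))   # false positives
fn = np.sum((y_class == 0) & (y_true == 1))   # false negatives

sensitivity = tp / (tp + fn)      # proportion of events correctly classified
specificity = tn / (tn + fp)      # proportion of non-events correctly classified
ppv = tp / (tp + fp)              # positive predictive value
accuracy = (tp + tn) / len(y_true)

print(sensitivity, specificity, ppv, accuracy)
```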

The aforementioned performance measures can be evaluated in different contexts. Internal validation is the evaluation of a model’s performance in the same population used for development. External validation is the evaluation of a model’s performance in a population different from that of its development. The characteristics of the participants in a validation that make it external might be based on, for example, location (e.g. participants from different sites), time (e.g. temporal split of participants from a single site), or spectrum (e.g. participants with a different disease subtype or treatment status).

Dealing with missing data

We contacted the corresponding authors via email to request missing or unclear information required for study eligibility, basic study description, quantitative data synthesis, or risk of bias assessment. When the c‐statistic was provided without its standard error or confidence interval, we calculated its variance based on the combination of sample size and number of events, if available, according to the method of Hanley and McNeil and computed the corresponding confidence interval according to Newcombe and colleagues (Hanley 1982; Newcombe 2006 method 4).
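For illustration, the sketch below implements the Hanley and McNeil (1982) variance approximation from a reported c‐statistic and the numbers of events and non‐events; the confidence interval shown is a simple Wald‐type interval for demonstration only and does not reproduce the Newcombe 2006 (method 4) interval used in the review. The input values are hypothetical.

```python
# Sketch: Hanley and McNeil (1982) standard error of a c-statistic, computed from
# the reported AUC, the number of events, and the number of non-events.
import math

def hanley_mcneil_se(auc, n_events, n_nonevents):
    q1 = auc / (2 - auc)
    q2 = 2 * auc**2 / (1 + auc)
    var = (auc * (1 - auc)
           + (n_events - 1) * (q1 - auc**2)
           + (n_nonevents - 1) * (q2 - auc**2)) / (n_events * n_nonevents)
    return math.sqrt(var)

auc, n_events, n_nonevents = 0.77, 120, 380   # hypothetical reported values
se = hanley_mcneil_se(auc, n_events, n_nonevents)
lower, upper = auc - 1.96 * se, auc + 1.96 * se   # simple Wald-type interval for illustration
print(f"AUC {auc:.2f}, SE {se:.3f}, approximate 95% CI ({lower:.2f} to {upper:.2f})")
```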

Terms used for reporting

A clarification of the terms used to differentiate between the various levels at which we report was necessary. We used the term 'record' to refer to the entries retrieved through our database searches and considered during title/abstract screening. All eligible records were associated with at least one scientific report, retrieved for full-text screening mostly from the publishers but also by contacting the study authors or searching the Internet. A comprehensive prognostic prediction exercise with a clear goal performed by the same set of authors was called a 'study'. A single study may be associated with more than one 'report'. A report may be one of the following types, ordered from most to least informative: journal article, preprint article, dissertation, poster or video presentation, conference abstract.

The main unit of interest in this review was called the 'analysis'. A single study (or report) may contain many analyses: multiple model developments or validations with different outcomes, timing, predictor sets, modelling methods, or participant subsets. Analyses may be of the type model 'development' or model 'validation'. Model developments are interchangeably referred to as models or developments.

Several of the included studies reported results from the development of more than one prognostic model, but only a subset of these studies aimed to present multiple final prognostic models. When multiple models in a study were reported in an almost equivalent manner, without any indication of a preferred one, we included all the models in our review. We extracted the data for each model separately and presented them individually. This decision was motivated by our aim of reviewing all prognostic models with potential clinical meaning in the disease area of MS.

When the reporting in an eligible study with multiple models indicated a preferred or selected model, we included only that model in our review. The studies' authors communicated model preference either directly (e.g. by discussing the superiority of one amongst the competing models) or indirectly (e.g. by using a bold font for selected results or presenting figures for a single model). The other models were considered to be by-products of the modelling process and not meant to be presented as final models. We always reported all validations of included models as separate analyses.

For the purpose of clear reporting, we made a distinction between internal, external, and other author-reported validations. 'Internal validation' is the evaluation method directly relevant to analyses of the type model development and is thus reported in that context. To call a validation external, we expected a faithful evaluation of the developed model in an independent set of participants. Even though authors who developed prognostic models and reported model evaluation measures using a different set of participants may have referred to their activities as "validation", we refrained from calling them 'external validation' if the set of participants was not independent of the development set, if the new set of participants was only used for model refitting, or if the model was improperly changed, e.g. predictors dropped without statistical re-estimation. These exceptional cases are referred to as other 'author-reported validations'. For the description of the overall literature evaluating prediction performance using a separate set of participants, we referred to the external validations and the other author-reported validations together as validations, unless a differentiation between them was deemed necessary. For example, when reporting or discussing clinical readiness, we concentrated only on external validations.

To differentiate between multiple analyses from a single study, we referred to them first by the study name (e.g. Zhao 2020). If multiple models were included from a single study, these models were differentiated from each other by the name/abbreviation the authors used or by a reference to what separates the models included from that study (e.g. the modelling method and considered set of predictors in Zhao 2020 XGB Common). Finally, if a model had a validation other than an internal one, we differentiated these separate analyses by adding 'Dev' for development, 'Ext Val' for external validations, and 'Val' for other author-reported validations (e.g. Zhao 2020 XGB Common Val).

Data synthesis

This broad review intended to identify all prognostic models of clinical disease progression, worsening, and activity across all types of MS. We expected to identify numerous model development studies, but only a few external validation studies overall. As per the protocol, we summarised all identified multivariable models and the prognostic factors included in these models in narrative, graphical, and tabular formats. We had planned to apply methods to derive missing performance measures (the c-statistic for discrimination and the O:E ratio for calibration) (Debray 2019); however, this was not possible due to limited reporting of alternative discrimination measures, of the linear predictor distribution, and of the expected number of events. We did not meta-analyse prognostic model performance statistics for single models externally validated in several independent samples because no single model had at least three independent external validation studies outside its development study. We also could not perform meta-regression due to heterogeneity in predictor and outcome definitions and the low number of studies with reported or derivable performance measures for a single outcome. Please see Differences between protocol and review for details.
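
Had any model accrued at least three independent external validations, the planned random-effects pooling of c-statistics on the logit scale could have proceeded along the lines of the following sketch (using, for example, the metafor package; all numbers are hypothetical). The same fit would also have provided the I² statistic for the heterogeneity assessment described in the next section.

    # Sketch of the planned random-effects pooling of c-statistics on the logit
    # scale, had three or more independent external validations been available.
    # All numbers are hypothetical; metafor is one package that could be used.
    library(metafor)

    auc     <- c(0.72, 0.68, 0.75)         # c-statistics from three validations
    auc_var <- c(0.0012, 0.0020, 0.0009)   # their variances

    logit_auc     <- qlogis(auc)
    logit_auc_var <- auc_var / (auc * (1 - auc))^2   # delta-method variance

    fit <- rma(yi = logit_auc, vi = logit_auc_var, method = "REML")
    plogis(c(fit$beta, fit$ci.lb, fit$ci.ub))        # pooled c-statistic, 95% CI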

Investigation of sources of heterogeneity between studies

We expected to find substantial heterogeneity, as diagnostic criteria for MS subtypes and available treatment options, as well as the technology used to assess disease activity, have evolved over time. Heterogeneity is also typically high in prognostic studies. We expected heterogeneity both between development studies and their corresponding validation studies for specific models and also between different development models for the same outcome. Potential sources of heterogeneity related to either or both of these include:

  • case mix (e.g. age, sex, disease duration, treatment status);

  • study design (e.g. follow‐up time, source of data, outcome definitions and prognostic factors); and

  • statistical analysis methods and reporting (e.g. number of prognostic factors included, traditional statistics versus ML, risk of bias, validation methods).

We aimed to extract relevant information and to include a narrative summary of these potential sources of heterogeneity. We had planned to further investigate heterogeneity statistically, using the I² statistic and meta-regression within random-effects models for the external validation performance measures of single models; however, as stated earlier, this was not possible (see Differences between protocol and review). Instead, as also discussed in the protocol, we described these potential sources of heterogeneity in the narrative text and, as far as possible, in tables and figures.

Synthesis

For synthesis, we used the median, interquartile range (IQR), or range to describe the quantitative measures reported for the included analyses. These consisted of participant characteristics (age, sex, disease duration, study timing), predictors (number considered, number included), sample size (number of participants, number of events), and performance measures (c-statistic). The samples in some analyses belonging to the same study or using the same data source might overlap or even be identical. However, we ignored this correlation for reporting purposes because (1) the description aims to give a sense of the state of the literature rather than precise estimates, (2) no single study or data source has undue influence over these measures, and (3) it is impossible to discern the extent of overlap between analyses utilising the same data source.

Tables were organised by model outcome, which aligns with the diagnostic subtype. Model outcomes were categorised into five groups: disability, relapse, conversion to definite MS, conversion to progressive MS, and composite outcomes. The models were summarised over several tables presenting certain aspects: study characteristics, participant characteristics, predictor domains, number of predictors, model development and validation, final model presentation, reporting items, and usability. External validation information was also included in these tables, where appropriate. Figures were organised by model outcome, development versus validation, or algorithm type, where appropriate. Algorithm type was categorised into two groups: traditional statistics and ML.

We used the statistical programming software R (version 4.0.1) and the following packages for all analyses: tidyverse (1.3.0), dmetar (0.0.9000).

Conclusions and summary of findings

A GRADE framework adaptation specific to prognostic model research, rather than to general prognosis (Iorio 2015) or prognostic factor research (Foroutan 2020), is still a topic of future work; hence, we did not apply GRADE to our conclusions. Our conclusions highlighted the biases in the current literature, the usability of the currently available models, and areas in need of improved reporting. We also made recommendations for future research.

Results

In this section, we start by reporting the results from the search and screening process, including the reasons for exclusion. This is followed by an in‐depth review of the models with more than one external validation. Then, we describe the data extracted from all the included studies, and the respective analyses as appropriate, in the order of the CHARMS checklist (Moons 2014): data source, participants, outcomes, predictors, sample size and missing data, model development, model performance and evaluation, model presentation, and interpretation. We finalise this section with our assessment of the analyses based on the extracted data: usability and reproducibility, risk of bias, and reporting deficiencies.

Description of studies

Results of the search

We identified 13,046 records via our database search (4757 from MEDLINE, 7706 from Embase, and 583 from the Cochrane Library ‐ search updated on 2 July 2021), as summarised in Figure 1. Our backward and forward citation tracking of the included studies and reviews on MS prognosis identified during title/abstract screening resulted in an additional 4727 records. Contact with the authors of included studies led us to a further 23 suggested records related to the topic. After deduplication of the 17,796 records from all sources, we screened the titles/abstracts of 12,258 unique records, of which 261 were found eligible for full‐text retrieval. We identified an additional 48 reports of the types preprint article, dissertation, and poster or video presentations related to the conference abstracts via searching the Internet or contacting the abstract authors. In total, we assessed 309 full‐text reports for eligibility.

Figure 1. Flow diagram based on PRISMA 2020 guideline

At the full-text screening stage, we excluded 180 reports (see Excluded studies). Furthermore, 21 reports of 11 studies were conference abstracts or presentations without any associated full-text publication. Despite attempts to contact the authors for more information, a final judgement on eligibility could not be reached for an additional eight reports corresponding to six studies due to limited information (see Characteristics of studies awaiting classification). Thus, we included 100 reports corresponding to 57 studies in our review. Bergamaschi 2001 did not report any predictions for individuals and Weinshenker 1991 was published before the dates covered by our search algorithm. These two studies were nevertheless included in our review because they described the development of models that were validated in later eligible studies (Bergamaschi 2007; Bergamaschi 2015; Weinshenker 1996).

Excluded studies

We excluded 180 reports after full-text screening for the following reasons, listed according to our hierarchy of exclusion reasons at the first level and by decreasing number of reports at the second level.

  • Wrong study type (113)

    • 112 reports that did not aim to develop or validate prognostic models

    • 1 report that was not an original study but a review

  • Wrong population (3)

    • 3 reports in which prognostication was applied to people without a diagnosis of MS

  • Wrong outcome (6)

    • 6 reports with outcomes other than disability, relapses, or conversion to a more advanced disease subtype

  • Wrong model (43)

    • 13 reports using multivariable combinations not derived from statistical prognostic models (e.g. diagnostic criteria, scoring rules)

    • 12 reports predicting treatment response

    • 10 reports that did not perform individual-level predictions

    • 8 reports containing predictions from a model not multivariable in nature

  • Wrong timing (15)

    • 15 reports predicting concurrent or cross-sectional outcomes

A representative selection of the excluded studies, with detailed reasons, is available in the section Characteristics of excluded studies.

Included studies

Of the 57 studies included in this review, 42 (74%) reported prognostic model development only (Aghdam 2021; Agosta 2006; Bendfeldt 2019; Bergamaschi 2001; Borras 2016; Brichetto 2020; De Brouwer 2021; de Groot 2009; Gout 2011; Kosa 2022; Kuceyeski 2018; Law 2019; Margaritella 2012; Martinelli 2017; Misicka 2020; Montolio 2021; Olesen 2019; Oprea 2020; Pellegrini 2019; Pinto 2020; Pisani 2021; Roca 2020; Rocca 2017; Rovaris 2006; Runia 2014; Seccia 2020; Skoog 2014; Sombekke 2010; Spelman 2017; Szilasiová 2020; Tacchella 2018; Tommasin 2021; Tousignant 2019; Vukusic 2004; Weinshenker 1991; Wottschel 2015; Wottschel 2019; Ye 2020; Yoo 2019; Yperman 2020; Zakharov 2013; Zhang 2019), and eight (14%) reported both development and external validation of prognostic models (Ahuja 2021; Calabrese 2013; Lejeune 2021; Malpas 2020; Mandrioli 2008; Manouchehrinia 2019; Sormani 2007; Vasconcelos 2020). Bergamaschi 2007 reported an external validation of a previously developed but not evaluated model (Bergamaschi 2001). Bejarano 2011 and probably Zhao 2020 replicated the modelling process in an independent set of participants instead of evaluating the final model derived from the development set. Hence, these two studies are considered to have reported development and other author‐reported validation of prognostic models.

The remaining four studies (7%) were a combination of the aforementioned types: the model developed in Bergamaschi 2001 (called Bayesian Risk Estimate for Multiple Sclerosis (BREMS) by its authors) was both externally validated and, after dropping of post‐onset predictors without a statistical justification (called BREMS onset (BREMSO) by its authors), validated for the original and a new outcome in Bergamaschi 2015. Gurevich 2009 developed two models of interest (called First Level Predictor (FLP) and Fine Tuning Predictor (FTP) by their authors) but externally validated only one of them (FLP). In Skoog 2019, a previously developed model (Skoog 2014) was both further internally evaluated in a subset of the development cohort and validated externally. Finally, Weinshenker 1996 reported both an external validation of a previously developed model (called Model 3 by its authors in Weinshenker 1991) and the development of a new model (short‐term outcome).

We contacted the authors of all 57 included studies to obtain missing information or clarification of the reported information. No response was received for 21 (37%) studies. For six (10%) studies, a response was received but without further information or clarification. The authors of the remaining 30 studies (53%) provided further details and clarifications.

Models with more than one external validation

We identified two models with more than one external validation (Table 1), both originally developed to predict time to conversion to progressive MS: the BREMS score (Bergamaschi 2001) and the survival model of Manouchehrinia 2019.

BREMS score

The BREMS score was developed using clinical data from 186 people with RRMS seen at a single MS clinic in Italy until December 1997. The mean follow-up time was 7.5 years, ranging from 3 to 25 years. Bergamaschi and colleagues defined the time of onset of the secondary progressive phase as the earliest date of observation of a progressive worsening severe enough to induce an increase of at least one EDSS point and persisting for at least six months after the progression onset. The score was developed using Bayesian methods to jointly model relapses, Kurtzke's Functional Systems scores, and EDSS up until the primary outcome, time to SPMS conversion. The presented sum score contained nine predictors: age at onset, female sex, sphincter onset, pure motor onset, motor-sensory onset, sequelae after onset, number of involved functional systems at onset, number of sphincter plus motor relapses, and EDSS greater than four outside relapse. This model was not internally validated in the development study, but was externally validated in two further studies (Bergamaschi 2007; Bergamaschi 2015). Additionally, the model was updated by dropping two predictors not available at onset (number of sphincter plus motor relapses and EDSS greater than four outside relapse) and renamed BREMSO (Bergamaschi 2015). This update did not, however, consist of refitting the model using the subset of original predictors, but rather presented the original coefficients without the two dropped ones. This updated model was evaluated for prediction of the original SPMS conversion outcome as well as of severe MS defined using the Multiple Sclerosis Severity Score. Because the BREMS model was only externally validated twice and no measures of discrimination or calibration were reported, we did not perform a meta-analysis to summarise the performance of this model.

Manouchehrinia 2019

Manouchehrinia and colleagues developed their model using data from 8825 participants with RRMS seen up until May 2016 in the Swedish national MS registry. The mean (standard deviation (SD)) follow‐up time was 12.5 (8.7) years. In this parametric survival model, time to SPMS was defined as the earliest recognised date of SPMS onset determined by a neurologist at a routine clinic visit according to the Lublin 1996 criteria. The model was presented as a nomogram for computing 10‐, 15‐, and 20‐year conversion probability and additionally as a web application (https://aliman.shinyapps.io/SPMSnom/). The final model included five predictors: year of birth, age at onset, sex, first EDSS, and age at first EDSS.

This model was internally validated using the bootstrap method, with both calibration and discrimination assessed. In the same publication as the model development study, it was reported that model validation was also performed using three external multi‐site datasets addressing temporal, geographic, and spectrum transportability. The British Columbia MS Cohort provided 3967 participants diagnosed with RRMS according to Poser 1983, who were enrolled between January 1980 and December 2004 and followed up for an average of 13.8 (SD 8.4) years. The second external validation analysis was performed using the 175 participants from the ACROSS (A CROSS‐Sectional Long‐term Follow‐up of Fingolimod Phase II Study Patients) randomised placebo‐controlled phase 2 trial of the disease‐modifying therapy fingolimod who returned for assessment at 10 years. The third external validation analysis used 2355 participants from the long‐term follow‐up extension study of the phase 3 trials FREEDOMS (FTY720 Research Evaluating Effects of Daily Oral therapy in Multiple Sclerosis) and FREEDOMS II, which also assessed fingolimod. RRMS diagnosis was made using the McDonald 2001 and 2005 criteria (Polman 2005) and mean follow‐up time was 18.6 (SD 7.9) years and 14 (SD 7.8) years, respectively, in ACROSS and FREEDOMS validation analyses.

The model development and external validations in Manouchehrinia 2019 were all found to be at high risk of bias. In the development, people with MS were included from registry data based on availability of EDSS score. It was unclear how standard the data collection was or whether the included sample may have differed from the general population with MS. The outcome, conversion to SPMS, was based on the Lublin 1996 criteria, which we considered to be subjective. The combination of retrospective use of registry data and a subjective outcome increases the risk of bias. During analysis, only complete cases were used, and it was unclear in which subset of participants the backward selection of predictors took place. This was followed by internal validation, which did not include the predictor selection process.
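
To make this point concrete, the following hypothetical sketch (simulated data, not the Manouchehrinia 2019 model) contrasts with that approach by repeating a backward selection step inside every bootstrap sample, so that the optimism estimate also reflects the selection process.

    # Hypothetical contrast (simulated data, not any included study's model):
    # bootstrap internal validation in which backward selection is repeated in
    # every bootstrap sample, so the optimism estimate reflects selection too.
    set.seed(1)
    n  <- 300
    df <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
    df$y <- rbinom(n, 1, plogis(0.7 * df$x1 - 0.5 * df$x2))

    c_stat <- function(pred, outcome) {   # simple pairwise c-statistic
      cases <- pred[outcome == 1]; controls <- pred[outcome == 0]
      mean(outer(cases, controls, ">") + 0.5 * outer(cases, controls, "=="))
    }

    fit_with_selection <- function(data) {
      step(glm(y ~ x1 + x2 + x3, family = binomial, data = data),
           direction = "backward", trace = 0)
    }

    apparent   <- fit_with_selection(df)
    apparent_c <- c_stat(predict(apparent, type = "response"), df$y)

    optimism <- replicate(50, {
      boot <- df[sample(nrow(df), replace = TRUE), ]
      fit  <- fit_with_selection(boot)                      # selection repeated
      c_stat(predict(fit, type = "response"), boot$y) -
        c_stat(predict(fit, newdata = df, type = "response"), df$y)
    })

    apparent_c - mean(optimism)   # optimism-corrected c-statistic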

Age at onset, age at first recorded EDSS score, and the first recorded EDSS value were amongst the candidate predictors remaining in the final model. The study did not clearly define 'onset'; in this review, we interpreted 'onset' liberally as extending up to one year after the first symptoms. Given that the model was to be applied at onset, age at onset and age at first EDSS score should have been similar, if not equivalent. However, the first EDSS assessment occurred on average 6.5 years after onset, which means that the development included predictors available only after the intended time of model application. This use of unavailable predictors is even clearer here than in other analyses because of the use of survival analysis, in which the start and end of follow-up time are explicitly defined. Although EDSS may in general be available at onset, the estimated model would probably change due to the different range of EDSS scores actually seen at onset. The inclusion of predictors unavailable at the time of model application makes the model unusable, as such a predictor will, by definition, not be available when the model is applied to a future patient. It also inflates model performance, because a predictor measured more closely in time to the outcome is probably more strongly associated with the outcome (Moons 2019).

We rated the external validations at high risk of bias for similar reasons. Inclusion in the British Columbia validation analysis was based on data availability, rather than employing multiple imputation of missing values, and the observation frequency for outcome measurement varied across participants. Only participants with complete follow-up were included in the ACROSS validation analysis, even though a time-to-event analysis capable of dealing with censoring was employed, and this analysis included only 26 events, well below the recommended 100 events for validation studies. Only the FREEDOMS analysis used a clear definition of the time of conversion to SPMS, based on increased EDSS for at least six months. The three validation analyses assessed discrimination using Harrell's c-statistic but did not assess calibration, which is valuable to assess in external samples, not just in the development set.

Although the Manouchehrinia 2019 model was evaluated using three external datasets, all three of these validations were conducted by the same study team within the development publication. Sources of confusion related to timing in model development were propagated across all the validation analyses. We therefore did not consider these evaluations to be independent external validation studies and decided against performing a meta‐analysis.

Characteristics of included models

In total, we extracted data from 75 models developed in 54 studies (see Appendix 5 for details). Of these, 35 (47%) models were developed using traditional statistical methods and the remaining using ML methods. Of the studies that developed models, 42 (78%) contributed one model each, four (7%) contributed two models each (Gurevich 2009; Olesen 2019; Wottschel 2015; Ye 2020), seven (13%) contributed three models each (Bendfeldt 2019; de Groot 2009; Law 2019; Misicka 2020; Pinto 2020; Seccia 2020; Tacchella 2018), and Zhao 2020 contributed four models.

In the 12 studies from which multiple model developments were included, the models differed in the timing of the outcome measurement in five studies (42%) (Misicka 2020; Pinto 2020; Seccia 2020; Tacchella 2018; Wottschel 2015), in the modelling method in three (25%) (Gurevich 2009; Law 2019; Zhao 2020), in the outcome in two (17%) (de Groot 2009; Pinto 2020), in the considered predictors in four (33%) (Bendfeldt 2019; Olesen 2019; Ye 2020; Zhao 2020), and in the participant subset in one study (Bendfeldt 2019); in some studies, the models differed in more than one of these aspects.

We extracted data from 21 external or author‐reported validations in 15 studies. Of these studies, 11 (73%) contained one validation (Ahuja 2021; Bejarano 2011; Bergamaschi 2007; Calabrese 2013; Gurevich 2009; Lejeune 2021; Malpas 2020; Mandrioli 2008; Sormani 2007; Vasconcelos 2020; Weinshenker 1996), two (13%) contained two validations (Skoog 2019; Zhao 2020), and two (13%) contained three validations (Bergamaschi 2015; Manouchehrinia 2019).

Of all validations, 15 (71%) were external validations of 12 models (16%): 10 were externally validated once (Ahuja 2021 Dev; Calabrese 2013 Dev; Gurevich 2009 FLP Dev; Lejeune 2021 Dev; Malpas 2020 Dev; Mandrioli 2008 Dev; Skoog 2014 Dev; Sormani 2007 Dev; Vasconcelos 2020 Dev; Weinshenker 1991 M3 Dev), the model Bergamaschi 2001 BREMS Dev was externally validated twice in studies separate from the development but by the same research team, and the model Manouchehrinia 2019 Dev was externally validated three times in the same study of its development. The remaining six (29%) were other author‐reported validations (Bejarano 2011 Val; Bergamaschi 2015 BREMSO MSSS Val; Bergamaschi 2015 BREMSO SP Val; Skoog 2019 Val; Zhao 2020 LGBM Common Val; Zhao 2020 XGB Common Val). None of the validations were performed by researchers independent of that model’s development team.

Our main sources for data extraction were journal articles for 55 (96%) studies, a dissertation for Runia 2014, and a conference proceeding for Tousignant 2019. The number of published prognostic model studies, and of the analyses (both model developments and validations) they contain, has greatly increased in recent years (see Figure 2 left). Before 2001, two studies containing three analyses were published, whereas 36 studies containing 63 analyses have been published after 2015. Yet, there seems to be no discernible time trend in the number of published validations relative to the number of published developments. Recently, there has been an increase in the popularity of ML methods for prognostic prediction model development (see Figure 2 right).

Figure 2. Publication characteristics by year. Left: number of included studies (black outline) and the model developments (blue)/validations (orange) they contain by year of publication; right: number of included developments using traditional (dark blue) or machine learning (yellow) methods by year of publication. Data for the year 2021 are incomplete (only until July). ML: machine learning.

Data source

As a data source for model development or validation, 37 (39%) analyses used cohort studies, 18 (19%) used routine care sources, 14 (15%) used randomised trial participants, and 13 (14%) used disease registries. Four (4%) analyses used a combination of these: Ahuja 2021 used cohort study and routine care data (electronic health records), Kosa 2022 used cohort study and case-control study data, Bergamaschi 2001 used registry and routine care data, and Kuceyeski 2018 used cohort study, registry, and routine care data. The source of data was not reported or unclear for 10 (10%) analyses (Gurevich 2009 FLP Dev; Gurevich 2009 FLP Ext Val; Gurevich 2009 FTP; Sombekke 2010; Tommasin 2021; Vasconcelos 2020 Dev; Vasconcelos 2020 Ext Val; Ye 2020 gene signature; Ye 2020 nomogram; Zakharov 2013).

The data used in the analyses were collected to conduct prognostic research in 27 (28%) analyses. In 61 (64%) analyses, data use was secondary, i.e. the data were collected for other reasons but were then repurposed. The data collection purpose for the remaining eight (8%) analyses was either unclear or not reported (Borras 2016; Gurevich 2009 FLP Dev; Gurevich 2009 FLP Ext Val; Gurevich 2009 FTP; Martinelli 2017; Vasconcelos 2020 Dev; Vasconcelos 2020 Ext Val; Zakharov 2013).

Of the 21 validations, the participants in the validation differed from those in the model development in terms of location in six (29%) (Bejarano 2011 Val; Bergamaschi 2007 BREMS Ext Val; Malpas 2020 Ext Val; Weinshenker 1996 M3 Ext Val; Zhao 2020 LGBM Common Val; Zhao 2020 XGB Common Val), in terms of time in three (14%) (Calabrese 2013 Ext Val; Mandrioli 2008 Ext Val; Vasconcelos 2020 Ext Val), and in terms of patient spectrum in two (10%) (Ahuja 2021 Ext Val; Sormani 2007 Ext Val). The difference was in multiple dimensions in eight (38%) validations: in both location and time (Bergamaschi 2015 BREMS Ext Val; Bergamaschi 2015 BREMSO MSSS Val; Bergamaschi 2015 BREMSO SP Val; Skoog 2019 Ext Val), in location, time, and spectrum (Manouchehrinia 2019 Ext Val 1; Manouchehrinia 2019 Ext Val 2; Manouchehrinia 2019 Ext Val 3), and in location and spectrum (Lejeune 2021 Ext Val). The difference between the validation and derivation cohorts was not explicitly reported in Gurevich 2009 FLP Ext Val, and Skoog 2019 Val was a further evaluation of the model in a subset of the derivation cohort.

Participants

The participants were recruited from a single site in 54 (56%) analyses and from multiple sites in 40 (42%) analyses; the number of sites was not reported for the remaining analyses. Of the 90 (94%) analyses for which the country of participant recruitment could be extracted, recruitment was from centres in Europe in 68 (76%), in North America in 28 (31%), in Asia in 13 (14%), in South America in seven (8%), in Oceania in five (6%), and in Africa (South Africa) in a single analysis (Pellegrini 2019).

In the 86 (90%) analyses for which summary statistics on sex could be extracted (including studies, e.g. Zhao 2020, that did not report the characteristics of the included sample but provided references describing the source population), the proportion of females ranged from 50% (Rocca 2017; Rovaris 2006) to 100% (Vukusic 2004), with a median (IQR) of 69% (65% to 73%). The distribution of sex in the included analyses did not seem to vary by category of the predicted outcome (Figure 3 top left).

Figure 3. Participant characteristics in included analyses by outcome. Top left: percentage of females; middle left: measure of centre of disease duration in years; bottom left: measure of centre of age in years as reported at disease onset or at the time of analysis; top right: diagnostic criteria by publication year per outcome; middle right: diagnostic subtype by publication year; bottom right: percent treated by publication year measured at baseline or during follow‐up. Data for the year 2021 are incomplete (only until July). CDMS: conversion to clinically definite MS; CPMS: conversion to progressive MS.

The reported measure of centre (i.e. mean or median) for participant disease duration in 54 (56%) analyses ranged from 0.1 years in participants all diagnosed with CIS (Wottschel 2015) to 19 years in participants all diagnosed with RRMS (Seccia 2020). As expected, participants included in the analyses of conversion to progressive MS had been living with an MS diagnosis for longer than those included in the analyses of conversion to definite MS, who had had their first symptoms very recently (Figure 3 middle left).

In 87 (91%) analyses that reported age, the reported measure of centre ranged from 24.8 years (Bergamaschi 2007 BREMS Ext Val) to 51.3 years (Rocca 2017; Rovaris 2006). Included in this summary are 40 (42%) analyses that did not specify the time point of measurement, 32 (33%) analyses that reported age at disease onset, and 15 (16%) analyses that reported age at an unclear time or for the source population. Participants were older at the time of the analyses with disability-related or composite outcomes than in those with relapse or diagnostic conversion outcomes. The distribution of age at onset did not appear to vary by category of the predicted outcome (Figure 3 bottom left).

Of those 84 (88%) analyses that clearly reported the diagnostic subtype of the included participants, participants of a single subtype were recruited in 62 (74%): CIS in 17 (20%), RRMS in 40 (48%), PPMS in two (Rocca 2017; Rovaris 2006) analyses, and SPMS in three models from a single study (Law 2019 Ada; Law 2019 DT; Law 2019 RF). Participants with a mixture of the aforementioned diagnoses were included in eight (10%) analyses (Agosta 2006; Bejarano 2011 Val; Kosa 2022; Montolio 2021; Sombekke 2010; Szilasiová 2020; Vukusic 2004; Yperman 2020). The remaining 14 (17%) used a different diagnostic subtyping in describing the mixture of their participants. The models developed in participants with primary or secondary progressive subtypes were predicting disability outcomes. As expected, all models predicting conversion to definite MS were developed in participants with CIS and all models predicting conversion to progressive MS were developed in participants with RRMS (Figure 3 middle right).

Of those 68 (71%) analyses that clearly reported the diagnostic criteria at recruitment, 13 (19%) used a mixture of different criteria. Overall, 18 (26%) used Poser 1983, two (3%) used Thompson 2000, 41 (60%) used one or more versions of the McDonald criteria (17 used 2001 (McDonald 2001), 11 used 2005 (Polman 2005), 10 used 2010 (Polman 2011), six used 2017 (Thompson 2018b), and three used an unspecified version), seven (10%) analyses used their own definition (Bendfeldt 2019 Linear Placebo; Bendfeldt 2019 M7 Placebo; Bendfeldt 2019 M9 IFN; Law 2019 Ada; Law 2019 DT; Law 2019 RF; Runia 2014), and Olesen 2019 used criteria other than those mentioned above (Optic Neuritis Study Group 1991). The changes in diagnostic criteria are reflected in the diversification of the criteria used over time (Figure 3 top right). Although newer criteria are increasingly used, some studies published after 2015 were conducted in participants diagnosed with McDonald 2001 (Manouchehrinia 2019 Ext Val 2; Montolio 2021; Pellegrini 2019; Szilasiová 2020; Vasconcelos 2020 Dev; Vasconcelos 2020 Ext Val; Ye 2020 gene signature; Ye 2020 nomogram) or even Poser 1983 (Manouchehrinia 2019 Ext Val 1; Skoog 2019 Ext Val; Skoog 2019 Val; Spelman 2017; Vasconcelos 2020 Dev; Vasconcelos 2020 Ext Val).

In the 45 (47%) analyses with clear reporting, the proportion of participants on treatment at recruitment ranged from 0% in 28 analyses to 100% (Calabrese 2013 Dev; Calabrese 2013 Ext Val; Pisani 2021), with a median (IQR) of 0% (0% to 10%); at least one participant was on treatment at recruitment in 17 (38%) analyses. In the 37 (39%) analyses with clear reporting, the proportion of participants on treatment during follow-up ranged from 0% in 11 analyses to 100% (Bendfeldt 2019 M9 IFN; Calabrese 2013 Dev; Calabrese 2013 Ext Val; Manouchehrinia 2019 Ext Val 2; Manouchehrinia 2019 Ext Val 3; Oprea 2020; Pisani 2021; Szilasiová 2020), with a median (IQR) of 35% (0% to 68%). When the analyses reporting treatment or its timing unclearly are included, the median (IQR, range) proportion of participants receiving treatment during follow-up becomes 50% (12% to 73%, 0% to 100%) in 58 analyses; at least one participant was on treatment in 47 (81%) of these. As expected, the proportion of participants receiving treatment was higher during follow-up than at recruitment. This trend is especially visible in the analyses published during the last 15 years. Regardless of the time point of measurement, the proportion treated increases with publication year (Figure 3 bottom right). It should be noted that some analyses were conducted on data from RCT arms, which partly explains proportions of 0% and 100% treated during follow-up.

Year of observation start ranged from 1972 (Weinshenker 1991 M3 Dev) to 2014 (Brichetto 2020; Olesen 2019 Candidate; Olesen 2019 Routine) with a median of 2003 in 55 (57%) analyses clearly reporting when data collection or recruitment started. Year of observation end ranged from 1984 (Weinshenker 1991 M3 Dev) to 2021 (Kosa 2022) with a median of 2013 in 50 (52%) analyses clearly reporting when data collection or recruitment ended.

In 46 (48%) analyses that clearly reported both of the items, the median (IQR, range) duration of data collection was 7 (3 to 12, 0 to 33) years.

Outcomes

Although definitions in individual analyses might differ, we categorised the outcomes into one of the following domains in line with our PICOTS: disability, relapse, conversion to clinically definite MS, and conversion to progressive MS. Composite outcomes containing any one of the above were also included and categorised separately.

Disability progression

Of the 96 analyses, 31 model developments and eight validations (41%) defined outcomes related to disability progression. Most of these, 33 (85%), operationalised it using the EDSS, two using the DSS (Weinshenker 1991 M3 Dev; Weinshenker 1996 M3 Ext Val), and two using the MS Severity Score (MSSS) (Bergamaschi 2015 BREMSO MSSS Val; Sombekke 2010), a measure derived from the EDSS. The most common EDSS-based outcome definition, used in nine analyses, was disability progression or clinical worsening (sometimes confirmed, sometimes not) based on an increase in EDSS (at least 1 point increase if EDSS < 6 and at least 0.5 point increase if EDSS > 5.5). Other outcomes defined by different levels of, or simply change in, EDSS included aggressive disease, severe MS, worsening, and residual disability after relapse. Apart from (E)DSS-based outcomes, two analyses used other measures of disability: Kuceyeski 2018 used the SDMT to measure cognitive disability and de Groot 2009 Dexterity used the 9-Hole Peg Test (9-HPT). Many of the analyses with outcomes based on disability, 22 (56%), were in participants with a mixture of diagnostic subtypes, 11 (28%) were only in RRMS participants, three (8%) were only in SPMS participants (Law 2019 Ada; Law 2019 DT; Law 2019 RF), two (5%) were only in PPMS participants (Rocca 2017; Rovaris 2006), and Roca 2020 did not report the diagnostic subtype of the participants. In those analyses that defined the timing of measurement, disease progression was measured at the earliest six months (as residual disability after relapse in Lejeune 2021) and at the latest 15 years (as EDSS score ≥ 5.0 in Szilasiová 2020) after the intended time of prognostication. In those analyses that did not specify the timing of measurement or had time-to-event outcomes, either the follow-up or the time of outcome occurrence was described as being at the earliest 5.25 years (as clinically worsened in Rovaris 2006) and at the latest 55 years (as mild MS in Bergamaschi 2015) after the intended time of prognostication.

Relapse

Six model developments and two validations (8%) defined outcomes based on relapses: the model developed in Sormani 2007 and its external validation were in participants with an RRMS diagnosis (Sormani 2007 Dev; Sormani 2007 Ext Val), the analyses in Gurevich 2009 and Ye 2020 used data from a mixture of participants with CIS and clinically definite MS (Gurevich 2009 FLP Dev; Gurevich 2009 FLP Ext Val; Gurevich 2009 FTP; Ye 2020 gene signature; Ye 2020 nomogram), and the model in Vukusic 2004 was developed in a mixture of participants with RRMS and SPMS. The relapse outcome in Vukusic 2004 had a fixed time of measurement: three months after the intended time of prognostication, which was child delivery. Relapse was conceptualised as a time-to-event outcome in the analyses other than Vukusic 2004, and the follow-up was described as being at the earliest 10 months (Sormani 2007 Dev) and at the latest 16 months (Sormani 2007 Ext Val) after the intended time of prognostication.

Conversion to a more advanced disease subtype

Seventeen (18%) model developments defined outcomes of conversion to definite MS in participants with CIS. When defining definite MS, five (29%) of these referred to McDonald 2010 (Polman 2011) (Aghdam 2021; Olesen 2019 Candidate; Olesen 2019 Routine; Zakharov 2013; Zhang 2019), Yoo 2019 referred to McDonald 2005 (Polman 2005), four (24%) referred to Poser 1983 (Gout 2011; Martinelli 2017; Runia 2014; Spelman 2017), and Bendfeldt 2019 referred to modified Poser criteria (Bakshi 2005). Of the remaining analyses, Borras 2016 provided a definition of definite MS that included the Barkhof criteria, whereas no criteria were cited in Wottschel 2015 or Wottschel 2019. In those analyses that defined the timing of measurement, conversion to definite MS was measured at the earliest one year (Wottschel 2015 one year; Wottschel 2019) and at the latest three years (Wottschel 2015 three years; Zhang 2019) after the intended time of prognostication. In those analyses that did not specify the timing of measurement or had time-to-event outcomes, either the follow-up or the time of outcome occurrence was described as being at the earliest 3.4 years (Olesen 2019) and at the latest 12.7 years (Gout 2011) after the intended time of prognostication.

Seventeen model developments and 10 validations (28%) defined outcomes of conversion to progressive MS in participants with RRMS, except for Brichetto 2020 in which the diagnostic subtype of the model development population is unclear. When describing the secondary progression outcome, seven (26%) of these analyses referred to Lublin 1996 (Manouchehrinia 2019 Dev; Manouchehrinia 2019 Ext Val 1; Manouchehrinia 2019 Ext Val 2; Pisani 2021; Skoog 2014 Dev; Skoog 2019 Ext Val; Skoog 2019 Val), 12 (44%) did not cite any criteria but provided an outcome definition based on EDSS, and eight (30%) neither cited criteria nor provided an operationalised definition of the outcome (Brichetto 2020; Misicka 2020 10 years; Misicka 2020 20 years; Misicka 2020 Ever; Pinto 2020 SP; Seccia 2020 180 days; Seccia 2020 360 days; Seccia 2020 720 days). In those analyses that defined the timing of measurement, conversion to progressive MS was measured at the earliest at six months (Seccia 2020 180 days; Tacchella 2018 180 days) and at the latest five years (Calabrese 2013) after the intended time of prognostication. In those analyses that did not specify the timing of measurement or had time‐to‐event outcomes, either the follow‐up or the time of outcome occurrence were described as being at the earliest 10 years (Misicka 2020 10 years) and at the latest 56.7 years (Skoog 2014) after the intended time of prognostication.

Composite

Finally, the remaining four model developments and one external validation (5%) had composite outcomes. Ahuja 2021 defined a relapse outcome as a clinical and/or a radiological event at one year in a participant group of mixed diagnostic subtypes. Kosa 2022 defined a model‐based outcome that included disability and imaging components in a participant group of mixed diagnostic subtypes, with no clear timing of measurement. de Groot 2009 defined a cognitive disability outcome within three years based on multiple clinical test results in a participant group of mixed diagnostic subtypes. Pellegrini 2019 defined a disability outcome based on multiple clinical scale or test results measured within two years in people with RRMS.

The numbers of developments and validations for the different outcome types by publication year are shown in Figure 4. There seems to be an increased interest in publishing models developed to predict diagnostic conversion to a more advanced disease state (definite MS or progressive MS) during the last decade. This might be related to the changing diagnostic criteria and a willingness to predict conversion as defined by the newly established criteria. Interestingly, while the models predicting conversion to progressive MS were validated the most in terms of their relative frequency, there were no validations of models predicting conversion to definite MS.

Figure 4. Outcomes in included analyses by year of publication. Left: categories in model developments; right: categories in model validations. Data for the year 2021 are incomplete (only until July). CDMS: conversion to clinically definite MS; CPMS: conversion to progressive MS.

Predictors
Predictor domains

Demographic predictors were considered for inclusion in 65 (87%) of the 75 model developments and finally included in 49 (65%) models. Predictors related to disability scores or tests were considered for inclusion in 56 (75%) and finally included in 38 (51%) models. Predictors related to symptoms (relapses) were considered for inclusion in 55 (73%) and finally included in 37 (49%) models. Predictors derived from analyses of MR images were considered for inclusion in 42 (56%) and finally included in 36 (48%) models. Of the 27 models developed in participants not confined to a single diagnostic subtype, 17 (63%) considered diagnostic categories as predictors and nine (33%) finally included them in the model. Predictors related to MS treatment were considered for inclusion in 15 (20%) and finally included in nine (12%) models. Predictors from molecular analysis of proteins, transcripts, or genes were considered for inclusion in 10 (13%) and finally included in all of those models. Predictors derived from cerebrospinal fluid (CSF) analysis were considered for inclusion in 10 (13%) and finally included in seven (9%) models. Electrophysiological predictors were considered for inclusion in five (7%) and finally included in four (5%) models. Serum 25-OH-vitamin D (Runia 2014) was the only laboratory parameter not derived from CSF to be considered, but it was not selected in the final model. Only Aghdam 2021 considered a predictor from the environmental domain, season of attack (spring versus other), for inclusion, but it was not selected in the final model either.

The proportion of model developments that considered each predictor domain, by publication year, is presented in Figure 5 (top). More recent models seem to increasingly consider para-clinical predictors, such as those derived from the analysis of imaging, CSF, omics, and electrophysiological tests. This may be related to increasing interest in these biomarkers as prognostic factors, which is sometimes the main focus of the included studies, and to the increased availability of technological means to collect and analyse them. The consideration of MS treatment in prognostic model developments also shows an expected increase over time as treatment options multiply and become widespread. Predictor domains considered and included in individual models are presented in Appendix 5.

Figure 5. Predictors in included models. Top: percent of models considering each predictor domain by year of publication; bottom left: number of models with selection considering (light blue) and including (dark blue) each predictor domain; bottom right: number of considered and included predictors (on log-2 scale) per modelling method by publication year. Shaded regions depict the predictor number range. Data for the year 2021 are incomplete (only until July). CSF: cerebrospinal fluid, ML: machine learning.

Most of the model developments (71%) considered between three and five of the 11 domains reported above. Figure 5 (bottom left) compares the frequency of consideration and inclusion of predictor domains in the 47 (63%) models that considered more than one domain for inclusion and involved predictor selection. When considered, para-clinical biomarkers from the domains of imaging, omics, CSF, and electrophysiology seem to be included more frequently than predictors from other domains. There are probably two explanations for this observation. First, authors considering these predictors in a prognostic model are likely to be interested in them and to select a final model that contains them (e.g. Martinelli 2017). Second, the number of possible predictors that can be derived from these measurements is high; predictors from these domains therefore tend to outnumber those from other domains and to survive a selection procedure (e.g. Gurevich 2009).

Other predictors

Predictors that were considered for inclusion in a total of 28 (37%) developments from 18 studies, but that do not fit any of the above categories, were: administrative (duration of follow‐up, seen at onset, annualised visit density, hospitalisation, scanner, study identifier, presence of specific medical assessments, country, MRI site), medical history related (co‐treatment, concomitant diseases, procedures), pregnancy and post‐partum related, patient‐reported outcomes or symptoms, disability not measured by scores or tests, and output of another predictive model. All of these predictors were considered in only single studies except follow‐up time (six studies) and pregnancy (two studies).

Number of predictors

The number of considered predictors (in degrees of freedom) ranged from two (Zakharov 2013) to 852,167 (Kosa 2022) with a median (IQR) of 23 (12.5 to 124) in 67 (89%) model developments (20 of which reported it unclearly). In seven (9%) model developments, neural network algorithms were used with raw/unsummarised imaging or longitudinal data (De Brouwer 2021; Roca 2020; Seccia 2020 180 days; Seccia 2020 360 days; Seccia 2020 720 days; Tousignant 2019; Yoo 2019), making predictor number counts irrelevant. In Bendfeldt 2019 Linear Placebo, the number of considered predictors in the support vector machine (SVM) model was unclear and is reported as the number of voxels in MRI images.

The number of predictors included in the final models (in degrees of freedom) ranged from two (Borras 2016; Sormani 2007 Dev; Zakharov 2013) to 703 (Kuceyeski 2018) with a median (IQR) of 6.5 (4 to 11.5) in 64 (85%) models (17 of which were unclear). For four (5%) model developments (Bendfeldt 2019 Linear Placebo; Pinto 2020 Severity 10 years; Pinto 2020 Severity 6 years; Pinto 2020 SP), there was insufficient information on the number of predictors in the final model.

The numbers of predictors considered for and included in the final models were both clear for 31 (41%) model developments. Of these, 10 (32%) had no predictor selection, so the numbers before and at the end of the modelling process were equal. In the remaining 21 developments, the difference between the number of considered and included predictors ranged from one (de Groot 2009 Cognitive; Olesen 2019 Routine) to 201 (Ye 2020 nomogram) with a median (IQR) of 14 (1 to 28), and the median (IQR) percent decrease in the number of predictors from considered to included was 77% (40% to 81%). The numbers of considered and included predictors by algorithm type are presented in Figure 5 on the log2 scale. As expected, and independent of time, models developed using ML methods seem to both consider and include higher numbers of predictors than those using traditional methods. There also seems to be a slight increase over time in the number of considered predictors for models developed with traditional statistics and in the number of included predictors in models developed with ML methods.

Bergamaschi 2015 was the only study in which the set of predictors was different in the validation than in the original model, BREMS, which had nine predictors (developed in Bergamaschi 2001 and initially evaluated in Bergamaschi 2007). Two predictors that were measured within one year of disease onset were dropped from the model without refitting, resulting in BREMSO with seven predictors.

Predictor handling

In four (5%) of the 75 models, at least one interaction between predictors was considered during development. In eight (11%) models, no interactions were considered during development. Modelling methods, e.g. random forests, that intrinsically accounted for interactions were used in 31 (41%) model developments. For the remaining 32 (43%) models, it was not reported if interactions were considered or not during development. During the development of 44 (59%) models, there was no evidence of categorisation of continuous predictors. During the development of 17 (23%) models at least one predictor was dichotomised or categorised. How the predictors were handled was unclear in 13 (17%) model developments. There was insufficient information to deduce how the predictors were handled during development in Zakharov 2013.

Timing of candidate predictor measurement was described as 'at disease onset' in 20 (27%) models. The predictors were measured at study baseline in 17 (23%) models using data from RCT or cohort studies. At least 13 (17%) models considered predictors measured at multiple visits, and at least the models in Misicka 2020 and Oprea 2020 were based on predictor and outcome data collected at a single time point (Misicka 2020 10 years; Misicka 2020 20 years; Misicka 2020 Ever; Oprea 2020).

Sample size and missing data

In model developments, sample size ranged from 33 participants (Olesen 2019) to 8825 participants (Manouchehrinia 2019), with a median size (IQR) of 186 (84 to 664) participants. There were six developments (8%, from four studies) with visits as the unit of analysis. In these studies, visits per participant over time were treated as independent observations. The number of visits ranged from 527 (Tacchella 2018) to 2502 (Yperman 2020). Event number was not relevant for seven (9%) developments with continuous outcomes (Bejarano 2011; Gurevich 2009; Kosa 2022; Kuceyeski 2018; Margaritella 2012; Roca 2020; Rocca 2017). The remaining developments analysed a median of 80 events (IQR 37 to 165 events, range 16 to 1953), but five values were unclearly reported.

There were three developments that considered raw/unsummarised imaging data: while Tousignant 2019 considered only imaging data, Roca 2020 and Yoo 2019 both also considered summary predictors, such as lesion load/volume, as well as patient demographics. For these studies, the maximum events per variable (EPV) was calculated excluding the raw/unsummarised imaging data. The EPV could not be computed for Tousignant 2019 due to the predictor type and for one development (Bendfeldt 2019 Linear Placebo) due to the missing number of considered predictors. The median EPV in the remaining 73 developments was 3.9 (IQR 1 to 9.9, range 0.0002 to 122.1); however, the precise EPV was unclear in 22 developments, for which the largest, i.e. most optimistic, EPV possible based on the reported information was used. Of the 73 developments for which the EPV could be computed, 17 (23%) had an EPV of 10 or greater and seven (10%) had an EPV of 20 or greater, respectively the older and the more recent rule-of-thumb thresholds for the minimum EPV needed for prediction model development.
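
For clarity, the EPV is simply the number of outcome events divided by the number of candidate predictor parameters (degrees of freedom), as in the following sketch with hypothetical numbers.

    # Events per variable (EPV): outcome events divided by candidate predictor
    # degrees of freedom. Numbers are hypothetical.
    n_events               <- 80
    n_candidate_parameters <- 23

    n_events / n_candidate_parameters   # about 3.5, below the rule of thumb of 10 to 20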

Model validations included from 10 (Gurevich 2009) to 14,211 participants (Bergamaschi 2015) with a median (IQR) of 217 participants (136 to 700 participants). The number of events was not reported in four validations (19%) and not relevant in one study due to a continuous outcome. Validation in Ahuja 2021 was at the observation level, with an unreported number of observations from 186 participants. The median number of events in model validation was 76 (IQR 33 to 130, range 19 to 3567), below the 100 event minimum suggested by PROBAST. Only seven (44%) of the 16 validations with clear reporting included at least the minimum recommended number of events.

The most common method for handling missing data, employed in 35 (36%) analyses, was to exclude participants from the study if data were missing for specific or any variables. Complete case analysis was used in 26 (27%) analyses. Predictors, instead of participants, with missing data were excluded from nine analyses (9%) found in two studies (Seccia 2020; Zhao 2020). Imputation was reported for 18 analyses (19%), but only five of these reported using multiple imputation. Multiple methods for dealing with missing data were often combined, as reported in 25 (26%) analyses. The method of handling missing data was not reported for 25 (26%) analyses. Although reporting on the number of participants with missing data was often unclear, it was clear that, when using routine care or registry data, hundreds, even thousands, of participants were excluded from analysis due to missing or error‐prone data. De Brouwer 2021, for example, describe the exclusion steps that brought the MSBase analysis set from 55,409 participants down to 6682 participants, an 88% drop.
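
As a hedged illustration of the multiple imputation approach that only five analyses reported, the following sketch imputes a simulated dataset with the mice package and pools the analysis results using Rubin's rules; it is not drawn from any included study.

    # Illustrative multiple imputation with the mice package on simulated data;
    # analyses are run in each imputed dataset and pooled with Rubin's rules.
    library(mice)

    set.seed(2)
    df <- data.frame(x1 = rnorm(100), x2 = rnorm(100))
    df$y <- rbinom(100, 1, plogis(0.6 * df$x1 - 0.4 * df$x2))
    df$x2[sample(100, 20)] <- NA            # introduce missing predictor values

    imp    <- mice(df, m = 5, printFlag = FALSE)   # five imputed datasets
    fits   <- with(imp, glm(y ~ x1 + x2, family = binomial))
    pooled <- pool(fits)                           # Rubin's rules
    summary(pooled)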

Model development

We identified 34 models developed using traditional statistical methods (45%), 40 models developed using ML methods (53%), and one model that selected predictors using ML but fit the final model using traditional statistical methods. The traditional statistical methods included 16 (46%) logistic regression models, 15 (43%) survival analyses (11 of them Cox models), one Bayesian model averaging model, and three (9%) linear regression models.

Of those using ML, three (8%) were developed using penalised regression (a LASSO penalty was applied to two logistic models and one Cox model), 10 (25%) using SVM, and 16 (40%) using tree‐based methods (two classification trees, nine random forests, and five using boosting). Of the random forest developments, one had a numeric outcome (Kosa 2022), and one a survival outcome (Pisani 2021). One model used partial least squares regression (Kuceyeski 2018). Another eight (20%) used neural networks and an additional two models were developed by combining ML methods.

The first identified development using ML was published in 2009. Gurevich 2009 used a multi‐class SVM to distinguish between three data‐driven categories of time until relapse. Two years later, Bejarano 2011 used a multilayer perceptron (a type of neural network) to predict change in EDSS. Since 2018, ML developments have been published in the literature every year and in increasing frequency (see Figure 2 right). As of the latest search in July 2021, only prediction model developments employing ML had been published in 2021. Please note that the decrease in number of identified prediction modelling developments in 2021 is at least partially due to the search covering only the first half of the year.

Univariable predictor selection was reported in 17 developments (23%), while this was unreported or unclear in four developments (5%). While 22 developments (29%) took a full model approach, multivariable predictor selection in the remaining developments took several forms. Of these, eight (11%) based selection on coefficient hypothesis testing, 18 (24%) employed stepwise selection, seven (9%) selected from several models with different predictor sets, two (3%) relied on the selection properties of LASSO penalised regression, and another four (5%; Montolio 2021; three models in Pinto 2020) used LASSO for selection but not for prediction. Other multivariable predictor selection methods were used in nine developments (12%), including Bayesian methods, variable importance ranking, minimal depth in tree‐based methods, frequency of selection within cross‐validation, and combinations of methods. For five developments (7%), the use of multivariable selection methods was unclear or not reported.

In one study, uniform shrinkage was applied to each of its three final developed models (de Groot 2009). Some amount of shrinkage was induced in 38 (51%) developments due to modelling methods, including Bayesian methods, penalised estimation, and other ML methods. No shrinkage was applied in 31 (41%) developments, and it was unclear for two (3%) developments.
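For readers unfamiliar with uniform shrinkage, the following is a minimal sketch, assuming a simple logistic model on simulated data: the estimated coefficients are multiplied by a shrinkage factor and the intercept is then re-estimated with the shrunken linear predictor fixed as an offset, so that the average predicted risk again matches the observed event rate. The data, model, and shrinkage factor are all hypothetical.

```python
# Hedged sketch of uniform shrinkage for a logistic prediction model.
# Data, model, and shrinkage factor are hypothetical.
from scipy.optimize import brentq
from scipy.special import expit
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=300, n_features=5, random_state=1)

fit = LogisticRegression(C=1e6, max_iter=1000).fit(X, y)  # large C ~ unpenalised fit
shrinkage_factor = 0.85                                   # e.g. bootstrap- or heuristic-based
shrunk_coefs = shrinkage_factor * fit.coef_.ravel()

# Re-estimate the intercept with the shrunken linear predictor fixed as an offset:
# the maximum likelihood intercept solves mean(expit(b0 + offset)) == mean(y).
offset = X @ shrunk_coefs
new_intercept = brentq(lambda b0: expit(b0 + offset).mean() - y.mean(), -20, 20)

print("shrunken coefficients:", shrunk_coefs)
print("re-estimated intercept:", round(new_intercept, 3))
```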

Of the 41 developments involving tuning parameters, 19 (46%) mentioned specific tuning parameters and the method used to tune them. There were six (15%) models from two studies for which the use of software defaults was reported upon correspondence (Pinto 2020; Tacchella 2018). Details were unclear in 10 (24%) developments in which tuning for parameters unrelated to the ML algorithm was mentioned but algorithm‐specific tuning was not. There was no reporting related to tuning in six model developments (15%).

Model performance and evaluation
Internal validation methods

Of the 73 model performance evaluations using development data, two evaluations relate to a single model assessed using development data in both the development study and a later validation study (Skoog 2014; Skoog 2019). There was a single development study in which model performance was only evaluated on an external validation set (Ahuja 2021). There were an additional two development studies in which model performance was not evaluated (Bergamaschi 2001; Weinshenker 1991). These were the two studies included because their models were evaluated as prediction models in later studies. Apparent performance was reported in 16 (22%) internal validations and a single random split of the data was used in nine (12%) evaluations. Cross‐validation and bootstrap procedures, preferred approaches to internal validation, were conducted 34 (47%) and 10 (14%) times, respectively. Methods were unclear in four (5%) internal validations, in which bootstrap methods were used for some purpose during development, but not clearly for performance evaluation. The number of bootstrap samples ranged from 200 (Manouchehrinia 2019) to 1500 (Spelman 2017). Leave‐one‐out cross‐validation was reported 10 times, while k‐fold cross‐validation was reported 18 times, with the number of folds varying between 2 (Wottschel 2019) and 10 (Bejarano 2011; Law 2019; Montolio 2021; Pinto 2020; Zhao 2020). Additionally, Wottschel 2019 assessed the influence of cross‐validation methods on classification performance estimates by comparing 2‐fold, 5‐fold, 10‐fold, and leave‐one‐out cross‐validation. Cross‐validation based on leaving a percentage of the data out or based on shuffle split was reported in another five evaluations.
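To illustrate the bootstrap approach mentioned above, the following is a minimal sketch of optimism-corrected internal validation: the modelling procedure is repeated in each bootstrap sample, and the average optimism (bootstrap-sample performance minus performance of the bootstrap model on the original data) is subtracted from the apparent AUC. The data and model are simulated and purely illustrative, not taken from any included study.

```python
# Hedged sketch of bootstrap optimism correction for the AUC; data and model are hypothetical.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=400, n_features=8, random_state=0)

def fit_and_auc(X_fit, y_fit, X_eval, y_eval):
    model = LogisticRegression(max_iter=1000).fit(X_fit, y_fit)
    return roc_auc_score(y_eval, model.predict_proba(X_eval)[:, 1])

apparent_auc = fit_and_auc(X, y, X, y)

optimism = []
for _ in range(200):                       # number of bootstrap samples
    idx = rng.integers(0, len(y), len(y))  # resample participants with replacement
    auc_boot = fit_and_auc(X[idx], y[idx], X[idx], y[idx])  # bootstrap-apparent AUC
    auc_orig = fit_and_auc(X[idx], y[idx], X, y)            # bootstrap model on original data
    optimism.append(auc_boot - auc_orig)

corrected_auc = apparent_auc - np.mean(optimism)
print(f"apparent AUC = {apparent_auc:.3f}, optimism-corrected AUC = {corrected_auc:.3f}")
```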

Performance measures

There were a total of 93 model performance evaluations, 24 (26%) of which reported on model calibration, either with a plot, table, measure, or several of these. There were 16 evaluations with calibration plots, four with O:E tables, two with histograms or bar plots depicting differences between observed and predicted outcomes in some way, and one table of observed event frequencies across score levels. Calibration slopes were reported in four evaluations from two studies and the P value from the Hosmer‐Lemeshow test in five from four studies. The Gronnesby and Borgan test P value was reported for one model evaluation, and the mean squared error was reported once. The O:E ratio was reported twice in one study in order to provide a recalibration factor based on the development data and on external validation data (Skoog 2019).

We had intended to compute O:E ratios from reported information; however, the expected number of events or other expected outcomes were only rarely reported. Both of the evaluations rated at low risk of bias in the analysis domain assessed calibration in some way. Pellegrini 2019 presented bootstrap‐corrected calibration slopes of 1.08 (SE 0.17) and 0.97 (SE 0.15) for one‐ and two‐year composite disease progression outcomes. De Brouwer 2021 reported the use of Platt scaling on their deep learning model for EDSS‐based disease progression at two years. This procedure maps the output of the classification model to predicted probabilities via logistic regression (Platt 1999). Upon correspondence, the authors of De Brouwer 2021 provided a calibration plot with no evidence of major departures from calibration.
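To make the calibration summaries discussed above concrete, the following is a minimal sketch computing an O:E ratio and a calibration slope from simulated predicted risks and outcomes; none of the values relate to any included study.

```python
# Hedged sketch of two calibration summaries: the observed:expected (O:E) ratio and
# the calibration slope. Predicted risks and outcomes are simulated for illustration.
import numpy as np
import statsmodels.api as sm
from scipy.special import logit

rng = np.random.default_rng(1)
p_pred = rng.uniform(0.05, 0.95, size=500)              # predicted risks from some model
y_obs = rng.binomial(1, np.clip(p_pred * 1.1, 0, 1))    # simulated observed outcomes

# O:E ratio: observed events divided by the sum of predicted risks (expected events)
oe_ratio = y_obs.sum() / p_pred.sum()

# Calibration slope: logistic regression of the outcome on the linear predictor
lp = logit(p_pred)
slope_fit = sm.GLM(y_obs, sm.add_constant(lp), family=sm.families.Binomial()).fit()
calibration_slope = slope_fit.params[1]

print(f"O:E ratio = {oe_ratio:.2f}, calibration slope = {calibration_slope:.2f}")
```

Platt scaling, as used by De Brouwer 2021, likewise fits a logistic regression, but to a classifier's raw scores rather than to an existing linear predictor, in order to obtain calibrated probabilities.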

There were 85 evaluations for which discrimination and classification measures were applicable (models with survival or binary outcomes). A c‐statistic was reported in 47 (55%) of them, 31 (36%) with some measure of uncertainty. Reporting was unclear about the use of a c‐statistic for survival data in three evaluations (Runia 2014; Spelman 2017; Ye 2020 gene signature). Reported c‐statistics ranged from a minimum of 0.59 (Pellegrini 2019; Ye 2020) to a maximum of 0.92 (Pisani 2021; Tommasin 2021), with a median of 0.77 (IQR 0.71 to 0.82). Both of the evaluations rated at low risk of bias in the analysis domain reported c‐statistics below the median observed across the literature: Pellegrini 2019 reported the minimum c‐statistic of 0.59 and De Brouwer 2021 reported a c‐statistic of 0.66.

Classification measures were reported in 49 (58%) evaluations of survival or classification models. These evaluations reported accuracy or error measures 36 times, sensitivity or specificity 43 times, positive or negative predictive values 21 times, and other measures such as the F1 score eight times. Nine evaluations reported using 0.5 as the threshold value for estimating classification performance and seven reported classification measures for more than one threshold value. Another three used some percentile of the data and nine used data‐driven methods to identify an optimal threshold. Classification measures were also applied to models of continuous outcomes in five evaluations (Bejarano 2011 Dev; Bejarano 2011 Val; Gurevich 2009 FTP; Margaritella 2012; Rocca 2017); the threshold value was unclearly reported for two evaluations from one study (Bejarano 2011), and the other three evaluations used some window around the observed value to be predicted as a threshold.
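Because the threshold choice drives the classification measures reported above, the following is a minimal sketch contrasting sensitivity and specificity at a fixed 0.5 cut-off with a data-driven (Youden-index) cut-off; the outcomes and predicted risks are simulated and purely illustrative.

```python
# Hedged sketch: sensitivity/specificity at a fixed 0.5 threshold versus a
# data-driven (Youden index) threshold. Data are simulated for illustration.
import numpy as np
from sklearn.metrics import roc_curve

rng = np.random.default_rng(2)
y = rng.binomial(1, 0.3, size=500)
# Simulated predicted risks that are informative but imperfect
p = np.clip(0.3 + 0.3 * (y - 0.3) + rng.normal(0, 0.2, size=500), 0.01, 0.99)

def sens_spec(y_true, p_pred, threshold):
    pred_pos = p_pred >= threshold
    sens = (pred_pos & (y_true == 1)).sum() / (y_true == 1).sum()
    spec = (~pred_pos & (y_true == 0)).sum() / (y_true == 0).sum()
    return sens, spec

print("at 0.5:", sens_spec(y, p, 0.5))

# Youden index: the threshold maximising sensitivity + specificity - 1 on the same data
fpr, tpr, thresholds = roc_curve(y, p)
youden_threshold = thresholds[np.argmax(tpr - fpr)]
print("Youden threshold:", round(youden_threshold, 2), sens_spec(y, p, youden_threshold))
```

Note that choosing the threshold on the same data used for evaluation, as in this sketch, adds a further source of optimism.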

Model presentation

Models were presented in various ways across studies and modelling methods. For the 35 developments using traditional regression methods for fitting, eight (23%) full regression models with intercepts/baseline hazards were presented and another eight (23%) regression models were presented without intercepts/baseline hazards or some other model coefficients. Three (9%) regression models were simplified into sum scores (Gout 2011; Runia 2014; Vasconcelos 2020), two of which were unweighted sums, i.e. predictor counts (Runia 2014; Vasconcelos 2020). The tools presented included eight nomograms (Manouchehrinia 2019; three in Misicka 2020; two in Olesen 2019; Spelman 2017; Ye 2020), three score charts (all in de Groot 2009), one web application (Skoog 2014), and one heat map (Borras 2016). Manouchehrinia 2019 additionally presented their nomogram as a web application and the web application of Skoog 2014 was updated using the shrinkage factor estimated in Skoog 2019. Two developments were described by lists of included predictors. Malpas 2020 presented a chart of relative risks associated with various combinations of predictors from their simplified model. Two (6%) developments based on traditional methods did not present the final model in any way (Oprea 2020; Zakharov 2013).

Of the 40 models fit using ML, only five (12%) reported tools allowing other users to make predictions for new people with MS. Lejeune 2021 presented a web application, Aghdam 2021 presented a decision tree, and Pisani 2021 presented a tool based on the sum of a heat map‐derived value and a formula weighted by predictor random forest minimal depths. The other two studies provided model coefficients from penalised regression without intercepts/baseline hazards (Ahuja 2021; Ye 2020). Other presentations included a bar chart of predictor weights from a linear SVM, although a non‐linear SVM was fit (Bendfeldt 2019); a further eight ML developments presented the final model only as a list of included predictors. Ten ML developments did not present the final model in any way. Independent of the model presentation described above, a total of 19 ML developments reported some measure of variable importance.

Model interpretation

Of the 57 studies included, 26 (46%) primarily aimed to predict clinical outcomes in individual patients, as indicated by mentioning the intent to create or assess a model or tool in their abstract, introduction, and discussion. In another 21 studies (37%), outcome prediction was an aim of the study; however, the focus appeared to be on other aspects of the study, such as predictors and modelling methods. Outcome prognostication in individuals was not the primary aim in 10 studies (18%), all of which were instead mainly interested in predictor identification or the usefulness of specific predictors. Forty‐three studies (75%) were presented as exploratory research, indicating some need for further development or validation, while 14 studies (25%) were presented with confirmatory conclusions, eight of which were not associated with any external validation.

We assessed the presence of information on study strengths and limitations, generalisability of results, and comparisons with other modelling studies for the 57 included studies. Most studies discussed their strengths and limitations (49 (86%) and 51 (89%) studies, respectively), and just over half of the studies (31, 54%) discussed the generalisability of their results; however, only 16 (28%) studies mentioned other models in their discussions. These comparisons with other models focused on the predictors and modelling methods used, rather than comparing model performance with that of other MS prognostic models with similar outcomes. The most comprehensive comparison with other prognostic models, a table of performance measures for models from five other MS prognostic model studies with description of outcome definitions and timing, was presented by Montolio 2021.

Usability and reproducibility

Model usability and reproducibility, as defined in Appendix 3, were assessed for each of the 75 developed models and are summarised in Table 2 (ordered by outcome). Usability was assessed in terms of the skill and equipment specialisation required for predictor collection, model presentation, the ability of the presented model to estimate absolute risk, and the number of external validations performed for the model. Model reproducibility is summarised by the availability of the model/tool, code, and data.

1. Model usability and reproducibility.
Model | Outcome | Predictor timing | Equipment | Usability | Absolute risk | Ext. Val. | Reproducibility
Agosta 2006 | Disability (EDSS) | From study entry to 1 year after study entry | Specialty centre | Unclear | No | 0 | Unclear
Bejarano 2011 | Disability (EDSS) | From disease onset (MS) to study entry | Specialty centre | No model | Not risk model | 1 external refit | None
De Brouwer 2021 | Disability (EDSS) | From onset to index date (including trajectories during the 3 years prior to index date) | No special equipment | No model | No | 0 | Code
de Groot 2009 Dexterity | Disability (9HPT) | At Poser MS diagnosis (within 6 months) | Specialty centre | Tool + instructions | No | 0 | Tool
de Groot 2009 Walking | Disability (EDSS) | At Poser MS diagnosis (within 6 months) | Standard hospital | Tool + instructions | No | 0 | Tool
Kuceyeski 2018 | Disability (cognitive ‐ SDMT) | From disease onset (undefined ‐ RRMS?) to final follow‐up | Specialty centre | No model | Not risk model | 0 | None
Law 2019 Ada | Disability (EDSS) | From disease onset (MS) to study entry | Specialty centre | No model | No | 0 | None
Law 2019 DT | Disability (EDSS) | From disease onset (MS) to study entry | Specialty centre | No model | No | 0 | None
Law 2019 RF | Disability (EDSS) | From disease onset (MS) to study entry | Specialty centre | No model | No | 0 | None
Lejeune 2021 | Disability (EDSS) | From disease onset (undefined ‐ RRMS?) to index relapse | No special equipment | Tool + instructions | Yes | 1 | Tool, DOR
Malpas 2020 | Disability (EDSS) | From symptom onset to 1 year after onset | No special equipment | Tool + instructions | No | 1 | Tool
Mandrioli 2008 | Disability (EDSS) | From disease onset (first attack) to diagnosis (CDMS) | Specialty centre | Unclear | Yes | 1 | Unclear
Margaritella 2012 | Disability (EDSS) | From disease onset (MS) to 1 year prior to outcome | Specialty centre | Unclear | Not risk model | 0 | Unclear
Montolio 2021 | Disability (EDSS) | At study entry, year 1 visit and year 2 visit | Specialty centre | No model | No | 0 | None
Oprea 2020 Disability | Disability (EDSS) | At study entry | No special equipment | No model | No | 0 | None
Pinto 2020 Severity 10 years | Disability (EDSS) | From onset to 5 years post‐prognostication | Not reported | No model | No | 0 | None
Pinto 2020 Severity 6 years | Disability (EDSS) | From onset to 2 years post‐prognostication | Not reported | No model | No | 0 | None
Roca 2020 | Disability (EDSS) | At FLAIR imaging (anytime) | Specialty centre | No model | Not risk model | 0 | None
Rocca 2017 | Disability (EDSS) | From study entry to 15 months after study entry | Specialty centre | Model | Not risk model | 0 | Model
Rovaris 2006 | Disability (EDSS) | From study entry (anytime during PPMS) | Specialty centre | Unclear | No | 0 | Unclear
Sombekke 2010 | Disability (MSSS) | At disease onset (MS) | Specialty centre | Model | No | 0 | Model
Szilasiova 2020 | Disability (EDSS) | At study entry | Standard hospital | Unclear | | 0 | Unclear
Tommasin 2021 | Disability (EDSS) | At imaging visit | Specialty centre | No model | No | 0 | None
Tousignant 2019 | Disability (EDSS) | At imaging visit | Specialty centre | No model | No | 0 | None
Weinshenker 1991 M3 | Disability (DSS) | From disease onset (initial symptom) to assessment (not defined) | No special equipment | Model + instructions | Yes | 1 | Model
Weinshenker 1996 Short‐term | Disability (EDSS) | From disease onset (initial symptom) to outcome measurement | No special equipment | Model | Yes | 0 | Model
Yperman 2020 | Disability (EDSS) | At clinical visit (unclear: at any time during MS) | Specialty centre | No model | No | 0 | DOR
Zhao 2020 LGBM All | Disability (EDSS) | 2‐year observation window | Specialty centre | No model | No | 0 | Code, DOR
Zhao 2020 LGBM Common | Disability (EDSS) | 2‐year observation window | Specialty centre | No model | No | 1 (unclear if refit) | Code, DOR
Zhao 2020 XGB All | Disability (EDSS) | 2‐year observation window | Specialty centre | No model | No | 0 | Code, DOR
Zhao 2020 XGB Common | Disability (EDSS) | 2‐year observation window | Specialty centre | No model | No | 1 (unclear if refit) | Code, DOR
Gurevich 2009 FLP | Relapse | During CIS or CDMS | Specialty centre | No model | No | 1 | None
Gurevich 2009 FTP | Relapse | During CIS or CDMS | Specialty centre | No model | Not risk model | 0 | None
Sormani 2007 | Relapse | From 2 years prior to baseline measurement (during RRMS) | Standard hospital | Model + instructions | Yes | 1 | Model
Vukusic 2004 | Relapse | From disease onset (MS) to delivery | No special equipment | Model + instructions | Yes | 0 | Model
Ye 2020 gene signature | Relapse | At study entry | Specialty centre | Model + instructions | No | 0 | Model, Data
Ye 2020 nomogram | Relapse | At study entry | Specialty centre | Model | Yes | 0 | Tool, Data
Aghdam 2021 | Conversion to definite MS | At ON event | Standard hospital | Model | Yes | 0 | Tool
Bendfeldt 2019 Linear Placebo | Conversion to definite MS | At CIS onset (within 60 days) | Specialty centre | No model | No | 0 | None
Bendfeldt 2019 M7 Placebo | Conversion to definite MS | At CIS onset (within 60 days) | Specialty centre | No model | No | 0 | None
Bendfeldt 2019 M9 IFN | Conversion to definite MS | At CIS onset (within 60 days) | Specialty centre | No model | No | 0 | None
Borras 2016 | Conversion to definite MS | At disease onset (CIS, up to 126 days after onset) | Specialty centre | Tool + instructions | Yes | 0 | Tool
Gout 2011 | Conversion to definite MS | At CIS onset (admission for CIS event) | Standard hospital | Tool + instructions | No | 0 | Tool
Martinelli 2017 | Conversion to definite MS | At CIS onset (within 3 months) | Specialty centre | No model | No | 0 | None
Olesen 2019 Candidate | Conversion to definite MS | At disease onset (ON, up to 38 days after onset) | Specialty centre | Tool + instructions | Yes | 0 | Tool, DOR
Olesen 2019 Routine | Conversion to definite MS | At disease onset (ON, up to 38 days after onset) | Specialty centre | Tool + instructions | Yes | 0 | Tool, DOR
Runia 2014 | Conversion to definite MS | At disease onset (CIS) | Standard hospital | Tool + instructions | No | 0 | Tool
Spelman 2017 | Conversion to definite MS | At disease onset (within 12 months) | Specialty centre | Tool + instructions | Yes | 0 | Tool
Wottschel 2015 1 year | Conversion to definite MS | At CIS onset (within a mean of 6.15 weeks) | Specialty centre | No model | No | 0 | None
Wottschel 2015 3 years | Conversion to definite MS | At CIS onset (within a mean of 6.15 weeks) | Specialty centre | No model | No | 0 | None
Wottschel 2019 | Conversion to definite MS | At CIS onset (within 14 weeks) | Specialty centre | No model | No | 0 | None
Yoo 2019 | Conversion to definite MS | At CIS onset (within 180 days) | Specialty centre | No model | No | 0 | None
Zakharov 2013 | Conversion to definite MS | At first MRI after CIS onset | Specialty centre | No model | No | 0 | None
Zhang 2019 | Conversion to definite MS | At CIS onset (primary clinical work‐up for CIS) | Specialty centre | No model | No | 0 | None
Bergamaschi 2001 BREMS | Conversion to progressive MS | From disease onset (RRMS) to 1 year after disease onset | No special equipment | Unclear | No | 2, simplified: 2* | Unclear
Brichetto 2020 | Conversion to progressive MS | At visit of interest | Standard hospital | No model | No | 0 | None
Calabrese 2013 | Conversion to progressive MS | At study entry (during RRMS) | Specialty centre | Model + instructions | Yes | 1 | Model
Manouchehrinia 2019 | Conversion to progressive MS | From disease onset (unclear: RRMS?) up to first EDSS recorded (several years after onset) | No special equipment | Tool + instructions | Yes | 3 | Tool
Misicka 2020 10 years | Conversion to progressive MS | At study interview | Specialty centre | Tool + instructions | Yes | 0 | Tool
Misicka 2020 20 years | Conversion to progressive MS | At study interview | Specialty centre | Tool + instructions | Yes | 0 | Tool
Misicka 2020 Ever | Conversion to progressive MS | At study interview | Specialty centre | Tool + instructions | Yes | 0 | Tool
Pinto 2020 SP | Conversion to progressive MS | From onset to 2 years post‐prognostication | Not reported | No model | No | 0 | None
Pisani 2021 | Conversion to progressive MS | From RRMS onset to 2 years post‐onset | Specialty centre | Model | No | 0 | Tool, DOR
Seccia 2020 180 days | Conversion to progressive MS | Patient trajectories until index visit (during RRMS) | Standard hospital | No model | No | 0 | Data
Seccia 2020 360 days | Conversion to progressive MS | Patient trajectories until index visit (during RRMS) | Standard hospital | No model | No | 0 | Data
Seccia 2020 720 days | Conversion to progressive MS | Patient trajectories until index visit (during RRMS) | Standard hospital | No model | No | 0 | Data
Skoog 2014 | Conversion to progressive MS | From last relapse to index date, repeatedly | No special equipment | Tool + instructions | Yes | 1 | Tool
Tacchella 2018 180 days | Conversion to progressive MS | From disease onset to the index visit of interest | Standard hospital | No model | No | 0 | None
Tacchella 2018 360 days | Conversion to progressive MS | From disease onset to the index visit of interest | Standard hospital | No model | No | 0 | None
Tacchella 2018 720 days | Conversion to progressive MS | From disease onset to the index visit of interest | Standard hospital | No model | No | 0 | None
Vasconcelos 2020 | Conversion to progressive MS | From onset (unclear) to at least 2 years (unclear) | No special equipment | Unclear | No | 1 | Unclear
Ahuja 2021 | Composite (relapse) | From 12 months prior to index date | Standard hospital | Model | No | 1 | Model, Code, DOR
Kosa 2020 | Composite (EDSS, SNRS, T25FW, NDH‐9HPT) | At lumbar puncture | Specialty centre | No model | Not risk model | 0 | None
de Groot 2009 Cognitive | Composite (cognitive tests) | At Poser MS diagnosis (within 6 months) | Specialty centre | Tool + instructions | No | 0 | Tool
Pellegrini 2019 | Composite (EDSS, T25FW, 9HPT, PASAT, VFT) | From disease onset (MS) to study entry | Standard hospital | Model | No | 0 | Model

9HPT: 9‐hole peg test
Ada: adaptive boosting
BREMSO: Bayesian Risk Estimate for Multiple Sclerosis at Onset
CDMS: clinically definite multiple sclerosis
CIS: clinically isolated syndrome
DOR: data available on request (as reported in the publication)
DSS: Disability Status Scale
DT: decision tree
EDSS: Expanded Disability Status Scale
FLP: first level predictor
FTP: fine tuning predictor
IFN: interferon
LGBM: light gradient boosting machine
MRI: magnetic resonance imaging
MS: multiple sclerosis
MSSS: multiple sclerosis severity score
NDH‐9HPT: non‐dominant hand 9‐hole peg test
ON: optic neuritis
PASAT: Paced Auditory Serial Addition Test
PPMS: primary progressive multiple sclerosis
RF: random forest
RRMS: relapsing‐remitting multiple sclerosis
SDMT: symbol digit modalities test
SNRS: Scripps neurological rating scale
SP: secondary progressive
T25FW: timed 25‐foot walk
VFT: visual function test
XGB: extreme gradient boosting

The timing of predictor collection varied across the final models. There were 38 models (51%) using information available at a single time point (20 at disease onset and 18 at an arbitrary point) and seven models (9%) using information from a specific timeframe (between 12 and 24 months) relative to disease onset or study entry. Twenty‐five models (33%) used data available over time and another four models from two studies specifically used predictor data longitudinally (De Brouwer 2021; Seccia 2020). Predictor assessment timing was unclear in the model of Vasconcelos 2020.

The level of skill and equipment specialisation could not be assessed for three developments from one study due to lack of information on selected predictors (Pinto 2020). All 72 models reporting details on included predictors were found to require a specialist in order to measure or assess predictors, 35 of which contained EDSS. For this reason, level of skill is omitted from Table 2. Greater variability in rating was observed for level of equipment specialisation: 11 models (14.7%) required no special equipment, relying only on demographics, disease subtype, symptoms, and treatments. Predictors from 17 models (22.7%) could be measured in a standard hospital, and 44 models (59%) required specialised equipment related to advanced imaging, omics, CSF markers, and evoked potential markers.

We identified 39 (52%) developed models that were not accompanied by model coefficients, tools, or instructions. Eight (11%) models were reported with basic model information, five (7%) with a model and instructions, and 16 (21%) were given as simple tools with instructions or explanation of use. This measure of usability was rated as unclear for seven (9%) models: when model components not considered to be predictors (for example, coefficients for follow‐up duration adjustment) were not reported, when it was unclear whether model coefficients were missing, or when the coding of predictors was especially unclear. For two models, for example, the coding of the basic demographic predictor sex was unclear (Margaritella 2012; Szilasiová 2020).

There were seven developed models that did not aim to predict the risk of a clinical outcome, but rather the value of a continuous measure. Of the 34 models for future disease risk that reported a final model in some way, absolute risk can be estimated with the reported information from 18 (53%) of them, but not for 15 (44%). Clarification of predictor coding would enable estimation of absolute risk from one further model (Szilasiová 2020).

Analysis data were made publicly available for five (7%) models from two studies (Gurevich 2009; Seccia 2020) and analysis code was publicly available for six (8%) models from three studies (Ahuja 2021; De Brouwer 2021; Zhao 2020). One further study developing two models (Ye 2020) reused the data provided by Gurevich and colleagues (Gurevich 2009). Six (8%) studies explicitly stated that data were available upon request (Ahuja 2021; Lejeune 2021; Olesen 2019; Pisani 2021; Yperman 2020; Zhao 2020). For 28 (37%) models from 19 studies, no models, code, or data were provided (three traditional regression models and 25 ML models). None of the studies provided a model/tool, code, and data, or even just code and data.

Because multiple external validations should be performed before a model is deemed clinically useful, the number of external validations performed for each model is also given in Table 2. Of the identified models, only 12 (16%) were externally validated at least once. Of these, one model was externally validated twice (Bergamaschi 2001) and another three times (Manouchehrinia 2019).

Risk of bias

As depicted in Figure 6 (left), all but one of the 96 analyses (Pellegrini 2019) were found to have high risk of bias. This single study was co‐authored by a clinical prediction modelling methodologist from outside the MS field, who is part of the PROBAST group (Wolff 2019). The introduction of this study listed many of the substandard aspects of prediction model development in MS also identified in this review. It appeared to aim to demonstrate correct prognostic model development and internal validation steps for the MS community.

Figure 6. Risk of bias summary. Left: by domain; right: item‐wise for analysis domain.

The high risk of bias across the literature was driven mainly by the analysis domain, for which only two (2%) analyses, those of De Brouwer 2021 and Pellegrini 2019, were found to be at low risk of bias but 94 (98%) at high risk (see Figure 6 left), and, to a lesser extent, by the participants domain, for which 18 (19%) analyses were found to be at low risk but 78 at high or unclear risk of bias (59 (61%) and 19 (20%), respectively). Domain‐level risk of bias plots per analysis are provided in Appendix 6. Item‐level assessment details for the analysis domain are depicted in Figure 6 (right).

The high risk of bias related to the analysis domain was multi‐faceted but mainly driven by two PROBAST items: 80 (83%) analyses were found to have an insufficient number of participants and 81 (84%) analyses did not use relevant model performance measures, with most ignoring calibration. Besides the exclusion for missing data addressed in the participants domain, 32 (33%) analyses used other suboptimal methods for dealing with missing data. Predictor dichotomisation and univariable predictor selection are still used, as found in 13 (16%) analyses and 17 (23%) developments, respectively. A clear difference between studies using ML as opposed to traditional statistics can be seen in the reporting of final models. Of the 35 developments using traditional regression methods, 23 (66%) were reported in such a way that it was clear that the final model corresponded to the multivariable analysis. However, only three (8%) of the 40 ML developments reported final model details that correspond directly with the multivariable analysis. Most ML developments did not present tools or report enough information for understanding of the final models.

The two most common reasons for a high risk rating in the participants domain were the use of routine care or registry data (36 analyses, 38%) and inappropriate exclusion of participants (35 analyses, 36%). While registries are an important source of data for MS research, their quality and limitations should be reported and addressed. Data quality was rarely discussed, and the only reported method to deal with poor data quality was to exclude participants with missing/erroneous data. There was no mention of whether the excluded participants were otherwise similar to included participants with respect to observed covariates, and it was unclear whether study teams even had access to the excluded data in order to assess possible differences. Additionally, inappropriate inclusion of participants known to already have the outcome at baseline affected at least five developments from three studies (de Groot 2009; Malpas 2020; Szilasiová 2020). This is expected to result in overestimated performance estimates at internal validation (Moons 2019).

The use of problematic data sources also led to issues in the predictor and outcome domains. Combined with insufficient reporting, this made it difficult to judge whether predictors and outcomes were assessed uniformly across participants and whether each was blinded to the other. The registry datasets cover long time periods and multiple sites, which makes it unlikely that predictors and outcomes were uniformly measured, especially given the rapid changes in diagnostic criteria and the poor generalisability of imaging predictors across machines (Seccia 2021). An important, independent issue within the predictor domain relates to timing. We identified 11 (11%) analyses using predictors only available after the intended time of model use, which makes a model unusable in practice. The intended time of model use was generally unclear, making it difficult to understand when the model is meant to be used and how far into the future it is meant to predict outcomes.

Although our review question was broad in nature, we found only 36 (38%) analyses to be of low concern regarding applicability. The most common reason for concern related to participants was the inclusion of participants known to have the outcome of interest at the time the model was applied, jeopardising the categorisation of the model as a prognostic model. For example, one study defining disability as an EDSS of five or more at 15 years included an unknown number of participants who already had an EDSS of five or more at baseline (Szilasiová 2020). We found the most frequent concern regarding predictors to be the inclusion of only a single predictor type (e.g. only imaging or genetic predictors) without consideration of more basic, easier‐to‐collect predictors. Only Kosa 2022 was rated at high concern due to unclear interpretation of the outcome. Kosa and colleagues modelled the outcome MS‐DSS, which is itself the output of another model, making interpretation difficult. The most common reason for high concern regarding overall applicability was a primary aim other than development or validation of a prognostic model for individual prediction, which was determined for 12 (12%) analyses. For 27 (28%) analyses lacking final model/tool presentation in a way allowing for application to new individuals, we considered the applicability unclear. This concern was especially frequent amongst the ML studies.

Reporting deficiencies

Across the included analyses, 54% of the 20 mandatory TRIPOD items we evaluated were reported. When unclear or partial reporting was included, the amount of reporting increased to 69%. Of the 19 mandatory TRIPOD items applicable to developments, fewer were reported in those using ML methods (49%) compared to those using traditional statistics (60%). An item‐wise summary of reporting is shown in Figure 7 both overall for all analyses types and by algorithm type for model developments.

Figure 7. Summary of reporting deficiencies based on TRIPOD items. Top: overall; bottom: for developments by modelling method. Val: validation.

When we compared the percentage of reporting in the model developments using traditional statistics published before 2016 (the publication year of TRIPOD) with those published during or after 2016, we did not observe any meaningful difference (59% and 61%, respectively). Visual inspection did not indicate any time trends in median percentage reporting overall or in categories based on the algorithm or the analysis type (see Appendix 6).

When described analysis‐wise, the best reporting amongst developments was in all three models from de Groot 2009 with 84% of the 19 items reported, and the worst reporting was in Oprea 2020 with 16% reported. The best reporting amongst validations was at 73% of the 15 items in Lejeune 2021 Ext Val, Manouchehrinia 2019 Ext Val 2, and Manouchehrinia 2019 Ext Val 3, and the worst reporting was at 13% in Gurevich 2009 FLP Ext Val. Item‐wise reporting per analysis is displayed in Appendix 6.

Source of data and participants

At least one out of the five items related to source of data and participants was not reported in 70 (73%) of the analyses. The item with worst reporting under this heading was treatments (item 5c). Of the 96 included analyses, the treatments received by participants, either at baseline or during follow‐up, were somehow reported but not clearly in 25 (26%) analyses and not reported at all in 40 (42%) analyses. Of those that did not report treatments received by participants, eight (20%) were solely in people with CIS (Aghdam 2021; Olesen 2019 Candidate; Olesen 2019 Routine; Runia 2014; Wottschel 2019; Yoo 2019; Zakharov 2013; Zhang 2019). This item was reported less frequently in models developed with ML methods (20%) than with traditional statistics (46%).

The study start and end dates (item 4b) were the next most frequently unreported item. They were somehow reported but not clearly in three (3%) analyses (Bergamaschi 2001 BREMS Dev; Borras 2016; Roca 2020), and not reported at all in 52 (54%) of the 96 included analyses. Although reported relatively better than most of the other items, the most fundamental information on the study design or source of data (item 4a) was reported in an unclear manner in almost one in four analyses (21), and was totally missing from three (3%) analyses (Kuceyeski 2018; Oprea 2020; Yoo 2019).

Predictors and outcome

At least one out of the three items related to predictors and outcome was not reported in 92 (96%) analyses. Of the 96 included analyses, the outcome definition (item 6a) was missing from five (5%) analyses with conversion to progressive MS outcomes (Brichetto 2020; Pinto 2020 SP; Tacchella 2018 180 days; Tacchella 2018 360 days; Tacchella 2018 720 days). Outcomes were not clear in Bejarano 2011, which reported AUC measures for change in EDSS (modelled as continuous), and Oprea 2020, which reported keeping an EDSS score with unclear thresholds and time points. Blinding of the outcome assessment to predictors (item 6b) was reported in only four analyses (Kosa 2022; Olesen 2019 Candidate; Olesen 2019 Routine; Rovaris 2006) and not reported at all in the remaining.

Of the 75 model developments in which the reporting of predictor definitions (item 7a) was assessed, predictor definitions were somehow reported but not clearly in 24 (32%) developments and not reported at all in 12 (16%) developments.

Sample size and missing data

At least one out of the three items related to sample size and missing data was not reported in 82 (85%) analyses. Of the 96 analyses, none reported a sample size justification (item 8) aimed at reaching a given level of certainty in the reported effect sizes. The presentation closest to a sample size justification was in Yperman 2020, which used a random forest classifier in nested cross‐validation. They plotted a learning curve of AUC as a function of different training set sizes to discuss whether performance plateaued and whether the sample was therefore sufficient. The limitations posed by a small sample size were somewhat discussed in 24 (25%) analyses. Model developments with ML methods were more likely (45%) to discuss their limited sample size or the drawbacks posed by it compared to those with traditional methods (14%).

Only 23 (24%) of the analyses reported the amount of missing data handled (item 13b) during study design or analysis. This is despite the fact that we considered a study to have reported the amount of missing data when the only information provided on this topic was the number of excluded participants due to lack of a predictor domain measurement (e.g. missing MR images). Thirty‐six analyses (38%) reported the amount of missing data in an unclear or inconsistent manner. The method of dealing with missing data (item 9) was somehow reported but not clearly in 16 (17%), and not reported at all in 25 (26%) analyses.

Statistical analysis methods

At least one out of the two items related to the statistical analysis was not reported in 18 (24%) developments. The type of model, model‐building procedures (including predictor selection and tuning parameter optimisation, as relevant), and method for internal validation (item 10b) were reported to a limited extent for 21 (28%), and not reported at all for 13 (17%) of the 75 model developments. The model‐building steps, expected to be relatively simpler for traditional methods, were reported more frequently in the model developments utilising traditional statistics (74%) than those utilising ML methods (38%).

Results and discussion

At least one out of the seven items related to results and discussion was not reported in 79 (82%) analyses. Of the 96 analyses, the number of participants and the number of events (items 13a/14a) were reported in an unclear manner in 11 (11%), and not reported at all in seven (7%) (Ahuja 2021 Dev; Ahuja 2021 Ext Val; Bergamaschi 2015 BREMS Ext Val; Gurevich 2009 FLP Ext Val; Oprea 2020; Sormani 2007 Ext Val; Szilasiová 2020). Information on basic baseline participant characteristics (item 13b: age, sex, diagnostic subtype) was missing from 17 (18%) analyses.

A comparison of the distribution of important variables with the development data (item 13c) was missing from 11 (55%) of the 20 validations, excluding Skoog 2019 Val, which used a subset of the participants from the model development study (Skoog 2014 Dev). Also, none of the model developments that used a single random split for evaluation provided such a comparison.

The full prediction model, including the intercept or baseline survival to allow for calculation of absolute risk (item 15a), was reported more or less clearly in only 16 (21%) of the 75 developments. An explanation of how to make predictions or assign an individual to a risk group based on the developed model (item 15b) was provided for 22 (29%) models. Although neither item 15a nor item 15b was reported, the discussion sections of five (7%) model developments contained confirmatory language suggesting implementation of the models to support clinical decisions (Malpas 2020 Dev; Pisani 2021; Roca 2020; Tousignant 2019; Ye 2020 gene signature). Models developed with traditional statistical methods were much more likely to present the final full models (40%) or how to calculate predictions from them (60%) than those developed by ML methods (5% and 2%, respectively).
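To illustrate why item 15a matters for usability, the following is a minimal sketch computing an absolute risk from a fully reported logistic model; every coefficient, predictor name, and predictor value is hypothetical and chosen only for illustration.

```python
# Hedged illustration: computing an absolute risk from a fully reported logistic model.
# Without the intercept, only relative comparisons between individuals are possible.
# All coefficients, predictor names, and values below are hypothetical.
import math

intercept = -2.1                    # item 15a: must be reported to obtain absolute risk
coefs = {"age_per_10y": 0.30, "edss_baseline": 0.45, "relapses_prior_2y": 0.60}
patient = {"age_per_10y": 3.8, "edss_baseline": 2.0, "relapses_prior_2y": 1.0}

linear_predictor = intercept + sum(coefs[k] * patient[k] for k in coefs)
absolute_risk = 1 / (1 + math.exp(-linear_predictor))
print(f"predicted absolute risk = {absolute_risk:.2f}")
```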

A model performance measure (item 16, assessed by the presence of a discrimination measure) was reported in 33 (34%) analyses with its uncertainty and in 25 (26%) analyses without its uncertainty. Reporting of AUC in Szilasiová 2020 was unclear due to the inconsistency between the receiver operating characteristic figure associated with the AUC and the reported point sensitivity/specificity value. No discrimination or classification measures were reported in 10 (10%) analyses: three did not contain any evaluation of model performance and were included because of their validations (Ahuja 2021 Dev; Bergamaschi 2001 BREMS Dev; Weinshenker 1991 M3 Dev), three reported R2 (Misicka 2020 10 years; Misicka 2020 20 years; Misicka 2020 Ever), and the remaining four survival analyses reported Kaplan‐Meier or incidence plots for predicted risk groups (Gout 2011; Sormani 2007 Dev; Sormani 2007 Ext Val; Vasconcelos 2020 Dev). The discussion section of four (5%) models from two studies had confirmatory language, although no discrimination or classification measures were reported (Bergamaschi 2001 BREMS Dev; Misicka 2020 10 years; Misicka 2020 20 years; Misicka 2020 Ever).

Discussion

Summary of main results

The main objective of this review was to identify and summarise all multivariable prognostic model developments and validations for quantifying the risk of clinical disease progression, worsening, and activity in people with MS. We identified 75 models, 12 of which were externally validated at least once in a total of 15 validations by applying the model of interest as intended by its development to predict outcomes in new participants. Only a single model, Manouchehrinia 2019, was externally validated three times, all within the development study. There were six other author‐reported validations that did not meet our criteria for external validation. No external validation has yet occurred for the remaining 60 models, making them unsuitable for use in practice at this time.

Models with an external validation

Of the 12 models with any external validation, only two models were found to have been externally evaluated more than once. The BREMS score (Bergamaschi 2001) was evaluated with two external datasets by the development team before being simplified to the BREMSO score and being evaluated further using the second external development dataset. The model for conversion to progressive MS developed by Manouchehrinia and colleagues (Manouchehrinia 2019) was evaluated using three external datasets (one registry, two randomised controlled trials) within the development study. No validation studies were found to be performed by researchers external to the development team and almost all were part of the development publication. The lack of independent external validations hinders conclusions about the models’ generalisability. While these studies provide information on model performance when applied by the development authors to new people with MS, it is still unclear whether the reporting is sufficient to enable use by external researchers and clinicians. This is important because model performance will depend on the interpretation of unclearly reported models, predictors, and outcomes.

Models without an external validation

At 80%, models for which no external validation evidence of any kind exists make up the overwhelming majority of the MS prognostic modelling literature. However, this is not surprising considering that only 12 of these 60 models reported their full model or provided a tool and gave instructions on its use. Only one of these studies explicitly stated that external validation was not pursued, due to the poor discrimination of the developed model in the internal validation (Pellegrini 2019). It is worth noting that none of the identified models for conversion to definite MS were found to be externally validated. Because this outcome occurs early in the disease course, valid prognostic models addressing it in people with CIS have the potential to exert the greatest impact on treatment decisions and thereby on long‐term outcomes.

Before developing a new prognostic model, it is recommended to first review the literature in search of previously developed prognostic models predicting the outcome of interest (van Smeden 2018). This would ideally be followed by external validation of relevant existing models to test their generalisability. This validation process is meant to call attention to possible weaknesses of the model and to allow for an iterative process of improvement via model updating. When multiple models for the same outcome exist, it is important to compare their performance when applied to the population of interest, instead of just comparing the included predictors and modelling methods. These recommended steps would channel the efforts of the scientific community towards the common goal of delivering a generalisable and clinically useful prognostic prediction model. Our review shows that these recommended initial steps are omitted in MS prognostic research. Model developments dominate the literature, and the performance of newly developed models is not compared with that of other published prognostic models in a meaningful way. Some included studies, e.g. Seccia 2020, explicitly mentioned the lack of validated models for MS prognosis, and yet continued to develop new models without evaluating the performance of previously developed models for similar outcomes on their independent data.

The lack of external validations also points to the need for effective clinical data‐sharing. In order to make the best use of the resources allocated to medical research, independent researchers should be able to access existing datasets for external validation of published prognostic models (Völler 2017). Ideally, these datasets should be harmonised and provided through an infrastructure allowing individual patient level meta‐analysis (Snell 2020) of data from different sources to reach a sufficient sample size. While there are several large MS registries and even a network between many of them (Big Multiple Sclerosis Data Network), access provided to general researchers appears limited. The increasing use of large, long‐term registry data will also necessitate improved data quality measures and reporting across domains, especially participant characteristics. More attention to participant selection and how it affects model applicability will be needed.

Overall completeness, certainty of the evidence and study limitations of externally validated models

Overall completeness of the data

In the MS literature, reporting of prognostic prediction model studies was very poor, echoing the experience in other disease areas (Kreuzberger 2020; Wynants 2020). The situation was at least as dire in models developed with ML as in those developed with traditional methods, partially because the current EQUATOR Network guidelines do not seem directly applicable to these studies (Dhiman 2021). Although a reporting guideline for prediction modelling, TRIPOD (Collins 2015), has been available since 2015, most of the items that were poorly reported in the studies we included were also part of other reporting guidelines published earlier, like STROBE (von Elm 2007) or STARD (Bossuyt 2015). Additionally, most of the analyses (66%) in our review were published in or after 2016, and no temporal pattern could be observed in the proportion of the items reported. Across the included analyses, just over half (54%) of the 20 mandatory TRIPOD items we evaluated were clearly reported. The state of the literature makes us doubt whether the reporting guidelines are being required, or at least recommended, by peer reviewers or publishers/editors.

None of the studies justified their sample size. The failure to consider this aspect during study design jeopardises the efforts put into prognostic research (Steyerberg 2019). Only three cohort studies reported that the outcome assessors were blinded to at least a subset of the predictors (e.g. image readings or lab analyses) (Kosa 2022; Olesen 2019 Candidate; Olesen 2019 Routine; Rovaris 2006). Because the purpose of data collection was not prognostic modelling in most of the analyses (64%), i.e. secondary data use, blinding of predictors to the outcome was probably not even considered. Still, its presence (or absence) could have been reported, especially in a disease area like MS, in which the subjectivity and reliability of clinical outcome assessment is an ongoing issue (van Munster 2017).

Details of the full model and how to find the absolute or relative risk of an individual patient were either missing or unclear in most (79% and 71%, respectively) of the model developments. In our opinion, this indicates a failure to deliver on the objective of these studies. Reporting the performance of a newly developed prognostic prediction model serves no purpose unless the model is also reported in such a way as to enable future, preferably independent, external validations and further application to individual patients. Despite the anticipated difficulty of reporting models developed using ML methods, we consider this to be possible, e.g. by transporting model objects or developing web‐based applications/platforms to calculate predictions for individual patients (Boulesteix 2019). Failure to provide the model or a way to use it indicates research waste without any tangible (potential) benefits. This failure also precludes any discussion suggesting the need for further validation or clinical application.

Clear reporting of the amount of missing data (only in 24%) was another area that could be improved. This lack of reporting makes it difficult to assess potential bias due to overrepresentation of complete cases (Wynants 2017). Many studies also failed to report the disease‐modifying treatments received by the participants. Any treatment received at baseline is important for understanding the population to which the prognostic prediction is applicable. Treatment received during follow‐up poses another challenge to the prediction model because, as a post‐baseline factor, it is likely to change the outcome. The treatments received by study participants were reported clearly in only 32% of the analyses.

Finally, model performance measures were reported only to a limited extent and did not meet our expectations. Although we considered a single discrimination measure, e.g. a c‐statistic or AUC, with its associated uncertainty, to be sufficient for this reporting assessment item, only 34% of the model developments or validations fulfilled this criterion. The value of a model cannot be evaluated without an appropriate performance measure. The uncertainty around this measure is also critical for understanding whether the model actually performs better than random prediction, which corresponds to a c‐statistic of 0.5.

Certainty of the evidence

At the time of this review’s submission, no GRADE tool was available for prognostic models. Hence, the certainty of evidence is not rated in this review.

Study limitations of prognostic model development studies

We found all but one development to have a high risk of bias according to PROBAST. The principal drivers of this result came from the analysis domain: the use of routine care or registry data combined with suboptimal methods for dealing with missing data, insufficient sample sizes with respect to the number of predictors of interest, incomplete model performance evaluation due to lack of assessment of calibration and sometimes even discrimination, and failure to account for overfitting and optimism, especially with regard to accounting for all modelling steps.

Besides being of sufficient quality and representative of the population of interest, the data should also be sufficiently large in order to develop robust models that precisely estimate risk overall, as well as across the spectrum of predicted risk, both in the development set, and, more importantly, when applied to new people (Riley 2020). We found about three‐quarters of the developments to be high risk due to insufficient sample size, a point also mentioned in Pellegrini 2019 as a limitation of the MS prognosis research literature. In smaller datasets with large numbers of considered predictor parameters relative to the number of events and total size, the risk of model overfitting increases, without guarantee that further methods, such as shrinkage, can be applied to fully overcome the problem. Very few of the traditional statistical developments found here addressed overfitting and optimism by applying shrinkage.

Overfitting and optimism were also neglected in other ways. Although resampling procedures have been recommended as best practice in internal validation, around one‐third of the reviewed analyses conducted apparent validation or relied on a single random split of the data. Internal validations employing cross‐validation or bootstrapping methods, however, are not immune from overoptimism. The resampling schemes should include the entire modelling process, including, for example, predictor selection. Unfortunately, this is difficult to assess given the identified pitfalls and the increasingly complex analysis structure, especially in developments using ML.
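As a concrete illustration of including the entire modelling process in the resampling, the following minimal sketch wraps predictor selection and model fitting in a single pipeline so that both are repeated within every cross-validation fold; the data and pipeline components are hypothetical and not taken from any included study.

```python
# Hedged sketch: keeping predictor selection inside the cross-validation loop by
# wrapping it with the model in one pipeline. Data and components are hypothetical.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

X, y = make_classification(n_samples=300, n_features=50, n_informative=5, random_state=3)

pipeline = make_pipeline(
    SelectKBest(f_classif, k=10),       # predictor selection, refit in every fold
    LogisticRegression(max_iter=1000),  # model fitting, refit in every fold
)

cv_auc = cross_val_score(pipeline, X, y, cv=5, scoring="roc_auc")
print(f"cross-validated AUC = {cv_auc.mean():.3f} (+/- {cv_auc.std():.3f})")

# Selecting predictors on the full data *before* cross-validation would leak outcome
# information into every fold and overstate performance.
```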

Large sample sizes and proper validation methods need to be used in combination with clinically meaningful performance measures to contribute information on the suitability of the model for practice. Assessment of both discrimination and calibration is widely recommended; however, we found only about half of model evaluations to have reported discrimination and only one‐quarter to have assessed calibration. While uptake of the recommendation to assess calibration has been poor in general, it appears even poorer in publications using ML models. This may be partly due to a focus on classification rather than risk estimation, or to a lack of familiarity with model evaluation in the clinical setting. Due to the increasing popularity of ML in the field of clinical prediction modelling, it is ever more important to identify and address the reasons for this shortcoming.

Additionally, we found the importance of timing in prognostic modelling to be underappreciated in the MS literature. Many studies only implicitly stated the time at which the model is meant to be applied to participants and the prediction horizon of interest, making assessment of all PROBAST domains difficult. This information is needed to understand whether the included participants align with the population of interest, whether predictors are available at the intended time of model use, whether the time interval between predictor collection and outcome assessment is appropriate, and whether complexities in the data related to time have been properly accounted for. As discussed by Pellegrini and colleagues, defining a true baseline time point in MS is difficult (Pellegrini 2019); however, even clear clinical landmarks, such as before or after treatment initiation, have been ignored especially when using registry data. Several studies have used predictors assessed years after the implicit baseline timing of the model, which has complicated interpretation and assessment. Furthermore, handling of differing observation periods across participants by ‘adjusting’ for this confounding has been unclearly reported and possibly inappropriate for prediction tasks. These issues suggest possible confusion about the use of prognostic modelling and the types of conclusions that can be drawn from these analyses.

Usability

While all models were found to require an MS specialist for predictor assessment, equipment specialisation varied, with some models relying on basic clinical predictors but many relying on advanced imaging, omics, or electrophysiological data. While these models may currently be limited to MS research centres, improving access may expand the applicable settings of use over time. This can be seen most readily in the rapid development of diagnostic capabilities in recent years: imaging with high‐resolution 3 Tesla MRI scanners is now much more widely available than it was 10 years ago. New markers, such as measures of brain atrophy, are being added to standardised MS MRI protocols. Major advances have been observed in other areas as well. Laboratory chemistry can determine antibodies for differential diagnostic considerations that were widely unknown 10 years ago (e.g. aquaporin‐4 and myelin oligodendrocyte glycoprotein immunoglobulin G antibodies). New laboratory analyses are also on the horizon: for example, the possibility of detecting neurofilament light chain in blood is emerging; several years ago, this would have been possible only in CSF.

Besides the lack of external validation, the other greatest threat to usability is the lack of clear reporting of final models or tools and of instructions on their use: half of the identified models provided neither, while only approximately 20% reported both. The reporting of the remaining models contained inconsistencies, unclear predictor coding, or missing model components, making direct use on new participants difficult. Partial reporting also hampers the ability of future researchers to predict absolute risks with these models. These deficits in usability were not compensated for by measures supporting reproducibility, as only a handful of studies provided analysis data or code and none shared both. Sharing both data and code could improve model dissemination, especially for complex algorithms without simple representations; however, their use may require further specialised skills, and other forms of model presentation should be preferred when translation into clinical practice is the goal.

Potential bias in the review process

In order to reduce bias at the search stage, we searched three major databases. We also searched the conference proceedings of the main organisations in the MS disease area and tried to access more information on the eligible abstracts by Internet search and author contact. The measures we took to prevent bias in study selection and in data extraction/risk of bias assessment were 1) pre‐protocol pilots, 2) training of all contributors on these steps using the relevant methodological publications, the protocol, and internal guidance, 3) performing these steps independently in duplicate, 4) resolving any disagreements with at least one other co‐author in group discussions, and 5) contacting study authors for any missing or unclear data critical for risk of bias assessment or the planned analysis. Despite these measures, and owing to the novelty of this review type, some methodological decisions needed to be made or elaborated on either at the protocol stage or during the review, which may have introduced bias.

  • Database search: Our database search strategy was constructed to be sensitive. Our decision not to search trial registries is not expected to introduce bias: prognostic research studies, like diagnostic ones, are unlikely to be pre‐registered (Korevaar 2020; Peat 2014; Sekula 2016) despite calls to do so (Altman 2014). The restriction of the search to publications from 1996 onwards might have introduced bias. However, the fact that only two studies published before 2001 (Weinshenker 1996 and the related model development study Weinshenker 1991) were found eligible for this review suggests that this restriction is likely to have missed very few relevant publications, if any.

  • Reference search: A post‐protocol change that may have introduced bias concerns backward reference searching. For forward reference searching, we utilised the functionality of Web of Science, one of the available and commonly used platforms for this task (Briscoe 2020). Because Web of Science also offers an option for backward reference searching, we decided to access the titles/abstracts from the same platform instead of handsearching. This methodological change allowed us to screen the full titles/abstracts of the references rather than only titles, but it was less sensitive than handsearching because some references are not linked in Web of Science records. Still, we expect this limitation to introduce little bias, because most backward/forward references were linked to the records and only 5% of the included reports came from reference searching as opposed to database searches and other sources.

  • Study selection: The language used to report prognostic models varies across time and with the main speciality of the authors, e.g. clinicians versus methodologists. In order to be more representative of the literature, we considered a study's objectives to be expressed not only in the objectives section of the abstract or main text, but also in the focus of the Results and Discussion sections. This perspective led to the inclusion of studies whose primary objectives were other than prognostication in people with MS. We were also inclusive of various clinical outcome definitions (including their timing), data types, and statistical methods. We consider this aspect not a bias but rather a strength of our review, which, given the rarity of independent external validations of the developed models, was destined to become a systematic and comprehensive description of the state of the literature in this disease area. Expanding the outcomes of interest to fatigue, falls, or depression would have yielded only a few, if any, additional reports, but it would have made the results of our review, which aims to be relevant to clinical practice and patients alike, less interpretable.

  • Model selection: In a model development study, fitting many models may serve, amongst other purposes, as a means for selecting predictors, selecting the most predictable outcome, selecting the best algorithm type, or optimising tuning parameters. When the authors of a study indicated their preferred model in any way, we included only those models; otherwise, we refrained from making a selection. Although studies that reported multiple models and failed to select a final favourite for presentation might even be considered not to have prognostic model development as an aim, we decided not to exclude such studies. The number of models from a single study reached up to four (Zhao 2020), but fewer than one‐quarter of the studies contributed more than one model. This may have introduced bias in the descriptive quantitative measures we reported across the analyses, e.g. median percentage of females or median sample size, due to the overrepresentation of some studies or samples. Because our intention, and our capability with the available dataset, was not to summarise or analyse the models but simply to describe them, we consider it appropriate to treat these as separate analyses for a general overview of the literature.

  • Risk of bias: Although a detailed explanation and elaboration publication for PROBAST exists (Moons 2019), we had to interpret some items to fit the needs of our review. These interpretations were aimed at adapting the responses to the specifics of MS or to the variety of statistical methods not addressed by the current PROBAST, which focusses on binary outcomes modelled with traditional regression methods. These decisions are summarised in the Methods and detailed in Appendix 4. Irrespective of our interpretations, almost all analyses rated as having high risk of bias would still have been rated the same in the analysis domain, and hence overall, because they failed to assess model performance appropriately by reporting both discrimination and calibration while accounting for overoptimism. Thus, our interpretations are expected to introduce no substantial change in the overall risk of bias assessment, and any bias would be limited to the item/domain level.

  • Analysis: Because the number of independent external validations per model did not allow us to perform a meta‐analysis or meta‐regression, we had no analysis into which bias could be introduced. The only model with three external validations, which would in principle have allowed meta‐analysis of its performance (Manouchehrinia 2019), was not meta‐analysed due to the lack of independence of its external validations (Kreuzberger 2020), as well as inherent limitations by design causing high risk of bias in both its development and its validations. Any quantitative measures derived in this review relate to basic description of the studies. When participant characteristics were reported only for subgroups, we calculated pooled means and pooled SDs. When missing, the variance of c‐statistics was calculated and used to construct confidence intervals; an illustrative calculation is sketched below. We report these in the data tables or in Characteristics of included studies.
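
As an illustration of the last point, a missing standard error of a reported c-statistic can be approximated from the c-statistic and the numbers of events and non-events. The sketch below uses the Hanley–McNeil approximation as one common choice, with hypothetical numbers; it may differ from the exact formula applied in this review.

    import math

    def hanley_mcneil_se(auc, n_events, n_nonevents):
        """Approximate standard error of a c-statistic (Hanley & McNeil 1982)."""
        q1 = auc / (2 - auc)
        q2 = 2 * auc ** 2 / (1 + auc)
        var = (auc * (1 - auc)
               + (n_events - 1) * (q1 - auc ** 2)
               + (n_nonevents - 1) * (q2 - auc ** 2)) / (n_events * n_nonevents)
        return math.sqrt(var)

    # Hypothetical reported values: c-statistic 0.72 with 80 events and 220 non-events
    auc = 0.72
    se = hanley_mcneil_se(auc, 80, 220)
    print(f"95% CI: {auc - 1.96 * se:.3f} to {auc + 1.96 * se:.3f}")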

In this review, publication bias was not assessed due to the lack of methodological guidance on this topic.

Applicability of findings to clinical practice and policy

We identified 75 models, 63 of which were not externally validated; performance measures for these models were reported only in the same population used for model development. Although 11 of these models used language in their discussion sections suggesting applicability to, and implementation in, clinical care, they cannot be recommended for application without demonstrating generalisability through independent external validations of their performance. The same is true for the remaining 12 models: none were externally validated by independent teams, and only three were externally validated in separate studies (Weinshenker 1991 M3 in Weinshenker 1996; Bergamaschi 2001 (BREMS) in Bergamaschi 2007 and Bergamaschi 2015; and Skoog 2014 in Skoog 2019). Moreover, of those with any external validation, 10 have only a single one. The development and validations of the model with the maximum number of external validations (Manouchehrinia 2019, three in the same study) were rated as having high risk of bias in two to four of the four domains addressed by PROBAST. Hence, none of the identified models is yet applicable in clinical practice.

Moreover, the heterogeneity of definitions and populations in the findings of this review highlights the challenges of developing or validating a prognostic prediction model in people with MS, owing to the difficulty of defining the clinical need in terms of the relevant population, outcome, and available predictors. The literature in this disease area lacks uniformity in the definitions of patient‐relevant clinical outcomes and the time points at which to measure them. Also, the fast‐changing landscape of diagnostic subtypes and their criteria, e.g. the McDonald criteria published and then revised three times during the last 20 years, not only makes it difficult to extrapolate a model's applicability to future patient populations, but also renders diagnostic conversion outcomes less relevant to clinical practice as time passes. The lack of an objective, agreed‐upon, and standardised definition of secondary progression is another factor hindering any research aiming to support clinical decision‐making that targets this outcome. The increasing number of treatment options for all diagnostic subtypes of MS during the last 25 years, and hence the increasing proportion of people treated with them, also raises questions about the applicability of prognostic models developed using data from the pre‐treatment or first‐line treatment era to people with MS today or in the future. This is evident in many domains, including relapse rates, the detection of paraclinical disease activity, and the possibility of using advanced markers such as lesion volume and atrophy measurements on MRI or laboratory‐based biomarkers for prognostic assessment. With all of these changes, prognostic modelling in this field is truly like "chasing a moving target" (Chen 2017), and difficult even when applying the highest methodological standards (Pellegrini 2019). Any conclusion regarding the applicability of prognostic models in this disease area requires rigorous testing of the developed models in numerous and up‐to‐date external validation studies; these are currently lacking.

Implications of the rise of ML for research

The rising popularity of ML algorithms has also reached MS research. Since 2018, an increasing number of ML prognostic model developments have been published, and every development identified in the first half of 2021 employed ML techniques. The Radiology Editorial Board suggests that artificial intelligence and ML will have an impact on any medical application that uses imaging (Bluemke 2020). As MRI has been an important tool for depicting pathological features in MS since the 1980s (Ge 2006), this trend is not surprising. Although ML offers great potential for uncovering complex relationships in our ever‐growing data using fewer assumptions, this potential cannot be harnessed without greater attention to the needs of clinical practice and to good practice in prediction model development.

Because the use of ML for clinical prediction is still relatively new in MS, it is unsurprising that several publications are presented as pilot or proof‐of‐concept studies. As stated previously, many of the ML studies identified here also provide no model or tool for external use. There is, however, a looming threat of research waste if this trend continues: across several specialities, the discrepancy between the number of model developments and the number of tools used in practice has been noted. Studies that use clinical applications to showcase methodological accomplishments are important, but this type of research should not be conflated with, or replace, actual attempts to create prediction models for clinical practice (Mateen 2020). These differing aims may partly explain the high number of studies with no selected final model and the low number of validation studies; the models identified were meant to demonstrate methodological and technological advances rather than to provide individualised estimates of outcome risks.

Another possible reason for the lack of presented tools for prediction in new individuals may relate to the difference in cultures between ML and clinical research (Mateen 2020), and to the notion that clinical prediction modelling guidelines are not relevant to this body of ML research (Dhiman 2021). Mateen and colleagues argue that, for clinical practice to derive the greatest benefit from this work, greater collaboration between healthcare experts and the ML community is necessary. Our review suggests that these collaborations already exist, but that the guidelines put forth by clinical prediction modelling experts are still being ignored. We argue that all researchers interested in clinical prediction need not only to work together, but also to take responsibility for conducting research according to current best practices. This entails, at a minimum, adherence to the reporting guidelines set out in TRIPOD and, once available, TRIPOD‐AI. The brief guide on assessing radiological research using artificial intelligence published by the Radiology Editorial Board may also prove valuable to the MS research community (Bluemke 2020), although that document addresses a wider range of radiological studies involving ML.

Comparisons with other reviews

We are aware of several related prognosis reviews in MS, including Brown 2020, Havas 2020, and Seccia 2021. Although Kreuzberger 2020 is a systematic review of prognostic models in a different clinical field, it is also worth comparing our review to it as the first published Cochrane Review of prognostic models to date.

Brown 2020 differed from our review in that it focused specifically on prognostic models intended to be used at diagnosis of RRMS. While its population was more specific, its definition of prediction models was broader, including all models using multiple predictors in combination to determine the probability of an outcome. This led to the inclusion of several models that were not developed with the intent of predicting individual outcomes, but rather as explanatory models of disease aetiology, and which were therefore excluded from our review. This highlights the difficulty of distinguishing between studies aiming to develop prognostic models and those with other purposes, a problem encountered in our review as well as in Kreuzberger 2020. This point was also echoed in Havas 2020, which reported that almost half of the over 6000 studies screened were not prediction modelling studies, but rather other study types that used the words 'prediction' and 'association' interchangeably.

Unlike our review and that of Brown 2020, however, Havas 2020 included models predicting treatment response, such as the Magnetic Resonance Imaging in MS (MAGNIMS) score (Sormani 2016), and expert‐defined, rather than statistically derived, scoring systems such as the Rio score (Río 2009). Models predicting treatment response are very important to MS clinical practice; however, they were outside the scope of our review. Treatment response prediction, and causal prediction more generally, is an evolving field, and its model development and performance assessment methods are an area of active research; further methodological foundations are still needed to inform such a review. While Havas and colleagues called on MS researchers to establish a consensus on the definition, development, and validation of prognostic models, we would emphasise that this consensus already exists within the clinical prediction modelling literature and only needs to be adopted by the MS speciality.

Seccia 2021 reviewed recent ML models that considered clinical data in their feature sets, making the scope of that review much narrower than ours. All studies identified in Seccia 2021 were also included in our review, and we identified 19 additional ML models: at least seven were published too late for consideration by Seccia and colleagues, and several others were excluded from that review because of a strong focus on imaging or omics data over clinical data. The review authors highlighted the importance of sufficient data size as well as data quality, which is relevant to ML prognostic model studies and traditional regression studies alike. They also mentioned the problems inherent in subjective disease measures and the non‐generalisability of some predictor types, such as imaging data specific to a single device, which we also identified as problematic in the MS field. They further discussed issues specific to ML, including interpretable ML and the combination of tabular and non‐tabular data. They made a point of stating that no identified study had developed a prognostic model with performance suggestive of clinical usefulness. Given that none of these studies was truly externally validated, and that risk of bias was rated high in all of them due, at least in part, to small sample sizes and lack of calibration assessment, we would add that, even if the reported performance estimates had been substantially higher, these models would still not be ready for clinical practice. Additionally, these models were not reported in a way that allows external validation, as no tool or model fit was provided.

Unlike Kreuzberger 2020, our results focus on all studies identified, not just those with external validation. Given the rarity of multiple external validations for a single prognostic model in MS, we wanted to comprehensively describe and summarise the state of the field, not just the models most ready for translation into clinical practice. In light of this aim, we did not downgrade the applicability of a model based on the age of its data or the diagnostic criteria used. Kreuzberger and colleagues rated applicability as unclear if eligibility criteria or the recruitment period were not given, stating that they could not be sure whether the included individuals matched the review question and a current application of the model. This is certainly also an issue in MS, which has faced continual updating of diagnostic criteria since publication of the first McDonald criteria in 2001 (McDonald 2001). In fact, several included studies defined conversion to clinically definite MS using the Poser 1983 criteria while using components of the later published criteria as predictors. The aim of these studies may have been more in line with validation of newly published criteria than with development of a new prediction tool, again highlighting the need for better reporting. However, the models with multiple external validations in this review did not suffer from such problems.

Authors' conclusions

Implications for practice

The goal of prognostic modelling research must be to bring multivariable prognostic models for predicting future clinical outcomes into the routine clinical care of people with multiple sclerosis (MS). This is of particular interest because, although highly effective therapeutic options are available, they are associated with relevant risks to the patient. Given the high variability in disease worsening and progression, a needs‐based approach to therapy is imperative.

The currently available evidence is not sufficient for predicting MS prognosis in clinical routine, because the quality standards required for the development and, especially, the validation of such prediction scores have not yet been met. Ideally, prediction models are developed using large, high‐quality datasets with participants representative of the population to which the model will later be applied. However, our results do not exclude the possibility that existing prediction models could be transferred into clinical routine after successful external validation and demonstration of benefit in randomised controlled trials investigating their impact. Both the validation of currently available prediction models and the consistent application of quality standards in future studies are needed.

Implications for research

Our systematic review identified an abundance of models developed for the prediction of disability, relapse, conversion to definite MS, conversion to progressive MS, and composite outcomes based on these. As previously found within and beyond MS, these studies were generally not conducted or reported according to current standards and guidelines. We point the MS research community to the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) statement and the Prediction Model Risk of Bias Assessment Tool (PROBAST) for such guidance, and to the upcoming artificial intelligence extensions of these tools for machine learning‐specific guidance.

Clinical prediction modelling studies should be conducted using longitudinal datasets collected for the purpose of prognostic research in order to minimise bias in terms of participants, predictors, and outcome. Researchers in this disease area should understand when predictor and outcome measurements can take place and how these choices affect the interpretation of a prognostic model. Appropriate methods should be used, in consultation with experts in clinical prediction modelling. Models should be reported in a manner that makes them usable by other researchers, because developed models should be externally validated, preferably by independent teams. Data sharing practices can support external validation efforts.

History

Protocol first published: Issue 5, 2020

Notes

Data and code availability

The dataset summarised in this review is available as tables in the Appendices and in Characteristics of included studies. The R code used for the statistical description is available upon request from the authors.

Role of sources of support

The funding sources did not have any influence on the planning, conduct, analysis, or reporting of this review.

Acknowledgements

We would like to thank Cochrane and several members for their support in the development of this review and those who conducted the editorial process for this review:

  • Graziella Filippini (Co‐ordinating Editor) and Ben Ridley (Managing Editor) of Cochrane Multiple Sclerosis and Rare Diseases of the CNS.

We would like to thank the following people who conducted the editorial process for this article:

  • Sign‐off Editor (final editorial decision): Robert Boyle, Imperial College London, Cochrane’s Editorial Board.

  • Managing Editor (selected peer reviewers, provided editorial guidance to authors, edited the article): Colleen Ovelman and Sam Hinsley, Central Editorial Service.

  • Editorial Assistant (conducted editorial policy checks, collated peer reviewer comments and supported editorial team): Lisa Wydrzynski, Central Editorial Service.

  • Copy Editor (copy editing and production): Jenny Bellorini, Cochrane Central Production Service; pre‐edit: Tori Capehart, Copy Editor, J&J Editorial; Margaret Silvers, Copy Editor, J&J Editorial; Sarah Hammond, Senior Copy Editor, J&J Editorial.

  • Peer reviewers (provided comments and recommended an editorial decision): Arman Eshaghi, University College London (clinical/content review), Bruce V Taylor, University of Tasmania (clinical/content review), Steve Simpson‐Yap, The University of Melbourne (clinical/content review), Iván Pérez‐Neri (consumer review), Nina Kreuzberger, Cochrane Haematology, Department I of Internal Medicine, Center for Integrated Oncology Aachen Bonn Cologne Duesseldorf, Faculty of Medicine and University Hospital Cologne, University of Cologne, Cologne, Germany (methods review), Robin Featherstone, Cochrane Central Editorial Service (search review).

We would also like to thank Johanna AA Damen (Co‐ordinator) of the Cochrane Prognosis Methods Group for responding to our questions. We are thankful to Liliya Eugenevna Ziganshina for the support during the eligibility assessment of a publication in Russian (Zakharov 2013). For the full translation of the same article and support during its data extraction, we are grateful to Larissa German.

We thank all the primary study authors who replied to our requests for further information. We are very grateful for the conversations with and guidance from Beate Sick regarding deep learning methods. We are also thankful to Anja Friedrichs for the diligent proofreading of critical sections of the review text.

Appendices

Appendix 1. Electronic search strategies

Database: Ovid MEDLINE(R) and Epub Ahead of Print, In‐Process, In‐Data‐Review & Other Non‐Indexed Citations and Daily 1946 to 1 July 2021

Date search conducted: 1 July 2021

Strategy:

# Concept Searches Results
1 1
Multiple sclerosis
(exp Multiple Sclerosis/ OR ((multipl* OR disseminated OR insular) ADJ1 sclerosis).ti,ab.) NOT (animals NOT humans).sh. NOT (child NOT adult).sh. 80,108
2 2a
Prognostic/ prediction
(exp Prognosis/ AND (exp disease progression/ OR exp Remission, Spontaneous/ OR exp Recurrence/)) OR (predict OR prognos*).ti. OR ((predict* OR prognos*) ADJ3 (recurrence OR progression OR relaps* OR remission OR remitting OR 'multiple sclerosis' OR ms)).ti,ab. OR ((predict* OR prognos*) ADJ3 treat* ADJ3 response).ti,ab. OR ((predict* OR prognos*) ADJ3 disease ADJ3 activity).ti,ab. 357,355
3 2b
General models
((model* OR decision* OR identif*) ADJ3 (history OR variable* OR multicomponent* OR multivariable* OR multivariate* OR covariate* OR criteria OR criterion OR scor* OR characteristic* OR finding* OR factor* OR rule*)).ti,ab. OR (decision* ADJ6 model*).ti,ab. 420,767
4 2c
Statistical terms
((logistic OR statistic*) ADJ3 model*).ti,ab. OR (decision*.ti,ab. AND exp models, statistical/) OR (Stratification OR Discrimination OR Discriminate OR "c‐statistic" OR "c statistic" OR "Area under the curve" OR AUC OR Calibration OR Indices OR index OR Algorithm OR Multivariable* OR multivariate* OR covariate* OR valid*).ti. OR ((Stratification OR Discrimination OR Discriminate OR "c‐statistic" OR "c statistic" OR "Area under the curve" OR AUC OR Calibration OR Indices OR index OR Algorithm OR Multivariable* OR multivariate* OR covariate* OR valid*).ab. AND (prognos* OR predict*).ti,ab.) 976,860
5 3
Outcomes
((disease OR disability OR invalid* OR function* OR outcome OR impairment OR composite OR activity OR severity OR cognitive OR edss OR treatement OR ms OR 'multiple sclerosis' OR brems) ADJ6 (scor* OR scal* OR status OR assess* OR index OR classification)).ti,ab. OR (clinical ADJ3 (assess* OR activity)).ti,ab. OR ((disease OR disabilit* OR risk OR calculat*) ADJ3 (course OR progression)).ti,ab. OR (relaps* ADJ3 (rate OR frequen* OR time OR prognos* OR predict*)).ti,ab. OR (clinical* ADJ3 decision*).ti,ab. OR ((ms OR cdms OR 'multiple sclerosis') ADJ3 (develop* OR course OR progress* OR relaps* OR clinical*)).ti,ab. 1,094,627
6   (1 and 2) or (1 and (3 or 4) and 5) 5004
7   limit 6 to yr="1996 ‐Current" 4764

Database: EMBASE via embase.com 1974 to 2 July 2021

Date search conducted: 2 July 2021

Strategy:

# Concept Searches Results
1 1
Multiple sclerosis
('multiple sclerosis'/exp/mj OR (ms:ti AND 'multiple sclerosis'/exp) OR (((multipl* OR disseminated OR insular) NEAR/1 sclerosis):ti,ab)) NOT ([animals]/lim NOT [humans]/lim) NOT ('nonhuman'/exp NOT 'human'/exp) NOT 'animal model'/exp NOT ('child'/exp NOT 'adult'/exp) NOT [conference abstract]/lim 82,989
2 2a
Prognostic/ prediction
('predictive value'/exp AND 'model'/exp) OR ('prognosis'/exp AND ('disease exacerbation'/exp OR 'recurrent disease'/exp OR 'recurrence risk'/exp OR 'relapse'/exp OR 'remission'/exp)) OR predict*:ti OR prognos*:ti OR ((predict* OR prognos*) NEAR/3 (recurr* OR progress* OR relaps* OR remission OR remitting OR 'multiple sclerosis' OR ms)):ti,ab OR ((predict* OR prognos*) NEAR/3 treat* NEAR/3 response):ti,ab OR ((predict* OR prognos*) NEAR/3 disease NEAR/3 activity):ti,ab 848,229
3 2b
General models
((model* OR decision* OR identif*) NEAR/3 (history OR variable* OR multicomponent* OR multivariable* OR multivariate* OR covariate* OR criteria OR criterion OR scor* OR characteristic* OR finding* OR factor* OR rule*)):ti,ab OR (decision* NEAR/6 model*):ti,ab 596,768
4 2c
Statistical terms
((logistic OR statistic*) NEAR/3 model*):ti,ab OR (decision*:ti,ab AND 'statistical model'/exp) OR (Stratification OR Discrimination OR Discriminate OR "c‐statistic" OR "c statistic" OR "Area under the curve" OR AUC OR Calibration OR Indices OR index OR Algorithm OR Multivariable* OR multivariate* OR covariate* OR valid*):ti OR ((Stratification OR Discrimination OR Discriminate OR "c‐statistic" OR "c statistic" OR "Area under the curve" OR AUC OR Calibration OR Indices OR index OR Algorithm OR Multivariable* OR multivariate* OR covariate* OR valid*):ab AND (prognos* OR predict*):ti,ab) 1,397,586
5 3
Outcomes
((disease OR disability OR invalid* OR function* OR outcome OR impairment OR composite OR activity OR severity OR cognitive OR edss OR treatment OR ms OR 'multiple sclerosis' OR brems) NEAR/6 (scor* OR scal* OR status OR assess* OR index OR classification)):ti,ab OR (clinical NEAR/3 (assess* OR activity)):ti,ab OR ((disease OR disabilit* OR risk OR calculat*) NEAR/3 (course OR progression)):ti,ab OR (relaps* NEAR/3 (rate OR frequen* OR time OR prognos* OR predict*)):ti,ab OR (clinical* NEAR/3 decision*):ti,ab OR ((ms OR cdms OR 'multiple sclerosis') NEAR/3 (develop* OR course OR progress* OR relaps* OR clinical*)):ti,ab 1,680,033
6   #1 AND #2 OR (#1 AND (#3 OR #4) AND #5) 5041
7   (#1 AND #2 OR (#1 AND (#3 OR #4) AND #5)) AND [1996‐2021]/py 4976
8 Multiple sclerosis conference abstracts
('multiple sclerosis'/exp/mj OR (ms:ti AND 'multiple sclerosis'/exp) OR (((multipl* OR disseminated OR insular) NEAR/1 sclerosis):ti,ab)) NOT ([animals]/lim NOT [humans]/lim) NOT ('nonhuman'/exp NOT 'human'/exp) NOT 'animal model'/exp NOT ('child'/exp NOT 'adult'/exp) AND [conference abstract]/lim 33,377
9   #8 AND #2 OR (#8 AND (#3 OR #4) AND #5) 4077
10 Specific conference names
'european committee for treatment and research in multiple sclerosis':nc OR ectrims:nc OR 'americas committee for treatment and research in multiple sclerosis':nc OR actrims:nc OR 'american academy of neurology':nc OR aan:nc OR 'european academy of neurology':nc OR ean:nc 49,010
11   #9 AND #10 2730
12   #9 AND #10 AND [1996‐2021]/py 2730

Databases: Cochrane Database of Systematic Reviews (CDSR; 2021, Issue 6) and Cochrane Central Register of Controlled Trials (CENTRAL; 2021, Issue 6) via www.cochranelibrary.com

Date search conducted: 2 July 2021

Strategy:

# Concept Searches Results
1 1
Multiple sclerosis
((multipl* OR disseminated OR insular) NEAR/1 sclerosis):ti,ab,kw 10,654
2 2a
Prognostic/ prediction
predict*:ti OR prognos*:ti OR ((predict* OR prognos*) NEAR/3 (recurr* OR progress* OR relaps* OR remission OR remitting OR 'multiple sclerosis' OR ms)):ti,ab,kw OR ((predict* OR prognos*) NEAR/3 treat* NEAR/3 response):ti,ab,kw OR ((predict* OR prognos*) NEAR/3 disease NEAR/3 activity):ti,ab,kw 32,700
3 2b
General models
((model* OR decision* OR identif*) NEAR/3 (history OR variable* OR multicomponent* OR multivariable* OR multivariate* OR covariate* OR criteria OR criterion OR scor* OR characteristic* OR finding* OR factor* OR rule*)):ti,ab,kw OR (decision* NEAR/6 model*):ti,ab,kw 25,595
4 2c
Statistical terms
((logistic OR statistic*) NEAR/3 model*):ti,ab,kw OR (Stratification OR Discrimination OR Discriminate OR "c‐statistic" OR "c statistic" OR "Area under the curve" OR AUC OR Calibration OR Indices OR index OR Algorithm OR Multivariable* OR multivariate* OR covariate* OR valid*):ti OR ((Stratification OR Discrimination OR Discriminate OR "c‐statistic" OR "c statistic" OR "Area under the curve" OR AUC OR Calibration OR Indices OR index OR Algorithm OR Multivariable* OR multivariate* OR covariate* OR valid*):ab AND (prognos* OR predict*):ti,ab,kw) 70,797
5 3
Outcomes
((disease OR disability OR invalid* OR function* OR outcome OR impairment OR composite OR activity OR severity OR cognitive OR edss OR treatment OR ms OR 'multiple sclerosis' OR brems) NEAR/6 (scor* OR scal* OR status OR assess* OR index OR classification)):ti,ab,kw OR (clinical NEAR/3 (assess* OR activity)):ti,ab,kw OR ((disease OR disabilit* OR risk OR calculat*) NEAR/3 (course OR progression)):ti,ab,kw OR (relaps* NEAR/3 (rate OR frequen* OR time OR prognos* OR predict*)):ti,ab,kw OR (clinical* NEAR/3 decision*):ti,ab,kw OR ((ms OR cdms OR 'multiple sclerosis') NEAR/3 (develop* OR course OR progress* OR relaps* OR clinical*)):ti,ab,kw 340,882
6   (#1 AND #2) OR (#1 AND (#3 OR #4) AND #5) 583
7   (#1 AND #2) OR (#1 AND (#3 OR #4) AND #5) with Cochrane Library publication date from Jan 1996 to Jul 2021 583

Appendix 2. Data extraction form

Adapted from CHARMS checklist of Moons 2014.

Domain Key items
Study information
  • Study identifier (last name of first author, publication year, and, if necessary, model/analysis name), citation

  • Development with/without internal validation and/or external validation

Source of data
  • Cohort, case‐control, randomised trial, registry, routine care data

  • Primary/secondary use of data

Participants
  • Inclusion and exclusion criteria

  • Recruitment method and details (location, number of centres, setting)

  • Participant description (including age, sex, disease duration, type of MS at prognostication, diagnostic criteria used, description of EDSS/relapse at entry)

  • Details of treatments received

  • Study dates

Outcomes to be predicted
  • Definition and method of measurement of outcome

  • Category of outcome measure (conversion to definite MS, conversion to progressive MS, relapse, disability, composite)

  • Was the same outcome definition and method of measurement used on all participants?

  • Was the outcome assessed without knowledge of candidate predictors (i.e. blinded)?

  • Were candidate predictors part of the outcome?

  • Time of outcome occurrence or summary of duration of follow‐up

Candidate predictors
  • Number and type of predictors (e.g. demographics, symptoms, scores, CSF, imaging, electrophysiological, omics, environmental, non‐CSF samples, disease type, treatment, other)

  • Definition and method for measurement of candidate predictors

  • Timing of predictor measurement (e.g. at patient presentation, as diagnosis, at predefined intervals)

  • Were predictors assessed blinded for outcome (if relevant)?

  • Handling of predictors in the modelling (e.g. transformations, categorisations)

Sample size
  • Number of participants and number of events

  • Number of events in relation to number of candidate predictors (EPV)

  • Power of study assessed

Missing data
  • Number of participants with any missing value

  • Number of participants with missing values for each predictor

  • Method for handling missing data (e.g. complete‐case analysis, single imputation, multiple imputation)

  • Loss to follow‐up discussed

Model development
  • Modelling method (e.g. logistic, survival, penalised regression, machine/deep learning methods)

  • Modelling assumptions satisfied

  • Method for selection of predictors for inclusion in multivariable modelling (e.g. all candidate predictors, pre‐selection based on unadjusted association with outcome, etc.)

  • Method for selection of predictors during multivariable modelling (e.g. full model approach, stepwise selection, significance, multiple models, other)

  • Criteria used for selection of predictors during multivariable modelling (e.g. P value, AIC, BIC)

  • Shrinkage of predictor weights/regression coefficients (e.g. no shrinkage, uniform shrinkage, shrinkage due to estimation method)

  • Tuning parameter selection details and information on preventing data leakage

Model performance
  • Measure and estimate of calibration with confidence intervals (calibration plot, calibration slope, Hosmer‐Lemeshow test)

  • Measure and estimate of discrimination with confidence intervals (c‐statistic, D‐statistic)

  • Log‐rank used for discrimination (yes, no, not applicable)

  • Measure and estimate of classification with confidence intervals (sensitivity, specificity, PPV, NPV, net reclassification, accuracy rate ((TP+TN)/N), error rate (1 − accuracy), other)

  • Were a priori cut points used for classification measures? (yes, no, not reported, not applicable)

  • Overall performance (R2, Brier score, etc.)

Model evaluation
  • If model development, model performance tested on development dataset only or on separate external validation

  • If model development, method used for testing model performance on development dataset (random split of data, resampling methods e.g. bootstrap or cross‐validation, none)

  • In case of poor validation, was model adjusted or updated (e.g. intercept recalibrated, predictor effects adjusted, new predictors added)?

Results
  • Multivariable model presented (e.g. basic, extended, simplified), including predictor weights/regression coefficients, intercept, and baseline survival, with standard errors or confidence intervals

  • Any alternative presentation of the final prediction models (e.g. sum score, nomogram, score chart, predictions for specific risk subgroups with performance)

  • Details on how risk groups were created, if done, and the observed values at which the group boundaries occur

  • Comparison of the distribution of predictors (including missing data) for development and validation datasets

  • If validation, is the same model used as presented in development (same intercept and weights, no dropping of variables, etc.)?

Interpretation and discussion
  • Aim according to authors (abstract, discussion)

  • Was the primary aim prediction of individual patient outcomes?

  • Are the models interpreted as confirmatory (model useful for practice) or exploratory (more research is needed)?

  • Comparisons made with other studies, discussion of generalisability, strengths, and limitations

  • Suggested improvements for the future

Usability and reproducibility of final model
  • Skill and specialisation of equipment required for predictor collection, sufficient explanation to allow for further use, whether absolute risk can be estimated with the presented tool

  • Model/tool, code, and/or data provided

AIC: Akaike information criterion
BIC: Bayesian information criterion
N: sample size
NPV: negative predictive value
PPV: positive predictive value
TP: true positive
TN: true negative

Appendix 3. Definitions used for data extraction

In order to ensure a uniform data extraction from included studies with various reporting styles, we had some working definitions. These are listed below:

Data sources

  • Cohort study: Many studies reported collecting data from a cohort of patients, although other details in their report implicitly or explicitly suggested data sources other than a cohort study. This suggests that the word 'cohort' is often used to refer to a group of patients for whom some longitudinal data are available, rather than to a longitudinal study with pre‐defined data collection times and items. After trying to resolve any unclarity with the study authors, we applied the following rules for practical purposes.

    • If words indicating other types of sources (e.g. a well‐known registry like MSBase, or health records) were also used in relation to the data source without explicit definition of a cohort study, we considered the data source to be the other type.

    • If no other words related to the data source were used while referring to the data, but no specific cohort study was referenced, we assumed the data source to be a cohort study, even when there were clues against this (e.g. irregular follow‐up times).

    • In both of the cases above, the reporting of the data source in Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) was marked as unclear.

  • Primary data use: In line with the suggestion by Wynants 2017, we refrained from using 'retrospective' or 'prospective' for describing the data source. The data source types for which prognostic prediction modelling could be considered as primary data use were case‐control or prospective cohort studies. When the data collection of an included study had vague objectives, like natural history (e.g. Weinshenker 1991) or research on certain predictor domains (e.g. Agosta 2006), we assumed the data collection purpose to be primary, unless it was explicitly reported to be a retrospective data collection (e.g. Wottschel 2015).

Participants

  • Participants description: When age, disease duration, or sex was reported for subsets of the included patients but not overall (e.g. by outcome or diagnosis type), we calculated and reported weighted averages and pooled standard deviations according to Cohen 1988.

  • Treatments received: We collected data on disease‐modifying therapies and ignored symptomatic treatments for relapses. If the reported eligibility criteria specified inclusion of only treatment‐naive patients or required a wash‐out period, irrespective of its length, then treatment received at recruitment was considered to be none. No assumptions were made based on the diagnostic subtype of the included population or the inclusion time with respect to disease onset. For instance, because there are proponents of treating people with clinically isolated syndrome (CIS) in the literature (e.g. Wiendl 2021), we preferred not to assume that they were treatment‐free unless this was explicitly reported.
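
A minimal sketch of the subgroup calculations mentioned above, with hypothetical numbers and assuming the usual degrees‐of‐freedom‐weighted pooling of within‐subgroup variances:

    import math

    # Hypothetical subgroups, e.g. by outcome status: (n, mean age, SD of age)
    subgroups = [(120, 34.2, 8.1), (80, 38.5, 9.4)]

    n_total = sum(n for n, _, _ in subgroups)
    weighted_mean = sum(n * m for n, m, _ in subgroups) / n_total

    # Pooled SD: within-subgroup variances combined, weighted by their degrees of freedom
    pooled_var = (sum((n - 1) * sd ** 2 for n, _, sd in subgroups)
                  / sum(n - 1 for n, _, _ in subgroups))

    print(f"weighted mean: {weighted_mean:.1f}, pooled SD: {math.sqrt(pooled_var):.1f}")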

Outcomes

  • Blinded assessment: Analyses utilising randomised trial participants as the data source usually referred to source articles reporting trials of drug interventions with blinding. However, this blinding concerned only the intervention under investigation, not the predictors considered in the prognostic prediction modelling study, which made secondary use of the data. Hence, outcome assessment was only considered blinded if it was explicitly reported to have been blinded to the baseline status of the study population.

Candidate predictors

  • Considered predictors were all predictors used in univariable or multivariable analyses of all the models with the outcome of interest; included predictors were all predictors presented as part of the final model.

  • In line with Moons 2019, predictors were counted in terms of degrees of freedom.

  • We assumed dummy coding of categorical predictors (instead of, e.g. one‐hot encoding) for all modelling methods unless the number of features or another type of coding was explicitly reported. Hence, when counting the degrees of freedom of the predictors considered or included in the models, we might have underestimated the number (see the brief sketch after this list).

  • The data on number of considered interactions was deemed irrelevant for the following modelling methods, which were assumed to intrinsically take interactions into account: tree‐based methods (e.g. boosting, random forests), neural networks, support vector machines with nonlinear kernels.
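
A brief sketch of the coding assumption above (hypothetical categorical predictor; pandas assumed): dummy coding of a predictor with k levels contributes k − 1 degrees of freedom, whereas one‐hot encoding produces k features.

    import pandas as pd

    # Hypothetical categorical predictor with four levels
    course = pd.Series(["RRMS", "SPMS", "PPMS", "CIS", "RRMS"], name="course")

    dummy = pd.get_dummies(course, drop_first=True)  # dummy coding: k - 1 = 3 columns
    onehot = pd.get_dummies(course)                  # one-hot encoding: k = 4 columns

    print(dummy.shape[1], "degrees of freedom counted under the dummy-coding assumption")
    print(onehot.shape[1], "features under one-hot encoding")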

Sample size

  • For continuous outcomes, we considered the number of events to be the number of observations and calculated the events per variable (EPV) accordingly. For models considering both tabular and non‐tabular (imaging) predictors, the EPV was computed using only the number of tabular predictors and the number of events; for models considering only non‐tabular predictors, EPV could not be computed. When models used longitudinal data, predictor trajectories were counted as single predictors. A minimal worked example follows.
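
A minimal worked example of this bookkeeping, with hypothetical counts:

    # Hypothetical development dataset: 60 events among 240 participants for a binary outcome,
    # 8 degrees of freedom from tabular candidate predictors, plus raw MRI images
    n_events = 60
    tabular_df = 8

    epv = n_events / tabular_df  # the imaging inputs are ignored in the denominator
    print(f"EPV = {epv:.1f}")    # 7.5; for a continuous outcome, n_events would be the number of observations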

Missing data

  • We considered exclusion of participants with missing data from the study to be a different method of handling missing data than complete case analysis, because such exclusions were listed in the study exclusion criteria, implying that differences between these participants and those included could not be explored.

Model development

  • Shrinkage: We extracted whether a specific shrinkage method was used (e.g. uniform shrinkage) or whether the modelling method induced some form of shrinkage. Penalised regression, support vector machines, random forests, boosting, and Bayesian methods were considered to induce shrinkage. Neural networks were considered to induce shrinkage only if dropout, early stopping, or other regularisation methods were mentioned.

Model usability and reproducibility

  • Skill required for predictor measurement was categorised into three levels: predictors a patient could assess alone, predictors for which a primary care clinician would be qualified to measure, and predictors requiring a specialist for measurement or interpretation.

  • Equipment specialisation required for predictor measurement was also categorised into three levels, based on equipment standards in Western Europe: predictors that required no special equipment or that could be measured with equipment found in a typical primary care clinic, equipment found in a standard hospital (for example, magnetic resonance imaging (MRI)), and equipment only found in a speciality centre (for example, optical coherence tomography or multimodal electrophysiological equipment)

  • We assessed whether a model can realistically be expected to be used in practice based on whether a model was reported, whether it was provided in a way enabling easy use for future people with multiple sclerosis (MS), and whether instructions for use were given. The categories are: neither model nor instructions; a model for prognostication only; a model plus instructions for use; and a tool enabling easy use plus instructions.

  • A model’s reproducibility was described by the components given: ‘None’ if no model, tool, data, or code was provided, ‘Model’ if a model was reported, ‘Tool’ if a tool was given (for example, a nomogram or web application), ‘Code’ if analysis code was provided, and ‘Data’ if data were provided (‘DOR’ if the authors explicitly stated that data are available on request).

If coefficients other than intercepts or baseline hazards were not reported or if it was unclear whether all necessary coefficients were reported, the usability and reproducibility measures were rated as unclear.

Appendix 4. Decisions related to risk of bias and applicability

Decisions related to risk of bias

Participants domain
  • Prediction Model Risk of Bias Assessment Tool (PROBAST) includes items (4.3, 4.4) on the handling of missing data in the analysis domain. However, studies often explicitly used availability of data as an eligibility criterion and excluded participants with missing data. Similarly to Kreuzberger 2020, we decided to address the exclusion of participants with missing data in the participants domain (PROBAST item 1.2) if a study mentioned it as part of the inclusion/exclusion criteria and we addressed it in the analysis domain otherwise. If selection criteria were based on complete examinations and further predictor‐level missing data was addressed in the analysis, ratings in both domains were affected.

  • We considered registry data sources as being at high risk of bias (PROBAST item 1.1) unless the study authors reported a specific cohort study within the registry. This was in line with the PROBAST tool, which considers a data source to be appropriate when defined methods are consistently applied for participant inclusion/exclusion, predictor assessment, and outcome determination. This is not expected to be true of registries receiving data from many clinics over long periods of time. There may also be issues related to data quality and availability. For instance, Kalincik 2017 describes the implementation of quality assessments for MSBase, a popular international multiple sclerosis (MS) registry. However, it is unclear whether these assessments are used to improve data quality in the database, or sampling from it, in any of the prognostic studies based on this registry; moreover, they do not address all limitations inherent in observational data.

Predictors and outcome domains
  • Objectively defining the diagnostic conversion from relapsing‐remitting MS (RRMS) to secondary progressive MS (SPMS) is difficult (Ferrazzano 2020). Currently, it is based on retrospective evaluation of gradual worsening in clinical and radiological assessments, independent of relapses (Lublin 2014). While some studies operationalised the definition of conversion to SPMS, e.g. using a priori defined changes in Expanded Disability Status Scale (EDSS), other studies left conversion unclearly defined. We considered the definition of conversion to SPMS using only clinical judgement to be subjective and therefore at high risk of bias, especially when used in studies relying on retrospective data collected across many sites and over long periods of time.

  • Based on the rationale that validated disability scales and scores (e.g. EDSS), relapses, or functional systems of symptoms are the most commonly used and accepted clinical parameters in MS practice and research, measurements or definitions based on these were generally considered to be objective and not greatly affected by interrater variability or blinding. Hence, these were considered to be at low risk of bias unless there was an indication to the contrary. The EDSS, for example, is a valid measure of MS severity and progression. Although this measure has documented drawbacks, such as greater interrater variability for lower scores, it is robust for measurements over long time periods and is internationally accepted as a primary endpoint in clinical trials (Meyer‐Moock 2014).

Analysis domain
  • Nonparametric techniques make fewer assumptions and therefore require more data. Machine learning (ML) modelling methods thus require at least as much data as traditional modelling methods, possibly over 200 events per predictor (van der Ploeg 2014). Clear guidance on this topic is lacking, so we used the current recommendation for PROBAST item 4.1, which requires at least 20 events per predictor. For learning methods using non‐tabular input without prior feature extraction, e.g. deep learning models taking raw images as input, events per variable (EPV) could not be defined, and we rated this item as 'no information' (NI), unless the sample size was clearly insufficient, as evidenced by the number of inputs and events.

  • Some studies dealt with heterogeneity in participant observation times by adjusting for follow‐up duration or numbers of visits during follow‐up, without specifying exactly how this was done. In these situations, we considered the study to be at high risk of bias regarding the methods for accounting for the complexity in the data (PROBAST item 4.6). However, when follow‐up duration was considered as a predictor, this was rather considered to be a predictor measured after time of intended prognostication and was addressed in PROBAST item 2.3.

  • Although an established outcome measure in MS, the EDSS is not without criticism. For example, the EDSS exhibits greater variability for lower scores than for higher scores, has unequal interval distances, and its rate of change depends on baseline values (Meyer‐Moock 2014). Outcome definitions addressing its weaknesses are recognised; however, not all studies within the review used these preferable outcomes. When the ordinal EDSS was predicted as a continuous outcome in a parametric linear regression, we also assessed baseline EDSS range and any use of interactions. If the range was large and interactions were not tested, we considered the study to be at high risk of bias due to insufficiently accounting for the complexity in the data (PROBAST item 4.6).

  • Calibration is just as important as discrimination in assessing prognostic models in medicine (Steyerberg 2019). For ML algorithms that output class assignments rather than probabilities, calibration measures may seem inappropriate compared to classification measures. However, many ML methods are known to produce poor predicted probabilities, making assessment even more important (Niculescu‐Mizil 2005; Zadrozny 2001). Calibrating ML models is possible using standard software, just as for traditional regression methods, and should be expected in the biomedical setting. Hence, we did not change the interpretation of item 4.7 of PROBAST for different modelling methods and judged studies lacking calibration assessment to be at high risk of bias.

  • Assessment of whether overfitting and performance optimism were accounted for, especially in ML studies, required information on data pre‐processing and tuning parameter selection, both of which can lead to data leakage. Data leakage is the use of information in model training that is not expected to be available at the time of prognostication, leading to overestimation of model performance (Kaufman 2011). Preprocessing steps such as predictor standardisation are performed to improve model fit and were therefore treated as a model tuning step. It is best practice to tune, select, and evaluate model performance on different data, as in, for example, a nested cross‐validation structure (Hastie 2009; Steyerberg 2019); a minimal sketch follows this list. We rated a model development as at high risk of bias (PROBAST item 4.8) if there was evidence of data leakage. This relates to the PROBAST guidance that all modelling steps must be accounted for appropriately during internal validation.

  • Many ML studies that reported aiming to develop a clinical prediction model stopped short of clearly selecting a final combination of tuning parameters, predictors, and algorithm and then fitting this combination to the full dataset. These studies instead focused on presenting the process of model development. In such cases, it was impossible to determine whether the final presented model corresponded to the results of the multivariable analysis, as a final presented model did not seem to exist. Accordingly, we considered there to be no information with which to respond to PROBAST item 4.9.
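
To make the leakage point concrete, the following minimal sketch (hypothetical data; scikit-learn assumed; an illustration of the principle rather than the analysis of any included study) keeps standardisation and tuning parameter selection inside the resampling structure, so that no information from the evaluation folds leaks into model fitting:

    import numpy as np
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import GridSearchCV, cross_val_score

    rng = np.random.default_rng(2)
    X = rng.normal(size=(200, 15))                    # hypothetical predictors
    y = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))   # hypothetical binary outcome

    # Standardisation is part of the pipeline, so it is re-estimated within every
    # training fold instead of being fitted once on the full dataset (no leakage)
    pipe = Pipeline([("scale", StandardScaler()),
                     ("model", LogisticRegression(max_iter=1000))])

    # Inner loop: tuning parameter selection; outer loop: performance estimation
    inner = GridSearchCV(pipe, {"model__C": [0.01, 0.1, 1, 10]}, cv=5, scoring="roc_auc")
    outer_auc = cross_val_score(inner, X, y, cv=5, scoring="roc_auc")

    print("Nested cross-validated c-statistic:", round(outer_auc.mean(), 3))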

Decisions related to applicability

We rated a study as having high concern regarding applicability if:

  • Participants domain: Participants with the outcome at baseline were included. This review is interested in prognostic models rather than diagnostic ones.

  • Outcomes domain: Outcomes did not have a clear clinical interpretation. This review included studies with clinical outcomes, which are relevant to people with MS and measure their symptoms, functioning and health status. For example, we considered the well‐known composite outcome no evidence of disease activity to be a simple interpretable combination of relapse, disease progression, and magnetic resonance imaging (MRI) activity. On the other hand, outcomes based on complex weighting of many clinical and paraclinical measures were considered to be difficult to interpret as it is not clear what a specific value of such an outcome means for people with MS.

  • Predictors domain: Only one type of predictor was considered. To be useful, clinical prediction models should use simple and cost‐effective predictors and add more complex predictors when they offer information above and beyond that offered by simple, available predictors such as demographics and disease characteristics (Steyerberg 2019). Studies using only MRI images, for example, are rated as at high concern for applicability. This review is interested in multivariable prediction models. While such studies may technically be multivariable, they ignore the prognostic value of other, possibly easier to collect, predictors.

  • Overall: The main objective according to the study report was not development or validation of a model for predicting future clinical outcomes in individuals with MS. The distinction between multivariable models used for prognostication and those used for other purposes can be unclear even when considering the full text, which makes it difficult to exclude all models developed for other purposes. When prognostication is not the main aim, the methods may not be optimal for this purpose.

Additionally, we rated study applicability as unclear if:

  • Overall: The study did not include sufficient details on a final model to allow for validation by unrelated researchers. Model coefficients, nomograms, scores, score charts, and web‐based tools and calculators were considered sufficient, whereas a list of important predictors was considered insufficient. Studies not reporting a final model are likely to be interested in the importance of methods or predictors, not in prognostication of outcomes in individuals. Some studies mentioned the word 'pipeline', but we did not consider a pipeline to be a complete model or tool directly usable by clinicians and people with MS.

  • Participants: No eligibility criteria other than a diagnosis of MS were reported for the study population. Having no eligibility criteria other than MS diagnosis seemed unreasonably broad and suggested that the actual criteria were underreported.

Appendix 5. Data tables

2. Study characteristics.

Analysis Outcome Study type Data source Recruitment from
Agosta 2006 Disability Development Cohort (primary use) Italy (single site)
Bejarano 2011 Dev Disability Development + validation Cohort (primary use) Spain (single site)
Bejarano 2011 Val Disability Development + validation (location): model refit Cohort (primary use) Italy (single site)
Bergamaschi 2015 BREMSO MSSS Val Disability Validation (location, time): predictors dropped and different outcome Registry (secondary use) Italy, Canada, Australia, Spain, Netherlands, Argentina, Iran, Kuwait, Turkey, Denmark, Czech Republic, Portugal, France, Belgium, UK, Germany, Cuba, Israel, Hungary, USA, India, Mexico, Malta, Macedonia, Romania, Brazil
De Brouwer 2021 Disability Development Registry (secondary use) ND (multisite)
de Groot 2009 Dexterity Disability Development Cohort (primary use) Netherlands (multisite)
de Groot 2009 Walking Disability Development Cohort (primary use) Netherlands (multisite)
Kuceyeski 2018 Disability Development Mixed: cohort, registry, routine care (secondary use) ND (ND site)
Law 2019 Ada Disability Development Randomised trial participants (secondary use) Canada, United Kingdom, Netherlands, Sweden, Denmark, Finland, Germany, Estonia, Latvia, Spain
Law 2019 DT Disability Development Randomised trial participants (secondary use) Canada, United Kingdom, Netherlands, Sweden, Denmark, Finland, Germany, Estonia, Latvia, Spain
Law 2019 RF Disability Development Randomised trial participants (secondary use) Canada, United Kingdom, Netherlands, Sweden, Denmark, Finland, Germany, Estonia, Latvia, Spain
Lejeune 2021 Dev Disability Development + external validation Randomised trial participants (secondary use) France (multisite)
Lejeune 2021 Ext Val Disability Development + external validation (location, spectrum) Routine care (secondary use) France (single site)
Malpas 2020 Dev Disability Development + external validation Registry (secondary use) ND (multisite)
Malpas 2020 Ext Val Disability Development + external validation (location) Registry (secondary use) Sweden (multisite)
Mandrioli 2008 Dev Disability Development + external validation Cohort (secondary use) Italy (single site)
Mandrioli 2008 Ext Val Disability Development + external validation (time) Cohort (secondary use) Italy (single site)
Margaritella 2012 Disability Development Routine care (secondary use) Italy (single site)
Montolio 2021 Disability Development Routine care (secondary use) Spain (single site)
Oprea 2020 Disability Development Routine care (secondary use) Romania (single site)
Pinto 2020 severity 10 years Disability Development Routine care (secondary use) Portugal (single site)
Pinto 2020 severity 6 years Disability Development Routine care (secondary use) Portugal (single site)
Roca 2020 Disability Development Registry (secondary use) France (multisite)
Rocca 2017 Disability Development Cohort (primary use) Italy (multisite)
Rovaris 2006 Disability Development Cohort (primary use) Italy (multisite)
Sombekke 2010 Disability Development Unclear (secondary use) Netherlands (single site)
Szilasiova 2020 Disability Development Cohort (secondary use) Slovak Republic (single site)
Tommasin 2021 Disability Development Unclear (secondary use) Italy (multisite)
Tousignant 2019 Disability Development Randomised trial participants (secondary use) ND (multisite)
Weinshenker 1991 M3 Dev Disability Development Cohort (primary use) Canada (single site)
Weinshenker 1996 M3 Ext Val Disability External validation (location) Routine care (secondary use) Canada (single site)
Weinshenker 1996 short‐term Disability Development Routine care (secondary use) Canada (single site)
Yperman 2020 Disability Development Routine care (secondary use) Belgium (single site)
Zhao 2020 LGBM All Disability Development + validation Cohort (primary use) USA (single site)
Zhao 2020 LGBM Common Disability Development + validation Cohort (primary use) USA (single site)
Zhao 2020 LGBM Common Val Disability Development + validation (location): unclear if model refit Cohort (primary use) USA (single site)
Zhao 2020 XGB All Disability Development + validation Cohort (primary use) USA (single site)
Zhao 2020 XGB Common Disability Development + validation Cohort (primary use) USA (single site)
Zhao 2020 XGB Common Val Disability Development + validation (location): unclear if model refit Cohort (primary use) USA (single site)
Gurevich 2009 FLP Dev Relapse Development + external validation Unclear (unclear use) Israel (single site)
Gurevich 2009 FLP Ext Val Relapse Development + external validation Unclear (unclear use) Israel (single site)
Gurevich 2009 FTP Relapse Development Unclear (unclear use) Israel (single site)
Sormani 2007 Dev Relapse Development + external validation Randomised trial participants (secondary use) Argentina, Australia, Austria, Belgium, Canada, Denmark, France, Germany, Hungary, Israel, Italy, Netherlands, New Zealand, Spain, Sweden, Switzerland, UK, USA
Sormani 2007 Ext Val Relapse Development + external validation (spectrum) Randomised trial participants (secondary use) Europe (undefined), Canada
Vukusic 2004 Relapse Development Cohort (primary use) Unclear (total PRIMS cohort): France, Austria, Belgium, Netherlands, Italy, Denmark, Spain, Germany, United Kingdom, Portugal, Switzerland, Ireland
Ye 2020 gene signature Relapse Development Unclear (secondary use) Israel (single site)
Ye 2020 nomogram Relapse Development Unclear (secondary use) Israel (single site)
Aghdam 2021 Conversion to definite MS Development Cohort (secondary use) Iran (single site)
Bendfeldt 2019 Linear Placebo Conversion to definite MS Development Randomised trial participants (secondary use) Austria, Belgium, Czech Republic, Denmark, France, Finland, Germany, Hungary, Italy, Netherlands, Norway, Poland, Portugal, Slovenia, Spain, Sweden, Switzerland, United Kingdom, Israel, Canada
Bendfeldt 2019 M7 Placebo Conversion to definite MS Development Randomised trial participants (secondary use) Austria, Belgium, Czech Republic, Denmark, France, Finland, Germany, Hungary, Italy, Netherlands, Norway, Poland, Portugal, Slovenia, Spain, Sweden, Switzerland, United Kingdom, Israel, Canada
Bendfeldt 2019 M9 IFN Conversion to definite MS Development Randomised trial participants (secondary use) Austria, Belgium, Czech Republic, Denmark, France, Finland, Germany, Hungary, Italy, Netherlands, Norway, Poland, Portugal, Slovenia, Spain, Sweden, Switzerland, United Kingdom, Israel, Canada
Borras 2016 Conversion to definite MS Development Cohort (unclear use) Spain (single site)
Gout 2011 Conversion to definite MS Development Registry (secondary use) France (single site)
Martinelli 2017 Conversion to definite MS Development Routine care (unclear use) Italy (single site)
Olesen 2019 candidate Conversion to definite MS Development Cohort (primary use) Denmark (multisite)
Olesen 2019 routine Conversion to definite MS Development Cohort (primary use) Denmark (multisite)
Runia 2014 Conversion to definite MS Development Cohort (primary use) Netherlands (single site)
Spelman 2017 Conversion to definite MS Development Cohort (primary use) ND (multisite)
Wottschel 2015 1 year Conversion to definite MS Development Cohort (secondary use) UK (single site)
Wottschel 2015 3 year Conversion to definite MS Development Cohort (secondary use) UK (single site)
Wottschel 2019 Conversion to definite MS Development Cohort (secondary use) Spain, Denmark, Austria, UK, Italy
Yoo 2019 Conversion to definite MS Development Randomised trial participants (secondary use) Canada, United States
Zakharov 2013 Conversion to definite MS Development Unclear (unclear use) Russia (single site)
Zhang 2019 Conversion to definite MS Development Cohort (primary use) Germany (single site)
Bergamaschi 2001 BREMS Dev Conversion to progressive MS Development Mixed: registry, routine care (secondary use) Italy (single site)
Bergamaschi 2007 BREMS Ext Val Conversion to progressive MS External validation (location) Cohort (secondary use) Italy (multisite)
Bergamaschi 2015 BREMS Ext Val Conversion to progressive MS External validation (location, time) Registry (secondary use) Italy, Canada, Australia, Spain, Netherlands, Argentina, Iran, Kuwait, Turkey, Denmark, Czech Republic, Portugal, France, Belgium, UK, Germany, Cuba, Israel, Hungary, USA, India, Mexico, Malta, Macedonia, Romania, Brazil
Bergamaschi 2015 BREMSO SP Val Conversion to progressive MS Validation (location, time): predictors dropped Registry (secondary use) Italy, Canada, Australia, Spain, Netherlands, Argentina, Iran, Kuwait, Turkey, Denmark, Czech Republic, Portugal, France, Belgium, UK, Germany, Cuba, Israel, Hungary, USA, India, Mexico, Malta, Macedonia, Romania, Brazil
Brichetto 2020 Conversion to progressive MS Development Cohort (primary use) Italy (multisite)
Calabrese 2013 Dev Conversion to progressive MS Development + external validation Cohort (primary use) Italy (single site)
Calabrese 2013 Ext Val Conversion to progressive MS Development + external validation (time) Cohort (primary use) Italy (single site)
Manouchehrinia 2019 Dev Conversion to progressive MS Development + external validation Registry (secondary use) Sweden (multisite)
Manouchehrinia 2019 Ext Val 1 Conversion to progressive MS Development + external validation (location, time, spectrum) Cohort (secondary use) Canada (multisite)
Manouchehrinia 2019 Ext Val 2 Conversion to progressive MS Development + external validation (location, time, spectrum) Randomised trial participants (secondary use) Canada, Denmark, France, Germany, Italy, Poland, Portugal, Spain, Switzerland, United Kingdom
Manouchehrinia 2019 Ext Val 3 Conversion to progressive MS Development + external validation (location, time, spectrum) Randomised trial participants (secondary use) ND (multisite)
Misicka 2020 10 years Conversion to progressive MS Development Registry (secondary use) USA (multisite)
Misicka 2020 20 years Conversion to progressive MS Development Registry (secondary use) USA (multisite)
Misicka 2020 ever Conversion to progressive MS Development Registry (secondary use) USA (multisite)
Pinto 2020 SP Conversion to progressive MS Development Routine care (secondary use) Portugal (single site)
Pisani 2021 Conversion to progressive MS Development Cohort (secondary use) Italy (single site)
Seccia 2020 180 days Conversion to progressive MS Development Routine care (secondary use) Italy (single site)
Seccia 2020 360 days Conversion to progressive MS Development Routine care (secondary use) Italy (single site)
Seccia 2020 720 days Conversion to progressive MS Development Routine care (secondary use) Italy (single site)
Skoog 2014 Dev Conversion to progressive MS Development Cohort (primary use) Sweden (single site)
Skoog 2019 Ext Val Conversion to progressive MS External validation (location, time) Registry (secondary use) Sweden (single site)
Skoog 2019 Val Conversion to progressive MS Validation Cohort (primary use) Sweden (single site)
Tacchella 2018 180 days Conversion to progressive MS Development Routine care (secondary use) Italy (single site)
Tacchella 2018 360 days Conversion to progressive MS Development Routine care (secondary use) Italy (single site)
Tacchella 2018 720 days Conversion to progressive MS Development Routine care (secondary use) Italy (single site)
Vasconcelos 2020 Dev Conversion to progressive MS Development + external validation Unclear (unclear use) Brazil (single site)
Vasconcelos 2020 Ext Val Conversion to progressive MS Development + external validation (time) Unclear (unclear use) Brazil (single site)
Ahuja 2021 Dev Composite Development + external validation Mixed: routine care (electronic health records), cohort (secondary use) United States (single site)
Ahuja 2021 Ext Val Composite Development + external validation (spectrum) Routine care: electronic health records (secondary use) United States (multisite)
de Groot 2009 cognitive Composite Development Cohort (primary use) Netherlands (multisite)
Kosa 2022 Composite Development Mixed: case‐control, cohort (primary use) USA (ND site)
Pellegrini 2019 Composite Development Randomised trial participants (secondary use) Australia, Austria, Belarus, Belgium, Bosnia and Herzegovina, Bulgaria, Canada, Chile, Colombia, Costa Rica, Croatia, Czech Republic, Estonia, France, Georgia, Germany, Greece, Guatemala, India, Ireland, Israel, Latvia, Mexico, Macedonia, Netherlands, Moldova, New Zealand, Peru, Poland, Romania, Russian Federation, Puerto Rico, Serbia, Slovakia, South Africa, Switzerland, Spain, Ukraine, United Kingdom, United States, Virgin Islands (USA)

Ada: adaptive boosting
BREMSO: Bayesian Risk Estimate for Multiple Sclerosis at Onset
Dev: development
DT: decision tree
Ext: external
FLP: first level predictor
FTP: fine tuning predictor
IFN: interferon
LGBM: light gradient boosting machine
MS: multiple sclerosis
MSSS: multiple sclerosis severity score
ND: no data available
PRIMS: pregnancy in multiple sclerosis
RF: random forest
SP: secondary progressive
Val: validation
XGB: extreme gradient boosting

3. Participant characteristics.

Analysis Outcome Females Age (years) Diagnosis (criteria) Disease duration (years) Treated Clinical description
Agosta 2006 Disability 70% Mean: 33.5 27.4% CIS, 46.6% RRMS, 26.0% SPMS (Lublin 1996; Poser 1983) Range: 0 to 25 Recruitment: 0%, follow‐up: 55% EDSS median (range): CIS 0.0 (0.0 to 1.5), RRMS 2.5 (1.0 to 5.5), SPMS 5.5 (3.5 to 6.5)
Bejarano 2011 Dev Disability 65% Mean: 35.1 31.4% CIS, 51.0% RRMS, 5.9% SPMS, 7.8% PPMS, 3.9% PRMS (McDonald 2005) Mean: 5.9, SD: 7.4 Recruitment: 55%, follow‐up: ND/unclear EDSS median (range): 2.0 (0 to 6), number of relapses in previous 2 years mean (SD): 1.29 (1.51)
Bejarano 2011 Val Disability 67% Mean: 37 88.5% RRMS, 11.5% SPMS (McDonald 2005) Mean: 9, SD: 6 Recruitment: ND/unclear, follow‐up: ND/unclear EDSS median (range): 1.5 (0 to 6.5)
Bergamaschi 2015 BREMSO MSSS Val Disability 71% Mean: 31.1 100% RRMS (McDonald 2001) ND/unclear Recruitment: ND/unclear, follow‐up: 72% ND
De Brouwer 2021 Disability 71% Mean (at onset): 32.2 85.6% RRMS, 4.9% SPMS, 3.3% PPMS, 1.4% PRMS, 4.8% unknown (Lublin 1996) Mean: 6.88, range: 3 to 25 Recruitment: ND/unclear, follow‐up: ND/unclear Prior 3‐year EDSS per patient mean (SD, range): 2.38 (1.48, 0 to 8.5)
de Groot 2009 dexterity Disability 64% Mean: 37.4 82% relapse onset, 18% non‐relapse onset (Poser 1983) Max: 0.5 Recruitment: 6%, follow‐up: 30% EDSS median (IQR): 2.5 (2.0 to 3.0)
de Groot 2009 walking Disability 64% Mean: 37.4 82% relapse onset, 18% non‐relapse onset (Poser 1983) Max: 0.5 Recruitment: 6%, follow‐up: 30% EDSS median (IQR): 2.5 (2.0 to 3.0)
Kuceyeski 2018 Disability 73% Mean (unclear when): 36.8 100% RRMS (McDonald 2010; McDonald 2017) Mean: 1.5, SD: 1.3 Recruitment: 95%, follow‐up: ND/unclear EDSS mean (SD): 1.1 (1.1)
Law 2019 Ada Disability 64% Mean: 50.9 100% SPMS (own definition) Mean: 9.3, SD: 5 Recruitment: ND/unclear, follow‐up: 50% EDSS median (IQR): 6.0 (4.5 to 6.5)
Law 2019 DT Disability 64% Mean: 50.9 100% SPMS (own definition) Mean: 9.3, SD: 5 Recruitment: ND/unclear, follow‐up: 50% EDSS median (IQR): 6.0 (4.5 to 6.5)
Law 2019 RF Disability 64% Mean: 50.9 100% SPMS (own definition) Mean: 9.3, SD: 5 Recruitment: ND/unclear, follow‐up: 50% EDSS median (IQR): 6.0 (4.5 to 6.5)
Lejeune 2021 Dev Disability 76% Mean (unclear when): 35.3 100% RRMS (McDonald 2005) Mean: 7.32, SD: 5.5 Recruitment: 55%, follow‐up: unclear, 32.8% therapeutic escalation, 59.1% no DMT change EDSS mean (SD): 3.45 (0.96)
Lejeune 2021 Ext Val Disability 77% Mean (unclear when): 36.2 100% RRMS (McDonald 2005) Mean: 7.62, SD: 6.56 Recruitment: 59%, follow‐up: unclear, 48% therapeutic escalation, 49.1% no DMT change EDSS Mean (SD): 2.93 (1.00)
Malpas 2020 Dev Disability 71% Mean (at onset): 31.7 100% RRMS (McDonald 2010) Mean: 0.33, SD: 0.3 Recruitment: unclear number of participants, mean percentage of time on treatment 1st year, first‐line 17.1%, second‐line 0.50%, follow‐up: unclear number of participants, mean percentage of time on treatment 10th year, 46% first‐line, 5.3% second‐line First year EDSS mean (SD): 1.78 (1.26), number of relapses mean (SD): 0.74 (0.93)
Malpas 2020 Ext Val Disability ND Mean (at onset): 33.4 100% RRMS (McDonald 2010) ND/unclear Recruitment: ND/unclear, follow‐up: ND/unclear First year EDSS mean (SD): 1.51 (1.28)
Mandrioli 2008 Dev Disability 61% Mean (at onset): 27.6 100% RRMS (ND) ND/unclear Recruitment: ND/unclear, follow‐up: 70% EDSS at diagnosis mean (SD): BMS 1.76 (0.24), SMS 2.17 (0.18)
Mandrioli 2008 Ext Val Disability 62% Mean (at onset): 33 100% RRMS (ND) ND/unclear Recruitment: ND/unclear, follow‐up: 68% EDSS at diagnosis mean (SD): BMS 1.65 (0.10), SMS 2.45 (0.23)
Margaritella 2012 Disability 79% Mean (at onset): 28.6 89.7% RRMS, 3.4% PPMS, 6.9% Benign MS (McDonald 2001; McDonald 2005) Mean: 10.1, SD: 7.3 Recruitment: ND/unclear, follow‐up: ND/unclear EDSS mean (SD): 2.1 (1.5)
Montolio 2021 Disability 67% Mean: 42.4 92.7% RRMS, 6.1% SPMS, 1.2% PPMS (McDonald 2001) Mean: 10.1, pooled SD: 7.74 Recruitment: ND/unclear, follow‐up: 70% EDSS mean: 2.6 (SD between 1.27 to 2.02)
Oprea 2020 Disability 62% Mean: 40.3 Unclear: RRMS, PPMS (ND) Mean: 10.2 Recruitment: ND/unclear, follow‐up: 100% ND
Pinto 2020 severity 10 years Disability 78% Mean (at onset): 32.3 100% RRMS (McDonald (undefined)) ND/unclear Recruitment: ND/unclear, follow‐up: ND/unclear ND
Pinto 2020 severity 6 years Disability 70% Mean (at onset): 30.3 100% RRMS (McDonald (undefined)) ND/unclear Recruitment: ND/unclear, follow‐up: ND/unclear ND
Roca 2020 Disability ND ND/unclear ND (ND) ND/unclear Recruitment: ND/unclear, follow‐up: ND/unclear ND
Rocca 2017 Disability 50% Mean: 51.3 100% PPMS (Thompson 2000) Median: 10, range: 2 to 26 Recruitment: 33%, follow‐up: 18% EDSS median (IQR): 6.0 (4.5 to 6.5)
Rovaris 2006 Disability 50% Mean: 51.3 100% PPMS (Thompson 2000) Median: 10, range: 2 to 26 Recruitment: 33%, follow‐up: 18% EDSS median (range): 5.5 (2.5 to 7.5)
Sombekke 2010 Disability 64% Mean (at onset): 32.4 51.2% RRMS, 31.4% SPMS, 17.4% PPMS (Poser 1983; McDonald 2006) Mean: 13.1, SD: 8.3 Recruitment: ND/unclear, follow‐up: ND/unclear EDSS median (IQR): 4.0 (3.5)
Szilasiova 2020 Disability 65% ND/unclear 63.5% RRMS, 29.4% SPMS, 7.1% PPMS (McDonald 2001) Mean: 6.7, range: 0.5 to 30 Recruitment: ND/unclear, follow‐up: 100% EDSS mean (SD, range): 3.03 (1.5, 1.0 to 7.0)
Tommasin 2021 Disability 64% Mean: 39.7 74.8% RRMS, 25.2% PMS (McDonald 2010; McDonald 2017) Mean: 9.9, SD: 8.06 Recruitment: ND/unclear, follow‐up: 72% EDSS median (range): 3.0 (0.0 to 7.5)
Tousignant 2019 Disability ND ND/unclear 100% RRMS (ND) ND/unclear Recruitment: 0%, follow‐up: 0% ND
Weinshenker 1991 M3 Dev Disability 66% Mean (at onset): 30.5 65.8% RRMS, 14.8% relapsing progressive, 18.7% chronically progressive, 0.9% unknown/83.3% diagnosis probable, 16.4% diagnosis possible (Poser 1983) Mean: 11.9, SE: 0.3 Recruitment: 0%, follow‐up: 0% ND
Weinshenker 1996 M3 Ext Val Disability 69% Mean: 44.1 84.3% RRMS, 2.0% relapsing progressive, 13.7% chronically progressive (ND) Mean: 12 Recruitment: ND/unclear, follow‐up: ND/unclear Unclear
Weinshenker 1996 short‐term Disability 69% Mean: 44.1 84.3% RRMS, 2.0% relapsing progressive, 13.7% chronically progressive (ND) Mean: 12 Recruitment: ND/unclear, follow‐up: ND/unclear Unclear
Yperman 2020 Disability 72% Mean: 45 CIS 1.7%, RRMS 53.2%, SPMS 10.7%, PPMS 2.9%, unknown 32.9% (unrecorded in the dataset) ND/unclear Recruitment: 74%, follow‐up: 79% EDSS mean (SD): 3.0 (1.8)
Zhao 2020 LGBM All Disability 76% Mean (unclear when): 39 Unclear: 16.4% CIS, 65.6% RRMS, 2.6% SPMS, 4.3% PPMS (ND) Median: 2, range: 0 to 44 Recruitment: unclear, for source cohort, 92.6% DMT first line, 0.8% DMT oral, 3.9% DMT high, 0.4% experimental, 2.3% MS other, follow‐up: unclear, for source cohort, treatment at last visit, 34.6% DMT first line, 39.1% DMT oral, 9.6% DMT high, 3.1% experimental, 0.2% immune, 13.3% MS other, 4.5% never on treatment Unclear, for source cohort, EDSS median (IQR; range): 1.5 (1 to 2; 0 to 7.5)
Zhao 2020 LGBM Common Disability 76% Mean (unclear when): 39 Unclear: 16.4% CIS, 65.6% RRMS, 2.6% SPMS, 4.3% PPMS (ND) Median: 2, range: 0 to 44 Recruitment: unclear, for source cohort, 92.6% DMT first line, 0.8% DMT oral, 3.9% DMT high, 0.4% experimental, 2.3% MS other, follow‐up: unclear, for source cohort, treatment at last visit, 34.6% DMT first line, 39.1% DMT oral, 9.6% DMT high, 3.1% experimental, 0.2% immune, 13.3% MS other, 4.5% never on treatment Unclear, for source cohort, EDSS median (IQR; range): 1.5 (1 to 2; 0 to 7.5)
Zhao 2020 LGBM Common Val Disability 69% Mean (unclear when): 42.5 Unclear: 15.9% CIS, 70.8% RRMS, 9.3% SPMS, 3.9% PPMS (ND) Median: 6, range: 0 to 45 Recruitment: unclear, for source cohort, 93.4% DMT first line, 0.7% DMT oral, 1.6% DMT high, 0.7% experimental, 0.5% immune, 1.6% steroid, 1.6% MS other, follow‐up: unclear, for source cohort, treatment at last visit, 57.4% DMT first line, 19.8% DMT oral, 9.6% DMT high, 3% experimental, 0.5% immune, 5% steroid, 4.8% MS other, 15.1% never on treatment Unclear, for source cohort, EDSS median (IQR; range): 1.5 (1 to 3; 0 to 7)
Zhao 2020 XGB All Disability 76% Mean (unclear when): 39 Unclear: 16.4% CIS, 65.6% RRMS, 2.6% SPMS, 4.3% PPMS (ND) Median: 2, range: 0 to 44 Recruitment: unclear, for source cohort, 92.6% DMT first line, 0.8% DMT oral, 3.9% DMT high, 0.4% experimental, 2.3% MS other, follow‐up: unclear, for source cohort, treatment at last visit, 34.6% DMT first line, 39.1% DMT oral, 9.6% DMT high, 3.1% experimental, 0.2% immune, 13.3% MS other, 4.5% never on treatment Unclear, for source cohort, EDSS median (IQR; range): 1.5 (1 to 2; 0 to 7.5)
Zhao 2020 XGB Common Disability 76% Mean (unclear when): 39 Unclear: 16.4% CIS, 65.6% RRMS, 2.6% SPMS, 4.3% PPMS (ND) Median: 2, range: 0 to 44 Recruitment: unclear, for source cohort, 92.6% DMT first line, 0.8% DMT oral, 3.9% DMT high, 0.4% experimental, 2.3% MS other, follow‐up: unclear, for source cohort, treatment at last visit, 34.6% DMT first line, 39.1% DMT oral, 9.6% DMT high, 3.1% experimental, 0.2% immune, 13.3% MS other, 4.5% never on treatment Unclear, for source cohort, EDSS median (IQR; range): 1.5 (1 to 2; 0 to 7.5)
Zhao 2020 XGB Common Val Disability 69% Mean (unclear when): 42.5 Unclear: 15.9% CIS, 70.8% RRMS, 9.3% SPMS, 3.9% PPMS (ND) Median: 6, range: 0 to 45 Recruitment: unclear, for source cohort, 93.4% DMT first line, 0.7% DMT oral, 1.6% DMT high, 0.7% experimental, 0.5% immune, 1.6% steroid, 1.6% MS other, follow‐up: unclear, for source cohort, treatment at last visit, 57.4% DMT first line, 19.8% DMT oral, 9.6% DMT high, 3% experimental, 0.5% immune, 5% steroid, 4.8% MS other, 15.1% never on treatment Unclear, for source cohort, EDSS median (IQR; range): 1.5 (1 to 3; 0 to 7)
Gurevich 2009 FLP Dev Relapse 64% Mean (unclear when): 36.3 34.0% CIS, 66.0% CDMS (McDonald 2001) Mean: 5.67, pooled SD: 0.89 Recruitment: 0%, follow‐up: 35% EDSS (unclear if mean and SD): CIS 0.9 (0.2) CDMS 2.4 (0.2), annualised relapse rate (unclear if mean and SD): CDMS 0.92 (0.1)
Gurevich 2009 FLP Ext Val Relapse ND ND/unclear Unclear: CIS 60%, CDMS 40% (McDonald 2001) ND/unclear Recruitment: 0%, follow‐up: unclear, 9 on IMD Unclear, published inconsistencies, EDSS (unclear if mean and SD): CIS 2.58 (0.15) CDMS 5.3 (2.39), annualised relapse rate (unclear if mean and SD): CIS 6.1 (2.05) CDMS 1 (0.51)
Gurevich 2009 FTP Relapse 64% Mean (unclear when): 36.3 34.0% CIS, 66.0% CDMS (McDonald 2001) Mean: 5.67, pooled SD: 0.89 Recruitment: 0%, follow‐up: 35% EDSS (unclear if mean and SD): CIS 0.9 (0.2) CDMS 2.4 (0.2), annualised relapse rate (unclear if mean and SD): CDMS 0.92 (0.1)
Sormani 2007 Dev Relapse ND Median: 37 100% RRMS (Poser 1983) Median: 5.9, range: 0.6 to 30 Recruitment: 0%, follow‐up: 1% EDSS median (range): 2.0 (0.0 to 5.0), prior 2‐year number of relapses (range): 2 (1 to 11)
Sormani 2007 Ext Val Relapse ND Median: 34 100% RRMS (Poser 1983) Median: 3.8, range: 0.5 to 22 Recruitment: 0%, follow‐up: 0% EDSS median (range): 2.0 (0.0 to 4.0), prior 2‐year number of relapses (range): 2 (1 to 8)
Vukusic 2004 Relapse 100% Mean: 30 96% RRMS, 4% SPMS (Poser 1983) Mean: 6, SD: 4 Recruitment: 0%, follow‐up: 2% DSS at beginning of pregnancy mean (SD): 1.3 (1.4), annualised relapse rate during year before pregnancy (95% CI): 0.7 (0.6 to 0.8)
Ye 2020 gene signature Relapse 64% Mean (unclear when): 36.3 34.0% CIS, 66.0% CDMS (McDonald 2001) Mean: 5.67, pooled SD: 0.89 Recruitment: ND/unclear, follow‐up: 31% EDSS (unclear if mean and SD): CIS 0.9 (0.2) CDMS 2.4 (0.2), annualised relapse rate (unclear if mean and SD): CDMS 0.92 (0.1)
Ye 2020 nomogram Relapse 64% Mean (unclear when): 36.3 34.0% CIS, 66.0% CDMS (McDonald 2001) Mean: 5.67, pooled SD: 0.89 Recruitment: ND/unclear, follow‐up: 31% EDSS (unclear if mean and SD): CIS 0.9 (0.2) CDMS 2.4 (0.2), annualised relapse rate (unclear if mean and SD): CDMS 0.92 (0.1)
Aghdam 2021 Conversion to definite MS 74% Mean (unclear when): 40 100% CIS (McDonald 2010) ND/unclear Recruitment: 0%, follow‐up: ND/unclear History of ON: 11.9%
Bendfeldt 2019 linear placebo Conversion to definite MS 71% Mean: 30.8 100% CIS (own definition) Max: 0.16 Recruitment: 0%, follow‐up: 0% EDSS median (range): conv‐ 1.5 (1.0 to 2.0), conv+ 1.5 (1.0 to 2.0)
Bendfeldt 2019 M7 placebo Conversion to definite MS 70% Mean: 29.7 100% CIS (own definition) Max: 0.16 Recruitment: 0%, follow‐up: 0% EDSS median (range): conv‐ 1.0 (0.0 to 2.0), conv+ 1.5 (1.0 to 2.0)
Bendfeldt 2019 M9 IFN Conversion to definite MS 66% Mean: 29.6 100% CIS (own definition) Max: 0.16 Recruitment: 0%, follow‐up: 100% EDSS median (range): conv‐ 2.0 (1.0 to 2.0), conv+ 2.0 (1.0 to 2.5)
Borras 2016 Conversion to definite MS 66% Median (unclear when): 35.5 100% CIS (ND) Median: 0.22, range: 0.01 to 0.35 Recruitment: ND/unclear, follow‐up: 8% EDSS median (range): 1.5 (0 to 5)
Gout 2011 Conversion to definite MS 70% Median: 31 100% CIS (ND) ND/unclear Recruitment: 0%, follow‐up: 0% EDSS median (range): 2 (0 to 6)
Martinelli 2017 Conversion to definite MS 68% Mean: 32 100% CIS (ND) Max: 0.25 Recruitment: ND/unclear, follow‐up: 40% ND
Olesen 2019 candidate Conversion to definite MS 68% Median: 36 100% CIS (Optic Neuritis Study Group criteria 1991) ND/unclear Recruitment: 0%, follow‐up: ND/unclear ND
Olesen 2019 routine Conversion to definite MS 68% Median: 36 100% CIS (Optic Neuritis Study Group criteria 1991) ND/unclear Recruitment: 0%, follow‐up: ND/unclear ND
Runia 2014 Conversion to definite MS 73% ND/unclear 100% CIS (own definition) Max: 0.5 Recruitment: ND/unclear, follow‐up: ND/unclear ND
Spelman 2017 Conversion to definite MS 71% Median (at MS onset): 31.6 100% CIS (Poser 1983) Max: 1 Recruitment: ND/unclear, follow‐up: 28% EDSS median (IQR): 2 (1 to 2.5)
Wottschel 2015 1 year Conversion to definite MS 66% Mean: 33.1 100% CIS (ND) Mean: 0.12, SD: 0.07 Recruitment: 0%, follow‐up: 0% EDSS median (range): 1 (0 to 8)
Wottschel 2015 3 years Conversion to definite MS 67% Mean: 33.2 100% CIS (ND) Mean: 0.12, SD: 0.07 Recruitment: 0%, follow‐up: 0% EDSS median (range): 1 (0 to 8)
Wottschel 2019 Conversion to definite MS 66% Mean (at onset): 32.7 100% CIS (ND) Max: 0.27 Recruitment: ND/unclear, follow‐up: ND/unclear EDSS median (range): 2 (0 to 8)
Yoo 2019 Conversion to definite MS 69% Mean (at onset): 35.9 100% CIS (ND) Median: 0.23, range: 0.06 to 0.52 Recruitment: 0%, follow‐up: 50% EDSS median (range): 1.5 (0 to 4.5)
Zakharov 2013 Conversion to definite MS 70% Mean: 25.1 100% CIS (ND) ND/unclear Recruitment: ND/unclear, follow‐up: ND/unclear EDSS ≤ 2
Zhang 2019 Conversion to definite MS 70% Mean (unclear when): 42.4 100% CIS (McDonald 2010) ND/unclear Recruitment: 1%, follow‐up: ND/unclear EDSS median: 1
Bergamaschi 2001 BREMS Dev Conversion to progressive MS 63% Mean: 28.5 100% RRMS (Poser 1983; Lublin 1996) ND/unclear Recruitment: 10%, follow‐up: 10% ND
Bergamaschi 2007 BREMS Ext Val Conversion to progressive MS 69% Median: 24.8 100% RRMS (Poser 1983) ND/unclear Recruitment: 3%, follow‐up: 57% ND
Bergamaschi 2015 BREMS Ext Val Conversion to progressive MS 71% Mean: 31.1 100% RRMS (McDonald 2001) ND/unclear Recruitment: ND/unclear, follow‐up: 72% ND
Bergamaschi 2015 BREMSO SP Val Conversion to progressive MS 71% Mean: 31.1 100% RRMS (McDonald 2001) ND/unclear Recruitment: ND/unclear, follow‐up: 72% ND
Brichetto 2020 Conversion to progressive MS ND ND/unclear Unclear: unclear, RRMS, SPMS (ND) ND/unclear Recruitment: ND/unclear, follow‐up: ND/unclear ND
Calabrese 2013 Dev Conversion to progressive MS 67% Mean: 35.3 100% RRMS (McDonald 2001) Mean: 11.3, range: 5 to 23 Recruitment: 100%, follow‐up: 100% EDSS median (range): 2.5 (0 to 4.5)
Calabrese 2013 Ext Val Conversion to progressive MS 60% Mean: 34.5 100% RRMS (McDonald 2001) Mean: 10.5, range: 10 to 21 Recruitment: 100%, follow‐up: 100% ND
Manouchehrinia 2019 Dev Conversion to progressive MS 72% Mean (at onset): 31.5 100% RRMS (McDonald (undefined)) ND/unclear Recruitment: unclear, a minority, follow‐up: unclear number of participants, median duration of exposures first‐line 3, second‐line 0.8 First recorded EDSS median (IQR): 2 (1 to 3)
Manouchehrinia 2019 Ext Val 1 Conversion to progressive MS 74% Mean (at onset): 31.1 100% RRMS (Poser 1983) ND/unclear Recruitment: ND/unclear, follow‐up: unclear number of participants, median 0 First recorded EDSS median (IQR): 2 (1 to 3)
Manouchehrinia 2019 Ext Val 2 Conversion to progressive MS 67% Mean (at onset): 29.5 100% RRMS (McDonald 2001) ND/unclear Recruitment: 0%, follow‐up: 100% First recorded EDSS median (IQR): 2 (1.5 to 3)
Manouchehrinia 2019 Ext Val 3 Conversion to progressive MS 74% Mean (at onset): 29.9 100% RRMS (McDonald 2005) ND/unclear Recruitment: 0%, follow‐up: 100% First recorded EDSS median (IQR): 2 (1.5 to 3.5)
Misicka 2020 10 years Conversion to progressive MS 78% Median (at MS onset): 32 100% RRMS (McDonald 2005; McDonald 2010) Median: 11, IQR: 5 to 19 Recruitment: 0%, follow‐up: ND/unclear Time to second relapse: 0 years to 1 year 48.1%, 2 years to 5 years 31.8%, ≥ 6 years 20.1%
Misicka 2020 20 years Conversion to progressive MS 78% Median (at MS onset): 32 100% RRMS (McDonald 2005; McDonald 2010) Median: 11, IQR: 5 to 19 Recruitment: 0%, follow‐up: ND/unclear Time to second relapse: 0 years to 1 year 48.1%, 2 years to 5 years 31.8%, ≥ 6 years 20.1%
Misicka 2020 ever Conversion to progressive MS 78% Median (at onset): 32 100% RRMS (McDonald 2005; McDonald 2010) Median: 11, IQR: 5 to 19 Recruitment: 0%, follow‐up: ND/unclear Time to second relapse: 0 years to 1 year 48.1%, 2 years to 5 years 31.8%, ≥ 6 years 20.1%
Pinto 2020 SP Conversion to progressive MS 73% Mean (at onset): 31.1 100% RRMS (McDonald (undefined)) ND/unclear Recruitment: ND/unclear, follow‐up: ND/unclear ND
Pisani 2021 Conversion to progressive MS 58% Mean: 33.5 100% RRMS (McDonald 2005) ND/unclear Recruitment: 100%, follow‐up: 100% EDSS median (range): 1.5 (0 to 3.5)
Seccia 2020 180 days Conversion to progressive MS 70% Mean (at onset): 29 100% RRMS (latest criteria at time of diagnosis) Mean: 19 Recruitment: ND/unclear, follow‐up: 73% ND
Seccia 2020 360 days Conversion to progressive MS 70% Mean (at onset): 29 100% RRMS (latest criteria at time of diagnosis) Mean: 19 Recruitment: ND/unclear, follow‐up: 73% ND
Seccia 2020 720 days Conversion to progressive MS 70% Mean (at onset): 29 100% RRMS (latest criteria at time of diagnosis) Mean: 19 Recruitment: ND/unclear, follow‐up: 73% ND
Skoog 2014 Dev Conversion to progressive MS 65% Mean: 33.5 100% RRMS (Poser 1983) Median: 2 Recruitment: 0%, follow‐up: 0% ND
Skoog 2019 Ext Val Conversion to progressive MS 76% Mean (at CDMS onset (2nd attack)): 33 100% RRMS (Poser 1983) ND/unclear Recruitment: 0%, follow‐up: unclear, few patients received first generation DMT (IFN‐beta or glatiramer acetate), 99 out of 1762 patient years ND
Skoog 2019 Val Conversion to progressive MS 65% Mean (at CDMS onset (2nd attack)): 33 100% RRMS (Poser 1983) Median: 2 Recruitment: 0%, follow‐up: 0% ND
Tacchella 2018 180 days Conversion to progressive MS ND ND/unclear 100% RRMS (McDonald 2017) ND/unclear Recruitment: ND/unclear, follow‐up: 89% ND
Tacchella 2018 360 days Conversion to progressive MS ND ND/unclear 100% RRMS (McDonald 2017) ND/unclear Recruitment: ND/unclear, follow‐up: 89% ND
Tacchella 2018 720 days Conversion to progressive MS ND ND/unclear 100% RRMS (McDonald 2017) ND/unclear Recruitment: ND/unclear, follow‐up: 89% ND
Vasconcelos 2020 Dev Conversion to progressive MS 76% Mean (at onset): 28.7 100% RRMS (Poser 1983; McDonald 2001) Mean: 16, SD: 9.42 Recruitment: ND/unclear, follow‐up: 58% Patients with more than one relapse at first year of disease: 74%
Vasconcelos 2020 Ext Val Conversion to progressive MS 78% Mean (at onset): 28.5 100% RRMS (Poser 1983; McDonald 2001) Mean: 13.22, SD: 9.72 Recruitment: ND/unclear, follow‐up: 77% Patients with more than one relapse at first year of disease: 74%
Ahuja 2021 Dev Composite 74% Median (unclear when): 43.3 Unclear: approximately 70% to 80% RRMS, approximately 10% PPMS, 10% to 20% SPMS (ND) Median: 5.12, IQR: 2.03 Recruitment: ND/unclear, follow‐up: 55% ND
Ahuja 2021 Ext Val Composite 74% Median (unclear when): 43.3 Unclear: approximately 70% to 80% RRMS, approximately 10% PPMS, 10% to 20% SPMS (ND) Median: 4.37, IQR: 2.82 Recruitment: ND/unclear, follow‐up: 55% ND
de Groot 2009 cognitive Composite 64% Mean: 37.4 82% relapse onset, 18% non‐relapse onset (Poser 1983) Max: 0.5 Recruitment: 6%, follow‐up: 30% EDSS median (IQR): 2.5 (2.0 to 3.0)
Kosa 2022 Composite 54% Mean: 49.6 30.8% RRMS, 24.2% SPMS, 44.9% PPMS (McDonald 2010; McDonald 2017) Mean: 12.2, pooled SD: 8.51 Recruitment: 0%, follow‐up: ND/unclear EDSS mean (SD): development set; RRMS 1.8 (1.2), SPMS 5.9 (1.2), PPMS 5.3 (1.6)/validation set RRMS 2.2 (1.6), SPMS 5.5 (1.5), PPMS 5.2 (1.6)
Pellegrini 2019 Composite 71% Mean: 37.1 100% RRMS (McDonald 2001; McDonald 2005) Mean: 7.5, SD: 6.5 Recruitment: 34%, follow‐up: 0% EDSS mean (SD): 2.5 (1.2), number of relapses 1 year prior to study entry mean (SD): 1.4 (0.7)

Ada: adaptive boosting
BREMS: Bayesian Risk Estimate for Multiple Sclerosis
BREMSO: Bayesian Risk Estimate for Multiple Sclerosis at Onset
CDMS: clinically definite multiple sclerosis
CI: confidence interval
CIS: clinically isolated syndrome
conv‐: did not convert to CDMS
conv+: converted to CDMS 
Dev: development
DMT: disease‐modifying treatment
DSS: disability status scale
DT: decision tree
EDSS: Expanded Disability Status Scale
Ext: external
FLP: first level predictor
FTP: fine tuning predictor
IFN: interferon
IMD: immunomodulatory drug
IQR: interquartile range
LGBM: light gradient boosting machine
Max: maximum
Min: minimum
MS: multiple sclerosis
ND: no data available
ON: optic neuritis
PPMS: primary progressive MS
PRMS: progressive‐relapsing MS
RF: random forest
RRMS: relapsing‐remitting MS
SD: standard deviation
SP: secondary progressive
SPMS: secondary progressive MS
Val: validation

4. Number of predictors.

Model Outcome Number considered Number included Timing
Agosta 2006 Disability 26 3 (2 or 3 (unclear if follow‐up duration included)) At study baseline (cohort entry at least 2 years after diagnosis of definite MS or 3 months after CIS), 12 months (± 10 days) after baseline, at final follow‐up (outcome measurement)
Bejarano 2011 Disability 23 (22 or 23 (unclear transformation)) 5 At study baseline (cohort entry)
De Brouwer 2021 Disability 24 + EDSS trajectories 19 predictors + EDSS trajectories At multiple visits, at least 6 in 3‐year period
de Groot 2009 dexterity Disability 5 5 At disease onset (definite MS) (study baseline within 6 months after diagnosis)
de Groot 2009 walking Disability 5 3 At disease onset (definite MS) (study baseline within 6 months after diagnosis)
Kuceyeski 2018 Disability 965 703 (703 predictors transformed to 6 principal components) At baseline image (early RRMS within 5 years of their first neurologic symptom), at follow‐up (unclear: outcome measurement)
Law 2019 Ada Disability 9 9 At study baseline (RCT)
Law 2019 DT Disability 9 9 At study baseline (RCT)
Law 2019 RF Disability 9 9 At study baseline (RCT)
Lejeune 2021 Disability 19 (≤ 19 and ≥ 14 (unclear transformations)) 6 (7 df) At study baseline (RCT, relapse) or retrospectively at screening
Malpas 2020 Disability 17 3 At symptom onset, at visits up to 1 year following symptom onset, and at final follow‐up
Mandrioli 2008 Disability 15 4 At disease onset (RRMS)
Margaritella 2012 Disability 8 (≥ 8 (unclear transformations)) 6 At multiple assessments consecutively for 3 years until 1 year prior to outcome
Montolio 2021 Disability 39 5 (4 of them longitudinal) At 3 visits over 2 years (not defined baseline and annual visits 1 and 2)
Oprea 2020 Disability 6 6 At a single time point during outcome determination
Pinto 2020 severity 10 years Disability 1306 Unclear which predictors make up the final model At multiple visits dependent on which 1‐year to 5‐year model
Pinto 2020 severity 6 years Disability 1306 Unclear which predictors make up the final model At multiple visits dependent on which 1‐year to 5‐year model
Roca 2020 Disability Unstructured data + 65 Unstructured data + 65 At FLAIR imaging (initial in the dataset)
Rocca 2017 Disability 26 5 At study baseline (cohort entry), at median 15 months after baseline, and at median 56 months (called 5 years) after baseline
Rovaris 2006 Disability 25 3 (2 or 3 (unclear if follow‐up time included)) At study baseline (cohort entry), at 15 months post‐baseline (follow‐up), at final follow‐up (outcome measurement)
Sombekke 2010 Disability 72 9 (13 df) At baseline (already available or retrospectively collected)
Szilasiova 2020 Disability 11 6 (7 df) At study baseline (cohort entry)
Tommasin 2021 Disability 16 4 At assessment (not defined), at follow‐up
Tousignant 2019 Disability Unstructured data Unstructured data At imaging
Weinshenker 1991 M3 Disability 13 (≥ 13 (unclear if complete list)) 7 At assessment (not defined), at follow‐up
Weinshenker 1996 short‐term Disability 5 (≥ 5 (unclear if complete list)) 5 At assessment (not defined), at follow‐up (unclear: outcome measurement)
Yperman 2020 Disability 5893 9 (≤ 9 (unclear subset)) At visit of interest
Zhao 2020 LGBM All Disability 198 198 At multiple assessments every 6 months from baseline (undefined) to year 2
Zhao 2020 LGBM Common Disability 105 (≤ 105 (unclear subset)) 105 (≤ 105 (unclear subset)) At multiple assessments every year from baseline (undefined) to year 2
Zhao 2020 XGB All Disability 198 198 At multiple assessments every 6 months from baseline (undefined) to year 2
Zhao 2020 XGB Common Disability 105 (≤ 105 (unclear subset)) 105 (≤ 105 (unclear subset)) At multiple assessments every year from baseline (undefined) to year 2
Gurevich 2009 FLP Relapse 10,602 10 (df unclear) At study baseline (cohort entry)
Gurevich 2009 FTP Relapse 10,602 9 (df unclear) At study baseline (cohort entry)
Sormani 2007 Relapse 12 (≥ 12 (unclear transformations)) 2 At study baseline (RCT, entry at least 1 year after disease onset)
Vukusic 2004 Relapse 11 3 At study baseline (cohort entry during pregnancy week 4 to 36), at examinations at 20, 28, 36 weeks of gestation, and also post‐partum
Ye 2020 gene signature Relapse 202 5 At study baseline (cohort entry)
Ye 2020 nomogram Relapse 206 5 At study baseline (cohort entry)
Aghdam 2021 Conversion to definite MS 10 (≥ 10 (unclear transformations)) 4 At presentation due to ON
Bendfeldt 2019 linear placebo Conversion to definite MS Number of voxels in the cortical GM mask NA At disease onset (CIS) (RCT baseline within 60 days after onset)
Bendfeldt 2019 M7 placebo Conversion to definite MS 301 25 (df unclear (reported predictors do not add up to 25)) At disease onset (CIS) (RCT baseline within 60 days after onset)
Bendfeldt 2019 M9 IFN Conversion to definite MS 301 15 (df unclear) At disease onset (CIS) (RCT baseline within 60 days after onset)
Borras 2016 Conversion to definite MS 32 (≤ 32 and ≥ 17 (discrepant lists)) 2 At disease onset (CIS) reported as 'first relapse' (4 to 126 days between CIS and lumbar puncture)
Gout 2011 Conversion to definite MS 15 (≥ 15 (unclear how many interactions tested)) 3 At disease onset (CIS) leading to admission
Martinelli 2017 Conversion to definite MS 36 (≥ 24 and ≤ 36 (unclear adjustments and transformations)) 7 (5 or 7 (unclear adjustment)) At disease onset (CIS) and up to 3 months after disease onset
Olesen 2019 candidate Conversion to definite MS 14 3 At disease onset (ON), from ON onset median (range): 14 days (2 days to 38 days)
Olesen 2019 routine Conversion to definite MS 4 3 At disease onset (ON), from ON onset median (range): 14 days (2 days to 38 days)
Runia 2014 Conversion to definite MS 21 (≥ 16 or 21 (unclear transformations)) 5 At disease onset (CIS) (at study baseline within 6 months after onset)
Spelman 2017 Conversion to definite MS 16 (≥ 16 (unclear how many interactions tested)) 7 (11 df) At disease onset (CIS) (up to 12 months after disease onset)
Wottschel 2015 1 year Conversion to definite MS 14 3 (df unclear) At disease onset (CIS) and up to a mean of 6.15 weeks (SD 3.4) after disease onset
Wottschel 2015 3 years Conversion to definite MS 14 6 (df unclear) At disease onset (CIS) and up to a mean of 6.15 weeks (SD 3.4) after disease onset
Wottschel 2019 Conversion to definite MS 214 36 (for 2‐fold CV) At disease onset (CIS) and up to 14 weeks after disease onset
Yoo 2019 Conversion to definite MS Unstructured data + 11 (user‐defined) Unstructured data + 11 At disease onset (CIS) (RCT baseline within 180 days after disease onset)
Zakharov 2013 Conversion to definite MS 2 (≥ 2 (unclear if complete list)) 2 Unclear, at first MRI after CIS onset (timing distribution unknown)
Zhang 2019 Conversion to definite MS 30 18 At disease onset (CIS) (during primary clinical work‐up for CIS)
Bergamaschi 2001 BREMS Conversion to progressive MS 9 (> 9 (unclear if complete list)) 9 At disease onset (RRMS) and regular visits up to 1 year after onset (baseline)
Brichetto 2020 Conversion to progressive MS 143 33 Unclear, at multiple assessments every 4 months
Calabrese 2013 Conversion to progressive MS 16 (≥ 16 (unclear df of initial symptoms)) 3 At study baseline (cohort entry at least 5 years after disease onset)
Manouchehrinia 2019 Conversion to progressive MS 20 5 (6 df) From disease onset (RRMS) to first EDSS recorded (median 2 years)
Misicka 2020 10 years Conversion to progressive MS 35 6 (7 df) At study interview (the same as the time of outcome reporting)
Misicka 2020 20 years Conversion to progressive MS 35 6 (7 df) At study interview (the same as the time of outcome reporting)
Misicka 2020 ever Conversion to progressive MS 35 6 (7 df) At study interview (the same as the time of outcome reporting)
Pinto 2020 SP Conversion to progressive MS 1306 Unclear which predictors make up the final model At multiple visits dependent on which 1‐year to 5‐year model
Pisani 2021 Conversion to progressive MS 13 (12 or 13 (unclear adjustment)) 7 At diagnosis (RRMS) and up to 2 years after diagnosis
Seccia 2020 180 days Conversion to progressive MS 21 predictor trajectories 18 predictor trajectories At multiple visits comprising patient history to the current visit of interest
Seccia 2020 360 days Conversion to progressive MS 21 predictor trajectories 18 predictor trajectories At multiple visits comprising patient history to the current visit of interest
Seccia 2020 720 days Conversion to progressive MS 21 predictor trajectories 18 predictor trajectories At multiple visits comprising patient history to the current visit of interest
Skoog 2014 Conversion to progressive MS 15 (≥ 15 (unclear transformations)) 3 (4 df) At last relapse, at time of prognostication
Tacchella 2018 180 days Conversion to progressive MS 46 46 At visit of interest
Tacchella 2018 360 days Conversion to progressive MS 46 46 At visit of interest
Tacchella 2018 720 days Conversion to progressive MS 46 46 At visit of interest
Vasconcelos 2020 Conversion to progressive MS 8 5 At multiple visits (unclear if CIS or RR onset) to at least 2 years post‐onset
Ahuja 2021 Composite 2730 114 (model 1: 111, model 2: 3) From 1 year ago to the index encounter (unspecified)
de Groot 2009 cognitive Composite 5 4 At disease onset (definite MS) (study baseline within 6 months after diagnosis)
Kosa 2022 Composite 852,167 (852,167 or 852,165 (unclear adjustment for age and sex)) 23 (23 or 21 (unclear if age and sex are predictors)) At lumbar puncture
Pellegrini 2019 Composite 23 3 At study baseline (RCT)

Ada: adaptive boosting
BREMS: Bayesian Risk Estimate for Multiple Sclerosis
CIS: clinically isolated syndrome
CV: cross‐validation
df: degrees of freedom
DT: decision tree
EDSS: Expanded Disability Status Scale
FLAIR: fluid‐attenuated inversion recovery
FLP: first level predictor
FTP: fine tuning predictor
IFN: interferon
LGBM: light gradient boosting machine
MRI: magnetic resonance imaging
MS: multiple sclerosis
NA: not applicable
ND: no data available
ON: optic neuritis
RCT: randomised controlled trial
RF: random forest
RR: relapsing‐remitting
RRMS: relapsing‐remitting MS
SD: standard deviation
SP: secondary progressive
XGB: extreme gradient boosting

5. Development and performance details.

Analysis Outcome Algorithm Sample size (number of events) EPV Evaluation details Number of external validations Calibration Discrimination Classification
Agosta 2006 Disability (EDSS) Logistic regression 70 (44) 1 Cross‐validation 0 ND ND Accuracy = 46/70, sensitivity = 30/41, specificity = 16/29
Bejarano 2011 Disability (EDSS) Neural network 51 (NA) 2 Cross‐validation 1 external refit ND AUC computed for continuous outcome Unclear how classification measures produced for numeric outcome, accuracy = 0.80 (SD 0.14), sensitivity = 0.92, specificity = 0.61, PPV = 0.80, NPV = 0.80
Bejarano 2011 Val Disability (EDSS) NA 96 (NA) 4 Validation; location NA ND NA Unclear how classification measures produced for numeric outcome, accuracy = 0.81
Bergamaschi 2015 BREMSO MSSS Val Disability (MSSS) NA 14,211 (3567) NA Validation; multiple (location, time); predictors dropped and different outcome NA ND ND Sensitivity = 0.36, specificity = 0.79
De Brouwer 2021 Disability (EDSS) Neural network 6682 (1114) 46 Cross‐validation 0 Calibration plot upon request 0.66 (0.64 to 0.68)B ND
de Groot 2009 dexterity Disability (9HPT) Logistic regression 146 (46) 9 Bootstrap 0 Calibration plot, calibration slope 0.85 0.77 (0.69 to 0.86) ND
de Groot 2009 walking Disability (EDSS) Logistic regression 146 (37) 7 Bootstrap 0 Calibration plot, calibration slope 0.93 0.89 (0.83 to 0.95) ND
Kuceyeski 2018 Disability (cognitive ‐ SDMT) Partial least squares regression 60 (NA) 10 Unclear 0 Calibration plot NA NA
Law 2019 Ada Disability (EDSS) Boosting 485 (115) 13 Cross‐validation 0 ND 0.6 (0.54 to 0.66)B Cutoff (0.527) identified by convex hull method, sensitivity = 53.0 (SD 4.7), specificity = 62.4 (SD 2.5), PPV = 30.5 (SD 1.6), NPV = 81.1 (1.9)
Law 2019 DT Disability (EDSS) Classification tree 485 (115) 13 Cross‐validation 0 ND 0.62 (0.56 to 0.68)B Cutoff (0.537) identified by convex hull method, sensitivity = 58.3 (SD 4.6), specificity = 62.2 (SD 2.5), PPV = 32.4 (SD 2.0), NPV = 82.7 (SD 1.8)
Law 2019 RF Disability (EDSS) Random forest 485 (115) 13 Cross‐validation 0 ND 0.61 (0.55 to 0.67)B Cutoff (0.531) identified by convex hull method, sensitivity = 59.1 (SD 4.6), specificity = 61.1 (SD 2.5), PPV = 32.1 (SD 2.1), NPV = 82.8 (SD 1.7)
Lejeune 2021 Disability (EDSS) Penalised regression 186 (53) 4 Bootstrap 1 ND 0.82 (0.73 to 0.91) Cutoff = 0.5, PPV 0.73 (95% CI 0.53 to 0.92), NPV 0.70 (95% CI 0.50 to 0.88)
Lejeune 2021 Ext Val Disability (EDSS) NA 175 (55) NA External validation; multiple (location, spectrum) NA Calibration plot, Hosmer‐Lemeshow test 0.71 (0.62 to 0.80) Cutoff = 0.5, PPV 0.83 (95% CI 0.76 to 0.92), NPV 0.74 (95% CI 0.67 to 0.81)
Malpas 2020 Disability (EDSS) Bayesian model averaging 2403 (145) 8 Apparent 1 ND 0.8 (0.75 to 0.84) Full model: cutoff = 0.05, sensitivity = 0.78, specificity = 0.71, PPV = 0.15, NPV = 0.98. Reduced model: cutoff = 0.06, sensitivity = 0.72, specificity = 0.73, PPV = 0.15, NPV = 0.98
Malpas 2020 Ext Val Disability (EDSS) NA 556 (34) NA External validation; location NA ND 0.75 (0.66 to 0.84) Cutoff determined in development set (0.06), PPV = 0.15, NPV = 0.97
Mandrioli 2008 Disability (EDSS) Logistic regression 64 (26) 2 Apparent 1 ND ND Error = 0.0937, sensitivity = 0.8846, specificity = 0.9211, PPV = 0.8846, NPV = 0.9211
Mandrioli 2008 Ext Val Disability (EDSS) NA 65 (20) NA External validation; time NA ND ND Error = 0.1231, sensitivity = 0.8000, specificity = 0.9111, PPV = 0.8000, NPV = 0.9111
Margaritella 2012 Disability (EDSS) Other regression 58 (NA) 22 Apparent 0 Histogram of differences between measured and predicted values NA Percent predictions within ± 0.5 of observed = 0.72
Montolio 2021 Disability (EDSS) Neural network 82 (37) 1 Cross‐validation 0 ND 0.82 (0.72 to 0.92)B Accuracy = 0.817, sensitivity = 0.811, specificity = 0.822, PPV = 0.789
Oprea 2020 Disability (EDSS) Logistic regression 151 (ND) 13 Cross‐validation 0 ND 0.82 NA Accuracy = 0.7662, sensitivity = 0.7775, PPV = 0.8145, F1 = 0.7806
Pinto 2020 severity 10 years Disability (EDSS) Support vector machine 67 (30) < 1 Cross‐validation 0 ND 0.85 (0.75 to 0.95)B Sensitivity = 0.77 (0.13), specificity = 0.79 (0.09), F1 score = 0.72 (0.09), geometric mean = 0.78 (0.08)
Pinto 2020 severity 6 years Disability (EDSS) Support vector machine 145 (38) < 1 Cross‐validation 0 ND 0.89 (0.83 to 0.95)B Sensitivity = 0.84 (0.11), specificity = 0.81 (0.05), F1 score = 0.53 (0.07), geometric mean = 0.82 (0.06)
Roca 2020 Disability (EDSS) ML combination 1427 (NA) 22A Random split 0 Other: plot of MSE per EDSS category, MSE 2.21 (validation), 3 (test) NA NA
Rocca 2017 Disability (EDSS) Other regression 49 (NA) 2 Cross‐validation 0 ND NA EDSS change precision within one point = 0.776
Rovaris 2006 Disability (EDSS) Logistic regression 52 (35) 1 Cross‐validation 0 ND ND Accuracy = 0.808, sensitivity = 31/35 = 0.89, specificity = 11/17 = 0.65
Sombekke 2010 Disability (MSSS) Logistic regression 605 (86) 1 Unclear 0 Hosmer‐Lemeshow test 0.78 (0.75 to 0.84) Sensitivity = 0.37, specificity = 0.953, LR+ = 7.9
Szilasiova 2020 Disability (EDSS) Logistic regression 85 (ND) 4 Apparent 0 ND Unclear due to mismatch between ROC curve and reported statistics: 0.94 (95% CI 0.89 to 0.98) Unclear because these points do not correspond to point on plot, sensitivity = 0.94, specificity = 0.89
Tommasin 2021 Disability (EDSS) Random forest 163 (58) 4 Cross‐validation 0 ND 0.92 (0.88 to 0.96)B Accuracy = 0.92, sensitivity = 0.92, specificity = 0.91
Tousignant 2019 Disability (EDSS) Neural network 1083 (103) NA Cross‐validation 0 ND 0.7 (0.64 to 0.76)B ND
Weinshenker 1991 M3 Disability (DSS) Survival analysis 1060 (498) 38 None 1 ND ND ND
Weinshenker 1996 M3 Ext Val Disability (DSS) NA 259 (66) NA External validation; location NA ND ND ND
Weinshenker 1996 short‐term Disability (EDSS) Logistic regression 174 (28) 9 Apparent 0 ND ND Cutoff = 0.5: accuracy = 0.75, sensitivity = 0.21, specificity = 0.93; cutoff = 0.3: accuracy = 0.67, sensitivity = 0.54, specificity = 0.72
Yperman 2020 Disability (EDSS) Random forest 2502 (275) < 1 Cross‐validation 0 Calibration plot upon request 0.75 (0.71 to 0.79)B ND
Zhao 2020 LGBM All Disability (EDSS) Boosting 724 (165) 1 Cross‐validation 0 ND 0.78 (0.74 to 0.82)B Cutoff = 0.4 (results also reported for 0.5, 0.45, 0.35, and 0.3), accuracy = 0.77, sensitivity = 0.58, specificity = 0.82
Zhao 2020 LGBM Common Disability (EDSS) Boosting 724 (165) 2 Cross‐validation 1 (unclear if refit) ND 0.76 (0.72 to 0.8)B Cutoff = 0.4 (results also reported for 0.5, 0.45, 0.35, and 0.3), accuracy = 0.64, sensitivity = 0.75, specificity = 0.61
Zhao 2020 LGBM Common Val Disability (EDSS) NA 400 (130) NA Validation; location NA ND 0.82 (0.78 to 0.86)B Cutoff = 0.4 (results also reported for 0.5, 0.45, 0.35, and 0.3), accuracy = 0.73, sensitivity = 0.73, specificity = 0.73
Zhao 2020 XGB All Disability (EDSS) Boosting 724 (165) 1 Cross‐validation 0 ND 0.78 (0.74 to 0.82)B Cutoff = 0.4 (results also reported for 0.5, 0.45, 0.35, and 0.3), accuracy = 0.74, sensitivity = 0.68, specificity = 0.76
Zhao 2020 XGB Common Disability (EDSS) Boosting 724 (165) 2 Cross‐validation 1 (unclear if refit) ND 0.76 (0.72 to 0.8)B Cutoff = 0.4 (results also reported for 0.5, 0.45, 0.35, and 0.3), accuracy = 0.65, sensitivity = 0.75, specificity = 0.62
Zhao 2020 XGB Common Val Disability (EDSS) NA 400 (130) NA Validation; location NA ND 0.82 (0.78 to 0.86)B Cutoff = 0.4 (results also reported for 0.5, 0.45, 0.35, and 0.3), accuracy = 0.68, sensitivity = 0.85, specificity = 0.60
Gurevich 2009 FLP Relapse Support vector machine 94 (19) < 1 Cross‐validation 1 ND ND Categories determined in data, error = 0.079
Gurevich 2009 FLP Ext Val Relapse NA 10 (ND) NA External validation; ND NA ND ND Error = 0.25 (but 10 patients reported)
Gurevich 2009 FTP Relapse Other regression 40 (NA) < 1 Cross‐validation 0 Calibration plot NA Prediction more than 50 days from observed = 0.345
Sormani 2007 Relapse Survival analysis 539 (270) 22 Apparent 1 ND ND ND
Sormani 2007 Ext Val Relapse NA 117 (ND) NA External validation; spectrum NA ND ND ND
Vukusic 2004 Relapse Logistic regression 223 (63) 6 Apparent 0 ND 0.72 (0.64 to 0.8)B Cutoff = 0.5, accuracy = 160/223 = 0.72
Ye 2020 gene signature Relapse Penalised regression 94 (64) < 1 Random split 0 ND 0.73 (0.61 to 0.85)B ND
Ye 2020 nomogram Relapse Survival analysis 94 (64) < 1 Random split 0 ND 0.59 (0.47 to 0.71)B ND
Aghdam 2021 Conversion to definite MS Classification tree 277 (117) 12 Random split 0 ND ND Accuracy = 0.74, sensitivity = 0.71, specificity = 0.76, PPV = 0.65, NPV = 0.79
Bendfeldt 2019 linear placebo Conversion to definite MS Support vector machine 69 (25) NA Cross‐validation 0 ND ND Accuracy = 0.712 (95% CI 0.707 to 0.716), sensitivity = 0.64, specificity = 0.783
Bendfeldt 2019 M7 placebo Conversion to definite MS Support vector machine 61 (22) < 1 Cross‐validation 0 ND ND Balanced accuracy = 0.676 (95% CI 0.559 to 0.793)
Bendfeldt 2019 M9 IFN Conversion to definite MS Support vector machine 99 (49) < 1 Cross‐validation 0 ND ND Balanced accuracy = 0.704 (95% CI 0.614 to 0.794)
Borras 2016 Conversion to definite MS Logistic regression 49 (24) 1 Unclear 0 ND 0.79 (0.65 to 0.93)B Sensitivity = 0.84, specificity = 0.83
Gout 2011 Conversion to definite MS Survival analysis 208 (141) 9 Apparent 0 ND ND ND
Martinelli 2017 Conversion to definite MS Survival analysis 243 (108) 4 Apparent 0 Other: Gronnesby and Borgan statistic 0.7 (0.64 to 0.75) Categories defined as low: 0% to 33.3%, moderate: 33.3% to 66.7%, high: 66.6% to 100%, net reclassification improvement = 0.3
Olesen 2019 candidate Conversion to definite MS Logistic regression 33 (16) 1 Bootstrap 0 Calibration plot, Hosmer‐Lemeshow test 0.89 (0.77 to 1.00) ND
Olesen 2019 routine Conversion to definite MS Logistic regression 38 (16) 4 Bootstrap 0 Calibration plot, Hosmer‐Lemeshow test 0.86 (0.74 to 0.98) ND
Runia 2014 Conversion to definite MS Survival analysis 431 (109) 7 Bootstrap 0 ND 0.66 (0.6 to 0.72)B ND
Spelman 2017 Conversion to definite MS Survival analysis 3296 (1953) 122 Bootstrap 0 Calibration plot 0.81 (0.79 to 0.83)B ND
Wottschel 2015 1 year Conversion to definite MS Support vector machine 74 (22) 2 Cross‐validation 0 ND ND Accuracy = 0.714 (95% CI 0.58 to 0.84), sensitivity = 0.77, specificity = 0.66, PPV = 0.70, NPV = 0.74
Wottschel 2015 3 years Conversion to definite MS Support vector machine 70 (31) 2 Cross‐validation 0 ND ND Accuracy = 0.68 (95% CI 0.61 to 0.73), sensitivity = 0.60, specificity = 0.76, PPV = 0.72, NPV = 0.65
Wottschel 2019 Conversion to definite MS Support vector machine 400 (91) < 1 Cross‐validation 0 ND ND 2‐fold CV: accuracy = 0.648 (95% CI 0.646 to 0.651), sensitivity = 0.641, specificity = 0.656, also reported for 5‐fold, 10‐fold CV, and LOOCV
Yoo 2019 Conversion to definite MS Neural network 140 (80) 7A Cross‐validation 0 ND 0.75 (0.67 to 0.83)B Accuracy = 0.75 (SD = 0.113), sensitivity = 0.787 (SD = 0.122), specificity = 0.704 (SD = 0.154)
Zakharov 2013 Conversion to definite MS Logistic regression 102 (23) 12 Apparent 0 ND ND Sensitivity = 0.727, specificity = 0.345
Zhang 2019 Conversion to definite MS Random forest 84 (66) 1 Cross‐validation 0 ND ND Accuracy = 0.85 (95% CI 0.75 to 0.91), sensitivity = 0.94 (95% CI 0.85 to 0.98), specificity = 0.50 (95% CI 0.26 to 0.74), PPV = 0.87 (95% CI 0.81 to 0.91), NPV = 0.69 (95% CI 0.44 to 0.87), DOR = 15.50 (95% CI 3.93 to 60.98), balanced Accuracy = 0.72 (posterior probability interval 0.60 to 0.82)
Bergamaschi 2001 BREMS Conversion to progressive MS Survival analysis 186 (34) 4 None 2, simplified: 2 ND ND ND
Bergamaschi 2007 BREMS Ext Val Conversion to progressive MS NA 535 (87) NA External validation; location NA ND ND Cutoff at 95th percentile (score ≥ 2.0): sensitivity = 0.17, specificity = 0.99, PPV = 0.86, NPV = 0.83, cutoff at 5th percentile (score ≤ 0.63): sensitivity = 0.08, specificity = 1.00, PPV = 1.00, NPV = 0.18 (the event is defined as having secondary progression for 95th percentile cutoff but not having secondary progression for other)
Bergamaschi 2015 BREMS Ext Val Conversion to progressive MS NA 1131 (ND) NA External validation; multiple (location, time) NA ND ND Sensitivity = 0.35, specificity = 0.80
Bergamaschi 2015 BREMSO SP Val Conversion to progressive MS NA 14,211 (1954) NA Validation; multiple (location, time); predictors dropped NA ND ND Sensitivity = 0.28, specificity = 0.76
Brichetto 2020 Conversion to progressive MS ML combination 810 (1451) 10 Unclear 0 ND ND Accuracy FCA = 0.826, CCA = 0.860
Calabrese 2013 Conversion to progressive MS Logistic regression 334 (66) 4 Cross‐validation 1 ND ND Accuracy = 0.928, sensitivity = 0.878, specificity = 0.94
Calabrese 2013 Ext Val Conversion to progressive MS NA 83 (19) NA External validation; time NA ND ND Accuracy = 0.916, sensitivity = 0.842, specificity = 0.937
Manouchehrinia 2019 Conversion to progressive MS Survival analysis 8825 (1488) 74 Bootstrap 3 Calibration plot 0.84 (0.83 to 0.85) ND
Manouchehrinia 2019 Ext Val 1 Conversion to progressive MS NA 3967 (888) NA External validation; multiple (location, time, spectrum) NA ND 0.77 (0.76 to 0.78) ND
Manouchehrinia 2019 Ext Val 2 Conversion to progressive MS NA 175 (26) NA External validation; multiple (location, time, spectrum) NA ND 0.77 (0.70 to 0.85) ND
Manouchehrinia 2019 Ext Val 3 Conversion to progressive MS NA 2355 (126) NA External validation; multiple (location, time, spectrum) NA ND 0.87 (0.84 to 0.89) ND
Misicka 2020 10 years Conversion to progressive MS Survival analysis 1166 (55) 2 Apparent 0 ND ND ND
Misicka 2020 20 years Conversion to progressive MS Survival analysis 1166 (128) 4 Apparent 0 ND ND ND
Misicka 2020 ever Conversion to progressive MS Survival analysis 1166 (177) 5 Apparent 0 ND ND ND
Pinto 2020 SP Conversion to progressive MS Support vector machine 187 (21) < 1 Cross‐validation 0 ND 0.86 (0.78 to 0.94)B Sensitivity = 0.76 (0.14), specificity = 0.77 (0.05), F1 score = 0.20 (0.05), geometric mean = 0.76 (0.08)
Pisani 2021 Conversion to progressive MS Random survival forest 262 (69) 5 Random split 0 ND Reported for RF, not final model Cutoff = 17.7, accuracy = 0.88 (95% CI 0.75 to 0.96), sensitivity = 0.92 (95% CI 0.70 to 1.00), specificity = 0.87 (95% CI 0.70 to 0.96), PPV = 0.75 (95% CI 0.48 to 0.93), NPV = 0.96 (95% CI 0.81 to 1.00) from evaluation of final tool using random split
Seccia 2020 180 days Conversion to progressive MS Neural network 1515 (207) 10 Random split 0 ND ND Cutoff = 0.5, accuracy = 0.98, sensitivity = 0.385, specificity = 0.988, PPV = 0.308
Seccia 2020 360 days Conversion to progressive MS Neural network 1449 (207) 10 Random split 0 ND ND Cutoff = 0.5, accuracy = 0.975, sensitivity = 0.50, specificity = 0.982, PPV = 0.295
Seccia 2020 720 days Conversion to progressive MS Neural network 1375 (207) 10 Random split 0 ND ND Cutoff = 0.5, accuracy = 0.98, sensitivity = 0.673, specificity = 0.985, PPV = 0.427
Skoog 2014 Conversion to progressive MS Survival analysis 157 (118) 8 Apparent 1 O:E table ND ND
Skoog 2019 Ext Val Conversion to progressive MS NA 145 (54) NA External validation; multiple (location, time) NA Calibration plot, O:E table, O:E 0.599 ND ND
Skoog 2019 Val Conversion to progressive MS NA 144 (100) NA Apparent validation in new publication, some participants from the development excluded NA Calibration plot, O:E table, O:E 0.829 ND ND
Tacchella 2018 180 days Conversion to progressive MS Random forest 527 (65) 1 Cross‐validation 0 ND 0.71 (0.66 to 0.76) ND
Tacchella 2018 360 days Conversion to progressive MS Random forest 527 (125) 3 Cross‐validation 0 ND 0.67 (0.62 to 0.71) ND
Tacchella 2018 720 days Conversion to progressive MS Random forest 527 (211) 5 Cross‐validation 0 ND 0.68 (0.64 to 0.72) ND
Vasconcelos 2020 Conversion to progressive MS Survival analysis 287 (88) 11 Apparent 1 Other: events per score level ND ND
Vasconcelos 2020 Ext Val Conversion to progressive MS NA 142 (31) NA External validation; time NA O:E table (unclear), Hosmer‐Lemeshow test ND ND
Ahuja 2021 Composite (relapse) Penalised regression 1435 (ND) 1 None 1 ND ND ND
Ahuja 2021 Ext Val Composite (relapse) NA 186 (ND) NA External validation; spectrum NA Plots comparing observed and predicted relapse proportions stratified by disease duration and age separately 0.71 (0.69 to 0.71) Sensitivity = 0.499, specificity = 0.719, PPV = 0.223, NPV = 0.900, F1 = 0.307
de Groot 2009 cognitive Composite (cognitive tests) Logistic regression 146 (44) 9 Bootstrap 0 Calibration plot, calibration slope 0.88 0.74 (0.65 to 0.83) ND
Kosa 2022 Composite (EDSS, SNRS, T25FW, NDH‐9HPT) Random forest 227 (NA) < 1 Random split 0 Calibration plot NA NA
Pellegrini 2019 Composite (EDSS, T25FW, 9HPT, PASAT, VFT) Survival analysis 1582 (434) 19 Bootstrap 0 Calibration slope 1 year: 1.10 (bootstrap = 1.08, SE 0.17), 2 years: 1.00 (bootstrap = 0.97, SE 0.15) 0.59 (0.57 to 0.61)B ND

9HPT: 9‐hole peg test
AUC: area under the curve
Ada: adaptive boosting
BREMSO: Bayesian Risk Estimate for Multiple Sclerosis at Onset
CI: confidence interval
Dev: development
DOR: diagnostic odds ratio
DSS: Disability Status Scale
DT: decision tree
EDSS: Expanded Disability Status Scale
EPV: events per variable
Ext: external
FLP: first level predictor
FTP: fine tuning predictor
IFN: interferon
LGBM: light gradient boosting machine
LOOCV: leave‐one‐out cross‐validation
MS: multiple sclerosis
MSE: mean squared error
MSSS: multiple sclerosis severity score
NA: not applicable
ND: no data available
NDH‐9HPT: non‐dominant hand 9‐hole peg test
NPV: negative predictive value
O:E: observed to expected ratio
PASAT: Paced Auditory Serial Addition Test
PPV: positive predictive value
RF: random forest
ROC: receiver operating characteristic
SD: standard deviation
SE: standard error
SP: secondary progressive
T25FW: timed 25‐foot walk
Val: validation
VFT: visual function test
XGB: extreme gradient boosting

A Events per variable was computed using only tabular predictors, although non‐tabular predictors were also considered.
B Confidence interval was not reported but was computed based on the reported information.
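Footnote B indicates that several of the AUC (c‐statistic) confidence intervals above were not reported by the primary studies but were computed from the reported information. The exact method is not described here; the sketch below shows one common approximation under that assumption, the Hanley and McNeil (1982) standard error combined with a normal interval, which needs only the AUC, the number of events, and the number of non‐events. The function name and the example inputs (taken from the Zhao 2020 row: 724 participants, 165 events, AUC 0.76) are illustrative, and the result need not match the tabulated interval exactly.

```python
import math

def auc_confidence_interval(auc, n_events, n_nonevents):
    """Approximate 95% CI for an AUC (c-statistic) from its point estimate
    and the numbers of events and non-events, using the Hanley & McNeil
    (1982) standard error and a normal approximation. Illustrative only."""
    q1 = auc / (2 - auc)
    q2 = 2 * auc ** 2 / (1 + auc)
    se = math.sqrt(
        (auc * (1 - auc)
         + (n_events - 1) * (q1 - auc ** 2)
         + (n_nonevents - 1) * (q2 - auc ** 2))
        / (n_events * n_nonevents)
    )
    return auc - 1.96 * se, auc + 1.96 * se

# Example with the sample size reported for Zhao 2020 (724 participants, 165 events)
low, high = auc_confidence_interval(0.76, n_events=165, n_nonevents=724 - 165)
print(f"AUC 0.76, approximate 95% CI ({low:.2f} to {high:.2f})")
```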

Table 7. Final model and presentation.

Model Outcome Definition Timing Predictors Presentation
Agosta 2006 Disability (EDSS) Clinical worsening confirmed after a 3‐month, relapse‐free period (EDSS increase (for EDSS baseline) ≥ 1.0 (< 6.0), ≥ 0.5 (≥ 6.0)) Follow‐up median: 8 years, mean: 7.7 years Baseline GM histogram peak height, average lesion MTR percentage change after 12 months, follow‐up duration Regression coefficients without the intercept or coefficient for 'adjustment for follow‐up duration'
Bejarano 2011 Disability (EDSS) Change in EDSS 2 years Age, worst central motor conduction time of both arms, worst central motor conduction time of both legs, at least 1 abnormal MEP, motor score of EDSS at baseline ND
De Brouwer 2021 Disability (EDSS) Disability progression confirmed at least 6 months later (EDSS increase (baseline EDSS) ≥ 1.5 (0), 1 (≤ 5.5), or 0.5 (> 5.5)) 2 years Gender, age at onset, MS course at time t = 0 (RRMS, SPMS, PPMS, or CIS), disease duration at time t = 0, EDSS at t = 0, last used DMT at t = 0 (none, interferons, natalizumab, fingolimod, teriflunomide, dimethyl‐fumarate, glatiramer, alemtuzumab, rituximab, cladribine, ocrelizumab, other (contains stem cells therapy, siponimod and daclizumab)), EDSS trajectories ND
de Groot 2009 dexterity Disability (9HPT) Impaired dexterity (abnormal score (mean – 1.96 SD, healthy Dutch reference population) for the 9HPT) 3 years How well can you use your hands?, impairment of sensory tract, impairment of pyramidal tract, Impairment of cerebellar tract, T2‐weighted infratentorial lesion load Score chart
de Groot 2009 walking Disability (EDSS) Inability to walk 500 m (EDSS ≥ 4) 3 years How well can you walk? Impairment of cerebellar tract, number of lesions in spinal cord Score chart
Kuceyeski 2018 Disability (cognitive ‐ SDMT) Processing speed measured by Symbol Digits Modality Test Mean (SD): 28.6 months (10.3 months) Age, sex, disease duration, treatment duration, baseline SDMT, baseline EDSS, regional GM atrophy (86 regions), NEMO pairwise disconnection measures (610 considered), number of months between time points ND
Law 2019 Ada Disability (EDSS) Confirmed disability progression sustained for 6 months: EDSS increase (for EDSS baseline) ≥ 1.0 (≤ 5.5), ≥ 0.5 (≥ 6) 2 years T25FW, 9HPT, PASAT, EDSS, disease duration, age, sex, T2LV, BPF ND
Law 2019 DT Disability (EDSS) Confirmed disability progression sustained for 6 months: EDSS increase (for EDSS baseline) ≥ 1.0 (≤ 5.5), ≥ 0.5 (≥ 6) 2 years T25FW, 9HPT, PASAT, EDSS, disease duration, age, sex, T2LV, BPF ND
Law 2019 RF Disability (EDSS) Confirmed disability progression sustained for 6 months: EDSS increase (for EDSS baseline) ≥ 1.0 (≤ 5.5), ≥ 0.5 (≥ 6) 2 years T25FW, 9HPT, PASAT, EDSS, disease duration, age, sex, T2LV, BPF ND
Lejeune 2021 Disability (EDSS) Residual disability after relapse (EDSS increase ≥ 1) 6 months Increased EDSS during relapse, pre‐relapse EDSS at 0, age, proprioceptive ataxia, subjective sensory disorder, disease duration Web app at https://shiny.idbc.fr/SMILE/
Malpas 2020 Disability (EDSS) Aggressive MS (EDSS ≥ 6 reached within 10 years of symptom onset, sustained over ≥ 6 months, and sustained until end of follow‐up) 10 years, time from onset to aggressive disease mean (SD, range): 6.05 years (2.79 years, 0 years to 9.89 years) Onset age, median EDSS in first year, pyramidal signs Relative risk of aggressive disease by number of positive signs in simplified model (dichotomised based on individual optimal thresholds)
Mandrioli 2008 Disability (EDSS) Severe MS (EDSS ≥ 4 by 10 years disease duration, EDSS progression confirmed in 2 consecutive examinations) Follow‐up from onset mean (SD): BMS 16.03 years (0.92 years), SMS 13.62 years (0.80 years), time between unclear CSF IgM OB presence, motor symptoms at onset, sensory symptoms at onset, time to second relapse in months Full regression model
Margaritella 2012 Disability (EDSS) EDSS 1 year after included mEPS and EDSS predictors EDSS, mEPS, age at onset, gender, benign course, PP course Full regression model
Montolio 2021 Disability (EDSS) Worsening (EDSS increase ≥ 1) 10 years (8 years) from baseline (last predictors) Disease duration, relapse in preceding year, EDSS, temporal RNFL thickness, superior RNFL thickness List of selected predictors
Oprea 2020 Disability (EDSS) Keeping EDSS score ≤ threshold (chosen threshold: 2.5) ND Gender, age at diagnosis, age, EDSS at onset, disease duration, number of treatments ND
Pinto 2020 severity 10 years Disability (EDSS) Severe disease (EDSS > 3 based on the mean EDSS from all clinical visits in prediction horizon year) 10 years from baseline (with predictors at 5 years from baseline) ND ND
Pinto 2020 severity 6 years Disability (EDSS) Severe disease (EDSS > 3 based on the mean EDSS from all clinical visits in prediction horizon year) 6 years from baseline (with predictors at 2 years from baseline) ND ND
Roca 2020 Disability (EDSS) EDSS 2 years from the initial imaging Unstructured: FLAIR images, lesion masks from white matter hyperintensities segmentation from FLAIR images, structured: 60 tracts of interest from the ICBM‐DTI 81 white matter labels and sensorimotor tracts atlases in MNI space, whole‐brain lesion load, volume of the lateral ventricles, age, gender, 3D/2D nature of FLAIR sequence ND
Rocca 2017 Disability (EDSS) EDSS change from baseline confirmed after 3 months 15 years, median (IQR): 15.1 years (13.9 years to 15.4 years) Baseline EDSS, 15‐month EDSS change, 15‐month new T1 hypointense lesions, percentage brain volume change, baseline grey matter mean diffusivity Regression coefficients without the intercept
Rovaris 2006 Disability (EDSS) Clinical worsening confirmed after 3 months (EDSS increase (for baseline EDSS) ≥ 1.0 (< 6.0), ≥ 0.5 (≥ 6.0)) Follow‐up median (range): 56.0 months (35 months to 63 months) Baseline EDSS, grey matter mean diffusivity, follow‐up Regression coefficients without intercept and follow‐up time
Sombekke 2010 Disability (MSSS) MSSS ≥ 2.5 ND Age at onset, male gender, progressive onset type, NOS2 level, PITPNCI level, IL2 level, CCL5 level, ILIRN level, PNMT level Regression coefficients without intercept
Szilasiova 2020 Disability (EDSS) EDSS ≥ 5.0 15 years Sex, age, MS form, EDSS, MS duration, P300 latency (ms) Full regression model
Tommasin 2021 Disability (EDSS) Disability progression (EDSS increase (for baseline EDSS) ≥ 1.5 (0), 1 (≤ 5.5), or 0.5 (> 5.5)) Follow‐up mean (SD, range): 3.93 years (0.95 years, 2 years to 6 years) T2 lesion load, cerebellar volume, thalamic volume, fractional anisotropy of normal appearing WM List of predictors (model selected)
Tousignant 2019 Disability (EDSS) EDSS increase (for baseline EDSS) ≥ 1.5 (0), ≥ 1 (0.5 to 5.5), ≥ 0.5 (≥ 6) sustained for ≥ 12 weeks 1 year MRI channels: volumes from T1‐weighted pre‐contrast, T1‐weighted post‐contrast, T2w, proton density‐weighted, FLAIR; T2 lesion masks; Gadolinium enhanced lesion masks List of predictors (no selection)
Weinshenker 1991 M3 Disability (DSS) Time to reach DSS 6 (EDSS 6.0 or 6.5) Follow‐up for 12 years Age at onset, seen at MS onset, motor (insidious), brainstem, cerebellar, cerebral, pyramidal Full regression model
Weinshenker 1996 short‐term Disability (EDSS) Short‐term progression (change in EDSS) Definition 1 year to 3 years, follow‐up summarised for 2 years Duration, EDSS, progression index, predicted time to DSS 6 from model 1, follow‐up Full regression model
Yperman 2020 Disability (EDSS) Disability progression (EDSS increase (for baseline EDSS) ≥ 1.0 (≤ 5.5), ≥ 0.5 (> 5.5)) Baseline to outcome EDSS median (IQR): 1.98 years (1.84 years to 2.08 years) (similar for baseline MEP) Selected predictors unclear, at least latencies, EDSS at T0, age ND
Zhao 2020 LGBM All Disability (EDSS) Worsening: EDSS increase ≥ 1.5 5 years Unclear if it is the complete list. Common: age, ethnicity, longitudinal features, attack previous 6 months, attack previous 2 years, bowel–bladder function, brainstem function, cerebellar function, gender, race, disease category, EDSS, lesion volume, mental function, pyramidal function, smoking history, sensory function, total GD, visual function, walk 25 ft time ND
Zhao 2020 LGBM Common Disability (EDSS) Worsening: EDSS increase ≥ 1.5 5 years Unclear if it is the complete list. Common: age, ethnicity, longitudinal features, attack previous 6 m, attack previous 2 years, bowel–bladder function, brainstem function, cerebellar function, gender, race, disease category, EDSS, lesion volume, mental function, pyramidal function, smoking history, sensory function, total GD, visual function, walk 25 ft time ND
Zhao 2020 XGB All Disability (EDSS) Worsening: EDSS increase ≥ 1.5 5 years Unclear if it is the complete list. Common: age, ethnicity, longitudinal features, attack previous 6 m, attack previous 2 years, bowel–bladder function, brainstem function, cerebellar function, gender, race, disease category, EDSS, lesion volume, mental function, pyramidal function, smoking history, sensory function, total GD, visual function, walk 25 ft time ND
Zhao 2020 XGB Common Disability (EDSS) Worsening: EDSS increase ≥ 1.5 5 years Unclear if it is the complete list. Common: age, ethnicity, longitudinal features, attack previous 6 m, attack previous 2 years, bowel–bladder function, brainstem function, cerebellar function, gender, race, disease category, EDSS, lesion volume, mental function, pyramidal function, smoking history, sensory function, total GD, visual function, walk 25 ft time ND
Gurevich 2009 FLP Relapse Time from baseline gene expression analysis until next relapse (3 categories: < 500 days, 500 days to 1264 days, > 1264 days) Unclear reporting FLJ10201, PDCD2, IL24, MEFV, CA2, SLM1, CLCN4, SMARCA1, TRIM22, TGFB2 List of selected genes
Gurevich 2009 FTP Relapse Time from baseline gene expression analysis to next acute relapse (given this time < 500 days) Unclear reporting KIAA1043, LOC51145, PPFIA1, MGC8685, DNCH2, PCOLCE2, FPRL1, G3BP, RHBG List of selected genes
Sormani 2007 Relapse Time to first relapse (≥ 1 neurological symptoms (causing EDSS increase ≥ 0.5 or 1 grade in the score of 2 or more functional systems or 2 grades in 1 functional system) lasting at least 48 hours and preceded by a relatively stable or improving neurological state in the prior 30 days) Follow‐up median (range): 14 months (0.4 months to 16 months), time to outcome from study entry mean (SD): 47 weeks (0.9 weeks) Previous 2 years' relapses, number of enhancing lesions Regression model formula with survival probability for 6 months and 1 year
Vukusic 2004 Relapse Postpartum relapse 3 months after delivery Number of relapses in pre‐pregnancy year, Number of relapses during pregnancy, MS duration Full regression model
Ye 2020 gene signature Relapse Relapse‐free survival Follow‐up mean (SD): 1.97 years (1.3 years) FTH1, GBP2, MYL6, NCOA4, SRP9 Regression coefficients without baseline hazard
Ye 2020 nomogram Relapse Relapse‐free survival Follow‐up mean (SD): 1.97 years (1.3 years) Age, gender, disease type, DMT, risk score Nomogram
Aghdam 2021 Conversion to definite MS McDonald 2010 Follow‐up mean (SD): 5.30 years (2.94 years), no precise time point of measurement Presence of plaque in MRI, history of previous optic neuritis attack, type of optic neuritis, gender Decision tree
Bendfeldt 2019 linear placebo Conversion to definite MS Modified Poser diagnosis confirmed by a central committee (relapse with clinical evidence of at least one CNS lesion (distinct from lesion responsible for CIS presentation if monofocal) or EDSS increase ≥ 1.5 reaching total EDSS ≥ 2.5 and confirmed 3 months later) 2 years Cortical grey matter segmentation masks, age, sex, scanner ND
Bendfeldt 2019 M7 placebo Conversion to definite MS Modified Poser diagnosis confirmed by a central committee (relapse with clinical evidence of at least one CNS lesion (distinct from lesion responsible for CIS presentation if monofocal) or EDSS increase ≥ 1.5 reaching total EDSS ≥ 2.5 and confirmed 3 months later) 2 years Age, sex, EDSS, GM volume ratio, whole brain summaries (total, mean, median, minimum, maximum, SD) of lesion volume, surface area, and mean breadth, unclear if 3 more imaging predictors ND
Bendfeldt 2019 M9 IFN Conversion to definite MS Modified Poser diagnosis confirmed by a central committee (relapse with clinical evidence of at least one CNS lesion (distinct from lesion responsible for CIS presentation if monofocal) or EDSS increase ≥ 1.5 reaching total EDSS ≥ 2.5 and confirmed 3 months later) 2 years Age, sex, EDSS, GM volume ratio, lesion count, whole brain summaries (total, mean, SD) of lesion volume, surface area, and mean breadth, Euler‐Poincare characteristic ND
Borras 2016 Conversion to definite MS Presence of IgG oligoclonal bands and an abnormal brain MRI at baseline (2, 3, or 4 Barkhof criteria) Follow‐up median (SD): CIS 3.25 years (1.32 years), CDMS 4.08 years (2.48 years) CH3L1, CNDP1 Heat maps
Gout 2011 Conversion to definite MS Time to Poser 1983 diagnosis Follow‐up median (range): 3.5 years (1.0 year to 12.7 years), time to outcome in those who experience it median (range) 16.6 months (1.1 months to 112.5 months) Age (≤ 31 years), 3 to 4 positive MR Barkhof criteria, CSF white blood cell Count > 4 per cubic millimetre Sum score: 1 if age at onset ≤ 31 years, 3 if 3 to 4 Barkhof Criteria present, 1 if > 4 white blood cells per cubic millimetre in CSF
Martinelli 2017 Conversion to definite MS Time to Poser 1983 diagnosis Follow‐up median (IQR): 7.3 years (3.5 years to 10.2 years) 2010 DIS criteria fulfilled, 2010 DIT criteria fulfilled, age, T1 lesions, CSF oligoclonal bands (steroid use in 4 weeks prior to study, DMT use during follow‐up) List of selected predictors
Olesen 2019 candidate Conversion to definite MS McDonald 2010 Follow‐up median (range): 29.6 months (19 months to 41 months) IL‐10, NF‐L, CXCL13 Nomogram
Olesen 2019 routine Conversion to definite MS McDonald 2010 Follow‐up median (range): 29.6 months (19 months to 41 months) OCB, leukocytes, IgG index Nomogram
Runia 2014 Conversion to definite MS Time from start of first symptoms to CDMS (Poser 1983) Unclear DIS + DIT2010, corpus callosum lesions, oligoclonal bands, fatigue, abnormal MRI Unweighted sum score from 0 to 5
Spelman 2017 Conversion to definite MS Time to first relapse following CIS (Poser 1983) Follow‐up median (IQR): 1.92 years (0.90 years to 3.71 years) Sex, age, EDSS, first symptom location, T2 infratentorial lesions, T2 periventricular lesions, OCB in CSF Nomogram for 1‐year outcomes (nomograms for 6‐month, 2, 3, 4, and 5‐year outcomes)
Wottschel 2015 1 year Conversion to definite MS Occurrence of a second clinical attack attributable to demyelination of more than 24 hours in duration and at least 4 weeks from the initial attack 1 year Type of presentation, gender, lesion load List of selected predictors and kernel degree
Wottschel 2015 3 year Conversion to definite MS Occurrence of a second clinical attack attributable to demyelination of more than 24 hours in duration and at least 4 weeks from the initial attack 3 years Lesion count, average lesion PD intensity, average distance of lesions from the centre of the brain, shortest horizontal distance of a lesion from the vertical axis, age, EDSS at onset List of selected predictors and kernel degree
Wottschel 2019 Conversion to definite MS Occurrence of a second clinical episode 1 year Type of CIS, WM lesion load ‐ whole brain, WM lesion load ‐ frontal, WM lesion load ‐ limbic, WM lesion load ‐ temporal, WM lesion load ‐ dGM, WM lesion load ‐ WM, GM ‐ cerebellum, GM ‐ thalamus, GM ‐ frontal operculum, GM ‐ middle cingulate gyrus, GM ‐ precentral gyrus medial segment, GM ‐ posterior cingulate gyrus, GM ‐ praecuneus, GM ‐ parietal operculum, GM ‐ post‐central gyrus, GM ‐ planum polare, GM ‐ subcallosal area, GM ‐ supplementary motor cortex, GM ‐ superior occipital gyrus, cortical thickness ‐ central operculum, cortical thickness ‐ cuneus, cortical thickness ‐ fusiform gyrus, cortical thickness ‐ inferior temporal gyrus, cortical thickness ‐ middle occipital gyrus, cortical thickness ‐ post‐central gyrus medial segment, cortical thickness ‐ occipital pole, cortical thickness ‐ opercular part of the inferior frontal gyrus, cortical thickness ‐ orbital part of the inferior frontal gyrus, cortical thickness ‐ planum temporale, cortical thickness ‐ superior occipital gyrus, volume ‐ whole brain, volume ‐ ventral diencephalon, volume ‐ middle temporal gyrus, volume ‐ supramarginal gyrus, volume ‐ limbic List of selected predictors for peak accuracy when using 2‐fold CV
Yoo 2019 Conversion to definite MS McDonald 2005 2 years Unstructured: MRI mask images, structured: T2w lesion volume, brain parenchymal fraction, diffusely abnormal white matter, gender, initial CIS event cerebrum, initial CIS event optic nerve, initial CIS event cerebellum, initial CIS event brainstem, initial CIS event spinal cord, EDSS, CIS monofocal or multifocal type at onset ND
Zakharov 2013 Conversion to definite MS Development of CDMS (second attack) Follow‐up for 8 years Age at disease onset, size of the foci of demyelination ND
Zhang 2019 Conversion to definite MS Demonstration of dissemination in Time by clinical relapse or new MRI lesions 3 years Total lesion number, total lesion volume, minimum, maximum, mean, SD for surface area, sphericity, surface‐volume‐ratio, and volume of individual lesions ND
Bergamaschi 2001 BREMS Conversion to progressive MS Time to earliest date of observation of progressive worsening (EDSS increase ≥ 1) persistent for ≥ 6 months Follow‐up mean (SD, range): 7.5 years (5.7 years, 3 years to 25 years) Age at onset, female, sphincter onset, pure motor onset, motor‐sensory onset, sequelae after onset, number of involved FS at onset, number of sphincter plus motor relapses, EDSS ≥ 4 outside relapse Regression model without baseline hazard or proneness to failure
Brichetto 2020 Conversion to progressive MS ND Unclear reporting ABILHAND item 12, ABILHAND total, HADS item 7, HADS sub1, HADS sub2, HADS total, Life Satisfaction Index total, MFIS item 2, MFIS sub1, MFIS sub2, MFIS sub3, MFIS total, Overactive Bladder Questionnaire item 1, Overactive Bladder Questionnaire item 4, Overactive Bladder Questionnaire total, Functional Independence Measure item 10, Functional Independence Measure item 11, Functional Independence Measure item 12, Functional Independence Measure item 14, Functional Independence Measure sub3, Functional Independence Measure sub4, Functional Independence Measure sub5, Functional Independence Measure sub6, Functional Independence Measure total, Montreal Cognitive Assessment item 1, Montreal Cognitive Assessment item 9, Montreal Cognitive Assessment tot1, Montreal Cognitive Assessment tot2, PASAT, SDMT, years of education, height, weight List of selected predictors
Calabrese 2013 Conversion to progressive MS EDSS increase ≥ 1.0 EDSS not related to a relapse and confirmed at 6 months Up to 5 years, time to outcome median (range): 52 months (29 months to 64 months) Age, cortical lesion volume, cerebellar cortical volume Full regression model
Manouchehrinia 2019 Conversion to progressive MS Time to earliest recognised date of SPMS onset determined by neurologist at routine visit Follow‐up mean (SD): 12.5 years (8.7 years) Calendar year of birth, male sex, onset age, first‐recorded EDSS score, age at the first‐recorded EDSS score Nomograms for calculating probabilities of 10, 15, and 20 year risk (web app at https://aliman.shinyapps.io/SPMSnom/)
Misicka 2020 10 years Conversion to progressive MS Time from participant‐reported age of RRMS onset to participant‐reported age of SPMS onset Up to 10 years Age of MS onset, male sex, time to second relapse, cancer, brainstem/bulbar, HLA‐A*02:01 0.60 Nomogram
Misicka 2020 20 years Conversion to progressive MS Time from participant‐reported age of RRMS onset to participant‐reported age of SPMS onset Up to 20 years Age of MS onset, male sex, time to second relapse, obesity, neurological disorders, HLA‐A*02:01 0.56 Nomogram
Misicka 2020 ever Conversion to progressive MS Time from participant‐reported age of RRMS onset to participant‐reported age of SPMS onset ND Age of MS onset, male sex, time to second relapse, neurological disorders, spasticity, HLA‐A*02:01 Nomogram
Pinto 2020 SP Conversion to progressive MS SPMS diagnosis by clinician Unclear (with predictors at 2 years from baseline) ND ND
Pisani 2021 Conversion to progressive MS Time to occurrence of continuous disability accumulation independent of relapses confirmed 12 months later (transitory plateaus in the progressive course were allowed) Follow‐up mean (range): 9.55 years (6.8 years to 13.13 years) At onset: cortical lesion number, age, EDSS, white matter lesion number; difference (between 0 years and 2 years): global cortical thickness, cerebellar cortical volume, new cortical lesion number Combination of heat map value for 2 predictors plus other predictor values weighted by their minimal depth
Seccia 2020 180 days Conversion to progressive MS Assessed by treating clinician 180 days from the index visit Longitudinal trajectories of: age at onset, gender, age at visit, EDSS, number relapses from last visit, pregnancy, relapses frequency, time from last relapse, spinal cord, supratentorial, optic pathway, brainstem‐cerebellum, relapse treatment drugs, first‐line DMT, immunosuppressant, MS symptomatic treatment drugs, second‐line DMT, other concomitant diagnosis drugs ND
Seccia 2020 360 days Conversion to progressive MS Assessed by treating clinician 360 days from the index visit Longitudinal trajectories of: age at onset, gender, age at visit, EDSS, Number relapses from last visit, pregnancy, relapses frequency, time from last relapse, spinal cord, supratentorial, optic pathway, brainstem‐cerebellum, relapse treatment drugs, first‐line DMT, immunosuppressant, MS symptomatic treatment drugs, second‐line DMT, other concomitant diagnosis drugs ND
Seccia 2020 720 days Conversion to progressive MS Assessed by treating clinician 720 days from the index visit Longitudinal trajectories of: age at onset, gender, age at visit, EDSS, number relapses from last visit, pregnancy, relapses frequency, time from last relapse, spinal cord, supratentorial, optic pathway, brainstem‐cerebellum, relapse treatment drugs, first‐line DMT, immunosuppressant, MS symptomatic treatment drugs, second‐line DMT, other concomitant diagnosis drugs ND
Skoog 2014 Conversion to progressive MS Time from RRMS onset to retrospectively‐determined continuous progression for at least 1 year without remission Time to outcome median (range): 11.5 years (0.7 years to 56.7 years) Age, attack grade, time since last relapse (interaction with attack grade) Web app at http://msprediction.com
Tacchella 2018 180 days Conversion to progressive MS Gradual worsening of RRMS course determined by change in EDSS independent of relapses over a period of at least 6 or 12 months 180 days after visit of interest Age at onset, cognitive impairment, age at visit, visual impairment, routine visit, visual field deficits/scotoma, suspected relapse, diplopia (oculomotor nerve palsy), ambulation index, nystagmus, 9HPT (right), trigeminal nerve impairment, 9HPT (left), hemifacial spasm, PASAT, dysarthria, T25FW, dysphagia, impairment in daily living activities, facial nerve palsy, upper‐limb motor deficit, lower‐limb motor deficit, upper limb ataxia, lower limb ataxia, dysaesthesia, hypoesthesia, paraesthesia, Lhermitte's sign, urinary dysfunction, fatigue, bowel dysfunction, mood disorders, sexual dysfunction, tremor, ataxic gait, headache, paretic gait, spastic gait, EDSS, pyramidal functions, cerebellar functions, brainstem functions, sensory functions, bowel and bladder function, visual function, cerebral (or mental) functions, ambulation score ND
Tacchella 2018 360 days Conversion to progressive MS Gradual worsening of RRMS course determined by change in EDSS independent of relapses over a period of at least 6 or 12 months 360 days after visit of interest Age at onset, cognitive impairment, age at visit, visual impairment, routine visit, visual field deficits/scotoma, suspected relapse, diplopia (oculomotor nerve palsy), ambulation index, nystagmus, 9HPT (right), trigeminal nerve impairment, 9HPT (left), hemifacial spasm, PASAT, dysarthria, T25FW, dysphagia, impairment in daily living activities, facial nerve palsy, upper‐limb motor deficit, lower‐limb motor deficit, upper limb ataxia, lower limb ataxia, dysaesthesia, hypoesthesia, paraesthesia, Lhermitte's sign, urinary dysfunction, fatigue, bowel dysfunction, mood disorders, sexual dysfunction, tremor, ataxic gait, headache, paretic gait, spastic gait, EDSS, pyramidal functions, cerebellar functions, brainstem functions, sensory functions, bowel and bladder function, visual function, cerebral (or mental) functions, ambulation score ND
Tacchella 2018 720 days Conversion to progressive MS Gradual worsening of RRMS course determined by change in EDSS independent of relapses over a period of at least 6 or 12 months 720 days after visit of interest Age at onset, cognitive impairment, age at visit, visual impairment, routine visit, visual field deficits/scotoma, suspected relapse, diplopia (oculomotor nerve palsy), ambulation index, nystagmus, 9HPT (right), trigeminal nerve impairment, 9HPT (left), hemifacial spasm, PASAT, dysarthria, T25FW, dysphagia, impairment in daily living activities, facial nerve palsy, upper‐limb motor deficit, lower‐limb motor deficit, upper limb ataxia, lower limb ataxia, dysaesthesia, hypoesthesia, paraesthesia, Lhermitte's sign, urinary dysfunction, fatigue, bowel dysfunction, mood disorders, sexual dysfunction, tremor, ataxic gait, headache, paretic gait, spastic gait, EDSS, pyramidal functions, cerebellar functions, brainstem functions, sensory functions, bowel and bladder function, visual function, cerebral (or mental) functions, ambulation score ND
Vasconcelos 2020 Conversion to progressive MS Time until confirmed progressive and sustained worsening for at least 6 months (irreversible EDSS increase (for EDSS baseline) ≥ 1.0 (≤ 5.5), 0.5 point (> 5.5) independent of relapse) Time to outcome mean (SD): 13.70 years (8.88 years) Pyramidal and cerebellar impairment at onset of the disease, treatment before EDSS 3, age at disease onset, African descent, time between first and second relapses (unclear if the coefficient for 'recovery' is needed or the model fit without recovery is presented) Unweighted sum score from 0 to 5 (unclear if based on refit minus 'recovery,' found to be insignificant at multivariable analysis)
Ahuja 2021 Composite (relapse) Clinical/radiological relapse (radiological relapse: new T1‐enhancing lesion or new/enlarging T2‐FLAIR hyperintense lesion on brain, orbit, or spinal cord MRI) 1 year Step 1 model: 111 features (12 Current Procedural Terminology (CPT) codes, 60 CUIs derived from free‐text, and 35 PheCodes from ICD data, age, sex, race, disease duration), step 2 model: age, disease duration, relapse history estimated by model 1 Regression coefficients without intercept
de Groot 2009 cognitive Composite (cognitive ‐ see definition) Cognitive impairments: score of mean – SD for 1 or more subtests of a cognitive screening test (subscales of consistent long‐term retrieval and long‐term storage of the selective reminding test, 10/36 spatial recall test, symbol digit modalities test, PASAT, and word list generation) 3 years Age, gender, how well can you concentrate?, T2‐weighted supratentorial lesion load Score chart
Kosa 2022 Composite (see definition) MS‐DSS (model output based on measured CombiWISE (EDSS, SNRS, T25FW, NDH‐9HPT), therapy adjusted CombiWISE, COMRIS‐CTD (lesion/atrophy measures), time from disease onset to first therapy, difference between adjusted and unadjusted CombiWISE, age, and family history of MS) Follow‐up mean: 4.3 years SOMAmer ratios, age, sex ND
Pellegrini 2019 Composite (EDSS, T25FW, 9HPT, PASAT, VFT) Time to disability progression (EDSS increase (for EDSS baseline) ≥ 1 (≥ 1), 1.5 (< 1) or 20% worsening on either T25FW or 9HPT or PASAT or 10‐letter worsening on VFT) confirmed at 24 weeks 2 years PASAT, SF‐36 physical component summary, visual function test Regression coefficients without baseline hazard

2D: 2‐dimensional
3D: 3‐dimensional
9HPT: 9‐hole peg test
ABILHAND: interview‐based assessment of a patient‐reported measure of the perceived difficulty in using their hand to perform manual activities
Ada: adaptive boosting
BMS: benign MS
BPF: brain parenchymal fraction
BREMS: Bayesian Risk Estimate for Multiple Sclerosis
CDMS: clinically definite MS
CLCN4: chloride voltage‐gated channel 4
CombiWISE: Combinatorial Weight‐adjusted Disability Score
COMRIS‐CTD: Combinatorial MRI scale of CNS tissue destruction
CPT: current procedural terminology
CSF: cerebrospinal fluid
CUIs: concept unique identifiers
CXCL13: chemokine ligand 13
CV: cross‐validation
dGM: deep grey matter
DIS: dissemination in space
DIT: dissemination in time
DIT2010: dissemination in time according to McDonald 2010 criteria
DMT: disease‐modifying therapy
DSS: Disability Status Scale
DT: decision tree
EDSS: Expanded Disability Status Scale
FLAIR: fluid‐attenuated inversion recovery
FLJ10201: anti‐YEATS2 antibody
FLP: first level predictor
FTP: fine tuning predictor
GD: gadolinium
GM: grey matter
HADS: Hospital Anxiety and Depression Scale
HLA: human leukocyte antigen
ICBM‐DTI: International Consortium for Brain Mapping diffusion tensor imaging (white matter atlas)
ICD: International Classification of Diseases
IFN: interferon
IgG: immunoglobulin G
IL2: interleukin‐2
ILIRN: interleukin‐1 receptor antagonist
IQR: interquartile range
KIAA1043: a gene
LGBM: light gradient boosting machine
MEFV: Mediterranean fever gene
MEP/mEPS: motor evoked potentials
MFIS: Modified Fatigue Impact Scale
MNI: Montreal Neurological Institute
MR: magnetic resonance
MRI: magnetic resonance imaging
MS: multiple sclerosis
MS‐DSS: MS disease severity scale
MTR: magnetisation transfer ratio
ND: no data available
NDH‐9HPT: non‐dominant hand 9‐hole peg test
NEMO: network modification tool
NF‐L: neurofilament light chain
NOS2: nitric oxide synthase 2
OB: oligoclonal bands
OCB: oligoclonal bands
PASAT: Paced Auditory Serial Addition Test
PD: proton density
PDCD2: human programmed cell death protein 2
PITPNCI: phosphatidylinositol transfer protein
PNMT: phenylethanolamine‐N‐methyltransferase gene
PP: primary progressive
PPMS: primary progressive MS
RF: random forest
RNFL: retinal nerve fibre layer
RRMS: relapsing‐remitting multiple sclerosis
SD: standard deviation
SDMT: symbol digit modalities test
SF‐36: 36‐Item Short Form Health Survey
SLM1: a gene
SMARCA1: SWI/SNF related, matrix associated, actin dependent regulator of chromatin, subfamily a, member 1
SMS: severe multiple sclerosis
SNRS: Scripps neurologic rating scale
SOMAmer: slow off‐rate modified aptamer (a short, single‐stranded deoxyoligonucleotide)
SP: secondary progression
SPMS: secondary progressive MS
T25FW: timed 25‐foot walk
T2LV: T2 lesion volume
T2w: T2‐weighted
TGFB2: transforming growth factor beta 2
TRIM22: tripartite motif containing 22
VFT: visual function test
WM: white matter
XGB: extreme gradient boosting
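Several of the final models in the table above are presented as score charts or unweighted sum scores rather than full regression equations. As an illustration of how such a presentation is applied, the sketch below encodes the sum score described for Gout 2011 (1 point if age at onset ≤ 31 years, 3 points if 3 to 4 Barkhof criteria are present, 1 point if > 4 white blood cells per cubic millimetre in CSF). The function and argument names are illustrative assumptions; the score and any risk thresholds should be taken from the original publication before use.

```python
def gout_2011_sum_score(age_at_onset: float,
                        barkhof_criteria_met: int,
                        csf_wbc_per_mm3: float) -> int:
    """Illustrative re-implementation of the sum score described for
    Gout 2011 in the table above (possible range 0 to 5); higher scores
    indicate a higher risk of conversion to definite MS."""
    score = 0
    if age_at_onset <= 31:
        score += 1
    if barkhof_criteria_met in (3, 4):
        score += 3
    if csf_wbc_per_mm3 > 4:
        score += 1
    return score

# Example: a 28-year-old with 3 Barkhof criteria and 6 white blood cells per mm3
print(gout_2011_sum_score(28, 3, 6))  # prints 5
```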

Appendix 6. Additional figures

Figure 8. Tables of predictor domains included or considered in included models. Top: models developed with machine learning; bottom: models developed with traditional methods. CSF: cerebrospinal fluid, Dev: development, ND: no data available

Figure 9. Risk of bias assessments per analysis grouped by machine learning developments, traditional statistics developments, and all validations. Dev: development, ML: machine learning, Val: validation, y: year(s), d: days

Figure 10. Percent of study items reported over time by analysis type. Data for the year 2021 are incomplete (only until July). ML: machine learning.

Figure 11. Tables of TRIPOD items in included analyses. Top: models developed with machine learning; middle: models developed with traditional methods; bottom: model validations. The white box indicates that item 13c, which pertains to external validations, was not applicable for the Skoog 2019 Val analysis, which used the development participants. Dev: development, ML: machine learning, Val: validation, y: year(s), d: days

Characteristics of studies

Characteristics of included studies [ordered by study ID]

Aghdam 2021.

Study characteristics
General information Model name
Not applicable
Primary source
Journal
Data source
Cohort, secondary
Study type
Development
Participants Inclusion criteria
  • Optic neuritis patients with at least 2 of the following: pain with eye movements, decreased best corrected visual acuity, relative afferent pupillary defect, prolonged P100 latency of visual evoked potentials

  • At least 3 years' follow‐up


Exclusion criteria
  • Optic neuropathies proven to be the result of other aetiologies such as compressive, infiltrative, toxic, hereditary, and metabolic

  • Presence of any other ocular condition that could confound data gathering like diabetic retinopathy and retinal dystrophies

  • Failure to follow up patients during the study period

  • Any finding suggestive of the presence of underlying vasculitis or infectious cause for optic neuritis

  • Patients who underwent any preemptive treatment such as with interferon after ON

  • Missing data in medical records that could affect the result of this study

  • NMOSD diagnosis during follow‐up


Recruitment
Patients admitted to the ophthalmology and neurology departments of Rassoul Akram Hospital, a tertiary referral centre in Tehran, Iran
Age (years)
Mean 40.0 (at ON)
Sex (%F)
74.1
Disease duration (years)
Not reported
Diagnosis
100% CIS
Diagnostic criteria
McDonald 2010 (Polman 2011)
Treatment
  • At recruitment, 0%

  • During follow‐up, not reported


Disease description
History of ON: 11.9%
Recruitment period
2008 to 2018
Predictors Considered predictors
Age (as continuous and/or dichotomised), gender, season of attack (spring vs other), best corrected visual acuity (as continuous or dichotomised as logMAR ≤ or > 1), optic disc swelling (type of ON), ocular pain, ON history, plaque positive (white matter lesions ≥ 3 mm in diameter in juxtacortical, periventricular, infratentorial or spinal cord regions), dissemination in space (hyperintense T2 lesions in ≥ 2 of juxtacortical, periventricular, infratentorial or spinal cord regions), treatment with prednisolone
Number of considered predictors
≥ 10
Timing of predictor measurement
At presentation due to ON
Predictor handling
Unclear, all might be dichotomised and/or continuous
Outcome Outcome definition
Conversion to definite MS (Polman 2011): CDMS based on 2010 revised McDonald criteria
Timing of outcome measurement
Follow‐up mean (SD): 5.30 years (2.94 years), no precise time point of measurement
Missing data Number of participants with any missing value
60
Missing data handling
Exclusion
Analysis Number of participants (number of events)
277 (117)
Modelling method
Classification tree
Predictor selection method
  • For inclusion in the multivariable model, univariable analysis

  • During multivariable modelling, full model approach


Hyperparameter tuning
Not reported
Shrinkage of predictor weights
None
Performance evaluation dataset
Development
Performance evaluation method
Random split: 70% training, 30% test
Calibration estimate
Not reported
Discrimination estimate
Not reported
Classification estimate
Accuracy = 0.74, sensitivity = 0.71, specificity = 0.76, PPV = 0.65, NPV = 0.79
Overall performance
Not reported
Risk groups
Not reported
Model  Model presentation
Decision tree
Number of predictors in the model
4
Predictors in the model
Presence of plaque in MRI, history of previous optic neuritis attack, type of optic neuritis, gender
Effect measure estimates
Tree given
Predictor influence measure
Not reported
Validation model update or adjustment
Not applicable
Interpretation  Aim of the study
To evaluate the predisposing factors of conversion to MS in the Iranian population with ON to organise a decision tree for predicting the probability of conversion to MS
Primary aim
The primary aim of this study is the prediction of individual outcomes.
Model interpretation
Probably confirmatory
Suggested improvements
A planned study, survival analysis, incorporation of other factors such as CSF components or serum vitamin D level, addition of visual outcomes, and use of the McDonald 2017 criteria (Thompson 2018b)
Notes Applicability overall
Low
Applicability overall rationale
Study authors confirmed that participants who had already experienced the outcome at baseline were not included in the development set.
 
Item Authors' judgement Support for judgement
Participants Unclear Participants were excluded for having missing data.
Predictors Yes The predictors were collected by fellows and are sufficiently standardised.
Outcome No Although some predictors could have been known while assessing parts of the outcome, we consider the outcome to be robust to such information. However, a specific time point for outcome assessment was not used and its variability is high. Also, dissemination in time and space are amongst the predictors, which form part of the outcome definition.
Analysis No The EPV was around 10 for the entire dataset. Predictors were dichotomised and selected prior to multivariable modelling. The differing outcome time was not addressed. Neither discrimination nor calibration was assessed. A random split was used for validation.
Overall No At least one domain is at high risk of bias.
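The analysis items above describe a classification tree evaluated on a 70%/30% random split and summarised only with classification metrics (accuracy, sensitivity, specificity, PPV, NPV), without discrimination or calibration. A minimal sketch of this type of workflow is given below using scikit‐learn and simulated placeholder data; it is not the study authors' code, and the simulated predictors and outcome are assumptions used only to make the example runnable.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import confusion_matrix

rng = np.random.default_rng(0)
# Placeholder data standing in for dichotomised clinical predictors and a
# binary "conversion to definite MS" outcome (277 participants, 4 predictors).
X = rng.integers(0, 2, size=(277, 4))
y = rng.integers(0, 2, size=277)

# 70% training, 30% test random split, as described in the analysis section.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
tn, fp, fn, tp = confusion_matrix(y_test, tree.predict(X_test)).ravel()

print("accuracy   ", (tp + tn) / (tp + tn + fp + fn))
print("sensitivity", tp / (tp + fn))
print("specificity", tn / (tn + fp))
print("PPV        ", tp / (tp + fp))
print("NPV        ", tn / (tn + fn))
```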

Agosta 2006.

Study characteristics
General information Model name
Not applicable
Primary source
Journal
Data source
Cohort, primary
Study type
Development
Participants Inclusion criteria
  • Definite MS for at least 2 years, either RR or SP, or CIS within 3 months with paraclinical evidence of spatial disease dissemination (at least 4 focal abnormalities on T2 scans)

  • Participated in the short‐term follow‐up study by Rovaris (2003)

  • No immunosuppressive or immunomodulating treatments for at least 12 months prior to entry

  • No relapses or steroidal treatment during the 3 months preceding both baseline and follow‐up scans


Exclusion criteria
Not reported
Recruitment
All participants had been recruited previously by Filippi (2000) and followed up in the short term by Rovaris (2003) at the University Hospital San Raffaele in Milan, Italy (unclear; based on Ethics Committee approval)
Age (years)
Mean 33.5
Sex (%F)
69.9
Disease duration (years)
Range: 0 to 25
Diagnosis
27.4% CIS, 46.6% RRMS, 26.0% SPMS
Diagnostic criteria
Mixed: Poser 1983, Lublin 1996
Treatment
  • At recruitment, 0%

  • During follow‐up, 54.8% (no specific treatment details)


Disease description
EDSS median (range): CIS 0.0 (0.0 to 1.5), RRMS 2.5 (1.0 to 5.5), SPMS 5.5 (3.5 to 6.5)
Recruitment period
Not reported
Predictors Considered predictors
Age, disease duration, clinical phenotype (CIS+RR vs SP), baseline EDSS, baseline T2 lesion volume, baseline T1 lesion volume, baseline brain parenchymal fraction, baseline grey matter fraction, baseline white matter fraction, baseline average whole‐brain magnetisation transfer ratio, baseline average grey matter magnetisation transfer ratio, baseline average normal‐appearing white matter magnetisation transfer ratio, baseline average lesion magnetisation ratio, baseline whole‐brain magnetisation transfer ratio histogram peak height, baseline grey matter histogram peak height, baseline normal‐appearing white matter histogram peak height, brain parenchymal fraction percentage change, grey matter fraction percentage change, white matter fraction percentage change, average whole‐brain magnetisation transfer ratio percentage change, average grey matter magnetisation transfer ratio percentage change, average normal‐appearing white matter magnetisation transfer ratio percentage change, average lesion magnetisation transfer ratio percentage change, (unclear adjustment for follow‐up duration)
Number of considered predictors
26
Timing of predictor measurement
At study baseline (cohort entry at least 2 years after diagnosis of definite MS or 3 months after CIS), 12 months (± 10 days) after baseline, at final follow‐up (outcome measurement)
Predictor handling
Continuously
Outcome Outcome definition
Disability (EDSS): clinically worsened defined as an EDSS score increase ≥ 1.0, when baseline EDSS was < 6.0, or an EDSS score increase ≥ 0.5 when baseline EDSS was ≥ 6.0; EDSS changes had to be confirmed by a second visit after a 3‐month, relapse‐free period
Timing of outcome measurement
Follow‐up median: 8 years, mean: 7.7 years
Missing data Number of participants with any missing value
6, only missing outcome reported
Missing data handling
Mixed: last value carried forward, complete case
Analysis Number of participants (number of events)
70 (44)
Modelling method
Logistic regression
Predictor selection method
  • Univariable analysis

  • Mentions a final multivariable model but no selection at multivariable stage


Hyperparameter tuning
Not applicable
Shrinkage of predictor weights
No
Performance evaluation dataset
Development
Performance evaluation method
Cross‐validation: LOOCV
Calibration estimate
Not reported
Discrimination estimate
Not reported
Classification estimate
Accuracy = 46/70, sensitivity = 30/41, specificity = 16/29
Overall performance
Nagelkerke's R2 = 0.28
Risk groups
Not reported
Model  Model presentation
Regression coefficients without the intercept or coefficient for "adjustment for follow‐up duration"
Number of predictors in the model
2 or 3 (unclear if follow‐up duration included)
Predictors in the model
Baseline GM histogram peak height, average lesion MTR percentage change after 12 months, follow‐up duration
Effect measure
OR (95% CI): baseline GM histogram peak height 0.97 (0.94 to 0.99), average lesion MTR percentage change after 12 months 0.88 (0.80 to 0.98), follow‐up duration (not reported)
Predictor influence
Not applicable
Validation model update or adjustment
Not applicable
Interpretation  Aim of the study
To assess the value of MT MRI quantities and their short‐term changes in predicting the long‐term accumulation of disability in multiple sclerosis patients
Primary aim
The primary aim of this study is not the prediction of individual outcomes. Rather, the focus is on the usefulness of MRI measures.
Model interpretation
Exploratory
Suggested improvements
Not reported
Notes Applicability overall
High
Applicability overall rationale
Although this study contained a model and its assessment, the main aim was not to create a model for prediction of individual outcomes but was rather to show the usefulness of new MRI measures.
Auxiliary references
Filippi M, Inglese M, Rovaris M, Sormani MP, Horsfield P, Iannucci PG, et al. Magnetization transfer imaging to monitor the evolution of MS: a 1‐year follow‐up study. Neurology 2000;55(7):940‐6.
Lechner‐Scott J, Kappos L, Hofman M, Polman CH, Ronner H, Montalban X, et al. Can the expanded disability status scale be assessed by telephone? Mult Scler 2003;9(2):154‐9.
Rovaris M, Agosta F, Sormani MP, Inglese M, Martinelli V, Comi G, et al. Conventional and magnetisation transfer MRI predictors of clinical multiple sclerosis evolution: a medium‐term follow‐up study. Brain 2003;126(Pt 10):2323‐32.
 
Item Authors' judgement Support for judgement
Participants Yes Patients were included probably from a prospectively designed cohort study with clear eligibility criteria.
Predictors Yes Even though there is no clear indication that predictors were collected in the same way across patients, the authors described deviations in how the outcome was assessed, suggesting they would also have described any differences in how the predictor variables were assessed.
Outcome Yes Although EDSS was determined differently either in person or by phone, EDSS assessment by phone has been shown to be valid. It is unclear if the outcome assessment was blinded to predictors, but we consider EDSS to be an objective measure.
Analysis No The EPV was far less than 10. Predictors were included based on univariable analyses. Calibration and discrimination were not assessed. Although cross‐validation was used, it is unclear whether the variable selection process was included within this procedure. It is unclear if all participants were analysed and how missing data were handled. Follow‐up time was added as a predictor instead of using methods to deal with different observation times.
Overall No At least one domain is at high risk of bias.
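The performance evaluation described above is leave‐one‐out cross‐validation of a logistic regression model, summarised as counts of correctly classified participants. The sketch below illustrates the mechanics on simulated placeholder data (70 participants, 44 events, two continuous predictors); it is not the study authors' analysis, it refits the model within every fold, and it omits variable selection entirely, which is the step the risk of bias assessment notes as unclear in the original procedure.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut, cross_val_predict

rng = np.random.default_rng(1)
# Placeholder continuous predictors (standing in for baseline MRI quantities)
# and a binary worsening outcome for 70 participants with 44 events.
X = rng.normal(size=(70, 2))
y = np.array([1] * 44 + [0] * 26)

# Leave-one-out cross-validation: the model is refit 70 times, each time
# predicting the single held-out participant.
pred = cross_val_predict(LogisticRegression(), X, y, cv=LeaveOneOut())

print("accuracy   ", (pred == y).mean())
print("sensitivity", pred[y == 1].mean())
print("specificity", 1 - pred[y == 0].mean())
```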

Ahuja 2021.

Study characteristics
General information Model name
  • Dev

  • Ext Val


Primary source
Journal
Data source
  • Dev: mixed (routine care‐electronic health records, cohort), secondary

  • Ext Val: routine care‐electronic health records, secondary


Study type
Development + external validation (spectrum)
Participants Inclusion criteria
  • Dev

    • ≥ 18 years of age

    • Neurologist‐confirmed MS diagnosis

    • Linked EHR data

  • Ext Val

    • MS

    • Neurological care at Mass General Brigham

    • Annotated relapse events


Exclusion criteria
  • Dev: not reported

  • Ext Val: not part of the training set


Recruitment
  • Dev: patients in the Comprehensive Longitudinal Investigation of Multiple Sclerosis at Brigham and Women's Hospital (CLIMB) cohort of Brigham Multiple Sclerosis Centre, USA

  • Ext Val: random selection from electronic health records of the Mass General Brigham Healthcare system, USA


Age (years)
Median 43.3 (at first MS code)
Sex (%F)
  • Dev: 73.9

  • Ext Val: 74.2


Disease duration (years)
  • Dev: median 5.1 (IQR: 2.03)

  • Ext Val: median 4.4 (IQR: 2.82)


Diagnosis
Approximately 70% to 80% RRMS, 10% PPMS, 10% to 20% SPMS
Diagnostic criteria
Not reported
Treatment
  • At recruitment, not reported

  • During follow‐up, unclear; across all patients in the study, 55% were on treatment


Disease description
Not reported
Recruitment period
2006 to 2016
Predictors Considered predictors
  • Dev: age, sex, race, disease duration, relapse history (RH), unclear number of EHR predictors (ICD consolidated into PheCode, CPT consolidated into groupings, unclear concepts retrieved from free‐text)

  • Ext Val: not applicable


Number of considered predictors
  • Dev: 2730

  • Ext Val: not applicable


Timing of predictor measurement
  • Dev: from 1 year before the index encounter up to the index encounter (unspecified)

  • Ext Val: not applicable


Predictor handling
  • Dev: continuously (log(n + 1) transformation of counts)

  • Ext Val: not applicable

Outcome Outcome definition
Composite (relapse): a relapse event as a clinical and/or radiological relapse; clinical relapse: new or recurrence of neurological symptoms lasting persistently for ≥ 24 h without fever or infection; radiological relapse: either a new T1‐enhancing lesion and/or a new or enlarging T2‐FLAIR hyperintense lesion on brain, orbit, or spinal cord MRI on clinical radiology report.
Timing of outcome measurement
1 year
Missing data Number of participants with any missing value
Not reported
Missing data handling
Not reported
Analysis Number of participants (number of events)
  • Dev: 1435 participants, unit of analysis is visits and its number is not reported (not reported)

  • Ext Val: 186 participants, unit of analysis is visits and its number is not reported (not reported)


Modelling method
  • Dev: logistic regression, LASSO

  • Ext Val: not applicable


Predictor selection method
  • Dev:

    • For inclusion in the multivariable model, univariable analysis

    • During multivariable modelling, mixed

      • Several models

      • Modelling method

  • Ext Val: not applicable


Hyperparameter tuning
  • Dev: model step 1 only: 10‐fold CV for lambda to maximise the Spearman correlation between observed and predicted relapse count

  • Ext Val: not applicable


Shrinkage of predictor weights
  • Dev: modelling method

  • Ext Val: not applicable


Performance evaluation dataset
  • Dev: none

  • Ext Val: external validation


Performance evaluation method
Not applicable
Calibration estimate
  • Dev: not reported

  • Ext Val: plots comparing observed and predicted relapse proportions stratified by disease duration and age separately


Discrimination estimate
  • Dev: not reported

  • Ext Val: c‐statistic = 0.707 (95% CI 0.69 to 0.71)


Classification estimate
Sensitivity = 0.499, specificity = 0.719, PPV = 0.223, NPV = 0.900, F1 = 0.307
Overall performance
Not reported
Risk groups
  • Dev: not reported

  • Ext Val: 2 groups defined by a time‐dependent threshold equal to the observed prevalence within ± 1 year of a patient's time since first relapse

Model  Model presentation
  • Dev: regression coefficients without intercept

  • Ext Val: not applicable


Number of predictors in the model
  • Dev: 114 (model 1: 111, model 2: 3)

  • Ext Val: not applicable


Predictors in the model
  • Dev:

    • Step 1 model: 111 features (12 Current Procedural Terminology (CPT) codes, 60 CUIs derived from free‐text, and 35 PheCodes from ICD data, plus age, sex, race, and disease duration)

    • Step 2 model: age, disease duration, relapse history estimated by model 1

  • Ext Val: not applicable


Effect measure estimates
  • Dev:

    • Step 1 model: see Ahuja 2021 supplementary table 1

    • Step 2 model (log OR): age −0.019216, disease duration −0.033818, estimated relapse history 2.924231

  • Ext Val: not applicable


Predictor influence measure
  • Dev: not reported

  • Ext Val: not applicable


Validation model update or adjustment
  • Dev: not applicable

  • Ext Val: none

Interpretation  Aim of the study
To develop and test a clinically deployable model for predicting 1‐year relapse risk in MS patients
Primary aim
The primary aim of this study is the prediction of individual outcomes
Model interpretation
Probably exploratory
Suggested improvements
To incorporate MRI features
Notes Applicability overall
Low
 
Item Authors' judgement Support for judgement
Participants No EHRs contain data collected for a different purpose; the data are collected without a protocol, and various users enter data into the records. Therefore, the data are extremely heterogeneous.
Predictors Yes
  • Dev: The basic set of predictors includes age, sex, race, and disease duration and the others are consolidated EHR codes. The model seems to be applicable at any visit.

  • Ext Val: The basic set of predictors includes age, sex, race, and disease duration, which should not be problematic. The other predictors are EHR codes, which are expected to be standardised. The model seems to be applicable at any time.

Outcome Yes
  • Dev: We rated this domain for this analysis as having a low risk of bias. Although it is unclear whether the outcome data come from the CLIMB dataset, they are expected to come from the CLIMB cohort, given the cumbersome work of deriving the outcome in the EHR dataset and its reported unreliability there.

  • Ext Val: We rated this domain for this analysis as having a high risk of bias. Relapse history is described as unreliable in the EHR, yet the outcome itself is based on relapse. Also, the definition of the outcome contains a radiological component, without any specification of the measurement method or standardisation between the sites.

Analysis No
  • Dev: The EPV was low. No information about missing data and how it was handled was given. The model was trained on correlated data (overlapping periods from patient visits to outcome assessment), which was addressed only by keeping all observations from a single patient together when creating folds. The uncertainty of the first model step was probably not carried through to the second step of the model. It was unclear if the initial univariable feature selection was performed in the total set or training only.

  • Ext Val: The number of events was unclear, but the relapse rate was reported to be low and only 186 participants were included. No information about missing data and how it was handled was given. It was unclear whether the external validation set contained correlated, even overlapping, data (multiple visits from the same patients), and whether this was accounted for.

Overall No At least one domain is at high risk of bias.

Bejarano 2011.

Study characteristics
General information Model name
  • Dev

  • Val


Primary source
Journal
Data source
Cohort, primary
Study type
Development + validation (model refit), location
Participants Inclusion criteria
  • Short‐medium disease duration (< 10 years)

  • Any disease subtype fulfilling MS criteria (CIS patients had to fulfil the criteria of spatio‐temporal dissemination)

  • No relapses in the month prior to inclusion


Exclusion criteria
  • Conditions that prevent patients from undergoing motor evoked potentials (MEP) or MRI studies

  • EDSS > 7.0


Recruitment
  • Dev: consecutive patients with MS at the University of Navarra, Spain

  • Val: Hospital San Raffaele (Milan), Italy


Age (years)
  • Dev: mean 35.1

  • Val: mean 37.0


Sex (%F)
  • Dev: 64.7

  • Val: 66.7


Disease duration (years)
  • Dev: mean 5.9 (SD: 7.4)

  • Val: mean 9 (SD: 6)


Diagnosis
  • Dev: 31.4% CIS, 51.0% RRMS, 5.9% SPMS, 7.8% PPMS, 3.9% PRMS

  • Val: 88.5% RRMS, 11.5% SPMS


Diagnostic criteria
McDonald 2005 (Polman 2005)
Treatment
  • Dev:

    • At recruitment, 54.9% on DMT

    • During follow‐up, not reported

  • Val: not reported


Disease description
  • Dev: EDSS median (range): 2.0 (0 to 6), number of relapses in previous 2 years mean (SD): 1.29 (1.51)

  • Val: EDSS median (range): 1.5 (0 to 6.5)


Recruitment period
Not reported
Predictors Considered predictors
  • Dev: disease subtype, sex, age, EDSS at study entry, motor function score of EDSS (MF), Multiple Sclerosis Functional Composite (MSFC), motor scores of MSFC (TWT and NHPT), use of disease‐modifying therapies, total lesion volume on T1 (unclear T2), gadolinium‐enhancing T1, GM and WM volumes, central motor conduction time (CMCT), Motor Evoked Potential (MEP) score, aggregated MEP score, worst Z score from the 4 limbs, abnormal MEP

  • Val: not applicable


Number of considered predictors
  • Dev: 22 or 23 (unclear transformation)

  • Val: not applicable


Timing of predictor measurement
  • Dev: at study baseline (cohort entry)

  • Val: not applicable


Predictor handling
  • Dev: MEP amplitude and latency dichotomised

  • Val: not applicable

Outcome Outcome definition
Disability (EDSS): change in EDSS as numeric delta
Timing of outcome measurement
At 2 years
Missing data Number of participants with any missing value
  • Dev: ≥ 8

  • Val: not reported


Missing data handling
  • Dev: single imputation (by worst latency) of missing MEP data, last value carried forward for outcomes

  • Val: not reported

Analysis Number of participants (number of events)
  • Dev: 51 (continuous outcome)

  • Val: 96 (continuous outcome)


Modelling method
  • Dev: neural network, multilayer perceptron (see the sketch below)

  • Val: not applicable
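
A multilayer perceptron for a continuous outcome such as 2‐year EDSS change, evaluated with repeated 10‐fold cross‐validation as reported further down in this table, could be sketched as below. The scikit‐learn estimator, the network size, and the toy data are illustrative assumptions, not the original implementation (which reportedly also used early stopping and wrapper‐based feature selection).

    # Illustrative sketch: multilayer perceptron regressor for a continuous EDSS-change outcome,
    # evaluated with 10 repeats of 10-fold cross-validation. Not the original implementation.
    import numpy as np
    from sklearn.neural_network import MLPRegressor
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.model_selection import RepeatedKFold, cross_val_score

    rng = np.random.default_rng(1)
    X = rng.normal(size=(51, 5))                      # 51 participants, 5 predictors (toy values)
    y = rng.normal(loc=0.5, scale=1.0, size=51)       # hypothetical 2-year change in EDSS

    model = make_pipeline(
        StandardScaler(),
        MLPRegressor(hidden_layer_sizes=(8,), early_stopping=True, max_iter=2000, random_state=1),
    )
    cv = RepeatedKFold(n_splits=10, n_repeats=10, random_state=1)
    scores = cross_val_score(model, X, y, cv=cv, scoring="neg_mean_squared_error")
    print("mean cross-validated MSE:", round(-scores.mean(), 3))
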


Predictor selection method
  • Dev:

    • For inclusion in the multivariable model, all candidate predictors

    • During multivariable modelling, stepwise selection

      • Direction unclear, accuracy as criterion

  • Val: not applicable


Hyperparameter tuning
  • Dev: early stopping (minimum MSE in validation set) mentioned

  • Val: not applicable


Shrinkage of predictor weights
  • Dev: modelling method

  • Val: not applicable


Performance evaluation dataset
  • Dev: development

  • Val: external validation


Performance evaluation method
  • Dev: cross‐validation, 10 times 10‐fold

  • Val: model refit to new data


Calibration estimate
Not reported
Discrimination estimate
  • Dev: unclear how the c‐statistic was produced for a numeric outcome; c‐statistic = 0.76 (SD 0.25)

  • Val: not applicable


Classification estimate
  • Dev: unclear how the classification measures were produced for a numeric outcome; accuracy = 0.80 (SD 0.14), sensitivity = 0.92, specificity = 0.61, PPV = 0.80, NPV = 0.80

  • Val: unclear how the classification measures were produced for a numeric outcome; accuracy = 0.81


Overall performance
Not reported
Risk groups
Not reported
Model  Model presentation
  • Dev: not reported

  • Val: not applicable


Number of predictors in the model
  • Dev: 5

  • Val: not applicable


Predictors in the model
  • Dev: age, worst central motor conduction time of both arms, worst central motor conduction time of both legs, at least 1 abnormal MEP, motor score of EDSS at baseline

  • Val: not applicable


Effect measure estimates
  • Dev: not reported

  • Val: not applicable


Predictor influence measure
  • Dev: clinical predictors of the test MS cohort selected by the Wrapper algorithm

  • Val: not applicable


Validation model update or adjustment
  • Dev: not applicable

  • Val: model refit

Interpretation  Aim of the study
To evaluate the usefulness of clinical, imaging and neurophysiological variables for predicting short‐term disease outcomes in MS patients
Primary aim
The primary aim of this study is the prediction of individual outcomes
Model interpretation
Exploratory
Suggested improvements
Incorporating GM atrophy or other new MRI metrics
Notes Applicability overall
Low
 
Item Authors' judgement Support for judgement
Participants Yes A cohort was formed for prediction purposes, and inclusion/exclusion criteria seem appropriate.
Predictors Yes The predictors were collected in a prospective way at a single clinic.
Outcome Yes The outcome is considered objective, so risk of bias from knowledge of predictors is not expected. The outcome was conceptualised as a change in EDSS and treated as a score that can be subtracted although EDSS is an ordinal measure. Yet, the modelling method of neural networks can accommodate interactions amongst baseline predictors, including baseline EDSS.
Analysis No The number of participants was low. Calibration was not assessed.
Overall No At least one domain is at high risk of bias.

Bendfeldt 2019.

Study characteristics
General information Model name
  • M7 placebo

  • M9 IFN

  • Linear placebo


Primary source
Journal
Data source
Randomised trial participants, secondary
Study type
Development
Participants Inclusion criteria
  • CIS patients with a monofocal or multifocal presentation of the disease, a first demyelinating event suggestive of MS, and at least 2 clinically silent lesions of at least 3 mm on a T2‐weighted brain magnetic resonance imaging (MRI) scan, at least 1 of which was ovoid, periventricular, or infratentorial

  • Age between 18 years and 45 years

  • Baseline EDSS 0 to 5


Exclusion criteria
  • Patients with any disease other than MS explaining the signs and symptoms

  • Previous episode that could possibly be attributed to an acute demyelinating event

  • Complete transverse myelitis or bilateral optic neuritis

  • Patients who received prior immunosuppressive therapy


Recruitment
  • M7 placebo and linear placebo: placebo arm participants in the BENEFIT study, a multicentre RCT with a total of 98 centres from 20 countries

  • M9 IFN: IFNb arm participants in the BENEFIT study, the same multicentre RCT with 98 centres from 20 countries

  • Countries: Austria, Belgium, Czech Republic, Denmark, France, Finland, Germany, Hungary, Italy, Netherlands, Norway, Poland, Portugal, Slovenia, Spain, Sweden, Switzerland, United Kingdom, Israel, Canada


Age (years)
  • M7 placebo: mean 29.7

  • M9 IFN: mean 29.6

  • Linear placebo: mean 30.8


Sex (%F)
  • M7 placebo: 70.5

  • M9 IFN: 65.7

  • Linear placebo: 71.0


Disease duration (years)
Up to 0.16
Diagnosis
100% CIS
Diagnostic criteria
Own definition
Treatment
  • At recruitment, 0%

  • During follow‐up:

    • M7 placebo and linear placebo: 0%

    • M9 IFN: 100% on IFN‐b


Disease description
  • M7 placebo: EDSS median (range): conv‐ 1.0 (0.0 to 2.0), conv+ 1.5 (1.0 to 2.0)

  • M9 IFN: EDSS median (range): conv‐ 2.0 (1.0 to 2.0), conv+ 2.0 (1.0 to 2.5)

  • Linear placebo: EDSS median (range): conv‐ 1.5 (1.0 to 2.0), conv+ 1.5 (1.0 to 2.0)


Recruitment period
2002 to 2005
Predictors Considered predictors
  • M7 placebo, and M9 IFN: age, sex, EDSS, GM volume, GM volume ratio, GM lesion count, Euler‐Poincare characteristic, whole brain summaries (total, mean, standard deviation) of lesion volume, surface area, and mean breadth, GM volume ratio by ROI, ROI summaries (total, mean, standard deviation) of lesion volume, surface area, and mean breadth

  • Linear placebo: cortical grey matter segmentation masks, age, sex, scanner


Number of considered predictors
  • M7 placebo and M9 IFN: 301

  • Linear placebo: unclear; the total number of dimensions was determined by the number of voxels within the cortical GM mask, and a kernel matrix was created from the images based on correlation, i.e. the similarity between each pair of participants


Timing of predictor measurement
At disease onset (CIS) (RCT baseline within 60 days after onset)
Predictor handling
  • M7 placebo and M9 IFN: continuously, transformation of volume (cubic root) and area (square root)

  • Linear placebo: unclear, probably continuously

Outcome Outcome definition
Conversion to definite MS (modified Poser): CDMS diagnosis within 2 years and confirmed by a central committee based on modified Poser; modified Poser defined as 1) a relapse with clinical evidence of at least 1 CNS lesion (distinct from the lesion responsible for the CIS presentation if the first presentation was monofocal), or 2) sustained progression by ≥ 1.5 points on the EDSS reaching a total EDSS score of ≥ 2.5 and confirmed at a consecutive visit 3 months later
Timing of outcome measurement
Median (IQR) in days
  • M7 placebo:

    • conv‐: 1780 (857 to 1805)

    • conv+: 249 (74 to 627)

  • M9 IFN:

    • conv‐: 1807 (1794 to 1820)

    • conv+: 432 (111 to 824)

  • Linear placebo:

    • conv‐: 1798 (1497 to 1808)

    • conv+: 303 (138 to 709)

Missing data Number of participants with any missing value
  • M7 placebo: 115; unclear exactly how many participants had any missing value

  • M9 IFN: 193; unclear exactly how many participants had any missing value

  • Linear placebo: 107; unclear exactly how many participants had any missing value


Missing data handling
Not reported
Analysis Number of participants (number of events)
  • M7 placebo: 61 (22)

  • M9 IFN: 99 (49)

  • Linear placebo: 69 (25)


Modelling method
Support vector machine, radial kernel
Predictor selection method
  • For inclusion in the multivariable model, all candidate predictors

  • During multivariable modelling, multiple models


Hyperparameter tuning
  • M7 placebo and M9 IFN: soft margin parameter and RBF parameter chosen based on a grid search during k‐fold CV nested within 10‐fold CV for evaluation.

  • Linear placebo: unclear, tuning parameters not specifically mentioned for linear SVM
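
The nested cross‐validation described for the M7 and M9 models (an inner grid search over the soft‐margin parameter C and the RBF parameter gamma, wrapped in an outer 10‐fold CV used only for evaluation) is a common pattern; a minimal scikit‐learn sketch follows. The parameter grids, toy data, and use of balanced accuracy as the scoring rule are assumptions for illustration rather than the study's exact settings.

    # Sketch of nested CV for an RBF-kernel SVM: inner grid search over C and gamma,
    # outer 10-fold CV reporting balanced accuracy. Illustrative assumptions only.
    import numpy as np
    from sklearn.svm import SVC
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score

    rng = np.random.default_rng(2)
    X = rng.normal(size=(61, 25))          # e.g. 61 participants, 25 imaging/clinical features (toy)
    y = rng.integers(0, 2, size=61)        # hypothetical conversion to CDMS (0/1)

    inner = GridSearchCV(
        make_pipeline(StandardScaler(), SVC(kernel="rbf")),
        param_grid={"svc__C": [0.1, 1, 10, 100], "svc__gamma": [1e-3, 1e-2, 1e-1]},
        scoring="balanced_accuracy",
        cv=StratifiedKFold(n_splits=5),
    )
    scores = cross_val_score(inner, X, y, cv=StratifiedKFold(n_splits=10), scoring="balanced_accuracy")
    print("balanced accuracy: %.3f (SD %.3f)" % (scores.mean(), scores.std()))

Keeping the StandardScaler inside the pipeline confines standardisation to each resampling split, which is the issue raised in the risk‐of‐bias assessment at the end of this entry.
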


Shrinkage of predictor weights
Modelling method
Performance evaluation dataset
Development
Performance evaluation method
  • M7 placebo, and M9 IFN: cross‐validation, nested 10‐fold CV

  • Linear placebo: LOOCV within 500 balanced bootstrap samples


Calibration estimate
Not reported
Discrimination estimate
Not reported
Classification estimate
  • M7 placebo: balanced accuracy = 0.676 (95% CI 0.559 to 0.793)

  • M9 IFN: balanced accuracy = 0.704 (95% CI 0.614 to 0.794)

  • Linear placebo: accuracy = 0.712 (95% CI 0.707 to 0.716), sensitivity = 0.64, specificity = 0.783


Overall performance
Not reported
Risk groups
Not reported
Model  Model presentation
Not reported
Number of predictors in the model
  • M7 placebo: 25 (df unclear)

  • M9 IFN: 15 (df unclear)

  • Linear placebo: not reported


Predictors in the model
  • M7 placebo: age, sex, EDSS, GM volume ratio, whole brain summaries (total, mean, median, minimum, maximum, standard deviation) of lesion volume, surface area, and mean breadth

  • M9 IFN: age, sex, EDSS, GM volume ratio, whole brain summaries (total, mean, standard deviation) of lesion volume, surface area, and mean breadth, Euler‐Poincare characteristic

  • Linear placebo: cortical grey matter segmentation masks, age, sex, scanner


Effect measure estimates
Not reported
Predictor influence measure
  • M7 placebo and linear placebo: not reported

  • M9 IFN: predictor weights


Validation model update or adjustment
Not applicable
Interpretation  Aim of the study
To determine whether pattern classification using SVMs facilitates predicting conversion to clinically definite multiple sclerosis (CDMS) from clinically isolated syndrome (CIS)
Primary aim
The primary aim of this study is not the prediction of individual outcomes. The focus is on identifying advanced MRI features capable of improving prediction.
Model interpretation
Exploratory
Suggested improvements
Feature selection methods; larger, independent test data; ensembles of classifiers; other para‐clinical markers (synthesis of oligoclonal bands and genetic factors); other lesional or degenerative MRI features
Notes Applicability overall
High
Applicability overall rationale
Although this study contained models, the main aim was not to create a model for prediction of individual outcomes but rather to identify advanced imaging features capable of improving prediction.
Auxiliary references
Bakshi R, Dandamudi VS, Neema M, De C, Bermel RA. Measurement of brain and spinal cord atrophy by magnetic resonance imaging as a tool to monitor multiple sclerosis. J Neuroimaging 2005;15(4 Suppl):30s‐45s.
Barkhof F, Polman CH, Radue EW, Kappos L, Freedman MS, Edan G, et al. Magnetic resonance imaging effects of interferon beta‐1b in the BENEFIT study: integrated 2‐year results. Arch Neurol 2007;64(9):1292‐8.
Kappos L, Polman CH, Freedman MS, Edan G, Hartung HP, Miller DH, et al. Treatment with interferon beta‐1b delays conversion to clinically definite and McDonald MS in patients with clinically isolated syndromes. Neurology 2006;67(7):1242‐9.
 
Item Authors' judgement Support for judgement
Participants Yes The data source was RCT, expected to be high in quality and with a clear eligibility assessment. The number of participants reported in the Methods section and described in results did not match, yet no exclusion criteria were reported to explain the difference. Hence, this discrepancy was addressed in the analysis domain.
Predictors Yes The predictors were derived from an RCT, expected to have sufficient standardisation, and there is no reason to believe that the feature extraction/processing was different. Although some of the predictors were created after the outcome was assessed, the predictor creation is automated, and the risk of bias is considered to be low.
Outcome Yes Imaging was assessed at a central location, and the outcome should be robust to the knowledge of demographics and baseline EDSS. The outcome is not common, but it was probably pre‐specified.
Analysis No M7 placebo arm and M9 IFNb arm: The number of patients included in Table 2 was inconsistent with the number of trial participants and what was reported in the Introduction. There was no mention of missing data, making it likely that a complete‐case analysis was performed. The number of observations per variable was low. Accuracy was computed, but not discrimination or calibration. Standardisation was done on the full dataset instead of within the resampling structure. The performance measures were calculated at the same level at which model selection, but not tuning parameter selection, would probably occur. The presentation of a final selected model is unclear.
Linear SVM: The number of patients included in Table 2 was inconsistent with the number of trial participants and what was reported in the Introduction. There was no mention of missing data, making it likely that a complete‐case analysis was performed. The number of observations per variable was low. Accuracy was computed, but not discrimination or calibration. Standardisation was done on the full dataset instead of within the resampling structure. There was no mention of SVM tuning, so performance was probably evaluated in the same data as tuning. The presentation of a final selected model is unclear.
Overall No At least one domain is at high risk of bias.

Bergamaschi 2001.

Study characteristics
General information Model name
BREMS Dev
Primary source
Journal
Data source
Mixed (registry, routine care), secondary
Study type
Development
Participants Inclusion criteria
  • Diagnosis of CDMS (Poser 1983)

  • Initial RR course (Lublin 1996)

  • Disease duration ≥ 3 years

  • Prediagnosis interval (time between symptom onset and first examination at the Institute) ≤ 12 months


Exclusion criteria
Not reported
Recruitment
Patients at the Centre for Multiple Sclerosis of Fondazione C. Mondino (Pavia), the only facility for MS patients in the district, Italy
Age (years)
Mean 28.5
Sex (%F)
62.9
Disease duration (years)
Not reported
Diagnosis
100% RRMS
Diagnostic criteria
Mixed: Poser 1983, Lublin 1996
Treatment
  • At recruitment, 5.4% copolymer, 3.7% betaIFN‐1b, 2.2% betaIFN‐1a

  • During follow‐up, 5.4% copolymer, 3.7% betaIFN‐1b, 2.2% betaIFN‐1a


Disease description
Not reported
Recruitment period
Until 1997
Predictors Considered predictors
Unclear whether this is the complete list: gender, age at onset, type of initially involved functional systems (FSs), number of initially involved FSs, whether initial relapse was followed by sequelae, interval between first 2 attacks, pre‐1 year relapse counts by type of involved FSs, maximum neurological score reached in each distinct FS, whether EDSS ≥ 4 during or outside of relapse in first year (intermediate predictors: relapses in each neurological system, FS‐specific impairment scores, EDSS evolution, use of preventive therapies)
Number of considered predictors
> 9
Timing of predictor measurement
At disease onset (RRMS) and regular visits up to 1 year after onset (baseline)
Predictor handling
EDSS dichotomised
Outcome Outcome definition
Conversion to progressive MS: time of onset of secondary progressive phase, defined as the earliest date of observation of a progressive worsening, severe enough to determine an increase of at least 1 point on the EDSS; the worsening had to persist for at least 6 months after the onset of progression in order to be confirmed
Timing of outcome measurement
Follow‐up mean (SD, range): 7.5 years (5.7 years, 3 years to 25 years)/follow‐up visit frequency every 6 months on average, but the frequency depended on the course of the disease: patients with 'active' relapsing disease were followed every 3 months, patients with 'stable' relapsing disease every 6 to 12 months
Missing data Number of participants with any missing value
Not reported
Missing data handling
Not reported
Analysis Number of participants (number of events)
186 (34)
Modelling method
Survival, Bayesian joint survival model using Monte Carlo particle filtering
Predictor selection method
  • For inclusion in the multivariable model, all candidate predictors

  • During multivariable modelling, Monte Carlo particle filtering


Hyperparameter tuning
Not applicable
Shrinkage of predictor weights
Modelling method
Performance evaluation dataset
None
Performance evaluation method
Not applicable
Calibration estimate
Not reported
Discrimination estimate
Not reported
Classification estimate
Not reported
Overall performance
Not reported
Risk groups
Not reported
Model  Model presentation
Regression model without baseline hazard or proneness to failure
Number of predictors in the model
9
Predictors in the model
Age at onset, female, sphincter onset, pure motor onset, motor‐sensory onset, sequelae after onset, number of involved FS at onset, number of sphincters plus motor relapses, EDSS ≥ 4 outside relapse
Effect measure estimates
Local relative risks (95% credible interval): age at onset 1.05 (1.02 to 1.09), female 0.39 (0.17 to 0.78), sphincter onset 2.98 (1.10 to 6.10), pure motor onset 2.11 (0.90 to 4.20), motor‐sensory onset 2.4 (1.15 to 4.41), sequelae after onset 1.76 (1.04 to 2.88), number of involved FS at onset 1.39 (1.16 to 1.64), number of sphincter plus motor relapses 2.10 (1.56 to 2.89), EDSS ≥ 4 outside relapse 2.28 (0.40 to 6.50), to be understood as hazard ratios
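
To make these estimates concrete: one generic way a linear prognostic score can be assembled from such effect estimates is as a sum of log relative risks multiplied by the individual's predictor values, as sketched below. This is an illustration of the principle only, not the published BREMS scoring algorithm; the relative risks are copied from the entry above and the example patient is hypothetical.

    # Generic illustration: a linear prognostic score as a sum of log(relative risk) * predictor value.
    # NOT the published BREMS algorithm; relative risks copied from the entry above, patient values hypothetical.
    import math

    relative_risks = {
        "age_at_onset": 1.05,                    # per year
        "female": 0.39,
        "sphincter_onset": 2.98,
        "pure_motor_onset": 2.11,
        "motor_sensory_onset": 2.40,
        "sequelae_after_onset": 1.76,
        "n_involved_fs_at_onset": 1.39,          # per functional system
        "n_sphincter_plus_motor_relapses": 2.10,
        "edss_ge_4_outside_relapse": 2.28,
    }

    def linear_risk_score(patient):
        """Higher scores correspond to a higher modelled hazard of secondary progression."""
        return sum(math.log(rr) * patient.get(name, 0) for name, rr in relative_risks.items())

    hypothetical_patient = {"age_at_onset": 30, "female": 1, "pure_motor_onset": 1,
                            "n_involved_fs_at_onset": 2, "n_sphincter_plus_motor_relapses": 1}
    print(round(linear_risk_score(hypothetical_patient), 2))
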
Predictor influence measure
Not applicable
Validation model update or adjustment
Not applicable
Interpretation  Aim of the study
With the aid of a Bayesian statistical model of the natural course of relapsing‐remitting MS, to identify short‐term clinical predictors of long‐term evolution of the disease, with particular focus on predicting onset of the secondary progressive course (the failure event) on the basis of patient information available at an early stage of disease.
Primary aim
The primary aim of this study is not the prediction of individual outcomes. Rather, the focus is on predictor identification.
Model interpretation
Probably confirmatory
Suggested improvements
Not reported
Notes Applicability overall
High
Applicability overall rationale
Although this study contained a model, the main aim was not to create a model for prediction of individual outcomes but rather to search for predictors.
 
Item Authors' judgement Support for judgement
Participants No This study used routine care data, which may introduce risk of bias.
Predictors Yes Because the data were collected at clinical visits, the predictors were probably collected without knowledge about the outcome. A group of neurologists saw the participants at the same clinic, so the predictors were probably assessed in a similar way for all patients. The proposed model only used the early predictors, which were collected within 1 year of disease onset.
Outcome No The frequency of visits depended on the disease course and was different for patients with active relapsing disease, every 3 to 6 months, vs patients with stable relapsing disease, every 6 to 12 months, which causes differential assessment of the outcome in different patients.
Analysis No The EPV was low, and the model or its optimism were not evaluated in any way. Many other details, including the method of dealing with missing data and overfitting, were not reported. EDSS was dichotomised.
Overall No At least one domain is at high risk of bias.

Bergamaschi 2007.

Study characteristics
General information Model name
BREMS
Primary source
Journal
Data source
Cohort, secondary
Study type
External validation (initial validation), location
Participants Inclusion criteria
  • Diagnosis of definite MS

  • Initial RR course

  • Disease duration ≥ 10 years

  • Interval from clinical onset to the first neurological examination ≤ 1 year


Exclusion criteria
Not reported
Recruitment
MS centres in Pavia (Northern Italy), Florence (Central Italy), Bari (Southern Italy)
Age (years)
Median 24.8
Sex (%F)
69.3
Disease duration (years)
Not reported
Diagnosis
100% RRMS
Diagnostic criteria
Poser 1983
Treatment
  • At recruitment, 2.7% on DMT

  • During follow‐up, 52.9% on DMT and 1.4% on immunosuppressive, 2.7% since the beginning, 43% never treated


Disease description
Not reported
Recruitment period
Not reported
Predictors Considered predictors
Not applicable
Number of considered predictors
Not applicable
Timing of predictor measurement
Not applicable
Predictor handling
Not applicable
Outcome Outcome definition
Conversion to progressive MS: time at which the patient reached the confirmed SP, defined as the earliest date of observation of a progressive worsening, severe enough to lead to an increase of at least 1 point on the EDSS, and confirmed at least 1 year after progression
Timing of outcome measurement
Follow‐up mean (SD, range): 17.1 years (2.1 years, 10 years to 48 years), time to endpoint median (range): 10.5 years (2 years to 44 years)/follow‐up visit frequency every 6 months on average, but the frequency depended on the course of the disease: patients with 'active' relapsing disease were followed every 3 months, patients with 'stable' relapsing disease every 6 to 12 months.
Missing data Number of participants with any missing value
Not reported
Missing data handling
Not reported
Analysis Number of participants (number of events)
535 (87 within 10 years)
Modelling method
Not applicable
Predictor selection method
Not applicable
Hyperparameter tuning
Not applicable
Shrinkage of predictor weights
Not applicable
Performance evaluation dataset
External validation
Performance evaluation method
Not applicable
Calibration estimate
Not reported
Discrimination estimate
Not reported
Classification estimate
  • Cutoff at 95th percentile (score ≥ 2.0): sensitivity = 0.17, specificity = 0.99, PPV = 0.86, NPV = 0.83

  • Cutoff at 5th percentile (score ≤ −0.63): sensitivity = 0.08, specificity = 1.00, PPV = 1.00, NPV = 0.18

  • The event is defined as having secondary progression for the 95th percentile cutoff, but as not having secondary progression for the 5th percentile cutoff


Overall performance
Not reported
Risk groups
Very high risk: 95th percentile (score ≥ 2.0), very low risk: 5th percentile (score ≤ −0.63)
Model  Model presentation
Not applicable
Number of predictors in the model
Not applicable
Predictors in the model
Not applicable
Effect measure estimates
Not applicable
Predictor influence measure
Not applicable
Validation model update or adjustment
None
Interpretation  Aim of the study
To test the trustworthiness of the Bayesian risk score on the basis of a new and larger sample of patients
Primary aim
The primary aim of this study is the prediction of individual outcomes.
Model interpretation
Confirmatory
Suggested improvements
Incorporation of additional clinical aspects of disease (cognitive impairment and fatigue), genetic, neuroimmunological, neuroradiological, and neurophysiological findings
Notes Applicability overall
Low
 
Item Authors' judgement Support for judgement
Participants Yes This study probably used cohort data and the eligibility criteria were clear.
Predictors Yes Because the data were collected at clinical visits, the predictors were probably collected without knowledge about the outcome. A group of neurologists saw the participants at the same clinic, so the predictors were probably assessed in a similar way for all patients. The proposed model only used the early predictors, which were collected within 1 year of disease onset.
Outcome No The frequency of visits depended on the disease course and was different for patients with active relapsing disease, every 3 to 6 months, vs patients with stable relapsing disease, every 6 to 12 months, which causes differential assessment of the outcome in different patients.
Analysis No The subset on which performance measures were estimated contained fewer than 100 events. Only classification measures were addressed. Methods of dealing with missing data were not reported.
Overall No At least one domain is at high risk of bias.

Bergamaschi 2015.

Study characteristics
General information Model name
  • BREMS Ext Val

  • BREMSO SP Val

  • BREMSO MSSS Val


Primary source
Journal
Data source
Registry, secondary
Study type
  • BREMS Ext Val: external validation, multiple (location, time)

  • BREMSO SP Val: validation (predictors dropped), multiple (location, time)

  • BREMSO MSSS Val: validation (predictors dropped and different outcome), multiple (location, time)

Participants Inclusion criteria
  • RRMS

  • Disease duration ≥ 1 year


Exclusion criteria
  • PPMS

  • Missing or incorrect data for variables used in BREMSO


Recruitment
  • MS centres participating in MSBase registry from 26 countries

  • Italy, Canada, Australia, Spain, Netherlands, Argentina, Iran, Kuwait, Turkey, Denmark, Czech Republic, Portugal, France, Belgium, UK, Germany, Cuba, Israel, Hungary, USA, India, Mexico, Malta, Macedonia, Romania, Brazil


Age (years)
Mean 31.1
Sex (%F)
71.3
Disease duration (years)
Not reported
Diagnosis
100% RRMS
Diagnostic criteria
McDonald 2001
Treatment
Unclear timing, 72.2% on treatment
Disease description
Not reported
Recruitment period
Not reported
Predictors Considered predictors
  • BREMS Ext Val: not applicable

  • BREMSO SP Val and BREMSO MSSS Val: age at onset, gender, sphincter onset, pure motor onset, motor‐sensory onset, number of functional systems involved at onset, sequelae after onset


Number of considered predictors
  • BREMS Ext Val: not applicable

  • BREMSO SP Val and BREMSO MSSS Val: 7


Timing of predictor measurement
  • BREMS Ext Val: not applicable

  • BREMSO SP Val and BREMSO MSSS Val: at disease onset (RRMS)


Predictor handling
  • Continuously

  • No interactions considered

Outcome Outcome definition
  • BREMS Ext Val and BREMSO SP Val:

    • Conversion to progressive MS: secondary progression (SP) defined as when a worsening, severe enough to lead to an EDSS score increase of at least 1 point, was confirmed at least 1 year after its initial observation

  • BREMSO MSSS Val:

    • Disability (MSSS): mild MS defined as MSSS < first quartile and severe MS defined as MSSS ≥ third quartile, calculated for each patient at the last observation or at the last observation before the introduction of DMTs, if used; MS Severity Score (MSSS) is an algorithm that adjusts EDSS according to the corresponding disease duration


Timing of outcome measurement
  • Never‐treated patients from disease onset to the last observation mean (SD) 8 years (9.9 years), median (IQR, range) 3.7 years (0.9 years to 12 years, 1 year to 55 years); 1148 patients observed for ≥ 10 years

  • Treated patients from disease onset to the start of treatment mean (SD) 5.7 years (6.6 years), median (IQR, range) 3.2 years (1 year to 8 years, 1 year to 52 years); 2021 patients observed for ≥ 10 years

Missing data Number of participants with any missing value
2965
Missing data handling
Exclusion
Analysis Number of participants (number of events)
  • BREMS Ext Val: 1131 (not reported)

  • BREMSO SP Val: 14,211 (1954)

  • BREMSO MSSS Val: 14,211 (3567)


Modelling method
Not applicable
Predictor selection method
Not applicable
Hyperparameter tuning
Not applicable
Shrinkage of predictor weights
Not applicable
Performance evaluation dataset
External validation
Performance evaluation method
Not applicable
Calibration estimate
Not reported
Discrimination estimate
Not reported
Classification estimate
  • BREMS Ext Val: sensitivity = 0.35, specificity = 0.80

  • BREMSO SP Val: sensitivity = 0.28, specificity = 0.76

  • BREMSO MSSS Val: sensitivity = 0.36, specificity = 0.79


Overall performance
Not reported
Risk groups
Quartiles by risk score, the first (< −0.58) and third quartiles (> 0.52)
Model  Model presentation
  • BREMS Ext Val: not applicable

  • BREMSO SP Val and BREMSO MSSS Val: regression model without baseline hazard and a subset of fitted coefficients


Number of predictors in the model
  • BREMS Ext Val and BREMSO MSSS Val: not applicable

  • BREMSO SP Val: 7


Predictors in the model
Not applicable
Effect measure estimates
Not applicable
Predictor influence measure
Not applicable
Validation model update or adjustment
  • BREMS Ext Val: none

  • BREMSO SP Val and BREMSO MSSS Val: predictors removed

Interpretation  Aim of the study
To predict the natural course of MS using the Bayesian Risk Estimate for MS at Onset (BREMSO), which gives an individual risk score calculated from demographic and clinical variables collected at disease onset
Primary aim
The primary aim of this study is the prediction of individual outcomes.
Model interpretation
Confirmatory
Suggested improvements
Not reported
Notes Applicability overall
Low
 
Item Authors' judgement Support for judgement
Participants No This study used registry data, and exclusion was based on a substantial amount of missing data.
Predictors Yes The data were collected prospectively, and only early predictors that were collected within 1 year of disease onset were included in the model to be used early in the disease. It is a multicentre study but with well‐defined tools.
Outcome Yes BREMS Ext Val and BREMSO for SP: We rated this domain for these analyses as having an unclear risk of bias. Due to the data source, the outcome was probably assessed with knowledge of the predictors, but we consider the definition for conversion to secondary progressive MS based on EDSS to be a rather hard outcome. It is unclear if the frequency of visits at which the outcome was assessed differed from patient to patient, which is likely due to the nature of the data source.
BREMSO MSSS: We rated this domain for this analysis as having a low risk of bias. Due to the data source, the outcome was probably assessed with knowledge of the predictors, but we consider EDSS to be a rather hard outcome.
Analysis No BREMS Ext Val: This was a validation study without any reported discrimination or calibration measures, especially those for censored data. Exclusion for missing data was handled in the Participants section.
BREMSO for SP and BREMSO MSSS: This validation study did not assess discrimination or calibration. Variables were dropped from the developed model, and the coefficients for the rest of the predictors were used as if this did not occur.
Overall No At least one domain is at high risk of bias.

Borras 2016.

Study characteristics
General information Model name
CH3L1 + CNDP1
Primary source
Journal
Data source
Cohort, unclear
Study type
Development
Participants Inclusion criteria
  • Patients with CIS or OND


Exclusion criteria
Not reported
Recruitment
Hospital Ramon and Cajal in Madrid, Spain
Age (years)
Median 35.5 (unclear when)
Sex (%F)
66.0
Disease duration (years)
Median 0.22 (range: 0.01 to 0.35)
Diagnosis
100% CIS
Diagnostic criteria
Not reported
Treatment
Unclear timing, and no specific treatment details, 8% on treatment
Disease description
EDSS median (range): 1.5 (0 to 5)
Recruitment period
From 2001 onward
Predictors Considered predictors
CH3L1, CNDP1, CLUS, A1AG1, 2‐AACT, CNTN1, AACT, SEM7A, HPT, PGCB, 3‐AACT, OSTP, CMGA, SCG2, A2MG, A1AG1, TTHY
Number of considered predictors
Between 17 and 32 (discrepant lists)
Timing of predictor measurement
At disease onset (CIS), reported as 'first relapse' (4 to 126 days between CIS and lumbar puncture)
Predictor handling
Continuously (log2 transformed)
Outcome Outcome definition
Conversion to definite MS: conversion to CDMS defined as the presence of IgG oligoclonal bands and an abnormal brain MRI at baseline (2, 3, or 4 Barkhof criteria)
Timing of outcome measurement
Follow‐up median (SD): CIS 3.25 years (1.32 years), CDMS 4.08 years (2.48 years)
Missing data Number of participants with any missing value
1, only missing predictor reported
Missing data handling
Single value imputation of predictors (a minimum estimated log2‐transformed abundance for a given protein across runs)
Analysis Number of participants (number of events)
49 (24)
Modelling method
Logistic regression
Predictor selection method
  • For inclusion in the multivariable model, all candidate predictors

  • During multivariable modelling, iterative selection within CV folds followed by inclusion for that overall training sample if included in 2 of 4 of the CV training sets, final model based on most frequent combinations, AUC as criterion


Hyperparameter tuning
Not applicable
Shrinkage of predictor weights
No
Performance evaluation dataset
Development
Performance evaluation method
Unclear, point estimates for full dataset, plots also depict median performance and measure of uncertainty for subset of 500 repeats of training‐validation split
Calibration estimate
Not reported
Discrimination estimate
c‐Statistic = 0.858, optimism‐corrected = 0.785
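
An 'optimism-corrected' c-statistic is typically obtained with a bootstrap procedure: refit the model in bootstrap samples, measure how much better each bootstrap model performs on its own sample than on the original data, and subtract the average of that difference (the optimism) from the apparent c-statistic. The sketch below shows this general recipe with scikit-learn on toy data; it is not the authors' implementation, and it simplifies by not repeating the study's predictor-selection step inside the bootstrap.

    # Bootstrap optimism correction for the c-statistic (AUC) of a logistic model.
    # Generic recipe on toy data; not the authors' code.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score

    rng = np.random.default_rng(3)
    n = 49
    X = rng.normal(size=(n, 2))        # e.g. log2 abundances of CH3L1 and CNDP1 (toy values)
    y = rng.integers(0, 2, size=n)     # hypothetical conversion to CDMS (0/1)

    def fit_auc(X_fit, y_fit, X_eval, y_eval):
        model = LogisticRegression().fit(X_fit, y_fit)
        return roc_auc_score(y_eval, model.predict_proba(X_eval)[:, 1])

    apparent_auc = fit_auc(X, y, X, y)
    optimism = []
    for _ in range(500):
        idx = rng.integers(0, n, size=n)               # bootstrap resample with replacement
        Xb, yb = X[idx], y[idx]
        if len(np.unique(yb)) < 2:
            continue                                   # skip degenerate resamples with a single class
        optimism.append(fit_auc(Xb, yb, Xb, yb) - fit_auc(Xb, yb, X, y))

    corrected_auc = apparent_auc - np.mean(optimism)
    print("apparent AUC: %.3f, optimism-corrected AUC: %.3f" % (apparent_auc, corrected_auc))
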
Classification estimate
Sensitivity = 0.84, specificity = 0.83
Overall performance
Not reported
Risk groups
Not reported
Model  Model presentation
Heat maps
Number of predictors in the model
2
Predictors in the model
CH3L1, CNDP1
Effect measure estimates
Not reported
Predictor influence measure
Not applicable
Validation model update or adjustment
Not applicable
Interpretation  Aim of the study
To establish a diagnostic molecular classifier with high sensitivity and specificity able to differentiate between clinically isolated syndrome patients with a high and a low risk of developing multiple sclerosis over time. To build a statistical model able to assign to each patient a precise probability of conversion to clinically defined multiple sclerosis.
Primary aim
The primary aim of this study is the prediction of individual outcomes.
Model interpretation
Probably confirmatory
Suggested improvements
Not reported
Notes Applicability overall
High
Applicability overall rationale
The predictors used were proteins and no other predictor domain was considered for use in the model.
 
Item Authors' judgement Support for judgement
Participants Unclear No details were provided about the eligibility criteria other than the diagnostic subtype, and the data source is not clearly reported.
Predictors Yes The predictors are relatively objective to assess, available at the intended time of prognostication. Although it is unclear when the predictor assessment was done relative to outcome data collection, there is nothing to indicate different assessments for participants.
Outcome Unclear No exact timing of the outcome assessment was specified. Some patients were followed up for short periods and others for years.
Analysis No The number of participants was much lower than necessary, and EPV was less than 10. Only discrimination was addressed, but not calibration. A bootstrap procedure was used, but the variability in AUC only accounted for training samples that chose those predictors. The time for which predictions were to be made was never addressed; therefore, participants had different follow‐up times, and this was not accounted for. It is unclear whether the weights of the predictors corresponded to a final selected model or not. Although not all patients were included in the analysis, only a single patient was excluded, which is less than 5%.
Overall No At least one domain is at high risk of bias.

Brichetto 2020.

Study characteristics
General information Model name
Future course assignment
Primary source
Journal
Data source
Cohort, primary
Study type
Development
Participants Inclusion criteria
Unclear, as reported in auxiliary reference
  • MS diagnosis in patients with RRMS or SPMS with a minimum of 1 time point


Exclusion criteria
Unclear, as reported in auxiliary reference
  • Patients with progressive‐relapsing or primary progressive MS


Recruitment
Patients followed as outpatients or at‐home by Italian Multiple Sclerosis Society (AISM) Rehabilitation Centres of Genoa, Padua and Vicenza, Italy
Age (years)
Not reported
Sex (%F)
Not reported
Disease duration (years)
Not reported
Diagnosis
Unclear, RRMS, SPMS
Diagnostic criteria
Not reported
Treatment
Not reported
Disease description
Not reported
Recruitment period
2014 to 2017
Predictors Considered predictors
ABILHAND, Edinburgh Handedness Inventory, Hospital Anxiety and Depression Scale, Life Satisfaction Index, Modified Fatigue Impact Scale, Overactive Bladder Questionnaire, Functional Independence Measure, Montreal Cognitive Assessment, Paced Auditory Serial Addition Task, Symbol Digit Modalities Test, education (years), number of relapses in past 4 months, height, weight
Number of considered predictors
143
Timing of predictor measurement
Unclear, at multiple assessments every 4 months
Predictor handling
Unclear, probably continuously
Outcome Outcome definition
Conversion to progressive MS
Timing of outcome measurement
Unclear if next visit or 4 months
Missing data Number of participants with any missing value
Not reported
Missing data handling
Unclear, K‐nearest neighbour data imputing strategy
Analysis Number of participants (number of events)
≤ 3398 evaluations of 810 participants (unclear how many were used in the FCA model; 1451 evaluations were RR and 1947 were SP)
Modelling method
Unspecified ML techniques; according to the auxiliary reference, multitask elastic net (for prediction of disease descriptors) followed by gradient boosting (for classification based on the predicted descriptors)
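
The two‐stage design mentioned in the auxiliary reference (a multitask elastic net that predicts several future disease descriptors, followed by a gradient‐boosting classifier applied to those predicted descriptors) could look roughly like the sketch below. Everything here, from the toy data to the specific scikit‐learn estimators and their settings, is an assumption for illustration; it is not the authors' pipeline.

    # Rough sketch of a two-stage model: stage 1 predicts future disease descriptors with a
    # multitask elastic net; stage 2 classifies the future course from those predicted descriptors.
    # Illustrative assumptions only; not the authors' pipeline.
    import numpy as np
    from sklearn.linear_model import MultiTaskElasticNetCV
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(4)
    n, p, k = 810, 143, 5
    X = rng.normal(size=(n, p))              # current PRO/CAO assessments (toy values)
    Y_future = rng.normal(size=(n, k))       # k future disease descriptors (toy values)
    course = rng.integers(0, 2, size=n)      # 0 = relapsing-remitting, 1 = secondary progressive (toy)

    X_tr, X_te, Y_tr, Y_te, c_tr, c_te = train_test_split(X, Y_future, course, random_state=4)

    stage1 = MultiTaskElasticNetCV(cv=5).fit(X_tr, Y_tr)                    # predicts the descriptor vector
    stage2 = GradientBoostingClassifier().fit(stage1.predict(X_tr), c_tr)   # classifies the course

    accuracy = (stage2.predict(stage1.predict(X_te)) == c_te).mean()
    print("held-out accuracy:", round(accuracy, 3))

Note that chaining the stages as in this sketch does not propagate the uncertainty of stage 1 into stage 2, which is one of the concerns raised in the analysis judgement at the end of this entry.
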
Predictor selection method
  • For inclusion in the multivariable model, not reported

  • During multivariable modelling, unclear

    • Elastic net and recursive feature elimination during CCA model fitting mentioned in auxiliary reference


Hyperparameter tuning
Unclear, according to auxiliary reference, parameter tuning done using inner parameter optimisation via grid‐search in cross‐validation, modelling/tuning not reported in Brichetto 2020
Shrinkage of predictor weights
Modelling method
Performance evaluation dataset
Development
Performance evaluation method
Unclear, methods not reported, unclear how much to rely on auxiliary reference
Calibration estimate
Not reported
Discrimination estimate
Not reported
Classification estimate
Accuracy FCA = 0.826, CCA = 0.860
Overall performance
Not reported
Risk groups
Not reported
Model  Model presentation
List of selected predictors
Number of predictors in the model
33
Predictors in the model
ABILHAND item 12, ABILHAND total, HADS item 7, HADS sub1, HADS sub2, HADS total, Life Satisfaction Index total, MFIS item 2, MFIS sub1, MFIS sub2, MFIS sub3, MFIS total, Overactive Bladder Questionnaire item 1, Overactive Bladder Questionnaire item 4, Overactive Bladder Questionnaire total, Functional Independence Measure item 10, Functional Independence Measure item 11, Functional Independence Measure item 12, Functional Independence Measure item 14, Functional Independence Measure sub3, Functional Independence Measure sub4, Functional Independence Measure sub5, Functional Independence Measure sub6, Functional Independence Measure total, Montreal Cognitive Assessment item 1, Montreal Cognitive Assessment item 9, Montreal Cognitive Assessment tot1, Montreal Cognitive Assessment tot2, PASAT, SDMT, years of education, height, weight
Effect measure estimates
Not reported
Predictor influence measure
Not reported
Validation model update or adjustment
Not applicable
Interpretation  Aim of the study
To confirm the important role of applying ML to PROs and CAOs of people with relapsing‐remitting (RR) and secondary progressive (SP) form of multiple sclerosis (MS), to promptly identify information useful to predict disease progression
Primary aim
The primary aim of this study is somehow the prediction of individual outcomes. The focus is on showing the relevancy of PRO and CAO to MS prediction.
Model interpretation
Exploratory
Suggested improvements
Including data on therapy and MRIs
Notes Applicability overall
Unclear
Applicability overall rationale
Due to the lack of reporting on outcomes and their assessment, applicability is unclear. Additionally, it is unclear whether some patients had already experienced the outcome at baseline.
Auxiliary references
Bebo BF Jr, Fox RJ, Lee K, Utz U, Thompson AJ. Landscape of MS patient cohorts and registries: recommendations for maximising impact. Mult Scler 2018;24(5):579‐86.
Fiorini S, Verri A, Barla A, Tacchino A, Brichetto G. Temporal prediction of multiple sclerosis evolution from patient‐centred outcomes. In: Proceedings of the 2nd Machine Learning for Healthcare Conference; 2017 August 18‐19; Boston MA. Boston MA: Proceedings of Machine Learning Research, 2017.
 
Item Authors' judgement Support for judgement
Participants Unclear The data were prospectively collected from a cohort. Although references cited in the article have some study inclusion/exclusion criteria, the number of patients used in this article does not match them. Thus, the inclusion/exclusion criteria used in the article are unclear.
Predictors No There are patient‐reported outcomes that could be influenced by the current diagnoses conveyed to patients by clinicians. It is clear that at 1 stage (CCA) of the FCA modelling, patients with different diagnoses (RR, SP) were included. It is unclear whether assessors of clinical predictors at the different clinics have the same level of experience.
Outcome Unclear The outcomes were not clearly defined and their assessments were not described.
Analysis No EPV of the compound FCA model was at most 10.1. There was no mention of the complexities and uncertainties of two‐stage modelling, or of the inclusion of different time points from the same patients in training/validation/test sets, being taken into account. Neither calibration nor discrimination measures were reported. It was unclear how the missing data were handled. The method of internal validation was unclear. Model selection and evaluation did not appear to be properly separated.
Overall No At least one domain is at high risk of bias.

Calabrese 2013.

Study characteristics
General information Model name
  • Dev

  • Ext Val


Primary source
Journal
Data source
Cohort, primary
Study type
Development + external validation, time
Participants Inclusion criteria
  • Diagnosis of RRMS

  • At least 5 years of disease duration


Exclusion criteria
Not reported
Recruitment
Consecutive patients receiving medical care at the outpatient rooms of the MS Centre of Veneto Region–First Neurological Clinic at University Hospital of Padua, Italy
Age (years)
  • Dev: mean 35.3

  • Ext Val: mean 34.5


Sex (%F)
  • Dev: 66.7

  • Ext Val: 59.5


Disease duration (years)
  • Dev: mean 11.3 (range: 5 to 23)

  • Ext Val: mean 10.5 (range: 10 to 21)


Diagnosis
100% RRMS
Diagnostic criteria
McDonald 2001
Treatment
  • At recruitment, 100% on DMT

  • During follow‐up, 100% on DMT


Disease description
  • Dev: EDSS median (range): 2.5 (0 to 4.5)

  • Ext Val: not reported


Recruitment period
  • Dev: during 2006

  • Ext Val: during 2007

Predictors Considered predictors
  • Dev: age, age at onset, gender, initial symptoms (?), EDSS score, relapse rate, Modified Fatigue Impact Scale score, T2 white matter lesion volume (T2WMLV), T2 white matter lesion number, global cortical thickness (CTh), cerebellar cortical volume (CCV), cortical lesion (CL) volume, cortical lesion (CL) number, contrast‐enhancing lesion (CEL) number, spinal cord lesion (SCL) number, patients with spinal cord lesions (SCL)

  • Ext Val: not applicable


Number of considered predictors
  • Dev: ≥ 16 (unclear predictor definition)

  • Ext Val: not applicable


Timing of predictor measurement
  • Dev: at study baseline (cohort entry at least 5 years after disease onset)

  • Ext Val: not applicable


Predictor handling
  • Dev: continuously

  • Ext Val: not applicable

Outcome Outcome definition
Conversion to progressive MS: SPMS defined as an increase of at least 1.0 EDSS point compared to T0, not related to a relapse, observed at any time of the follow‐up and confirmed at 6 months; EDSS scored every 6 months and in case of a relapse
Timing of outcome measurement
  • Dev: up to 5 years (T5), time to outcome median (range): 52 months (29 months to 64 months)

  • Ext Val: up to 5 years (T5), time to outcome median (range): 54 months (30 months to 62 months)

Missing data Number of participants with any missing value
  • Dev: 11, only missing outcome reported

  • Ext Val: 1, only missing outcome reported


Missing data handling
Complete case
Analysis Number of participants (number of events)
  • Dev: 334 (66)

  • Ext Val: 83 (19)


Modelling method
  • Dev: logistic regression

  • Ext Val: not applicable


Predictor selection method
  • Dev:

    • For inclusion in the multivariable model, all candidate predictors

    • During multivariable modelling, significance

      • P value < 0.05

  • Ext Val: not applicable


Hyperparameter tuning
Not applicable
Shrinkage of predictor weights
  • Dev: none

  • Ext Val: not applicable


Performance evaluation dataset
  • Dev: development

  • Ext Val: external validation


Performance evaluation method
  • Dev: cross‐validation, LOOCV

  • Ext Val: not applicable


Calibration estimate
Not reported
Discrimination estimate
Not reported
Classification estimate
  • Dev: accuracy = 0.928, sensitivity = 0.878, specificity = 0.94

  • Ext Val: accuracy = 0.916, sensitivity = 0.842, specificity = 0.937


Overall performance
Not reported
Risk groups
  • Dev: 3 risk groups (low, intermediate, and high) created by an unsupervised cluster analysis, said to be confirmed by the logit model, but the cut points are unclear

  • Ext Val: not reported

Model  Model presentation
  • Dev:

    • Full regression model

    • Logistic curve (x‐axis: score, y‐axis: estimated progression probability) coloured by risk groups defined by hierarchical clustering

  • Ext Val: not applicable


Number of predictors in the model
  • Dev: 3

  • Ext Val: not applicable


Predictors in the model
  • Dev: age, cortical lesion volume, cerebellar cortical volume

  • Ext Val: not applicable


Effect measure estimates
  • Dev: log OR (SE): intercept −131.3, age 0.13 (0.046), cortical lesion volume 0.0053 (0.0011), cerebellar cortical volume −0.0013 (0.0003)

  • Ext Val: not applicable
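
These coefficients translate into an individual progression probability through the usual inverse‐logit formula, sketched below. The coefficients are copied from the entry above; the predictor units and scaling must match the source publication (they are not restated in this table), so no example patient values are supplied.

    # Inverse-logit prediction from the reported coefficients (illustration of the formula only).
    # Predictor units/scaling must match the source publication.
    import math

    def progression_probability(age, cortical_lesion_volume, cerebellar_cortical_volume):
        """p = 1 / (1 + exp(-(b0 + b_age*age + b_CL*CL_volume + b_CCV*CC_volume)))"""
        linear_predictor = (-131.3
                            + 0.13 * age
                            + 0.0053 * cortical_lesion_volume
                            - 0.0013 * cerebellar_cortical_volume)
        return 1.0 / (1.0 + math.exp(-linear_predictor))
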


Predictor influence measure
Not applicable
Validation model update or adjustment
  • Dev: not applicable

  • Ext Val: none

Interpretation  Aim of the study
A prospective 5‐year longitudinal study to assess demographic, clinical, and magnetic resonance imaging (MRI) parameters that could predict the changing clinical course of MS.
Primary aim
The primary aim of this study is somehow the prediction of individual outcomes. The focus is on predictor identification.
Model interpretation
Probably exploratory
Suggested improvements
Confirmation in different MS populations and with a longer follow‐up
Notes Applicability overall
Low
 
Item Authors' judgement Support for judgement
Participants Yes The data were reported to be from a cohort study with predefined data collection times.
Predictors Yes Because of the prospective nature of data collection, there is no reason to suspect assessment of predictors differently or with knowledge of outcome data. The predictors were collected by a small group of clinicians at a single centre, and only the variables at the time of baseline were used.
Outcome Yes The outcome is standard and well‐reported. The authors used a 6‐month confirmation period to ensure that the EDSS increase is stable, and found that the results were also stable at 12 months in all patients.
Analysis No Dev: The EPV was below 10 in the development. No calibration or discrimination measures were reported. The internal validation did not address the whole model selection procedure, but an external validation was done. However, the need for shrinkage was not addressed. A low percentage of patients were lost to follow‐up, and complete case analysis was done after the reason was reported, so we do not consider this a large source of possible bias. However, these patients could have been included if time‐to‐event data were used instead.
Ext Val: The number of events was extremely low in the validation. No calibration or discrimination measures were reported. 1 participant was excluded due to a missing outcome, but this is not considered a large possible source of bias.
Overall No At least one domain is at high risk of bias.

De Brouwer 2021.

Study characteristics
General information Model name
GRU‐ODE‐Bayes
Primary source
Journal
Data source
Registry, secondary
Study type
Development
Participants Inclusion criteria
  • MS patients with at least 6 visits in the 3‐year observation period

  • At least one EDSS measurement after the 2‐year EDSS


Exclusion criteria
  • Patients with missing or invalid diagnosis dates

  • No EDSS value and with a date of visit before the onset date

  • Patients with visits before 1990 or with onset date before 1990

  • Patients without at least 1 observation between T1 and T3

  • All EDSS measurements occurring less than 1 month after a relapse in the test period

  • CIS patients


Recruitment
MSBase registry
Age (years)
Mean 32.2 (onset)
Sex (%F)
71.1
Disease duration (years)
Mean 6.9 (range: 3 to 25)
Diagnosis
85.6% RRMS, 4.9% SPMS, 3.3% PPMS, 1.4% PRMS, 4.8% unknown
Diagnostic criteria
Lublin 1996
Treatment
Not reported
Disease description
Prior 3‐year EDSS per patient mean (SD, range): 2.38 (1.48, 0 to 8.5)
Recruitment period
Not reported
Predictors Considered predictors
Gender, age at onset, MS course at time t = 0 (RRMS, SPMS, PPMS, or CIS), disease duration at time t = 0, EDSS at t = 0, Last used DMT at t = 0 (none, interferons, natalizumab, fingolimod, teriflunomide, dimethyl‐fumarate, glatiramer, alemtuzumab, rituximab, cladribine, ocrelizumab, other (contains stem cells therapy, siponimod and daclizumab)), RF only: EDSS closest to t‐3 (first EDSS in dataset), maximum EDSS within t‐3 to t0, difference between maximum and minimum EDSS between t‐3 and t0, number of visits between t‐3 and t0, number of relapses between t‐3 and t0, BPTF/NN: EDSS trajectories
Number of considered predictors
24+EDSS trajectories
Timing of predictor measurement
At multiple visits, at least 6 in 3‐year period
Predictor handling
Continuously
Outcome Outcome definition
Disability (EDSS): disability progression after 2 years, defined as a minimum increase in EDSS of 1.5 (baseline EDSS of 0), 1.0 (baseline EDSS ≤ 5.5), or 0.5 (baseline EDSS > 5.5); the increase needed to be confirmed at least 6 months later
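
Because the compact notation of this definition is easy to misread, the small sketch below spells out the baseline‐dependent threshold (an increase of 1.5 from a baseline EDSS of 0, 1.0 from a baseline of up to 5.5, and 0.5 from a baseline above 5.5). The function name is illustrative, and the 6‐month confirmation requirement is not implemented here.

    # Baseline-dependent EDSS progression rule as described above (the >= 6-month confirmation
    # that is part of the outcome definition is not implemented in this sketch).
    def edss_progression(baseline_edss, followup_edss):
        if baseline_edss == 0:
            required_increase = 1.5
        elif baseline_edss <= 5.5:
            required_increase = 1.0
        else:
            required_increase = 0.5
        return (followup_edss - baseline_edss) >= required_increase

    print(edss_progression(0.0, 1.5))   # True: increase of 1.5 from baseline 0
    print(edss_progression(4.0, 4.5))   # False: only 0.5 increase from a baseline <= 5.5
    print(edss_progression(6.0, 6.5))   # True: 0.5 increase from a baseline > 5.5
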
Timing of outcome measurement
Closest observation time to 2 years (t2*, between t1 and t3) and confirmed with a measurement after t2* (at least 6 months later), median (IQR) = 1.995 years (1.887 years to 2.112 years)
Missing data Number of participants with any missing value
≥ 48,520, unclear exactly how many participants had any missing value
Missing data handling
Exclusion
Analysis Number of participants (number of events)
6882 (1114)
Modelling method
Neural network, continuous‐time gated recurrent unit variant of recurrent neural network
Predictor selection method
  • For inclusion in the multivariable model, all candidate predictors

  • During multivariable modelling, full model approach


Hyperparameter tuning
Validation set (separate from train and test) used for tuning parameter selection during 5‐fold CV optimising binary cross‐entropy
Shrinkage of predictor weights
Modelling method
Performance evaluation dataset
Development
Performance evaluation method
Cross‐validation, 5‐fold (train/validation/test)
Calibration estimate
Calibration plot provided upon request (during author correspondence)
Discrimination estimate
c‐statistic = 0.66 (SD = 0.02)
Classification estimate
Not reported
Overall performance
Not reported
Risk groups
Not reported
Model  Model presentation
Not reported
Number of predictors in the model
19 + EDSS trajectories
Predictors in the model
Gender, age at onset, MS course at time t = 0 (RRMS, SPMS, PPMS, or CIS), disease duration at time t = 0, EDSS at t = 0, last used DMT at t = 0 (none, interferons, natalizumab, fingolimod, teriflunomide, dimethyl fumarate, glatiramer, alemtuzumab, rituximab, cladribine, ocrelizumab, other (includes stem cell therapy, siponimod, and daclizumab)), EDSS trajectories
Effect measure estimates
Not reported
Predictor influence measure
Average AUC degradation after random shuffling of each predictor's values
Validation model update or adjustment
Not applicable
Interpretation  Aim of the study
To predict disability progression on the EDSS using longitudinal clinical patient data.
Primary aim
The primary aim of this study is somehow the prediction of individual outcomes. The focus is on utilising patient trajectories.
Model interpretation
Exploratory
Suggested improvements
Not reported
Notes Applicability overall
Unclear
Applicability overall rationale
Due to the lack of reporting on the final model or tool, it is unclear whether this study aimed to develop a model/tool for the prediction of individual MS outcomes.
 
Item Authors' judgement Support for judgement
Participants No The data source was a registry, and inclusion/exclusion criteria were based on predictor/outcome data availability.
Predictors No Predictors were probably collected prior to outcome assessment and were available when the model was used. Most predictors were basic enough that we are not concerned about them being assessed in different ways across patients. However, disease type was used as a predictor. This predictor was probably not measured in the same way across patients or across time, as the diagnostic criteria changed. Also, the category progressive‐relapsing was probably used heterogeneously.
Outcome Yes The outcome is standard and was assessed similarly across patients. It did not contain predictors. Although some predictors could have been known while assessing parts of the outcome, we consider the outcome to be robust to such information. The reported assessment time was 1 year to 3 years, but upon follow‐up with the author, it was stated that the IQR was 1.9 years to 2.1 years.
Analysis Yes Calibration was not explicitly assessed in the report, but the model was calibrated using Platt scaling, and a calibration plot was provided during correspondence. A final model/tool was not provided, but given the model reporting, there is no reason to believe the final model differs from the multivariable analysis.
Overall No At least one domain is at high risk of bias.
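
The predictor influence measure reported for this model, average AUC degradation after randomly shuffling each predictor's values, is a permutation‐importance scheme. The sketch below illustrates the general idea for any fitted binary classifier exposing `predict_proba`; it is not the authors' GRU‐ODE‐Bayes code, and `model`, `X`, and `y` are hypothetical placeholders.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def auc_degradation(model, X, y, n_repeats=10, seed=0):
    """Average drop in AUC after randomly permuting each column of X."""
    rng = np.random.default_rng(seed)
    baseline = roc_auc_score(y, model.predict_proba(X)[:, 1])
    drops = {}
    for j in range(X.shape[1]):
        scores = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            scores.append(roc_auc_score(y, model.predict_proba(X_perm)[:, 1]))
        drops[j] = baseline - np.mean(scores)  # larger drop = more influential predictor
    return drops
```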

de Groot 2009.

Study characteristics
General information Model name
  • Walking

  • Dexterity

  • Cognitive


Primary source
Journal
Data source
Cohort, primary
Study type
Development
Participants Inclusion criteria
  • Aged 16 to 55 years

  • Diagnosed with MS recently (< 6 months)


Exclusion criteria
  • Patients with other neurologic disorders

  • Systemic diseases

  • Malignant neoplastic diseases


Recruitment
Consecutive patients visiting the outpatient clinics of 5 neurology departments in Amsterdam and Rotterdam, Netherlands
Age (years)
Mean 37.4
Sex (%F)
63.7
Disease duration (years)
Up to 6 months
Diagnosis
82% relapse onset, 18% non‐relapse onset
Diagnostic criteria
Poser 1983
Treatment
  • At recruitment, 6% on DMT

  • During follow‐up, 30% on DMT


Disease description
EDSS median (IQR): 2.5 (2.0 to 3.0)
Recruitment period
1998 to 2000
Predictors Considered predictors
  • Walking: Items of the Disability and Impact Profile: How well can you walk? (0 to 10), Are you easily tired? (0 to 10); Item of the Functional Systems of the EDSS: Impairment of pyramidal tract (0 to 6), Impairment of cerebellar tract (0 to 5); Number of lesions in spinal cord

  • Dexterity: Items of the Disability and Impact Profile: How well can you use your hands? (0 to 10); Item of the Functional Systems of the EDSS: Impairment of sensory tract (0 to 6), Impairment of pyramidal tract (0 to 6), Impairment of cerebellar tract (0 to 5); T2‐weighted infratentorial lesion load

  • Cognitive: age, gender; items of the Disability and Impact Profile: How good is your memory? (0 to 10), How well can you concentrate? (0 to 10); T2‐weighted supratentorial lesion load


Number of considered predictors
5
Timing of predictor measurement
At disease onset (definite MS) (study baseline within 6 months after diagnosis)
Predictor handling
  • Continuously

  • No interaction considered

Outcome Outcome definition
  • Walking: disability (EDSS): inability to walk 500 m defined as an EDSS score of 4 or higher

  • Dexterity: disability (9‐HPT): impaired dexterity defined as an abnormal score (mean – 1.96 SD, healthy Dutch reference population) for the 9‐HPT

  • Cognitive: composite (Consistent Long Term Retrieval and Long Term Storage of the Selective Reminding Test, 10/36 Spatial Recall Test, SDMT, PASAT, Word List Generation): cognitive impairments defined as a score of mean – SD for 1 or more subtests of a cognitive screening test, which include the subscales Consistent Long Term Retrieval and Long Term Storage of the Selective Reminding Test measuring verbal learning and memory, the 10/36 Spatial Recall Test measuring visuospatial learning and delayed recall, the Symbol Digit Modalities Test measuring sustained attention and concentration, the Paced Auditory Serial Addition Test measuring sustained attention and information processing speed, and the Word List Generation measuring verbal fluency


Timing of outcome measurement
3 years
Missing data Number of participants with any missing value
  • Walking: 25

  • Dexterity, and Cognitive: 23


Missing data handling
Mixed: complete case for outcome, multiple imputation (twice) for predictors
Analysis Number of participants (number of events)
  • Walking: 146 (37)

  • Dexterity: 146 (46)

  • Cognitive: 146 (44)


Modelling method
Logistic regression
Predictor selection method
  • For inclusion in the multivariable model, all candidate predictors

  • During multivariable modelling, stepwise selection

    • P value < 0.5

    • Backward


Hyperparameter tuning
Not applicable
Shrinkage of predictor weights
Uniform shrinkage
Performance evaluation dataset
Development
Performance evaluation method
Bootstrap, B = 250
Calibration estimate
  • Walking: calibration plot, calibration slope = 0.93

  • Dexterity: calibration plot, calibration slope = 0.85

  • Cognitive: calibration plot, calibration slope = 0.88


Discrimination estimate
  • Walking: c‐statistic = 0.89 (95% CI 0.83 to 0.95)

  • Dexterity: c‐statistic = 0.77 (95% CI 0.69 to 0.86)

  • Cognitive: c‐statistic = 0.74 (95% CI 0.65 to 0.83)


Classification estimate
Not reported
Overall performance
Not reported
Risk groups
3 risk categories: high (probability of adverse outcome > 75%), moderate (probability of adverse outcome 25% to 75%), and low (probability of adverse outcome < 25%)
Model  Model presentation
  • Score chart

  • Shrunken coefficients without the intercept, risk groups


Number of predictors in the model
  • Walking: 3

  • Dexterity: 5

  • Cognitive: 4


Predictors in the model
  • Walking: How well can you walk?, Impairment of cerebellar tract, Number of lesions in spinal cord

  • Dexterity: How well can you use your hands?, Impairment of sensory tract, Impairment of pyramidal tract, Impairment of cerebellar tract, T2‐weighted infratentorial lesion load

  • Cognitive: age, gender, How well can you concentrate?, T2‐weighted supratentorial lesion load


Effect measure estimates
  • Walking: shrunken log OR (P value): How well can you walk?: −0.57 (0.00), Impairment of cerebellar tract: 0.77 (0.00), Number of lesions in spinal cord: 0.16 (0.05)

  • Dexterity: shrunken log OR (P value): How well can you use your hands?: −0.16 (0.16), Impairment of sensory tract: 0.27 (0.17), Impairment of pyramidal tract: 0.25 (0.31), Impairment of cerebellar tract: 0.46 (0.03), T2‐weighted infratentorial lesion load: 0.97 (0.00)

  • Cognitive: shrunken log OR (P value): age: 0.03 (0.12), gender: 0.88 (0.02), How well can you concentrate?: −0.17 (0.07), T2‐weighted supratentorial lesion load: 0.06 (0.00)


Predictor influence measure
Not applicable
Validation model update or adjustment
Not applicable
Interpretation  Aim of the study
To predict functioning after 3 years in patients with recently diagnosed multiple sclerosis (MS)
Primary aim
The primary aim of this study is the prediction of individual outcomes.
Model interpretation
Exploratory
Suggested improvements
New cohort recruited in a different geographic area, at a different point in time, or assessed with different diagnostic criteria
Notes Applicability overall
High
Applicability overall rationale
This study included participants who had already experienced the outcome at baseline.
 
Item Authors' judgement Support for judgement
Participants No Participants known to already have the outcome at baseline were included.
Predictors Yes Predictors available early in disease were used. There is no reason to believe the predictor assessments were made with knowledge of outcome, as the collection was prospective.
Outcome Yes It is unclear if the outcome was determined blinded to predictors or not, but the outcomes are relatively objective, which reduces the risk of bias.
Analysis No Even though the number of predictors was limited and shrinkage was used, the EPV was below or around 10. More than 5% of the participants were removed due to missing outcomes. The bootstrap procedure did not include the predictor selection step.
Overall No At least one domain is at high risk of bias.
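
This study applied uniform shrinkage to the logistic regression coefficients and used a bootstrap (B = 250) for internal validation. One common way to obtain such a uniform shrinkage factor is the bootstrap‐estimated calibration slope; a minimal sketch under that assumption follows, with hypothetical arrays `X` and `y` and a large `C` used only to approximate an unpenalised fit in scikit‐learn. As the risk of bias comment notes, a fully honest version would also repeat the predictor selection step inside each bootstrap sample.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def bootstrap_uniform_shrinkage(X, y, n_boot=250, seed=0):
    """Average calibration slope of bootstrap models evaluated on the original
    data, used as a uniform shrinkage factor for the full-model coefficients
    (re-estimation of the intercept after shrinkage is not shown)."""
    rng = np.random.default_rng(seed)
    unpenalised = dict(C=1e6, max_iter=5000)  # large C approximates maximum likelihood
    full = LogisticRegression(**unpenalised).fit(X, y)
    slopes = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(y), len(y))
        boot = LogisticRegression(**unpenalised).fit(X[idx], y[idx])
        lp = X @ boot.coef_.ravel() + boot.intercept_[0]     # linear predictor on original data
        cal = LogisticRegression(**unpenalised).fit(lp.reshape(-1, 1), y)
        slopes.append(cal.coef_[0, 0])
    factor = float(np.mean(slopes))
    return factor, factor * full.coef_.ravel()
```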

Gout 2011.

Study characteristics
General information Model name
Not applicable
Primary source
Journal
Data source
Registry, secondary
Study type
Development
Participants Inclusion criteria
  • Age < 51 years

  • Patients with a first demyelinating event (CIS) leading to admission in the neurology department

  • At least a 1‐year follow‐up

  • Available results of both initial brain magnetic resonance imaging (MRI) and cerebrospinal fluid (CSF) cytology


Exclusion criteria
  • Other diagnoses


Recruitment
Consecutive patients admitted to the neurology department of the Foundation A de Rothschild in Paris, France
Age (years)
Median 31.0
Sex (%F)
70.2
Disease duration (years)
Not reported
Diagnosis
100% CIS
Diagnostic criteria
Not reported
Treatment
0%
Disease description
EDSS median (range): 2 (0 to 6)
Recruitment period
1994 to 2006
Predictors Considered predictors
Gender, age, family history, previous symptoms suggestive of CNS involvement, initial involvement (optic nerve (ref), spinal cord, brainstem/cerebellum, polyregional/cerebrum), initial Expanded Disability Status Scale ≥ 2.5, ≥ 2 T2 lesions (MR), 3‐4+ Barkhof criteria, CSF white blood cell count > 4/mm3, IgG oligoclonal band, positive CSF (> 4 WBC/mm3 or IgG OB), ≥ 2 T2 lesions + IgG OB, McDonald DIS (3‐4+ BC or 2 T2 lesions + IgG OB)
Number of considered predictors
≥ 15 (unclear how many interactions tested)
Timing of predictor measurement
At disease onset (CIS) leading to admission
Predictor handling
  • All dichotomised (the cutoff levels chosen to be clinically meaningful or to maximise the power)

  • At least 1 interaction was considered

Outcome Outcome definition
Conversion to definite MS (Poser 1983): date of occurrence of a second demyelinating event defined as the occurrence of a symptom or symptoms of neurological dysfunction lasting more than 24 hours with objective confirmation at least 1 month after initial event, or the last follow‐up date in the case of patients remaining event‐free
Timing of outcome measurement
Follow‐up median (range): 3.5 years (1.0 year to 12.7 years), time to outcome in those who experience it median (range) 16.6 months (1.1 months to 112.5 months)
Missing data Number of participants with any missing value
213
Missing data handling
Exclusion
Analysis Number of participants (number of events)
208 (141)
Modelling method
Survival, Cox
Predictor selection method
  • For inclusion in the multivariable model, univariable analysis

  • During multivariable modelling, significance

    • P value < 0.05


Hyperparameter tuning
Not applicable
Shrinkage of predictor weights
None
Performance evaluation dataset
Development
Performance evaluation method
Apparent
Calibration estimate
Not reported
Discrimination estimate
Not reported
Classification estimate
Not reported
Overall performance
Not reported
Risk groups
3 risk groups: low‐risk group (score = 0), intermediate‐risk group (0 < score < 5), and high‐risk group (score = 5), based on Kaplan‐Meier plots/estimates.
Model  Model presentation
  • Sum score: 1 point if age at onset ≤ 31 years, 3 points if 3‐4+ Barkhof criteria present, 1 point if > 4 WBC/mm3 in CSF

  • Regression coefficients, risk groups (and KM plots), KM plot of baseline hazard


Number of predictors in the model
3
Predictors in the model
Age (≤ 31 years), 3‐4+ MR Barkhof Criteria, CSF white blood cell count > 4/mm3
Effect measure estimates
HR (95% CI): age ≤ 31 years 1.44 (1.02 to 2.01), 3‐4+ MR Barkhof criteria 2.07 (1.47 to 2.91), CSF white blood cell count > 4/mm3 1.44 (1.03 to 2.02)
Predictor influence measure
Not applicable
Validation model update or adjustment
Not applicable
Interpretation  Aim of the study
To assess whether CSF analysis at the time of a first demyelinating event is a useful tool to predict CDMS. Specifically: first, to assess the predictive value of CSF analysis independently of the other known prognostic factors, and, second, to provide a simple classification for predicting CDMS based on a multivariate Cox model.
Primary aim
The primary aim of this study is somehow the prediction of individual outcomes. The focus is on the CSF analysis.
Model interpretation
Exploratory
Suggested improvements
Validation in another cohort
Notes Applicability overall
Low
 
Item Authors' judgement Support for judgement
Participants No The data were used secondarily, and study inclusion was based on the availability of data, specifically the availability of both MRI and CSF measures.
Predictors Yes The predictor assessments were probably performed before the outcome due to the prospective nature of data collection, and all predictors are expected to be collected at the onset of the disease, which is the time of intended use. It is a single‐centre study, so predictor collection and assessment should be similar in all patients.
Outcome Yes Although the blinding of outcome assessment was not reported, the outcome definition based on new symptoms is relatively objective. The new event had to be a month apart from the first event to ensure they were separate events.
Analysis No The EPV was less than 20. Predictors were selected based on univariable analysis. The predictors were dichotomised, sometimes using clinically meaningful cutoffs and sometimes at the sample median. No model performance measures were reported other than cumulative incidence plots per risk group in the development set. The multivariable model coefficients were rounded to simplify the model into a score, but the steps are clear and reproducible. There was no assessment of the model before it was simplified, and no examination of the need for shrinkage.
Overall No At least one domain is at high risk of bias.
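
The model above is a multivariable Cox model whose coefficients were rounded into a simple sum score with Kaplan‐Meier‐based risk groups. A minimal sketch of that workflow with the `lifelines` package is shown below; the data frame `df` and its column names are hypothetical, and the rounding is illustrative rather than a reproduction of the published 1/3/1 point weights.

```python
import pandas as pd
from lifelines import CoxPHFitter

# df is assumed to hold the three dichotomised predictors, the follow-up time
# in years ("time"), and the conversion-to-CDMS indicator ("cdms").
cph = CoxPHFitter()
cph.fit(df[["time", "cdms", "age_le_31", "barkhof_3_4", "csf_wbc_gt_4"]],
        duration_col="time", event_col="cdms")
print(cph.summary[["coef", "exp(coef)"]])  # log hazard ratios and hazard ratios

# Round the coefficients into integer points and derive risk groups from the
# resulting sum score (0 = low, maximum = high, anything in between = intermediate).
points = (cph.params_ / cph.params_.abs().min()).round().astype(int)
score = (df[points.index] * points).sum(axis=1)
risk_group = pd.cut(score, bins=[-1, 0, points.sum() - 1, points.sum()],
                    labels=["low", "intermediate", "high"])
```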

Gurevich 2009.

Study characteristics
General information Model name
  • FLP Dev

  • FTP

  • FLP Ext Val


Primary source
Journal
Data source
Unclear
Study type
  • FLP Dev and FLP Ext Val: development + external validation

  • FTP: development

Participants Inclusion criteria
  • FLP Dev, and FTP:

    • Diagnosed with definite MS or CIS

    • Free of steroids and immunomodulatory treatments for at least 30 days before blood withdrawal

    • At least 1 year after treatment with cyclophosphamide

  • FLP Ext Val: unclear


Exclusion criteria
  • Patients with neuromyelitis optica (NMO) according to the criteria of Wingerchuk


Recruitment
Unclear, Sheba Medical Centre, Israel
Age (years)
  • FLP Dev and FTP: mean, 36.3 (onset)

  • FLP Ext Val: unclear


Sex (%F)
  • FLP Dev, and FTP: 63.8

  • FLP Ext Val: not reported


Disease duration (years)
  • FLP Dev, and FTP: mean, 5.67 (pooled SD 0.89)

  • FLP Ext Val: unclear


Diagnosis
  • FLP Dev, and FTP: 34.0% CIS, 66.0% CDMS

  • FLP Ext Val: unclear, CIS 60%, CDMS 40%


Diagnostic criteria
McDonald 2001
Treatment
  • At recruitment, 0%

  • During follow‐up:

    • FLP Dev, and FTP: 5.3% on interferon β‐1a (Avonex), 2.1% on interferon β‐1b (Betaferon), 10.6% on interferon β‐1a (Rebif), 10.6% on glatiramer acetate (Copaxone), 6.4% on intravenous immunoglobulins (IVIg)

    • FLP Ext Val: unclear, 9 on IMD


Disease description
  • FLP Dev, and FTP: EDSS (unclear if mean and SD): CIS 0.9 (0.2), CDMS 2.4 (0.2); annualised relapse rate (unclear if mean and SD): CDMS 0.92 (0.1)

  • FLP Ext Val: unclear, published inconsistencies; EDSS (unclear if mean and SD): CIS 2.58 (0.15), CDMS 5.3 (2.39); annualised relapse rate (unclear if mean and SD): CIS 6.1 (2.05), CDMS 1 (0.51)


Recruitment period
Not reported
Predictors Considered predictors
  • FLP Dev, and FTP: PBMC RNA microarray analysis set of 22,215 gene‐transcripts averaged to 10,594 potential features (genes and annotated sequences), age, MS stage (CIS or definite), gender, annual relapse rate, EDSS at time of blood sampling, disease duration, age at onset, EDSS change in the last relapse

  • FLP Ext Val: not applicable


Number of considered predictors
  • FLP Dev, and FTP: 10,602

  • FLP Ext Val: not applicable


Timing of predictor measurement
  • FLP Dev, and FTP: at study baseline (cohort entry)

  • FLP Ext Val: not applicable


Predictor handling
  • FLP Dev, and FTP: unclear, probably continuously

  • FLP Ext Val: not applicable

Outcome Outcome definition
  • FLP Dev, and FLP Ext Val:

    • Relapse: time until next relapse broken down into 3 categories as less than 500 days, between 500 days and 1264 days, and more than 1264 days; relapse defined as the onset of new objective neurological symptoms/signs or worsening of existing neurological disability not accompanied by metabolic changes, fever or other signs of infection, and lasting for a period of at least 48 hours accompanied by objective change of at least 0.5 in the EDSS score

  • FTP:

    • Relapse: time from baseline gene expression analysis to next acute relapse, defined as the onset of new objective neurological symptoms/signs or worsening of existing neurological disability, not accompanied by metabolic changes, fever or other signs of infection, and lasting for a period of at least 48 hours accompanied by objective change of at least 0.5 in the EDSS score


Timing of outcome measurement
  • FLP Dev and FTP: unclear because of conflicting numbers when reported separately for CIS and CDMS, up to 1264 days (40 patients had the outcome in less than 500 days, 23 patients had the outcome in 500 days to 1264 days, 31 patients had not had the outcome by 1264 days)

  • FLP Ext Val: not reported

Missing data Number of participants with any missing value
  • FLP Dev: 6

  • FTP: ≤ 6, unclear exactly how many belonged to this subset

  • FLP Ext Val: not reported


Missing data handling
  • FLP Dev: complete case (participants with poor‐quality microarray results were excluded)

  • FTP: complete case, unclear (participants with poor‐quality microarray results were excluded)

  • FLP Ext Val: not reported

Analysis Number of participants (number of events)
  • FLP Dev: published inconsistencies, probably 94 but 79 when summing the CIS/CDMS numbers (number of events unclear, between 19 and 23)

  • FTP: published inconsistencies, probably 40 but 39 when summing the CIS/CDMS numbers (continuous outcome)

  • FLP Ext Val: published inconsistencies, 10 or 12 (not reported)


Modelling method
  • FLP Dev: support vector machine, multiclass classification

  • FTP: linear regression

  • FLP Ext Val: not applicable


Predictor selection method
  • FLP Dev, and FTP:

    • For inclusion in the multivariable model, all candidate predictors

    • During multivariable modelling, stepwise selection

      • First, the predictor with the lowest leave‐20%‐out cross‐validation error was selected, together with all predictors whose error was not statistically higher (by P value). Then predictors were added iteratively until the error rate became significantly worse. The final selection is unclear

  • FLP Ext Val: not applicable


Hyperparameter tuning
  • FLP Dev: not reported

  • FTP and FLP Ext Val: not applicable


Shrinkage of predictor weights
  • FLP Dev: modelling method

  • FTP: none

  • FLP Ext Val: not applicable


Performance evaluation dataset
  • FLP Dev, and FTP: development

  • FLP Ext Val: external validation


Performance evaluation method
  • FLP Dev, and FTP: cross‐validation, repeated leave 20% out CV

  • FLP Ext Val: not applicable


Calibration estimate
  • FLP Dev, and FLP Ext Val: not reported

  • FTP: calibration plot


Discrimination estimate
  • FLP Dev, and FLP Ext Val: not reported

  • FTP: not applicable


Classification estimate
  • FLP Dev: categories determined in data, error = 0.079

  • FTP: proportion of predictions more than 50 days from the observed value = 0.345

  • FLP Ext Val: error = 0.25 (but only 10 patients reported)


Overall performance
Not reported
Risk groups
Not reported
Model  Model presentation
  • FLP Dev and FTP: list of selected genes

  • FLP Ext Val: not applicable


Number of predictors in the model
  • FLP Dev: 10 (df unclear)

  • FTP: 9 (df unclear)

  • FLP Ext Val: not applicable


Predictors in the model
  • FLP Dev: FLJ10201, PDCD2, IL24, MEFV, CA2, SLM1, CLCN4, SMARCA1, TRIM22, TGFB2

  • FTP: KIAA1043, LOC51145, PPFIA1, MGC8685, DNCH2, PCOLCE2, FPRL1, G3BP, RHBG

  • FLP Ext Val: not applicable


Effect measure estimates
  • FLP Dev and FTP: not reported

  • FLP Ext Val: not applicable


Predictor influence measure
  • FLP Dev: not reported

  • FTP and FLP Ext Val: not applicable


Validation model update or adjustment
  • FLP Dev: not applicable

  • FLP Ext. Val: none

Interpretation  Aim of the study
To determine if subsets of genes can predict the time to the next acute relapse in patients with MS
Primary aim
The primary aim of this study is somehow the prediction of individual outcomes. The focus is on the use of genetic information.
Model interpretation
Probably exploratory
Suggested improvements
To find sets of predictive genes that give significant results when their gene expression is measured by cheaper, small‐scale, technologies such as kinetic RT‐PCR, to predict radiological MRI lesions (that are possibly clinically silent) from gene expression in PBMC
Notes Applicability overall
Unclear
Applicability overall rationale
  • FLP Dev:

    • Due to the lack of reporting on the final model or tool, it is unclear whether this study aimed to develop a model/tool for the prediction of individual MS outcomes.

  • FTP:

    • Due to the lack of reporting on the final model or tool, it is unclear whether this study aimed to develop a model/tool for the prediction of individual MS outcomes. Additionally, the model was fit such that the outcome must be known in order to decide whether to use the model because participants known to have the shortest follow‐up time were used, as opposed to those predicted to have short follow‐up time based on the FLP model. Furthermore, it is unclear to whom the model applies.

  • FLP Ext Val:

    • Due to the lack of reporting on participants, applicability is unclear.

 
Item Authors' judgement Support for judgement
Participants Unclear FLP Dev and FTP: The data source was not clearly reported. 100 patients were sampled from a larger population of unclear source. Although 6 of the samples were dropped due to QC issues, missingness is expected to be at random.
FLP Ext Val: The recruitment of this additional cohort was not described at all.
Predictors Yes Although microarray analysis of the transcriptome can be affected by batch effects, there were efforts to circumvent them. Even though the microarray analysis might have occurred after the outcomes became known, the procedure was relatively automated and is not expected to be affected by this information. The intended time of model use relative to the patient's disease history is unclear, but it may be any time that blood was drawn.
Outcome Yes FLP Dev: We rated this domain for this analysis as having a high risk of bias. This outcome was defined after seeing the outcome information, so it is not standard.
FTP: We rated this domain for this analysis as having a low risk of bias. It is unclear which predictors were known at outcome assessment, but we consider the relapse definition to be robust.
FLP Ext Val: We rated this domain for this analysis as having a high risk of bias. This outcome was defined after seeing the outcome information in the development set, so it is not standard.
Analysis No FLP Dev and FTP: The authors chose to use multi‐class modelling for a time‐to‐event analysis, unnecessarily categorising the outcome. The outcome groups were based on the distribution of the development data. The number of observations and the number of events were low. Calibration and discrimination were not addressed. Patients were excluded for poor‐quality transcription data. The same data appear to have been used for predictor selection and model evaluation. The final model is unclear.
FLP Ext Val: The authors chose to use multi‐class modelling for a time‐to‐event analysis, unnecessarily categorising the outcome. The number of observations and the number of events were low. Calibration and discrimination were not addressed.
Overall No At least one domain is at high risk of bias.
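
The FLP model is a multiclass support vector machine evaluated by repeated leave‐20%‐out cross‐validation. The snippet below is a generic sketch of that evaluation scheme in scikit‐learn rather than the authors' gene‐selection pipeline; `X` (expression features) and `y` (the three time‐to‐relapse categories) are hypothetical, and in a faithful analysis the predictor selection would need to be nested inside the resampling loop to avoid the optimism noted in the analysis judgement.

```python
from sklearn.model_selection import ShuffleSplit, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Repeated leave-20%-out cross-validation of a multiclass (one-vs-one) linear SVM.
clf = make_pipeline(StandardScaler(), SVC(kernel="linear"))
cv = ShuffleSplit(n_splits=50, test_size=0.2, random_state=0)
accuracy = cross_val_score(clf, X, y, cv=cv, scoring="accuracy")
print(f"misclassification error: {1 - accuracy.mean():.3f}")
```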

Kosa 2022.

Study characteristics
General information Model name
Not applicable
Primary source
Journal
Data source
Mixed (case‐control, cohort), primary
Study type
Development
Participants Inclusion criteria
  • CDMS diagnosis

  • Lumbar puncture within one year of a clinical visit that included all tests necessary to calculate CombiWISE (EDSS, SNRS, 9HPT, and 25FW)


Exclusion criteria
  • Participants in an MS exacerbation or on low‐efficacy therapies (Copaxone, interferon beta preparations, oral disease‐modifying treatments) within 3 months of lumbar puncture

  • Participants on high‐efficacy therapies (natalizumab, daclizumab, alemtuzumab, rituximab, ocrelizumab) within 6 months of lumbar puncture


Recruitment
Prospectively recruited in the study 'Comprehensive Multimodal Analysis of Neuroimmunological Diseases of the Central Nervous System' (NCT00794352), unclear which centre(s), USA
Age (years)
Mean 49.6
Sex (%F)
54.2
Disease duration (years)
Mean 12.2 (pooled SD: 8.51)
Diagnosis
30.8% RRMS, 24.2% SPMS, 44.9% PPMS
Diagnostic criteria
Mixed: McDonald 2010 (Polman 2011), McDonald 2017 (Thompson 2018b)
Treatment
  • At recruitment, 0%

  • During follow‐up, not reported


Disease description
EDSS mean (SD): development set; RRMS 1.8 (1.2), SPMS 5.9 (1.2), PPMS 5.3 (1.6)/validation set RRMS 2.2 (1.6), SPMS 5.5 (1.5), PPMS 5.2 (1.6)
Recruitment period
2004 to 2021
Predictors Considered predictors
All possible Somamer ratios from 1305 Somamers (unclear adjustment for age and sex) along with individual markers
Number of considered predictors
852,167 or 852,165 (unclear adjustment for age and sex)
Timing of predictor measurement
At lumbar puncture
Predictor handling
Continuously (transformed into ratios)
Outcome Outcome definition
Composite (EDSS, SNRS, T25FW, NDH‐9HPT): MS‐DSS, a model output based on measured CombiWISE (which contains EDSS, SNRS, T25FW, NDH‐9HPT), therapy adjusted CombiWISE (which includes a treatment efficacy model), COMRIS‐CTD (including several lesion and atrophy measures), time from disease onset to first therapy, difference between adjusted and unadjusted CombiWISE, age, and family history of MS
Timing of outcome measurement
Mean: 4.3 years
Missing data Number of participants with any missing value
Not reported
Missing data handling
Exclusion

Analysis Number of participants (number of events)
227 (continuous outcome)
Modelling method
Random forest, numeric outcome
Predictor selection method
  • For inclusion in the multivariable model, all candidate predictors

  • During multivariable modelling, stepwise selection

    • Backward recursive feature elimination on 10 RFs, in which the predictors with lowest 10% variable importance removed, RFs refit, and process iterated until n predictors remaining (n chosen based on lowest out‐of‐bag error)


Hyperparameter tuning
Unclear, number of predictors to include chosen by out of bag error, random forest tuning parameters not reported
Shrinkage of predictor weights
Modelling method
Performance evaluation dataset
Development
Performance evaluation method
Random split
Calibration estimate
Calibration plot
Discrimination estimate
Not applicable
Classification estimate
Not applicable
Overall performance
Not reported
Risk groups
Not reported
Model  Model presentation
Not reported
Number of predictors in the model
21 or 23 (unclear if age and sex are predictors)
Predictors in the model
Somamer ratios, age, sex
Effect measure estimates
R2 = 0.264
Predictor influence measure
Variable importance
Validation model update or adjustment
Not applicable
Interpretation  Aim of the study
To test the hypothesis that CSF biomarker models provide insight into MS pathophysiology, identify molecular disease heterogeneity, and lead to an independent‐cohort validated prognostic test
Primary aim
The primary aim of this study is somehow the prediction of individual outcomes. The focus is on the prognostic value of CSF biomarkers.
Model interpretation
Probably exploratory
Suggested improvements
Further mechanistic research
Notes Applicability overall
High
Applicability overall rationale
The outcome was not a clinical measure but rather a value produced by another model with unclear interpretation.
Auxiliary references
Calle ML, Urrea V, Boulesteix AL, Malats N. AUC‐RF: a new strategy for genomic profiling with random forest. Hum Hered 2011;72(2):121‐32.
Kosa P, Komori M, Waters R, Wu T, Cortese I, Ohayon J, et al. Novel composite MRI scale correlates highly with disability in multiple sclerosis patients. Mult Scler Relat Disord 2015;4(6):526‐35.
Roxburgh RH, Seaman SR, Masterman T, Hensiek AE, Sawcer SJ, Vukusic S, et al. Multiple Sclerosis Severity Score: using disability and disease duration to rate disease severity. Neurology 2005;64(7):1144‐51.
Weideman A M, Barbour C, Tapia‐Maltos MA, Tran T, Jackson K, Kosa P, et al. New multiple sclerosis disease severity scale predicts future accumulation of disability. Front Neurol 2017;8:598.
NCT00794352. Comprehensive multimodal analysis of neuroimmunological diseases of the central nervous system. https://clinicaltrials.gov/show/NCT00794352 (first received 20 November 2008).
 
Item Authors' judgement Support for judgement
Participants No Although the study categorised itself as a case‐control study, the model we are interested in used prospectively measured predictors and outcomes of interest. In addition, the inclusion criteria depended on the availability of some tests, which is likely to introduce risk of bias.
Predictors Yes Predictors were collected prospectively according to a standard operating procedure by investigators blinded to clinical and MRI outcomes. The predictors were available at the intended time of use, reported as first lumbar puncture.
Outcome No During the study, the calculation of the neurological scales changed from manual to app‐based, which is likely to introduce variability. The timing of the outcome was not well defined and, despite the prospective design of the study, the follow‐up time had high variability.
Analysis No The sample size was small. Participants with missing outcome data were excluded from the analysis via exclusion criteria. The model performance was assessed suboptimally in a random‐split sample. There was no indication of a final selected model that could be used by others.
Overall No At least one domain is at high risk of bias.
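
The predictor selection described above is a backward recursive feature elimination on random forests: the 10% of predictors with the lowest variable importance are dropped at each step, and the final number of predictors is chosen by out‐of‐bag error. A minimal sketch of that loop follows, using out‐of‐bag R² as the selection criterion for a numeric outcome; `X` and `y` are hypothetical, and the original analysis additionally averaged over 10 forests per step.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def backward_rfe(X, y, min_features=5, seed=0):
    """Iteratively drop the 10% least important features, keeping track of the
    feature subset with the best out-of-bag R^2."""
    features = np.arange(X.shape[1])
    best_subset, best_oob = features.copy(), -np.inf
    while len(features) >= min_features:
        rf = RandomForestRegressor(n_estimators=500, oob_score=True,
                                   random_state=seed).fit(X[:, features], y)
        if rf.oob_score_ > best_oob:
            best_subset, best_oob = features.copy(), rf.oob_score_
        n_drop = max(1, int(0.10 * len(features)))
        order = np.argsort(rf.feature_importances_)   # ascending importance
        features = features[order[n_drop:]]           # keep the more important ones
    return best_subset, best_oob
```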

Kuceyeski 2018.

Study characteristics
General information Model name
Pairwise disconnection and GM atrophy
Primary source
Journal
Data source
Mixed (cohort, registry, routine care), secondary
Study type
Development
Participants Inclusion criteria
  • Early RRMS patients


Exclusion criteria
Not reported
Recruitment
Not reported
Age (years)
Mean 36.8 (unclear when)
Sex (%F)
73.3
Disease duration (years)
Mean 1.5 (SD 1.3)
Diagnosis
100% RRMS
Diagnostic criteria
Mixed: McDonald 2010 (Polman 2011), McDonald 2017 (Thompson 2018b)
Treatment
  • At recruitment, 95% on DMT

  • During follow‐up, not reported


Disease description
EDSS mean (SD): 1.1 (1.1)
Recruitment period
Not reported
Predictors Considered predictors
Age, sex, disease duration, treatment duration, baseline SDMT, baseline EDSS, regional GM atrophy (86 regions), NEMO pairwise disconnection measures (610 of 3655 considered), (other models: JHU‐MNI atlas overlap (176 regions), regional disconnection (86 ChaCo scores)), number of months between time points
Number of considered predictors
965
Timing of predictor measurement
At the baseline image (early RRMS, within 5 years of the first neurologic symptom) and at follow‐up (unclear; the time of outcome measurement)
Predictor handling
Continuously
Outcome Outcome definition
Disability (cognitive, SDMT): future processing speed measured using Symbol Digits Modality Test (SDMT) scores
Timing of outcome measurement
Mean (SD): 28.6 months (10.3 months)
Missing data Number of participants with any missing value
Not reported
Missing data handling
Not reported
Analysis Number of participants (number of events)
60 (continuous outcome)
Modelling method
Partial least squares regression
Predictor selection method
  • For inclusion in the multivariable model, all candidate predictors

  • During multivariable modelling, multiple models


Hyperparameter tuning
Ten‐fold cross‐validation to identify number of components that minimised predicted residual sum of squares
Shrinkage of predictor weights
Modelling method
Performance evaluation dataset
Development
Performance evaluation method
Unclear (cross‐validation and bootstrap used for model selection/fitting)
Calibration estimate
Calibration plot
Discrimination estimate
Not applicable
Classification estimate
Not applicable
Overall performance
R2 = 0.79 (95% CI 0.80 to 0.97)
Risk groups
Not reported
Model  Model presentation
Not reported
Number of predictors in the model
703 predictors transformed into 6 principal components
Predictors in the model
Age, sex, disease duration, treatment duration, baseline SDMT, baseline EDSS, regional GM atrophy (86 regions), NEMO pairwise disconnection measures (610 considered), number of months between time points
Effect measure estimates
Not reported
Predictor influence measure
Median and bootstrapped 95% confidence intervals for coefficients of statistically significant predictors
Validation model update or adjustment
Not applicable
Interpretation  Aim of the study
  1. To identify which of our 5 models (based on GM atrophy, global, regional and region‐pair disconnectivity and atlas overlap) has the best accuracy in predicting follow‐up processing speed

  2. To identify which of the global, regional, or pairwise disconnectivity; atrophy; and atlas overlap metrics in these models are significant predictors of future processing speed


Primary aim
The primary aim of this study is not the prediction of individual outcomes. The focus is on the usefulness of MRI measures and identifying promising MRI features.
Model interpretation
Exploratory
Suggested improvements
Increase sample size, the scores addressing SDMT domains as outcome measures, add WM damage measures
Notes Applicability overall
High
Applicability overall rationale
Although this study contained a model and its assessment, the main aim was not to create a model for prediction of individual outcomes but was rather to show the usefulness of MRI‐based connectome measures.
 
Item Authors' judgement Support for judgement
Participants No A combination of data from a cohort study, a registry, and routine care was used. No information was reported about the eligibility criteria.
Predictors Yes The predictors are objective measures or scores. All except for number of months between time points could be collected at the time of the first image.
Outcome Yes It is unclear whether the outcome was blinded to the predictors, but we consider the outcome based on SDMT to be objective. SDMT is a validated measure of cognitive function measurement.
Analysis No Even when based on the number of principal components rather than the original predictors, the EPV was low. No information on missing data or how they were handled was provided. Details of the model were not reported. The post‐baseline variable 'number of months between time points' was included in the models. Although bootstrapping was used for confidence interval calculation, there was no indication that any optimism correction was performed.
Overall No At least one domain is at high risk of bias.
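
The model above is a partial least squares regression in which ten‐fold cross‐validation was used to pick the number of components minimising the predicted residual sum of squares (PRESS). A minimal sketch of that component selection is shown below; `X` (imaging and clinical predictors) and `y` (follow‐up SDMT score) are hypothetical placeholders.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import KFold, cross_val_predict

def choose_n_components(X, y, max_components=10):
    """Return the number of PLS components that minimises the 10-fold
    cross-validated PRESS."""
    cv = KFold(n_splits=10, shuffle=True, random_state=0)
    press = []
    for k in range(1, max_components + 1):
        pred = cross_val_predict(PLSRegression(n_components=k), X, y, cv=cv)
        press.append(np.sum((y - pred.ravel()) ** 2))
    return int(np.argmin(press)) + 1
```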

Law 2019.

Study characteristics
General information Model name
  • DT

  • RF

  • Ada


Primary source
Journal
Data source
Randomised trial participants, secondary
Study type
Development
Participants Inclusion criteria
  • Age between 18 years and 65 years

  • Patients meeting a stringent definition of SPMS (recently confirmed progression in EDSS in the absence of relapse)

  • Documented history of SPMS

  • Absence of relapse in the 3 months leading up to trial participation

  • EDSS score of 3.5 to 6.5

  • Kurtzke pyramidal or cerebellar system subscore ≥ 3


Exclusion criteria
  • Participants with multiple missing visits or data entries at any given visit, including participants that did not have a complete set of baseline clinical scores (EDSS, MSFC, 9HP, T25W, PASAT) or missing baseline T2LV or BPF

  • Diagnosis of PPMS

  • Previous treatment with MBP8298

  • History of malignancy

  • Steroid therapy within 30 days of study entry

  • Treatment with beta‐interferon

  • Glatiramer acetate within 3 months

  • Mitoxantrone, cyclophosphamide, methotrexate, azathioprine, or any other immunomodulating or immunosuppressive drugs or plasma exchange within 6 months prior to the first study‐specific test with the exception of corticosteroids or ACTH for relapse treatment

  • Initiation or discontinuation of 4‐AP or 3,4‐DAP at any time during the study

  • History of anaphylactic/anaphylactoid reactions to glatiramer acetate or Gd‐DTPA

  • Abnormal baseline results deemed clinically significant by the investigator

  • Any condition that could interfere with the performance of study‐specific procedures and any other condition that, in the investigator’s opinion, would make the individual unsuitable for participation.


Recruitment
  • MBP8298 RCT participants from 47 centres across 10 countries

  • Canada, United Kingdom, Netherlands, Sweden, Denmark, Finland, Germany, Estonia, Latvia, Spain


Age (years)
Mean 50.9
Sex (%F)
64.1
Disease duration (years)
Mean 9.3 (SD 5.0)
Diagnosis
100% SPMS
Diagnostic criteria
Own definition
Treatment
  • At recruitment, not reported

  • During follow‐up, 50% on MBP8298


Disease description
EDSS median (IQR): 6.0 (4.5 to 6.5)
Recruitment period
2004 to 2009
Predictors Considered predictors
Timed 25‐foot walk (T25W), 9HPT, Paced Auditory Serial Addition Test (PASAT), EDSS, disease duration, age, sex, T2 lesion volume (T2LV), brain parenchymal fraction (BPF)
Number of considered predictors
9
Timing of predictor measurement
At study baseline (RCT)
Predictor handling
Continuously
Outcome Outcome definition
Disability (EDSS): confirmed disability progression defined as an increase in EDSS (≥ 1.0 or ≥ 0.5 for baseline EDSS ≤ 5.5 or ≥ 6, respectively) sustained for 6 months
Timing of outcome measurement
At 2 years
Missing data Number of participants with any missing value
Unclear exactly how many participants had any missing value
  • 1 participant with a missing value (disease duration) had it imputed

  • 127 participants were excluded due to missing values


Missing data handling
Mixed: mean imputation for single patient's disease duration, exclusion
Analysis Number of participants (number of events)
485 (115)
Modelling method
  • DT: decision tree

  • RF: random forest

  • Ada: boosting, AdaBoost


Predictor selection method
  • For inclusion in the multivariable model, all candidate predictors

  • During multivariable modelling, full model approach


Hyperparameter tuning
  • DT: nested 5‐fold cross‐validation for minimum size of each child node necessary for splitting a node (considered 5, 10, or 15%)

  • RF and Ada: nested 5‐fold cross‐validation for number of models to include in ensemble (considered 2, 5, or 10)


Shrinkage of predictor weights
  • DT: none

  • RF and Ada: modelling method


Performance evaluation dataset
Development
Performance evaluation method
Cross‐validation, 10‐fold
Calibration estimate
Not reported
Discrimination estimate
  • c‐Statistic:

    • DT: 0.618 (SD 0.03)

    • RF: 0.607 (SD 0.031)

    • Ada: 0.602 (SD 0.031)


Classification estimate
  • DT: cutoff (0.537) identified by convex hull method, sensitivity = 58.3 (SD 4.6), specificity = 62.2 (SD 2.5), PPV = 32.4 (SD 2.0), NPV = 82.7 (SD 1.8)

  • RF: cutoff (0.531) identified by convex hull method, sensitivity = 59.1 (SD 4.6), specificity = 61.1 (SD 2.5), PPV = 32.1 (SD 2.1), NPV = 82.8 (SD 1.7)

  • Ada: cutoff (0.527) identified by convex hull method, sensitivity = 53.0 (SD 4.7), specificity = 62.4 (SD 2.5), PPV = 30.5 (SD 1.6), NPV = 81.1 (SD 1.9)


Overall performance
Not reported
Risk groups
Not reported
Model  Model presentation
Not reported
Number of predictors in the model
9
Predictors in the model
Timed 25‐foot walk (T25W), 9HPT, Paced Auditory Serial Addition Test (PASAT), EDSS, disease duration, age, sex, T2 lesion volume (T2LV), brain parenchymal fraction (BPF)
Effect measure estimates
Not reported
Predictor influence measure
Mean % feature contribution/importance
Validation model update or adjustment
Not applicable
Interpretation  Aim of the study
To evaluate individual and ensemble model performance built using decision tree (DT)‐based algorithms compared to logistic regression (LR) and support vector machines (SVMs) for predicting SPMS disability progression
Primary aim
The primary aim of this study is somehow the prediction of individual outcomes. The focus is on modelling methods.
Model interpretation
Exploratory
Suggested improvements
Bigger samples, more predictors with non‐linear relationships with progression, using random trees instead of simple DTs in AdaBoost
Notes Applicability overall
Unclear
Applicability overall rationale
Due to the lack of reporting on the final model or tool, it is unclear whether this study aimed to develop a model/tool for the prediction of individual MS outcomes.
Auxiliary references
Freedman MS, Bar‐Or A, Oger J, Traboulsee A, Patry D, Young C, et al. A phase III study evaluating the efficacy and safety of MBP8298 in secondary progressive MS. Neurology 2011;77(16):1551‐60.
 
Item Authors' judgement Support for judgement
Participants No Data from an RCT were used, but only complete cases were included. Around 10% of patients were excluded due to missing values, and it is unclear if the excluded patients differed from the included patients.
Predictors Yes The predictors were collected during an RCT; therefore, they are expected to be collected in the same way across all patients.
Outcome Yes The outcome was standard and was assessed during an RCT. We expect EDSS assessment to be objective and do not think that predictor knowledge influences results.
Analysis No The EPV was close to 10. Discrimination was addressed, but not calibration. The final model is unclear.
Overall No At least one domain is at high risk of bias.
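
Hyperparameters here were tuned by a nested 5‐fold cross‐validation inside a 10‐fold cross‐validation used to estimate the c‐statistic. A minimal sketch of that nested scheme for the random forest variant is given below; `X` and `y` are hypothetical, and the candidate ensemble sizes are the values (2, 5, 10) reported above.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score

inner = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)   # hyperparameter tuning
outer = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)  # performance estimation
tuned_rf = GridSearchCV(RandomForestClassifier(random_state=0),
                        param_grid={"n_estimators": [2, 5, 10]},
                        cv=inner, scoring="roc_auc")
auc = cross_val_score(tuned_rf, X, y, cv=outer, scoring="roc_auc")
print(f"c-statistic: {auc.mean():.3f} (SD {auc.std():.3f})")
```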

Lejeune 2021.

Study characteristics
General information Model name
  • Dev

  • Ext Val


Primary source
Journal
Data source
  • Dev: randomised trial participants, secondary

  • Ext Val: routine care, secondary


Study type
Development + external validation, location
Participants Inclusion criteria
  • 18 to 55 years of age

  • RRMS diagnosis

  • Relapse with available data about its clinical presentation

  • EDSS ≤ 5 before relapse

  • Pre‐ and post‐relapse EDSS scores available


Exclusion criteria
  • Dev: use of natalizumab, mitoxantrone, cyclophosphamide

  • Ext Val: not reported


Recruitment
  • Dev: participants in the COPOUSEP (Corticothérapie Orale dans les Poussées de Sclérose en Plaques) trial, an RCT run in 14 centres (NCT00984984), France

  • Ext Val: Bordeaux University Hospital, France


Age (years)
  • Dev: mean 35.3 (unclear when)

  • Ext Val: mean 36.2 (unclear when)


Sex (%F)
  • Dev: 76.3

  • Ext Val: 76.6


Disease duration (years)
  • Dev: mean 7.32 (SD 5.5)

  • Ext Val: mean 7.62 (SD 6.56)


Diagnosis
100% RRMS
Diagnostic criteria
McDonald 2005 (Polman 2005)
Treatment
  • Dev:

    • At recruitment, 51.1% first line, 3.8% second line, 45.2% no treatment

    • During follow‐up, unclear, 32.8% therapeutic escalation, 59.1% no DMT change

  • Ext Val:

    • At recruitment, 33.9% first line, 24.7% second line, 41.4% no treatment

    • During follow‐up, unclear, 48% therapeutic escalation, 49.1% no DMT change


Disease description
  • Dev: EDSS mean (SD): 3.45 (0.96)

  • Ext Val: EDSS mean (SD): 2.93 (1.00)


Recruitment period
  • Dev: 2008 to 2013

  • Ext Val: 2005 to 2016

Predictors Considered predictors
  • Dev: sex, age, disease duration, DMT (unclear if binary or none, first line, second line), EDSS 3 to 6 months prior to relapse, EDSS at relapse, difference between relapse EDSS and prior EDSS, relapse phenotype (at least a dummy for each of: 1) motor (motor disorders or isolated irritative pyramidal signs), 2) sensory (subjective sensory disturbances corresponding to paraesthesia, objective sensory disturbances corresponding to anaesthesia or hypoesthesia), 3) gait or balance disorder related to proprioceptive ataxia, 4) visual, 5) bladder/bowel, 6) cerebellar, 7) brainstem, 8) cognitive disorders, and 9) multifocal symptoms)

  • Ext Val: not applicable


Number of considered predictors
  • Dev: between 14 and 19 (unclear transformations)

  • Ext Val: not applicable


Timing of predictor measurement
  • Dev: at study baseline (RCT, relapse) or retrospectively at screening

  • Ext Val: not applicable


Predictor handling
  • Dev:

    • Unclear, disease duration continuously (ln transformed), age and EDSS prior to relapse dichotomised or categorised, EDSS change categorised

    • No interactions considered

  • Ext Val: not applicable

Outcome Outcome definition
Disability (EDSS): residual disability at 6 months after relapse defined as an increase of at least 1 EDSS point compared with pre‐relapse EDSS
Timing of outcome measurement
At 6 months
Missing data Number of participants with any missing value
  • Dev: ≥ 29, unclear exactly how many participants had any missing value

  • Ext Val: ≥ 782, unclear exactly how many participants had any missing value


Missing data handling
Exclusion
Analysis Number of participants (number of events)
  • Dev: 186 (53)

  • Ext Val: 175 (55)


Modelling method
  • Dev: logistic regression, LASSO

  • Ext Val: not applicable


Predictor selection method
  • Dev:

    • For inclusion in the multivariable model, all candidate predictors

    • During multivariable modelling, modelling method

  • Ext Val: not applicable


Hyperparameter tuning
  • Dev: penalty tuning parameter estimated by 5‐fold cross‐validation

  • Ext Val: not applicable


Shrinkage of predictor weights
  • Dev: modelling method

  • Ext Val: not applicable


Performance evaluation dataset
  • Dev: development

  • Ext Val: external validation


Performance evaluation method
  • Dev: bootstrap, B = 1000

  • Ext Val: not applicable


Calibration estimate
  • Dev: not reported

  • Ext Val: calibration plot, Hosmer‐Lemeshow test


Discrimination estimate
  • Dev: c‐statistic = 0.82 (95% CI 0.73 to 0.91)

  • Ext Val: c‐statistic = 0.71 (95% CI 0.62 to 0.80)


Classification estimate
  • Dev: cutoff = 0.5, PPV 0.73 (95% CI 0.53 to 0.92), NPV 0.70 (95% CI 0.50 to 0.88)

  • Ext Val: cutoff = 0.5, PPV 0.83 (95% CI 0.76 to 0.92), NPV 0.74 (95% CI 0.67 to 0.81)


Overall performance
Not reported
Risk groups
Not reported
Model  Model presentation

Number of predictors in the model
  • Dev: 6 (7 df)

  • Ext Val: not applicable


Predictors in the model
  • Dev: increased EDSS during relapse, pre‐relapse EDSS at 0, age, proprioceptive ataxia, subjective sensory disorder, disease duration

  • Ext Val: not applicable


Effect measure estimates
  • Dev: OR (SD): increased EDSS during relapse from 1.5 points to 2.5 points 1.08 (1.58), increased EDSS during relapse of 3 or more 4.98 (9.23), pre‐relapse EDSS at 0 points 1.75 (0.29), age 1.29 (1.65), proprioceptive ataxia 1.05 (0.95), subjective sensory disorder 0.51 (0.17), natural log of disease duration 0.73 (0.12)

  • Ext Val: not applicable


Predictor influence measure
  • Dev: not reported

  • Ext Val: not applicable


Validation model update or adjustment
  • Dev: not applicable

  • Ext Val: none

Interpretation  Aim of the study
To develop and validate a clinical‐based model for predicting the risk of residual disability at 6 months post‐relapse in MS
Primary aim
The primary aim of this study is the prediction of individual outcomes.
Model interpretation
Confirmatory
Suggested improvements
Not reported
Notes Applicability overall
Low
Auxiliary references
NCT00984984. Efficacy and safety of methylprednisolone per os versus IV for the treatment of multiple sclerosis (MS) relapses. https://ClinicalTrials.gov/show/NCT00984984 (first received 25 September 2009).
 
Item Authors' judgement Support for judgement
Participants No Dev: RCT data, considered to be a valid source, were used for modelling. However, > 5% of participants were excluded for missing data.
Ext Val: The use of data from routine clinical care may introduce bias. Of the 978 people in the registry, 781 were excluded for missing data. Due to the exclusion of this significant number of participants, it is unclear whether the model results are generalisable.
Predictors Yes Dev: Due to the RCT nature, predictors should have been defined and assessed in a similar way across participants. Due to the prospective nature of the RCT, predictors were collected without knowledge of the outcome. The authors specifically set out to create a prediction model in which all predictors were readily available at baseline.
Ext Val: There is no reason to suspect differential or post‐outcome assessment of the predictors in this routine hospital dataset from a single centre.
Outcome Yes We consider the outcome, which is based on EDSS, to be robust to sources of bias, such as knowledge of predictors at outcome assessment.
Analysis No Dev: The EPV was less than 10. Continuous predictors were dichotomised or possibly categorised without clear explanation. It was unclear how missing predictor data were handled, other than exclusion (handled in Participants section). Although calibration measures for the development set were not reported, they were reported for the external validation set of the same publication.
Ext Val: The number of events was fewer than 100. It was unclear how missing predictor data were handled other than exclusion, which was handled in the Participants section.
Overall No At least one domain is at high risk of bias.
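
The development model is a LASSO‐penalised logistic regression with the penalty tuned by 5‐fold cross‐validation. A minimal scikit‐learn sketch of that setup follows; `X` (baseline candidate predictors) and `y` (residual disability at 6 months) are hypothetical, and standardising the predictors before penalisation is an assumption of this sketch rather than a documented step of the original analysis.

```python
from sklearn.linear_model import LogisticRegressionCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# L1-penalised logistic regression; the penalty strength is chosen by 5-fold CV.
model = make_pipeline(
    StandardScaler(),
    LogisticRegressionCV(penalty="l1", solver="liblinear", Cs=50, cv=5,
                         scoring="roc_auc", max_iter=5000),
)
model.fit(X, y)
lasso = model[-1]
print(lasso.C_)      # selected penalty strength
print(lasso.coef_)   # predictors shrunk exactly to zero are effectively dropped
```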

Malpas 2020.

Study characteristics
General information Model name
  • Dev

  • Ext Val


Primary source
Journal
Data source
Registry, secondary
Study type
Development + external validation, location
Participants Inclusion criteria
  • Diagnosis of clinically definite relapse‐onset MS

  • Age at onset ≥ 18 years

  • First EDSS recorded within 12 months of symptom onset

  • At least 2 recorded EDSS scores within 10 years of symptom onset

  • At least 10 years of observation time based on last recorded EDSS


Exclusion criteria
Not reported
Recruitment
  • Dev: participants from 139 clinical centres in 34 countries in the MSBase registry

  • Ext Val: participants in the Swedish MS Registry, Sweden


Age (years)
  • Dev: mean 31.7 (onset)

  • Ext Val: mean 33.4 (onset)


Sex (%F)
  • Dev: 71.2

  • Ext Val: not reported


Disease duration (years)
  • Dev: mean 0.33 (SD 0.3)

  • Ext Val: not reported


Diagnosis
100% RRMS
Diagnostic criteria
McDonald 2010 (Polman 2011)
Treatment
  • Dev:

    • At recruitment, unclear number of participants, mean % time on treatment 1st year, first‐line 17.1%, second‐line 0.50%

    • During follow‐up, unclear number of participants, mean % time on treatment 10th year, 46% first‐line, 5.3% second‐line

  • Ext Val: not reported


Disease description
  • Dev: first year EDSS mean (SD): 1.78 (1.26), number of relapses mean (SD): 0.74 (0.93)

  • Ext Val: first year EDSS mean (SD): 1.51 (1.28)


Recruitment period
Not reported
Predictors Considered predictors
  • Dev: gender, age at symptom onset, within the first year of symptom onset: median EDSS, any hospitalisation associated with a relapse, any treatment with steroids, number of severe relapses, number of any relapses, pyramidal signs, bowel/bladder signs, cerebellar signs, incomplete recovery from a relapse, nuisance variables: disease duration at first visit, total observation time, proportion of time over the first year on first‐line therapy, proportion of time over the first year on second‐line therapy, proportion of time over the 10‐year observation period on first‐line therapy, proportion of time over the 10‐year observation period on second‐line therapy

  • Ext Val: not applicable


Number of considered predictors
  • Dev: 17

  • Ext Val: not applicable


Timing of predictor measurement
  • Dev: at symptom onset, at visits up to 1 year following symptom onset, and at final follow‐up

  • Ext Val: not applicable


Predictor handling
  • Dev:

    • Continuously

    • No interactions considered

  • Ext Val: not applicable

Outcome Outcome definition
Disability (EDSS): aggressive MS defined as all of (i) EDSS ≥ 6 reached within 10 years of symptom onset, (ii) EDSS ≥ 6 confirmed and sustained over ≥ 6 months, and (iii) EDSS ≥ 6 sustained until the end of follow‐up (≥ 10 years)
Timing of outcome measurement
  • Dev: at 10 years after onset (adjustment for time on study), time from onset to meeting aggressive disease criteria mean (SD, range): 6.05 years (2.79 years, 0 years to 9.89 years)

  • Ext Val: at 10 years after onset

Missing data Number of participants with any missing value
  • Dev: 56,081, unclear exactly how many participants had any missing value

  • Ext Val: not reported


Missing data handling
Exclusion
Analysis Number of participants (number of events)
  • Dev: 2403 (145)

  • Ext Val: 556 (34)


Modelling method
  • Dev: ensemble, Bayesian model averaging with binomial distribution

  • Ext Val: not applicable


Predictor selection method
  • Dev:

    • For inclusion in the multivariable model, all candidate predictors

    • During multivariable modelling, posterior inclusion probability > 0.5

  • Ext Val: not applicable


Hyperparameter tuning
Not applicable
Shrinkage of predictor weights
  • Dev: modelling method

  • Ext Val: not applicable


Performance evaluation dataset
  • Dev: development

  • Ext Val: external validation


Performance evaluation method
  • Dev: apparent

  • Ext Val: not applicable


Calibration estimate
Not reported
Discrimination estimate
  • Dev:

    • Full model: c‐statistic = 0.82 (95% CI 0.78 to 0.85)

    • Final model: c‐statistic = 0.80 (95% CI 0.75 to 0.84)

  • Ext Val: c‐statistic = 0.75 (95% CI 0.66 to 0.84)


Classification estimate
  • Dev:

    • Full model: cutoff = 0.05, sensitivity = 0.78, specificity = 0.71, PPV = 0.15, NPV = 0.98

    • Reduced model: cutoff = 0.06, sensitivity = 0.72, specificity = 0.73, PPV = 0.15, NPV = 0.98 (see the worked example below)

  • Ext Val: cutoff determined in development set (0.06), PPV = 0.15, NPV = 0.97
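As an illustration of how the low PPV arises despite reasonable discrimination, the reported PPV and NPV can be approximately reproduced from the sensitivity, specificity, and the event prevalence in the development set (145 events among 2403 participants). A minimal sketch, using the rounded reduced‐model figures at the 0.06 cutoff as if they were exact:

```python
# Reproduce PPV and NPV from sensitivity, specificity and prevalence (Bayes' rule).
# Inputs are the rounded figures reported above, so the result is only approximate.
sensitivity = 0.72                 # reduced model at the 0.06 cutoff
specificity = 0.73
prevalence = 145 / 2403            # events / participants in the development set

ppv = (sensitivity * prevalence) / (
    sensitivity * prevalence + (1 - specificity) * (1 - prevalence))
npv = (specificity * (1 - prevalence)) / (
    specificity * (1 - prevalence) + (1 - sensitivity) * prevalence)
print(f"PPV ~ {ppv:.2f}, NPV ~ {npv:.2f}")   # approx. 0.15 and 0.98, matching the table
```

The low PPV mainly reflects the low outcome prevalence of about 6%, not poor discrimination.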


Overall performance
Not reported
Risk groups
Not reported
Model  Model presentation
  • Dev:

    • Relative risk of aggressive disease by number of positive signs in simplified model (dichotomised based on individual optimal thresholds)

    • BMA coefficients for larger model without intercept and 17th coefficient

  • Ext Val: not applicable


Number of predictors in the model
  • Dev:

    • Reduced model: 3

    • Full model: 17

  • Ext Val: not applicable


Predictors in the model
  • Dev: onset age, median EDSS in first year, pyramidal signs

  • Ext Val: not applicable


Effect measure estimates
  • Dev: log OR (95% credible interval) for reduced model: intercept −3.54 (−3.85 to −3.24), onset age 0.06 (0.04 to 0.08), median EDSS in first year 0.47 (0.35 to 0.59), pyramidal signs 0.80 (0.40 to 1.20); these coefficients are applied in the sketch below

  • Ext Val: not applicable
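For illustration, the reduced‐model coefficients above can be combined into a predicted probability via the inverse logit. This is a minimal sketch only: it assumes the coefficients apply to onset age in years and to the raw first‐year median EDSS without any centring or scaling, which is not reported in this table, so absolute risks from this function should not be taken at face value.

```python
import math

# Reduced-model posterior-mean coefficients (log-odds scale) as reported above.
COEF = {"intercept": -3.54, "onset_age": 0.06, "median_edss_y1": 0.47, "pyramidal": 0.80}

def predicted_risk(onset_age, median_edss_y1, pyramidal_signs):
    """Predicted probability of aggressive MS for a hypothetical individual.
    Assumes uncentred, unscaled predictors; the original coding is not reported here."""
    lp = (COEF["intercept"]
          + COEF["onset_age"] * onset_age
          + COEF["median_edss_y1"] * median_edss_y1
          + COEF["pyramidal"] * (1.0 if pyramidal_signs else 0.0))
    return 1.0 / (1.0 + math.exp(-lp))   # inverse logit

# Hypothetical individual: onset at age 35, median first-year EDSS of 2, pyramidal signs present.
print(round(predicted_risk(35, 2, True), 3))
```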


Predictor influence measure
Not applicable
Validation model update or adjustment
  • Dev: not applicable

  • Ext. Val: none

Interpretation  Aim of the study
To evaluate whether patients who will develop aggressive multiple sclerosis can be identified based on early clinical markers
Primary aim
The primary aim of this study is in part the prediction of individual outcomes. The focus is on predictor identification.
Model interpretation
Probably confirmatory
Suggested improvements
Add MRI and CSF data
Notes Applicability overall
  • Dev: high

  • Ext Val: unclear


Applicability overall rationale
  • Dev: this study included participants who had already experienced the outcome at baseline.

  • Ext Val: it is unclear whether patients who had already experienced the outcome at baseline were included in the validation set.


Auxiliary references
Butzkueven H, Chapman J, Cristiano E, Grand'Maison F, Hoffmann M, Izquierdo G, et al. MSBase: an international, online registry and platform for collaborative outcomes research in multiple sclerosis. Mult Scler 2006;12(6):769‐74.
 
Item Authors' judgement Support for judgement
Participants No Dev: The data source was reported unclearly: it was called a registry but also a cohort study. Although the authors referred to a quality assurance paper for the data source, the cited article only described the quality assurance system in general terms. It was also not reported what happened when centres or observations deviated from the quality standards. Inappropriate inclusion of participants who had already experienced the outcome at baseline may bias the predictions and complicate their interpretation, regardless of whether the model estimated change. The sensitivity analysis only addressed whether the same predictors were included, not whether the predictions changed.
Ext Val: The data source was a registry with inclusion/exclusion depending on the length of follow‐up. Also, it is unclear whether participants had the outcome at baseline, as in the development.
Predictors Yes The final model is simple, and its predictors are available at the intended time of use. The included predictors are simple enough to be considered objective and were collected up to 1 year after symptom onset. We consider the period up to 1 year after onset to still be part of onset. For this reason, and because the logistic model, unlike a survival model, does not require a starting point, the predictors are considered to be available at the time of model application.
Outcome Yes The outcome was pre‐specified. It was based on EDSS, which we consider to be relatively robust, so we are not concerned about the possible lack of blinding of the outcome assessor to the patient history. Participants with the outcome at baseline were included, but this was addressed in the Participants section.
Analysis No Dev: The EPV was less than 10. Complete case analysis was performed. No calibration measures were reported, and only apparent discrimination was reported. External validation was done in the same paper, but also without calibration. No assessment of the need for shrinkage was done. The model coefficients were provided through correspondence with the authors, but in the paper the reduced model was presented only as a chart of relative risks for combinations of predictors.
Ext Val: The event number was fewer than 100. Complete case analysis was performed. No calibration measures were reported.
Overall No At least one domain is at high risk of bias.

Mandrioli 2008.

Study characteristics
General information Model name
  • Dev

  • Ext Val


Primary source
Journal
Data source
Cohort, secondary
Study type
Development + external validation, time
Participants Inclusion criteria
  • RRMS course at onset

  • Neurological follow‐up and EDSS evaluation at least every 6 months for at least 10 years


Exclusion criteria
  • PPMS

  • No regular neurological follow‐up

  • No CSF sample available


Recruitment
Consecutive patients identified during regular follow‐ups at the Neurology Clinic of Modena University Hospital, Italy
Age (years)
  • Dev: mean 27.6 (onset)

  • Ext Val: mean 33.0 (onset)


Sex (%F)
  • Dev: 60.9

  • Ext Val: 61.5


Disease duration (years)
Not reported
Diagnosis
100% RRMS
Diagnostic criteria
Not reported
Treatment
  • At recruitment, not reported

  • During follow‐up:

    • Dev: 34.4% IFN‐b, 60.9% azathioprine, 18.8% iv mitoxantrone, 29.7% never treated

    • Ext Val: 46.2% IFN‐b, 36.9% azathioprine, 16.9% iv mitoxantrone, 32.3% never treated


Disease description
  • Dev: EDSS at diagnosis mean (SD): BMS 1.76 (0.24), SMS 2.17 (0.18)

  • Ext Val: EDSS at diagnosis mean (SD): BMS 1.65 (0.10), SMS 2.45 (0.23)


Recruitment period
  • Dev: 2003 to 2004

  • Ext Val: not reported

Predictors Considered predictors
  • Dev: IgG OB presence or absence, IgM OB presence or absence, increased IgG index, increased IgM index, gender, age at onset (> 30, < 31), sensory symptoms at onset, motor symptoms at onset, optic neuritis at onset, brainstem or cerebellar symptoms at onset, time to second relapse (> 24, < 25 months), time to second relapse in months, EDSS score at diagnosis (unclear when these were dropped: IgG OB number, IgM OB number)

  • Ext Val: not applicable


Number of considered predictors
  • Dev: 15

  • Ext Val: not applicable


Timing of predictor measurement
  • Dev: at disease onset (RRMS)

  • Ext Val: not applicable


Predictor handling
  • Dev: age at onset dichotomised, time to second relapse tested as dichotomised and continuously, IgM and IgG indices and their OB number as dichotomised (justified on the basis of its dependence on the varying laboratory method), EDSS score at diagnosis continuously

  • Ext Val: not applicable

Outcome Outcome definition
Disability (EDSS): severe MS (SMS) defined as an EDSS score of 4 or more after a disease duration of 10 years or less, benign MS (BMS) otherwise (Kurtzke 1977 criteria); progression to a new EDSS score had to be confirmed in 2 consecutive examinations
Timing of outcome measurement
  • Dev: unclear when the outcome was measured relative to study start, follow‐up from onset mean (SD): BMS 16.03 years (0.92 years), SMS 13.62 years (0.80 years)

  • Ext Val: unclear when the outcome was measured relative to study start, follow‐up from onset mean (SD): BMS 11.36 years (0.61 years), SMS 11.65 years (1.15 years)

Missing data Number of participants with any missing value
  • Dev: 29

  • Ext Val: 39


Missing data handling
Exclusion
Analysis Number of participants (number of events)
  • Dev: 64 (26)

  • Ext Val: 65 (20)


Modelling method
  • Dev: logistic regression

  • Ext Val: not applicable


Predictor selection method
  • Dev:

    • For inclusion in the multivariable model, univariable analysis

    • During multivariable modelling, significance

      • P value < 0.05

  • Ext Val: not applicable


Hyperparameter tuning
Not applicable
Shrinkage of predictor weights
  • Dev: none

  • Ext Val: not applicable


Performance evaluation dataset
  • Dev: development

  • Ext Val: external validation


Performance evaluation method
  • Dev: apparent

  • Ext Val: not applicable


Calibration estimate
Not reported
Discrimination estimate
Not reported
Classification estimate
  • Dev: error = 0.0937, sensitivity = 0.8846, specificity = 0.9211, PPV = 0.8846, NPV = 0.9211

  • Ext Val: error = 0.1231, sensitivity = 0.8000, specificity = 0.9111, PPV = 0.8000, NPV = 0.9111


Overall performance
Not reported
Risk groups
Not reported
Model  Model presentation
  • Dev: full regression model

  • Ext Val: not applicable


Number of predictors in the model
  • Dev: 4

  • Ext Val: not applicable


Predictors in the model
  • Dev: CSF IgM OB presence, motor symptoms at onset, sensory symptoms at onset, time to second relapse in months

  • Ext Val: not applicable


Effect measure estimates
  • Dev:

    • OR (95% CI) IgM OB presence 0.02 (0.00 to 0.16), motor symptoms at onset 0.04 (0.00 to 0.43), sensory symptoms at onset 169.27 (6.95 to 4120.44), time to second relapse in months 0.96 (0.93 to 1.00)

    • Unclear/inconsistent reporting: linear predictor formula = 3.31 pyramidal – 5.13 sensory – 0.03 time + 3.86 IgMOB – 0.76 (compared with the reported ORs in the sketch below)

  • Ext Val: not applicable
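The inconsistency noted above (different signs between the OR table and the linear predictor formula) can be made explicit by converting the reported ORs to the log‐odds scale and comparing them with the formula coefficients. A minimal sketch; mapping "pyramidal" in the formula to "motor symptoms at onset" in the OR table is our assumption:

```python
import math

# Reported ORs converted to log odds ratios, set against the published
# linear-predictor coefficients; the magnitudes are similar but several signs flip.
reported_or = {"IgM OB presence": 0.02, "motor symptoms at onset": 0.04,
               "sensory symptoms at onset": 169.27, "time to 2nd relapse (months)": 0.96}
formula_coef = {"IgM OB presence": 3.86, "motor symptoms at onset": 3.31,
                "sensory symptoms at onset": -5.13, "time to 2nd relapse (months)": -0.03}

for name, or_value in reported_or.items():
    print(f"{name}: log(OR) = {math.log(or_value):+.2f}, "
          f"formula coefficient = {formula_coef[name]:+.2f}")
```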


Predictor influence measure
Not applicable
Validation model update or adjustment
  • Dev: not applicable

  • Ext Val: unclear

Interpretation  Aim of the study
To create a multifactorial prognostic index (MPI) providing the probability of a severe MS course at diagnosis based on clinical and immunological CSF parameters
Primary aim
The primary aim of this study is the prediction of individual outcomes.
Model interpretation
Probably exploratory
Suggested improvements
Include MRI data; validate in a large, prospective cohort
Notes Applicability overall
Low
 
Item Authors' judgement Support for judgement
Participants No Data were retrospectively collected and the data source is not clearly reported. Exclusion criteria included follow‐up and predictor availability.
Predictors Yes The study authors reported analysing immunological and clinical data blindly. CSF immunological assessments were performed twice by 2 neurologists. They specifically chose to include only predictors available at RRMS diagnosis in the final model.
Outcome Yes The outcome was defined in an independent study (Kurtzke 1977) and was based on EDSS, so we consider it a hard outcome with little risk of bias. Furthermore, the timing of 10 years is a reasonable amount of time for reaching EDSS 4.
Analysis No Dev: EPV was less than 10. Missing data were addressed by exclusion and handled in the Participants section. Neither discrimination nor calibration was addressed. Univariate analyses were used for variable selection. It was unclear whether the predictors and their assigned weights in the final model corresponded to the results from multivariable analysis because the OR measures provided in the results table had different signs than the model formula provided in the text. Although an external validation dataset was used in the same study, only classification measures related to it were reported. The need for shrinkage was not assessed.
Ext Val: The number of participants was fewer than 100. Neither discrimination nor calibration was addressed. Participants with missing data were excluded and handled in the Participants section.
Overall No At least one domain is at high risk of bias.

Manouchehrinia 2019.

Study characteristics
General information Model name
  • Dev

  • Ext Val 1

  • Ext Val 2

  • Ext Val 3


Primary source
Journal
Data source
  • Dev: registry, secondary

  • Ext Val 1: cohort, secondary

  • Ext Val 2 and Ext Val 3: randomised trial participants, secondary


Study type
Development + external validation, multiple (location, time, spectrum)
Participants Inclusion criteria
  • Dev:

    • Swedish MS registry (SMSreg)

    • Born between 1940 and 2000

    • Initial relapsing‐remitting disease course

    • At least 1 EDSS score recorded within the RRMS phase

  • Ext Val 1:

    • SMSreg

    • Enrolled in the database from 1 January 1980 to 31 December 2004

  • Ext Val 2:

    • SMSreg

    • Visit for assessment at year 10

  • Ext Val 3:

    • SMSreg


Exclusion criteria
Not reported
Recruitment
  • Dev: national MS registry containing data on about 80% of all prevalent cases of MS in Sweden

  • Ext Val 1: 4 original MS clinics in British Columbia, containing an estimated 80% of the MS population in the province, Canada

  • Ext Val 2: participants in the ACROSS study, a multicentre phase 2 RCT from 32 centres in 10 European countries (from clinicaltrials.gov: Denmark, France, Germany, Italy, Poland, Portugal, Spain, Switzerland, United Kingdom) and Canada

  • Ext Val 3: participants in the FREEDOMS and FREEDOMS II extension study, an open‐label, single‐arm, long‐term follow‐up extension study of the phase 3 trials FREEDOMS and FREEDOMS II run at 138 centres in 22 countries


Age (years)
  • Dev: mean 31.5 (onset)

  • Ext Val 1: mean 31.1 (onset)

  • Ext Val 2: mean 29.5 (onset)

  • Ext Val 3: mean 29.9 (onset)


Sex (%F)
  • Dev: 72

  • Ext Val 1 and Ext Val 3: 74

  • Ext Val 2: 67


Disease duration (years)
Unclear
Diagnosis
100% RRMS
Diagnostic criteria

Treatment
  • Dev:

    • At recruitment, unclear, a minority

    • During follow‐up, unclear number of participants, median duration of exposures first‐line 3, second‐line 0.8

  • Ext Val 1:

    • At recruitment, not reported

    • During follow‐up, unclear number of participants, median 0

  • Ext Val 2 and Ext Val 3:

    • At recruitment, 0%

    • During follow‐up, 100% on DMT


Disease description
  • Dev and Ext Val 1: first‐recorded EDSS median (IQR): 2 (1 to 3)

  • Ext Val 2: first‐recorded EDSS median (IQR): 2 (1.5 to 3)

  • Ext Val 3: first‐recorded EDSS median (IQR): 2 (1.5 to 3.5)


Recruitment period
  • Dev: up to 2016

  • Ext Val 1: 1980 to 2004

  • Ext Val 2: 2003 to 2015

  • Ext Val 3: 2006 to 2018

Predictors Considered predictors
  • Dev: calendar year of birth, sex, onset age, first‐recorded EDSS score (linear spline with a single knot at 4), age at the first‐recorded EDSS score, duration of exposure to first‐line DMTs, duration of exposure to second‐line DMTs, complete recovery from the first relapse, monofocal or polyfocal type of the first attack, sensory/sensory and motor/motor type of the first attack, relapse rate within the first 2 years from disease onset, relapse rate within the first 5 years from disease onset, relapse rate before the first EDSS score, total number of brain T2 lesions (ref 0/1 to 9/10 to 20/> 20), number of brain gadolinium–enhanced lesions (ref 0/1 to 2/> 2)

  • Ext Val 1, Ext Val 2, and Ext Val 3: not applicable


Number of considered predictors
  • Dev: 20

  • Ext Val 1, Ext Val 2, and Ext Val 3: not applicable


Timing of predictor measurement
  • Dev: from disease onset (RRMS) to first EDSS recorded (median 2 years)

  • Ext Val 1, Ext Val 2, and Ext Val 3: not applicable


Predictor handling
  • Dev: first‐recorded EDSS score as linear spline with a single knot, total number of brain T2 lesions, and number of brain gadolinium–enhanced lesions as categorised

  • Ext Val 1, Ext Val 2, and Ext Val 3: not applicable

Outcome Outcome definition
  • Dev and Ext Val 1:

    • Conversion to progressive MS (Lublin 1996): the earliest recognised date of SPMS onset determined by a neurologist at a routine clinic visit

  • Ext Val 2:

    • Conversion to progressive MS (Lublin 1996): the earliest recognised date of SPMS onset determined by a neurologist at a 10‐year follow‐up

  • Ext Val 3: conversion to progressive MS:

    • SPMS defined post hoc as a progressive increase in EDSS (by ≥ 1 from an initial score of 3 to 5 or by ≥ 0.5 for score ≥ 5.5) for at least 6 months duration in the absence or independent of relapses; SPMS not assigned to individuals below EDSS 3


Timing of outcome measurement
  • Dev: follow‐up mean (SD): 12.5 years (8.7 years)

  • Ext Val 1: follow‐up mean (SD): 13.8 years (8.4 years)

  • Ext Val 2: follow‐up mean (SD): 18.6 years (7.9 years)

  • Ext Val 3: follow‐up mean (SD): 14 years (7.8 years)

Missing data Number of participants with any missing value
  • Dev: ≥ 7684, unclear exactly how many participants have any missing

  • Ext Val 1: ≥ 106, unclear exactly how many participants have any missing

  • Ext Val 2 and Ext Val 3: not reported


Missing data handling
  • Dev and Ext Val 1

    • Mixed: complete case, exclusion

  • Ext Val 2, and Ext Val 3

    • Complete case

Analysis Number of participants (number of events)
  • Dev: 8825 (1488)

  • Ext Val 1: 3967 (888)

  • Ext Val 2: 175 (26)

  • Ext Val 3: 2355 (126)


Modelling method
  • Dev: survival, Gaussian

  • Ext Val 1, Ext Val 2 and Ext Val 3: not applicable


Predictor selection method
  • Dev:

    • For inclusion in the multivariable model, all candidate predictors

    • During multivariable modelling, stepwise selection:

      • P value < 0.05

      • backward

  • Ext Val 1, Ext Val 2, and Ext Val 3: not applicable


Hyperparameter tuning
Not applicable
Shrinkage of predictor weights
  • Dev: none

  • Ext Val 1, Ext Val 2, and Ext Val 3: not applicable


Performance evaluation dataset
  • Dev: development

  • Ext Val 1, Ext Val 2, and Ext Val 3: external validation


Performance evaluation method
  • Dev: bootstrap, B = 200

  • Ext Val 1, Ext Val 2, and Ext Val 3: not applicable


Calibration estimate
  • Dev: calibration plot

  • Ext Val 1, Ext Val 2, and Ext Val 3: not reported


Discrimination estimate
  • Dev: Harrell's c‐statistic 0.84 (95% CI 0.83 to 0.85)

  • Ext Val 1: Harrell's c‐statistic 0.77 (95% CI 0.76 to 0.78)

  • Ext Val 2: Harrell's c‐statistic 0.77 (95% CI 0.70 to 0.85)

  • Ext Val 3: Harrell's c‐statistic 0.87 (95% CI 0.84 to 0.89)


Classification estimate
Not reported
Overall performance
Not reported
Risk groups
Not reported
Model  Model presentation
  • Dev: nomogram

  • Ext Val 1, Ext Val 2, and Ext Val 3: not applicable


Number of predictors in the model
  • Dev: 5 (6 df)

  • Ext Val 1, Ext Val 2, and Ext Val 3: not applicable


Predictors in the model
  • Dev: calendar year of birth, male sex, onset age, first‐recorded EDSS score, age at the first‐recorded EDSS score

  • Ext Val 1, Ext Val 2, and Ext Val 3: not applicable


Effect measure estimates
  • Dev: log HR (95% CI): calendar year of birth 0.21 (0.16 to 0.28), male sex −1.24 (−1.79 to −0.70), onset age −0.93 (−0.96 to −0.89), lower EDSS spline −2.42 (−2.66 to −2.18), upper EDSS spline 1.42 (0.89 to 1.96), age at the first‐recorded EDSS score 0.88 (0.82 to 0.94); the two EDSS spline terms are illustrated in the sketch below

  • Ext Val 1, Ext Val 2, and Ext Val 3: not applicable
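The first‐recorded EDSS score enters the model as a linear spline with a single knot at 4, which is why a lower and an upper EDSS spline coefficient are reported above. The sketch below shows one common parameterisation of such a spline basis; the exact basis used by the authors, and how the other predictors were coded or centred, are not reported here, so this is illustrative only.

```python
def edss_spline(edss, knot=4.0):
    """Linear spline basis with a single knot (one common parameterisation):
    a term capped at the knot and a term for the excess above it. The model
    estimates one coefficient per term, giving the 'lower' and 'upper'
    EDSS spline estimates reported above."""
    lower = min(edss, knot)
    upper = max(edss - knot, 0.0)
    return lower, upper

# Hypothetical first-recorded EDSS scores below, at, and above the knot.
for score in (1.5, 4.0, 6.5):
    print(score, edss_spline(score))
```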


Predictor influence measure
Not applicable
Validation model update or adjustment
  • Dev: not applicable

  • Ext Val 1, Ext Val 2, and Ext Val 3: none

Interpretation  Aim of the study
To design a nomogram, a prediction tool, to predict the individual’s risk of conversion to secondary progressive multiple sclerosis (SPMS) at the time of multiple sclerosis (MS) onset
Primary aim
The primary aim of this study is the prediction of individual outcomes.
Model interpretation
Probably confirmatory
Suggested improvements
Not reported
Notes Applicability overall
Low
Auxiliary references
Calabresi PA, Radue EW, Goodin D, Jeffery D, Rammohan KW, Reder AT, et al. Safety and efficacy of fingolimod in patients with relapsing‐remitting multiple sclerosis (FREEDOMS II): a double‐blind, randomised, placebo‐controlled, phase 3 trial. Lancet Neurol 2014;13(6):545‐56.
Derfuss T, Sastre‐Garriga J, Montalban X, Rodegher M, Wuerfel J, Gaetano L, et al. The ACROSS study: long‐term efficacy of fingolimod in patients with relapsing‐remitting multiple sclerosis. Mult Scler J Exp Transl Clin 2020;6(1):2055217320907951.
Hillert J, Stawiarz L. The Swedish MS registry – clinical support tool and scientific resource. Acta Neurol Scand 2015;132(199):11‐9.
Kappos L, Antel J, Comi G, Montalban X, O'Connor P, Polman C H, et al. Oral fingolimod (FTY720) for relapsing multiple sclerosis. N Engl J Med 2006;355(11):1124‐40.
Kappos L, Radue EW, O'Connor P, Polman C, Hohlfeld R, Calabresi P, et al. A placebo‐controlled trial of oral fingolimod in relapsing multiple sclerosis. N Engl J Med 2010;362(5):387‐401.
NCT02307838. Long‐term follow‐up of fingolimod phase II study patients (ACROSS). https://clinicaltrials.gov/ct2/show/NCT02307838 (first received 4 December 2014).
 
Item Authors' judgement Support for judgement
Participants Yes Dev: We rated this domain for this analysis as having a high risk of bias. The data source was a nationwide registry; hence, it is expected to be heterogeneous. The inclusion criteria were based on the presence of a predictor.
Ext Val 1: We rated this domain for this analysis as having a high risk of bias. The inclusion criteria were based on the presence of a predictor. The data source is not very clear, although it was referred to as a cohort.
Ext Val 2: We rated this domain for this analysis as having a high risk of bias. The data source was a secondary use of an RCT. Although the initial inclusion/exclusion criteria were not reported, we expect them to be appropriate due to the inherent prospective nature. However, only participants with complete follow‐up were included for this analysis, even though survival analysis was used.
Ext Val 3: We rated this domain for this analysis as having a low risk of bias. The data source was a secondary use of an RCT. Although the initial inclusion/exclusion criteria were not reported, it is expected to be appropriate due to the inherent prospective nature. The number of patients completing the FREEDOMS studies and the number included here are the same.
Predictors No Onset age and age at the first‐recorded EDSS score were predictors in the final model. The intended time of model use was stated to be RRMS onset, but the age at the first‐recorded EDSS score was available only several years after onset, not at the time of intended model use.
Outcome Yes Dev: We rated this domain for this analysis as having a high risk of bias. The participants were seen 4 to 7 times in 5 years to 10 years, considered to be close to the expected frequency of yearly visits. However, the outcome is not clearly operationalised in the report or in the criteria referred to.
Ext Val 1: We rated this domain for this analysis as having a high risk of bias. The participants were seen 3 to 5 times in 5 years to 10 years, less than a visit per year. Since the outcome was time‐to‐event, the varying density of observations might introduce a bias. Furthermore, the outcome is not clearly operationalised in the report or in the criteria referred to.
Ext Val 2: We rated this domain for this analysis as having a high risk of bias. The outcome is not clearly operationalised in the report or in the criteria referred to.
Ext Val 3: We rated this domain for this analysis as having a low risk of bias. The outcome assessment was made outside the trial based on on‐trial EDSS information. The outcome assessment was EDSS‐based and therefore relatively robust to bias due to lack of blinding.
Analysis No Dev: Many candidate predictor values were missing, and it was not reported in which subset of patients the backward selection took place. Complete case analysis was used. The candidate predictors for the number of lesions were categorised, whereas EDSS was handled using linear splines. The bootstrap procedure for the performance measures did not include predictor selection, but external validation was done. However, the external validations did not address calibration.
Ext Val 1: Calibration was not assessed. The amount of missing data and how it was handled was not reported.
Ext Val 2: There were too few events in this validation set, and calibration was not assessed.
Ext Val 3: Calibration was not assessed. No information was reported on the handling of missing data.
Overall No At least one domain is at high risk of bias.

Margaritella 2012.

Study characteristics
General information Model name
Not applicable
Primary source
Journal
Data source
Routine care, secondary
Study type
Development
Participants Inclusion criteria
  • Diagnosis of clinically definite MS according to research criteria at the time of assessment

  • 3 consecutive tests (multimodal sensory EP tests and simultaneous clinical and EDSS assessments)

  • Assessed yearly on 3 consecutive occasions


Exclusion criteria
  • Incomplete EP tests

  • Tests performed during a clinical relapse


Recruitment
Patients referred to a single MS centre in Milan, Italy
Age (years)
Mean 28.6 (onset)
Sex (%F)
79.3
Disease duration (years)
Mean 10.1 (SD 7.3)
Diagnosis
89.7% RRMS, 3.4% PPMS, 6.9% benign MS
Diagnostic criteria
Mixed: McDonald 2001, McDonald 2005 (Polman 2005)
Treatment
Not reported
Disease description
EDSS mean (SD): 2.1 (1.5)
Recruitment period
2005 to 2008
Predictors Considered predictors
mEPS (1 year lag), age, age at onset, gender, disease course type (RR, SP, PP, benign), EDSS (1 year lag)
Number of considered predictors
≥ 8 (unclear transformations)
Timing of predictor measurement
At multiple assessments consecutively for 3 years until 1 year prior to outcome
Predictor handling
Continuously, unclear: mEPS as square root
Outcome Outcome definition
Disability (EDSS): EDSS score
Timing of outcome measurement
At 1 year after the included mEPS and EDSS predictors (probably occurring over multiple yearly periods, up to 3 per patient)
Missing data Number of participants with any missing value
163
Missing data handling
Exclusion
Analysis Number of participants (number of events)
58 participants, ≤ 174 observations (continuous outcome)
Modelling method
Linear regression
Predictor selection method
  • For inclusion in the multivariable model, all candidate predictors

  • During multivariable modelling, unclear

    • AIC, BIC

    • One variable was excluded based on AIC/BIC during multivariable regression


Hyperparameter tuning
Not applicable
Shrinkage of predictor weights
None
Performance evaluation dataset
Development
Performance evaluation method
Apparent
Calibration estimate
Histogram of differences between measured and predicted values
Discrimination estimate
Not applicable
Classification estimate
% predictions within ± 0.5 of observed = 0.72
Overall performance
R2 = 0.8
Risk groups
Not reported
Model  Model presentation
Full regression model
Number of predictors in the model
6
Predictors in the model
EDSS, mEPS, age at onset, gender, benign course, PP course
Effect measure estimates
Linear model coefficients (SE): EDSS 0.86 (0.589), mEPS 0.11 (0.038), age at onset −0.009 (0.014), gender 0.25 (0.201), benign course −0.26 (0.186), PP course −0.98 (0.594), intercept 19.86 (27.93)
Predictor influence measure
Not applicable
Validation model update or adjustment
Not applicable
Interpretation  Aim of the study
To re‐evaluate the usefulness of mEP for short‐term prediction of the EDSS by considering mEP not as a single predictor but within a multivariate statistical approach derived from economics that can be easily implemented and tested
Primary aim
The primary aim of this study is in part the prediction of individual outcomes. The focus is on the prognostic value of multimodal EP.
Model interpretation
Probably confirmatory
Suggested improvements
Test the model on more heterogeneous patient groups and test its ability to predict beyond 1 year, including motor evoked potentials
Notes Applicability overall
Low
 
Item Authors' judgement Support for judgement
Participants No Exclusions were based on data availability because of the data source, routine clinical data.
Predictors Yes Predictors were collected according to recommended protocols at a single centre by a limited number of technicians. Even though post‐processing might have occurred after the outcome, the predictor definitions seem objective.
Outcome Yes The outcome was based on EDSS, which is considered to be an objective measure.
Analysis No Whether or not the sample size was sufficient could not be judged. There was no appropriate calibration plot. Overfitting and optimism were not addressed. EDSS score was treated as a continuous, normally distributed variable, although it is an ordinal measure. Participants with missing EP and EDSS data were excluded, but further missing data handling was not reported.
Overall No At least one domain is at high risk of bias.

Martinelli 2017.

Study characteristics
General information Model name
MRI criteria + all significant
Primary source
Journal
Data source
Routine care
Study type
Development
Participants Inclusion criteria
  • Age between 18 years and 55 years at the time of the first neurological episode

  • Diagnosis of CIS not attributable to any definitive disease

  • Underwent a comprehensive diagnostic workup

  • Hospitalisation within 3 months from symptom onset and all examinations (baseline CSF, EP, MRI)

  • Follow‐up of more than 2 years


Exclusion criteria
Not reported
Recruitment
Patients admitted to the MS centre at San Raffaele Hospital in Milan, Italy
Age (years)
Mean 32.0
Sex (%F)
67.9
Disease duration (years)
Up to 3 months
Diagnosis
100% CIS
Diagnostic criteria
Not reported
Treatment
  • At recruitment, not reported

  • During follow‐up, 39.5% on DMT


Disease description
Not reported
Recruitment period
2000 to 2013
Predictors Considered predictors
2010 DIS criteria fulfilled, 2010 DIT criteria fulfilled; age at onset; sex; multifocal or monofocal type of onset; partial or complete recovery; brainstem, optic neuritis, spinal cord, or other type of CIS; binary T2 lesions; binary T1 lesions; binary Gd‐enhancing lesions; binary CSF cells; binary CSF proteins; CSF oligoclonal bands present or absent; binary Link Index; binary Tourtellotte Index; binary Reiber Index; binary blood‐brain barrier damage index; abnormal or normal visual evoked potentials; abnormal or normal auditory evoked potentials; abnormal or normal somatosensory evoked potentials; abnormal or normal motor evoked potentials; abnormal or normal overall evoked potential score (adjusted for steroids use in 4 weeks prior to examinations, DMDs during follow‐up)
Number of considered predictors
Between 24 and 36 (unclear adjustments and transformations)
Timing of predictor measurement
At disease onset (CIS) and up to 3 months after disease onset
Predictor handling
  • All both continuously and as dichotomised (cutoff points suggested in the literature)

  • At least 1 interaction considered

Outcome Outcome definition
Conversion to definite MS (Poser 1983): time‐to‐CDMS defined as interval between onset of the first neurological event and last neurological visit or CDMS, new symptoms or signs occurring after an interval of at least 1 month from the onset of CIS only when other diagnoses are excluded
Timing of outcome measurement
Follow‐up median (IQR): 7.3 years (3.5 years to 10.2 years)
Missing data Number of participants with any missing value
224
Missing data handling
Exclusion
Analysis Number of participants (number of events)
243 (108)
Modelling method
Survival, Cox
Predictor selection method
  • For inclusion in the multivariable model, unclear

  • During multivariable modelling, inclusion of predictors into the final model based on likelihood ratio test comparing the model with only MRI criteria to the model with individual predictors and MRI criteria


Hyperparameter tuning
Not applicable
Shrinkage of predictor weights
None
Performance evaluation dataset
Development
Performance evaluation method
Apparent
Calibration estimate
Gronnesby and Borgan statistic
Discrimination estimate
Pencina's c‐statistic 5‐year: 0.695 (95% CI 0.635 to 0.753), 2‐year: 0.74 (95% CI 0.677 to 0.804)
Classification estimate
Categories defined as low: 0% to 33.3%, moderate: 33.3% to 66.7%, high: 66.7% to 100%, net reclassification improvement = 0.3
Overall performance
Not reported
Risk groups
3 risk groups low: 0% to 33.3%, moderate: 33.3% to 66.7%, high: 66.7% to 100%
Model  Model presentation
List of selected predictors
Number of predictors in the model
5 or 7 (unclear adjustment)
Predictors in the model
2010 DIS criteria fulfilled, 2010 DIT criteria fulfilled, age, T1 lesions, CSF oligoclonal bands (steroid use in 4 weeks prior to study, DMT use during follow‐up)
Effect measure estimates
Not reported
Predictor influence measure
Not applicable
Validation model update or adjustment
Not applicable
Interpretation  Aim of the study
To determine whether multiple biomarkers improved the prediction of MS in patients with CIS in a real‐world clinical practice
Primary aim
The primary aim of this study is in part the prediction of individual outcomes. The focus is on the added value of considering multiple biomarkers as opposed to univariate prediction.
Model interpretation
Probably exploratory
Suggested improvements
Multicentre prospective studies, enrolling a larger number of patients with CIS and taking into consideration all possible biomarkers (e.g. comorbidities, spinal MRI) of CDMS risk
Notes Applicability overall
Unclear
Applicability overall rationale
Due to the lack of reporting on the final model or tool, it is unclear whether this study aimed to develop a model/tool for the prediction of individual MS outcomes.
 
Item Authors' judgement Support for judgement
Participants No The data were from medical records, probably from routine care. According to the flow chart, the data were retrospectively identified from a database but described as a cohort in Table 1. Inclusion was based on the length of follow‐up (approximately 6% excluded for this reason) and availability of all routine workup measures (n = 195).
Predictors No The clinical predictors can be measured relatively objectively and were measured during the inclusion hospitalisation. The EP and MRI assessments were reported to be blinded to the follow‐up data and outcome. Although the time of intended model use is not explicit, the inclusion criteria indicated that it is 3 months from symptom onset, and all predictors were reported as measured at the baseline examinations. However, all models were adjusted for treatment during follow‐up, information that was not available at the time of prediction.
Outcome Yes Although the blinding of outcome assessment was not reported, the outcome definition based on new symptoms is relatively objective.
Analysis No The details of the model itself were not explicitly reported. The number of events per variable was low. Complete case analysis was used. Only P values were reported to address calibration. Assessment occurred only in the full development set.
Overall No At least one domain is at high risk of bias.

Misicka 2020.

Study characteristics
General information Model name
  • Ever

  • 10 years

  • 20 years


Primary source
Journal
Data source
Registry, secondary
Study type
Development
Participants Inclusion criteria
  • Age of MS onset of ≥ 18 years

  • Neurologist confirmed MS diagnosis

  • RRMS at onset

  • Non‐Hispanic white


Exclusion criteria
  • Clinical or radiological evidence of stroke

  • History of meningitis, neoplastic, peripheral nervous system, primary muscle disease, other well‐characterised non‐demyelinating diseases of the nervous system

  • Blood‐borne pathogens

  • Allogenic bone marrow transplant

  • Weight < 37 pounds

  • More than 1 person per extended family


Recruitment
Participants in the Accelerated Cure Project for MS, a repository of biological samples and epidemiological data for persons with demyelinating diseases, recruited from the patient base or surrounding communities of 10 MS speciality clinics, USA
Age (years)
  • Ever: median 32.0 (onset)

  • 10 years and 20 years: median 32.0 (MS onset)


Sex (%F)
78.1
Disease duration (years)
Median 11.0 (IQR: 5 to 19)
Diagnosis
100% RRMS
Diagnostic criteria
Mixed: McDonald 2005 (Polman 2005) and McDonald 2010 (Polman 2011)
Treatment
  • At recruitment, probably 0%

  • During follow‐up, not reported


Disease description
Time to second relapse: 0 years to 1 year 48.1%, 2 years to 5 years 31.8%, ≥ 6 years 20.1%
Recruitment period
2006 to 2013
Predictors Considered predictors
Age of MS onset, sex, years of education, history of infectious mononucleosis prior MS onset, tobacco smoking within 5 years prior to MS onset, obesity, high cholesterol, high blood pressure, type II diabetes, cancer, neurological disease, physical disease, psychological disorders, other autoimmune diseases; y/n for impaired functional domains: motor, cerebellar, spasticity, optic nerve, facial (motor), facial (sensory), brainstem and bulbar, cognitive, sexual, bladder and bowel, affect mood, fatigue; time to second relapse (TT2R; ≤ 1 year, 2 years to 5 years, and ≥ 6 years), the number of relapses experienced in the first 2 years after MS onset (NR2Y; ≤ 1, 2 to 3, ≥ 4 relapses, and NA), HLA‐A*02:01 alleles (0, 1, 2, NA), HLA‐DRB1*15:01 alleles (0, 1, 2, NA), Genetic Risk Score
Number of considered predictors
35
Timing of predictor measurement
At study interview (the same as the time of outcome reporting)
Predictor handling
Continuously except time to second relapse and the number of relapses in the first 2 years after MS onset, which were categorised
Outcome Outcome definition
Conversion to progressive MS: time to SPMS defined as the difference between the participant‐reported age of RRMS onset (age at first symptom or exacerbation) and the age of SPMS onset
Timing of outcome measurement
  • Ever: not reported

  • 10 years: up to 10 years

  • 20 years: up to 20 years

Missing data Number of participants with any missing value
≥ 323, unclear exactly how many participants have any missing
Missing data handling
Mixed: complete case for genetic variables, single imputation with a forest for other predictors, single imputation with category NA for NR2Y
Analysis Number of participants (number of events)
  • Ever: 1166 (177)

  • 10 years: 1166 (55)

  • 20 years: 1166 (128)


Modelling method
Survival, Cox
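To illustrate the modelling approach named here together with the forward selection by AIC described under 'Predictor selection method' below, the sketch uses the lifelines package on entirely hypothetical data; the variable names, the simulated data, and the single selection step shown are assumptions, not the authors' analysis.

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(0)

# Hypothetical data in the spirit of the study: time to SPMS (years), an event
# indicator, and a few candidate predictors.
n = 300
df = pd.DataFrame({
    "time": rng.exponential(15, n),
    "event": rng.integers(0, 2, n),
    "onset_age": rng.normal(32, 8, n),
    "male": rng.integers(0, 2, n),
    "tt2r_years": rng.exponential(3, n),
})

def cox_aic(predictors):
    """Fit a Cox model on the given predictors and return its AIC,
    computed as -2 * partial log-likelihood + 2 * number of coefficients."""
    cph = CoxPHFitter().fit(df[["time", "event"] + predictors],
                            duration_col="time", event_col="event")
    return -2 * cph.log_likelihood_ + 2 * len(predictors)

# One step of forward selection by AIC: add the single candidate predictor
# that yields the lowest AIC.
candidates = ["onset_age", "male", "tt2r_years"]
best = min(candidates, key=lambda p: cox_aic([p]))
print("first predictor entered:", best)
```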
Predictor selection method
  • For inclusion in the multivariable model, all candidate predictors

  • During multivariable modelling, stepwise selection

    • AIC

    • Forward


Hyperparameter tuning
Not applicable
Shrinkage of predictor weights
None
Performance evaluation dataset
Development
Performance evaluation method
Apparent
Calibration estimate
Not reported
Discrimination estimate
Not reported
Classification estimate
Not reported
Overall performance
  • Ever: R2 = 0.28 to 0.32 (with genetic predictors)

  • 10 years: R2 = 0.50 to 0.56 (with genetic predictors)

  • 20 years: R2 = 0.34 to 0.40 (with genetic predictors)


Risk groups
Not reported
Model  Model presentation
Nomogram
Number of predictors in the model
6 (7 df)
Predictors in the model
  • Ever: age of MS onset, male sex, time to second relapse, neurological disorders, spasticity, HLA‐A*02:01

  • 10 years: age of MS onset, male sex, time to second relapse, cancer, brainstem/bulbar, HLA‐A*02:01

  • 20 years: age of MS onset, male sex, time to second relapse, obesity, neurological disorders, HLA‐A*02:01


Effect measure estimates
  • Ever: HR (95% CI): age of MS onset 1.08 (1.06 to 1.09), male sex 1.84 (1.33 to 2.55), time to second relapse 2 to 5 years 1.07 (0.75 to 1.53), time to second relapse ≥ 6 years 0.59 (0.40 to 0.88), neurological disorders 0.62 (0.40 to 0.94), spasticity 0.61 (0.37 to 1.02), HLA‐A*02:01 0.73 (0.56 to 0.97), not reported/given upon request

  • 10 years: HR (95% CI): age of MS onset 1.06 (1.03 to 1.09), male sex 2.62 (1.51 to 4.55), time to second relapse 2 to 5 years 0.69 (0.38 to 1.25), time to second relapse ≥ 6 years 0.25 (0.09 to 0.65), cancer 2.59 (1.01 to 6.67), brainstem/bulbar 0.47 (0.23 to 0.98), HLA‐A*02:01 0.60 (0.35 to 1.04), not reported/given upon request

  • 20 years: HR (95% CI): age of MS onset 1.08 (1.06 to 1.10), male sex 1.66 (1.12 to 2.45), time to second relapse 2 to 5 years 0.86 (0.57 to 1.29), time to second relapse ≥ 6 years 0.49 (0.30 to 0.80), obesity 0.33 (0.12 to 0.89), neurological disorders 0.46 (0.26 to 0.79), HLA‐A*02:01 0.56 (0.39 to 0.80), not reported/given upon request


Predictor influence measure
Not applicable
Validation model update or adjustment
Not applicable
Interpretation  Aim of the study
To construct prediction models for SPMS using sociodemographic and self‐reported clinical measures that would be available at or near MS onset, with specific considerations for MS genetic risk factors
Primary aim
The primary aim of this study is the prediction of individual outcomes.
Model interpretation
Probably confirmatory
Suggested improvements
Not reported
Notes Applicability overall
Unclear
Applicability overall rationale
Due to the lack of reporting on outcomes and their definition, applicability is unclear.
Auxiliary references
Saroufim P, Zweig SA, Conway DS, Briggs FBS. Cardiovascular conditions in persons with multiple sclerosis, neuromyelitis optica and transverse myelitis. Mult Scler Relat Disord 2018;25:21‐5.
 
Item Authors' judgement Support for judgement
Participants No The clinical data were collected cross‐sectionally by asking the patients about their medical history. Therefore, there is a high chance of recall bias or length‐time bias.
Predictors No The nature of clinical data collection by medical history taken from patients introduces recall bias. For example, the patients who had SP might remember the details or the diseases they had more vividly. Alternatively, patients with a shorter disease duration at the time of the interview might remember the details at disease onset more accurately.
Outcome No The outcome was based on patient‐reported time of RRMS diagnosis, and while CDMS was confirmed by a neurologist, the authors did not report that the timing was also confirmed. A definition of what was considered SPMS was not given. This makes the outcome assessment non‐standard and non‐uniform. Also, the patients knew all their clinical history while reporting the outcome.
Analysis No The EPV was less than 10. Time to second relapse was categorised. Neither calibration nor discrimination was addressed. Evaluation occurred in the full development set only. Missing values for non‐genetic variables were handled with multiple imputation. Participants not contributing genetic data were excluded.
Overall No At least one domain is at high risk of bias.

Montolio 2021.

Study characteristics
General information Model name
Disability Course ‐ LSTM
Primary source
Journal
Data source
Routine care, secondary
Study type
Development
Participants Inclusion criteria
  • MS patients

  • Best corrected visual acuity (BCVA) of 20/40 or higher

  • Refractive error within ± 5.00 dioptres equivalent sphere and ± 2.00 dioptres astigmatism

  • Transparent ocular media (nuclear colour or opalescence, cortical or posterior subcapsular lens opacity < 1) according to the Lens Opacities Classification System III

  • Unclear, having 7 visits: baseline, 5 annual, 10‐year visit


Exclusion criteria
  • Prior intraocular surgery

  • Diabetes or other diseases affecting the visual field or nervous system

  • Ongoing use of medications that could affect visual function


Recruitment
Miguel Servet University Hospital in Zaragoza, Spain
Age (years)
Mean 42.4
Sex (%F)
67.1
Disease duration (years)
Mean 10.1 (pooled SD 7.74)
Diagnosis
92.7% RRMS, 6.1% SPMS, 1.2% PPMS
Diagnostic criteria
McDonald 2001
Treatment
Unclear timing, 40% on IFN, 30% on immunomodulators, 30% none
Disease description
EDSS mean 2.6 (SD between 1.27 and 2.02)
Recruitment period
Not reported
Predictors Considered predictors
Baseline visit: age, sex, MS duration, MS subtype, ON antecedent; at baseline and the following 2 annual visits: BCVA, relapse in past year, EDSS, peripapillary thickness, superior thickness, nasal thickness, inferior thickness, temporal thickness, foveal thickness
Number of considered predictors
39
Timing of predictor measurement
At 3 visits over 2 years (a baseline visit, not further defined, and annual visits 1 and 2)
Predictor handling
Continuously, one‐hot encoding for categories
Outcome Outcome definition
Disability (EDSS): worsening defined as at least a 1‐point increase in EDSS between the visit at year 2 and the 10‐year follow‐up
Timing of outcome measurement
Follow‐up for 10 years from baseline, 8 years from the last predictors
Missing data Number of participants with any missing value
Not reported
Missing data handling
Not reported
Analysis Number of participants (number of events)
82 (37)
Modelling method
Long short‐term memory (LSTM) recurrent neural network
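As a rough illustration of the modelling method named here, the sketch below builds a small LSTM classifier over a sequence of three visits using TensorFlow/Keras. The shapes, the single LSTM layer with 30 units, and the training settings (30 epochs, mini‐batch size 20) are assumptions loosely based on the hyperparameter description below; the data are simulated and the authors' actual architecture and preprocessing are not reproduced.

```python
import numpy as np
import tensorflow as tf

# Hypothetical shapes: 82 patients, 3 visits, 9 per-visit features
# (e.g. BCVA, relapse in the past year, EDSS and several retinal thickness measures).
n_patients, n_visits, n_features = 82, 3, 9
X = np.random.rand(n_patients, n_visits, n_features).astype("float32")
y = np.random.randint(0, 2, size=n_patients)            # 1 = EDSS worsening at 10 years

model = tf.keras.Sequential([
    tf.keras.Input(shape=(n_visits, n_features)),        # sequence of 3 visits
    tf.keras.layers.LSTM(30),                             # hidden size of 30 (assumed)
    tf.keras.layers.Dense(1, activation="sigmoid"),       # probability of worsening
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=[tf.keras.metrics.AUC()])
model.fit(X, y, epochs=30, batch_size=20, verbose=0)
```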
Predictor selection method
  • For inclusion in the multivariable model, all candidate predictors

  • During multivariable modelling, LASSO used to select predictors for actual prediction model


Hyperparameter tuning
Search for optimal number of hidden layers (30), epochs (30), and mini‐batch size (20) in cross validation
Shrinkage of predictor weights
Not reported
Performance evaluation dataset
Development
Performance evaluation method
Cross‐validation, 10‐fold
Calibration estimate
Not reported
Discrimination estimate
c‐Statistic = 0.8165
Classification estimate
Accuracy = 0.817, sensitivity = 0.811, specificity = 0.822, PPV = 0.789
Overall performance
Not reported
Risk groups
Not reported
Model  Model presentation
List of selected predictors
Number of predictors in the model
5 (4 of them longitudinal)
Predictors in the model
Disease duration, relapse in preceding year, EDSS, temporal RNFL thickness, superior RNFL thickness
Effect measure estimates
Not reported
Predictor influence measure
Not reported
Validation model update or adjustment
Not applicable
Interpretation  Aim of the study
To improve the MS diagnosis and predict the long‐term course of disability in MS patients based on clinical data and retinal nerve fibre layer (RNFL) thickness, measured by optical coherence tomography (OCT)
Primary aim
The primary aim of this study is only partially about the prediction of individual outcomes. The focus is on OCT measures and machine learning.
Model interpretation
Probably exploratory
Suggested improvements
Use of OCT devices in combination with other techniques such as MRI, EP or CSF analysis, used in combination with clinical data, such as the EDSS
Notes Applicability overall
Unclear
Applicability overall rationale
Due to the lack of reporting on the final model or tool, it is unclear whether this study aimed to develop a model/tool for the prediction of individual MS outcomes.
 
Item Authors' judgement Support for judgement
Participants No The data source was routine care even though there were clearly defined inclusion and exclusion criteria.
Predictors Yes Although the predictors were collected up to year 2 to predict the 10‐year outcome from baseline, the situation is clarified by reporting the prediction horizon as 8 years.
Outcome No The outcome was based on a 1‐point increase in EDSS. However, the meaning of a 1‐point change depends on the baseline value. This study included participants of different MS subtypes and a range of EDSS at baseline, which are expected to have different patterns of change due to disease. The outcome was not reported to be confirmed at a later point.
Analysis No The EPV was very low. Information on missing data and handling was not reported. Calibration was not assessed. Parameter tuning, modelling method selection, and final performance resulted from unnested CV. No model was provided.
Overall No At least one domain is at high risk of bias.

Olesen 2019.

Study characteristics
General information Model name
  • Routine

  • Candidate


Primary source
Journal
Data source
Cohort, primary
Study type
Development
Participants Inclusion criteria
  • > 15 years of age

  • Acute ON diagnosed by independent neurological and ophthalmologic examination

  • Prior to treatment


Exclusion criteria
  • Previous diagnosis of MS

  • NMOSD

  • Another cause of optic neuropathy that was apparent at the time of referral (other intraocular pathologic conditions with symptoms mimicking ON such as high ocular pressure and vascular, traumatic, infectious, metabolic, neoplastic, or toxic causes)


Recruitment
3 hospital units with ophthalmology departments and 44 ophthalmologists in general practice (Primary Care Ophthalmology) in the administrative unit Region of Southern Denmark
Age (years)
Median 36.0
Sex (%F)
67.5
Disease duration (years)
Not reported
Diagnosis
100% CIS (isolated optic neuritis)
Diagnostic criteria
Optic Neuritis Study Group criteria 1991
Treatment
  • At recruitment, 0%

  • During follow‐up, not reported


Disease description
Not reported
Recruitment period
2014 to 2016
Predictors Considered predictors
  • Routine: CSF leukocyte count, oligoclonal band positivity, IgG index, albumin ratio

  • Candidate: CSF neurofilament light chain levels (NF‐L), serum IL‐10, serum IL‐6, serum IL‐17A, serum IL‐1beta, serum TRAIL, CSF IL‐10, CSF IL‐6, CSF IL‐17A, CSF IL‐1beta, CSF TRAIL, CSF CXCL13, CSF TNF‐alpha, serum TNF‐alpha


Number of considered predictors
  • Routine: 4

  • Candidate: 14


Timing of predictor measurement
At disease onset (ON), from ON onset median (range): 14 days (2 days to 38 days)
Predictor handling
Continuously
Outcome Outcome definition
Conversion to definite MS (McDonald 2010, Polman 2011): MS diagnosed according to McDonald 2010
Timing of outcome measurement
Follow‐up median (range): 29.6 months (19 months to 41 months)
Missing data Number of participants with any missing value
  • Routine: 2

  • Candidate: 7


Missing data handling
Not reported
Analysis Number of participants (number of events)
  • Routine: unclear if 38, 40 reported (≤ 16)

  • Candidate: unclear if 33, 40 reported (≤ 16)


Modelling method
Logistic regression
Predictor selection method
  • For inclusion in the multivariable model, univariable analysis

  • During multivariable modelling, full model approach


Hyperparameter tuning
Not applicable
Shrinkage of predictor weights
None
Performance evaluation dataset
Development
Performance evaluation method
Bootstrap, B = 500
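The optimism‐corrected c‐statistics reported below come from a bootstrap with B = 500. The sketch shows the standard optimism‐correction idea on hypothetical data, refitting a plain logistic model in each resample without any variable selection (consistent with the review's note that the resampling excluded the selection step); the data, predictors, and model are all assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

def optimism_corrected_auc(X, y, B=500):
    """Bootstrap optimism correction for the c-statistic: the apparent AUC minus the
    average of (AUC in the bootstrap sample - AUC of that model on the original data)."""
    model = LogisticRegression(max_iter=1000).fit(X, y)
    apparent = roc_auc_score(y, model.predict_proba(X)[:, 1])
    optimism = []
    n = len(y)
    for _ in range(B):
        idx = rng.integers(0, n, n)                    # resample with replacement
        if len(np.unique(y[idx])) < 2:                 # skip degenerate resamples
            continue
        boot = LogisticRegression(max_iter=1000).fit(X[idx], y[idx])
        auc_boot = roc_auc_score(y[idx], boot.predict_proba(X[idx])[:, 1])
        auc_orig = roc_auc_score(y, boot.predict_proba(X)[:, 1])
        optimism.append(auc_boot - auc_orig)
    return apparent - float(np.mean(optimism))

# Hypothetical data roughly matching the study size: 40 participants, 3 predictors.
X = rng.normal(size=(40, 3))
y = rng.integers(0, 2, 40)
print(round(optimism_corrected_auc(X, y), 2))
```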
Calibration estimate
Calibration plot, Hosmer‐Lemeshow test
Discrimination estimate
c‐Statistic
  • Routine: 0.86 (95% CI 0.74 to 0.98), optimism‐corrected 0.83

  • Candidate: 0.89 (95% CI 0.77 to 1.00), optimism‐corrected 0.87


Classification estimate
Not reported
Overall performance
  • Routine: adjusted McFadden R2 = 0.16

  • Candidate: adjusted McFadden R2 = 0.15


Risk groups
Not reported
Model  Model presentation
Nomogram
Number of predictors in the model
3
Predictors in the model
  • Routine: OCB, leukocytes, IgG index

  • Candidate: IL‐10, NF‐L, CXCL13


Effect measure estimates
Not reported
Predictor influence measure
Not applicable
Validation model update or adjustment
Not applicable
Interpretation  Aim of the study
We propose that markers of inflammation and of neurodegeneration (a) may differ between patients with MS‐related ON and patients with ON unrelated to MS and (b) may predict development of MS in patients with acute ON.
Primary aim
The primary aim of this study is in part the prediction of individual outcomes. The focus is on the association of CSF markers with CDMS.
Model interpretation
Probably exploratory
Suggested improvements
Validation in larger, well‐designed cohorts including differential diagnoses and other ethnicities from multiple centres
Notes Applicability overall
High
Applicability overall rationale
The predictors used were CSF biomarkers and no other predictor domain was considered for use in the model.
Auxiliary references
Soelberg K, Jarius S, Skejoe H, Engberg H, Mehlsen JJ, Nilsson AC, et al. A population‐based prospective study of optic neuritis. Mult Scler 2017;23(14):1893‐901.
Soelberg K, Skejoe HPB, Grauslund J, Smith TJ, Lillevang ST, Jarius S, et al. Magnetic resonance imaging findings at the first episode of acute optic neuritis. Mult Scler Relat Disord 2018;20:30‐6.
 
Item Authors' judgement Support for judgement
Participants Unclear A prospectively collected population‐based cohort was used, but it is unclear if the participants who experienced the outcome were included. 12 participants were diagnosed with clinically definite MS in less than 2 months.
Predictors Yes Predictors were collected shortly after onset and were collected without knowledge of the outcome due to the prospective collection.
Outcome No From the 40 patients included and 16 events of clinically definite MS, 12 were diagnosed with MS at the acute stage of optic neuritis in less than 2 months, while the predictors, venous blood and CSF, from all included patients were collected within 38 days of ON onset (median, 14 days; range 2 to 38). Thus, the time difference between predictor collection and outcome seems to be too short.
Analysis No The number of participants was too low. Not all participants were included in the analysis, and a complete case analysis was probably applied. However, the proportion of patients with missing data was about 5% and was not expected to increase the risk of bias. Univariate analyses were used to select candidate predictors for multivariate analysis. Logistic regression was applied even though there was no defined timing of the outcome and follow‐up duration varied amongst participants. The resampling method excluded the variable selection process. Effect estimates were not reported, so it is unclear whether the final model corresponds to the multivariable analysis.
Overall No At least one domain is at high risk of bias.

Oprea 2020.

Study characteristics
General information Model name
Mixed treatment ‐ disability
Primary source
Journal
Data source
Routine care, secondary
Study type
Development
Participants Inclusion criteria
  • RRMS or PPMS

  • At least 2 neurological evaluations in the past 2 years


Exclusion criteria
Not reported
Recruitment
Neurology Department of the Bucharest Emergency University Hospital (BEUH), Romania
Age (years)
Mean 40.3
Sex (%F)
61.6
Disease duration (years)
Mean 10.2
Diagnosis
RRMS, PPMS
Diagnostic criteria
Not reported
Treatment
  • At recruitment, not reported

  • During follow‐up, 30.5% IFN, 29.8% GA, 2.0% teriflunomide, 37.1% natalizumab, 0.7% unknown


Disease description
Not reported
Recruitment period
Not reported
Predictors Considered predictors
Gender, age at diagnosis, age, EDSS at onset, disease duration, number of treatments
Number of considered predictors
6
Timing of predictor measurement
At a single time point during outcome determination
Predictor handling
Unclear, continuously or EDSS at onset categorised
Outcome Outcome definition
Disability (EDSS): maintaining an EDSS score less than or equal to a threshold (the chosen model used EDSS ≤ 2.5) at the final visit
Timing of outcome measurement
Not reported
Missing data Number of participants with any missing value
Not reported
Missing data handling
Not reported
Analysis Number of participants (number of events)
151 (not reported)
Modelling method
Logistic regression
Predictor selection method
  • For inclusion in the multivariable model, all candidate predictors

  • During multivariable modelling, full model approach


Hyperparameter tuning
Not applicable
Shrinkage of predictor weights
None
Performance evaluation dataset
Development
Performance evaluation method
Cross‐validation, 10,000 shuffle splits, train‐test ratio 14:1
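The internal validation described here (10,000 random shuffle splits with a 14:1 train‐to‐test ratio) can be approximated with scikit‐learn's ShuffleSplit. A minimal sketch on simulated data; the split count and ratio come from the row above, while the data, the logistic model, and the AUC summary are assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import ShuffleSplit

rng = np.random.default_rng(0)
X = rng.normal(size=(151, 6))           # 151 participants, 6 predictors (as reported)
y = rng.integers(0, 2, 151)             # hypothetical binary outcome (EDSS <= 2.5 kept)

# A 14:1 train-to-test ratio corresponds to a test fraction of 1/15.
splitter = ShuffleSplit(n_splits=10_000, test_size=1 / 15, random_state=0)
aucs = []
for train_idx, test_idx in splitter.split(X):
    model = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
    if len(np.unique(y[test_idx])) < 2:  # AUC is undefined on single-class test sets
        continue
    aucs.append(roc_auc_score(y[test_idx], model.predict_proba(X[test_idx])[:, 1]))
print(round(float(np.mean(aucs)), 3))
```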
Calibration estimate
Not reported
Discrimination estimate
c‐Statistic = 0.8221
Classification estimate
Accuracy = 0.7662, sensitivity = 0.7775, PPV = 0.8145, F1 = 0.7806
Overall performance
Brier score = 0.1754
Risk groups
Not reported
Model  Model presentation
Not reported
Number of predictors in the model
6
Predictors in the model
Gender, age at diagnosis, age, EDSS at onset, disease duration, number of treatments
Effect measure estimates
Not reported
Predictor influence measure
Not applicable
Validation model update or adjustment
Not applicable
Interpretation  Aim of the study
To develop a disability and outcome prediction algorithm in MS patients
Primary aim
The primary aim of this study is the prediction of individual outcomes.
Model interpretation
Exploratory
Suggested improvements
More patients, more relevant predictors, online platform
Notes Applicability overall
Unclear
Applicability overall rationale
Due to ambiguities and the lack of reporting on participants, predictors, and outcomes, applicability is unclear.
 
Item Authors' judgement Support for judgement
Participants No The study data come from routine care and eligibility criteria are unclear.
Predictors No The timing of data collection for predictors and outcome measurement is the same. Hence, the predictors are probably not available at the time of intended model use.
Outcome No The outcome definition is unclear and does not mention confirmation. Also, the timing of the outcome assessment with respect to the prognostication is unclear, probably making the outcome highly variable for different patients with different periods between onset and assessment visit.
Analysis No The number of events was unclear, but even in the best case scenario, the number of events per predictor was lower than 15. No information on missing data and its handling was reported. Timing of predictor and outcome assessment was not considered. The final model was not presented. Although cross‐validation was used for internal validation, the need for shrinkage was not assessed.
Overall No At least one domain is at high risk of bias.
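The Oprea 2020 entry reports internal validation of a logistic regression by repeated shuffle-split cross-validation (10,000 splits, 14:1 train:test), with the c-statistic, accuracy, and Brier score as performance measures. Below is a minimal sketch of that general evaluation scheme in Python with scikit-learn, on synthetic data; the split count, predictors, and metric choices are illustrative and not taken from the study.

```python
# Illustrative sketch only: repeated shuffle-split internal validation of a
# logistic regression, loosely mirroring the set-up described in the entry
# above. Synthetic data; not the authors' code or data.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import ShuffleSplit, cross_validate

# Synthetic stand-in for 151 participants and 6 candidate predictors
X, y = make_classification(n_samples=151, n_features=6, n_informative=4,
                           random_state=0)

# 14:1 train:test split, repeated many times (the study reports 10,000 shuffles)
cv = ShuffleSplit(n_splits=1000, test_size=1/15, random_state=0)
scores = cross_validate(LogisticRegression(max_iter=1000), X, y, cv=cv,
                        scoring=["roc_auc", "accuracy", "neg_brier_score"])

# nanmean guards against the rare shuffle whose small test set contains one class only
print("mean c-statistic:", np.nanmean(scores["test_roc_auc"]).round(3))
print("mean accuracy:   ", np.nanmean(scores["test_accuracy"]).round(3))
print("mean Brier score:", (-np.nanmean(scores["test_neg_brier_score"])).round(3))
```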

Pellegrini 2019.

Study characteristics
General information Model name
Final model with 3 predictors
Primary source
Journal
Data source
Randomised trial participants, secondary
Study type
Development
Participants Inclusion criteria
  • Common to all

    • EDSS 0 to 5

  • (ADVANCE)

    • Age between 18 years and 65 years

    • RRMS diagnosis (McDonald 2005, Polman 2005)

    • At least 2 clinically documented relapses in the previous 3 years, with at least one having occurred within the past 12 months

  • (CONFIRM)

    • Age between 18 years and 55 years

    • Diagnosis of RRMS (McDonald 2005, Polman 2005)

    • At least 1 clinically documented relapse in the previous 12 months or at least 1 gadolinium‐enhancing lesion 0 weeks to 6 weeks before randomisation

  • (DEFINE)

    • Age between 18 years and 55 years

    • RRMS diagnosis (McDonald 2005, Polman 2005)

    • Disease activity as evidenced by at least 1 clinically documented relapse within 12 months before randomisation or a brain magnetic resonance imaging (MRI) scan, obtained within 6 weeks before randomisation, that showed at least 1 gadolinium‐enhancing lesion.

  • (AFFIRM)

    • Males and females between the ages of 18 years and 50 years

    • Diagnosis of RRMS (McDonald 2001)

    • At least 1 medically documented relapse within the 12 months before the study began

    • MRI showing lesions consistent with MS


Exclusion criteria
  • Common to all

    • Progressive forms of MS

  • (ADVANCE)

    • Prespecified laboratory abnormalities

    • Previous treatment with interferon for MS for more than 4 weeks or discontinuation less than 6 months before baseline

  • (CONFIRM)

    • Other clinically significant illness

    • Prespecified laboratory abnormalities

    • Prior exposure to glatiramer acetate or contraindicated medications

  • (DEFINE)

    • Another major disease that would preclude participation in a clinical trial

    • Abnormal results on prespecified laboratory tests

    • Recent exposure to contraindicated medications

  • (AFFIRM)

    • A relapse within 50 days before the administration of the first dose of the study drug

    • Treatment with cyclophosphamide or mitoxantrone within the previous year

    • Treatment with interferon beta, glatiramer acetate, cyclosporine, azathioprine, methotrexate, or intravenous immune globulin within the previous 6 months

    • Treatment with interferon beta, glatiramer acetate, or both for more than 6 months


Recruitment
  • Placebo arm participants in the ADVANCE, DEFINE, CONFIRM, and AFFIRM, multi‐site RCTs

  • Australia, Austria, Belarus, Belgium, Bosnia and Herzegovina, Bulgaria, Canada, Chile, Colombia, Costa Rica, Croatia, Czech Republic, Estonia, France, Georgia, Germany, Greece, Guatemala, India, Ireland, Israel, Latvia, Mexico, Macedonia, Netherlands, Moldova, New Zealand, Peru, Poland, Romania, Russian Federation, Puerto Rico, Serbia, Slovakia, South Africa, Switzerland, Spain, Ukraine, United Kingdom, United States, Virgin Islands (USA)


Age (years)
Mean 37.1
Sex (%F)
71.0
Disease duration (years)
Mean 7.5 (SD 6.5)
Diagnosis
100% RRMS
Diagnostic criteria
Mixed: McDonald 2001, McDonald 2005
Treatment
  • Prior treatment, 34.5%

  • During follow‐up, 0%


Disease description
EDSS mean (SD): 2.5 (1.2), number of relapses 1 year prior to study entry mean (SD): 1.4 (0.7)
Recruitment period
Not reported
Predictors Considered predictors
Age (in years), gender (male vs female), ethnicity (white vs other), number of relapses 1 year prior to study entry, number of relapses 3 years prior to study entry, MS disease duration (in years), time since pre‐study relapse (in months), prior treatment (yes vs no), EDSS, T25FW, 9HPT, PASAT, VFT 2.5%, gadolinium‐enhancing lesion number, T1 lesion volume (log‐scale), T2 lesion volume (log‐scale), brain volume standardised Z‐score, brain parenchymal fraction, SF‐36 Physical Component Summary, SF‐36 Mental Component Summary, study identifier (as fixed term adjustment)
Number of considered predictors
23
Timing of predictor measurement
At study baseline (RCT)
Predictor handling
  • Continuously

  • Interactions tested by model based decision trees

Outcome Outcome definition
Composite (EDSS, T25FW, 9HPT, PASAT, VFT): time to disability progression confirmed at 24 weeks on either EDSS (≥ 1 point increase if baseline EDSS ≥ 1.0 or 1.5 point increase otherwise) or any of timed 25‐foot walk (T25FW) test, 9HPT, Paced Auditory Serial Addition Test (PASAT), and visual function test (VFT; 2.5% contrast level) components (20% worsening on either T25FW or 9HPT or PASAT or 10‐letter worsening on VFT)
Timing of outcome measurement
Up to 2 years
Missing data Number of participants with any missing value
Missing MRI data for 44% and 48% by design (DEFINE and CONFIRM)
Missing data handling
Multiple imputation, 10 MCMC‐based imputation sets
Analysis Number of participants (number of events)
1582 (434)
Modelling method
Survival, Cox
Predictor selection method
  • For inclusion in the multivariable model, all candidate predictors

  • During multivariable modelling, 3 best predictors (number selected based on bootstrapped c‐index) from median variable importance rank calculated from 6 modelling algorithms


Hyperparameter tuning
Parameter tuning of ML models leading to predictor selection well‐described in supplementary material
Shrinkage of predictor weights
None
Performance evaluation dataset
Development
Performance evaluation method
Bootstrap
Calibration estimate
Calibration slope 1 year = 1.10 (bootstrap = 1.08, SE 0.17), 2 years = 1.00 (bootstrap = 0.97, SE 0.15)
Discrimination estimate
  • Survival c‐statistic:

    • 1 year 0.59 (SE 0.02), bootstrap 0.59

    • 2 years 0.59 (SE 0.01), bootstrap 0.59


Classification estimate
Not reported
Overall performance
Not reported
Risk groups
Not reported
Model  Model presentation
Regression coefficients without baseline hazard
Number of predictors in the model
3
Predictors in the model
PASAT, SF‐36 physical component summary, visual function test
Effect measure estimates
HR (95% CI): PASAT 0.94 (0.90 to 0.98), SF‐36 physical component summary 0.92 (0.88 to 0.97), visual function test 0.95 (0.92 to 0.99)
Predictor influence measure
Relative importance ranking
Validation model update or adjustment
Not applicable
Interpretation  Aim of the study
To compare the aforementioned regression and machine learning methods in their ability to assess the ranking of common prognostic factors for MS progression and to generate consistent risk predictions in clinical trial data settings
Primary aim
The primary aim of this study is the prediction of individual outcomes. The focus is on the factors.
Model interpretation
Exploratory
Suggested improvements
Explore alternative predictors and their change over time, sensitivity of the endpoints’ definition to a set of baseline characteristics using a multivariate (i.e. joint) endpoint assessment based on variance components
Notes Applicability overall
Low
Auxiliary references
Calabresi PA, Kieseier BC, Arnold DL, Balcer LJ, Boyko A, Pelletier J, et al. Pegylated interferon beta‐1a for relapsing‐remitting multiple sclerosis (ADVANCE): a randomised, phase 3, double‐blind study. Lancet Neurol 2014;13(7):657‐65.
Fox RJ, Miller DH, Phillips JT, Hutchinson M, Havrdova E, Kita M, et al. Placebo‐controlled phase 3 study of oral BG‐12 or glatiramer in multiple sclerosis. N Engl J Med 2012;367(12):1087‐97.
Gold R, Kappos L, Arnold DL, Bar‐Or A, Giovannoni G, Selmaj K, et al. Placebo‐controlled phase 3 study of oral BG‐12 for relapsing multiple sclerosis. N Engl J Med 2012;367(12):1098‐107.
Polman CH, O'Connor PW, Havrdova E, Hutchinson M, Kappos L, Miller DH, et al. A randomized, placebo‐controlled trial of natalizumab for relapsing multiple sclerosis. N Engl J Med 2006;354(9):899‐910.
NCT00906399. Efficacy and safety study of peginterferon beta‐1a in participants with relapsing multiple sclerosis (ADVANCE). https://clinicaltrials.gov/ct2/show/NCT00906399 (first received 21 May 2009).
NCT00027300. Safety and efficacy of natalizumab in the treatment of multiple sclerosis. https://clinicaltrials.gov/ct2/show/NCT00027300 (first received 3 December 2001).
NCT00420212. Efficacy and safety of oral BG00012 in relapsing‐remitting multiple sclerosis (DEFINE). https://clinicaltrials.gov/ct2/show/NCT00420212 (first received 11 January 2007).
NCT00451451. Efficacy and safety study of oral BG00012 with active reference in relapsing‐remitting multiple sclerosis (CONFIRM). https://clinicaltrials.gov/ct2/show/NCT00451451 (first received 23 March 2007).
 
Item Authors' judgement Support for judgement
Participants Yes Data from an RCT were used. Although the inclusion and exclusion criteria for the prediction study were not described, the number of patients per study matched up with the original RCT publications; hence, there is no reason to assume that there were additional eligibility criteria for the prediction study.
Predictors Yes The predictors were collected during an RCT; therefore, they are expected to be collected in the same way across all patients.
Outcome Yes The outcome was composite with clear components of clinical interest, which are considered to be objective measurements. Assessments occurred during RCTs and are expected to be standardised.
Analysis Yes The EPV was around 20. Overfitting and optimism were accounted for. Calibration and discrimination were assessed.
Overall Yes All domains are at low risk of bias.
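The Pellegrini 2019 entry reports bootstrap-based optimism correction of the survival c-statistic (and calibration slope) on the development data. The sketch below shows Harrell-style bootstrap optimism correction of the c-index for a Cox model using lifelines on synthetic data; the variable names, data-generating step, and number of resamples are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch: bootstrap optimism correction of Harrell's c-index
# for a Cox model (lifelines), on synthetic data. Not the authors' code.
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter
from lifelines.utils import concordance_index

rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "pasat": rng.normal(50, 10, n),
    "sf36_pcs": rng.normal(40, 9, n),
    "vft": rng.normal(30, 8, n),
})
risk = 0.03 * (50 - df["pasat"]) + 0.04 * (40 - df["sf36_pcs"])
df["time"] = rng.exponential(5 * np.exp(-risk))
df["event"] = rng.binomial(1, 0.7, n)

def c_index(model, data):
    # higher partial hazard means shorter survival, hence the minus sign
    return concordance_index(data["time"],
                             -model.predict_partial_hazard(data),
                             data["event"])

full = CoxPHFitter().fit(df, duration_col="time", event_col="event")
apparent = c_index(full, df)

optimism = []
for _ in range(200):                      # 200 bootstrap resamples (illustrative)
    boot = df.sample(n=len(df), replace=True)
    m = CoxPHFitter().fit(boot, duration_col="time", event_col="event")
    optimism.append(c_index(m, boot) - c_index(m, df))

print("apparent c-index:          ", round(apparent, 3))
print("optimism-corrected c-index:", round(apparent - np.mean(optimism), 3))
```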

Pinto 2020.

Study characteristics
General information Model name
  • SP

  • Severity 6 years

  • Severity 10 years


Primary source
Journal
Data source
Routine care, secondary
Study type
Development
Participants Inclusion criteria
  • SP:

    • Tracked since onset diagnosis or with SP diagnosis only after the fifth year of tracking

    • Minimum of 6 years of tracking

    • Minimum of 5 annotated visits

  • Severity 6 years and severity 10 years:

    • Tracked since onset diagnosis

    • Minimum of 10 years of tracking

    • Minimum of 5 annotated visits


Exclusion criteria
Patients with PPMS
Recruitment
Neurology Department of Centro Hospitalar e Universitario de Coimbra, Portugal
Age (years)
  • SP: mean 31.1 (onset)

  • Severity 6 years: mean 30.3 (onset)

  • Severity 10 years: mean 32.3 (onset)


Sex (%F)
  • SP: 72.7

  • Severity 6 years: 69.7

  • Severity 10 years: 77.6


Disease duration (years)
Not reported
Diagnosis
100% RRMS
Diagnostic criteria
McDonald (undefined)
Treatment
Not reported
Disease description
Not reported
Recruitment period
Not reported
Predictors Considered predictors
  • Identification (static): gender, age of onset, initial supratentorial manifestations, initial optic pathway manifestations, initial brainstem or cerebellum manifestations, initial spinal cord manifestations, clinical evidence in the MS initial manifestations, initial manifestations visualised in MRI, initial manifestations visualised in evoked potentials test, initial manifestations visualised in CSF

  • Mean, median, standard deviation, and mode, with 2 segmentation windows (normal and accumulative), for dynamic features

  • Visits (dynamic): routine visit; FS scores for pyramidal, cerebellar, brainstem, sensory, bowel and bladder, visual, mental, and ambulation; cerebellar weakness, visual symptoms, gait disturbances related to ataxia, dysaesthesiae, lower extremities ataxia, paresthesiae, perturbances in cognition, gait disturbances related to paresis, gait disturbances related to spasticity, muscular weakness in upper extremities, perturbances in micturition, fatigue, muscular weakness in lower extremities, mood perturbances, EDSS

  • Relapses (dynamic): impact on ADL functions, recovery, severity; manifestations related to pyramidal tract, brain stem, bowel and bladder, neuropsychological functions, cerebellum, visual functions, and sensory functions; hospitalisation, effect on ambulatory capacity


Number of considered predictors
1306
Timing of predictor measurement
At multiple visits dependent on which n‐year model (n = 1 to 5)
Predictor handling
  • Continuously

  • No interactions considered

Outcome Outcome definition
  • SP:

    • Conversion to progressive MS (not reported): the indication in the database of an SP course diagnosis by the clinicians

  • Severity 6 years:

    • Disability (EDSS): severe disease defined as EDSS > 3 by 6 years based on the mean EDSS from all visits to the clinic that happened in the 6th year; when the year did not contain any annotated visits for a given patient, 1 of 2 possible consecutive years was considered; the chosen model is the 2-years-from-onset model

  • Severity 10 years:

    • Disability (EDSS): severe disease defined as EDSS > 3 by 10 years based on the mean EDSS from all visits to the clinic that happened in the 10th year; when the year did not contain any annotated visits for a given patient, 1 of 2 possible consecutive years was considered; the chosen model is the 5-years-from-onset model


Timing of outcome measurement
  • SP: as long as available in the database; by year 2 from baseline (which formed the basis for the year-n models), 7 patients already had SP

  • Severity 6 years: at 6 years (for the chosen 2-years-from-onset model); by year 2 from baseline (which formed the basis for the year-n models), at least 19 patients had already met the definition of severe disease

  • Severity 10 years: at 10 years (for the chosen 5-years-from-onset model); by year 5 from baseline (which formed the basis for the year-n models), at least 15 patients had already met the definition of severe disease

Missing data Number of participants with any missing value
Unclear exactly how many participants have any missing
Missing data handling
Mixed: single imputation of the feature mean for predictors, exclusion for outcome
Analysis Number of participants (number of events)
  • SP: 187 (21)

  • Severity 6 years: 145 (38)

  • Severity 10 years: 67 (30)


Modelling method
Support vector machine
Predictor selection method
  • For inclusion in the multivariable model, univariable analysis

  • During multivariable modelling, LASSO used to select predictors for other prediction models, tuning parameter chosen in order to yield at least 5 training samples per predictor


Hyperparameter tuning
Default parameters of the MATLAB function
Shrinkage of predictor weights
Modelling method
Performance evaluation dataset
Development
Performance evaluation method
Cross‐validation, (10 times) 10‐fold
Calibration estimate
Not reported
Discrimination estimate
  • c‐Statistic

    • SP: 0.86 (SD 0.07)

    • Severity 6 years: 0.89 (SD 0.03)

    • Severity 10 years: 0.85 (SD 0.07)


Classification estimate
  • SP: sensitivity = 0.76 (SD 0.14), specificity = 0.77 (SD 0.05), F1 score = 0.20 (SD 0.05), geometric mean = 0.76 (SD 0.08)

  • Severity 6 years: sensitivity = 0.84 (SD 0.11), specificity = 0.81 (SD 0.05), F1 score = 0.53 (SD 0.07), geometric mean = 0.82 (SD 0.06)

  • Severity 10 years: sensitivity = 0.77 (SD 0.13), specificity = 0.79 (SD 0.09), F1 score = 0.72 (SD 0.09), geometric mean = 0.78 (SD 0.08)


Overall performance
Not reported
Risk groups
Not applicable
Model  Model presentation
Not reported
Number of predictors in the model
Unclear which predictors make up the final model
Predictors in the model
Not reported
Effect measure estimates
Not reported
Predictor influence measure
Predictive power (% of iterations predictor selected in)
Validation model update or adjustment
Not applicable
Interpretation  Aim of the study
To predict MS progression, based on the clinical characteristics of the first 5 years of the disease
Primary aim
The primary aim of this study is the prediction of individual outcomes.
Model interpretation
Exploratory
Suggested improvements
Include more information such as MRI examination; use different disease severity criteria and compare; consider disease phenotypes as an interaction
Notes Applicability overall
High
Applicability overall rationale
Approximately half of the participants had already experienced the outcome before the measurement of predictors, which included the baseline measure of the outcome itself.
 
Item Authors' judgement Support for judgement
Participants No The data source was routine care, and inclusion was based on the availability of data.
Predictors No SP: The intended time of prediction is unclear and defined as availability in the dataset. Hence, the predictors used at 2 years appear to be unavailable at baseline. There is not enough information to judge whether predictors were defined and assessed in a similar way across patients, especially predictors related to instruments, since the timing of data collection is unclear.
Severity 6 years and severity 10 years: The prognosis was presented as a 6-year prediction while the predictors were from the second year, effectively shortening the prediction window. There is not enough information to judge whether predictors were defined and assessed in a similar way across patients, especially predictors related to instruments, since the timing of data collection is unclear.
Outcome No SP: The outcome was SPMS from a routine care database, which is expected to be neither standardised nor operationalised. Also, the timing of the outcome was not clearly defined but was limited to availability in the database. At 2 years, which was the chosen model, half of the events had already occurred.
Severity 6 years and severity 10 years: EDSS scores were included in the predictors, and almost half of the participants had already experienced the event (the definition of which is based on EDSS) by year 2. Also, the EDSS change was not reported to be confirmed.
Analysis No The amount of missing data was substantial and was handled by mean imputation within the cross-validation structure. The sample size was too small. Univariable predictor selection was used. Calibration was not assessed. Parameter tuning for the main chosen model, the SVM, was not reported; in correspondence, the defaults were reported to have been used. It is unclear whether model selection and evaluation were properly separated. The final model is unclear.
Overall No At least one domain is at high risk of bias.
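The judgement above questions whether model selection (univariable screening, LASSO selection, SVM tuning) and evaluation were properly separated. As an illustration only, the sketch below shows one way to keep imputation, predictor selection, and the classifier inside each fold of a repeated 10-fold cross-validation with scikit-learn, so the performance estimate is not contaminated by steps fitted on the full data set; the synthetic data, regularisation strength, and pipeline steps are assumptions, not the authors' pipeline.

```python
# Illustrative sketch: keeping imputation, LASSO-based predictor selection and
# the SVM inside the cross-validation loop. Synthetic data only; not the
# authors' pipeline.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectFromModel
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in: 187 participants, many candidate features, rare outcome
X, y = make_classification(n_samples=187, n_features=100, n_informative=8,
                           weights=[0.89], random_state=0)
X[np.random.default_rng(0).random(X.shape) < 0.05] = np.nan  # some missing values

pipe = Pipeline([
    ("impute", SimpleImputer(strategy="mean")),          # refit in every fold
    ("scale", StandardScaler()),
    ("select", SelectFromModel(                           # L1 selection in every fold
        LogisticRegression(penalty="l1", solver="liblinear", C=0.5))),
    ("svm", SVC(kernel="rbf")),
])

cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=10, random_state=0)
auc = cross_val_score(pipe, X, y, cv=cv, scoring="roc_auc")
print("mean c-statistic over 10x10-fold CV:", auc.mean().round(3))
```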

Pisani 2021.

Study characteristics
General information Model name
SP‐RiSc
Primary source
Journal
Data source
Cohort, secondary
Study type
Development
Participants Inclusion criteria
  • RRMS patients


Exclusion criteria
Not reported
Recruitment
MS specialist centre of Verona University Hospital, Italy
Age (years)
Mean 33.5
Sex (%F)
58.4
Disease duration (years)
Not reported
Diagnosis
100% RRMS
Diagnostic criteria
McDonald 2005 (Polman 2005)
Treatment
  • At recruitment, 100% on first‐line DMT (IFN, GA)

  • During follow‐up, 43.5% switched to second‐line therapy (fingolimod, natalizumab)


Disease description
EDSS median (range): 1.5 (0 to 3.5)
Recruitment period
2005 to 2018
Predictors Considered predictors
  • At onset: age, sex, EDSS, cortical lesion number, white matter lesion number, spinal cord lesions

  • At 2 years: EDSS, number of relapses, new CL number, new WM lesion number

  • Difference (2 years to 0 years): global cortical thickness, cerebellar cortical volume (adjusted for switch to second‐line DMT during follow‐up)


Number of considered predictors
12 or 13 (unclear adjustment)
Timing of predictor measurement
At diagnosis (RRMS) and up to 2 years after diagnosis
Predictor handling
Continuously
Outcome Outcome definition
Conversion to progressive MS (Lublin 1996): time to the occurrence of continuous disability accumulation independently of relapses, confirmed 12 months later, transitory plateaus in the progressive course were allowed, steady progression was the rule
Timing of outcome measurement
Examination every 6 months or when a relapse occurred; mean (range) follow‐up: 9.55 years (6.8 years to 13.13 years)
Missing data Number of participants with any missing value
Not reported
Missing data handling
Not reported
Analysis Number of participants (number of events)
262 (69)
Modelling method
Random survival forest
Predictor selection method
  • For inclusion in the multivariable model, all candidate predictors

  • During multivariable modelling, inclusion if minimal depth < mean minimal depth


Hyperparameter tuning
Parameters, but not tuning methods, mentioned in appendix
Shrinkage of predictor weights
Modelling method
Performance evaluation dataset
Development
Performance evaluation method
Random split for tool performance; development out‐of‐bag for random forest performance
Calibration estimate
Not reported
Discrimination estimate
Harrell's c-index on development out-of-bag data for evaluating the random forest at:
  • 7 years 92.0%

  • 8.5 years 91.0%

  • 9.5 years 91.4%

  • 10.5 years 90%

  • 13.5 years 90.0%


Classification estimate
Cutoff = 17.7, accuracy = 0.88 (95% CI 0.75 to 0.96), sensitivity = 0.92 (95% CI 0.70 to 1.00), specificity = 0.87 (95% CI 0.70 to 0.96), PPV = 0.75 (95% CI 0.48 to 0.93), NPV = 0.96 (95% CI 0.81 to 1.00) from evaluation of final tool using random split
Overall performance
Brier score (95% CI) using development out‐of‐bag for evaluating RF at:
  • 7 years: 0.08 (0.06 to 0.10)

  • 8.5 years: 0.09 (0.08 to 0.11)

  • 9.5 years: 0.08 (0.06 to 0.11)

  • 10.5 years: 0.05 (0.03 to 0.06)

  • 13.5 years: 0.02 (0.01 to 0.03)


Risk groups
3 risk groups: high (ensemble mortality > third quartile), medium, and low (ensemble mortality < first quartile)
Model  Model presentation
Combination of heat map value for 2 predictors plus other predictor values weighted by their minimal depth
Number of predictors in the model
7
Predictors in the model
  • At onset: cortical lesion number, age, EDSS, white matter lesion number

  • Difference (2 years to 0 years): global cortical thickness, cerebellar cortical volume, new cortical lesion number


Effect measure estimates
Not reported
Predictor influence measure
Minimal depth
Validation model update or adjustment
Not applicable
Interpretation  Aim of the study
To develop the secondary progressive risk score (SP‐RiSc), which integrates demographic, clinical, and MRI data collected from a cohort of RRMS patients during the first 2 years after the disease diagnosis
Primary aim
The primary aim of this study is the prediction of individual outcomes.
Model interpretation
Probably confirmatory
Suggested improvements
An additional validation, especially on a larger independent cohort with neuroimaging data from different field strength MRI scanners
Notes Applicability overall
Low
 
Item Authors' judgement Support for judgement
Participants Unclear Although the data source is reported to be a cohort study, there are no details related to eligibility criteria.
Predictors No Images were produced at a single centre and analysed by 2 experienced clinicians. The included predictors are relatively objective. The model is meant to be used at RRMS diagnosis, and the survival model counts time from diagnosis only. This means that the predictors measured at 2 years from the diagnosis should be considered unavailable at the intended moment of prediction.
Outcome No The secondary progression conversion outcome was clearly defined but it was not operationalised and hence the application of this definition might vary greatly based on assessors and experience level.
Analysis No The number of events was low. Missing data and their handling were not mentioned. Discrimination and overall performance of the original RF were evaluated internally with out-of-bag error, but the final model was only assessed with classification measures. There is no mention of parameter tuning. The final prediction tool does not correspond to the multivariable model.
Overall No At least one domain is at high risk of bias.
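The Pisani 2021 entry describes a random survival forest evaluated with Harrell's c-index. The sketch below fits a random survival forest with scikit-survival and computes Harrell's c on a held-out random split, using synthetic data; the study's minimal-depth predictor selection and out-of-bag evaluation are not reproduced here, and all parameter values are illustrative.

```python
# Illustrative sketch: random survival forest with Harrell's c-index on a
# held-out split (scikit-survival), on synthetic data. Not the authors' code.
import numpy as np
from sklearn.model_selection import train_test_split
from sksurv.ensemble import RandomSurvivalForest
from sksurv.metrics import concordance_index_censored
from sksurv.util import Surv

rng = np.random.default_rng(0)
n, p = 262, 12
X = rng.normal(size=(n, p))
time = rng.exponential(10 * np.exp(-0.5 * X[:, 0]))
event = rng.random(n) < 0.26                      # roughly 69/262 events
y = Surv.from_arrays(event=event, time=time)      # structured survival outcome

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

rsf = RandomSurvivalForest(n_estimators=300, min_samples_leaf=10,
                           random_state=0).fit(X_tr, y_tr)

risk = rsf.predict(X_te)                          # higher value = higher predicted risk
cindex = concordance_index_censored(y_te["event"], y_te["time"], risk)[0]
print("Harrell's c-index on the held-out split:", round(cindex, 3))
```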

Roca 2020.

Study characteristics
General information Model name
Aggregated model
Primary source
Journal
Data source
Registry, secondary
Study type
Development
Participants Inclusion criteria
  • Patients with MS with initial FLAIR MRI and EDSS score at 2 years


Exclusion criteria
Not reported
Recruitment
Subset of the OFSEP (Observatoire français de la sclérose en Plaques) registry from 37 institutions in 13 French cities, France
Age (years)
Not reported
Sex (%F)
Not reported
Disease duration (years)
Not reported
Diagnosis
Not reported
Diagnostic criteria
Not applicable
Treatment
Unclear
Disease description
Not reported
Recruitment period
From 2008 onward
Predictors Considered predictors
  • Non‐tabular: FLAIR images, lesion masks from White Matter Hyperintensities segmentation from FLAIR images

  • Tabular: 60 tracts of interest from the ICBM‐DTI 81 white matter labels and sensorimotor tracts atlases in MNI space, whole‐brain lesion load, volume of the lateral ventricles, age, gender, 3D/2D nature of FLAIR sequence


Number of considered predictors
Non‐tabular data + 65
Timing of predictor measurement
At FLAIR imaging (initial in the dataset)
Predictor handling
Unclear, probably continuously
Outcome Outcome definition
Disability (EDSS): EDSS score
Timing of outcome measurement
At 2 years from the initial imaging
Missing data Number of participants with any missing value
19
Missing data handling
Not reported
Analysis Number of participants (number of events)
1427 (continuous outcome)
Modelling method
Ensemble: convolutional neural network (linear and non‐linear registration), random forest (single and dual), and manifold learning
Predictor selection method
  • For inclusion in the multivariable model, all candidate predictors

  • During multivariable modelling, full model approach


Hyperparameter tuning
Not reported
Shrinkage of predictor weights
Unclear
Performance evaluation dataset
Development
Performance evaluation method
Random split of approximately 1/3 for test set, further random split of remaining data (90% training, 10% validation)
Calibration estimate
Plot of MSE per EDSS category, MSE 2.21 (validation), 3 (test)
Discrimination estimate
Not applicable
Classification estimate
Not applicable
Overall performance
Not reported
Risk groups
Not applicable
Model  Model presentation
Not reported
Number of predictors in the model
Unstructured data + 65
Predictors in the model
  • Non‐tabular: FLAIR images, lesion masks from White Matter Hyperintensities segmentation from FLAIR images

  • Tabular: 60 tracts of interest from the ICBM‐DTI 81 white matter labels and sensorimotor tracts atlases in MNI space, whole‐brain lesion load, volume of the lateral ventricles, age, gender, 3D/2D nature of FLAIR sequence


Effect measure estimates
Not reported
Predictor influence measure
Most informative features by RF variable importance
Validation model update or adjustment
Not applicable
Interpretation  Aim of the study
To create an algorithm that combines multiple machine‐learning techniques to predict the expanded disability status scale (EDSS) score of patients with multiple sclerosis at 2 years solely based on age, sex and fluid‐attenuated inversion recovery (FLAIR) MRI data
Primary aim
The primary aim of this study is the prediction of individual outcomes.
Model interpretation
Probably confirmatory
Suggested improvements
Using additional factors such as baseline EDSS score, including quantitative metrics coming from T1‐weighted‐based segmentation, a larger cohort or oversampling of high EDSS score examples or generating synthetic data, further validated on an external larger test cohort
Notes Applicability overall
Unclear
Applicability overall rationale
Due to the lack of reporting on the final model or tool, it is unclear whether this study aimed to provide a model/tool for the prediction of individual MS outcomes.
Auxiliary references
Vukusic S, Casey R, Rollot F, Brochet B, Pelletier J, Laplaud DA, et al. Observatoire Francais de la Sclerose en Plaques (OFSEP): a unique multimodal nationwide MS registry in France. Mult Scler 2020;26(1):118‐22.
NCT02889965. The French multiple sclerosis registry (OFSEP). https://clinicaltrials.gov/ct2/show/NCT02889965(first received 7 September 2016).
 
Item Authors' judgement Support for judgement
Participants No The data, although collected prospectively in a registry, were known to have inclusion biases. The full dataset (DS1, DS2, DS3) corresponded to all the MRI scans that were recorded in the OFSEP database, meaning inclusion was defined by availability of data.
Predictors No The final features are based on heterogeneously collected imaging data. It was unclear whether outcomes were known when features were created, but we did not believe this to be a source of bias.
Outcome Yes The outcome was based on EDSS, which we assume to be standard and robust to predictor knowledge. The visit frequency was reported to be about yearly.
Analysis No Calibration was not fully explored and reported (the bar chart and MSE did not allow for an understanding of the direction of the errors). Missing data were addressed in the Participant section. An additional 19 people were dropped due to data quality, but this number was very small (~1%) compared to the total amount. Random splits of the data were used for evaluation. Hyperparameter tuning details were unclear. The number of participants per predictor may be low given the complex modelling techniques used. The internal evaluation used the validation set to weight the models in the aggregate and then again to assess performance of this model. Presentation of the final model was unclear.
Overall No At least one domain is at high risk of bias.
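The Analysis judgement for Roca 2020 notes that reporting only the MSE per EDSS category does not reveal the direction of the errors. The short pandas sketch below summarises predicted-versus-observed EDSS errors per category, including the signed mean error; the category boundaries and synthetic predictions are illustrative, not the study's.

```python
# Illustrative sketch: summarising prediction errors per observed EDSS
# category, including the signed mean error (direction of miscalibration),
# not just the MSE. Synthetic predictions only.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
observed = np.clip(np.round(rng.gamma(2.0, 1.5, 500) * 2) / 2, 0, 9.5)   # EDSS in 0.5 steps
predicted = np.clip(observed + rng.normal(-0.3, 1.2, 500), 0, 10)        # slight underprediction

df = pd.DataFrame({"observed": observed, "error": predicted - observed})
summary = (df.groupby(pd.cut(df["observed"], bins=[0, 2, 4, 6, 10], right=False))
             .agg(n=("error", "size"),
                  mse=("error", lambda e: np.mean(e ** 2)),
                  mean_error=("error", "mean")))
print(summary)
```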

Rocca 2017.

Study characteristics
General information Model name
15‐month clinical and MR
Primary source
Journal
Data source
Cohort, primary
Study type
Development
Participants Inclusion criteria
  • PPMS or probable PPMS (negative CSF examination and positive MRI findings)

  • Patients participated in the study by Rovaris 2006


Exclusion criteria
Not reported
Recruitment
Consecutively from the outpatient populations attending MS clinics of the participating institutions, unclear which, Italy
Age (years)
Mean 51.3
Sex (%F)
50.0
Disease duration (years)
Median (range) 10 (2 to 26)
Diagnosis
100% PPMS
Diagnostic criteria
Thompson 2000
Treatment
  • At recruitment, 16.7% azathioprine, 7.4% mitoxantrone, 9.3% methotrexate, 66.7% no DMT

  • During follow‐up, unclear, at 15 years, 13% azathioprine, 3.7% mitoxantrone, 1.9% methotrexate, 81.5% no DMT


Disease description
EDSS median (IQR): 6.0 (4.5 to 6.5)
Recruitment period
Not reported
Predictors Considered predictors
Age, log disease duration, baseline EDSS, baseline MS severity score, change in EDSS at 15 months, log baseline T2 lesion volume, T2 lesion volume percentage change, log baseline T1 lesion volume, T1 lesion volume percentage change, number of new T2 lesions, number of new T1 lesions, normalised brain volume, percentage brain volume change, cervical cord cross‐sectional area, cervical cord cross‐sectional area percentage change, average lesion mean diffusivity, average lesion mean diffusivity percentage change, average lesion fractional anisotropy, average lesion fractional anisotropy percentage change, average normal‐appearing white matter mean diffusivity, average normal‐appearing white matter mean diffusivity percentage change, average normal‐appearing white matter fractional anisotropy, average normal‐appearing white matter fractional anisotropy percentage change, average grey matter mean diffusivity, average grey matter mean diffusivity percentage change, (in another model: change in EDSS at 5 years)
Number of considered predictors
26
Timing of predictor measurement
At study baseline (cohort entry), at median 15 months after baseline, and at median 56 months (called 5 years) after baseline
Predictor handling
Continuously
Outcome Outcome definition
Disability (EDSS): EDSS change between baseline and at 15‐year follow‐up; any EDSS change is always confirmed by a second visit after a further 3 months
Timing of outcome measurement
Median (IQR): 15.1 years (13.9 years to 15.4 years)
Missing data Number of participants with any missing value
5, only missing outcome reported
Missing data handling
  • Complete case

  • Not explicitly reported

Analysis Number of participants (number of events)
49 (continuous outcome)
Modelling method
Linear regression
Predictor selection method
  • For inclusion in the multivariable model, all candidate predictors

  • During multivariable modelling, hybrid stepwise selection using multiple models

    • P value < 0.1


Hyperparameter tuning
Not applicable
Shrinkage of predictor weights
None
Performance evaluation dataset
Development
Performance evaluation method
Cross‐validation, LOOCV
Calibration estimate
Not reported
Discrimination estimate
Not applicable
Classification estimate
EDSS change precision within 1 point = 0.776
Overall performance
R2 = 0.61
Risk groups
Not reported
Model  Model presentation
Regression coefficients without the intercept
Number of predictors in the model
5
Predictors in the model
Baseline EDSS, 15‐month EDSS change, 15‐month new T1 hypointense lesions, percentage brain volume change, baseline grey matter mean diffusivity
Effect measure estimates
Linear model coefficients (P value): baseline EDSS −0.54 (< 0.001), 15‐month EDSS change 0.39 (0.09), 15‐month new T1 hypointense lesions 0.28 (0.003), percentage brain volume change −0.24 (0.05), baseline grey matter mean diffusivity 3.86 (0.03)
Predictor influence measure
Not applicable
Validation model update or adjustment
Not applicable
Interpretation  Aim of the study
To investigate the added value of magnetic resonance imaging measures of brain and cervical cord damage in predicting long‐term clinical worsening of primary progressive multiple sclerosis compared to simple clinical assessment
Primary aim
The primary aim of this study is not the prediction of individual outcomes. Rather, the focus is on the usefulness of MRI measures.
Model interpretation
Exploratory
Suggested improvements
To widen clinical measures and include further MRI measures
Notes Applicability overall
High
Applicability overall rationale
Although this study contained a model and its assessment, the main aim was not to create a model for prediction of individual outcomes but was rather to show the usefulness of new MRI measures.
Auxiliary references
Rovaris M, Gallo A, Valsasina P, Benedetti B, Caputo D, Ghezzi A, et al. Short‐term accrual of gray matter pathology in patients with progressive multiple sclerosis: an in vivo study using diffusion tensor MRI. Neuroimage 2005;24(4):1139‐46.
Rovaris M, Judica E, Gallo A, Benedetti B, Sormani MP, Caputo D, et al. Grey matter damage predicts the evolution of primary progressive multiple sclerosis at 5 years. Brain 2006;129(Pt 10):2628‐34.
 
Item Authors' judgement Support for judgement
Participants Yes The data source was a cohort study collected with the aim of searching for imaging predictors, but no eligibility criteria were mentioned other than the diagnosis. It was explicitly stated that inclusion did not depend on disease duration, progression rate, or disability level.
Predictors Unclear The stated interest was in predicting 15-year outcomes, but the data used were collected up to 15 months after baseline, which brought the prediction window down to less than 14 years. The intended time of model use is unclear.
Outcome Unclear The outcome was based on EDSS, which is considered to be measured objectively. It was conceptualised as a change in EDSS and treated as a score that can be subtracted, and the change was treated as a continuous outcome in a linear regression. However, EDSS is not a numeric scale. It is accepted to be an ordinal scale, and it is unclear if treating it as numeric is appropriate or not.
Analysis No The 5 participants (> 10% of the sample size) lost to follow-up were excluded from the analysis, without any mention of how they compared to other patients. The number of events per predictor was far lower than 10. Modelling the change in EDSS linearly, without any interaction terms, assumes it follows a normal distribution, although EDSS is considered an ordinal rather than a linear scale and it was not reported whether this assumption was violated. Although cross-validation was used, it is unclear whether the variable selection process was included within this procedure. There is no predicted versus observed plot or a similar measure of calibration.
Overall No At least one domain is at high risk of bias.
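The Rocca 2017 entry reports leave-one-out cross-validation of a linear model for EDSS change, with R² and the proportion of predictions within 1 EDSS point. Below is a minimal scikit-learn sketch of that evaluation scheme on synthetic data; the predictor selection step, which the judgement notes may not have been nested within the cross-validation, is omitted here.

```python
# Illustrative sketch: leave-one-out cross-validation of a linear regression
# for EDSS change, reporting R^2 and the proportion of predictions within
# 1 EDSS point. Synthetic data; not the authors' code.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import LeaveOneOut, cross_val_predict

rng = np.random.default_rng(0)
n, p = 49, 5
X = rng.normal(size=(n, p))
y = X @ rng.normal(size=p) + rng.normal(0, 1.0, n)   # stand-in for EDSS change

pred = cross_val_predict(LinearRegression(), X, y, cv=LeaveOneOut())

print("LOOCV R^2:                ", round(r2_score(y, pred), 2))
print("Proportion within 1 point:", round(float(np.mean(np.abs(pred - y) <= 1)), 3))
```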

Rovaris 2006.

Study characteristics
General information Model name
Not applicable
Primary source
Journal
Data source
Cohort, primary
Study type
Development
Participants Inclusion criteria
  • PPMS patients


Exclusion criteria
  • Other neurological conditions


Recruitment
Consecutively from the outpatient populations attending MS clinics of the participating institutions, unclear which, Italy
Age (years)
Mean 51.3
Sex (%F)
50.0
Disease duration (years)
Median 10 (range: 2 to 26)
Diagnosis
100% PPMS, 45 definite, 9 probable
Diagnostic criteria
Thompson 2000
Treatment
  • At recruitment, 16.7% azathioprine, 7.4% mitoxantrone, 9.3% methotrexate, and 66.7% no DMT

  • During follow‐up, unclear, at final follow‐up, 13% azathioprine, 3.7% mitoxantrone, 1.9% methotrexate, and 81.5% no DMT


Disease description
EDSS median (range): 5.5 (2.5 to 7.5)
Recruitment period
Not reported
Predictors Considered predictors
Age, gender, disease duration, EDSS, baseline T2 LV, T2 LV percent change, baseline T1 LV, T1 LV percent change, number of new T2 lesions, number of new T1 lesions, normalised brain volume, brain volume percent change, cervical cord cross-sectional area, cervical cord cross-sectional area percent change, average lesion mean diffusivity, average lesion MD percent change, average lesion fractional anisotropy, average lesion fractional anisotropy percent change, average normal-appearing white matter mean diffusivity, average normal-appearing white matter mean diffusivity percent change, average normal-appearing white matter fractional anisotropy, average normal-appearing white matter fractional anisotropy percent change, average grey matter mean diffusivity, average grey matter mean diffusivity percent change, (adjustment for follow-up time)
Number of considered predictors
25
Timing of predictor measurement
At study baseline (cohort entry), at 15 months post‐baseline (follow‐up), at final follow‐up (outcome measurement)
Predictor handling
Continuously
Outcome Outcome definition
Disability (EDSS): clinically worsened defined as an EDSS score increase ≥ 1.0, when baseline EDSS was < 6.0, or an EDSS score increase ≥ 0.5, when baseline EDSS was ≥ 6.0; confirmed by a second visit after a 3‐month interval
Timing of outcome measurement
Follow‐up median (range): 56.0 months (35 months to 63 months)
Missing data Number of participants with any missing value
≤ 11, unclear exactly how many participants have any missing
Missing data handling
Complete case, the details are not explicitly reported
Analysis Number of participants (number of events)
52 (35)
Modelling method
Logistic regression
Predictor selection method
  • For inclusion in the multivariable model, univariable analysis

  • During multivariable modelling, significance

    • P value < 0.05


Hyperparameter tuning
Not applicable
Shrinkage of predictor weights
None
Performance evaluation dataset
Development
Performance evaluation method
Cross‐validation, LOOCV
Calibration estimate
Not reported
Discrimination estimate
Not reported
Classification estimate
Accuracy = 0.808, sensitivity = 31/35 = 0.89, specificity = 11/17 = 0.65
Overall performance
Nagelkerke's R2 = 0.44
Risk groups
Not reported
Model  Model presentation
Regression coefficients without intercept and follow‐up time
Number of predictors in the model
2 or 3 (unclear if follow‐up time included)
Predictors in the model
Baseline EDSS, grey matter mean diffusivity, follow‐up
Effect measure estimates
OR (95% CI): baseline EDSS 0.48 (0.26 to 0.91), average grey matter mean diffusivity 1.21 (1.06 to 1.38), follow‐up not reported
Predictor influence measure
Not applicable
Validation model update or adjustment
Not applicable
Interpretation  Aim of the study
To investigate whether conventional and DT‐MRI‐derived measures can predict the long‐term clinical evolution of PP multiple sclerosis
Primary aim
The primary aim of this study is not the prediction of individual outcomes. The focus is on the usefulness of MRI measures.
Model interpretation
Exploratory
Suggested improvements
Not reported
Notes Applicability overall
High
Applicability overall rationale
Although this study contained a model and its assessment, the main aim was not to create a model for prediction of individual outcomes but was rather to show the usefulness of MRI measures.
Auxiliary references
Rovaris M, Gallo A, Valsasina P, Benedetti B, Caputo D, Ghezzi A, et al. Short‐term accrual of gray matter pathology in patients with progressive multiple sclerosis: an in vivo study using diffusion tensor MRI. Neuroimage 2005;24(4):1139‐46.
 
Item Authors' judgement Support for judgement
Participants Yes The data source was a cohort study, and the data seem to have been collected with the aim of searching for imaging predictors, but no eligibility criteria were mentioned other than the diagnosis. It was specifically stated that inclusion did not depend on disease duration, progression rate, or disability level.
Predictors Yes The predictors in the final model were based on baseline measurements. Due to the prospective nature of data collection and automated analysis of images, predictors are considered to be assessed without knowledge of outcome data. Both automated MR analysis and EDSS measurements are considered to be objective.
Outcome Yes Even though the same physician assessed EDSS, EDSS is considered to be an objective measure. There was approximately a full 2‐year range in outcome assessment time, but the clinical authors did not find this problematic.
Analysis No The EPV was very low. Predictor selection started with univariate analyses. Neither calibration nor discrimination was addressed for the final model. Cross‐validation was used, but it did not cover all modelling steps. Final model coefficients were provided for EDSS and average GM MD, but not for follow‐up time. At least 6 participants (> 10%) had missing data, and complete case analysis was probably used. The model was adjusted for follow‐up time, which we consider to be an inappropriate use of post‐baseline data.
Overall No At least one domain is at high risk of bias.
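The Rovaris 2006 entry reports Nagelkerke's R² as the overall performance measure of the logistic model. As a reminder of how this statistic is obtained from the fitted and null log-likelihoods, a short sketch using statsmodels on synthetic data; the data and predictors are illustrative only.

```python
# Illustrative sketch: Nagelkerke's R^2 for a logistic regression, computed
# from the log-likelihoods of the fitted and null models (statsmodels).
# Synthetic data; not the authors' code.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 52
X = sm.add_constant(rng.normal(size=(n, 2)))      # e.g. baseline EDSS, GM mean diffusivity
y = rng.binomial(1, 1 / (1 + np.exp(-(0.2 + X[:, 1] - 0.8 * X[:, 2]))))

fit = sm.Logit(y, X).fit(disp=0)
ll_model, ll_null = fit.llf, fit.llnull

cox_snell = 1 - np.exp(2 / n * (ll_null - ll_model))   # Cox & Snell R^2
nagelkerke = cox_snell / (1 - np.exp(2 / n * ll_null)) # rescaled to a 0-1 maximum
print("Nagelkerke's R^2:", round(nagelkerke, 3))
```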

Runia 2014.

Study characteristics
General information Model name
Not applicable
Primary source
Dissertation
Data source
Cohort, primary
Study type
Development
Participants Inclusion criteria
  • CIS suggestive of MS

  • Age between 18 years and 50 years

  • Included within 6 months after symptom onset

  • No serious comorbidity


Exclusion criteria
  • Alternative diagnoses


Recruitment
Consecutive patients at the Rotterdam MS Centre, Netherlands
Age (years)
Unclear
Sex (%F)
72.9
Disease duration (years)
Up to 0.5
Diagnosis
100% CIS
Diagnostic criteria
Own definition
Treatment
Not reported
Disease description
Not reported
Recruitment period
Not reported
Predictors Considered predictors
Age (unclear if linear or 3 categories), sex, optic nerve (binary), fatigue, presence of first‐ or second‐degree relatives with MS, abnormal MRI (1 or more lesions), number of T2 lesions (0 lesions/1 to 9 lesions/> 9 lesions), gadolinium enhancement, presence of a lesion in the corpus callosum, modified Barkhof criteria (at least 3 of 4 criteria fulfilled), Swanton criteria, DIS + DIT2010 (the baseline scan fulfils criteria for dissemination in time and place according to the 2010 revised McDonald criteria (Polman 2011)), IgG index, presence of oligoclonal bands, serum 25‐OH‐vitamin D (fatigue as continuous and localisation of first symptoms as optic nerve, spinal cord, brainstem, or other were chosen to be included otherwise due to 'discriminating ability')
Number of considered predictors
≥ 16 or 21 (unclear transformations)
Timing of predictor measurement
At disease onset (CIS) (at study baseline within 6 months after onset)
Predictor handling
All categorised or dichotomised in Table 2/FSS (justified by comparison to the continuous version based on discriminative ability) and number of T1 lesions dichotomised, number of T2 lesions categorised/unclear: age categorised, 25‐OH‐vitamin D dichotomised, IgG Index dichotomised
Outcome Outcome definition
Conversion to definite MS (Poser 1983): time from start of first symptoms to CDMS diagnosed in case of clinical evidence for dissemination in space and time
Timing of outcome measurement
Unclear, up to > 90 months
Missing data Number of participants with any missing value
≥ 356, unclear exactly how many participants have any missing
Missing data handling
Mixed: complete case for outcome, multiple imputation for predictors
Analysis Number of participants (number of events)
431 (109 by 2 years)
Modelling method
Survival, Cox
Predictor selection method
  • For inclusion in the multivariable model, univariable analysis

  • During multivariable modelling, stepwise selection

    • Backward


Hyperparameter tuning
Not applicable
Shrinkage of predictor weights
None
Performance evaluation dataset
Development
Performance evaluation method
Bootstrap
Calibration estimate
Not reported
Discrimination estimate
c‐Statistic:
  • Raw: 0.72

  • Optimism‐corrected: 0.71

  • Simple model with 3 groups: 0.66


Classification estimate
Not reported
Overall performance
Not reported
Risk groups
3 risk categories from the sum score: low (0 to 1), intermediate (2 to 3), and high (4 to 5)
Model  Model presentation
  • Unweighted sum score from 0 to 5

  • Regression model without baseline survival, risk groups and KM plots


Number of predictors in the model
5
Predictors in the model
DIS + DIT2010, corpus callosum lesions, oligoclonal bands, fatigue, abnormal MRI
Effect measure estimates
HR (95% CI): DIS + DIT2010 2.2 (1.4 to 3.3), corpus callosum lesions 1.9 (1.2 to 2.9), oligoclonal bands 1.7 (1.1 to 2.6), fatigue 2.3 (1.4 to 3.9), abnormal MRI 2.3 (0.9 to 6.0)
Predictor influence measure
Not applicable
Validation model update or adjustment
Not applicable
Interpretation  Aim of the study
To develop a simple and reliable prediction model for MS in patients with CIS
Primary aim
The primary aim of this study is the prediction of individual outcomes.
Model interpretation
Probably exploratory
Suggested improvements
External validation
Notes Applicability overall
Low
 
Item Authors' judgement Support for judgement
Participants Yes The study used data collected for the Predicting the Outcome of a Demyelinating Event (PROUD) study protocol, and the inclusion/exclusion criteria were based on the baseline status.
Predictors Yes Predictors were collected before the outcome and therefore blinded, and all are available at onset.
Outcome Yes A standard definition of conversion to definite MS was used. The outcome was probably not blinded to predictors due to the clinical setting, but the outcome is considered relatively objective. The predictors dissemination in time (DIT) and dissemination in space (DIS) in McDonald include MRI, but the Poser does not, so the predictors were not included in the outcome definition.
Analysis No The EPV was low. Patients lost to follow‐up, with no reported reason or comparison with the remaining cohort, were excluded even though it was a survival analysis. Calibration was not assessed. Bootstrap methods were used to account for optimism but probably did not include the whole modelling process. Predictors were selected based on univariate analyses. Many continuous predictors seem to be categorised, although the reason is not very clear.
Overall No At least one domain is at high risk of bias.
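The Runia 2014 entry presents the model as an unweighted sum score (0 to 5) over five binary predictors, grouped into low, intermediate, and high risk with Kaplan-Meier curves. Below is a minimal sketch of deriving such a sum score and plotting survival by risk group with lifelines; the data are simulated and only the grouping cut-offs mirror the entry above.

```python
# Illustrative sketch: unweighted sum score over 5 binary predictors, grouped
# into low/intermediate/high risk, with Kaplan-Meier curves per group
# (lifelines). Synthetic data; not the authors' code.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from lifelines import KaplanMeierFitter

rng = np.random.default_rng(0)
n = 431
predictors = pd.DataFrame(rng.binomial(1, 0.4, size=(n, 5)),
                          columns=["dis_dit", "cc_lesion", "ocb", "fatigue", "abnormal_mri"])
score = predictors.sum(axis=1)                         # unweighted sum score, 0 to 5
group = pd.cut(score, bins=[-1, 1, 3, 5], labels=["low", "intermediate", "high"])

# simulate conversion times that shorten with a higher score
time = rng.exponential(8 * np.exp(-0.4 * score))
event = rng.random(n) < 0.6

ax = plt.subplot(111)
for g in ["low", "intermediate", "high"]:
    mask = (group == g).to_numpy()
    KaplanMeierFitter().fit(time[mask], event[mask], label=g).plot_survival_function(ax=ax)
plt.xlabel("Years since CIS onset")
plt.ylabel("Free of CDMS")
plt.show()
```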

Seccia 2020.

Study characteristics
General information Model name
  • 180 days

  • 360 days

  • 720 days


Primary source
Journal
Data source
Routine care, secondary
Study type
Development
Participants Inclusion criteria
  • Visits of patients with RR or SP as disease subtype


Exclusion criteria
  • Visits after the transition of the patient to the SP phase


Recruitment
Sant’Andrea University hospital in Rome, Italy
Age (years)
Mean 29.0 (onset)
Sex (%F)
69.8
Disease duration (years)
Mean 19.0
Diagnosis
100% RRMS
Diagnostic criteria
Latest criteria at time of diagnosis
Treatment
Unclear timing, 73% on DMT at some point
Disease description
Not reported
Recruitment period
1985 to 2018
Predictors Considered predictors
Longitudinal trajectories of age at onset, gender, age at visit, EDSS, number relapses from last visit, pregnancy, relapses frequency, time from last relapse, spinal cord, supratentorial, optic pathway, brainstem‐cerebellum, relapse treatment drugs, first‐line DMT, immunosuppressant, MS symptomatic treatment drugs, second‐line DMT, other concomitant diagnosis drugs, feature saving: status T1, status T2, oligoclonal banding
Number of considered predictors
21 predictor trajectories
Timing of predictor measurement
At multiple visits comprising patient history to the current visit of interest
Predictor handling
Continuously
Outcome Outcome definition
Conversion to progressive MS: transition from the RR to the SP phase within 180 days as assessed by the treating clinician
Timing of outcome measurement
  • 180 days: 180 days from the index visit

  • 360 days: 360 days from the index visit

  • 720 days: 720 days from the index visit

Missing data Number of participants with any missing value
0
Missing data handling
Exclusion of 3 variables with missing values
Analysis Number of participants (number of events)
  • 180 days: 1515 participants, 14,923 records (207)

  • 360 days: 1449 participants, 14,238 records (207)

  • 720 days: 1375 participants, 13,178 records (207)


Modelling method
Long short‐term memory recurrent neural network
Predictor selection method
  • For inclusion in the multivariable model, all candidate predictors

  • During multivariable modelling, full model approach


Hyperparameter tuning
Number of neurones chosen through trial and error procedure, dropout probability set to 0.2
Shrinkage of predictor weights
Modelling method
Performance evaluation dataset
Development
Performance evaluation method
Random split, train‐test splits preserving outcome proportions with balance‐inducing bagging
Calibration estimate
Not reported
Discrimination estimate
Not reported
Classification estimate
  • 180 days: cutoff = 0.5, accuracy = 0.98, sensitivity = 0.385, specificity = 0.988, PPV = 0.308

  • 360 days: cutoff = 0.5, accuracy = 0.975, sensitivity = 0.50, specificity = 0.982, PPV = 0.295

  • 720 days: cutoff = 0.5, accuracy = 0.98, sensitivity = 0.673, specificity = 0.985, PPV = 0.427


Overall performance
Not reported
Risk groups
Not reported
Model  Model presentation
Not reported
Number of predictors in the model
18 predictor trajectories
Predictors in the model
Longitudinal trajectories of age at onset, gender, age at visit, EDSS, number relapses from last visit, pregnancy, relapses frequency, time from last relapse, spinal cord, supratentorial, optic pathway, brainstem‐cerebellum, relapse treatment drugs, first‐line DMT, immunosuppressant, MS symptomatic treatment drugs, second‐line DMT, other concomitant diagnosis drugs
Effect measure estimates
Not reported
Predictor influence measure
Not reported
Validation model update or adjustment
Not applicable
Interpretation  Aim of the study
To explore the possibility of predicting whether a patient will pass from RR to SP phase in a given time window, using a real‐world dataset, built in close collaboration between computer experts and neurologists
Primary aim
The primary aim of this study is the prediction of individual outcomes.
Model interpretation
Exploratory
Suggested improvements
Using the LSTM model with different endpoints that are less unbalanced, using large and well maintained clinical databases
Notes Applicability overall
Unclear
Applicability overall rationale
Due to the lack of reporting on the final model or tool, it is unclear whether this study aimed to provide a model/tool for the prediction of individual MS outcomes.
 
Item Authors' judgement Support for judgement
Participants No The data source was routine medical records, and there were no eligibility criteria other than the MS subtype.
Predictors No The data were collected between 1978 and 2018. The first patient entering analysis was seen in 1985, probably due to the missingness of predictors prior to that time. Due to changing diagnostic criteria and technology, predictors such as age at onset, T1/T2 status, and treatment options are expected to be heterogeneous over time.
Outcome No The outcome was SPMS from a routine care database, which is expected to be not standardised or operationalised. Given that the diagnostic criteria changed over time, the outcome definition is expected to be somewhat different over time.
Analysis No The sample size and number of events were low. No discrimination or calibration measures were assessed. Many participants were dropped in the feature‐saving analysis, but here we focused on the record‐saving analysis. For computational reasons, a random split was used for assessment. There was no separation of data used for parameter tuning and data used to estimate performance in future patients. A final model did not appear to be selected, fitted, and presented.
Overall No At least one domain is at high risk of bias.
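The Seccia 2020 entry describes a long short-term memory (LSTM) network trained on per-visit predictor trajectories to classify conversion to SPMS within a given window. Below is a minimal PyTorch sketch of this general architecture type, with padded visit sequences feeding an LSTM whose final hidden state drives a binary classifier; the layer sizes, class weighting, and training loop are illustrative assumptions, not the authors' configuration.

```python
# Illustrative sketch: an LSTM over padded per-visit feature sequences with a
# binary output, the general architecture type described in the entry above.
# Synthetic data; layer sizes and training details are not taken from the study.
import torch
import torch.nn as nn

class VisitLSTM(nn.Module):
    def __init__(self, n_features=21, hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):                         # x: (batch, n_visits, n_features)
        _, (h_n, _) = self.lstm(x)                # h_n: (1, batch, hidden)
        return self.head(h_n[-1]).squeeze(-1)     # logits, shape (batch,)

# Synthetic batch: 64 patients, up to 10 visits, 21 features per visit
x = torch.randn(64, 10, 21)
y = (torch.rand(64) < 0.1).float()                # rare outcome, as in the study

model = VisitLSTM()
loss_fn = nn.BCEWithLogitsLoss(pos_weight=torch.tensor(9.0))  # crude imbalance handling
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for _ in range(5):                                # a few illustrative training steps
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()

prob = torch.sigmoid(model(x))
print("predicted probabilities (first 5):", prob[:5].detach().numpy().round(3))
```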

Skoog 2014.

Study characteristics
General information Model name
MSPS
Primary source
Journal
Data source
Cohort, primary
Study type
Development
Participants Inclusion criteria
  • RRMS patients with at least 1 distinct second attack that confirmed the diagnosis of MS according to the Poser criteria


Exclusion criteria
  • Patients with a single‐attack progressive MS or a second attack in the year of onset of SP


Recruitment
Medical records from the Sahlgrenska Neurology Department and outpatient clinic, the only neurological service in the Gothenburg area, Sweden
Age (years)
Mean 33.5
Sex (%F)
65.0
Disease duration (years)
Median 2
Diagnosis
100% RRMS
Diagnostic criteria
Poser 1983
Treatment
0%
Disease description
Not reported
Recruitment period
Not reported
Predictors Considered predictors
Age at onset attack, current age (spline), gender, time from the second attack, number of previous attacks, monofocal symptoms at onset attack, afferent symptoms at onset attack, complete remission from the onset attack, monofocal symptoms at the most recent attack, afferent symptoms at the most recent attack, complete remission from the most recent attack, time since the most recent attack, severity grade of attack (0 to 2, number of unfavourable 'no' responses to afferent symptoms and complete remission), interaction term between the attack grade and the interval between the most recent attack and current time
Number of considered predictors
≥ 15 (unclear transformations)
Timing of predictor measurement
At last relapse, at time of prognostication
Predictor handling
  • Continuously, unclear: as linear splines

  • At least one interaction was considered

Outcome Outcome definition
Conversion to progressive MS (Lublin 1996): continuous progression for at least 1 year without remission and detectable at time intervals of months or years, determined retrospectively after 1 year of observation and recorded the probable year of onset retrospectively; observation terminated at onset of secondary progression, at censoring due to competing causes of death, other disabling diseases, migration or the end of follow‐up; time since RRMS onset
Timing of outcome measurement
Time from the first relapse to censoring or outcome median (range): 11.5 years (0.7 years to 56.7 years)
Missing data Number of participants with any missing value
171 attacks; unclear exactly how many participants had any missing value
Missing data handling
Mixed, complete case for attacks, and regression methods for loss to follow‐up
Analysis Number of participants (number of events)
157 (118); unit of analysis is participants, with 749 attacks contributing (see the EPV sketch after this table)
Modelling method
Survival, Poisson
Predictor selection method
  • For inclusion in the multivariable model, all candidate predictors

  • During multivariable modelling, significance

    • P value < 0.05


Hyperparameter tuning
Not applicable
Shrinkage of predictor weights
None
Performance evaluation dataset
Development
Performance evaluation method
Apparent
Calibration estimate
O:E table
Discrimination estimate
Not reported
Classification estimate
Not reported
Overall performance
Not reported
Risk groups
Low‐risk periods: score < 0.04, high‐risk periods: score > 0.06
Model  Model presentation

Number of predictors in the model
3 (4 df)
Predictors in the model
Age, attack grade, time since last relapse (interaction with attack grade)
Effect measure estimates
log HR (SE): constant −11.5081 (4.0138), lower age predictor 0.3167 (0.1507), upper age predictor −0.0199 (0.0088), attack grade 0.7164 (0.1467), attack grade × time since last relapse −0.0457 (0.0158)
Predictor influence measure
Not applicable
Validation model update or adjustment
Not applicable
Interpretation  Aim of the study
To search for independent demographic and clinical factors that contributed to the risk of transition to SP and to simplify these complex relationships into a continuous individualised prediction based on repeated assessments expressed as a clinically and scientifically useful score
Primary aim
The primary aim of this study is the prediction of individual outcomes.
Model interpretation
Probably exploratory
Suggested improvements
Investigation and replication in an independent patient cohort, taking into account the therapy
Notes Applicability overall
Low
Auxiliary references
Runmarker B, Andersen O. Prognostic factors in a multiple sclerosis incidence cohort with twenty‐five years of follow‐up. Brain 1993;116 (Pt 1):117‐34.
Skoog B, Runmarker B, Winblad S, Ekholm S, Andersen O. A representative cohort of patients with non‐progressive multiple sclerosis at the age of normal life expectancy. Brain 2012;135(Pt 3):900‐11.
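The risk of bias assessment below notes a low events‐per‐variable (EPV) ratio for this model. A minimal worked check of that arithmetic, using only the counts reported above (118 events and at least 15 candidate predictor parameters); the threshold of 10 is a common rule of thumb rather than anything stated in this review:

```python
def events_per_variable(n_events: int, n_candidate_parameters: int) -> float:
    """Events-per-variable: number of events divided by candidate predictor parameters."""
    return n_events / n_candidate_parameters

# Counts from the table above; candidate parameters are reported as ">= 15",
# so the true EPV can only be lower than the value computed here.
epv = events_per_variable(n_events=118, n_candidate_parameters=15)
print(f"EPV <= {epv:.1f}")  # about 7.9, below the conventional threshold of 10
```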
 
Item Authors' judgement Support for judgement
Participants No The exclusion criteria included a second attack in the year of SP onset, but this criterion was not explained. The authors described the data source as a cohort, but the diagnoses and other categorisations probably had to be performed retrospectively; it is therefore unclear whether this was truly a cohort study.
Predictors Yes Although recruitment lasted 14 years and the median time to event/censoring exceeded 11 years, the data were from a single centre and the predictor definitions seem clear. Hence, assessment is considered to have been similar amongst patients. The predictor assessments might have been performed retrospectively, but the definitions seem clear, leaving no room for subjective judgement.
Outcome No The outcome was clearly defined but it is not an operationalised one. Assessors, experience level, and blinding were not explicitly reported, hence the application of this definition might vary greatly.
Analysis No The EPV was low whether assessed in terms of a binary outcome or continuous one. No information on missing data was provided. All the reported measures were evaluated in the full development set. Discrimination and optimism were not addressed.
Overall No At least one domain is at high risk of bias.

Skoog 2019.

Study characteristics
General information Model name
  • Val

  • Ext Val


Primary source
Journal
Data source
  • Val: cohort, primary

  • Ext Val: registry, secondary


Study type
  • Val: validation (internal validation ‐ some participants from the development excluded)

  • Ext Val: external validation, multiple (location, time)

Participants Inclusion criteria
  • All patients with RRMS that fulfilled the Poser criteria


Exclusion criteria
  • Patients with SP occurring before the second distinct attack


Recruitment
  • Val: medical records from the Sahlgrenska Neurology Department and outpatient clinic, the only neurological service in the Gothenburg area, Sweden

  • Ext Val: patients in the Swedish National MS Registry participating from the Uppsala University Neurology Department, Sweden


Age (years)
Mean 33.0 (CDMS onset, i.e. 2nd attack)
Sex (%F)
  • Val: 65.0

  • Ext Val: 76.0


Disease duration (years)
  • Val: median 2

  • Ext Val: not reported


Diagnosis
100% RRMS
Diagnostic criteria
Poser 1983
Treatment
  • Val: 0%

  • Ext Val:

    • At recruitment, 0%

    • During follow‐up, unclear, few patients received first‐generation DMT (IFN‐beta or glatiramer acetate), 99 out of 1762 patient‐years


Disease description
Not reported
Recruitment period
  • Val: not reported

  • Ext Val: up to 2000

Predictors Considered predictors
Not applicable
Number of considered predictors
Not applicable
Timing of predictor measurement
Not applicable
Predictor handling
Not applicable
Outcome Outcome definition
Conversion to progressive MS (Lublin 1996): continuous progression for at least 1 year, without remission, and detectable at time intervals of months or years, determined retrospectively after one year of observation and recorded the probable year of onset retrospectively
Timing of outcome measurement
  • Val: yearly for 25 years starting January 1st after the clinically defining attack; KM estimate of median time to outcome from the 2nd attack (95% CI): 11.5 years (9.2 years to 13.8 years)

  • Ext Val: yearly for 25 years starting January 1st after the clinically defining attack; KM estimate of median time to outcome from the 2nd attack (95% CI): 15.0 years (10.9 years to 19.1 years)

Missing data Number of participants with any missing value
  • Val: ≤ 12, unclear exactly how many participants had any missing value

  • Ext Val: ≤ 27, unclear exactly how many participants had any missing value


Missing data handling
Not reported
Analysis Number of participants (number of events)
  • Val: 144 (100)

  • Ext Val: 145 (54)


Modelling method
Not applicable
Predictor selection method
Not applicable
Hyperparameter tuning
Not applicable
Shrinkage of predictor weights
Not applicable
Performance evaluation dataset
  • Val: development

  • Ext Val: external validation


Performance evaluation method
  • Val: apparent, some development participants excluded

  • Ext Val: not applicable


Calibration estimate
  • Val: calibration plot, O:E table, O:E 0.829

  • Ext Val: calibration plot, O:E table, O:E 0.599


Discrimination estimate
Not reported
Classification estimate
Not reported
Overall performance
Not reported
Risk groups
For calibration measures: periods with predetermined MSPS strata < 0.025, 0.025 to 0.05, 0.05 to 0.075, 0.075 to 0.10, 0.10 to 0.125, > 0.125 (simplified to < 0.05, 0.05 to 0.075, 0.075 to 0.10, > 0.10)
Model  Model presentation
  • Val: (original model) × recalibration ratio (0.829)

  • Ext Val: (original model) × recalibration ratio (0.599) (see the recalibration sketch after this table)


Number of predictors in the model
Not applicable
Predictors in the model
Not applicable
Effect measure estimates
Not applicable
Predictor influence measure
Not applicable
Validation model update or adjustment
Recalibration
Interpretation  Aim of the study
To validate this model with an essentially untreated Swedish cohort
Primary aim
The primary aim of this study is the prediction of individual outcomes.
Model interpretation
Probably exploratory
Suggested improvements
Demonstrating generalisability in non‐Swedish cohorts collected with different methods, considering DMT use
Notes Applicability overall
Low
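The calibration entries above are observed‐to‐expected (O:E) ratios, and the model presentation row describes the updated model as the original model multiplied by that ratio. Below is a minimal sketch of this multiplicative recalibration on made‐up numbers; the arrays are placeholders, not study data, and for a rate‐like score such as the MSPS the multiplication is direct, whereas for probabilities close to 1 it would only be approximate.

```python
import numpy as np

def oe_ratio(observed_events: float, predicted: np.ndarray) -> float:
    """Observed-to-expected ratio: observed events divided by the sum of model predictions."""
    return observed_events / predicted.sum()

# Placeholder predictions from an "original model" and a placeholder observed event count.
predicted = np.array([0.20, 0.35, 0.50, 0.65, 0.70])
observed = 2.0

ratio = oe_ratio(observed, predicted)   # < 1 means the original model over-predicts
recalibrated = predicted * ratio        # "(original model) x recalibration ratio"
print(round(ratio, 3), recalibrated.round(3))
```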
 
Item Authors' judgement Support for judgement
Participants No Val: The exclusion criteria included a second attack in the year of SP onset, but this criterion was not explained. The authors described the data source as a cohort, but the diagnoses and other categorisations probably had to be performed retrospectively; it is therefore unclear whether this was truly a cohort study.
Ext Val: The data source was a registry for which patient data were entered retrospectively. Also, the diagnoses and other categorisations probably needed to be performed retrospectively.
Predictors Yes The data were from a single centre and the predictor definitions seem to be clear. Hence, the assessment is considered to be similar amongst patients. The predictor assessments might have been performed retrospectively, but the definitions seem to be clear, leaving no space for subjective judgement.
Outcome No The outcome was clearly defined but it is not an operationalised one. Assessors, experience level, and blinding were not explicitly reported, hence the application of this definition might vary greatly.
Analysis No Val: Missing values and how they were treated were not clearly discussed. Discrimination was not addressed.
Ext Val: The number of events in the validation was low. Missing data were not clearly discussed. Discrimination was not addressed.
Overall No At least one domain is at high risk of bias.

Sombekke 2010.

Study characteristics
General information Model name
Outcome dichotomous MSSS, predictors clinical + genetics
Primary source
Journal
Data source
Unclear, secondary
Study type
Development
Participants Inclusion criteria
  • Confirmed diagnosis of MS

  • Availability of DNA and clinical assessment of disability

  • (Unclear) unrelated Dutch Caucasian


Exclusion criteria
Not reported
Recruitment
Natural history studies at the MS Centre of the VU University Medical Centre in Amsterdam, Netherlands
Age (years)
Mean 32.4 (onset)
Sex (%F)
63.8
Disease duration (years)
Mean 13.1 (SD 8.3)
Diagnosis
51.2% RRMS, 31.4% SPMS, 17.4% PPMS
Diagnostic criteria
Mixed: Poser 1983, McDonald 2005 (Polman 2005)
Treatment
Not reported
Disease description
EDSS median (IQR): 4.0 (3.5)
Recruitment period
Not reported
Predictors Considered predictors
Gender, onset type, age at onset, SNPs (69)
Number of considered predictors
72
Timing of predictor measurement
At baseline (already available or retrospectively collected)
Predictor handling
Age continuously, SNPs categorised
Outcome Outcome definition
Disability (MSSS): MSSS ≥ 2.5; MSSS denotes the speed of disability accumulation of an individual patient compared with a large patient cohort
Timing of outcome measurement
Not reported
Missing data Number of participants with any missing value
Not reported
Missing data handling
Exclusion
Analysis Number of participants (number of events)
605 (86)
Modelling method
Logistic regression
Predictor selection method
  • For inclusion in the multivariable model, univariable analysis

  • During multivariable modelling, stepwise selection

    • Backward


Hyperparameter tuning
Not applicable
Shrinkage of predictor weights
None
Performance evaluation dataset
Development
Performance evaluation method
Bootstrap (B = 500), unclear if for optimism correction or just confidence intervals
Calibration estimate
Hosmer‐Lemeshow test
Discrimination estimate
c‐Statistic = 0.78 (95% CI 0.75 to 0.84)
Classification estimate
Sensitivity = 0.37, specificity = 0.953, LR+ = 7.9
Overall performance
Nagelkerke's R2 = 0.219
Risk groups
Not reported
Model  Model presentation
Regression coefficients without intercept (see the sketch after this table)
Number of predictors in the model
9 (13 df)
Predictors in the model
Age at onset, male gender, progressive onset type, NOS2 level, PITPNC1 level, IL2 level, CCL5 level, IL1RN level, PNMT level
Effect measure estimates
OR (95% CI): age at onset 1.05 (1.02 to 1.08), male gender 2.02 (1.14 to 3.57), progressive onset type 4.69 (1.32 to 16.63), NOS2 level AG 0.53 (0.32 to 0.89), NOS2 level AA 0.24 (0.09 to 0.67), PITPNC1 level AG 0.45 (0.27 to 0.75), PITPNC1 GG 0.59 (0.18 to 1.95), IL2 level GT 0.39 (0.22 to 0.70), IL2 level TT 0.38 (0.17 to 0.84), CCL5 level CT 2.04 (1.12 to 3.70), CCL5 level TT 1.47 (0.38 to 5.67), IL1RN level CT/TT 0.60 (0.36 to 0.99), PNMT level GG 0.52 (0.29 to 0.92)
Predictor influence measure
Not applicable
Validation model update or adjustment
Not applicable
Interpretation  Aim of the study
To evaluate the additional prognostic value of genetic information of a DNA chip, containing a set of candidate genes, previously correlated to MS (either susceptibility or phenotypes) over available demographics and clinical characteristics, aiming to improve the prediction of the expected disease severity for future patients
Primary aim
The primary aim of this study is, to some extent, the prediction of individual outcomes. The focus is on the prognostic value of genetic data.
Model interpretation
Exploratory
Suggested improvements
Test on patients with longer disease duration, use SNPs assessed during the GWAS era, include MRI parameters, yet‐to‐be‐discovered genes, environmental factors
Notes Applicability overall
Low
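The model above is presented as regression coefficients (odds ratios) without an intercept, as flagged in the model presentation row. The sketch below, using a few of the reported odds ratios and a made‐up intercept, shows the mechanics involved: the odds ratios fix only the relative part of a logistic model, so no absolute predicted probability can be computed until an intercept is supplied.

```python
import math

# A subset of the odds ratios reported above (age at onset is per year; the others are binary).
odds_ratios = {"age_at_onset": 1.05, "male_gender": 2.02, "progressive_onset": 4.69}
coefficients = {name: math.log(orr) for name, orr in odds_ratios.items()}

def predicted_probability(x: dict, intercept: float) -> float:
    """Logistic prediction p = 1 / (1 + exp(-(b0 + sum_i b_i * x_i)))."""
    lp = intercept + sum(coefficients[name] * value for name, value in x.items())
    return 1.0 / (1.0 + math.exp(-lp))

patient = {"age_at_onset": 40, "male_gender": 1, "progressive_onset": 0}
# The study does not report an intercept; -3.0 below is a placeholder, not an estimate.
print(round(predicted_probability(patient, intercept=-3.0), 3))
```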
 
Item Authors' judgement Support for judgement
Participants No The data source on which the prediction model study relied is unclear. Some predictors were collected retrospectively, and the inclusion criteria required the availability of DNA and a clinical assessment of disability.
Predictors Yes Genetic data are not likely to be biased. Clinical data were simple and easy to collect. The predictors are objective measures and could be available at the time of model use.
Outcome Yes Assuming that MSSS is a relatively standard outcome, it accounts for the difference in time from disease onset in patients, and the outcome was collected at a single point in time.
Analysis No The EPV was less than 10. Only discrimination was assessed, and it is unclear how missing information was handled other than through the exclusion criteria addressed in the Participants section. Univariable analyses appear to have been used to select the predictors. It is unclear whether model overfitting and optimism in model performance were accounted for.
Overall No At least one domain is at high risk of bias.

Sormani 2007.

Study characteristics
General information Model name
  • Dev

  • Ext Val


Primary source
Journal
Data source
Randomised trial participants, secondary
Study type
Development + external validation, spectrum
Participants Inclusion criteria
  • Age between 18 years and 50 years

  • Complete clinical and MRI data at baseline

  • Not treated with disease‐modifying agents during the study

  • Diagnosis of MS for at least 1 year

  • RR disease course

  • EDSS score of 0.0 to 5.0

  • At least one documented relapse in the preceding 2 years

  • At least one gadolinium‐enhancing lesion on their screening brain MRI

  • Relapse‐free and steroid‐free in the 30 days prior to inclusion into the study


Exclusion criteria
  • Dev:

    • Prior use of glatiramer acetate, oral myelin, cladribine, and total body irradiation or total lymphoid irradiation

    • Use of immunosuppressive drugs in the 12 months before study entry, or the use of interferons, intravenous immunoglobulins

    • More than 30 consecutive days of chronic steroid treatment, or participation in clinical studies of experimental drugs in the 6 months before study entry

    • Life‐threatening or unstable clinically significant disease, pregnant or lactating

    • Major current gastrointestinal disorders or use of medication that could cause major gastrointestinal disturbances

    • Medical or psychiatric conditions that could affect their ability to give informed consent

    • Sensitivity to gadolinium chelates or an inability to undergo MRI

  • Ext Val: not reported


Recruitment
  • Dev:

    • Placebo arm participants in the CORAL, an RCT run in 158 centres worldwide

    • Argentina, Australia, Austria, Belgium, Canada, Denmark, France, Germany, Hungary, Israel, Italy, Netherlands, New Zealand, Spain, Sweden, Switzerland, UK, USA

  • Ext Val:

    • Placebo arm participants in European/Canadian GA study from 29 centres

    • Europe (undefined), Canada


Age (years)
  • Dev: median 37.0

  • Ext Val: median 34.0


Sex (%F)
Not reported
Disease duration (years)
  • Dev: median 5.9 (range 0.6 to 30)

  • Ext Val: median 3.8 (range 0.5 to 22)


Diagnosis
100% RRMS
Diagnostic criteria
Poser 1983
Treatment
  • Dev:

    • At recruitment, 0%

    • During follow‐up, 0.6% on DMT

  • Ext Val:

    • 0%


Disease description
  • Dev: EDSS median (range): 2.0 (0.0 to 5.0), prior 2‐year number of relapses (range): 2 (1 to 11)

  • Ext Val: EDSS median (range): 2.0 (0.0 to 4.0), prior 2‐year number of relapses (range): 2 (1 to 8)


Recruitment period
  • Dev: 2000 to 2001

  • Ext Val: 1997 to 1998

Predictors Considered predictors
  • Dev: age at onset, disease duration, prior 2‐year relapses, EDSS, Gd‐enhancing lesions, Gd‐enhancing lesion volume, T2‐hyperintense lesion volume, T1‐hypointense lesion volume

  • Ext Val: not applicable


Number of considered predictors
  • Dev: ≥ 12 (unclear transformations)

  • Ext Val: not applicable


Timing of predictor measurement
  • Dev: at study baseline (RCT, entry at least 1 year after disease onset)

  • Ext Val: not applicable


Predictor handling
  • Dev: continuously except testing square‐root and dichotomised transformations in the final model

  • Ext Val: not applicable

Outcome Outcome definition
Relapse: time of first relapse occurrence defined as appearance of one or more new neurological symptoms or the reappearance of one or more previously experienced neurological symptoms; neurological deterioration had to last at least 48 hours and be preceded by a relatively stable or improving neurological state in the prior 30 days; the symptoms had to be accompanied by objective changes in the neurological examination corresponding to an increase of at least 0.5 points on the EDSS, or one grade in the score of 2 or more functional systems or 2 grades in 1 functional system; deterioration associated with fever or infections that can cause transient, secondary impairment of neurological function or change in bowel, bladder, or cognitive function alone was not accepted as a relapse
Timing of outcome measurement
  • Dev: follow‐up median (range): 14 months (0.4 months to 16 months), time to outcome from study entry mean (SD): 47 weeks (0.9 weeks)

  • Ext Val: follow‐up median (range): 9 months (2.6 months to 10 months), time to outcome from study entry mean (SD): 26 weeks (1.4 weeks)

Missing data Number of participants with any missing value
9, not explicitly reported in this report
Missing data handling
Exclusion
Analysis Number of participants (number of events)
  • Dev: 539 (unclear, approximately 270)

  • Ext Val: 117 (not reported)
Modelling method
  • Dev: survival, Cox

  • Ext Val: not applicable


Predictor selection method
  • Dev:

    • For inclusion in the multivariable model, univariable analysis

    • During multivariable modelling, significance

      • P value < 0.01

  • Ext Val: not applicable


Hyperparameter tuning
Not applicable
Shrinkage of predictor weights
  • Dev: none

  • Ext Val: not applicable


Performance evaluation dataset
  • Dev: development

  • Ext Val: external validation


Performance evaluation method
  • Dev: apparent

  • Ext Val: not applicable


Calibration estimate
Not reported
Discrimination estimate
Not reported
Classification estimate
Not reported
Overall performance
Not reported
Risk groups
  • Dev: according to the score distribution low risk (score below the 95th percentile) and high risk (score above the 95th percentile) of relapse occurrence, score at 95th percentile = 1.84

  • Ext Val: according to cutoffs from development set (score = 1.84)

Model  Model presentation
  • Dev: regression model formula with survival probability for 6 months and 1 year

  • Ext Val: not applicable


Number of predictors in the model
  • Dev: 2

  • Ext Val: not applicable


Predictors in the model
  • Dev: previous 2 years relapses, number of enhancing lesions

  • Ext Val: not applicable


Effect measure estimates
  • Dev: log HR: square root of previous 2‐year relapses 0.64, square root of number of enhancing lesions 0.26; baseline survival: 0.92 at 6 months, 0.86 at 1 year (see the scoring sketch after this table)

  • Ext Val: not applicable


Predictor influence measure
Not applicable
Validation model update or adjustment
  • Dev: not applicable

  • Ext Val: none

Interpretation  Aim of the study
To generate and validate a composite (clinical and MRI‐based) score able to identify individual patients with relapsing‐remitting multiple sclerosis (RRMS) with a high risk of experiencing relapses in the short term
Primary aim
The primary aim of this study is the prediction of individual outcomes.
Model interpretation
Probably exploratory
Suggested improvements
A validation in natural history cohorts (but this is not feasible because current patients are treated)
Notes Applicability overall
Low
Auxiliary references
Comi G, Filippi M, Wolinsky JS. European/Canadian multicentre, double‐blind, randomized, placebo‐controlled study of the effects of glatiramer acetate on magnetic resonance imaging‐‐measured disease activity and burden in patients with relapsing multiple sclerosis. European/Canadian glatiramer acetate study group. Ann Neurol 2001;49(3):290‐7.
Filippi M, Wolinsky JS, Comi G. Effects of oral glatiramer acetate on clinical and MRI‐monitored disease activity in patients with relapsing multiple sclerosis: a multicentre, double‐blind, randomised, placebo‐controlled study. Lancet Neurol 2006;5(3):213‐20.
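The development model above combines two predictors through square‐root transformations, with baseline survival reported at 6 months (0.92) and 1 year (0.86) and a high‐risk cutoff at a score of 1.84. Below is a minimal sketch of how such a Cox‐based score is typically turned into a relapse‐free probability, assuming the reported baseline survival corresponds to a linear predictor of zero (the centering used in the original model is not stated in this table).

```python
import math

def sormani_style_score(relapses_2yr: int, gd_lesions: int) -> float:
    """Linear predictor: 0.64*sqrt(prior 2-year relapses) + 0.26*sqrt(Gd-enhancing lesions)."""
    return 0.64 * math.sqrt(relapses_2yr) + 0.26 * math.sqrt(gd_lesions)

def relapse_free_probability(lp: float, baseline_survival: float) -> float:
    """Standard Cox relation S(t | x) = S0(t) ** exp(lp), assuming S0 refers to lp = 0."""
    return baseline_survival ** math.exp(lp)

lp = sormani_style_score(relapses_2yr=2, gd_lesions=3)
print(round(lp, 2))                                  # about 1.36; 'high risk' only if > 1.84
print(round(relapse_free_probability(lp, 0.92), 2))  # about 0.72 relapse-free at 6 months
print(round(relapse_free_probability(lp, 0.86), 2))  # about 0.56 relapse-free at 1 year
```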
 
Item Authors' judgement Support for judgement
Participants Yes Although participants with missing predictor measurements were excluded from the current study, they probably comprised only a small percentage (< 5%) of the eligible population from the original RCT cohort.
Predictors Yes Dev: The predictors appear to be collected at baseline. The predictors were not explicitly named in the text but table 2 consists of predictors entering the univariable and multivariable analysis. The data source is an RCT, so assessment is assumed to be similar across patients.
Val: The data from this trial were collected using different MRI machines of various strengths, but contrast‐enhancing lesions should be robust to the use of different machines.
Outcome Yes The details of the outcome definition were not explicitly reported in the prediction model study but can be found in the RCT. We expect the outcome to be standardised and determined appropriately due to the data source. The outcome may or may not be determined with the knowledge of predictors, but the outcome is considered an objective one.
Analysis No Dev: Variable selection began with univariable analysis. No discrimination or calibration measures were reported. Although external validation was done, there was no indication of model shrinkage or other attempts at addressing overfitting and optimism.
Val: The number of events was not reported but was expected to be at most 56.5. No relevant performance measures were reported.
Overall No At least one domain is at high risk of bias.

Spelman 2017.

Study characteristics
General information Model name
Not applicable
Primary source
Journal
Data source
Cohort, primary
Study type
Development
Participants Inclusion criteria
  • MS patients with CIS

  • Onset less than 12 months from enrolment

  • Minimum data collection at each visit (EDSS, KFSS, relapse onset date, glucocorticoid therapy for relapse, initiation and discontinuation dates for DMTs collected at each visit)

  • Annual follow‐up

  • At least 1 EDSS recorded within 12 months of onset but not included within 30 days of onset

  • First brain MRI scan classification using Barkhof–Tintore criteria for lesion dissemination in space within 12 months of onset


Exclusion criteria
  • PPMS


Recruitment
Fifty MS clinics participating in MSBase Incident Study (MSBASIS), a substudy within MSBase registry
Age (years)
Median 31.6 (at MS onset)
Sex (%F)
70.5
Disease duration (years)
Up to 1 year
Diagnosis
100% CIS
Diagnostic criteria
Poser 1983
Treatment
  • At recruitment, not reported

  • During follow‐up, 11.8% IM‐IFNβ‐1a, 9.3% SC‐IFNβ‐1a, 5.1% IFNβ‐1b, 3.8% glatiramer acetate (adds up to 30% not 27.6%)


Disease description
EDSS median (IQR): 2 (1 to 2.5)
Recruitment period
From 2004 onward
Predictors Considered predictors
Sex, age at onset, EDSS, first symptom location (categorical with optic pathways as reference, supratentorial, brainstem or spinal cord), T1 gadolinium lesions (binary), T2 hyperintense lesions (3 levels), infratentorial lesions (binary), juxtacortical (binary), periventricular (3 levels), number of spinal T1 gadolinium lesions (binary), number of spinal T2 lesions (binary), oligoclonal bands (binary), (unclear adjustment for country)
Number of considered predictors
≥ 16 (unclear how many interactions tested)
Timing of predictor measurement
At disease onset (CIS) (up to 12 months after disease onset)
Predictor handling
  • Age and EDSS continuously, predictors based on number of lesions dichotomised or categorised and also continuously, (for unclear predictors) linearity tested by incorporating quadratic transformations into the model

  • At least one interaction considered

Outcome Outcome definition
Conversion to definite MS (Poser 1983): time to first relapse following CIS, i.e. CDMS, defined as examination evidence of a symptomatic second neurological episode attributable to demyelination of more than 24 hours duration and more than 4 weeks from the initial attack; follow‐up time was defined as the time that lapsed between the date of CIS onset (baseline) and either the date of first post‐CIS relapse or, where no subsequent post‐CIS relapse was observed, the date of the last recorded clinic visit
Timing of outcome measurement
Follow‐up median (IQR): 1.92 years (0.90 years to 3.71 years)
Missing data Number of participants with any missing value
≤ 1017; unclear how many of the exclusions were due to missing data
Missing data handling
Exclusion
Analysis Number of participants (number of events)
3296 (1953)
Modelling method
Survival, Cox
Predictor selection method
  • For inclusion in the multivariable model, all candidate predictors

  • During multivariable modelling, significance

    • P value < 0.05


Hyperparameter tuning
Not applicable
Shrinkage of predictor weights
Unclear
Performance evaluation dataset
Development
Performance evaluation method
Bootstrap (B = 1500) (see the optimism‐correction sketch after this table)
Calibration estimate
Calibration plot
Discrimination estimate
  • At 6 months, c‐statistic = 0.76

  • At 1 year, c‐statistic = 0.81

  • At 2 years, c‐statistic = 0.81

  • At 3 years, c‐statistic = 0.82

  • At 4 years, c‐statistic = 0.83

  • At 5 years, c‐statistic = 0.83


Classification estimate
Not reported
Overall performance
Not reported
Risk groups
Not reported
Model  Model presentation
  • Nomogram for 1‐year outcomes

  • Nomograms for 6‐month, 2, 3, 4, and 5‐year outcomes


Number of predictors in the model
7 (11 df)
Predictors in the model
Sex, age, EDSS, first symptom location, T2 Infratentorial lesions, T2 periventricular lesions, OCB in CSF
Effect measure estimates
Not reported
Predictor influence measure
Not applicable
Validation model update or adjustment
Not applicable
Interpretation  Aim of the study
To examine determinants of second attack and validate a prognostic nomogram for individualised risk assessment of clinical conversion
Primary aim
The primary aim of this study is the prediction of individual outcomes.
Model interpretation
Exploratory
Suggested improvements
External validation with a larger sample, more patients with 0 T2 lesions
Notes Applicability overall
Unclear
Applicability overall rationale
It is unclear whether some patients had already experienced the outcome at the time of predictor collection.
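Performance above was evaluated with bootstrap resampling (B = 1500), and the assessment below notes that it is unclear whether optimism was corrected. The sketch below illustrates the usual bootstrap optimism correction for a discrimination measure; it uses simulated data, an ordinary logistic model, and the standard c‐statistic as a stand‐in for the Cox model and time‐dependent c‐statistics reported in the study, so nothing in it reproduces the study's analysis.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))                      # simulated predictors (placeholder data)
y = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))    # outcome driven by the first predictor

model = LogisticRegression().fit(X, y)
apparent = roc_auc_score(y, model.predict_proba(X)[:, 1])

# Harrell-style optimism: refit on each bootstrap sample, compare its apparent
# performance with its performance on the original data, and average the difference.
optimism = []
for _ in range(200):                               # 200 resamples for speed; the study used 1500
    idx = rng.integers(0, len(y), len(y))
    Xb, yb = X[idx], y[idx]
    if yb.min() == yb.max():                       # skip degenerate resamples with one class only
        continue
    mb = LogisticRegression().fit(Xb, yb)
    boot_apparent = roc_auc_score(yb, mb.predict_proba(Xb)[:, 1])
    boot_test = roc_auc_score(y, mb.predict_proba(X)[:, 1])
    optimism.append(boot_apparent - boot_test)

corrected = apparent - np.mean(optimism)
print(round(apparent, 3), round(corrected, 3))     # the corrected value is the honest estimate
```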
 
Item Authors' judgement Support for judgement
Participants No Although the data source is appropriate, the eligibility criteria required both baseline predictor measurement and regular follow‐up, which may introduce risk of bias.
Predictors No Due to the prospective collection of the data, predictors were probably assessed without knowledge of outcomes, and because only baseline variables were used, all predictors should be available at the intended time of prediction. No information is reported on whether predictors were defined and assessed in a similar way for all patients. In particular, the imaging predictors from the multiple centres in many countries participating in the MSBase registry are likely to introduce risk of bias.
Outcome Unclear We consider relapses to be a relatively objective outcome; therefore, we believe that assessment with knowledge of predictor information does not increase the risk of bias. However, the predictors were collected within 12 months of onset, and according to the survival curves, a substantial proportion of patients (between 0.2 and 0.7) may already have had the event at the time of predictor collection.
Analysis No Some continuous predictors were categorised with only 2 to 3 levels. Over 1000 enrolled patients were excluded from the study without any description of the reasons or of how they differed from those included; thus, it is unclear whether complete case analysis was appropriate. The authors mentioned adjusting for country, but the methods were not described, so it is unclear whether hierarchical models, a categorical predictor, or some other approach was used. The method of arriving at the weights in the nomogram, and whether any optimism correction was done, are unclear.
Overall No At least one domain is at high risk of bias.

Szilasiová 2020.

Study characteristics
General information Model name
Not applicable
Primary source
Journal
Data source
Cohort, secondary
Study type
Development
Participants Inclusion criteria
  • Diagnosis of MS based on the revised 2001 McDonald criteria (McDonald 2001)

  • Being older than 18 years

  • The ability to give written informed consent


Exclusion criteria
  • Major hypacusis or deafness

  • Relapse or corticosteroid use within 30 days preceding the study assessments


Recruitment
Department of Neurology of Louis Pasteur University Hospital in Kosice, Slovak Republic
Age (years)
Unclear
Sex (%F)
64.7
Disease duration (years)
Mean 6.7 (range 0.5 to 30)
Diagnosis
63.5% RRMS, 29.4% SPMS, 7.1% PPMS
Diagnostic criteria
McDonald 2001
Treatment
Reported for the original cohort of 110 patients, unclear timing: 64.7% interferon‐beta and 35.3% some DMT
Disease description
EDSS mean (SD, range): 3.03 (1.5, 1.0 to 7.0)
Recruitment period
2003 to 2018
Predictors Considered predictors
Age, sex, disease duration, EDSS, MS form (SP vs R or P), P300 latency, P300 amplitude, lesion load (# T2 lesions), education (primary, secondary, university)
Number of considered predictors
11
Timing of predictor measurement
At study baseline (cohort entry)
Predictor handling
Continuously
Outcome Outcome definition
Disability (EDSS): clinically worsened defined as an EDSS ≥ 5
Timing of outcome measurement
15 years
Missing data Number of participants with any missing value
25
Missing data handling
Complete case
Analysis Number of participants (number of events)
85 (not reported)
Modelling method
Logistic regression
Predictor selection method
  • For inclusion in the multivariable model, all candidate predictors

  • During multivariable modelling, stepwise selection

    • Backward


Hyperparameter tuning
Not applicable
Shrinkage of predictor weights
None
Performance evaluation dataset
Development
Performance evaluation method
Apparent
Calibration estimate
Not reported
Discrimination estimate
Unclear due to mismatch between ROC curve and reported statistics: 0.94 (95% CI 0.889 to 0.984)
Classification estimate
Unclear because these values do not correspond to a point on the plot; sensitivity = 0.94, specificity = 0.89
Overall performance
Not reported
Risk groups
Not reported
Model  Model presentation
Full regression model
Number of predictors in the model
6 (7 df)
Predictors in the model
Sex, age, MS form, EDSS, MS duration, P300 latency (ms)
Effect measure estimates
OR (95% CI): sex: 0.17 (0.02 to 1.295), age: 0.87 (0.74 to 1.040), RRMS: 3,156,828,983.597 (0.000 to NA), PMS: 751,474,054.21 (0.000 to NA), EDSS: 3.06 (1.028 to 9.139), MS duration: 1.21 (1.007 to 1.451), P300 latency (ms): 1.06 (1.008 to 1.110), constant: 0.0 (NA to NA) (see the note on these extreme estimates after this table)
Predictor influence measure
Not applicable
Validation model update or adjustment
Not applicable
Interpretation  Aim of the study
To determine whether ERPs (event‐related potentials) have prognostic significance for a patient’s future disability
Primary aim
The primary aim of this study is not the prediction of individual outcomes. The focus is on the usefulness of ERPs.
Model interpretation
Probably exploratory
Suggested improvements
Not reported
Notes Applicability overall
High
Applicability overall rationale
Although this study contained a model and its assessment, the main aim was not to create a model for prediction of individual outcomes but was rather to identify predictors. Additionally, this study included participants who had already experienced the outcome at baseline.
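Some odds ratios above are implausibly large (for example 3,156,828,983.597 with a confidence interval of 0.000 to NA), a pattern that is often produced by (quasi‐)complete separation: when a predictor splits the outcome classes perfectly, the likelihood keeps improving as its coefficient grows, so the maximum likelihood estimate has no finite optimum and software reports an arbitrarily huge odds ratio with an unusable confidence interval. The toy example below, unrelated to the study data, simply demonstrates that behaviour.

```python
import numpy as np

# Toy data with complete separation: x = 1 always has the outcome, x = 0 never does.
x = np.array([0, 0, 0, 0, 1, 1, 1, 1])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

def log_likelihood(beta0: float, beta1: float) -> float:
    """Bernoulli log-likelihood of a logistic model with one binary predictor."""
    p = 1.0 / (1.0 + np.exp(-(beta0 + beta1 * x)))
    return float(np.sum(y * np.log(p) + (1 - y) * np.log(1 - p)))

for beta1 in (1, 5, 10, 20):
    # Under separation the log-likelihood keeps increasing as beta1 grows without bound.
    print(beta1, round(log_likelihood(-beta1 / 2, beta1), 4))
```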
 
Item Authors' judgement Support for judgement
Participants No The data source is a cohort study. However, according to Table 1, the EDSS range at study entry was 1.0 to 7.0, which means participants who had already reached the outcome at entry were included in the analysis.
Predictors Yes This was a single‐centre study with well described procedures for electrophysiological predictor collection. The other predictors were standard and/or easy to assess. Predictors were assessed at study entry.
Outcome Yes The outcome was based on an EDSS landmark and was assessed at 15‐year follow‐up. Predictor information was probably known, but we consider EDSS to be a robust outcome measure.
Analysis No The sample size was too low and the number of events was not reported. Participants lost to follow‐up were excluded from the analysis instead of being accounted for in time‐to‐event analysis. Calibration was not assessed. Shrinkage was not applied and only apparent performance measures were reported.
Overall No At least one domain is at high risk of bias.

Tacchella 2018.

Study characteristics
General information Model name
  • 180 days

  • 360 days

  • 720 days


Primary source
Journal
Data source
Routine care, secondary
Study type
Development
Participants Inclusion criteria
  • RRMS at the time of the visit(s) included in the database, with transition to the SP phase at some time point


Exclusion criteria
Not reported
Recruitment
Outpatients of the MS service of Sant'Andrea hospital in Rome, Italy
Age (years)
Not reported
Sex (%F)
Not reported
Disease duration (years)
Not reported
Diagnosis
100% RRMS
Diagnostic criteria
McDonald 2017 (Thompson 2018b)
Treatment
Unclear timing and distribution, 89.3% on DMTs, 43% on first‐line treatments, 57% on second‐line treatments
Disease description
Not reported
Recruitment period
Not reported
Predictors Considered predictors
Age at onset, cognitive impairment, age at visit, visual impairment, routine visit, visual field deficits/scotoma, suspected relapse, diplopia (oculomotor nerve palsy), ambulation index, nystagmus, 9HPT (right), trigeminal nerve impairment, 9HPT (left), hemifacial spasm, PASAT, dysarthria, timed 25‐foot walk, dysphagia, impairment in daily living activities, facial nerve palsy, upper‐limb motor deficit, lower‐limb motor deficit, upper limb ataxia, lower limb ataxia, dysaesthesia, hypoesthesia, paraesthesia, Lhermitte's sign, urinary dysfunction, fatigue, bowel dysfunction, mood disorders, sexual dysfunction, tremor, ataxic gait, headache, paretic gait, spastic gait, EDSS, pyramidal functions, cerebellar functions, brainstem functions, sensory functions, bowel and bladder function, visual function, cerebral (or mental) functions, ambulation score
Number of considered predictors
46
Timing of predictor measurement
At visit of interest
Predictor handling
Continuously
Outcome Outcome definition
Conversion to progressive MS: SP stage defined as a history of gradual worsening following the initial RR course determined by objective measure of change of disability (EDSS score) independent of relapses over a period of at least 6 or 12 months
Timing of outcome measurement
  • 180 days: at 180 days after visit of interest

  • 360 days: at 360 days after visit of interest

  • 720 days: at 720 days after visit of interest

Missing data Number of participants with any missing value
Not reported
Missing data handling
Not reported
Analysis Number of participants (number of events)
  • 180 days: 527 (65); unit of analysis is visits, from 84 participants

  • 360 days: 527 (125); unit of analysis is visits, from 84 participants

  • 720 days: 527 (211); unit of analysis is visits, from 84 participants


Modelling method
Random forest
Predictor selection method
  • For inclusion in the multivariable model, all candidate predictors

  • During multivariable modelling, full model approach


Hyperparameter tuning
Default parameters of SciKit library
Shrinkage of predictor weights
Modelling method
Performance evaluation dataset
Development
Performance evaluation method
Cross‐validation, (a form of) LOOCV (see the grouped cross‐validation sketch after this table)
Calibration estimate
Not reported
Discrimination estimate
c‐Statistic:
  • 180 days: 0.71 (95% CI 0.66 to 0.76)

  • 360 days: 0.67 (95% CI 0.62 to 0.71)

  • 720 days: 0.68 (95% CI 0.64 to 0.72)


Classification estimate
Not reported
Overall performance
Not reported
Risk groups
Not reported
Model  Model presentation
Not reported
Number of predictors in the model
46
Predictors in the model
Age at onset, cognitive impairment, age at visit, visual impairment, routine visit, visual field deficits/scotoma, suspected relapse, diplopia (oculomotor nerve palsy), ambulation index, nystagmus, 9HPT (right), trigeminal nerve impairment, 9HPT (left), hemifacial spasm, PASAT, dysarthria, timed 25‐foot walk, dysphagia, impairment in daily living activities, facial nerve palsy, upper‐limb motor deficit, lower‐limb motor deficit, upper limb ataxia, lower limb ataxia, dysaesthesia, hypoesthesia, paraesthesia, Lhermitte's sign, urinary dysfunction, fatigue, bowel dysfunction, mood disorders, sexual dysfunction, tremor, ataxic gait, headache, paretic gait, spastic gait, EDSS, pyramidal functions, cerebellar functions, brainstem functions, sensory functions, bowel and bladder function, visual function, cerebral (or mental) functions, ambulation score
Effect measure estimates
Not reported
Predictor influence measure
Not reported
Validation model update or adjustment
Not applicable
Interpretation  Aim of the study
To obtain predictions on the probability that MS patients in the RR phase will convert to a SP form within a certain time frame
Primary aim
The primary aim of this study is, to some extent, the prediction of individual outcomes. The focus is on collective intelligence rather than individual prediction.
Model interpretation
Exploratory
Suggested improvements
(For hybrid model) to investigate the best ways to combine predictions of different agents, to recruit more expert opinions
Notes Applicability overall
Unclear
Applicability overall rationale
Due to the lack of reporting on the final model or tool, it is unclear whether this study aimed to develop a model/tool for the prediction of individual MS outcomes.
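The analysis above pools 527 visits from 84 participants and evaluates performance with a form of leave‐one‐out cross‐validation. One way to keep all visits of a participant on the same side of every split, and so avoid leakage between training and test folds, is grouped cross‐validation. The sketch below uses scikit‐learn (which the hyperparameter row suggests was the library used) with LeaveOneGroupOut and a random forest on simulated data; the data, feature count, and forest settings are placeholders, not the study's.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import LeaveOneGroupOut, cross_val_predict

rng = np.random.default_rng(1)
n_patients, visits_per_patient = 30, 6
groups = np.repeat(np.arange(n_patients), visits_per_patient)   # patient ID for every visit
X = rng.normal(size=(len(groups), 10))                           # placeholder visit-level features
y = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))                  # placeholder binary outcome

# Leave-one-patient-out: each fold holds out all visits of a single patient.
pred = cross_val_predict(
    RandomForestClassifier(n_estimators=200, random_state=0),
    X, y, cv=LeaveOneGroupOut(), groups=groups, method="predict_proba",
)[:, 1]
print(round(roc_auc_score(y, pred), 3))
```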
 
Item Authors' judgement Support for judgement
Participants No Routine care data were used with no reported inclusion/exclusion criteria other than diagnostic subtype.
Predictors Yes The study was conducted on data from a single centre, and data were collected according to international standards.
Outcome Yes The outcome was defined based on gradual increase in EDSS.
Analysis No 180 days: Calibration was not assessed. Parameter and model tuning was not described. The number of events was low. The final model is unclear.
360 days and 720 days: Calibration was not assessed. Parameter and model tuning was not described. The number of events was low. There is no indication that the model was fit to the entire dataset to produce a final model.
Overall No At least one domain is at high risk of bias.

Tommasin 2021.

Study characteristics
General information Model name
Radiological
Primary source
Journal
Data source
Unclear, secondary
Study type
Development
Participants Inclusion criteria
  • Diagnosis of MS according to McDonald's criteria (2010 (Polman 2011), 2017 (Thompson 2018b))

  • Between 18 years and 70 years of age

  • Clinical assessment and MRI examination not more than 1 month apart

  • Clinical follow‐up available after a minimum of 2 years from MRI examination


Exclusion criteria
Not reported
Recruitment
The Human Neuroscience Department of Sapienza University, the MS centre of the Federico II University, Italy
Age (years)
Mean 39.7
Sex (%F)
63.8
Disease duration (years)
Mean 9.9 (SD 8.06)
Diagnosis
74.8% RRMS, 25.2% PMS
Diagnostic criteria
Mixed: McDonald 2010 (Polman 2011), McDonald 2017 (Thompson 2018b)
Treatment
Unclear timing, 32.5% first line, 39.9% 2nd line, 27.6% none
Disease description
EDSS median (range): 3.0 (0.0 to 7.5)
Recruitment period
2003 to 2018
Predictors Considered predictors
Clinical: disease duration, age, sex, disease phenotype, EDSS at baseline, therapy, time‐to‐follow‐up; radiological: mean diffusivity of normal appearing WM, GM volume, WM volume, T2 lesion load, cerebellar volume, thalamic volume, fractional anisotropy of normal appearing WM, site, random feature
Number of considered predictors
16
Timing of predictor measurement
At assessment (not defined), at follow‐up
Predictor handling
Continuously
Outcome Outcome definition
Disability (EDSS): disability progression defined as a minimum increase in EDSS since baseline of 1.5 (baseline EDSS 0), 1.0 (baseline EDSS ≤ 5.5), or 0.5 (baseline EDSS > 5.5)
Timing of outcome measurement
Follow‐up mean (SD, range): 3.93 years (0.95 years, 2 years to 6 years)
Missing data Number of participants with any missing value
Not reported
Missing data handling
Not reported
Analysis Number of participants (number of events)
163 (58)
Modelling method
Random forest
Predictor selection method
  • For inclusion in the multivariable model, all candidate predictors

  • During multivariable modelling, multiple models


Hyperparameter tuning
Not reported
Shrinkage of predictor weights
Modelling method
Performance evaluation dataset
Development
Performance evaluation method
Cross‐validation, 1000 random splits (only those with accuracy difference < 0.02 between training and validation considered)
Calibration estimate
Not reported
Discrimination estimate
c‐Statistic = 0.92
Classification estimate
Accuracy = 0.92, sensitivity = 0.92, specificity = 0.91
Overall performance
Not reported
Risk groups
Not reported
Model  Model presentation
List of predictors (model selected)
Number of predictors in the model
4
Predictors in the model
T2 lesion load, cerebellar volume, thalamic volume, fractional anisotropy of normal appearing WM
Effect measure estimates
Not reported
Predictor influence measure
Feature importance (percentage of classifiers in which the predictor was more important than a random feature) (see the random‐probe sketch after this table)
Validation model update or adjustment
Not applicable
Interpretation  Aim of the study
To evaluate the accuracy of a data‐driven approach, such as machine learning classification, in predicting disability progression in MS
Primary aim
The primary aim of this study is, to some extent, the prediction of individual outcomes. The focus is on imaging and machine learning.
Model interpretation
Exploratory
Suggested improvements
Prospective studies to evaluate other aspects of brain involvement, as well as other CNS structures (e.g. spinal cord) using additional techniques (e.g. fMRI, MTR, qMRI)
Notes Applicability overall
Unclear
Applicability overall rationale
Due to the lack of reporting on the final model or tool, it is unclear whether this study aimed to develop a model/tool for the prediction of individual MS outcomes.
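Predictor influence above is summarised as the percentage of classifiers in which a predictor was more important than a deliberately added random feature. Below is a minimal sketch of that random‐probe idea on simulated data; the feature names, data, number of refits, and forest settings are placeholders, not the study's.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(2)
n = 200
informative = rng.normal(size=(n, 2))                 # two features related to the outcome
noise = rng.normal(size=(n, 2))                       # two unrelated features
random_probe = rng.normal(size=(n, 1))                # the reference "random feature"
X = np.hstack([informative, noise, random_probe])
y = (informative[:, 0] + 0.5 * informative[:, 1] + rng.normal(scale=0.5, size=n) > 0).astype(int)

names = ["informative1", "informative2", "noise1", "noise2"]
beats_probe = np.zeros(len(names))
n_fits = 50
for seed in range(n_fits):                            # repeated fits on bootstrap-style resamples
    idx = rng.integers(0, n, n)
    rf = RandomForestClassifier(n_estimators=100, random_state=seed).fit(X[idx], y[idx])
    imp = rf.feature_importances_
    beats_probe += imp[:-1] > imp[-1]                 # did each predictor outrank the probe?

for name, count in zip(names, beats_probe):
    print(name, f"{100 * count / n_fits:.0f}%")       # informative features should be near 100%
```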
 
Item Authors' judgement Support for judgement
Participants No The data source is unclear. Participants were included based on availability of follow‐up data at least 2 years later.
Predictors Unclear The final model contains radiological predictors that were assessed by trained experts at 2 centres at study entry. However, it is unclear if follow‐up time was included in the final model as a predictor.
Outcome No The timing of the outcome assessment was any time between 2 years and 6 years, making assessment different across patients.
Analysis No The sample size was small. No information on missing data was reported. It was unclear if the differing follow‐up time among the patients was appropriately accounted for. Only discrimination was assessed. It was not clear that the methods used optimally accounted for overfitting and optimism. A final model was not presented.
Overall No At least one domain is at high risk of bias.

Tousignant 2019.

Study characteristics
General information Model name
3D CNN + lesion masks
Primary source
Conference proceeding
Data source
Randomised trial participants, secondary
Study type
Development
Participants Inclusion criteria
  • RRMS

  • Placebo arm of trial


Exclusion criteria
  • Participants not completing study


Recruitment
Participants in 2 large proprietary, multi‐scanner, multi‐centre clinical trials (names not reported)
Age (years)
Not reported
Sex (%F)
Not reported
Disease duration (years)
Not reported
Diagnosis
100% RRMS
Diagnostic criteria
Not reported
Treatment
0%
Disease description
Not reported
Recruitment period
Not reported
Predictors Considered predictors
MRI channels: volumes from T1‐weighted pre‐contrast (T1c), T1‐weighted post‐contrast (T1p), T2‐weighted (T2w), proton density‐weighted (PDw), fluid‐attenuated inversion recovery (FLAIR); T2‐weighted lesion masks; gadolinium‐enhanced lesion masks
Number of considered predictors
Non‐tabular data
Timing of predictor measurement
At imaging
Predictor handling
Continuously
Outcome Outcome definition
Disability (EDSS): increase in EDSS score within 1 year and sustained for ≥ 12 weeks (baseline EDSS of 0: increase of ≥ 1.5; baseline EDSS of 0.5 to 5.5: increase of ≥ 1; baseline EDSS of ≥ 6: increase of ≥ 0.5)
Timing of outcome measurement
1 year
Missing data Number of participants with any missing value
Not reported
Missing data handling
Exclusion
Analysis Number of participants (number of events)
1083 (103); unit of analysis is probably observations, from 465 participants
Modelling method
3D convolutional neural network (see the architecture sketch after this table)
Predictor selection method
  • For inclusion in the multivariable model, all candidate predictors

  • During multivariable modelling, multiple models


Hyperparameter tuning
Unclear, tuning parameters and cross‐validation mentioned, but not tuning details
Shrinkage of predictor weights
Modelling method
Performance evaluation dataset
Development
Performance evaluation method
Cross‐validation, 4‐fold (75% training, 15% validation, 10% test)
Calibration estimate
Not reported
Discrimination estimate
c‐Statistic = 0.701 (SD 0.027)
Classification estimate
Not reported
Overall performance
Not reported
Risk groups
Not reported
Model  Model presentation
List of predictors (no selection)
Number of predictors in the model
Unstructured data
Predictors in the model
MRI channels: volumes from T1‐weighted pre‐contrast (T1c), T1‐weighted post‐contrast (T1p), T2‐weighted (T2w), proton density‐weighted (PDw), fluid‐attenuated inversion recovery (FLAIR); T2‐weighted lesion masks; gadolinium‐enhanced lesion masks
Effect measure estimates
Not reported
Predictor influence measure
Not reported
Validation model update or adjustment
Not applicable
Interpretation  Aim of the study
To present the first automatic end‐to‐end deep learning framework for the prediction of future patient disability progression (1 year from baseline) based on multi‐modal brain magnetic resonance images (MRI) of patients with multiple sclerosis (MS)
Primary aim
The primary aim of this study is the prediction of individual outcomes.
Model interpretation
Probably confirmatory
Suggested improvements
Alternative ways of quantifying uncertainty, adapting architecture to leverage longitudinal clinical information (e.g. age, disability stage)
Notes Applicability overall
High
Applicability overall rationale
The predictors used were imaging features and no other predictor domain was considered for use in the model.
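The model above is a 3D convolutional neural network taking seven image channels (five MRI contrasts plus two lesion masks). The sketch below shows a minimal 3D CNN of that input shape in PyTorch; the framework, layer sizes, and pooling choices are our assumptions for illustration and are not the architecture reported in the study.

```python
import torch
from torch import nn

class Tiny3DCNN(nn.Module):
    """Minimal 3D CNN: 7 input channels (T1c, T1p, T2w, PDw, FLAIR, T2 mask, Gd mask) -> 1 logit."""

    def __init__(self, in_channels: int = 7):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(in_channels, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d(2),
            nn.Conv3d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),                    # collapse the spatial dimensions
        )
        self.classifier = nn.Linear(32, 1)              # logit for 1-year EDSS worsening

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x).flatten(1))

# Placeholder batch: 2 subjects, 7 channels, 32^3 voxels (real volumes would be larger).
logits = Tiny3DCNN()(torch.randn(2, 7, 32, 32, 32))
print(torch.sigmoid(logits).shape)                      # torch.Size([2, 1])
```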
 
Item Authors' judgement Support for judgement
Participants Unclear The data source comprised unspecified randomised clinical trials, but participants were excluded for incomplete follow‐up without any specification of the reasons or the number excluded.
Predictors Yes The predictors were collected during a clinical trial, so we expect them to be defined and assessed homogeneously. Several scanners across multiple sites were used, but standardisation across sites is mentioned. Expert raters were used in semi‐automated procedures.
Outcome Yes We consider standard EDSS outcomes rather objective. The outcome was assessed within clinical trials.
Analysis No The sample size was probably small (the highest possible number of events was 103, with 7 inputs and a very complex model). Discrimination was addressed but not calibration. It was unclear whether complexities in the data were appropriately addressed, as the analysis appeared to be at the visit level. It was unclear whether optimism in performance was accounted for, because the outcome assessment periods might overlap. No model was provided for future use.
Overall No At least one domain is at high risk of bias.

Vasconcelos 2020.

Study characteristics
General information Model name
  • Dev

  • Ext Val


Primary source
Journal
Data source
Unclear
Study type
Development + external validation, time
Participants Inclusion criteria
  • Definitive diagnosis of RRMS based on the Poser criteria for patients seen up to 2001 and the 2001 McDonald criteria for patients seen from 2002 onwards

  • Disease duration of at least 2 years

  • Available data on disease progression, with complete data provided in an equal interval of time


Exclusion criteria
  • Patients who had incomplete data on the longitudinal evolution of the disease


Recruitment
MS Centre of the Hospital da Lagoa in Rio de Janeiro, Brazil
Age (years)
  • Dev: mean 28.7 (onset)

  • Ext Val: mean 28.5 (onset)


Sex (%F)
  • Dev: 76.0

  • Ext Val: 78.5


Disease duration (years)
  • Dev: mean 16.0 (SD 9.42)

  • Ext Val: mean 13.22 (SD 9.72)


Diagnosis
100% RRMS
Diagnostic criteria
Mixed: Poser 1983, McDonald 2001
Treatment
  • Dev: unclear timing, 58% treated before EDSS 3

  • Ext Val: unclear timing, 77% treated before EDSS 3


Disease description
Patients with more than one relapse in the first year of disease: 74%
Recruitment period
1993 to 2017
Predictors Considered predictors
  • Dev: gender, > 1 relapse in the first year of the disease, pyramidal and cerebellar impairment at onset of the disease, treatment before reaching EDSS 3, < 30 years of age at onset, African descent, < 2 years between the first and second relapses, recovery after first relapse

  • Ext Val: not applicable


Number of considered predictors
  • Dev: 8

  • Ext Val: not applicable


Timing of predictor measurement
  • Dev: at multiple visits (unclear if CIS or RR onset) to at least 2 years post‐onset

  • Ext Val: not applicable


Predictor handling
  • Dev: all dichotomised

  • Ext Val: not applicable

Outcome Outcome definition
Conversion to progressive MS: time elapsed until the year of confirmed progressive and sustained worsening, lasting at least 6 months and not associated with an acute relapse, defined as an irreversible increase of at least 1.0 point in the EDSS when its value was ≤ 5.5, or 0.5 point when it was > 5.5 (independent of relapses and corticosteroid treatment)
Timing of outcome measurement
  • Dev: time to outcome mean (SD): 13.70 (8.88)

  • Ext Val: time to outcome mean (SD): 11.45 (7.40)

Missing data Number of participants with any missing value
  • Dev: 249

  • Ext Val: 250


Missing data handling
Exclusion
Analysis Number of participants (number of events)
  • Dev: 287 (88)

  • Ext Val: 142 (31)


Modelling method
  • Dev: survival, Cox

  • Ext Val: not applicable


Predictor selection method
  • Dev:

    • For inclusion in the multivariable model, univariable analysis

    • During multivariable modelling, significance

      • P value < 0.05

  • Ext Val: not applicable


Hyperparameter tuning
Not applicable
Shrinkage of predictor weights
  • Dev: none

  • Ext Val: not applicable


Performance evaluation dataset
  • Dev: development

  • Ext Val: external validation


Performance evaluation method
  • Dev: apparent

  • Ext Val: not applicable


Calibration estimate
  • Dev: events per score level

  • Ext Val: O:E table (unclear), Hosmer‐Lemeshow test


Discrimination estimate
Not reported
Classification estimate
Not reported
Overall performance
Not reported
Risk groups
2 risk categories: high (> 2 points), low (≤ 2 points)
Model  Model presentation
  • Dev:

    • Unweighted sum score from 0 to 5 (unclear whether based on a refit without 'recovery', which was non‐significant in the multivariable analysis) (see the scoring sketch after this table)

    • Risk groups

  • Ext Val: not applicable


Number of predictors in the model
  • Dev:

    • 5, unclear if the coefficient for 'recovery' is needed or the model fit without 'recovery' is presented

  • Ext Val: not applicable


Predictors in the model
  • Dev: pyramidal and cerebellar impairment at onset of the disease, treatment before EDSS 3, age at disease onset, African descent, time between first and second relapses (unclear if the coefficient for 'recovery' is needed or the model fit without 'recovery' is presented)

  • Ext Val: not applicable


Effect measure estimates
  • Dev: HR (95% CI): pyramidal and cerebellar impairment at onset of the disease: 2.5 (1.2 to 5.1), treatment before EDSS 3 2.6 (1.6 to 4.2), age at disease onset 2.0 (1.2 to 3.1), African descent 1.8 (1.1 to 2.8), time between first and second relapses 1.9 (1.2 to 3.0), not results of entire model (unclear if the coefficient for 'recovery' is needed or the model fit without 'recovery' is presented)

  • Ext Val: not applicable


Predictor influence measure
Not applicable
Validation model update or adjustment
  • Dev: not applicable

  • Ext Val: none

Interpretation  Aim of the study
To construct a clinical risk score for MS long‐term progression that could be easily applied in clinical practice
Primary aim
The primary aim of this study is the prediction of individual outcomes.
Model interpretation
Exploratory
Suggested improvements
Validation in different cohorts, especially those with greater diversity concerning the genetic background, and exploration of other factors capable of influencing disease progression (e.g. neuroimaging data)
Notes Applicability overall
Low
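The development model above is presented as an unweighted sum of five binary risk factors, with more than 2 points classed as high risk. Below is a minimal sketch of scoring a patient this way; the item names simply restate the predictors listed above, True is taken to mean the unfavourable level of each item, and whether a sixth 'recovery' item belongs in the score is unclear from the table and therefore omitted.

```python
def unweighted_sum_score(risk_factors: dict):
    """Count the unfavourable binary items; > 2 points is classed as high risk, otherwise low."""
    score = sum(int(bool(present)) for present in risk_factors.values())
    return score, "high" if score > 2 else "low"

patient = {
    "pyramidal_and_cerebellar_impairment_at_onset": True,
    "treatment_before_EDSS_3": False,
    "age_at_onset_factor": True,
    "african_descent": False,
    "short_interval_first_to_second_relapse": True,
}
print(unweighted_sum_score(patient))   # (3, 'high')
```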
 
Item Authors' judgement Support for judgement
Participants No The data source is unclear. Excluding participants without complete follow‐up may introduce selection bias. Also, at least 248 patients were excluded for missing data.
Predictors Unclear There is no reason to believe predictors were assessed differently across patients. Predictors were collected from onset up to at least 2 years later. It is not clearly stated at what point the model was applied and whether onset referred to CIS onset or RR onset.
Outcome Yes The outcome was based on observing an EDSS increase that was confirmed at a later time point. It is unclear whether the outcome was assessed blinded to the predictors, but we do not consider this to be problematic because EDSS assessment is relatively objective, and the definition required confirmation at 6 months. Participants have regular follow‐ups due to inclusion criteria, so assessment timing is likely homogenous.
Analysis No Dev: The EPV was 11, which is relatively low. Continuous variables such as age were treated as binary variables. Univariable predictor selection was used. Discrimination and calibration were not addressed properly. The statistical model was simplified into an unweighted sum score (by unclear rounding rules) without the performance of this simplified model being assessed. Besides the large number of participants excluded for irregularly timed data, only 1 participant was reported as being excluded after enrolment, which probably had little effect on results. Although an external validation set was reported, the need for shrinkage was not assessed.
Ext Val: The number of events was low, and discrimination was not addressed. Complete case analysis was used for enrolled participants, but this only led to a drop of 2 participants.
Overall No At least one domain is at high risk of bias.

Vukusic 2004.

Study characteristics
General information Model name
Not applicable
Primary source
Journal
Data source
Cohort, primary
Study type
Development
Participants Inclusion criteria
  • MS diagnosis for at least 1 year prior to conception

  • Pregnant for at least 4 weeks but less than 36 weeks at entry into the study

  • Full‐term delivery of live infant

  • First pregnancy in dataset (if multiple observed)

  • Follow‐up until at least delivery


Exclusion criteria
Not reported
Recruitment
  • PRIMS (The Pregnancy in Multiple Sclerosis) natural history study participants from multiple centres across Europe; 76% of the women were known to their neurologists before recruitment

  • Unclear (total PRIMS cohort): France, Austria, Belgium, Netherlands, Italy, Denmark, Spain, Germany, United Kingdom, Portugal, Switzerland, Ireland


Age (years)
Mean 30.0
Sex (%F)
100.0
Disease duration (years)
Mean 6 (SD 4)
Diagnosis
96% RRMS, 4% SPMS
Diagnostic criteria
Poser 1983
Treatment
  • At recruitment, 0%

  • During follow‐up, 1.8% azathioprine and 0.4% mitoxantrone


Disease description
DSS at beginning of pregnancy mean (SD): 1.3 (1.4), annualised relapse rate during the year before pregnancy (95% CI): 0.7 (0.6 to 0.8)
Recruitment period
1993 to 1995
Predictors Considered predictors
Number of relapses in pre‐pregnancy year, number of relapses during pregnancy, DSS at pregnancy onset, epidural analgesia (ref: no), breast‐feeding (ref: no), total number of relapses before pregnancy, disease duration, age at multiple sclerosis onset, age at pregnancy onset, number of previous pregnancies, child gender (ref: male)
Number of considered predictors
11
Timing of predictor measurement
At study baseline (cohort entry during pregnancy week 4 to 36), at examinations at 20, 28, 36 weeks of gestation, and also post‐partum
Predictor handling
Continuously
Outcome Outcome definition
Relapse: a post‐partum relapse, defined as the appearance, reappearance or worsening of symptoms of neurological dysfunction lasting > 24 hours; fatigue alone not considered as a relapse
Timing of outcome measurement
During 3 months after delivery
Missing data Number of participants with any missing value
≥ 17; unclear exactly how many participants had any missing value
Missing data handling
Complete case
Analysis Number of participants (number of events)
223 (63)
Modelling method
Logistic regression
Predictor selection method
  • For inclusion in the multivariable model, all candidate predictors

  • During multivariable modelling, unclear

    • Significance based on P value threshold implied


Hyperparameter tuning
Not applicable
Shrinkage of predictor weights
None
Performance evaluation dataset
Development
Performance evaluation method
Apparent
Calibration estimate
Not reported
Discrimination estimate
c‐Statistic = 0.72
Classification estimate
Accuracy = 0.72 (cutoff = 0.5)
Overall performance
Not reported
Risk groups
Not reported
Model  Model presentation
Full regression model
Number of predictors in the model
3
Predictors in the model
Number of relapses in pre‐pregnancy year, number of relapses during pregnancy, MS duration
Effect measure estimates
OR (95% CI): number of relapses in pre‐pregnancy year 1.94 (1.32 to 2.80), number of relapses during pregnancy 1.87 (1.12 to 3.13), MS duration 1.11 (1.03 to 1.20)
Predictor influence measure
Not applicable
Validation model update or adjustment
Not applicable
Interpretation  Aim of the study
To report the 2‐year post‐partum follow‐up and to analyse the factors predictive of relapse in the 3 months after delivery
Primary aim
The primary aim of this study is not the prediction of individual outcomes. The focus is on predictor identification.
Model interpretation
Exploratory
Suggested improvements
Not reported
Notes Applicability overall
Low
Auxiliary references
Confavreux C, Hutchinson M, Hours MM, Cortinovis‐Tourniaire P, Moreau T. Rate of pregnancy‐related relapse in multiple sclerosis. Pregnancy in multiple sclerosis group. N Engl J Med 1998;339(5):285‐91.
 
Item Authors' judgement Support for judgement
Participants Yes The study used cohort study data collected to assess the effect of pregnancy on MS courses, and the inclusion criteria are appropriate.
Predictors No Almost 50% of patients were not followed up prospectively, and nearly 25% were not known to their neurologists before recruitment. Hence, the number of relapses before pregnancy was probably collected non‐uniformly, either retrospectively or prospectively, from a mixture of patients and neurologists.
Outcome Yes The outcome is a relatively objective one, so even if the predictor information was available at the time of its assessment, it would not introduce risk of bias.
Analysis No The EPV was below 10. Calibration was not addressed, and only apparent validation was reported. Participants lost to follow‐up were excluded from the analysis. Reporting of missing data handling was ambiguous but probably based on complete case analysis.
Overall No At least one domain is at high risk of bias.
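Several risk of bias judgements in this review, including the one above, hinge on the events-per-variable (EPV) ratio. A minimal sketch of that arithmetic for this study, using the 63 post-partum relapses and 11 candidate predictors reported above; the threshold of 10 is the conventional rule of thumb, not a figure from the study.

```python
# EPV = number of outcome events / number of candidate predictors.
events = 63                 # post-partum relapses reported above
candidate_predictors = 11   # candidate predictors considered above
epv = events / candidate_predictors
print(f"EPV = {epv:.1f}")   # about 5.7, below the conventional rule of thumb of 10
```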

Weinshenker 1991.

Study characteristics
General information Model name
M3 Dev
Primary source
Journal
Data source
Cohort, primary
Study type
Development
Participants Inclusion criteria
MS
Exclusion criteria
Not reported
Recruitment
Consecutive patients referred to the MS Clinic at the University Hospital in London, Ontario, Canada
Age (years)
Mean 30.5 (onset)
Sex (%F)
65.7
Disease duration (years)
Mean 11.9 (SE 0.3)
Diagnosis
Other: 65.8% RRMS, 14.8% relapsing progressive, 18.7% chronically progressive, 0.9% unknown; 83.3% probable diagnosis, 16.4% possible diagnosis
Diagnostic criteria
Poser 1983
Treatment
0%
Disease description
Not reported
Recruitment period
1972 to 1984
Predictors Considered predictors
Unclear if it is the complete list, age at onset, sex, seen at onset of MS, initial symptoms ‐ motor, systems involved ‐ brainstem, systems involved ‐ cerebellar, systems involved ‐ cerebral, systems involved ‐ pyramidal, (in other models: initial symptoms ‐ limb ataxia and balance, remitting at onset, first interattack interval, number of attacks in first 2 years, DSS at 2 years, DSS at 5 years)
Number of considered predictors
≥ 13 (unclear if complete list)
Timing of predictor measurement
At assessment (not defined), at follow‐up
Predictor handling
Continuously
Outcome Outcome definition
Disability (DSS): time to reach DSS 6
Timing of outcome measurement
Follow‐up for 12 years
Missing data Number of participants with any missing value
38
Missing data handling
Complete case
Analysis Number of participants (number of events)
1060 (498)
Modelling method
Survival, Weibull
Predictor selection method
  • For inclusion in the multivariable model, all candidate predictors

  • During multivariable modelling, stepwise selection

    • P value < 0.05

    • A form of forward selection


Hyperparameter tuning
Not applicable
Shrinkage of predictor weights
None
Performance evaluation dataset
None
Performance evaluation method
Not applicable
Calibration estimate
Not reported
Discrimination estimate
Not reported
Classification estimate
Not reported
Overall performance
Not reported
Risk groups
Not reported
Model  Model presentation
Full regression model
Number of predictors in the model
7
Predictors in the model
Age at onset, seen at MS onset, motor (insidious), brainstem, cerebellar, cerebral, pyramidal
Effect measure
Log HR (SE): intercept 4.25 (0.132), age at onset −0.030 (0.003), seen at MS onset −0.568 (0.104), motor (insidious) −0.224 (0.077), brainstem −0.184 (0.061), cerebellar −0.430 (0.073), cerebral −0.255 (0.100), pyramidal −0.230 (0.090), scale 0.648 (0.022)
Predictor influence measure
Not applicable
Validation model update or adjustment
Not applicable
Interpretation  Aim of the study
A multivariate hierarchical analysis to assess the significance of several demographic and clinical factors in multiple sclerosis patients (analysis similar to multiple regression was used to generate predictive models which permit the calculation of the median time to DSS 6 for patients with a given set of covariates).
Primary aim
The primary aim of this study is only partly the prediction of individual outcomes. The focus is on the factors.
Model interpretation
Exploratory
Suggested improvements
Not reported
Notes Applicability overall
Low
Auxiliary references
Weinshenker BG, Bass B, Rice GP, Noseworthy J, Carriere W, Baskerville J, et al. The natural history of multiple sclerosis: a geographically based study. I. Clinical course and disability. Brain 1989;112 (Pt 1):133‐46.
 
Item Authors' judgement Support for judgement
Participants Yes The authors described collecting a clinical cohort that intended to include all MS patients in the geographical area. They followed up with the patients regularly, and the study data were separate from the routine clinical charts. No inclusion criteria were discussed explicitly, but the study aimed to include all patients with MS in the entire area and called itself a natural history study.
Predictors No Although this study is a population‐based cohort and standardised data fields were created with MS research in mind, almost 4/5 of the patients were not seen from onset onwards. Thus, data on predictors related to onset were collected retrospectively for some patients and prospectively for others.
Outcome Yes The outcome was probably defined with knowledge of the predictors because only a few clinicians saw the patients in a routine care setting. DSS 6 is a relatively 'hard' outcome in which patients become dependent on a walking aid, so we judge the risk of bias due to knowledge of predictors to be low.
Analysis No Although not all enrolled participants were included in the modelling due to complete case analysis, missing data affected less than 5% of patients and hence are not expected to introduce risk of bias. Neither calibration nor discrimination was addressed, nor was model optimism. The evaluation included only patients experiencing the outcome instead of using methods that account for censoring.
Overall No At least one domain is at high risk of bias.
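The stated aim of Weinshenker 1991 is to permit calculation of the median time to DSS 6 for a given set of covariates. Below is a minimal sketch of that calculation from the coefficients reported above, assuming they are on the log-time (accelerated failure time) scale of a Weibull model, as the reported intercept and scale parameter suggest; the patient values are hypothetical and this is not the authors' published tool.

```python
import math

# Reported coefficients (assumed to be on the log-time scale of a Weibull
# accelerated failure time model) and scale parameter.
coef = {
    "intercept": 4.25,
    "age_at_onset": -0.030,
    "seen_at_ms_onset": -0.568,
    "motor_insidious": -0.224,
    "brainstem": -0.184,
    "cerebellar": -0.430,
    "cerebral": -0.255,
    "pyramidal": -0.230,
}
scale = 0.648

def median_time_to_dss6(covariates: dict) -> float:
    """Median survival time of a Weibull AFT model:
    log(T_median) = linear predictor + scale * log(log 2)."""
    lp = coef["intercept"] + sum(coef[name] * value for name, value in covariates.items())
    return math.exp(lp + scale * math.log(math.log(2)))

# Hypothetical patient: onset at age 30, not seen at MS onset, cerebellar and
# pyramidal systems involved, no insidious motor onset, no brainstem or
# cerebral involvement.
patient = {
    "age_at_onset": 30,
    "seen_at_ms_onset": 0,
    "motor_insidious": 0,
    "brainstem": 0,
    "cerebellar": 1,
    "cerebral": 0,
    "pyramidal": 1,
}
print(f"Predicted median time to DSS 6: {median_time_to_dss6(patient):.1f} years")
```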

Weinshenker 1996.

Study characteristics
General information Model name
  • Short term

  • M3 Ext Val


Primary source
Journal
Data source
Routine care, secondary
Study type
  • Short term: development

  • M3 Ext Val: external validation, location

Participants Inclusion criteria
Not reported
Exclusion criteria
Not reported
Recruitment
Consecutive participants seen by first author at Ottawa Regional MS clinic, Canada
Age (years)
Mean 44.1
Sex (%F)
69.1
Disease duration (years)
Mean 12
Diagnosis
Other: 84.3% RRMS, 2.0% relapsing progressive, 13.7% chronically progressive
Diagnostic criteria
Not reported
Treatment
Not reported
Disease description
Unclear
Recruitment period
Not reported
Predictors Considered predictors
  • Short term: unclear if it is the complete list, disease duration (bin, ref: < 20 years), EDSS (difference from 4.5), progression index, predicted time to DSS 6 (model 1 from Weinshenker 1991), follow‐up time

  • M3 Ext Val: not applicable


Number of considered predictors
  • Short term:

    • ≥ 5 (unclear if complete list)

  • M3 Ext Val: not applicable


Timing of predictor measurement
  • Short term: at assessment (not defined), at follow‐up (unclear: outcome measurement)

  • M3 Ext Val: not applicable


Predictor handling
  • Short term: continuously except duration, which was dichotomised

  • M3 Ext Val: not applicable

Outcome Outcome definition
  • Short term:

    • Disability (EDSS): short‐term progression defined as change in EDSS over 1 year to 3 years of follow‐up

  • M3 Ext Val:

    • Disability (DSS): time to reach DSS 6 (equivalent to EDSS 6.0 or 6.5) defined as the point at which patients required a cane at all times when walking outside the home and the time at which the patient was barely able to walk half a block


Timing of outcome measurement
  • Short term: definition 1 year to 3 years, follow‐up summarised for 2 years

  • M3 Ext Val: time to outcome mean (SD): 20.7 years (0.90 years)

Missing data Number of participants with any missing value
  • Short term: not reported

  • M3 Ext Val: 10, only missing outcome reported


Missing data handling
Complete case
Analysis Number of participants (number of events)
  • Short term:

    • 84 or, probably, 174 (43 with worsening, 28 with +1‐point change or higher)

  • M3 Ext Val:

    • ≤ 259 (66)


Modelling method
  • Short term: logistic regression

  • M3 Ext Val: not applicable


Predictor selection method
  • Short term:

    • For inclusion in the multivariable model, not reported

    • During multivariable modelling, full model approach

  • M3 Ext Val: not applicable


Hyperparameter tuning
Not applicable
Shrinkage of predictor weights
  • Short term: none

  • M3 Ext Val: not applicable


Performance evaluation dataset
  • Short term: development

  • M3 Ext Val: external validation


Performance evaluation method
  • Short term: apparent

  • M3 Ext Val: not applicable


Calibration estimate
Not reported
Discrimination estimate
Not reported
Classification estimate
  • Short term:

    • Cutoff = 0.5: accuracy = 0.75, sensitivity = 0.21, specificity = 0.93

    • Cutoff = 0.3: accuracy = 0.67, sensitivity = 0.54, specificity = 0.72

  • M3 Ext Val: not reported


Overall performance
  • Short term: not reported

  • M3 Ext Val: mean prediction error 5.25 (SD 4.58)


Risk groups
Not reported
Model  Model presentation
  • Short term: full regression model

  • M3 Ext Val: not applicable


Number of predictors in the model
  • Short term: 5

  • M3 Ext Val: not applicable


Predictors in the model
  • Short term: duration, EDSS, progression index, predicted time to DSS 6 from model 1, follow‐up

  • M3 Ext Val: not applicable


Effect measure
  • Short term:

    • log OR (SE): intercept −1.45, duration −1.70 (0.68), EDSS −0.65 (0.19), follow‐up 0.64 (0.26), progression index −0.16 (0.27), predicted time to DSS 6 from model 1 0.05 (0.04)

  • M3 Ext Val: not applicable


Predictor influence measure
Not applicable
Validation model update or adjustment
  • Short term: not applicable

  • M3 Ext Val: none

Interpretation  Aim of the study
  • Short term: to establish predictors of short‐term outcome of MS

  • M3 Ext Val: to validate previously published models predicting time to EDSS 6


Primary aim
  • Short term: the primary aim of this study is not the prediction of individual outcomes. Rather, the focus is on predictor identification.

  • M3 Ext Val: the primary aim of this study is the prediction of individual outcomes.


Model interpretation
Exploratory
Suggested improvements
Implicitly suggests that the model should be applied to the appropriate patient population (based on temporal course and baseline disability) rather than to any available patients
Notes Applicability overall
  • Short term: high

  • M3 Ext Val: low


Applicability overall rationale
  • Short term: although this study contained a model and its assessment, the main aim was not to create a model for prediction of individual outcomes but was rather to explore predictors for short‐term disease progression.

 
Item Authors' judgement Support for judgement
Participants No Although the authors discussed the probable lack of referral bias, the data were collected for reasons other than this study where no inclusion/exclusion criteria other than the diagnosis were reported.
Predictors No Short‐term: The stated interest was in predicting 15‐year outcomes, but the predictor data used were collected about 15 months after baseline, which shortened the effective prediction window to less than 14 years. The intended time of model use is unclear.
M3 Ext Val: The fact that 'seen at MS onset' was a predictor in the model makes it likely that the data on the participants were collected partly retrospectively and partly prospectively, just as in Weinshenker 1991.
Outcome Yes Short‐term: We rated this domain for this analysis as having a high risk of bias. Short‐term progression was not confirmed, even though the EDSS can fluctuate. Although this study probably pre‐dates the standard definition of progression, an outcome based on the EDSS is standard and considered to be objective.
M3 Ext Val: We rated this domain for this analysis as having a low risk of bias. Although the outcome was probably assessed with the knowledge of the predictors, DSS 6 can be considered a hard outcome; thus, knowledge of predictors introduces a little risk of bias.
Analysis No Short‐term: The EPV was low. Disease duration was dichotomised, justified by clinical knowledge, but nonlinearity could have been explored more thoroughly. Many participants were excluded without the reasons being reported or the excluded participants being compared with those included. Complete case analysis was used. Neither calibration nor discrimination was assessed, and classification measures were assessed in‐sample. Follow‐up time was added as a predictor instead of using methods that account for differing observation times.
M3 Ext Val: The number of events was far below 100, only complete case analysis was done, and the models were not evaluated using calibration and discrimination measures accounting for censoring.
Overall No At least one domain is at high risk of bias.
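The short-term model of Weinshenker 1996 is reported above as a full logistic regression. A minimal sketch of how such coefficients translate into a predicted probability, assuming the predictor coding described in the extraction (duration dichotomised at 20 years, EDSS expressed as the difference from 4.5); the patient values are hypothetical and the coding and units may differ from the original publication.

```python
import math

# Reported coefficients of the short-term logistic model (log odds ratios).
coef = {
    "intercept": -1.45,
    "duration_ge_20_years": -1.70,      # binary indicator, reference: < 20 years
    "edss_minus_4_5": -0.65,            # EDSS expressed as the difference from 4.5
    "follow_up_years": 0.64,
    "progression_index": -0.16,
    "predicted_time_to_dss6": 0.05,     # from model 1 of Weinshenker 1991
}

def predicted_probability(x: dict) -> float:
    """Inverse logit of the linear predictor."""
    lp = coef["intercept"] + sum(coef[name] * value for name, value in x.items())
    return 1.0 / (1.0 + math.exp(-lp))

# Hypothetical patient; the units and coding are assumptions for illustration.
patient = {
    "duration_ge_20_years": 0,
    "edss_minus_4_5": 2.0 - 4.5,
    "follow_up_years": 2,
    "progression_index": 0.3,
    "predicted_time_to_dss6": 12,
}
print(f"Predicted probability of short-term worsening: {predicted_probability(patient):.2f}")
```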

Wottschel 2015.

Study characteristics
General information Model name
  • 1 year

  • 3 years


Primary source
Journal
Data source
Cohort, secondary
Study type
Development
Participants Inclusion criteria
  • At least one demyelinating lesion visible on baseline scans

  • Available scans, and corresponding lesion masks

  • Available clinical data at 1‐ and 3‐year follow‐up


Exclusion criteria
Not reported
Recruitment
UK
Age (years)
  • 1 year: mean 33.1

  • 3 years: mean 33.2


Sex (%F)
  • 1 year: 66.2

  • 3 years: 67.1


Disease duration (years)
Mean 0.1 (SD 0.07)
Diagnosis
100% CIS
Diagnostic criteria
Not reported
Treatment
0%
Disease description
EDSS median (range): 1 (0 to 8)
Recruitment period
1995 to 2004
Predictors Considered predictors
Age, gender, type of CIS (brainstem/cerebellum, spinal cord, optic neuritis, other), EDSS, lesion count, lesion load, average lesion PD intensity, average lesion T2 intensity, average distance of lesions from the centre of the brain, presence of lesions in proximity of the centre of the brain, the shortest horizontal distance of a lesion from the vertical axis of the brain, lesion size profile
Number of considered predictors
14
Timing of predictor measurement
At disease onset (CIS) and up to a mean of 6.15 weeks (SD 3.4) after disease onset
Predictor handling
Continuously (polynomial kernel)
Outcome Outcome definition
Conversion to definite MS: clinical conversion to MS due to the occurrence of a second clinical attack attributable to demyelination of more than 24 hours in duration and at least 4 weeks from the initial attack
Timing of outcome measurement
  • 1 year: 1 year

  • 3 years: 3 years

Missing data Number of participants with any missing value
  • 1 year: 0

  • 3 years: ≥ 4; unclear exactly how many participants had any missing value


Missing data handling
  • 1 year

    • Exclusion

  • 3 years

    • Mixed: complete case, and exclusion

Analysis Number of participants (number of events)
1 year: 74 (22)
3 years: 70 (31)
Modelling method
Support vector machine, polynomial kernel
Predictor selection method
  • For inclusion in the multivariable model, all candidate predictors

  • During multivariable modelling, stepwise selection

    • Forward selection based on bootstrap classification accuracy


Hyperparameter tuning
Several values for polynomial degree considered in cross‐validation, other tuning parameters not reported
Shrinkage of predictor weights
Modelling method
Performance evaluation dataset
Development
Performance evaluation method
Cross‐validation, LOOCV repeated on 100 balanced bootstrap samples
Calibration estimate
Not reported
Discrimination estimate
Not reported
Classification estimate
  • 1 year: accuracy = 0.714 (95% CI 0.58 to 0.84), sensitivity = 0.77, specificity = 0.66, PPV = 0.70, NPV = 0.74

  • 3 years: accuracy = 0.68 (95% CI 0.61 to 0.73), sensitivity = 0.60, specificity = 0.76, PPV = 0.72, NPV = 0.65


Overall performance
Not reported
Risk groups
Not reported
Model  Model presentation
List of selected predictors and kernel degree
Number of predictors in the model
  • 1 year: 3 (df unclear)

  • 3 years: 6 (df unclear)


Predictors in the model
  • 1 year: type of presentation, gender, lesion load

  • 3 years: lesion count, average lesion PD intensity, average distance of lesions from the centre of the brain, shortest horizontal distance of a lesion from the vertical axis, age, EDSS at onset


Effect measure estimates
Not reported
Predictor influence measure
Not reported
Validation model update or adjustment
Not applicable
Interpretation  Aim of the study
To determine if machine learning techniques, such as support vector machines (SVMs), can predict the occurrence of a second clinical attack, which leads to the diagnosis of clinically definite multiple sclerosis (CDMS) in patients with a clinically isolated syndrome (CIS), on the basis of single patient's lesion features and clinical/demographic characteristics
Primary aim
The primary aim of this study is the prediction of individual outcomes.
Model interpretation
Probably exploratory
Suggested improvements
Use automatically derived features (instead of semi‐automated/manual features), features containing information on different aspects of imaging data (scale, directionality), imaging features not related to lesions (magnetism transfer imaging), other para‐clinical predictors (OCB, grey matter atrophy, genetic factors, spinal cord lesions, cortical lesions, Gd enhancing lesions), larger independent dataset, including temporal ordering of events, novel algorithms
Notes Applicability overall
Unclear
Applicability overall rationale
Due to the lack of reporting on the final model or tool, it is unclear whether this study aimed to develop a model/tool for the prediction of individual MS outcomes.
 
Item Authors' judgement Support for judgement
Participants No The data source was a secondary use of a cohort study, but only participants with complete data were included.
Predictors Yes It is unclear if the neurologist circling the lesions was informed of the outcome. Still, we do not believe this would induce any considerable bias as imaging is considered to be an objective predictor. Other predictors are basic and objective.
Outcome Yes Although the outcome might have been measured with knowledge of the predictors, clinical attacks are considered objective.
Analysis No 1 year: The EPV was very low. Only classification measures were evaluated. Model selection and model evaluation occurred on the same data. The final model is unclear.
3 years: The EPV was very low. 4 participants were lost to follow‐up by 3 years, but this was only a small proportion of the total. Only classification measures were evaluated. Model selection and model evaluation occurred on the same data. The final model is unclear.
Overall No At least one domain is at high risk of bias.
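Wottschel 2015 used a polynomial-kernel support vector machine evaluated by leave-one-out cross-validation. A minimal sketch of that type of analysis with scikit-learn on synthetic data; the bootstrap balancing, forward feature selection, and degree tuning of the original study are not reproduced, and all numeric choices here are assumptions.

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the 74 participants and 3 selected predictors
# (type of presentation, gender, lesion load) of the 1-year model.
rng = np.random.default_rng(0)
X = rng.normal(size=(74, 3))
y = rng.integers(0, 2, size=74)   # conversion to MS within 1 year (synthetic labels)

# Polynomial-kernel SVM with scaling, assessed by leave-one-out cross-validation.
model = make_pipeline(StandardScaler(), SVC(kernel="poly", degree=2, C=1.0))
accuracy = cross_val_score(model, X, y, cv=LeaveOneOut(), scoring="accuracy")
print(f"LOOCV accuracy: {accuracy.mean():.2f}")
```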

Wottschel 2019.

Study characteristics
General information Model name
BCGLMS
Primary source
Journal
Data source
Cohort, secondary
Study type
Development
Participants Inclusion criteria
  • Patients with a CIS examined within 3 months from symptoms onset

  • T1‐weighted MRI sequences of the brain obtained at onset, using standard‐of‐care local protocols

  • Demographic (age, sex) and clinical information (e.g. type of CIS) at baseline

  • The presence/absence of a second relapse at one year follow‐up available

  • Presence of T2‐hyperintense WM brain lesions as outlined in each centre on PD/T2‐weighted or FLAIR MRI by experienced researchers, resulting in binary lesion masks


Exclusion criteria
Not reported
Recruitment
  • 6 MAGNIMS network centres

  • Spain, Denmark, Austria, UK, and Italy


Age (years)
Mean 32.7 (onset)
Sex (%F)
66.3
Disease duration (years)
Up to 0.27
Diagnosis
100% CIS
Diagnostic criteria
Not reported
Treatment
Not reported
Disease description
EDSS median (range): 2 (0 to 8)
Recruitment period
Not reported
Predictors Considered predictors
Global features: (whole‐brain measures) GM volume, WM volume, brain volume as a percentage of the intracranial volume, age, sex, CIS type (brainstem/optic nerve/spinal cord/other), EDSS; region of interest (ROI) features: 143 ROIs (excluding ROIs describing ventricles, skull and background) based on the Neuromorphometrics atlas, each ROI from the brain parcellation used to mask each patient's GM probability map, CT map, lesion segmentation and T1 scan (to estimate the volume); lobe features: (ROIs were merged into nine larger areas according to their anatomical location) limbic, insular, frontal, parietal, temporal, occipital, cerebellum, GM and WM, deep grey matter defined as thalamus, hippocampus, nucleus accumbens, amygdala, caudate nucleus, pallidum, putamen and basal ganglia
Number of considered predictors
214
Timing of predictor measurement
At disease onset (CIS) and up to 14 weeks after disease onset
Predictor handling
Continuously
Outcome Outcome definition
Conversion to definite MS: occurrence of a second clinical episode
Timing of outcome measurement
1 year
Missing data Number of participants with any missing value
Not reported
Missing data handling
Exclusion
Analysis Number of participants (number of events)
400 (91)
Modelling method
Support vector machine, linear kernel
Predictor selection method
  • For inclusion in the multivariable model, all candidate predictors

  • During multivariable modelling, stepwise selection

    • Backward recursive feature elimination removing 20% of predictors with bootstrap averaged SVM weights closest to zero and repeated until accuracy no longer improves


Hyperparameter tuning
Not applicable
Shrinkage of predictor weights
Modelling method
Performance evaluation dataset
Development
Performance evaluation method
Cross‐validation, k‐fold CV (k = 2, 5, 10 (where possible), LOO) repeated on 100 balanced bootstrap samples
Calibration estimate
Not reported
Discrimination estimate
Not reported
Classification estimate
5‐fold CV: accuracy = 0.685 (95% CI 0.683 to 0.687), sensitivity = 0.678, specificity = 0.693, LOOCV: accuracy = 0.708 (95% CI 0.706 to 0.71), sensitivity = 0.703, specificity = 0.713
Overall performance
Not reported
Risk groups
Not reported
Model  Model presentation
List of selected predictors for peak accuracy when using 2‐fold CV
Number of predictors in the model
36 (for 2‐fold CV)
Predictors in the model
Type of CIS, WM lesion load ‐ whole brain, WM lesion load ‐ frontal, WM lesion load ‐ limbic, WM lesion load ‐ temporal, WM lesion load ‐ dGM, WM lesion load ‐ WM, GM ‐ cerebellum, GM ‐ thalamus, GM ‐ frontal operculum, GM ‐ middle cingulate gyrus, GM ‐ precentral gyrus medial segment, GM ‐ posterior cingulate gyrus, GM ‐ praecuneus, GM ‐ parietal operculum, GM ‐ post‐central gyrus, GM ‐ planum polare, GM ‐ subcallosal area, GM ‐ supplementary motor cortex, GM ‐ superior occipital gyrus, cortical thickness ‐ central operculum, cortical thickness ‐ cuneus, cortical thickness ‐ fusiform gyrus, cortical thickness ‐ inferior temporal gyrus, cortical thickness ‐ middle occipital gyrus, cortical thickness ‐ post‐central gyrus medial segment, cortical thickness ‐ occipital pole, cortical thickness ‐ opercular part of the inferior frontal gyrus, cortical thickness ‐ orbital part of the inferior frontal gyrus, cortical thickness ‐ planum temporale, cortical thickness ‐ superior occipital gyrus, volume ‐ whole brain, volume ‐ ventral diencephalon, volume ‐ middle temporal gyrus, volume ‐ supramarginal gyrus, volume ‐ limbic
Effect measure estimates
Not reported
Predictor influence measure
Not reported
Validation model update or adjustment
Not applicable
Interpretation  Aim of the study
To distinguish CIS converters from non‐converters at onset of a CIS, using recursive feature elimination and weight averaging with support vector machines. Also, to assess the influence of cohort size and cross‐validation methods on the accuracy estimate of the classification
Primary aim
The primary aim of this study is only partly the prediction of individual outcomes. The focus is on the influence of sample size and CV methods on results.
Model interpretation
Probably exploratory
Suggested improvements
To compare 2 or more cross‐validation schemes to estimate potential biases when it is not possible to use completely distinct data sets for training and testing, advanced imaging techniques such as magnetisation transfer imaging (MTR) or double or phase‐shifted inversion recovery (DIR/PSIR), genetic or environmental predictors, larger cohort, longitudinal MRI data, prospective harmonised imaging protocols
Notes Applicability overall
Unclear
Applicability overall rationale
Due to the lack of reporting on the final model or tool, it is unclear whether this study aimed to develop a model/tool for the prediction of individual MS outcomes.
 
Item Authors' judgement Support for judgement
Participants No A multicentre cohort was used, but patients with missing data were excluded.
Predictors Yes Even though the imaging predictors might have been the result of preprocessing performed after the outcome was known, we do not expect such information to affect an automated procedure. Imaging data from several sites were used, but they were all MAGNIMS sites and collaborated in defining imaging protocols for the field.
Outcome Yes Although the outcome might have been measured with knowledge of the predictors, clinical attacks are considered objective.
Analysis No The EPV was very low. Only classification measures were evaluated. Selection and assessment occurred at the same resampling level. The final model is unclear.
Overall No At least one domain is at high risk of bias.
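Wottschel 2019 selected predictors by backward recursive feature elimination with a linear SVM, removing 20% of predictors per step. A minimal sketch with scikit-learn's RFE on synthetic data; note that scikit-learn computes the step fraction from the initial feature count rather than from the remaining features, and the bootstrap weight averaging and accuracy-based stopping rule of the original study are not reproduced.

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.svm import SVC

# Synthetic stand-in for the 400 participants and 214 imaging/clinical features.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 214))
y = rng.integers(0, 2, size=400)  # second clinical episode within 1 year (synthetic labels)

selector = RFE(
    estimator=SVC(kernel="linear", C=1.0),  # linear SVM supplies the feature weights
    n_features_to_select=36,                # number retained in the reported 2-fold CV model
    step=0.2,                               # fraction of features removed per elimination step
)
selector.fit(X, y)
print("Number of retained features:", int(selector.support_.sum()))
```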

Ye 2020.

Study characteristics
General information Model name
  • 5‐gene signature

  • Nomogram


Primary source
Journal
Data source
Unclear, secondary
Study type
Development
Participants Inclusion criteria
  • Diagnosed with definite MS or CIS

  • Free of steroids and immunomodulatory treatments for at least 30 days before blood withdrawal

  • At least 1 year after treatment with cyclophosphamide


Exclusion criteria
  • Patients with neuromyelitis optica (NMO) according to the criteria of Wingerchuk


Recruitment
Unclear, Sheba Medical Centre, Israel
Age (years)
Mean 36.3 (unclear when)
Sex (%F)
63.8
Disease duration (years)
Mean 5.7 (pooled SD 0.89)
Diagnosis
34.0% CIS, 66.0% CDMS
Diagnostic criteria
McDonald 2001
Treatment
Unclear timing, 30.9% on DMT
Disease description
EDSS (unclear if mean and SD): CIS 0.9 (0.2), CDMS 2.4 (0.2); annualised relapse rate (unclear if mean and SD): CDMS 0.92 (0.1)
Recruitment period
Not reported
Predictors Considered predictors
  • 5‐gene signature: differentially expressed genes

  • Nomogram: age, gender, disease type (CIS vs MS), DMT, genetic risk score (based on the 5‐gene signature)


Number of considered predictors
  • 5‐gene signature: 202

  • Nomogram: 206


Timing of predictor measurement
At study baseline (cohort entry)
Predictor handling
  • 5‐gene signature: unclear, probably continuously

  • Nomogram: age dichotomised, genetic risk score continuously

Outcome Outcome definition
Relapse: relapse‐free survival (relapse defined as the onset of new objective neurological symptoms and signs or worsening of existing neurological disability not accompanied by metabolic changes, fever or other signs of infection, and lasting for a period of at least 48 hours accompanied by objective change of at least 0.5 in the EDSS score)
Timing of outcome measurement
Follow‐up mean (SD): 1.97 (1.3)
Missing data Number of participants with any missing value
Not reported
Missing data handling
Single imputation, using k‐nearest neighbours
Analysis Number of participants (number of events)
94 (64)
Modelling method
  • 5‐gene signature: survival, LASSO Cox

  • Nomogram: survival, Cox


Predictor selection method
  • 5‐gene signature

    • For inclusion in the multivariable model, univariable analysis

    • During multivariable modelling, modelling method

  • Nomogram

    • For inclusion in the multivariable model, all candidate predictors

    • During multivariable modelling, full model approach


Hyperparameter tuning
  • 5‐gene signature: not reported

  • Nomogram: not applicable


Shrinkage of predictor weights
  • 5‐gene signature: modelling method

  • Nomogram: none


Performance evaluation dataset
Development
Performance evaluation method
  • 5‐gene signature: random split, 2/3 training and 1/3 test

  • Nomogram: random split, 2/3 training and 1/3 test (B = 1000 bootstraps on training data)


Calibration estimate
Not reported
Discrimination estimate
  • 5‐gene signature

    • c‐Statistic: 1 year 0.518, 2 year 0.655, 3 year 0.729 (development set: 1 year 0.785, 2 year 0.86, 3 year 0.897)

  • Nomogram

    • Survival c‐statistic: 0.59 (development set: 0.67)


Classification estimate
Not reported
Overall performance
Not reported
Risk groups
  • 5‐gene signature: high‐ and low‐risk groups defined by median of risk score (predicted log hazard ratios in training set): 1.12

  • Nomogram: not reported

Model  Model presentation
  • 5‐gene signature: regression coefficients without baseline hazard

  • Nomogram: nomogram


Number of predictors in the model
5
Predictors in the model
  • 5‐gene signature: FTH1, GBP2, MYL6, NCOA4, SRP9

  • Nomogram: age, gender, disease type, DMT, risk score


Effect measure estimates
  • 5‐gene signature: HR (95% CI): FTH1 9.080 (2.31309 to 35.65), GBP2 0.155 (0.02757 to 0.88), MYL6 0.019 (0.00028 to 1.23), NCOA4 0.106 (0.02277 to 0.49), SRP9 23.045 (3.00729 to 176.60)

  • Nomogram: HR (95% CI): age 1.032 (0.442 to 2.411), gender 0.727 (0.371 to 1.425), disease type 1.657 (0.726 to 3.784), DMT 0.707 (0.307 to 1.628), risk score 1.159 (1.076 to 1.248)


Predictor influence measure
  • 5‐gene signature: not reported

  • Nomogram: not applicable


Validation model update or adjustment
Not applicable
Interpretation  Aim of the study
To develop and validate an effective and noninvasive prognostic gene signature for predicting the probability of relapse and remission period in MS patients via an integrated analysis of blood microarrays
Primary aim
The primary aim of this study is the prediction of individual outcomes.
Model interpretation
Probably confirmatory
Suggested improvements
Include Asian participants
Notes Applicability overall
  • 5‐gene signature: high

  • Nomogram: low


Applicability overall rationale
  • 5‐gene signature: the predictors used were genomic features and no other predictor domain was considered for use in the model.


Auxiliary references
Gurevich M, Tuller T, Rubinstein U, Or‐Bach R, Achiron A. Prediction of acute multiple sclerosis relapses by transcription levels of peripheral blood cells. BMC Medical Genomics [Electronic Resource] 2009;2:46.
 
Item Authors' judgement Support for judgement
Participants Unclear The data source was not clearly reported in this study, nor in the original study from which the data came.
Predictors Yes According to the study from which the data came, although microarray analysis of the transcriptome can be affected by batch effects, there were efforts to mitigate them. Even though the microarray analysis might have occurred after the outcomes became known, the procedure was relatively automated and is not expected to have been affected by this information. The intended time of model use with respect to the patient's disease history is unclear, but it may be any time at which blood is drawn.
Outcome Yes Information related to the outcome was retrieved from Gurevich 2009. A standard definition of relapse was used, which we considered robust to possible predictor knowledge.
Analysis No 5‐gene signature: The number of predictors was too large relative to the number of events. Univariable analysis was used for predictor selection. Although no information on missing data was reported, the number of included patients matches the publication on the data source. Only discrimination was assessed, and it did not appear to account for censored data. A random split was used for assessment. Parameter tuning was not discussed, and the plots corresponding to the Cox LASSO model selection do not correspond to the final model presented (the plots indicate an optimal number of 11 predictors, not 5).
Nomogram: The number of predictors was too large relative to the number of events. Although no information on missing data was reported, the number of included patients matches the publication on the data source. Only discrimination was assessed. A random split was used for assessment, in addition to a bootstrap procedure in the training set that did not correct for optimism.
Overall No At least one domain is at high risk of bias.
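Ye 2020 reports the 5-gene signature as hazard ratios and dichotomises the resulting risk score at the training-set median of 1.12. A minimal sketch of how such a score could be computed, with coefficients back-calculated as log(HR) from the values above; the expression values are hypothetical, and any standardisation used in the original study is not reproduced.

```python
import math

# Coefficients back-calculated as log(HR) from the hazard ratios reported above.
log_hr = {
    "FTH1": math.log(9.080),
    "GBP2": math.log(0.155),
    "MYL6": math.log(0.019),
    "NCOA4": math.log(0.106),
    "SRP9": math.log(23.045),
}

def risk_score(expression: dict) -> float:
    """Linear predictor (log relative hazard) of the Cox model."""
    return sum(log_hr[gene] * expression[gene] for gene in log_hr)

# Hypothetical (arbitrary) expression values for the five genes.
patient = {"FTH1": 0.8, "GBP2": 0.5, "MYL6": 0.4, "NCOA4": 0.6, "SRP9": 0.7}
score = risk_score(patient)
group = "high risk" if score > 1.12 else "low risk"  # 1.12 = reported training-set median
print(f"Risk score: {score:.2f} -> {group}")
```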

Yoo 2019.

Study characteristics
General information Model name
CNN EDT, pretraining, all user‐defined features
Primary source
Journal
Data source
Randomised trial participants, secondary
Study type
Development
Participants Inclusion criteria
  • Age between 18 years and 60 years

  • Patients with onset of their first demyelinating symptoms within the previous 180 days

  • A minimum of 2 lesions that were at least 3 mm in diameter on a T2‐weighted (T2w) screening brain MRI (one had to be ovoid, periventricular or infratentorial)

  • For patients over the age of 50 years cerebrospinal fluid oligoclonal bands or spinal MRI changes typical of demyelination


Exclusion criteria
  • Better explanation for the event

  • Previous event reasonably attributable to demyelination

  • Meeting the 2005 McDonald criteria for MS (Polman 2005)


Recruitment
Minocycline RCT participants recruited from MS clinics in Canada and USA:
  • Cumming School of Medicine and the Hotchkiss Brain Institute, Calgary, AB

  • University of British Columbia, Vancouver, BC

  • University of Montreal, Montreal, QC

  • Tufts University, Boston, MA

  • Western University, London, ON, Fraser Health MS Clinic, Burnaby, BC

  • University of Ottawa and the Ottawa Hospital Research Institute, Ottawa, ON, Dalhousie University, Halifax, NS

  • University of Alberta, Edmonton, AB

  • University of Manitoba, Winnipeg, MB, Clinique Neuro Rive‐Sud, Greenfield Park, QC

  • University of Toronto, Toronto, ON, CHA‐Hôpital Enfant‐Jésus, Québec, QC


Age (years)
Mean 35.9 (onset)
Sex (%F)
69.0
Disease duration (years)
Median 0.2 (range 0.06 to 0.52)
Diagnosis
100% CIS
Diagnostic criteria
Not reported
Treatment
  • At recruitment, 0%

  • During follow‐up, approximately 50% on minocycline


Disease description
EDSS median (range): 1.5 (0 to 4.5)
Recruitment period
2009 to 2013
Predictors Considered predictors
MRI mask images/user‐defined: T2w lesion volume, brain parenchymal fraction, diffusely abnormal white matter, gender, initial CIS event cerebrum, initial CIS event optic nerve, initial CIS event cerebellum, initial CIS event brainstem, initial CIS event spinal cord, EDSS, CIS monofocal or multifocal type at onset
Number of considered predictors
Non‐tabular data + 11 (user‐defined)
Timing of predictor measurement
At disease onset (CIS) (RCT baseline within 180 days after disease onset)
Predictor handling
Continuously except for DAWM, which was dichotomised (justified as binary being more reliable)
Outcome Outcome definition
Conversion to definite MS (McDonald 2005 (Polman 2005)): MS at the end of 2 years determined by new T2 lesions, new T1 gadolinium enhancing lesions and/or new clinical relapse
Timing of outcome measurement
2 years
Missing data Number of participants with any missing value
9, only missing outcome reported
Missing data handling
Not reported
Analysis Number of participants (number of events)
140 (80)
Modelling method
Convolutional neural network
Predictor selection method
  • For inclusion in the multivariable model, all candidate predictors

  • During multivariable modelling, full model approach


Hyperparameter tuning
Empirically determined L1 and L2‐norm parameters (Montavon 2012), early stopping convergence target found by test error increase during cross‐validation, grid search over several values for replication and scale factors using cross‐validation accuracy
Shrinkage of predictor weights
Modelling method
Performance evaluation dataset
Development
Performance evaluation method
Cross‐validation, 7‐fold
Calibration estimate
Not reported
Discrimination estimate
c‐Statistic = 0.746 (SD 0.114)
Classification estimate
Accuracy = 0.75 (SD 0.113), sensitivity = 0.787 (SD 0.122), specificity = 0.704 (SD 0.154)
Overall performance
Not reported
Risk groups
Not reported
Model  Model presentation
Not reported
Number of predictors in the model
Non‐tabular data + 11
Predictors in the model
Non‐tabular: MRI mask images, tabular: T2w lesion volume, brain parenchymal fraction, diffusely abnormal white matter, gender, initial CIS event cerebrum, initial CIS event optic nerve, initial CIS event cerebellum, initial CIS event brainstem, initial CIS event spinal cord, EDSS, CIS monofocal or multifocal type at onset
Effect measure estimates
Not reported
Predictor influence measure
Average relative importance of the user‐defined features
Validation model update or adjustment
Not applicable
Interpretation  Aim of the study
To determine whether deep learning can extract latent MS lesion features that, when combined with user‐defined radiological and clinical measurements, can predict conversion to MS ... in patients with early MS symptoms (clinically isolated syndrome), a prodromal stage of MS, more accurately than imaging biomarkers that have been used in clinical studies to evaluate overall disease state, such as lesion volume and brain volume
Primary aim
The primary aim of this study is not entirely the prediction of individual outcomes. The focus is on the ability of deep learning to extract latent features.
Model interpretation
Exploratory
Suggested improvements
Examine more sophisticated strategies such as augmenting input feature vectors with the squared values or by taking polynomial combinations of feature vectors to increase feature dynamic range and creating an augmented network that has the ability to learn higher order features.
Notes Applicability overall
High
Applicability overall rationale
Although this study contained a model, the main aim was not to create a model for the prediction of individual outcomes but rather to show the ability of deep learning algorithms to extract prognostic factors.
Auxiliary references
Metz LM, Li DKB, Traboulsee AL, Duquette P, Eliasziw M, Cerchiaro G, et al. Trial of minocycline in a clinically isolated syndrome of multiple sclerosis. N Engl J Med 2017;376(22):2122‐33.
 
Item Authors' judgement Support for judgement
Participants Unclear The data came from an RCT with well‐explained, appropriate inclusion/exclusion criteria. It is unclear why cerebrospinal fluid oligoclonal bands or spinal MRI changes typical of demyelination were required for participants over the age of 50 years; by more current diagnostic criteria, such patients would already be known to have the outcome. It is unclear whether this introduces any risk of bias.
Predictors Yes Even though the imaging predictors might have been the result of preprocessing performed after the outcome was known, we do not expect such information to affect an automated procedure. The seeds were set by a single expert and checked by another single expert. Other predictors were assessed by MS clinicians.
Outcome Yes Although the outcome might have been measured with knowledge of the predictors, the diagnostic criteria are considered objective.
Analysis No The number of participants and events relative to the number of tabular features was very low. It was unclear how missing data were handled, but 12‐month outcomes were used for 9 participants. Calibration was not assessed. Evaluation and tuning occurred at the same level, where there was no nested structure to resampling. No final model/tool was given.
Overall No At least one domain is at high risk of bias.
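Yoo 2019 combines latent features from lesion-mask images with 11 user-defined clinical and radiological features in a convolutional neural network. A minimal sketch of such an architecture in PyTorch; the network size, input dimensions, and all numeric choices here are assumptions for illustration and do not reproduce the authors' model or training procedure.

```python
import torch
import torch.nn as nn

class MaskPlusTabularNet(nn.Module):
    """Toy network: a small 3D CNN on a lesion-mask volume, concatenated with
    a vector of user-defined tabular features, followed by a classifier."""

    def __init__(self, n_tabular: int = 11):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv3d(1, 8, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool3d(2),
            nn.Conv3d(8, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool3d(2),
            nn.AdaptiveAvgPool3d(1), nn.Flatten(),
        )
        self.head = nn.Sequential(
            nn.Linear(16 + n_tabular, 32), nn.ReLU(),
            nn.Linear(32, 1),
        )

    def forward(self, mask, tabular):
        latent = self.cnn(mask)                        # (batch, 16) latent image features
        combined = torch.cat([latent, tabular], dim=1)
        return self.head(combined)                     # logit of conversion to MS

# Hypothetical batch: 4 lesion masks of 64 x 64 x 64 voxels plus 11 tabular features each.
model = MaskPlusTabularNet()
masks = torch.randn(4, 1, 64, 64, 64)
tabular = torch.randn(4, 11)
probabilities = torch.sigmoid(model(masks, tabular))
print(probabilities.shape)  # torch.Size([4, 1])
```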

Yperman 2020.

Study characteristics
General information Model name
RF literature + time series predictors
Primary source
Journal
Data source
Routine care, secondary
Study type
Development
Participants Inclusion criteria
  • MS patients with at least one EP measurement visit (complete MEP (left/right hands/feet))

  • 2‐year follow‐up


Exclusion criteria
  • Visits that do not contain all 4 (2 APB for the hands and 2 AH for the feet) evoked potential time series (EPTS)

  • Visits without 2‐year follow‐up

  • Motor evoked potentials (MEPs) with facilitation method

  • Any EPTS that have a spectral power above an empirically determined threshold at the starting segment (determined by the values of the latency of a healthy patient) of the measurement

  • Measurements of a duration differing from 100 ms


Recruitment
Rehabilitation & MS Centre in Overpelt, Belgium
Age (years)
Mean 45.0
Sex (%F)
71.8
Disease duration (years)
Not reported
Diagnosis
CIS 1.7%, RRMS 53.2%, SPMS 10.7%, PPMS 2.9%, unknown 32.9%
Diagnostic criteria
Unclear, unrecorded in the dataset
Treatment
  • At recruitment, 74% on DMT

  • During follow‐up, 79% on DMT at 2 years


Disease description
EDSS mean (SD): 3.0 (1.8)
Recruitment period
Not reported
Predictors Considered predictors
Latencies, EDSS at T0, age, peak‐to‐peak amplitude (L and R), gender, type of MS, around 5885 time series features extracted from the EPTS
Number of considered predictors
5893
Timing of predictor measurement
At visit of interest
Predictor handling
Continuously
Outcome Outcome definition
Disability (EDSS): disability progression defined as EDSS(T1) ‐ EDSS(T0) ≥ 1.0 for EDSS(T0) ≤ 5.5, or if EDSS(T1) ‐ EDSS(T0) ≥ 0.5 for EDSS(T0) > 5.5
Timing of outcome measurement
Time from EDSS_baseline measurement to EDSS_outcome median (IQR): 1.98 years (1.84 years to 2.08 years), time from MEP_baseline measurement to EDSS_outcome median (IQR): 1.99 years (1.87 years to 2.08 years)
Missing data Number of participants with any missing value
3717; unclear exactly how many participants had any missing value
Missing data handling
Mixed: exclusion, complete case (at visit level), and complete‐feature analysis
Analysis Number of participants (number of events)
2502 visits (unit of analysis is visit) of 419 participants (275)
Modelling method
Random forest
Predictor selection method
  • For inclusion in the multivariable model, univariable analysis

  • During multivariable modelling, choosing 1 predictor per cluster in hierarchical clustering, Boruta, top n features with variable importance and cross‐validated performance as criteria (mutual information at univariable level)


Hyperparameter tuning
Unclear, maximum number of features, number of trees and minimum samples for split chosen in cross‐validation
Shrinkage of predictor weights
Modelling method
Performance evaluation dataset
Development
Performance evaluation method
Cross‐validation, 100 grouped stratified shuffle splits within 1000 grouped stratified shuffle splits
Calibration estimate
Calibration plot upon request
Discrimination estimate
c‐Statistic = 0.75 (SD 0.07)
Classification estimate
Not reported
Overall performance
Not reported
Risk groups
Not reported
Model  Model presentation
Not reported
Number of predictors in the model
≤ 9 (unclear subset)
Predictors in the model
Selected predictors unclear, at least latencies, EDSS at T0, age
Effect measure estimates
Not reported
Predictor influence measure
20 highest ranking features across all splits
Validation model update or adjustment
Not applicable
Interpretation  Aim of the study
To investigate whether a machine learning approach that includes extra features from the EPTS can increase the predictive performance of EP in MS (progression in 2 years)
Primary aim
The primary aim of this study is not the prediction of individual outcomes. The focus is on the value of EP features.
Model interpretation
Exploratory
Suggested improvements
Data augmentation to expand the size of the training set, to stabilise the performance estimate, analysing the whole longitudinal trajectory of the patient, to use TS algorithms not included in HCTSA, using short timescale EPTS changes (e.g. 6 months) to predict EDSS changes on longer time‐scales or to detect non‐response to treatment, incorporating the left/right symmetry in a more advanced way, other variables such as MRI, cerebrospinal fluid, and genomic data, evaluation in larger datasets (preferably multicentre), VEP and SEP should be included in prediction process
Notes Applicability overall
High
Applicability overall rationale
Although this study contained a model and its assessment, the main aim was not to create a model for prediction of individual outcomes but was rather to show the added usefulness of EP with extra time series features using machine learning. Additionally, no final model was reported.
Auxiliary references
Fulcher BD, Little MA, Jones NS. Highly comparative time‐series analysis: the empirical structure of time series and their methods. J R Soc Interface 2013;10(83):20130048.
 
Item Authors' judgement Support for judgement
Participants No The data source was routine care, and the exclusion of visits/measurements was dependent on the quality of measurements and the availability of the outcome measurement.
Predictors No Two different machines were used; however, our clinical authors do not find this to be problematic. The predictors were probably assessed without outcome knowledge and are available at prediction model use. However, the disease type variable did not exist in the original data source and was inferred based on other variables only for a subset of patients.
Outcome No Progression was not confirmed. Table 1 of the paper suggested similar rates of worsening across disease subtypes, which is not expected and could be due to the lack of confirmation.
Analysis No The sample size was small relative to the large number of predictors. Exclusion of patients for missing data was addressed in the Participants section, but further exclusions due to missing data seem likely. The analysis was done at the visit level; grouped, stratified internal validation was used to address this, but it is unclear whether this was enough, and the correlation between observations was not addressed in the model fitting. It is also unclear how it would have been addressed had a final model been selected and fitted to the entire dataset. Univariable selection was used. Feature extraction and standardisation were done on the entire dataset instead of within cross‐validation, making data leakage possible. A calibration plot provided upon request showed severe miscalibration. No final model appears to have been selected, fitted, and presented.
Overall No At least one domain is at high risk of bias.
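Yperman 2020 evaluated a random forest at the visit level using grouped, stratified shuffle splits, and the analysis note above flags possible data leakage from standardising features on the entire dataset. A minimal sketch of grouped cross-validation with scikit-learn in which scaling happens inside each training fold via a Pipeline; the data are synthetic, the split is grouped but (unlike the study) not stratified, and the time-series feature extraction of the original study is not reproduced.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GroupShuffleSplit, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: 2502 visits from 419 patients, with far fewer features
# than the roughly 5893 considered in the study.
rng = np.random.default_rng(0)
n_visits, n_features = 2502, 50
X = rng.normal(size=(n_visits, n_features))
y = rng.integers(0, 2, size=n_visits)          # disability progression within 2 years (synthetic)
groups = rng.integers(0, 419, size=n_visits)   # patient ID, so a patient never spans train and test

# Scaling lives inside the Pipeline, so it is refit on each training fold only,
# avoiding the leakage that whole-dataset standardisation would introduce.
model = make_pipeline(StandardScaler(), RandomForestClassifier(n_estimators=200, random_state=0))

cv = GroupShuffleSplit(n_splits=10, test_size=0.2, random_state=0)
auc = cross_val_score(model, X, y, cv=cv, groups=groups, scoring="roc_auc")
print(f"Grouped CV c-statistic: {auc.mean():.2f} (SD {auc.std():.2f})")
```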

Zakharov 2013.

Study characteristics
General information Model name
Not applicable
Primary source
Journal
Data source
Unclear
Study type
Development
Participants Inclusion criteria
Patients with monofocal CIS
Exclusion criteria
Not reported
Recruitment
Department of Neurology and Neurosurgery of the Samara State Medical University and at the Centre for MS at the Samara Regional Clinical Hospital, Russia
Age (years)
Mean 25.1
Sex (%F)
70.0
Disease duration (years)
Not reported
Diagnosis
100% CIS
Diagnostic criteria
Not reported
Treatment
Not reported
Disease description
EDSS ≤ 2
Recruitment period
2004 to 2012
Predictors Considered predictors
Unclear if it is the complete list, age, number of foci, location of foci, size of the demyelination foci
Number of considered predictors
≥ 2 (unclear if complete list)
Timing of predictor measurement
Unclear, at first MRI after CIS onset (timing distribution unknown)
Predictor handling
Not reported
Outcome Outcome definition
Conversion to definite MS (McDonald 2010 (Polman 2011)): development of CDMS defined as the time of the onset of the second attack
Timing of outcome measurement
Follow‐up for 8 years
Missing data Number of participants with any missing value
Not reported
Missing data handling
Not reported
Analysis Number of participants (number of events)
102 (23)
Modelling method
Logistic regression
Predictor selection method
Not reported
Hyperparameter tuning
Not applicable
Shrinkage of predictor weights
None
Performance evaluation dataset
Development
Performance evaluation method
Apparent
Calibration estimate
Not reported
Discrimination estimate
Not reported
Classification estimate
Sensitivity = 0.727, specificity = 0.345
Overall performance
Not reported
Risk groups
Not reported
Model  Model presentation
Not reported
Number of predictors in the model
2
Predictors in the model
Age at disease onset, size of the foci of demyelination
Effect measure estimates
Not reported
Predictor influence measure
Not applicable
Validation model update or adjustment
Not applicable
Interpretation  Aim of the study
To study the clinical and instrumental parameters of the patient population with the first attack of the demyelinating process and involvement of only one functional system, most relevant to the term 'monofocal CIS'
Primary aim
The primary aim of this study is only partly the prediction of individual outcomes. The focus is on the predictors.
Model interpretation
Probably exploratory
Suggested improvements
Increase in the number of variables involved in the model, such variables as immunological indicators and data from neurophysiological methods – multimodal evoked potential
Notes Applicability overall
Unclear
Applicability overall rationale
Due to the lack of reporting on predictors and model timing, applicability is unclear.
 
Item Authors' judgement Support for judgement
Participants Unclear The data source is unclear without an indication of an associated study or registry. There were no detailed exclusion/inclusion criteria other than the diagnosis of monofocal CIS.
Predictors Unclear There are few details on how predictors were assessed or when they were assessed.
Outcome Unclear The definition of a second attack for CDMS is standard and is expected to be measured relatively objectively. It is unclear how much time passed between the predictor assessment and outcome determination because the timing of the predictor measurement is unclear. It is also unclear whether there were regular visits for outcome assessment or whether the outcome was assessed only when and if a patient presented.
Analysis No Although many details of the analysis were not reported, there are clear indicators to assess the risk of bias of this domain as high. EPV is at most 11.5, based on the number of variables in the final model, not the unknown number of variables considered. There was no information on missing data, including censoring, during the 8‐year follow‐up period. The only model performance measures reported were sensitivity and specificity evaluated in the development set. A final model is not presented.
Overall No At least one domain is at high risk of bias.

Zhang 2019.

Study characteristics
General information Model name
Shape
Primary source
Journal
Data source
Cohort, primary
Study type
Development
Participants Inclusion criteria
  • Patients who initially presented with CIS, i.e. showed symptoms suggestive of an inflammatory central nervous disease without fulfilling the 2010 McDonald criteria (Polman 2011) for MS

  • Patients with at least 3 years of follow‐up (or earlier diagnosis of conversion to MS)

  • The presence of a baseline MR scan, including a FLAIR and T1w image


Exclusion criteria
Not reported
Recruitment
Prospectively from a single centre (unclear), Germany
Age (years)
Mean 42.4 (unclear when)
Sex (%F)
69.9
Disease duration (years)
Not reported
Diagnosis
100% CIS
Diagnostic criteria
McDonald 2010 (Polman 2011)
Treatment
  • At recruitment, 1.2% IFN‐b

  • During follow‐up, not reported


Disease description
EDSS median 1
Recruitment period
2009 to 2013
Predictors Considered predictors
Total lesion number, total lesion volume, minimum, maximum, mean, standard deviation for surface area, sphericity, surface‐volume‐ratio, and volume of individual lesions, (other models: minimum, maximum, mean, standard deviation for skewness, kurtosis, entropy of intensity histograms)
Number of considered predictors
30
Timing of predictor measurement
At disease onset (CIS) (during primary clinical work‐up for CIS)
Predictor handling
Continuously (by summary statistics of parameters from multiple lesions within single patients)
Outcome Outcome definition
Conversion to definite MS (McDonald 2010): demonstration of dissemination in time by a clinical relapse or the occurrence of new MRI lesions
Timing of outcome measurement
3 years
Missing data Number of participants with any missing value
2
Missing data handling
Exclusion
Analysis Number of participants (number of events)
84 (66)
Modelling method
Random forest, oblique ‐ linear multivariable model splitting
Predictor selection method
  • For inclusion in the multivariable model, all candidate predictors

  • During multivariable modelling, multiple models


Hyperparameter tuning
The number of variables considered at each node (candidate values: 3, sqrt(number of variables), and 7) and the number of trees (candidate values: 100, 200, and 300) were optimised on out‐of‐bag error during 3‐fold CV; an illustrative sketch of this kind of tuning follows this study's entry.
Shrinkage of predictor weights
Modelling method
Performance evaluation dataset
Development
Performance evaluation method
Cross‐validation, 3‐fold
Calibration estimate
Not reported
Discrimination estimate
Not reported
Classification estimate
Accuracy = 0.85 (95% CI 0.75 to 0.91), sensitivity = 0.94 (95% CI 0.85 to 0.98), specificity = 0.50 (95% CI 0.26 to 0.74), PPV = 0.87 (95% CI 0.81 to 0.91), NPV = 0.69 (95% CI 0.44 to 0.87), DOR = 15.50 (95% CI 3.93 to 60.98), balanced accuracy = 0.72 (posterior probability interval 0.60 to 0.82)
Overall performance
Not reported
Risk groups
Not reported
Model  Model presentation
Not reported
Number of predictors in the model
18
Predictors in the model
Total lesion number, total lesion volume, minimum, maximum, mean, standard deviation for surface area, sphericity, surface‐volume‐ratio, and volume of individual lesions
Effect measure estimates
Not reported
Predictor influence measure
Bootstrapped importance scores
Validation model update or adjustment
Not applicable
Interpretation  Aim of the study
To predict the conversion from CIS to multiple sclerosis (MS) based on the baseline MRI scan by studying image features of these lesions
Primary aim
The primary aim of this study is, at least in part, the prediction of individual outcomes; the focus is on the new MRI features.
Model interpretation
Probably exploratory
Suggested improvements
Independent validation, other features (texture, advanced deep learning, clinical, paraclinical), predict disease course not only conversion
Notes Applicability overall
High
Applicability overall rationale
The predictors used were imaging features and no other predictor domain was considered for use in the model.
Auxiliary references
Filippi M, Preziosa P, Meani A, Ciccarelli O, Mesaros S, Rovira A, et al. Prediction of a multiple sclerosis diagnosis in patients with clinically isolated syndrome using the 2016 MAGNIMS and 2010 McDonald criteria: a retrospective study. Lancet Neurol 2018;17(2):133‐42.
 
Item Authors' judgement Support for judgement
Participants No The data source was described as a cohort, and patients could not have had the outcome at baseline, based on the information collected during follow‐up. However, the required amount of follow‐up was an inclusion criterion, which may introduce a risk of bias.
Predictors Yes The predictors were collected at a single centre, and sensitivity to the lesion extraction method was examined.
Outcome Yes The outcome was based on well‐defined standard diagnostic criteria and we believe it is robust to knowledge of predictor information.
Analysis No The EPV was low. Neither calibration nor discrimination was addressed. Patients were included in the analysis based on the availability of follow‐up, but fewer than 5% of patients were excluded from the analysis because of missing outcome data, which is not expected to increase the risk of bias. The final model was unclear.
Overall No At least one domain is at high risk of bias.
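The tuning reported for Zhang 2019 amounts to a small grid search over the number of candidate variables per split (3, sqrt(p), or 7) and the number of trees (100, 200, or 300), judged by out‐of‐bag error and embedded in the 3‐fold cross‐validation used for performance evaluation. The sketch below illustrates this kind of grid with a standard axis‐aligned random forest from scikit‐learn and synthetic placeholder data; it is not the authors' oblique‐split implementation, and the names X and y are assumptions for illustration only.

```python
# Hedged sketch of the tuning grid described for Zhang 2019: choose the number
# of candidate features per split and the number of trees by out-of-bag (OOB)
# error. Standard scikit-learn forest, not the oblique variant used in the study.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=84, n_features=30, random_state=0)  # placeholder data

best = None
for max_features in [3, "sqrt", 7]:       # 3, sqrt(number of variables), or 7 per node
    for n_trees in [100, 200, 300]:       # forest sizes considered
        rf = RandomForestClassifier(
            n_estimators=n_trees,
            max_features=max_features,
            oob_score=True,               # out-of-bag accuracy on the training data
            random_state=0,
        ).fit(X, y)
        oob_error = 1.0 - rf.oob_score_
        if best is None or oob_error < best[0]:
            best = (oob_error, max_features, n_trees)

print("lowest OOB error %.3f with max_features=%s, n_estimators=%d" % best)
```

Because the same 3‐fold cross‐validation also supplied the reported sensitivity and specificity, a full reimplementation would need to nest this tuning inside each fold to avoid optimistic estimates.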

Zhao 2020.

Study characteristics
General information Model name
  • XGB All

  • XGB Common

  • LGBM All

  • LGBM Common

  • XGB Common Val

  • LGBM Common Val


Primary source
Journal
Data source
Cohort, primary
Study type
Development + validation at a different location (unclear if model refit)
Participants Inclusion criteria
  • XGB All, XGB Common, LGBM All, and LGBM Common:

    • Unclear, for the source population as reported in Gauthier 2006

    • Age ≥ 18 years

    • Definitive diagnosis of MS within the last 3 years whether treated or untreated

    • Adult with a diagnosis of MS meeting 2010 International Panel criteria from all CLIMB participants

    • Recruited into the QOL arm of the CLIMB study (enrolled between 4 May 2000 and 9 March 2013) or participants who had 10 years of follow‐up since first symptom

  • XGB Common Val and LGBM Common Val:

    • Unclear, for the source population as reported in Bove 2018

    • Age between 18 years and 65 years


Exclusion criteria
Not reported
Recruitment
  • XGB All, XGB Common, LGBM All, and LGBM Common: prospectively from the clinical practice of Brigham and Women’s Hospital forming the CLIMB cohort within the SUMMIT consortium, USA

  • XGB Common Val and LGBM Common Val: prospective observational research cohort from San Francisco MS Center at University of California, preferential recruitment of ambulatory participants and those with a recent onset of CDMS (2001 International Panel Diagnostic Criteria) or CIS, forming the EPIC cohort within the SUMMIT consortium, USA


Age (years)
  • XGB All, XGB Common, LGBM All, and LGBM Common: unclear, mean 39.0 (all participant data are for the source population of the selected cohort)

  • XGB Common Val and LGBM Common Val: unclear, mean 42.5 (all participant data are for the source population of the selected cohort)


Sex (%F)
  • XGB All, XGB Common, LGBM All, and LGBM Common: 76.1

  • XGB Common Val and LGBM Common Val: 68.7


Disease duration (years)
  • XGB All, XGB Common, LGBM All, and LGBM Common: median 2.0 (range: 0 to 44)

  • XGB Common Val and LGBM Common Val: median 6.0 (range: 0 to 45)


Diagnosis
  • XGB All, XGB Common, LGBM All, and LGBM Common: unclear, 16.4% CIS, 65.6% RRMS, 2.6% SPMS, 4.3% PPMS

  • XGB Common Val and LGBM Common Val: unclear, 15.9% CIS, 70.8% RRMS, 9.3% SPMS, 3.9% PPMS


Diagnostic criteria
Not reported
Treatment
  • XGB All, XGB Common, LGBM All, and LGBM Common:

    • At recruitment, unclear, for source cohort, 92.6% DMT first line, 0.8% DMT oral, 3.9% DMT high, 0.4% experimental, 2.3% MS other

    • During follow‐up, unclear, for source cohort, treatment at last visit, 34.6% DMT first line, 39.1% DMT oral, 9.6% DMT high, 3.1% experimental, 0.2% immune, 13.3% MS other, 4.5% never on treatment

  • XGB Common Val and LGBM Common Val:

    • At recruitment, unclear, for source cohort, 93.4% DMT first line, 0.7% DMT oral, 1.6% DMT high, 0.7% experimental, 0.5% immune, 1.6% steroid, 1.6% MS other

    • During follow‐up, unclear, for source cohort, treatment at last visit, 57.4% DMT first line, 19.8% DMT oral, 9.6% DMT high, 3% experimental, 0.5% immune, 5% steroid, 4.8% MS other, 15.1% never on treatment


Disease description
  • XGB All, XGB Common, LGBM All, and LGBM Common: unclear, for source cohort, EDSS median (IQR; range): 1.5 (1 to 2; 0 to 7.5)

  • XGB Common Val and LGBM Common Val: unclear, for source cohort, EDSS median (IQR; range): 1.5 (1 to 3; 0 to 7)


Recruitment period
From 2000 onward
Predictors Considered predictors
  • XGB All, XGB Common, LGBM All, and LGBM Common: unclear if it is the complete list, Common: age, ethnicity, longitudinal features, attack in previous 6 months, attack in previous 2 years, bowel–bladder function, brainstem function, cerebellar function, gender, race, disease category, EDSS, lesion volume, mental function, pyramidal function, smoking history, sensory function, total GD, visual function, walk 25 ft time

  • XGB Common Val and LGBM Common Val: not applicable


Number of considered predictors
  • XGB All and LGBM All: 198

  • XGB Common and LGBM Common: ≤ 105 (unclear subset)

  • XGB Common Val and LGBM Common Val: not applicable


Timing of predictor measurement
  • XGB All and LGBM All: at multiple assessments every 6 months from baseline (undefined) to year 2

  • XGB Common and LGBM Common: at multiple assessments every year from baseline (undefined) to year 2

  • XGB Common Val and LGBM Common Val: not applicable


Predictor handling
  • XGB All, XGB Common, LGBM All, and LGBM Common: unclear, probably continuously

  • XGB Common Val and LGBM Common Val: not applicable

Outcome Outcome definition
Disability (EDSS): worsening defined as an increase in EDSS ≥ 1.5
Timing of outcome measurement
Up to 5 years
Missing data Number of participants with any missing value
Not reported
Missing data handling
  • Mixed

    • Exclusion of variables with excessive missing values

    • (Unclear) missing values in the time series: numeric values interpolated/extrapolated linearly using the nearest data points

    • (Unclear) categorical values filled using the mode of existing values in the patient

Analysis Number of participants (number of events)
  • XGB All, XGB Common, LGBM All, and LGBM Common: 724 (165)

  • XGB Common Val and LGBM Common Val: 400 (130)


Modelling method
  • XGB All, and XGB Common: XGBoost

  • LGBM All, and LGBM Common: LightGBM

  • XGB Common Val and LGBM Common Val: not applicable


Predictor selection method
  • XGB All, XGB Common, LGBM All, and LGBM Common:

    • For inclusion in the multivariable model, all candidate predictors

    • During multivariable modelling, full model approach

  • XGB Common Val and LGBM Common Val:

    • For inclusion in the multivariable model, not applicable

    • During multivariable modelling, not applicable


Hyperparameter tuning
  • XGB All, XGB Common, LGBM All, and LGBM Common: unclear; the best cost‐sensitive learning weight was selected using a grid search over a list of weights, chosen on 5‐fold CV AUC; algorithm‐specific parameters were not discussed, although the code suggests some tuning (an illustrative sketch of this kind of nested tuning follows this study's entry)

  • XGB Common Val and LGBM Common Val: not applicable


Shrinkage of predictor weights
  • XGB All, XGB Common, LGBM All, and LGBM Common: modelling method

  • XGB Common Val and LGBM Common Val: not applicable


Performance evaluation dataset
  • XGB All, XGB Common, LGBM All, and LGBM Common: development

  • XGB Common Val and LGBM Common Val: external validation


Performance evaluation method
  • XGB All, XGB Common, LGBM All, and LGBM Common: cross‐validation, 10‐fold (nested)

  • XGB Common Val and LGBM Common Val: unclear if model refit to new data


Calibration estimate
Not reported
Discrimination estimate
  • XGB All, and LGBM All: c‐statistic = 0.78

  • XGB Common, and LGBM Common: c‐statistic = 0.76

  • XGB Common Val and LGBM Common Val: c‐statistic = 0.82


Classification estimate
  • XGB All: cutoff = 0.4 (results also reported for 0.5, 0.45, 0.35, and 0.3), accuracy = 0.74, sensitivity = 0.68, specificity = 0.76

  • XGB Common: cutoff = 0.4 (results also reported for 0.5, 0.45, 0.35, and 0.3), accuracy = 0.65, sensitivity = 0.75, specificity = 0.62

  • LGBM All: cutoff = 0.4 (results also reported for 0.5, 0.45, 0.35, and 0.3), accuracy = 0.77, sensitivity = 0.58, specificity = 0.82

  • LGBM Common: cutoff = 0.4 (results also reported for 0.5, 0.45, 0.35, and 0.3), accuracy = 0.64, sensitivity = 0.75, specificity = 0.61

  • XGB Common Val: cutoff = 0.4 (results also reported for 0.5, 0.45, 0.35, and 0.3), accuracy = 0.68, sensitivity = 0.85, specificity = 0.60

  • LGBM Common Val: cutoff = 0.4 (results also reported for 0.5, 0.45, 0.35, and 0.3), accuracy = 0.73, sensitivity = 0.73, specificity = 0.73


Overall performance
Not reported
Risk groups
Not reported
Model  Model presentation
Not reported
Number of predictors in the model
  • XGB All and LGBM All: 198

  • XGB Common and LGBM Common: ≤ 105 (unclear subset)

  • XGB Common Val and LGBM Common Val: not applicable


Predictors in the model
  • XGB All, XGB Common, LGBM All, and LGBM Common: unclear if it is the complete list, Common: age, ethnicity, longitudinal features, attack in previous 6 months, attack in previous 2 years, bowel–bladder function, brainstem function, cerebellar function, gender, race, disease category, EDSS, lesion volume, mental function, pyramidal function, smoking history, sensory function, total GD, visual function, walk 25 ft time

  • XGB Common Val and LGBM Common Val: not applicable


Effect measure estimates
  • XGB All, XGB Common, LGBM All, and LGBM Common: not reported

  • XGB Common Val and LGBM Common Val: not applicable


Predictor influence measure
  • XGB All and LGBM All: top 10 predictive features

  • XGB Common and LGBM Common: not reported

  • XGB Common Val and LGBM Common Val: not applicable


Validation model update or adjustment
  • XGB All, XGB Common, LGBM All, and LGBM Common: not applicable

  • XGB Common Val and LGBM Common Val: model probably refit

Interpretation  Aim of the study
To apply machine learning techniques to predict the disability level of MS patients at the 5‐year time point using the first 2 years of clinical and neuroimaging longitudinal data
Primary aim
The primary aim of this study is, at least in part, the prediction of individual outcomes; the focus is on machine learning methods.
Model interpretation
Probably exploratory
Suggested improvements
Time series models to better capture the temporal dependencies, incorporate genetic information, and additional biomarkers
Notes Applicability overall
Unclear
Applicability overall rationale
Due to the lack of reporting on the final model or tool, it is unclear whether this study aimed to develop a model/tool for the prediction of individual MS outcomes. Additionally, it is unclear how many participants had already experienced the outcome by 2 years, at which time point the predictors were still being collected.
Auxiliary references
Bove R, Chitnis T, Cree BA, Tintoré M, Naegelin Y, Uitdehaag BM, et al. SUMMIT (serially unified multicenter multiple sclerosis investigation): creating a repository of deeply phenotyped contemporary multiple sclerosis cohorts. Mult Scler 2018;24(11):1485‐98.
Gauthier SA, Glanz B I, Mandel M, Weiner HL. A model for the comprehensive investigation of a chronic autoimmune disease: the multiple sclerosis CLIMB study. Autoimmun Rev 2006;5(8):532‐6.
 
Item Authors' judgement Support for judgement
Participants Unclear The data were prospectively collected from a cohort. Although the references cited in the article report inclusion/exclusion criteria, the number of patients analysed in this article does not match those sources; the inclusion/exclusion criteria actually applied are therefore unclear.
Predictors Unclear Because the data came from cohort studies, the predictors are expected to have been assessed similarly. The intended time of model use is unclear, and predictors from the first 2 years were used to predict the 5‐year outcome.
Outcome No A pre‐specified outcome was probably used, but the EDSS change was not confirmed. We are not concerned that the outcome assessment could be biased by knowledge of the predictors' values. However, it is unclear how many patients had already experienced the outcome by 2 years, while predictors were still being collected.
Analysis No XGB All, XGB Common, LGBM All, and LGBM Common: the number of events per variable was low relative to the unknown number of predictors considered. Calibration was not assessed. Parameter tuning was reported to occur over a grid of values within an inner CV loop, so optimism was probably addressed. The final model was unclear.
XGB Common Val and LGBM Common Val: calibration was not assessed. The model appears to have been refit in the external validation set.
Overall No At least one domain is at high risk of bias.
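The hyperparameter tuning reported for the Zhao 2020 models (a cost‐sensitive class weight chosen by grid search on 5‐fold cross‐validated AUC, with performance then estimated by a 10‐fold outer loop) can be sketched as nested cross‐validation. This is an illustration only: the synthetic data, the weight grid, and the use of xgboost's scale_pos_weight parameter are assumptions rather than the authors' code, and the LightGBM models would follow the same pattern with its equivalent parameter.

```python
# Hedged sketch of the nested tuning described for Zhao 2020: an inner 5-fold
# grid search over a cost-sensitive class weight scored by AUC, wrapped in an
# outer 10-fold loop that estimates the performance of the whole procedure.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, cross_val_score
from xgboost import XGBClassifier

# Placeholder feature matrix with roughly the reported class balance (165/724 events).
X, y = make_classification(n_samples=724, n_features=20, weights=[0.77, 0.23],
                           random_state=0)

inner = GridSearchCV(
    estimator=XGBClassifier(),
    param_grid={"scale_pos_weight": [1, 2, 3, 4, 5]},  # assumed list of weights
    scoring="roc_auc",
    cv=5,                                              # inner 5-fold CV on AUC
)

outer_auc = cross_val_score(inner, X, y, scoring="roc_auc", cv=10)  # outer 10-fold CV
print("nested-CV c-statistic: %.2f (+/- %.2f)" % (outer_auc.mean(), outer_auc.std()))
```

Keeping the weight selection inside the outer folds, as above, is what keeps the reported cross‐validated c‐statistics largely free of tuning optimism.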

25FW (also seen as T25FW): timed 25‐foot walk
2D: 2‐dimensional
3D: 3‐dimensional
3:4‐DAP: 3,4‐diaminopyridine
4‐AP: 4‐aminopyridine
9‐HPT (also seen as 9HPT): 9‐hole peg test
ABILHAND: interview‐based assessment of a patient‐reported measure of the perceived difficulty in using their hand to perform manual activities
ACTH: adrenocorticotropic hormone
Ada (AdaBoost): adaptive boosting
ADL: activities of daily living
AH: abductor hallucis
AIC: Akaike information criterion
AISM: Italian Multiple Sclerosis Society
APB: abductor pollicis brevis
AUC: area under the curve
BCVA: best corrected visual acuity
BENEFIT: Betaferon/Betaseron in Newly Emerging MS for Initial Treatment
BIC: Bayesian information criterion
BMA: Bayesian model averaging
BMS: benign MS
BPF: brain parenchymal fraction
BPTF: Bayesian probabilistic tensor factorisation
BREMS: Bayesian Risk Estimate for Multiple Sclerosis
BREMSO: Bayesian Risk Estimate for Multiple Sclerosis at Onset
CAO: clinician‐assessed outcomes
CCA: current course assignment
CCV: cerebral cortical volume
CDMS: clinically definite multiple sclerosis
CEL: contrast‐enhancing lesion
CI: confidence interval
CIS: clinically isolated syndrome
CL: cortical lesion
CLCN4: chloride voltage‐gated channel 4
CLIMB: Comprehensive Longitudinal Investigation of Multiple Sclerosis at Brigham and Women's
CMCT: central motor conduction time
CombiWISE: Combinatorial Weight‐adjusted Disability Score
COMRIS‐CTD: Combinatorial MRI scale of CNS tissue destruction
CNN: convolutional neural network
CNS: central nervous system
COPOUSEP: Corticothérapie Orale dans les Poussées de Sclérose en Plaques
CPT: current procedural terminology
CSF: cerebrospinal fluid
CT: computed tomography
CTh: cortical thickness
CUIs: concept unique identifiers
CXCL13: chemokine ligand 13
CV: cross‐validation
DAWM: diffusely abnormal white matter 
Dev: development
df: degrees of freedom
Dgm: deep grey matter
DIR: double inversion recovery
DIS: dissemination in space
DIT2010: dissemination in time according to McDonald 2010 criteria
DMD: disease‐modifying drug
DMT: disease‐modifying treatment
DNA: deoxyribonucleic acid
DSS: Disability Status Scale
DT: decision time; decision tree
EDSS: expanded disability status scale
EDT: Euclidean distance transform
EHR: electronic health record
EP: evoked potential
EPIC: expression, proteomics, imaging, clinical
EPTS: evoked potential time series
EPV: events per variable
Ext: external
F: female
F1: F‐score
FCA: future course assignment
FLAIR: fluid‐attenuated inversion recovery
FLP: first level predictor
FREEDOMS: FTY720 Research Evaluating Effects of Daily Oral therapy in Multiple Sclerosis
FS: functional systems
FTP: fine tuning predictor
GA: glatiramer acetate
Gd (also seen as “GD”): gadolinium
Gd‐DTPA: gadolinium diethylenetriamine penta‐acetic acid
GM: grey matter
GRU‐ODE‐Bayes: Gated Recurrent Unit‐Ordinary Differential Equation‐Bayes
HLA: human leukocyte antigen
HR: hazard ratio
ICBM‐DTI: International Consortium of Brain Mapping diffusion tensor imaging
ICD: International Classification of Disease
IFN: interferon
IgG: immunoglobulin G
IL2: interleukin‐2
ILIRN: interleukin‐1 receptor antagonist
IQR: interquartile range
JHU‐MNI: Johns Hopkins University‐Montreal Neurological Institute
KFSS: Kurtzke Functional Systems Scores
LASSO: least absolute shrinkage and selection operator
LGBM: light gradient‐boosting machine
logMAR: logarithm of the minimum angle of resolution
LOO: leave‐one‐out
LOOCV: leave‐one‐out cross‐validation
LR: logistic regression
LSTM: long short‐term memory
MAGNIMS: Magnetic Resonance Imaging in MS
MEP (also seen as “mEPS”): motor evoked potentials
mEPS: motor evoked potentials
MF: motor function
MFIS: Modified Fatigue Impact Scale
ML: machine learning
MNI: Montreal Neurological Institute
MPI: multifactorial prognostic index
MR: magnetic resonance
MRI: magnetic resonance imaging
MS: multiple sclerosis
MSBASIS: MSBase Incident Study
MS‐DSS: MS disease severity scale
MSE: mean squared error
MSFC: multiple sclerosis functional composite
MSPS: multiple sclerosis prediction score
MSSS: MS severity score
MT/MTR: magnetisation transfer imaging
NA: not applicable
NDH‐9HPT: non‐dominant hand 9‐hole peg test
NEMO: network modification tool
NF‐L: neurofilament light chain level
NHPT: nine‐hole peg test
NMO: neuromyelitis optica
NMOSD: neuromyelitis optica spectrum disorder
NN: neural network
NPV: negative predictive value
NR: not reported
NR2Y: number of relapses experienced in the first 2 years after MS onset
O:E: observed to expected ratio
OB: oligoclonal bands
OCB: oligoclonal bands
OCT: optical coherence tomography
OFSEP: Observatoire Français de la Sclérose en Plaques
ON: optic neuritis
OND: other neurologic disease
OR: odds ratio
PASAT: Paced Auditory Serial Addition Test
PBMC: peripheral blood mononuclear cells
PD: patient‐determined
PDCD2: human programmed cell death‐2 gene
Pdw: proton density‐weighted
PP: primary progressive
PPMS: primary progressive MS
PPV: positive predictive value
PRIMS: pregnancy in MS
PRO: patient‐reported outcome
PSIR: phase‐shifted inversion recovery
QOL: quality of life
RCT: randomised controlled trial
RF: random forest
RH: relapse history
RNA: ribonucleic acid
RNRL: retinal nerve fibre layer
ROC: receiver operating characteristic
ROI: region of interest
RR: relapsing‐remitting
RRMS: relapsing‐remitting multiple sclerosis
RT‐PCR: reverse transcription polymerase chain reaction
SCL: spinal CL
SD: standard deviation
SDMT: symbol digits modality test
SE: standard error
SF‐36: 36‐Item Short Form Survey
SMS: severe multiple sclerosis
SMSreg: Swedish MS registry
SNP: single nucleotide polymorphism
SNRS: Scripps neurological rating scale
SP: secondary progression
SPMS: secondary progressive multiple sclerosis
SUMMIT: Serially Unified Multicenter Multiple Sclerosis Investigation
SVM: support vector machine
T1c: T1‐weighted pre‐contrast
T1p: T1‐weighted post‐contrast
T2LV (also seen as “T2 LV”): T2 lesion volume
T2w: T2‐weighted
TT2R: time to second relapse
TWT: timed walk test
Val: validation
VFT: visual function test
WBC: white blood cell
WM: white matter
XGB: extreme gradient boosting

Characteristics of excluded studies [ordered by study ID]

Study Reason for exclusion
Achiron 2006 Ineligible model: there is no multivariable prognostic prediction model in this study; rather, it describes EDSS evolution in a cohort and compares it with that of new patients.
Ahlbrecht 2016 Ineligible study type: the objective of this study is to assess associations between microRNAs detected in the CSF and conversion from CIS to RRMS. No model is developed or validated for prognostic prediction.
Andersen 2015 Ineligible study type: the aim of this study is not to create a prognostic prediction model but to describe the natural history of the disease.
Azevedo 2019 Ineligible study type: the aim of this conference abstract is to identify minimum clinically meaningful differences in brain atrophy rather than using multivariable models for prediction of future outcomes in individuals.
Barkhof 1997 Ineligible model: this is a count‐score study. Multivariable logistic regression is used only for predictor selection; the abnormal variables are then counted, and a univariable logistic regression on that count is used to derive the predicted risk.
Brettschneider 2006 Ineligible model: this study aims to assess whether cerebrospinal fluid biomarkers can improve diagnostic criteria for prediction of conversion from CIS to CDMS. However, no statistically developed multivariable models are used for predicting future conversion in individual patients; rather, diagnostic criteria are combined.
Bsteh 2021 Ineligible model: the models aim to predict outcomes after treatment withdrawal, which can be considered treatment response.
Castellaro 2015 Ineligible study type: based on the presented aim, results, and conclusion, the aim of this conference abstract is to show that specific brain measures are predictive of conversion.
Chalkou 2021 Ineligible model: the objective of this study is treatment effect prediction, and the prognostic model is only a step towards that goal.
Costa 2017 Ineligible study type: this poster presents a prognostic factor study that aims to investigate the prognostic role of different biomarkers.
Cutter 2014 Ineligible study type: the objective of this poster presentation is not prognostic prediction but an indirect comparison of different treatment regimens.
Damasceno 2019 Ineligible study type: the aim of this study is to analyse cognitive trajectories using longitudinal models. Hence, there is no prediction of outcomes in individuals.
Daumer 2007 Ineligible model: there is no prediction in this study. A matching algorithm based on similarity is used, which is followed by a description of the data.
Dekker 2019 Ineligible study type: the objective of this study is to show the predictive value of brain measures. The multivariable models fit in the study are not used for predictions and are not interpreted as prognostic models in the discussion section.
Esposito 2011 Ineligible outcome: the outcome, classification of lesions as normal or abnormal, is not a clinical outcome.
Filippi 2010 Ineligible study type: the objective of this conference abstract is to develop diagnostic criteria for MS.
Filippi 2013 Ineligible study type: the objective of this study is to identify MRI predictors. Although random forests are used, it is to assess the importance of predictors for future outcomes.
Fuchs 2021 Ineligible study type: the aim of this study is to compare the use of imaging features extracted from routine clinical data with modified methods to those collected according to research standards.
Gasperini 2021 Ineligible model: the developed score is not statistically derived, and it is unclear whether the aim of the study is the prediction of treatment responses.
Gomez‐Gonzalez 2010 Ineligible study type: the aim of this study is to demonstrate the use of an automated tool for oligoclonal band analysis and to show that the information extracted relates to patient subgroups.
Hakansson 2017 Ineligible study type: the objective is to search for prognostic markers in CSF, and there is no individual‐level prediction with a multivariable model.
Ho 2013 Ineligible population: this study aims to predict the risk of MS diagnosis in the general population, not the risk of future MS outcomes.
Ignatova 2018 Ineligible study type: the aim of this study is to find predictors of progression.
Invernizzi 2011 Ineligible model: this study investigates the prognostic value of an evoked potentials score, which is not a multivariable model for prognostic prediction.
Jackson 2020 Ineligible timing: the outcome in this study is based on cross‐sectionally collected data, precluding prognostic prediction.
Kalincik 2013 Ineligible study type: the stated aim of this study is to evaluate associations between genetic susceptibility markers and MS phenotypes.
Leocani 2017 Ineligible study type: the objective of this conference abstract is to demonstrate the prognostic value of evoked potentials, not prognostic prediction.
Morelli 2020 Ineligible study type: the objective of this study is to show the predictive value of putamen hypertrophy with cognitive impairment instead of prognostic prediction.
Palace 2013 Ineligible study type: the objective of this study is to assess cost‐effectiveness.
Pappalardo 2020 Ineligible outcome: the study looks at endpoints that are unrelated to relapse, disability, or conversion to a more advanced disease stage.
Petrou 2018 Ineligible study type: the aim of this conference abstract is to assess correlations between biomarkers and clinical outcomes. Also, the study contains no multivariable prognostic model for the prediction of future outcomes but rather assesses a non‐statistical combination of two biomarkers.
Preziosa 2015 Ineligible study type: this study aims to show the value of MRI measures and uses a multivariable model to this end.
Rajda 2019 Ineligible population: at the moment of prognostication, the included people had not yet been diagnosed, and the outcome is the differentiation of people with MS from controls.
Rio 2019 Ineligible model: this conference presentation compares different treatment response scores and a count score, which is not a multivariable model, with the stated intention of treatment response prediction.
Rodriguez 2012 Ineligible study type: the aim is to apply a novel model to an MS dataset. The focus is not clinical prediction but demonstration of the methodology.
Rothman 2016 Ineligible study type: in this study, multivariable models are used to assess the association between retinal measurements, visual function, and future disease disability rather than predicting individual outcomes.
Roura 2018 Ineligible timing: this study aims to evaluate the longitudinal changes in brain fractal geometry and its association with disability worsening. Correspondence with the authors has confirmed that the models are not predicting future outcomes but current states.
Sbardella 2011 Ineligible study type: the objective of this study is to demonstrate the predictive value of diffuse brain damage as opposed to prognostic prediction.
Schlaeger 2012 Ineligible study type: the objective of this study is to demonstrate the predictive value of evoked potentials.
Srinivasan 2020 Ineligible outcome: in this abstract, the presented outcomes (QoL, fatigue, depression, falls) are not related to clinical disability with respect to the definition we are using in our review.
Tintore 2015 Ineligible model: this conference presentation contains no prediction but rather categorisation into different groups by analysis of time‐to‐event data and description of these groups' characteristics.
Tomassini 2019 Ineligible model: this is a count‐score. In this study, Cox regression is used to select predictors that are later counted to give a discrete score. This score is used in a univariate model as a factor to report risk stratification.
Tossberg 2013 Ineligible model: the model in this study is not used for prognostic prediction but for diagnostic purposes. The developed score for diagnostic purposes is used for prognostic prediction only in those that convert to MS.
Uher 2017a Ineligible model: in this study, multivariable models are used to select adjusted predictors, followed by counting the positive predictors to create a score.
Uher 2017b Ineligible study type: the purpose is not prognostic prediction but to demonstrate the concurrent predictive value of MRI measures on cognitive impairment.
Veloso 2014 Ineligible model: this publication presents a simulation interface based on previously published studies, most relevant to our review being BREMS, but does not perform any new prediction and only describes or reports correlations for the included study participants.
Vukusic 2006 Ineligible study type: unrelated review
Wahid 2019 Ineligible model: the two models in this conference abstract are not longitudinal in nature, and the only longitudinal model is presented as a treatment response prediction tool.
Zephir 2009 Ineligible study type: the objective of this study is to demonstrate the usefulness of IgG as a biomarker of pathology.
Ziemssen 2019 Ineligible outcome: in this poster presentation, the objective is differentiating between the relapsing and progressing diagnoses instead of prognostic prediction.

BREMS: Bayesian Risk Estimate for Multiple Sclerosis
CDMS: clinically definite multiple sclerosis
CIS: clinically isolated syndrome
CSF: cerebrospinal fluid
EDSS: Expanded Disability Status Scale
IgG: immunoglobulin G
MRI: magnetic resonance imaging
MS: multiple sclerosis
QoL: quality of life
RNA: ribonucleic acid
RRMS: relapsing–remitting multiple sclerosis

Characteristics of studies awaiting classification [ordered by study ID]

Achiron 2007.

General information Reason for awaiting classification
It is unclear whether the study design is longitudinal or whether the sampling is performed at the same time as the outcome assessment.
Model name
Not reported
Primary source
Journal
Data source
Cohort, primary
Study type
Development
Participants Inclusion criteria
  • RRMS

  • Participants with good outcome (no deterioration in neurological disability, no relapse during 2‐year follow‐up) or poor outcome (EDSS score change ≥ 0.5)


Exclusion criteria
Not reported
Recruitment
Israel
Age (years)
Mean 43
Sex (%F)
69.8
Disease duration (years)
Mean 10.5 (pooled SD 2.4)
Diagnosis
100% RRMS
Diagnostic criteria
Not reported
Treatment
100% on interferon beta‐1a
Disease description
(For the source population including all outcomes) EDSS (unclear if mean and SD): development 2.0 (1.0), validation 2.5 (0.2); mean annualised relapse rate: 1.1
Recruitment period
Not reported
Predictors Considered predictors
PBMC RNA microarray analysis of gene transcripts
Outcome Outcome definition
Composite (includes relapse and scores (EDSS)): good outcome defined as no deterioration in neurological disability and no relapse, poor outcome as EDSS score change ≥ 0.5 that needed to be confirmed at 3 months during 2‐year follow‐up
Timing of outcome measurement
Unclear, follow‐up for 2 years
Analysis Number of participants (number of events)
56 (unclear how many events in the validation set, ≥ 9)
Modelling method
Support vector machine
Performance evaluation dataset
Development
Performance evaluation method
Random split
Calibration estimate
Not reported
Discrimination estimate
Not reported
Classification estimate
Classification rate = 88.9%
Predictors in the model
34 gene transcripts from the following 29 genes: ADD1, CA11, CCL17, CD44, COL11A2, CRYGD, DNM1, DR1, GNMT, GPP3, GSTA1, HAB1, HSPA8, IGLJ3, IGLVJ, IL3RA, KIAA0980, KLF4, KLK1, MUC4, NY‐REN‐24, ODZ2, PTN, RRN3, S100B, TCRBV, TOP3B, TPSB2, VEGFB
Interpretation Aim of the study
To evaluate whether gene expression profiling can differentiate RRMS patients according to their clinical course – either favourable or poor
Notes

Behling 2019.

General information Reason for awaiting classification
Conference abstract
Model name
Not reported
Primary source
Abstract
Data source
Mixed (routine care, claims), secondary
Study type
Development
Participants Inclusion criteria
  • MS patients treated with a DMT prior to 31 December 2017 (index date)

  • No evidence of relapse in the 30 days prior to the index date


Exclusion criteria
Not reported
Recruitment
Patients from a variety of provider practice types across the USA included in the OM1 Data cloud, USA
Age (years)
Median 54
Sex (%F)
Not reported
Disease duration (years)
Not reported
Diagnosis
Not reported
Diagnostic criteria
Not reported
Treatment
  • At recruitment: unclear

  • During follow‐up: 100% on DMT


Disease description
Not reported
Recruitment period
2015 to 2018
Predictors Considered predictors
Probably not the complete list, most significant predictors: the number of relapses in the previous 12 months, antiemetic medication use, skeletal muscle relaxants, MS‐related fatigue symptoms
Outcome Outcome definition
Relapse: MS‐related inpatient stay, emergency room visit, or outpatient visit with documented MS and a corticosteroid prescription within 7 days
Timing of outcome measurement
Within 6 months after the index date
Analysis Number of participants (number of events)
18,137 (number of events unclear; approximately 1415, calculated from the reported event rate)
Modelling method
Random forest
Performance evaluation dataset
Development
Performance evaluation method
Random split, 80% training, 20% test
Calibration estimate
Not reported
Discrimination estimate
c‐Statistic > 0.70
Classification estimate
Cutoff determined from data, PPV = 0.203, 1‐NPV = 0.058; unclear whether other reported measure (0.84) is accuracy or sensitivity
Predictors in the model
Probably not the complete list, most significant predictors: the number of relapses in the previous 12 months, antiemetic medication use, skeletal muscle relaxants, MS‐related fatigue symptoms
Interpretation Aim of the study
To use advanced analytics to predict relapses amongst MS patients treated with DMTs identified from a large, representative database of linked EMR and claims data
Notes

Castellazzi 2019.

General information Reason for awaiting classification
The age range of the included patients is not reported, and it is unclear whether the objective is the development of a diagnostic or prognostic model.
Model name
Not reported
Primary source
Poster
Data source
Not reported
Study type
Development
Participants Inclusion criteria
RRMS patients and healthy controls for developing the classifier, and CIS patients for applying it
Exclusion criteria
Not reported
Recruitment
Not reported
Age (years)
Not reported
Sex (%F)
Not reported
Disease duration (years)
Not reported
Diagnosis
39.6% CIS, 30.2% RRMS, and 30.2% healthy controls
Diagnostic criteria
McDonald (undefined)
Treatment
Not reported
Disease description
EDSS (unclear if mean and SD): CIS 1.3 (0.8), RRMS 1.7 (1.2)
Recruitment period
Not reported
Predictors Considered predictors
Thresholded and processed cross‐correlation matrix of mean rs‐fMRI signals of parcellated preprocessed rs‐fMRI images
Outcome Outcome definition
Conversion to definite MS (McDonald, undefined): RRMS
Timing of outcome measurement
12 months
Analysis Number of participants (number of events)
106 (unclear how many events in the prediction group of CIS patients, ≥ 32)
Modelling method
  • Multiple models

  • Support vector machine


Performance evaluation dataset
External validation
Performance evaluation method
Model developed to differentiate healthy controls from RRMS is used to predict RRMS conversion in CIS patients
Calibration estimate
Not reported
Discrimination estimate
Not reported
Classification estimate
Accuracy = 69% (SVM), 56% (ANFIS)
Predictors in the model
10 features from 10 distinct AAL (automated anatomical labelling) atlas areas, including the cuneus, pallidum, calcarine cortex, fusiform gyrus, cerebellar lobules 6/7b/8, supplementary motor area, and superior/middle occipital gyri
Interpretation Aim of the study
To predict the conversion to RRMS in participants with CIS
Notes

Chaar 2019.

General information Reason for awaiting classification
The time points used in the model are not reported, and it is unclear whether the design was longitudinal in nature. Also, the age range of included patients is not described.
Model name
Not reported
Primary source
Abstract
Data source
Cohort, primary
Study type
Development
Participants Inclusion criteria
Patients on fingolimod
Exclusion criteria
Not reported
Recruitment
Not reported
Age (years)
Not reported
Sex (%F)
Not reported
Disease duration (years)
Not reported
Diagnosis
Not reported
Diagnostic criteria
Not reported
Treatment
100% on fingolimod
Disease description
Not reported
Recruitment period
Not reported
Predictors Considered predictors
Unclear features extracted from magnetic resonance imaging, magnetic resonance spectroscopy, magnetisation transfer ratio, diffusion tensor imaging, and optical coherence tomography
Outcome Outcome definition
Disability (EDSS)
Timing of outcome measurement
Unclear, 3 time points at 1‐year intervals
Analysis Number of participants (number of events)
Unclear unit of analysis: 50 participants contributing 135 time points (number of events not reported)
Modelling method
Neural network, single hidden‐layered feed‐forward ANN with Bayesian regularisation
Performance evaluation dataset
Development
Performance evaluation method
Random split, 85% training, 15% test
Calibration estimate
Not reported
Discrimination estimate
Not reported
Classification estimate
Mean squared error = 1.213, accuracy = 77.9%
Predictors in the model
Unclear, possibly non‐tabular data from MRI, MRS, MTR, DTI, and OCT
Interpretation Aim of the study
To predict the clinical disability based on multiple important imaging biomarkers
Notes

Dalla Costa 2014.

General information Reason for awaiting classification
It is unclear whether all the predictors are used to predict an outcome in the future or concurrent to the predictor measurement.
Model name
Not reported
Primary source
Abstract
Data source
Not reported
Study type
Development
Participants Inclusion criteria
Admission within 3 months of the onset of a CIS
Exclusion criteria
Not reported
Recruitment
Patients admitted to the San Raffaele Hospital, Neurological Department, Italy
Age (years)
Not reported
Sex (%F)
Not reported
Disease duration (years)
Not reported
Diagnosis
100% CIS
Diagnostic criteria
Not reported
Treatment
Not reported
Disease description
Not reported
Recruitment period
Not reported
Predictors Considered predictors
Unclear features from clinical data as well as MRI, multimodal EP, and CSF data
Outcome Outcome definition
Conversion to definite MS
Timing of outcome measurement
Unclear, follow‐up mean 6.82 (SD 2.78)
Analysis Number of participants (number of events)
227 (120)
Modelling method
Neural network, multilayer perceptron with a back propagation algorithm
Performance evaluation dataset
Development
Performance evaluation method
Random split, 80% training, 20% validation
Calibration estimate
Not reported
Discrimination estimate
Not reported
Classification estimate
Accuracy = 87%
Predictors in the model
Clinical, MRI, CSF, and EP data
Interpretation Aim of the study
To develop an ANN‐based diagnostic model integrating both clinical and paraclinical baseline data
Notes

Ghosh 2009.

General information Reason for awaiting classification
The age range of included patients is not reported. It is not clear whether individual prediction occurred. Also, the multivariable nature of the model cannot be determined from the limited information.
Model name
Not reported
Primary source
Abstract
Data source
Randomised trial participants, secondary
Study type
Development
Participants Inclusion criteria
  • RRMS

  • Complete information on MRI, on‐study relapses, and baseline covariates


Exclusion criteria
Not reported
Recruitment
Ian McDonald database
Age (years)
Mean 27.9 (at onset)
Sex (%F)
Not reported
Disease duration (years)
Mean 7.5
Diagnosis
100% RRMS
Diagnostic criteria
Not reported
Treatment
Not reported
Disease description
EDSS mean 3
Recruitment period
Not reported
Predictors Considered predictors
Number of Gd‐enhancing lesions, T2 lesion volume
Outcome Outcome definition
Relapse
Timing of outcome measurement
Unclear, follow‐up ≤ 129 weeks
Analysis Number of participants (number of events)
108 (58)
Modelling method
Joint longitudinal model, 3 models connected via random effects, parameter estimates by Markov chain Monte Carlo
Performance evaluation dataset
Not reported
Performance evaluation method
Not reported
Calibration estimate
Not reported
Discrimination estimate
Not reported
Classification estimate
Not reported
Predictors in the model
Not reported
Interpretation Aim of the study
To establish a model that allows the prediction of occurrence of relapses by including longitudinal information on the number of Gd‐enhancing lesions and T2 lesion volume simultaneously
Notes

Kister 2015.

General information Reason for awaiting classification
Conference abstract
Model name
Not reported
Primary source
Poster
Data source
Secondary
  • Dev: registry

  • Val: routine care


Study type
Development and validation at a different location (unclear whether predictions adapted)
Participants Inclusion criteria
  • Dev:

    • Age ≥ 19 years (unclear)

    • Diagnosis of MS

    • Completed disability self‐assessment at enrolment and at 2 years and 5 years of follow‐up

  • Val:

    • MS patients

    • 2 or more PDDS scores recorded more than 6 months apart, unless there was a relapse within 3 months of the clinic visit


Exclusion criteria
Not reported
Recruitment
  • Dev:

    • NARCOMS Registry

    • USA, Canada (unclear)

  • Val:

    • Consecutive patients at 2 outpatient MS centres in the greater New York area

    • USA


Age (years)
  • Dev: median 47.1

  • Val: mean 45.5


Sex (%F)
  • Dev: 79.8

  • Val: 73.7


Disease duration (years)
  • Dev: not reported

  • Val: mean 12 (SD 8.7)


Diagnosis
  • Dev: not reported

  • Val: 80.54% RRMS, 10.65% SPMS, 4.09% PPMS, 0.75% PRMS, 3.97% other


Diagnostic criteria
Not reported
Treatment
  • Dev: not reported

  • Val: 80.4% on DMT


Disease description
Not reported
Recruitment period
Not reported
Predictors Considered predictors
Gender, age, baseline P‐MSSS
Outcome Outcome definition
  • Dev: disability (P‐MSSS) – aggressive disease defined as worse disability than in 5/6 patients with the same disease duration (P‐MSSS > 0.83); P‐MSSS is PDDS rank‐adjusted by disease duration

  • Val: disability (P‐MSSS) – severe MS defined as P‐MSSS > 0.83


Timing of outcome measurement
  • Dev: follow‐up for 2 years

  • Val: follow‐up mean (range): 10.6 months (6 months to 17 months)

Analysis Number of participants (number of events)
  • Dev: 2364 (not reported)

  • Val: 930 (80)


Modelling method
Logistic regression
Performance evaluation dataset
  • Dev: development

  • Val: external validation


Performance evaluation method
  • Dev: apparent

  • Val: unclear whether development data are also used for validation


Calibration estimate
  • Dev: calibration slope = 0.992, calibration intercept = −0.008 (an illustrative sketch of how these quantities can be computed follows this study's entry)

  • Val: not reported


Discrimination estimate
  • Dev: c‐statistic = 0.925

  • Val: not reported


Classification estimate
  • Dev: cutoff (0.296) chosen to give a positive predictive value of 50% for severe MS; sensitivity = 0.77, specificity = 0.90

  • Val: sensitivity = 0.90, specificity = 0.91, PPV = 0.49, NPV = 0.99


Predictors in the model
Gender, age, P‐MSSS
Interpretation Aim of the study
  • Dev: to develop and internally validate a logistic regression model that uses patients' gender, age, and baseline P‐MSSS as predictor variables to estimate the probability of aggressive MS 2 years later

  • Val: to determine short‐term stability of P‐MSSS in MS clinic patients and to explore the utility of the newly developed P‐MSSS‐based risk calculator for this population

Notes Auxiliary references
Charlson R, Herbert J, Kister I. CME/CNE article: severity grading in multiple sclerosis: a proposal. Int J MS Care 2016;18(5):265‐70.
Kister I, Chamot E, Salter AR, Cutter GR, Bacon TE, Herbert J. Disability in multiple sclerosis: a reference for patients and clinicians. Neurology 2013;80(11):1018‐24.
Kister I, Bacon TE, Cutter GR. Short‐term disability progression in two multiethnic multiple sclerosis centers in the treatment era. Ther Adv Neurol Disord 2018;11:1756286418793613.
Kister I, Kantarci OH. Multiple sclerosis severity score: concept and applications. Mult Scler 2020;26(5):548‐53.
Learmonth YC, Motl RW, Sandroff BM, Pula JH, Cadavid D. Validation of patient determined disease steps (PDDS) scale scores in persons with multiple sclerosis. BMC Neurol 2013;13:37.
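The calibration slope and intercept reported for the development data are typically obtained by regressing the observed outcome on the linear predictor, that is, the logit of the predicted probabilities: the slope is that regression coefficient (1 indicates neither over‐ nor under‐fitting) and the intercept summarises calibration‐in‐the‐large. A minimal sketch follows, assuming arrays y (0/1 outcomes) and p_hat (predicted probabilities) as placeholder inputs; it does not use the NARCOMS data.

```python
# Hedged sketch: calibration intercept and slope from predicted probabilities.
# The slope is the coefficient of the linear predictor (logit of p_hat) in a
# logistic regression of the observed outcome on that linear predictor; some
# authors instead estimate the intercept with the slope fixed at 1 (offset model).
import numpy as np
import statsmodels.api as sm

def calibration_intercept_slope(y, p_hat):
    """y: 0/1 outcomes; p_hat: predicted probabilities (placeholder inputs)."""
    p_hat = np.clip(p_hat, 1e-6, 1 - 1e-6)
    lp = np.log(p_hat / (1 - p_hat))                 # linear predictor (logit scale)
    fit = sm.Logit(y, sm.add_constant(lp)).fit(disp=0)
    intercept, slope = fit.params
    return intercept, slope

# Toy check with perfectly calibrated data: slope near 1, intercept near 0.
rng = np.random.default_rng(0)
p_hat = rng.uniform(0.05, 0.95, size=500)
y = rng.binomial(1, p_hat)
print(calibration_intercept_slope(y, p_hat))
```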

Mallucci 2019.

General information Reason for awaiting classification
Conference abstract
Model name
Not reported
Primary source
Abstract
Data source
Not reported
Study type
Development
Participants Inclusion criteria
CIS patients
Exclusion criteria
Not reported
Recruitment
Not reported
Age (years)
Median 32.3
Sex (%F)
65.6
Disease duration (years)
Unclear, upper limit 1 year
Diagnosis
100% CIS
Diagnostic criteria
Not reported
Treatment
  • At recruitment: 29.4% on DMT

  • During follow‐up: not reported


Disease description
Not reported
Recruitment period
Not reported
Predictors Considered predictors
Not reported
Outcome Outcome definition
Composite (includes symptoms, disability): no evidence of disease activity (NEDA3) status in which NEDA3 maintenance is defined by no relapses, no disability progression, and no MRI activity
Timing of outcome measurement
Unclear, 12 months
Analysis Number of participants (number of events)
279 (not reported)
Modelling method
Logistic regression, Bayesian
Performance evaluation dataset
Development
Performance evaluation method
Apparent
Calibration estimate
Not reported
Discrimination estimate
c‐Statistic = 0.83
Classification estimate
  • Accuracy = 0.77 (95% CI 0.70 to 0.83)

  • Sensitivity = 0.69, specificity = 0.82


Predictors in the model
Age, onset with optic neuritis, abnormal upper sensory EPs, abnormal visual EPs, therapy with DMD
Interpretation Aim of the study
To define a prognostic model for the early forecast of losing NEDA3 status (no relapses, no disability progression, no MRI activity) in CIS patients within 12 months from disease onset
Notes

Medin 2016.

General information Reason for awaiting classification
Conference abstract
Model name
Composite
Primary source
Poster
Data source
Routine care, secondary
Study type
Development
Participants Inclusion criteria
  • Confirmed diagnosis of RRMS

  • At least 12 months of follow‐up data post‐index date

  • Non‐missing baseline EDSS score

  • Receiving BRACE (interferons, glatiramer acetate) therapy

  • Subgroup analyses were performed for each possible combination of therapy (BRACE continued, BRACE to BRACE, BRACE to first line, BRACE to second line), with inclusion/exclusion criteria applied as appropriate


Exclusion criteria
Not reported
Recruitment
Neuro Trans Data, a group of neurology practices, Germany
Age (years)
Not reported
Sex (%F)
Not reported
Disease duration (years)
Not reported
Diagnosis
100% RRMS
Diagnostic criteria
Not reported
Treatment
  • At recruitment, 100% on interferon or glatiramer acetate

  • During follow‐up, 100% on DMT


Disease description
Not reported
Recruitment period
2010 to 2015
Predictors Considered predictors
Unclear whether it is the complete list, demographics (born in Central Europe, aged < 30 years at index date, aged ≥ 30 years and < 40 years at index date), diagnostic history, treatment (fingolimod was available at the index date, teriflunomide was available at the index date), disability status (EDSS score of 0 earlier than 360 days prior to index date), disability history (at least one relapse in the 180 days to 360 days prior to index date, at least one relapse in the 360 days to 720 days prior to index date), cranial and spinal lesion count
Outcome Outcome definition
Relapse: a binary outcome over the 12‐month follow‐up period; a relapse was defined as a patient‐reported or objectively observed event typical of an acute inflammatory demyelinating event in the central nervous system, current or historical, lasting at least 24 hours, in the absence of fever or infection
Timing of outcome measurement
12 months; the period is randomly chosen
Analysis Number of participants (number of events)
4129 (751 or 752, calculated from reported event rate)
Modelling method
Logistic regression, elastic net
Performance evaluation dataset
Development
Performance evaluation method
Cross‐validation, k‐fold
Calibration estimate
Quintiles of predicted probability of relapse vs actual relapse rate (an illustrative sketch of this grouping follows this study's entry)
Discrimination estimate
c‐Statistic = 0.69 (95% CI 0.67 to 0.71)
Classification estimate
Not reported
Predictors in the model
Whether the patient experienced at least 1 relapse in the 180 days to 360 days prior to index date, whether the patient was aged < 30 years at index date, whether the patient experienced at least one relapse in the 360 days to 720 days prior to index date, whether the patient was aged ≥ 30 years and < 40 years at index date, whether the patient was born in Central Europe, whether fingolimod (Gilenya) was available at the index date, whether teriflunomide (Aubagio) was available at the index date, whether the patient had an EDSS score of 0 earlier than 360 days prior to the index date
Interpretation Aim of the study
To predict disease activity for patients with RRMS using EMR
Notes
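The calibration assessment reported for this study (quintiles of predicted relapse probability versus the actual relapse rate) groups patients by predicted risk and compares the mean prediction with the observed event rate in each group. A minimal sketch follows, with y and p_hat as placeholder names for the observed 12‐month relapse indicator and the model's predicted probabilities; the toy data only roughly match the reported event rate and are not the Neuro Trans Data records.

```python
# Hedged sketch: calibration by quintiles of predicted probability, i.e. mean
# predicted relapse probability versus observed relapse rate in each fifth.
import numpy as np
import pandas as pd

def calibration_by_quintile(y, p_hat):
    """y: 0/1 relapse indicator; p_hat: predicted probabilities (placeholders)."""
    df = pd.DataFrame({"y": y, "p_hat": p_hat})
    df["quintile"] = pd.qcut(df["p_hat"], q=5, labels=False) + 1   # groups 1..5
    return (df.groupby("quintile")
              .agg(mean_predicted=("p_hat", "mean"),
                   observed_rate=("y", "mean"),
                   n=("y", "size")))

rng = np.random.default_rng(1)
p_hat = rng.beta(2, 9, size=4129)            # toy probabilities, event rate around 18%
y = rng.binomial(1, p_hat)
print(calibration_by_quintile(y, p_hat))     # well-calibrated toy data: columns track
```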

Pareto 2017.

General information Reason for awaiting classification
Conference abstract
Model name
Converter and nonconverter
Primary source
Abstract
Data source
Not reported
Study type
Development
Participants Inclusion criteria
Not reported
Exclusion criteria
Not reported
Recruitment
Consecutively
Age (years)
Not reported
Sex (%F)
Not reported
Disease duration (years)
Not reported
Diagnosis
100% CIS
Diagnostic criteria
Not reported
Treatment
Not reported
Disease description
Not reported
Recruitment period
Not reported
Predictors Considered predictors
PRoNTo‐based imaging parameters from segmented grey matter masks
Outcome Outcome definition
Conversion to definite MS (McDonald 2010 (Polman 2011)): either MRI or clinical demonstration of dissemination in space and time
Timing of outcome measurement
Follow‐up for 3 years
Analysis Number of participants (number of events)
90 (45)
Modelling method
Support vector machine
Performance evaluation dataset
Development
Performance evaluation method
Cross‐validation, LOOCV
Calibration estimate
Not reported
Discrimination estimate
Not reported
Classification estimate
Sensitivity (converters) = 0.65
Specificity (nonconverters) = 0.63
Predictive values = 0.65 (converters) and 0.64 (nonconverters)
Predictors in the model
PRoNTo‐based imaging parameters from segmented grey matter masks
Interpretation Aim of the study
To test whether 3D‐T1‐weighted structural images in conjunction with the pattern recognition tool PRoNTo could differentiate between CIS patients that converted and CIS patients that did not convert to MS
Notes Auxiliary references
Schrouff J, Rosa MJ, Rondina JM, Marquand AF, Chu C, Ashburner J, et al. PRoNTo: pattern recognition for neuroimaging toolbox. Neuroinformatics 2013;11(3):319‐37.

Sharmin 2020.

General information Reason for awaiting classification
Conference abstract
Model name
Not reported
Primary source
Presentation
Data source
Registry, secondary
Study type
Development
Participants Inclusion criteria
Not reported
Exclusion criteria
  • Not MS

  • < 4 visits record for a patient

  • Patient from centre with < 10 patients

  • Patients with missing data at follow‐up


Recruitment
MSBase registry
Age (years)
Not reported
Sex (%F)
Not reported
Disease duration (years)
Not reported
Diagnosis
Unclear, 16.99% CIS, 66.63% RRMS, 7.46% SPMS, 7.34% PPMS, 1.57% PRMS
Diagnostic criteria
Mixed: McDonald 2005 (Polman 2005), McDonald 2010 (Polman 2011)
Treatment
Not reported
Disease description
Not reported
Recruitment period
Not reported
Predictors Considered predictors
Age (years); sex (female ref); MS course (CIS ref, RR, SP, PP, PR); disease duration (years); EDSS (0 to 5.5 ref, 6+); change in EDSS; recency of relapse (> 2 months ref, < 1 month, 1 month to 2 months); number of affected FSSs; separate predictor for worsening in each of pyramidal, cerebellar, brainstem, sensory, bowel‐bladder, visual, and cerebral systems; 2‐way interaction between disease duration and each of FSS worsening predictors; annualised visit density
Outcome Outcome definition
Disability (unclear): risk of 6‐month confirmed disability progression event being sustained over the long term
Timing of outcome measurement
Median (IQR): 9.48 years (6.02 years to 13.32 years)
Analysis Number of participants (number of events)
14,802 (the unit of analysis is the event, contributed by 8741 participants; the number of outcome events is not reported)
Modelling method
Survival (Cox)
Performance evaluation dataset
Development
Performance evaluation method
Random split
Calibration estimate
Not reported
Discrimination estimate
Harrell's c‐statistic = 0.89 (an illustrative sketch of this measure follows this study's entry)
Classification estimate
Not reported
Predictors in the model
Age, male, primary progressive, relapsing‐remitting, relapse in previous month, EDSS ≥ 6, EDSS change, number of affected FSSs, worsening pyramidal FSS, worsening in cerebellar FSS, worsening in brainstem FSS, worsening in sensory FSS, worsening in visual FSS, worsening in cerebral FSS, worsening in pyramidal FSS: disease duration, worsening in sensory FSS: disease duration, worsening in cerebral FSS: disease duration, (other: annualised visit density)
Interpretation Aim of the study
To identify those 6‐month confirmed disability progression events that are more likely to represent a long‐term disability worsening
Notes Auxiliary references
Giovannoni G, Comi G, Cook S, Rammohan K, Rieckmann P, Soelberg Sørensen P, et al. A placebo‐controlled trial of oral cladribine for relapsing multiple sclerosis. N Engl J Med 2010;362(5):416‐26.
NCT00641537. CLARITY extension study. https://ClinicalTrials.gov/show/NCT00641537 (first received 28 March 2008).
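Harrell's c‐statistic, reported above for the Cox model, is the probability that, of two comparable patients, the one with the higher predicted risk experiences the event sooner. A minimal sketch follows using the lifelines package and its bundled Rossi recidivism dataset purely as a stand‐in; the MSBase predictors and the authors' event‐level analysis are not reproduced here.

```python
# Hedged sketch: fit a Cox proportional hazards model and read off Harrell's
# c-statistic (concordance index) on the training data. Placeholder dataset.
from lifelines import CoxPHFitter
from lifelines.datasets import load_rossi

df = load_rossi()                    # columns: week (time), arrest (event), covariates
cph = CoxPHFitter()
cph.fit(df, duration_col="week", event_col="arrest")

print("Harrell's c-statistic: %.3f" % cph.concordance_index_)
```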

Silva 2017.

General information Reason for awaiting classification
Conference abstract
Model name
MS‐COT
  • Relapse LASSO

  • Relapse stepwise

  • Disability


Primary source
Poster
Data source
Randomised trial participants
Study type
Development
Participants Inclusion criteria
  • Age between 18 years and 55 years

  • Diagnosed with RRMS

  • One or more confirmed relapses during the preceding year (or 2 or more confirmed relapses during the previous 2 years)

  • EDSS score of 0 to 5.5

  • No relapse or steroid treatment within 30 days before randomisation

  • Interferon β or glatiramer acetate therapy stopped at least 3 months before randomisation


Exclusion criteria
  • Active infection

  • Macular oedema

  • Diabetes mellitus

  • Immune suppression (drug‐ or disease‐induced) or clinically significant systemic disease


Recruitment
  • Participants in the FREEDOMS II, an RCT, from 117 academic and tertiary referral centres in 8 participating countries

  • Unclear subset: Australia, Austria, Belgium, Canada, Czech Republic, Estonia, Finland, France, Germany, Greece, Hungary, Ireland, Israel, Netherlands, Poland, Romania, Russia, Slovakia, South Africa, Sweden, Switzerland, Turkey, United Kingdom, United States


Age (years)
Mean 38.7
Sex (%F)
73.6
Disease duration (years)
9.3 (range 0 to 37)
Diagnosis
100% RRMS
Diagnostic criteria
McDonald 2005 (Polman 2005)
Treatment
  • At recruitment, not applicable

  • During follow‐up, 67.2% on fingolimod


Disease description
EDSS mean 2.4, previous year number of relapses 1.5
Recruitment period
2006 to 2011
Predictors Considered predictors
Treatment (fingolimod or placebo), T1 hypointense volume, gender, T2 lesion volume rate, age, NBV, number of relapses in the last year, individualised NBV, number of relapses in the last 2 years, number of Gd+ T1 lesions, EDSS, T2 lesion volume, duration of MS since the first symptom, total number of relapses since the first diagnosis, number of previous DMTs, progression index
Outcome Outcome definition
  • Relapse LASSO and relapse stepwise

    • Relapse: relapse verified by the examining neurologist within 7 days after the onset of symptoms, the symptoms had to be accompanied by an increase of at least half a point in the EDSS score, of 1 point in each of 2 EDSS functional‐system scores, or of 2 points in 1 EDSS functional‐system score, excluding scores for the bowel–bladder or cerebral functional systems

  • Disability

    • Disability (EDSS): 3‐/6‐month confirmed disability progression defined as an increase of 1 point in the EDSS score (or half a point if the baseline EDSS score was equal to 5.5), confirmed after 3/6 months, with an absence of relapse at the time of assessment and with all EDSS scores measured during that time meeting the criteria for disability progression


Timing of outcome measurement
Unclear which of the models is reported: at 1 year or at 2 years
Analysis Number of participants (number of events)
  • Relapse LASSO and relapse stepwise

    • 2355 (unclear, 831)

  • Disability

    • 2355 (unclear; 3‐month confirmed: 521, 6‐month confirmed: 343)


Modelling method
  • Relapse LASSO

    • Logistic regression (LASSO)

  • Relapse stepwise

    • Logistic regression

  • Disability

    • Generalised additive model (binary, nonlinear)


Performance evaluation dataset
Development
Performance evaluation method
Random split with CV within training for predictor ranking
Calibration estimate
Not reported
Discrimination estimate
c‐Statistic
  • Relapse LASSO: 0.66

  • Relapse stepwise and disability: 0.67


Classification estimate
Not reported
Predictors in the model
Not reported
Interpretation Aim of the study
To develop an educational predictor tool based on machine learning techniques to help physicians identify clinical and imaging parameters that contribute to long‐term outcomes in patients with relapsing MS
Notes Auxiliary references
Calabresi PA, Radue EW, Goodin D, Jeffery D, Rammohan KW, Reder AT, et al. Safety and efficacy of fingolimod in patients with relapsing‐remitting multiple sclerosis (FREEDOMS II): a double‐blind, randomised, placebo‐controlled, phase 3 trial. Lancet Neurol 2014;13(6):545‐56.
Kappos L, Radue EW, O'Connor P, Polman C, Hohlfeld R, Calabresi P, et al. A placebo‐controlled trial of oral fingolimod in relapsing multiple sclerosis. N Engl J Med 2010;362(5):387‐401.
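To make the 'relapse LASSO' approach summarised above more concrete (an L1‐penalised logistic regression tuned by cross‐validation within the training part of a random split and assessed by the c‐statistic on the test part), the sketch below shows one common way to set this up. It is not the study's code; the file name, column names, tuning settings, and the assumption that all predictors are numeric are hypothetical.

```python
# Minimal sketch, NOT the study's implementation: LASSO logistic regression with
# a random split and cross-validation inside the training set for tuning.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegressionCV
from sklearn.metrics import roc_auc_score

df = pd.read_csv("trial_baseline.csv")               # hypothetical baseline data with a binary 'relapse' outcome
X, y = df.drop(columns=["relapse"]), df["relapse"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, stratify=y, random_state=0)

# LogisticRegressionCV chooses the penalty strength by internal cross-validation.
model = make_pipeline(
    StandardScaler(),
    LogisticRegressionCV(penalty="l1", solver="liblinear", Cs=20, cv=5, scoring="roc_auc"),
)
model.fit(X_tr, y_tr)
auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
print(f"c-statistic (AUC) on the held-out split: {auc:.2f}")
```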

Tayyab 2020.

General information Reason for awaiting classification
Conference abstract
Model name
Not reported
Primary source
Presentation
Data source
Randomised trial participants, secondary
Study type
Development
Participants Inclusion criteria
  • Age between 18 years and 60 years

  • Patients with onset of their first demyelinating symptoms within the previous 180 days

  • A minimum of 2 lesions that were at least 3 mm in diameter on a T2‐weighted (T2w) screening brain MRI (one had to be ovoid, periventricular, or infratentorial)

  • For participants over the age of 50, cerebrospinal fluid oligoclonal bands or spinal MRI changes typical of demyelination


Exclusion criteria
  • Better explanation for the event

  • Previous event reasonably attributable to demyelination

  • Meeting the 2005 McDonald criteria for MS (Polman 2005)


Recruitment
Participants in the placebo‐controlled randomised trial of minocycline, Canada
Age (years)
Mean 35.9 (onset)
Sex (%F)
69.0
Disease duration (years)
Median 0.23 (range 0.06 to 0.52)
Diagnosis
100% CIS
Diagnostic criteria
McDonald 2005 (Polman 2005)
Treatment
  • At recruitment, not applicable

  • During follow‐up, 50.7% on minocycline


Disease description
EDSS median (range): 1.5 (0 to 4.5)
Recruitment period
2009 to 2013
Predictors Considered predictors
Unclear whether it is the complete list: individual DGM nuclei volumes, minocycline vs placebo, CIS type (monofocal vs multifocal), NBV, sex, EDSS, variable for each location of initial CIS event: cerebrum, optic nerve, cerebellum, brainstem, spinal cord, brain parenchymal fraction
Outcome Outcome definition
Composite (includes relapse): new disease activity (clinical or MRI) within 2 years of a first clinical demyelinating event, defined by the McDonald 2005 criteria (Polman 2005) for conversion to definite MS
Timing of outcome measurement
2 years
Analysis Number of participants (number of events)
140 (60)
Modelling method
Random forest
Performance evaluation dataset
Development
Performance evaluation method
Cross‐validation, 3‐fold
Calibration estimate
Not reported
Discrimination estimate
c‐Statistic = 0.76
Classification estimate
Accuracy = 0.821, sensitivity = 0.81, PPV = 0.87, F1 = 0.84
Predictors in the model
DGM volumes
Interpretation Aim of the study
To develop a machine learning model for predicting new disease activity (clinical or MRI) within 2 years of a first clinical demyelinating event, using baseline DGM volumes
Notes Auxiliary references
Metz LM, Li DKB, Traboulsee AL, Duquette P, Eliasziw M, Cerchiaro G, et al. Trial of minocycline in a clinically isolated syndrome of multiple sclerosis. N Engl J Med 2017;376(22):2122‐33.
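As an illustration of the design summarised in this entry (a random forest evaluated with 3‐fold cross‐validation and reported with accuracy, sensitivity, PPV, and F1), the sketch below shows a generic implementation. It is not the study's code; the file and column names standing in for the baseline DGM volumes and the 2‐year activity outcome are hypothetical.

```python
# Minimal sketch, NOT the study's implementation: random forest with 3-fold CV
# and the same kinds of classification measures as reported above.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.metrics import accuracy_score, recall_score, precision_score, f1_score

df = pd.read_csv("cis_baseline.csv")                 # hypothetical: DGM volumes plus outcome
X, y = df.drop(columns=["activity_2yr"]), df["activity_2yr"]

cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=42)
pred = cross_val_predict(RandomForestClassifier(n_estimators=500, random_state=42), X, y, cv=cv)

print("accuracy:", accuracy_score(y, pred))
print("sensitivity:", recall_score(y, pred))         # sensitivity = recall of the positive class
print("PPV:", precision_score(y, pred))              # PPV = precision of the positive class
print("F1:", f1_score(y, pred))
```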

Thiele 2009.

General information Reason for awaiting classification
Conference abstract
Model name
Model‐based approach
Primary source
Abstract
Data source
Registry, secondary
Study type
Development
Participants Inclusion criteria
RRMS
Exclusion criteria
Not reported
Recruitment
Danish MS register, Denmark
Age (years)
Mean 32.3 (at onset)
Sex (%F)
69
Disease duration (years)
Mean 5.6 (range 0 to 30)
Diagnosis
100% RRMS
Diagnostic criteria
Not reported
Treatment
Not reported
Disease description
EDSS mean (range): 2.63 (0 to 7.5); number of attacks in the 24 months pre‐study, mean (range): 2.64 (1 to 10)
Recruitment period
1997 to 2001
Predictors Considered predictors
Sex, age at onset, disease duration, number of attacks in the 24 months prior to study, and baseline EDSS
Outcome Outcome definition
Relapse: annualised relapse rates
Timing of outcome measurement
Not reported
Analysis Number of participants (number of events)
1202 (continuous outcome)
Modelling method
Count‐data GLM (quasi‐Poisson, negative binomial, zero‐inflated Poisson)
Performance evaluation dataset
Development
Performance evaluation method
Cross‐validation, LOOCV
Calibration estimate
Other: mean prediction error 0.53 to 0.54
Discrimination estimate
Not applicable
Classification estimate
Not applicable
Predictors in the model
Sex, age at onset, disease duration, number of attacks in the 24 months prior to study, baseline EDSS
Interpretation Aim of the study
To compare the performance of a matching‐based approach with that of statistical models for predicting annualised relapse rates in people with MS
Notes
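To make the count‐data modelling and leave‐one‐out evaluation summarised in this entry more concrete, the sketch below fits one of the named model families (the negative binomial variant, with a fixed dispersion parameter) and computes a leave‐one‐out prediction error. It is not the study's code; the software, the file and column names, and the interpretation of 'mean prediction error' as mean absolute error are assumptions made only for illustration.

```python
# Minimal sketch, NOT the study's implementation: negative binomial GLM for
# relapse counts with leave-one-out cross-validation.
import numpy as np
import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("registry.csv")                     # hypothetical registry extract
X = sm.add_constant(df[["female", "age_onset", "duration", "attacks_24m", "edss"]])
y = df["relapse_count"]

errors = []
for i in range(len(df)):                             # leave-one-out cross-validation
    mask = np.ones(len(df), dtype=bool)
    mask[i] = False                                  # hold out observation i
    fit = sm.GLM(y[mask], X[mask], family=sm.families.NegativeBinomial()).fit()
    pred = fit.predict(X.iloc[[i]])
    errors.append(abs(y.iloc[i] - pred.iloc[0]))

print("mean absolute prediction error:", np.mean(errors))
```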

Tintoré 2015.

General information Reason for awaiting classification
It is unclear whether the study is longitudinal in nature.
Model name
Not reported
Primary source
Presentation
Data source
Cohort, primary
Study type
Development
Participants Inclusion criteria
  • Within 3 months of CIS

  • < 50 years


Exclusion criteria
Not reported
Recruitment
Spain
Age (years)
Not reported
Sex (%F)
Not reported
Disease duration (years)
Not reported
Diagnosis
100% CIS
Diagnostic criteria
Not reported
Treatment
Not reported
Disease description
Not reported
Recruitment period
1996 to 2014
Predictors Considered predictors
Unclear whether the list is complete: gender, age (40 years to 49 years, 30 years to 39 years, 20 years to 29 years, 0 years to 19 years), optic neuritis, number of T2 lesions (0, 1 to 3, 4 to 9, ≥ 10), DMT before second attack, DMT after second attack, topography, CSF: OB, 12‐month number of T2 lesions, 12‐month Gd+, treatment, relapse during first year
Outcome Outcome definition
Unclear, composite: conversion to definite MS, EDSS ≥ 3
Timing of outcome measurement
Unclear; follow‐up every 12 months and at 5 years
Analysis Number of participants (number of events)
1059 (unclear, different numbers reported in abstract and presentation)
Modelling method
Multiple models: decision tree based on survival model (Cox)
Performance evaluation dataset
Development
Performance evaluation method
Apparent
Calibration estimate
Not reported
Discrimination estimate
Harrell's c‐statistic (for 12‐month model) CDMS 0.76, EDSS 0.75
Classification estimate
Not reported
Predictors in the model
At baseline: number of T2 lesions, oligoclonal bands, optic neuritis, sex, age; at first year: number of new T2 lesions, onset of DMD during first year, relapse during the first year
Interpretation Aim of the study
To develop a dynamic model for predicting long‐term prognosis
Notes

Tommasin 2019.

General information Reason for awaiting classification
Conference abstract
Model name
Not reported
Primary source
Poster
Data source
Not reported
Study type
Development
Participants Inclusion criteria
  • Age between 18 years and 70 years

  • Diagnosis of MS according to the McDonald criteria (2010 (Polman 2011), 2017 (Thompson 2018b))

  • Baseline clinical assessment and MRI examination not more than 1 month apart

  • Clinical follow‐up available after 2 years to 6 years from the MRI examination


Exclusion criteria
  • Relapses in the last 3 months

  • Contraindication to MRI


Recruitment
Not reported
Age (years)
Mean 38.3
Sex (%F)
76.2
Disease duration (years)
Not reported
Diagnosis
81% RRMS, 19% SPMS
Diagnostic criteria
Mixed: McDonald 2010, McDonald 2017
Treatment
30.5% first line, 29.5% second line, 40% none
Disease description
EDSS median 2.0 (range 0.0 to 7.5)
Recruitment period
Not reported
Predictors Considered predictors
3D T1 images (slices of the sagittal, axial, coronal projections)
Outcome Outcome definition
Disability (EDSS): 5‐year disease progression defined as 1.5‐point increase for patients with a baseline EDSS of 0, 1 point for scores from 1.0 to 5.0, and 0.5 points for scores equal to or higher than 5.5; confirmed at 6 months
Timing of outcome measurement
4 to 6 years
Analysis Number of participants (number of events)
105 (36)
Modelling method
Convolutional neural network
Performance evaluation dataset
Development
Performance evaluation method
Random split, 90% training, 10% validation
Calibration estimate
Not reported
Discrimination estimate
Not reported
Classification estimate
Cutoff = 0.5, sensitivity and specificity reported for unclear selections
Predictors in the model
3D T1 images
Interpretation Aim of the study
To investigate the ability of deep learning models to predict which patients will have disability progression in the following 5 years and which will remain stable, based on 3D T1 MRI images acquired at 3T
Notes Auxiliary references
Rio J, Rovira A, Tintore M, Otero‐Romero S, Comabella M, Vidal‐Jordana A, et al. Disability progression markers over 6‐12 years in interferon‐beta‐treated multiple sclerosis patients. Mult Scler 2018;24(3):322‐30.
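The abstract summarised above does not report the network architecture, so the sketch below is only a generic small 2D convolutional network for binary progression labels from single T1 slices, with a 90/10 random split as in the entry. All shapes, hyperparameters, and the random stand‐in data are hypothetical; it illustrates the kind of pipeline involved rather than the authors' model.

```python
# Minimal sketch, NOT the study's model: generic 2D CNN on T1 slices with a
# 90/10 split and sensitivity/specificity at a 0.5 cutoff.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset, random_split

images = torch.randn(105, 1, 128, 128)               # stand-in for 105 T1 slices
labels = torch.randint(0, 2, (105,)).float()         # stand-in progression labels
train_set, val_set = random_split(TensorDataset(images, labels), [94, 11])

net = nn.Sequential(
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(32 * 32 * 32, 1),                      # 128x128 input halved twice -> 32x32 feature maps
)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

for epoch in range(10):
    for xb, yb in DataLoader(train_set, batch_size=16, shuffle=True):
        opt.zero_grad()
        loss = loss_fn(net(xb).squeeze(1), yb)
        loss.backward()
        opt.step()

# Sensitivity and specificity at a 0.5 probability cutoff on the validation split.
with torch.no_grad():
    xb, yb = next(iter(DataLoader(val_set, batch_size=len(val_set))))
    pred = (torch.sigmoid(net(xb)).squeeze(1) > 0.5).float()
    tp = ((pred == 1) & (yb == 1)).sum()
    fn = ((pred == 0) & (yb == 1)).sum()
    tn = ((pred == 0) & (yb == 0)).sum()
    fp = ((pred == 1) & (yb == 0)).sum()
    print("sensitivity:", (tp / (tp + fn)).item(), "specificity:", (tn / (tn + fp)).item())
```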

Wahid 2018.

General information Reason for awaiting classification
Conference abstract
Model name
Not reported
Primary source
Abstract
Data source
Randomised trial participants, secondary
Study type
Development
Participants Inclusion criteria
  • Age between 18 years and 60 years

  • Available T1 and FLAIR baseline MRI scans and EDSS scores at 3 years

  • RRMS diagnosis

  • EDSS between 0 and 5.5

  • At least 2 exacerbations in the prior 3 years (one exacerbation may utilise the McDonald MRI criteria for dissemination in time)


Exclusion criteria
Not reported
Recruitment
Subset of participants in CombiRx RCT, USA
Age (years)
Not reported
Sex (%F)
Not reported
Disease duration (years)
Not reported
Diagnosis
100% RRMS
Diagnostic criteria
Mixed: Poser 1983, McDonald (undefined)
Treatment
Unclear number of participants on interferon beta, glatiramer acetate, or their combination
Disease description
Not reported
Recruitment period
Not reported
Predictors Considered predictors
Radiomics (shape, intensity, texture), age, sex, baseline EDSS, lesion volume
Outcome Outcome definition
Disability (EDSS): EDSS < 2 vs EDSS ≥ 2
Timing of outcome measurement
3 years
Analysis Number of participants (number of events)
33 (not reported)
Modelling method
Gradient boosting
Performance evaluation dataset
Development
Performance evaluation method
Cross‐validation, repeated
Calibration estimate
Not reported
Discrimination estimate
Not reported
Classification estimate
Accuracy = 0.867 (SD 0.024)
Predictors in the model
Radiomic shape, intensity, and texture measures
Interpretation Aim of the study
To evaluate the predictive performance of machine learning models constructed from MRI radiomic features at baseline to predict clinical outcomes at 3 years in RRMS
Notes Auxiliary references
Bhanushali MJ, Gustafson T, Powell S, Conwit RA, Wolinsky JS, Cutter GR, et al. Recruitment of participants to a multiple sclerosis trial: the CombiRx experience. Clinical Trials 2014;11(2):159‐66.
NCT00211887. Combination therapy in patients with relapsing‐remitting multiple sclerosis (MS) CombiRx. https://clinicaltrials.gov/ct2/show/NCT00211887 (first received 21 September 2005).
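As an illustration of the design summarised in the Wahid 2018 entry (gradient boosting on baseline radiomic and clinical features, evaluated by repeated cross‐validation and summarised as mean accuracy with its SD), the sketch below shows a generic implementation. It is not the study's code; the file and column names are hypothetical.

```python
# Minimal sketch, NOT the study's implementation: gradient boosting with
# repeated stratified cross-validation, summarised as mean accuracy and SD.
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

df = pd.read_csv("radiomics.csv")                    # hypothetical radiomic + clinical features
X, y = df.drop(columns=["edss_ge_2_at_3yr"]), df["edss_ge_2_at_3yr"]

cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=10, random_state=0)
scores = cross_val_score(GradientBoostingClassifier(random_state=0), X, y, cv=cv, scoring="accuracy")
print(f"accuracy: {scores.mean():.3f} (SD {scores.std():.3f})")
```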

3T: 3 Tesla
AAL: automated anatomical labelling
ANFIS: adaptive‐neuro‐fuzzy‐inference system
ANN: artificial neural network
BRACE: Betaseron (interferon beta‐1b), Rebif (interferon beta‐1a), Avonex (interferon beta‐1a), Copaxone (glatiramer acetate), and Extavia (interferon beta‐1b)
CDMS: clinically definite multiple sclerosis
CIS: clinically isolated syndrome
CSF: cerebrospinal fluid
CV: cross‐validation
DGM: deep grey matter
DMD: disease‐modifying drug
DMT: disease‐modifying therapy
DTI: diffusion tensor imaging
EDSS: Expanded Disability Status Scale
EMR: electronic medical records
EP: evoked potential
FLAIR: fluid‐attenuated inversion recovery
FSS: Functional Systems Score
Gd: gadolinium
IQR: interquartile range
LASSO: least absolute shrinkage and selection operator
LOOCV: leave‐one‐out cross‐validation
MRI: magnetic resonance imaging
MRS: magnetic resonance spectroscopy
MS: multiple sclerosis
MS‐COT: multiple sclerosis care optimisation tool
MTR: magnetisation transfer ratio
NARCOMS: North American Research Consortium on Multiple Sclerosis
NBV: normalised brain volume
NEDA3: no evidence of disease activity 3
NPV: negative predictive value
OB: oligoclonal bands
OCT: optical coherence tomography
P‐MSSS: patient‐derived MS Severity Score
PBMC: peripheral blood mononuclear cell
PDDS: patient‐determined disease steps
PPMS: primary progressive MS
PPV: positive predictive value
PRMS: progressive‐relapsing multiple sclerosis
PRoNTo: Pattern Recognition for Neuroimaging Toolbox
RCT: randomised controlled trial
RNA: ribonucleic acid
RRMS: relapsing–remitting multiple sclerosis
rs‐fMRI: resting state functional magnetic resonance imaging
SD: standard deviation
SPMS: secondary progressive multiple sclerosis
SVM: support vector machine

Differences between protocol and review

Objectives

  • We relocated the details on the investigation of sources of heterogeneity between studies from the Objectives to the Methods for conciseness and readability.

Criteria for considering studies for this review

  • In 'Types of studies', the eligibility criterion of aiming to develop or validate a prognostic model was already present at the protocol stage. We further operationalised its implementation in the review text. We also clarified that the statistical method used to develop the prognostic model was not a criterion for selection, but that studies on prognostic factors or treatment response prediction were excluded. The possible data sources for prognostic model studies and what is meant by validation were also defined in the review text rather than the protocol.

  • During the review, we came across eligible prognostic model validation studies of models whose development studies would not meet the eligibility criteria outlined in the protocol. In order to have the necessary details on these models, we added a new eligibility criterion to include studies that developed models which were validated in other eligible prognostic prediction studies.

  • In 'Targeted population', we clarified that we included prognostic model studies in people with MS regardless of the MS subtyping they reported. For transparency, we also reported that we considered an episode of optic neuritis as a clinically isolated syndrome and thus studies on people with this condition were eligible.

  • In 'Types of outcomes', we clarified that the data type of the outcome was not a criterion for selection. We also further detailed what was considered to constitute each of the five outcomes (four clinical outcome categories plus their composite) as defined in the protocol, by giving no evidence of disease activity as an example of the composite outcome and by clarifying that cognitive disability fitted into one of those categories, whereas fatigue, depression, or falls did not.

Search

  • As per the recently published PRISMA statement (Page 2021), we gave details on the platforms used to search the databases and the studies we used during the validation of the search.

  • Originally we had planned to perform backward citation tracking by handsearching the references of related studies. While using Web of Science for the forward search, we realised that it offers similar functionality for the backward search. We decided to use this convenient functionality because it allowed not only deduplication but also simultaneous screening of the titles and abstracts of the references, which would not have been possible with handsearching.

Selection of studies

  • We reported the details of how the pilot screening was conducted, which were absent in the protocol text.

  • During screening, we additionally searched the Internet for further information on, or contacted the authors of, studies that could not be included or excluded based on the reported information, including all conference abstracts.

  • At the protocol stage, we had not planned how to proceed with eligible conference abstracts without any full‐text report. During the review, it became clear that the information contained in an abstract was not sufficient for selection or assessment of risk of bias. Hence, we decided to present the data extracted from the conference abstracts without a full‐text report in Characteristics of studies awaiting classification.

  • How we were going to screen non‐English abstracts (by using online translators) and full‐texts (with support from native speakers) was missing from the protocol and is clarified in the review text.

  • We reported the study selection based on the flow‐chart of the recently updated PRISMA statement (Page 2021), rather than the earlier PRISMA statement (Moher 2009) proposed in the protocol.

  • For transparency, we elaborated on the details of how we operationalised and interpreted study eligibility criteria in a new subsection titled 'Details regarding selection of studies' of the review text.

Data extraction and management

  • During the review, we came across multiple reports from a single study that sometimes contained conflicting information. Our prioritisation in such cases was not defined in the protocol but is defined in the review text.

  • Due to the range of studies we came across, there were minor changes to the extracted data items during the review, e.g. adding a tuning parameter item in order to collect important details related to models developed with machine learning (ML) or using the terms primary/secondary data use rather than prospective/retrospective due to the confusion on and misuse of the latter in the literature. These are reflected in the list of items in this section and elaborated in the Appendices.

Assessment of reporting deficiencies

  • In the protocol, this section came after 'Dealing with missing data'. For a better flow of the text, it is now reported after 'Data extraction and management'.

  • In the protocol we had only mentioned that TRIPOD was going to be used for the assessment of reporting. In the review text we gave the details of our operationalisation based on the domains and items we used for this task.

Assessment of risk of bias in included studies

  • In this section of the protocol we had referred to PROBAST as the risk of bias and applicability assessment tool for prognostic model studies and had briefly summarised its domains. Due to the importance of the risk of bias assessment, the challenges encountered by studies in people with MS, and the limited applicability of the current tool to models developed using machine learning, we had to interpret the items in PROBAST. For transparency, in the review we elaborated on our interpretations and assessment of the risk of bias and applicability in the included analyses.

Measures of association or predictive performance measures to be extracted

  • In the protocol we had proposed describing the adjusted effect measures of prognostic factors in models developed over time. Although we extracted data on the effect measures and their uncertainties from studies that could and did report these, comparing them was not possible due to the variety of the predictors considered, differences in their definitions, and the considerable number of included ML methods for which traditional effect measures may not be applicable.

  • For clarity, we operationalised the classification measures and validation categories we collected.

Dealing with missing data

  • In the protocol we had proposed to contact the authors for missing information needed for quantitative data synthesis or risk of bias assessment. In the review we reported that we also contacted them for unclear or missing information needed not only for the aforementioned purposes but also for study eligibility and basic study description.

  • In the protocol we had proposed applying methods to derive missing performance measures (the c‐statistic for discrimination and the O:E ratio for calibration) and their precision from the reported information. The data reported in the studies did not allow for calculation of missing c‐statistics or missing calibration measures, specifically O:E ratios. Thus, we changed this in the review to describe only the method we used to derive the missing precision of a reported c‐statistic; a sketch of one common approach to such a derivation follows this list.
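As a concrete illustration of deriving the precision of a reported c‐statistic, the sketch below uses the Hanley and McNeil (1982) approximation, which needs only the c‐statistic and the numbers of events and non‐events. This is one common approach, shown for illustration, and is not necessarily the exact method applied in the review; the example values are taken from the Tayyab 2020 entry above (c = 0.76, 60 events among 140 participants).

```python
# Minimal sketch: Hanley-McNeil approximation for the SE of a reported c-statistic.
# Illustrative only; not necessarily the review's exact derivation method.
import math

def c_statistic_se(auc: float, n_events: int, n_nonevents: int) -> float:
    """Hanley-McNeil approximation to the standard error of a c-statistic (AUC)."""
    q1 = auc / (2 - auc)
    q2 = 2 * auc**2 / (1 + auc)
    var = (auc * (1 - auc)
           + (n_events - 1) * (q1 - auc**2)
           + (n_nonevents - 1) * (q2 - auc**2)) / (n_events * n_nonevents)
    return math.sqrt(var)

se = c_statistic_se(0.76, n_events=60, n_nonevents=80)   # values from the Tayyab 2020 entry
print(f"SE = {se:.3f}, approximate 95% CI = {0.76 - 1.96 * se:.2f} to {0.76 + 1.96 * se:.2f}")
```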

Data synthesis

  • We had intended to perform a meta‐analysis for models with at least three external validations and had described the methods under the subheading 'Data synthesis and meta‐analysis approaches' in the protocol. However, no single model had at least three independent external validation studies outside its development study, so we decided against performing a meta‐analysis and removed this subheading from the review text. Instead, we added a subheading called 'Synthesis' to describe how we summarised the findings in this review; a sketch of the kind of pooling that had been planned follows this list.

  • Because there was no meta‐analysis, we did not perform any sensitivity analysis and removed the subsection 'Sensitivity analysis'.
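Had a model accumulated at least three external validations reporting a c‐statistic with its standard error, the planned pooling would typically have combined the estimates on the logit scale with a random‐effects model. The sketch below shows a generic DerSimonian‐Laird implementation of that idea; the three c‐statistic and standard‐error pairs are invented for illustration, and no meta‐analysis was actually performed in this review.

```python
# Minimal sketch: DerSimonian-Laird random-effects pooling of c-statistics on
# the logit scale. Inputs are hypothetical; illustrative only.
import math

def inv_logit(x: float) -> float:
    return 1 / (1 + math.exp(-x))

def pool_c_statistics(c_and_se):
    """Pool (c, se) pairs on the logit scale; return pooled c, 95% CI and tau^2."""
    y = [math.log(c / (1 - c)) for c, _ in c_and_se]           # logit-transformed c-statistics
    v = [(se / (c * (1 - c))) ** 2 for c, se in c_and_se]      # delta-method variances on that scale
    w = [1 / vi for vi in v]                                   # inverse-variance (fixed-effect) weights
    y_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, y))  # Cochran's Q
    k = len(y)
    tau2 = max(0.0, (q - (k - 1)) / (sum(w) - sum(wi**2 for wi in w) / sum(w)))
    w_re = [1 / (vi + tau2) for vi in v]                       # random-effects weights
    y_re = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se_re = math.sqrt(1 / sum(w_re))
    return inv_logit(y_re), inv_logit(y_re - 1.96 * se_re), inv_logit(y_re + 1.96 * se_re), tau2

pooled, lo, hi, tau2 = pool_c_statistics([(0.72, 0.03), (0.68, 0.04), (0.75, 0.05)])
print(f"pooled c = {pooled:.2f} (95% CI {lo:.2f} to {hi:.2f}), tau^2 = {tau2:.3f}")
```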

Investigation of sources of heterogeneity between studies

  • In the protocol there were two subheadings giving details on the assessment of heterogeneity: 'Assessment of heterogeneity' under 'Data collection' and 'Subgroup analysis and investigation of heterogeneity' under 'Data synthesis'. We had planned to report heterogeneity measures from the meta‐analysis and to perform meta‐regression for different models with the same outcome when at least 10 models for that outcome were identified. Due to the large variability in outcome definitions and poor reporting of performance measures, we were not able to perform this meta‐regression and could only describe the heterogeneity qualitatively. Hence, we reduced the space allocated to this topic to one subsection under 'Data synthesis'.

Terms used for reporting and synthesis

  • For clarity, we added this section to define the terms we used in reporting the review.

Contributions of authors

Task: authors responsible
Draft the protocol: BIO, KAR, JH, MG, JB, UH, UM
Develop and run the search strategy: MG, KAR, BIO, ZA
Obtain copies of studies: AA, BIO, KAR, MG
Select which studies to include: BIO, KAR, AA, ZA, AG
Provide consultation on which studies to include: UH, JH, JB, UM
Extract data from the studies: KAR, BIO, AA, AG
Provide consultation on data extraction: HS, UH, UM, JB, JH
Assess risk of bias: KAR, BIO, AA, AG
Provide consultation on risk of bias assessment: HS, UH, UM, JB, JH
Enter data into RevMan 5: BIO, KAR, ZA, AG
Carry out the analysis: KAR, BIO, ZA
Interpret the analysis: KAR, BIO, JB, UH, UM, HS, JH
Draft the final review: BIO, KAR, JB, JH, UH, UM, MG, HS, AG, ZA, AA
Update the review: UM, UH

Sources of support

Internal sources

  • DIFUTURE Project at Ludwig‐Maximilians‐Universität München, Germany

    DIFUTURE is funded by the German Federal Ministry of Education and Research under 01ZZ1804B and 01ZZ1804C.

  • Clinical Research Priority Program (CRPP), University of Zurich, Switzerland

    The CRPP funded the project PrecisionMS: Implementing Precision Medicine in Multiple Sclerosis.

  • Privatdozenten‐Stiftung, University of Zurich, Switzerland

    Privatdozenten‐Stiftung provided partial financial support for project costs including electronic search consulting and research assistant help.

External sources

  • No sources of support provided

Declarations of interest

  • JH reports a grant for OCT research from the Friedrich‐Baur‐Stiftung and Merck, personal fees and non‐financial support from Alexion, Bayer HealthCare Pharmaceuticals, Biogen, Celgene, F. Hoffman‐La Roche, Janssen Biotech, Merck, Novartis, and Sanofi Genzyme and non‐financial support from the Guthy‐Jackson Charitable Foundation, all outside the submitted work.

  • UH received financial compensation once for a lecture organised by CSL Behring, after submission of the manuscript, and outside the submitted work.

  • BIO has provided consultancy to Roche once on a topic outside the submitted work.

  • KAR, JB, MG, AA, ZA, AG, HS, UM: nothing to declare

These authors should be considered joint first author

These authors contributed equally to this work


References

References to studies included in this review

Aghdam 2021 {published data only}

  1. Abri Aghdam K, Aghajani A, Kanani F, Soltan Sanjari M, Chaibakhsh S, Shirvaniyan F, et al. A novel decision tree approach to predict the probability of conversion to multiple sclerosis in Iranian patients with optic neuritis. Multiple Sclerosis and Related Disorders 2021;47:102658. [DOI] [PubMed] [Google Scholar]

Agosta 2006 {published data only}

  1. Agosta F, Rovaris M, Pagani E, Sormani MP, Comi G, Filippi M. Magnetization transfer MRI metrics predict the accumulation of disability 8 years later in patients with multiple sclerosis. Brain 2006;129(Pt 10):2620-7. [DOI: 10.1093/brain/awl208] [DOI] [PubMed] [Google Scholar]

Ahuja 2021 {published data only}

  1. Ahuja Y, Kim N, Liang L, Cai T, Dahal K, Seyok T, et al. Leveraging electronic health records data to predict multiple sclerosis disease activity. Annals of Clinical and Translational Neurology 2021;8(4):800-10. [DOI] [PMC free article] [PubMed] [Google Scholar]

Bejarano 2011 {published data only}

  1. Bejarano B, Bianco M, Gonzalez-Moron D, Sepulcre J, Goni J, Arcocha J, et al. Computational classifiers for predicting the short-term course of multiple sclerosis. BMC Neurology 2011;11:67. [DOI: 10.1186/1471-2377-11-67] [DOI] [PMC free article] [PubMed] [Google Scholar]

Bendfeldt 2019 {published data only}

  1. Bendfeldt K, Taschler B, Gaetano L, Madoerin P, Kuster P, Mueller-Lenke N, et al. MRI-based prediction of conversion from clinically isolated syndrome to clinically definite multiple sclerosis using SVM and lesion geometry. Brain Imaging and Behavior 2019;13(5):1361-74. [DOI: 10.1007/s11682-018-9942-9] [DOI] [PMC free article] [PubMed] [Google Scholar]
  2. Bendfeldt K, Taschler B, Gaetano L, Madoerin P, Kuster P, Mueller-Lenke N, et al. Predicting conversion to clinically definite multiple sclerosis using machine learning on the basis of cerebral grey matter segmentations. In: 31st European Committee for Treatment and Research in Multiple Sclerosis (ECTRIMS); 2015 October 7-10; Barcelona (Spain). ECTRIMS, 2015. Available at onlinelibrary.ectrims-congress.eu/ectrims/2015/31st/116222.
  3. Bendfeldt K, Taschler B, Gaetano L, Madoerin P, Kuster P, Mueller-Lenke N, et al. Predicting conversion to clinically definite multiple sclerosis using machine learning on the basis of cerebral grey matter segmentations. Multiple Sclerosis Journal 2015;23(Suppl 11):498-9. [Google Scholar]
  4. Bendfeldt K, Taschler B, Gaetano L, Madoerin P, Kuster P, Mueller-Lenke N, et al. MRI-based prediction of conversion from clinically isolated syndrome to clinically definite multiple sclerosis using SVM and lesion geometry. Brain Imaging and Behavior 2018;13(5):1361-74. [DOI] [PMC free article] [PubMed] [Google Scholar]

Bergamaschi 2001 {published data only}

  1. Bergamaschi R, Berzuini C, Romani A, Cosi V. Predicting secondary progression in relapsing-remitting multiple sclerosis: a bayesian analysis. Journal of the Neurological Sciences 2001;189(1-2):13-21. [DOI] [PubMed] [Google Scholar]

Bergamaschi 2007 {published data only}

  1. Bergamaschi R, Quaglini S, Trojano M, Amato MP, Tavazzi E, Paolicelli D, et al. Early prediction of the long term evolution of multiple sclerosis: the bayesian risk estimate for multiple sclerosis (BREMS) score. Journal of Neurology, Neurosurgery and Psychiatry 2007;78(7):757-9. [DOI: 10.1136/jnnp.2006.107052] [DOI] [PMC free article] [PubMed] [Google Scholar]

Bergamaschi 2015 {published data only}

  1. Bergamaschi R, Montomoli C, Mallucci G, Lugaresi A, Izquierdo G, Grand'Maison F, et al. BREMSO: a simple score to predict early the natural course of multiple sclerosis. European Journal of Neurology 2015;22(6):981-9. [DOI: 10.1111/ene.12696] [DOI] [PubMed] [Google Scholar]
  2. Bergamaschi R, Montomoli C, Mallucci G. Bayesian risk estimate for multiple sclerosis at onset (BREMSO): a simple clinical score for the early prediction of multiple sclerosis long-term evolution. In: 29th European Committee for Treatment and Research in Multiple Sclerosis (ECTRIMS); 2013 October 2-5; Copenhagen (Denmark). ECTRIMS, 2013. Available at onlinelibrary.ectrims-congress.eu/ectrims/2013/copenhagen/34238.
  3. Bergamaschi R, Montomoli C, Mallucci G. Bayesian risk estimate for multiple sclerosis at onset (BREMSO): a simple clinical score for the early prediction of multiple sclerosis long-term evolution. Multiple Sclerosis Journal 2013;19(Suppl 1):338. [Google Scholar]

Borras 2016 {published data only}

  1. Borras E, Canto E, Choi M, Maria Villar L, Alvarez-Cermeno JC, Chiva C, et al. Protein-based classifier to predict conversion from clinically isolated syndrome to multiple sclerosis. Molecular and Cellular Proteomics 2016;15(1):318-28. [DOI: 10.1074/mcp.M115.053256] [DOI] [PMC free article] [PubMed] [Google Scholar]
  2. Comabella M, Borràs E, Cantó E, Choi M, Villar LM, Álvarez-Cermeño JC, et al. Protein-based biomarker predicts conversion from clinically isolated syndrome to multiple sclerosis. Multiple Sclerosis Journal 2015;21(Suppl 11):634. [Google Scholar]

Brichetto 2020 {published data only}

  1. Brichetto G, Monti Bragadin M, Fiorini S, Battaglia MA, Konrad G, Ponzio M, et al. The hidden information in patient-reported outcomes and clinician-assessed outcomes: multiple sclerosis as a proof of concept of a machine learning approach. Journal of the Neurological Sciences 2020;41(2):459-62. [DOI: 10.1007/s10072-019-04093-x] [DOI] [PMC free article] [PubMed] [Google Scholar]
  2. Tacchino A, Fiorini S, Ponzio M, Barla A, Verri A, Battaglia MA, et al. Multiple sclerosis disease course prediction: a machine learning model based on patient reported and clinician assessed outcomes. In: 7th Joint European Committee for Treatment and Research in Multiple Sclerosis-Americas Committee for Treatment and Research in Multiple Sclerosis (ECTRIMS-ACTRIMS); 2017 October 25-28; Paris (France). ECTRIMS, 2017. Available at onlinelibrary.ectrims-congress.eu/ectrims/2017/ACTRIMS-ECTRIMS2017/202553.
  3. Tacchino A, Fiorini S, Ponzio M, Barla A, Verri A, Battaglia MA, et al. Multiple sclerosis disease course prediction: a machine learning model based on patient reported and clinician assessed outcomes. Multiple Sclerosis Journal 2017;23(Suppl 3):58-9. [Google Scholar]

Calabrese 2013 {published data only}

  1. Calabrese M, Poretto V, Favaretto A, Seppi D, Alessio S, Rinaldi F, et al. The grey matter basis of disability progression in multiple sclerosis. Multiple Sclerosis Journal 2012;18(Suppl 4):121-2. [Google Scholar]
  2. Calabrese M, Romualdi C, Poretto V, Favaretto A, Morra A, Rinaldi F, et al. The changing clinical course of multiple sclerosis: a matter of gray matter. Annals of Neurology 2013;74(1):76-83. [DOI: 10.1002/ana.23882] [DOI] [PubMed] [Google Scholar]

De Brouwer 2021 {published data only}

  1. De Brouwer E, Becker T, Moreau Y, Havrdova EK, Trojano M, Eichau S, et al. Longitudinal machine learning modeling of MS patient trajectories improves predictions of disability progression. Computer Methods and Programs in Biomedicine 2021;208:106180. [DOI] [PubMed] [Google Scholar]
  2. De Brouwer E, Peeters L, Becker T, Altintas A, Soysal A, Van Wijmeersch B, et al. Introducing machine learning for full MS patient trajectories improves predictions for disability score progression. In: 35th European Committee for Treatment and Research in Multiple Sclerosis (ECTRIMS); 2019 September 11-13; Stockholm (Sweden). ECTRIMS, 2019. Available at onlinelibrary.ectrims-congress.eu/ectrims/2019/stockholm/279466.
  3. De Brouwer E, Peeters L, Becker T, Altintas A, Soysal A, Van Wijmeersch B, et al. Introducing machine learning for full MS patient trajectories improves predictions for disability score progression. Multiple Sclerosis Journal 2019;25(Suppl 2):63-5. [Google Scholar]

de Groot 2009 {published data only}

  1. Groot V, Beckerman H, Uitdehaag BM, Hintzen RQ, Minneboo A, Heymans MW, et al. Physical and cognitive functioning after 3 years can be predicted using information from the diagnostic process in recently diagnosed multiple sclerosis. Archives of Physical Medicine and Rehabilitation 2009;90(9):1478-88. [DOI: 10.1016/j.apmr.2009.03.018] [DOI] [PubMed] [Google Scholar]

Gout 2011 {published data only}

  1. Gout O, Bouchareine A, Moulignier A, Deschamps R, Papeix C, Gorochov G, et al. Prognostic value of cerebrospinal fluid analysis at the time of a first demyelinating event. Multiple Sclerosis Journal 2011;17(2):164-72. [DOI: 10.1177/1352458510385506] [DOI] [PubMed] [Google Scholar]

Gurevich 2009 {published data only}

  1. Gurevich M, Tuller T, Rubinstein U, Or-Bach R, Achiron A. Prediction of acute multiple sclerosis relapses by transcription levels of peripheral blood cells. BMC Medical Genomics 2009;2:46. [DOI: 10.1186/1755-8794-2-46] [DOI] [PMC free article] [PubMed] [Google Scholar]

Kosa 2022 {published data only}

  1. Barbour C, Kosa P, Greenwood M, Bielekova B. Constructing a molecular model of disease severity in multiple sclerosis. Neurology 2019;92(Suppl 15):P3.2-006. [Google Scholar]
  2. Barbour C, Kosa P, Varosanec M, Greenwood M, Bielekova B. Molecular models of multiple sclerosis severity identify heterogeneity of pathogenic mechanisms. medRxiv 2020 May 22 [Epub ahead of print]. [DOI: ] [DOI] [PMC free article] [PubMed]
  3. Barbour CR, Kosa P, Greenwood M, Bielekova B. Constructing a molecular model of disease severity in multiple sclerosis. Multiple Sclerosis Journal 2019;25:23. [Google Scholar]
  4. Kosa P, Barbour C, Varosanec M, Wichman A, Sandford M, Greenwood M, et al. Molecular models of multiple sclerosis severity identify heterogeneity of pathogenic mechanisms. Nature Communications 2022;13(1):7670. [DOI: 10.1038/s41467-022-35357-4] [DOI] [PMC free article] [PubMed] [Google Scholar]

Kuceyeski 2018 {published data only}

  1. Kuceyeski A, Monohan E, Morris E, Fujimoto K, Vargas W, Gauthier SA. Baseline biomarkers of connectome disruption and atrophy predict future processing speed in early multiple sclerosis. NeuroImage: Clinical 2018;19:417-24. [DOI: 10.1016/j.nicl.2018.05.003] [DOI] [PMC free article] [PubMed] [Google Scholar]

Law 2019 {published data only}

  1. Law MT, Traboulsee AL, Li DK, Carruthers RL, Freedman MS, Kolind SH, et al. Machine learning in secondary progressive multiple sclerosis: an improved predictive model for short-term disability progression. Multiple Sclerosis Journal Experimental Translational and Clinical 2019;5(4):2055217319885983. [DOI: 10.1177/2055217319885983] [DOI] [PMC free article] [PubMed] [Google Scholar]
  2. Law MT, Traboulsee AL, Li DK, Carruthers RL, Freedman MS, Kolind SH, et al. Machine learning outperforms linear regression for predicting disability progression in SPMS. In: 34th European Committee for Treatment and Research in Multiple Sclerosis (ECTRIMS); 2018 October 10-12; Berlin (Germany). ECTRIMS, 2018. Available at onlinelibrary.ectrims-congress.eu/ectrims/2018/ectrims-2018/228174.
  3. Law MT, Traboulsee AL, Li DK, Carruthers RL, Freedman MS, Kolind SH, et al. Machine learning outperforms linear regression for predicting disability progression in SPMS. Multiple Sclerosis Journal 2018;24(Suppl 2):1025. [Google Scholar]

Lejeune 2021 {published data only}

  1. Lejeune F, Chatton A, Laplaud D, Wiertlewski S, Edan G, Le Page E, et al. SMILE: a predictive model for scoring the severity of relapses in multIple sclerosis. Multiple Sclerosis Journal 2019;25(Suppl 2):198. [DOI] [PubMed] [Google Scholar]
  2. Lejeune F, Chatton A, Laplaud DA, Le Page E, Wiertlewski S, Edan G, et al. SMILE: a predictive model for scoring the severity of relapses in multIple sclerosis. Journal of Neurology 2021;268(2):669-79. [DOI] [PubMed] [Google Scholar]
  3. Lejeune F, Chatton A, Laplaud DA, Wiertlewski S, Edan G, Lepage E, et al. SCOPOUSEP: a predictive model for scoring the severity of relapses in multiple sclerosis. In: 34th European Committee for Treatment and Research in Multiple Sclerosis (ECTRIMS); 2018 October 10-12; Berlin (Germany). ECTRIMS, 2018. Available at onlinelibrary.ectrims-congress.eu/ectrims/2018/ectrims-2018/229235.
  4. Lejeune F, Chatton A, Laplaud DA, Wiertlewski S, Edan G, Lepage E, et al. SCOPOUSEP: a predictive model for scoring the severity of relapses in multiple sclerosis. Multiple Sclerosis Journal 2018;24(Suppl 2):791-2. [DOI] [PubMed] [Google Scholar]

Malpas 2020 {published data only}

  1. Malpas CB, Manouchehrinia A, Sharmin S, Roos I, Horakova D, Havrdova EK, et al. Aggressive form of multiple sclerosis can be predicted early after disease onset. Multiple Sclerosis Journal 2019;25(Suppl 2):605-7. [Google Scholar]
  2. Malpas CB, Manouchehrinia A, Sharmin S, Roos I, Horakova D, Havrdova EK, et al. Early clinical markers of aggressive multiple sclerosis. Brain 2020;143(5):1400-13. [DOI: 10.1093/brain/awaa081] [DOI] [PubMed] [Google Scholar]

Mandrioli 2008 {published data only}

  1. Mandrioli J, Sola P, Bedin R, Gambini M, Merelli E. A multifactorial prognostic index in multiple sclerosis. Cerebrospinal fluid IgM oligoclonal bands and clinical features to predict the evolution of the disease. Journal of Neurology 2008;255(7):1023-31. [DOI: 10.1007/s00415-008-0827-5] [DOI] [PubMed] [Google Scholar]

Manouchehrinia 2019 {published data only}

  1. Manouchehrinia A, Zhu F, Piani-Meier D, Lange M, Silva DG, Carruthers R, et al. Predicting risk of secondary progression in multiple sclerosis: a nomogram. Multiple Sclerosis Journal 2019;25(8):1102-12. [DOI: 10.1177/1352458518783667] [DOI] [PubMed] [Google Scholar]

Margaritella 2012 {published data only}

  1. Margaritella N, Mendozzi L, Garegnani M, Colicino E, Gilardi E, Deleonardis L, et al. Sensory evoked potentials to predict short-term progression of disability in multiple sclerosis. Journal of the Neurological Sciences 2012;33(4):887-92. [DOI: 10.1007/s10072-011-0862-3] [DOI] [PubMed] [Google Scholar]

Martinelli 2017 {published data only}

  1. Martinelli V, Dalla Costa G, Messina MJ, Di Maggio G, Moiola L, Rodegher M, et al. Use of multiple biomarkers to improve the prediction of multiple sclerosis in patients with clinically isolated syndromes. Journal of the Neurological Sciences 2015;23(Suppl 11):370-1. [Google Scholar]
  2. Martinelli V, Dalla Costa G, Messina MJ, Di Maggio G, Sangalli F, Moiola L, et al. Multiple biomarkers improve the prediction of multiple sclerosis in clinically isolated syndromes. Acta Neurologica Scandinavica 2017;136(5):454-61. [DOI: 10.1111/ane.12761] [DOI] [PubMed] [Google Scholar]

Misicka 2020 {published data only}

  1. Misicka E, Sept C, Briggs FBS. Predicting onset of secondary-progressive multiple sclerosis using genetic and non-genetic factors. Journal of Neurology 2020;267(8):2328-39. [DOI: 10.1007/s00415-020-09850-z] [DOI] [PubMed] [Google Scholar]

Montolio 2021 {published data only}

  1. Montolio A, Martin-Gallego A, Cegonino J, Orduna E, Vilades E, Garcia-Martin E, et al. Machine learning in diagnosis and disability prediction of multiple sclerosis using optical coherence tomography. Computers in Biology and Medicine 2021;133:104416. [DOI] [PubMed] [Google Scholar]

Olesen 2019 {published data only}

  1. Olesen MN, Soelberg K, Debrabant B, Nilsson AC, Lillevang ST, Grauslund J, et al. Cerebrospinal fluid biomarkers for predicting development of multiple sclerosis in acute optic neuritis: a population-based prospective cohort study. Journal of Neuroinflammation 2019;16(1):59. [DOI: 10.1186/s12974-019-1440-5] [DOI] [PMC free article] [PubMed] [Google Scholar]

Oprea 2020 {published data only}

  1. Oprea S, Văleanu A, Negreș S. The development and validation of a disability and outcome prediction algorithm in multiple sclerosis patients. Farmacia 2020;68(6):1147-54. [Google Scholar]

Pellegrini 2019 {published data only}

  1. Copetti M, Fontana A, Freudensprung U, De Moor C, Hyde R, Bovis F, et al. Predicting MS disease progression remains a significant challenge: results from advanced statistical models of RCT placebo arms. In: 7th Joint European Committee for Treatment and Research in Multiple Sclerosis-Americas Committee for Treatment and Research in Multiple Sclerosis (ECTRIMS-ACTRIMS); 2017 October 25-28; Paris (France). ECTRIMS, 2017. Available at onlinelibrary.ectrims-congress.eu/ectrims/2017/ACTRIMS-ECTRIMS2017/199979.
  2. Copetti M, Fontana A, Freudensprung U, De Moor C, Hyde R, Bovis F, et al. Predicting MS disease progression remains a significant challenge: results from advanced statistical models of RCT placebo arms. Multiple Sclerosis Journal 2017;23(Suppl 3):113. [Google Scholar]
  3. Pellegrini F, Copetti M, Sormani M P, Bovis F, Moor C, Debray TP, et al. Predicting disability progression in multiple sclerosis: insights from advanced statistical modeling. Multiple Sclerosis Journal 2019;26(14):1828-36. [DOI: 10.1177/1352458519887343] [DOI] [PubMed] [Google Scholar]

Pinto 2020 {published data only}

  1. Pinto MF, Oliveira H, Batista S, Cruz L, Pinto M, Correia I, et al. Prediction of disease progression and outcomes in multiple sclerosis with machine learning. Scientific Reports 2020;10(1):21038. [DOI: 10.1038/s41598-020-78212-6] [DOI] [PMC free article] [PubMed] [Google Scholar]

Pisani 2021 {published data only}

  1. Pisani AI, Scalfari A, Crescenzo F, Romualdi C, Calabrese M. A novel prognostic score to assess the risk of progression in relapsing-remitting multiple sclerosis patients. European Journal of Neurology 2021;28(8):2503-12. [DOI] [PMC free article] [PubMed] [Google Scholar]
  2. Pisani AI, Scalfari A, Romualdi C, Calabrese M. The progressive multiple sclerosis score: a prognostic assistant tool in multiple sclerosis disease. In: 35th European Committee for Treatment and Research in Multiple Sclerosis (ECTRIMS); 2019 September 11-13; Stockholm (Sweden). ECTRIMS, 2019. Available at onlinelibrary.ectrims-congress.eu/ectrims/2019/stockholm/279464.
  3. Pisani AI, Scalfari A, Romualdi C, Calabrese M. The progressive multiple sclerosis score: a prognostic assistant tool in multiple sclerosis disease. Multiple Sclerosis Journal 2019;25(Suppl 2):62. [Google Scholar]

Roca 2020 {published data only}

  1. Roca P, Attye A, Colas L, Tucholka A, Rubini P, Cackowski S, et al. Artificial intelligence to predict clinical disability in patients with multiple sclerosis using FLAIR MRI. Diagnostic and Interventional Imaging 2020;101(12):795-802. [DOI: 10.1016/j.diii.2020.05.009] [DOI] [PubMed] [Google Scholar]

Rocca 2017 {published data only}

  1. Filippi M, Rovaris MG, Sormani MP, Caputo D, Ghezzi A, Montanari E, et al. Earlier prognostication in primary progressive multiple sclerosis using MRI: a 15-year longitudinal study. European Journal of Neurology 2017;24(Suppl 1):43. [Google Scholar]
  2. Rocca MA, Sormani MP, Rovaris M, Caputo D, Ghezzi A, Montanari E, et al. Anticipation of long-term disability progression in PPMS using MRI: a 15-year longitudinal study. Multiple Sclerosis Journal 2017;23(Suppl 3):292-3. [Google Scholar]
  3. Rocca MA, Sormani MP, Rovaris M, Caputo D, Ghezzi A, Montanari E, et al. Long-term disability progression in primary progressive multiple sclerosis: a 15-year study. Brain 2017;140(11):2814-9. [DOI: 10.1093/brain/awx250] [DOI] [PubMed] [Google Scholar]

Rovaris 2006 {published data only}

  1. Rovaris M, Judica E, Gallo A, Benedetti B, Sormani MP, Caputo D, et al. Grey matter damage predicts the evolution of primary progressive multiple sclerosis at 5 years. Brain 2006;129(Pt 10):2628-34. [DOI: ] [DOI] [PubMed] [Google Scholar]

Runia 2014 {published data only}

  1. Runia TF, Jafari N, Siepman DAM, Nieboer D, Steyerberg E, et al. A clinical prediction model for definite multiple sclerosis in patients with clinically isolated syndrome. Multiple Sclerosis 2014;20(Suppl 1):404. [Google Scholar]
  2. Runia TF. Multiple Sclerosis - Predicting the Next Attack [Dissertation]. Rotterdam (Netherlands): Erasmus University Rotterdam, 2015. [Google Scholar]

Seccia 2020 {published data only}

  1. Seccia R, Gammelli D, Dominici F, Romano S, Landi AC, Salvetti M, et al. Considering patient clinical history impacts performance of machine learning models in predicting course of multiple sclerosis. PLOS One 2020;15(3):e0230219. [DOI: 10.1371/journal.pone.0230219] [DOI] [PMC free article] [PubMed] [Google Scholar]

Skoog 2014 {published data only}

  1. Skoog B, Runmarker B, Oden A, Andersen O. Multiple sclerosis: a method to identify high risk for secondary progression. Neurology 2012;78(Suppl 1):P05.089. [Google Scholar]
  2. Skoog B, Tedeholm H, Runmarker B, Oden A, Andersen O. Continuous prediction of secondary progression in the individual course of multiple sclerosis. Multiple Sclerosis and Related Disorders 2014;3(5):584-92. [DOI: 10.1016/j.msard.2014.04.004] [DOI] [PubMed] [Google Scholar]
  3. Tedeholm H, Skoog B, Andersen O. A method to identify the risk of transition to the secondary progressive course in multiple sclerosis patients. Neurology 2013;80(Suppl 7):P04.131. [Google Scholar]
  4. Tedeholm H, Skoog B, Runmarker B, Oden A, Andersen O. A new method to identify multiple sclerosis patients with a high risk for secondary progression. Multiple Sclerosis Journal 2012;18(Suppl 4):91. [Google Scholar]

Skoog 2019 {published data only}

  1. Skoog B, Link J, Tedeholm H, Longfils M, Nerman O, Fagius J, et al. Short-term prediction of secondary progression in a sliding window: a test of a predicting algorithm in a validation cohort. Multiple Sclerosis Journal - Experimental, Translational and Clinical 2019;5(3):2055217319875466. [DOI: 10.1177/2055217319875466] [DOI] [PMC free article] [PubMed] [Google Scholar]

Sombekke 2010 {published data only}

  1. Sombekke MH, Arteta D, de Wiel MA, Crusius JB, Tejedor D, Killestein J, et al. Analysis of multiple candidate genes in association with phenotypes of multiple sclerosis. Multiple Sclerosis 2010;16(6):652-9. [DOI: 10.1177/1352458510364633] [DOI] [PubMed] [Google Scholar]

Sormani 2007 {published data only}

  1. Sormani MP, Rovaris M, Comi G, Filippi M. A composite score to predict short-term disease activity in patients with relapsing-remitting MS. Neurology 2007;69(12):1230-5. [DOI: 10.1212/01.wnl.0000276940.90309.15] [DOI] [PubMed] [Google Scholar]

Spelman 2017 {published data only}

  1. Spelman T, Meyniel C, Rojas JI, Lugaresi A, Izquierdo G, Grand'Maison F, et al. Quantifying risk of early relapse in patients with first demyelinating events: prediction in clinical practice. Multiple Sclerosis Journal 2017;23(10):1346-57. [DOI: 10.1177/1352458516679893] [DOI] [PubMed] [Google Scholar]

Szilasiová 2020 {published data only}

  1. Szilasiová J, Rosenberger J, Mikula P, Vitková M, Fedičová M, Gdovinová Z. Cognitive event-related potentials-the P300 wave is a prognostic factor of long-term disability progression in patients with multiple sclerosis. Journal of Clinical Neurophysiology 2020 Oct 05 [Epub ahead of print]. [DOI: 10.1097/WNP.0000000000000788] [DOI] [PubMed]

Tacchella 2018 {published data only}

  1. Tacchella A, Romano S, Ferraldeschi M, Salvetti M, Zaccaria A, Crisanti A, et al. Collaboration between a human group and artificial intelligence can improve prediction of multiple sclerosis course: a proof-of-principle study. F1000Research 2017;6:2172. [DOI: 10.12688/f1000research.13114.2] [DOI] [PMC free article] [PubMed] [Google Scholar]
  2. Tacchella A, Romano S, Ferraldeschi M, Salvetti M, Zaccaria A, Crisanti A, et al. Collaboration between a human group and artificial intelligence can improve prediction of multiple sclerosis course: a proof-of-principle study. F1000Research 2018;6:2172. [DOI: 10.12688/f1000research.13114.1] [DOI] [PMC free article] [PubMed] [Google Scholar]

Tommasin 2021 {published data only}

  1. Tommasin S, Cocozza S, Taloni A, Gianni C, Petsas N, Pontillo G, et al. Machine learning classifier to identify clinical and radiological features relevant to disability progression in multiple sclerosis. Journal of Neurology 2021;268(12):4834-45. [DOI] [PMC free article] [PubMed] [Google Scholar]

Tousignant 2019 {published data only}

  1. Tousignant A, Lemaître P, Precup D, Arnold DL, Arbel T. Prediction of disease progression in multiple sclerosis patients using deep learning analysis of MRI data. Proceedings of Machine Learning Research 2019;102:483-92. [Google Scholar]

Vasconcelos 2020 {published data only}

  1. Aurenção JCK, Vasconcelos CCF, Thuler LCS, Alvarenga RMP. Validation of a clinical risk score for long-term progression of MS. Multiple Sclerosis Journal 2017;23(Suppl 3):740. [Google Scholar]
  2. Vasconcelos CCF, Aurenção JCK, Alvarenga RMP, Thuler LCS. Long-term MS secondary progression: derivation and validation of a clinical risk score. Clinical Neurology and Neurosurgery 2020;194:105792. [DOI: 10.1016/j.clineuro.2020.105792] [DOI] [PubMed] [Google Scholar]
  3. Vasconcelos CCF, Thuler LCS, Calvet Kallenbach Aurenção JCK, Papais-Alvarenga RM. A proposal for a risk score for long-term progression of multiple sclerosis. In: 31st European Committee for Treatment and Research in Multiple Sclerosis (ECTRIMS); 2015 October 7-10; Barcelona (Spain). ECTRIMS, 2015. Available at onlinelibrary.ectrims-congress.eu/ectrims/2015/31st/115269.
  4. Vasconcelos CCF, Thuler LCS, Calvet Kallenbach Aurenção JCK, Papais-Alvarenga RM. A proposal for a risk score for long-term progression of multiple sclerosis. Multiple Sclerosis Journal 2015;21(Suppl 11):732. [Google Scholar]

Vukusic 2004 {published data only}

  1. Vukusic S, Hutchinson M, Hours M, Moreau T, Cortinovis-Tourniaire P, Adeleine P, et al. Erratum: Pregnancy and multiple sclerosis (the PRIMS study) - clinical predictors of post-partum relapse. Brain 2004;127(Pt 8):1912. [DOI] [PubMed] [Google Scholar]
  2. Vukusic S, Hutchinson M, Hours M, Moreau T, Cortinovis-Tourniaire P, Adeleine P, et al. Pregnancy and multiple sclerosis (the PRIMS study) - clinical predictors of post-partum relapse. Brain 2004;127(Pt 6):1353-60. [DOI: 10.1093/brain/awh152] [DOI] [PubMed] [Google Scholar]

Weinshenker 1991 {published data only}

  1. Weinshenker BG, Rice GPA, Noseworthy JH, Carriere W, Baskerville J, Ebers GC. The natural history of multiple sclerosis: a geographically based study. 3. Multivariate analysis of predictive factors and models of outcome. Brain 1991;114(Pt 2):1045-56. [DOI: 10.1093/brain/114.2.1045] [DOI] [PubMed] [Google Scholar]

Weinshenker 1996 {published data only}

  1. Weinshenker BG, Issa M, Baskerville J. Long-term and short-term outcome of multiple sclerosis: a 3-year follow-up study. Archives of Neurology 1996;53(4):353-8. [DOI: 10.1001/archneur.1996.00550040093018] [DOI] [PubMed] [Google Scholar]

Wottschel 2015 {published data only}

  1. Ciccarelli O, Kwok PP, Wottschel V, Chard D, Stromillo ML, De Stefano N, et al. Predicting clinical conversion to multiple sclerosis in patients with clinically isolated syndrome using machine learning techniques. Multiple Sclerosis Journal 2012;18(Suppl 4):30-1. [Google Scholar]
  2. Wottschel V, Alexander DC, Kwok PP, Chard DT, Stromillo ML, De Stefano N, et al. Predicting outcome in clinically isolated syndrome using machine learning. NeuroImage: Clinical 2015;7:281-7. [DOI: 10.1016/j.nicl.2014.11.021] [DOI] [PMC free article] [PubMed] [Google Scholar]
  3. Wottschel V, Ciccarelli O, Chard DT, Miller DH, Alexander DC. Prediction of second neurological attack in patients with clinically isolated syndrome using support vector machines. In: 2013 International Workshop on Pattern Recognition in Neuroimaging. 2013:82-5.

Wottschel 2019 {published data only}

  1. Wottschel V, Chard DT, Enzinger C, Filippi M, Frederiksen JL, Gasperini C, et al. SVM recursive feature elimination analyses of structural brain MRI predicts near-term relapses in patients with clinically isolated syndromes suggestive of multiple sclerosis. NeuroImage: Clinical 2019;24:102011. [DOI: 10.1016/j.nicl.2019.102011] [DOI] [PMC free article] [PubMed] [Google Scholar]

Ye 2020 {published data only}

  1. Ye F, Liang J, Li J, Li H, Sheng W. Development and validation of a five-gene signature to predict relapse-free survival in multiple sclerosis. Frontiers in Neurology 2020;11:579683. [DOI] [PMC free article] [PubMed] [Google Scholar]

Yoo 2019 {published data only}

  1. Yoo Y, Tang LW, Brosch T, Li DKB, Metz L, Traboulsee A, et al. Deep Learning and Data Labeling for Medical Applications. Springer, 2016. [Google Scholar]
  2. Yoo Y, Tang LYW, Li DKB, Metz L, Kolind S, Traboulsee AL, et al. Deep learning of brain lesion patterns and user-defined clinical and MRI features for predicting conversion to multiple sclerosis from clinically isolated syndrome. Computer Methods in Biomechanics and Biomedical Engineering: Imaging and Visualization 2019;7(3):250-9. [DOI: 10.1080/21681163.2017.1356750] [DOI] [Google Scholar]

Yperman 2020 {published data only}

  1. Yperman J, Becker T, Valkenborg D, Popescu V, Hellings N, Van Wijmeersch B, et al. Machine learning analysis of motor evoked potential time series to predict disability progression in multiple sclerosis. BMC Neurology 2020;20(1):105. [DOI: 10.1186/s12883-020-01672-w] [DOI] [PMC free article] [PubMed] [Google Scholar]
  2. Yperman J, Becker T, Valkenborg D, Popescu V, Hellings N, Van Wijmeersch B, et al. Machine learning analysis of motor evoked potential time series to predict disability progression in multiple sclerosis. Multiple Sclerosis Journal 2019;25(Suppl 2):874-5. [DOI] [PMC free article] [PubMed] [Google Scholar]

Zakharov 2013 {published data only}

  1. Zakharov AV, Khinivtseva EV, Poverennova IE, Gindullina EA, Vlasov Ia V, Sineok EV. Assessment of the risk of the transition of a monofocal clinically isolated syndrome to clinically definite multiple sclerosis. Zhurnal Nevrologii i Psikhiatrii Imeni S.S. Korsakova 2013;113(2 Pt 2):28-31. [PMID: ] [PubMed] [Google Scholar]

Zhang 2019 {published data only}

  1. Zhang H, Alberts E, Pongratz V, Mühlau M, Zimmer C, Wiestler B, et al. Predicting conversion from clinically isolated syndrome to multiple sclerosis - an imaging-based machine learning approach. NeuroImage: Clinical 2019;21:101593. [DOI: 10.1016/j.nicl.2018.11.003] [DOI] [PMC free article] [PubMed] [Google Scholar]

Zhao 2020 {published data only}

  1. Chitnis T, Zhao Y, Healy BC, Rotstein D, Guttmann CRG, Bakshi R, et al. Predicting clinical course in multiple sclerosis using machine learning. In: 6th Joint European Committee for Treatment and Research in Multiple Sclerosis-Americas Committee for Treatment and Research in Multiple Sclerosis (ECTRIMS-ACTRIMS); 2014 September 10-13; Boston (MA). ECTRIMS, 2014. Available at onlinelibrary.ectrims-congress.eu/ectrims/2014/ACTRIMS-ECTRIMS2014/64470.
  2. Chitnis T, Zhao Y, Healy BC, Rotstein D, Guttmann CRG, Bakshi R, et al. Predicting clinical course in multiple sclerosis using machine learning. Multiple Sclerosis Journal 2014;20(Suppl 1):404. [Google Scholar]
  3. Zhao Y, Chitnis T, Doan T. Ensemble learning for predicting multiple sclerosis disease course. Multiple Sclerosis Journal 2019;25(Suppl 1):160-1. [Google Scholar]
  4. Zhao Y, Healy BC, Rotstein D, Guttmann CR, Bakshi R, Weiner HL, et al. Exploration of machine learning techniques in predicting multiple sclerosis disease course. PLOS One 2017;12(4):e0174866. [DOI: 10.1371/journal.pone.0174866] [DOI] [PMC free article] [PubMed] [Google Scholar]
  5. Zhao Y, Wang T, Bove R, Cree B, Henry R, Lokhande H, et al. Ensemble learning predicts multiple sclerosis disease course in the SUMMIT study. NPJ Digital Medicine 2020;3:135. [DOI: 10.1038/s41746-020-00361-9] [DOI] [PMC free article] [PubMed] [Google Scholar]

References to studies excluded from this review

Achiron 2006 {published data only}

  1. Achiron A. Measuring disability progression in multiple sclerosis. Journal of Neurology 2006;253(6):vi31-6. [Google Scholar]

Ahlbrecht 2016 {published data only}

  1. Ahlbrecht J, Martino F, Pul R, Skripuletz T, Suhs KW, Schauerte C, et al. Deregulation of microRNA-181c in cerebrospinal fluid of patients with clinically isolated syndrome is associated with early conversion to relapsing-remitting multiple sclerosis. Multiple Sclerosis Journal 2016;22(9):1202-14. [DOI: 10.1177/1352458515613641] [DOI] [PubMed] [Google Scholar]

Andersen 2015 {published data only}

  1. Andersen O, Skoog B, Runmarker B, Lisovskaja V, Nerman O, Tedeholm H. Fifty years untreated prognosis of multiple sclerosis based on an incidence cohort. European Journal of Neurology 2015;22(Suppl 1):25. [Google Scholar]

Azevedo 2019 {published data only}

  1. Azevedo C, Cen S, Zheng L, Jaberzadeh A, Pelletier D. Minimum clinically important difference for brain atrophy measures in multiple sclerosis. Multiple Sclerosis Journal 2019;25(Suppl 2):697-8. [Google Scholar]

Barkhof 1997 {published data only}

  1. Barkhof F, Filippi M, Miller DH, Scheltens P, Campi A, Polman CH, et al. Comparison of MRI criteria at first presentation to predict conversion to clinically definite multiple sclerosis. Brain 1997;120(Pt 11):2059-69. [DOI: 10.1093/brain/120.11.2059] [DOI] [PubMed] [Google Scholar]

Brettschneider 2006 {published data only}

  1. Brettschneider J, Petzold A, Junker A, Tumani H. Axonal damage markers in the cerebrospinal fluid of patients with clinically isolated syndrome improve predicting conversion to definite multiple sclerosis. Multiple Sclerosis Journal 2006;12(2):143-8. [DOI] [PubMed] [Google Scholar]

Bsteh 2021 {published data only}

  1. Bsteh G, Hegen H, Riedl K, Altmann P, Auer M, Berek K, et al. Quantifying the risk of disease reactivation after interferon and glatiramer acetate discontinuation in multiple sclerosis: the VIAADISC score. European Journal of Neurology 2021;28(5):1609-16. [DOI] [PMC free article] [PubMed] [Google Scholar]

Castellaro 2015 {published data only}

  1. Castellaro M, Bertoldo A, Morra A, Monaco S, Calabrese M, Doyle O. Prediction of conversion to secondary progression phase in multiple sclerosis. Multiple Sclerosis Journal 2015;23:198-9. [Google Scholar]

Chalkou 2021 {published data only}

  1. Chalkou K, Steyerberg E, Egger M, Manca A, Pellegrini F, Salanti G. A two-stage prediction model for heterogeneous effects of treatments. Statistics in Medicine 2021;40(20):4362-75. [DOI] [PMC free article] [PubMed] [Google Scholar]

Costa 2017 {published data only}

  1. Costa GD, Di Maggio G, Sangalli F, Moiola L, Colombo B, Comi G, et al. Prognostic factors for multiple sclerosis in patients with spinal isolated syndromes. European Journal of Neurology 2017;24:62. [Google Scholar]

Cutter 2014 {published data only}

  1. Cutter G, Wolinsky JS, Comi G, Ladkani D, Knappertz V, Vainstein A, et al. Indirect comparison of glatiramer acetate 40mg/mL TIW and 20mg/mL QD dosing regimen effects on relapse rate: results of a predictive statistical model. Multiple Sclerosis Journal 2014;20:112. [Google Scholar]

Damasceno 2019 {published data only}

  1. Damasceno A, Pimentel-Silva LR, Damasceno BP, Cendes F. Cognitive trajectories in relapsing–remitting multiple sclerosis: a longitudinal 6-year study. In: 35th European Committee for Treatment and Research in Multiple Sclerosis (ECTRIMS); 2019 September 11-13; Stockholm (Sweden). ECTRIMS, 2019. Available at onlinelibrary.ectrims-congress.eu/ectrims/2019/stockholm/278685. [DOI] [PubMed]
  2. Damasceno A, Pimentel-Silva LR, Damasceno BP, Cendes F. Cognitive trajectories in relapsing–remitting multiple sclerosis: a longitudinal 6-year study. Multiple Sclerosis Journal 2019;26(13):1740-51. [DOI: 10.1177/1352458519878685] [DOI] [PubMed] [Google Scholar]

Daumer 2007 {published data only}

  1. Daumer M, Neuhaus A, Lederer C, Scholz M, Wolinsky JS, Heiderhoff M, et al. Prognosis of the individual course of disease - steps in developing a decision support tool for multiple sclerosis. BMC Medical Informatics and Decision Making 2007;7:11. [DOI: 10.1186/1472-6947-7-11] [DOI] [PMC free article] [PubMed] [Google Scholar]

Dekker 2019 {published data only}

  1. Dekker I, Eijlers AJC, Popescu V, Balk LJ, Vrenken H, Wattjes MP, et al. Predicting clinical progression in multiple sclerosis after 6 and 12 years. European Journal of Neurology 2019;26(6):893-902. [DOI] [PMC free article] [PubMed] [Google Scholar]

Esposito 2011 {published data only}

  1. Esposito M, De Falco I, De Pietro G. An evolutionary-fuzzy DSS for assessing health status in multiple sclerosis disease. International Journal of Medical Informatics 2011;80(12):e245-54. [DOI] [PubMed] [Google Scholar]

Filippi 2010 {published data only}

  1. Filippi M, Rocca MA, Calabrese M, Sormani MP, Rinaldi F, Perini P, et al. Intracortical lesions and new magnetic resonance imaging diagnostic criteria for multiple sclerosis. Multiple Sclerosis Journal 2010;16:S42. [DOI] [PubMed] [Google Scholar]

Filippi 2013 {published data only}

  1. Filippi M, Preziosa P, Copetti M, Riccitelli G, Horsfield MA, Martinelli V, et al. Gray matter damage predicts the accumulation of disability 13 years later in MS. Neurology 2013;81(20):1759-67. [DOI] [PubMed] [Google Scholar]

Fuchs 2021 {published data only}

  1. Fuchs TA, Dwyer MG, Jakimovski D, Bergsland N, Ramasamy DP, Weinstock-Guttman B, et al. Quantifying disease pathology and predicting disease progression in multiple sclerosis with only clinical routine T2-FLAIR MRI. NeuroImage: Clinical 2021;31:102705. [DOI] [PMC free article] [PubMed] [Google Scholar]

Gasperini 2021 {published data only}

  1. Gasperini C, Prosperini L, Rovira A, Tintore M, Sastre-Garriga J, Tortorella C, et al. Scoring the 10-year risk of ambulatory disability in multiple sclerosis: the RoAD score. European Journal of Neurology 2021;28(8):2533-42. [DOI] [PubMed] [Google Scholar]
  2. Gasperini C, Prosperini L, Tortorella C, Haggiag S, Ruggieri S, Mancinelli CR, et al. Scoring the 10-year risk of ambulatory disability in DMD treated multiple sclerosis patients: the RoAD score. In: 34th European Committee for Treatment and Research in Multiple Sclerosis (ECTRIMS); 2018 October 10-12; Berlin (Germany). ECTRIMS, 2018. Available at onlinelibrary.ectrims-congress.eu/ectrims/2018/ectrims-2018/231905.
  3. Gasperini C, Prosperini L, Tortorella C, Haggiag S, Ruggieri S, Mancinelli CR, et al. Scoring the 10-year risk of ambulatory disability in DMD treated multiple sclerosis patients: the RoAD score. Multiple Sclerosis Journal 2018;24(Suppl 2):58. [Google Scholar]

Gomez‐Gonzalez 2010 {published data only}

  1. Gomez-Gonzalez E, Garcia-Sanchez MI, Izquierdo-Ayuso G, Coca De La Torre A, Ramirez-Martinez D, Marco-Ramirez AM, et al. Application of image and signal processing algorithms to oligoclonal IgG bands classification. Multiple Sclerosis Journal 2010;16:341-2. [Google Scholar]

Hakansson 2017 {published data only}

  1. Hakansson I, Tisell A, Cassel P, Blennow K, Zetterberg H, Lundberg P, et al. Neurofilament light chain in cerebrospinal fluid and prediction of disease activity in clinically isolated syndrome and relapsing-remitting multiple sclerosis. European Journal of Neurology 2017;24(5):703-12. [DOI] [PubMed] [Google Scholar]

Ho 2013 {published data only}

  1. Ho J, Ghosh J, Unnikrishnan K. Risk prediction of a multiple sclerosis diagnosis. In: 2013 IEEE International Conference on Healthcare Informatics. 2013:175-83.

Ignatova 2018 {published data only}

  1. Ignatova V, Todorova L, Haralanov L. Predictors of long term disability progression in patients with relapsing remitting multiple sclerosis. Multiple Sclerosis Journal 2018;24(Suppl 2):788-9. [Google Scholar]

Invernizzi 2011 {published data only}

  1. Invernizzi P, Bertolasi L, Bianchi MR, Turatti M, Gajofatto A, Benedetti MD. Prognostic value of multimodal evoked potentials in multiple sclerosis: the EP score. Journal of Neurology 2011;258(11):1933-9. [DOI] [PubMed] [Google Scholar]

Jackson 2020 {published data only}

  1. Jackson KC, Sun K, Barbour C, Hernandez D, Kosa P, Tanigawa M, et al. Genetic model of MS severity predicts future accumulation of disability. Annals of Human Genetics 2020;84(1):1-10. [DOI: 10.1111/ahg.12342] [DOI] [PMC free article] [PubMed] [Google Scholar]

Kalincik 2013 {published data only}

  1. Kalincik T, Guttmann CR, Krasensky J, Vaneckova M, Lelkova P, Tyblova M, et al. Multiple sclerosis susceptibility loci do not alter clinical and MRI outcomes in clinically isolated syndrome. Genes & Immunity 2013;14(4):244-8. [DOI: 10.1038/gene.2013.17] [DOI] [PubMed] [Google Scholar]

Leocani 2017 {published data only}

  1. Leocani L, Pisa M, Bianco M, Guerrieri S, Di Maggio G, Romeo M, et al. Multimodal EPs predict no evidence of disease activity at two years of first line multiple sclerosis treatment. Neurology 2017;88(Suppl 16):P4.386. [Google Scholar]

Morelli 2020 {published data only}

  1. Morelli ME, Baldini S, Sartori A, D'Acunto L, Dinoto A, Bosco A, et al. Early putamen hypertrophy and ongoing hippocampus atrophy predict cognitive performance in the first ten years of relapsing-remitting multiple sclerosis. Neurological Sciences 2020;41(10):2893-904. [DOI] [PubMed] [Google Scholar]

Palace 2013 {published data only}

  1. Palace J, Bregenzer T, Tremlett H, Duddy M, Boggild M, Zhu F, et al. Modelling natural history for the UK multiple sclerosis risk-sharing scheme. Multiple Sclerosis Journal 2013;19(Suppl 1):339. [Google Scholar]

Pappalardo 2020 {published data only}

  1. Pappalardo F, Russo G, Pennisi M, Parasiliti Palumbo GA, Sgroi G, Motta S, et al. The potential of computational modeling to predict disease course and treatment response in patients with relapsing multiple sclerosis. Cells 2020;9(3):586. [DOI] [PMC free article] [PubMed] [Google Scholar]

Petrou 2018 {published data only}

  1. Petrou P, Yagmour N, Karussis D. Biomarkers for diagnosis and prognosis in multiple sclerosis. Multiple Sclerosis Journal 2018;24:15. [Google Scholar]

Preziosa 2015 {published data only}

  1. Preziosa P, Rocca M, Mesaros S, Copetti M, Petrolini M, Drulovic J, et al. Different MRI measures predict clinical deterioration and cognitive impairment in MS: a 5 year longitudinal study. In: 31st European Committee for Treatment and Research in Multiple Sclerosis (ECTRIMS); 2015 October 7-10; Barcelona (Spain). ECTRIMS, 2015. Available at onlinelibrary.ectrims-congress.eu/ectrims/2015/31st/116658.

Rajda 2019 {published data only}

  1. Rajda C, Galla Z, Polyák H, Maróti Z, Babarczy K, Pukoli D, et al. High neurofilament light chain and high quinolinic acid levels in the CSF of patients with multiple sclerosis are independent predictors of active, disabling disease. Multiple Sclerosis Journal 2019;25:856. [Google Scholar]

Rio 2019 {published data only}

  1. Rio J, Rovira A, Gasperini C, Tintore M, Prosperini L, Otero-Romero S, et al. Treatment response scoring systems to assess long term prognosis in relapsing-remitting multiple sclerosis patients. In: 35th European Committee for Treatment and Research in Multiple Sclerosis (ECTRIMS); 2019 September 11-13; Stockholm (Sweden). ECTRIMS, 2019. Available at onlinelibrary.ectrims-congress.eu/ectrims/2019/stockholm/279564.
  2. Rio J, Rovira A, Gasperini C, Tintore M, Prosperini L, Otero-Romero S, et al. Treatment response scoring systems to assess long term prognosis in relapsing-remitting multiple sclerosis patients. Multiple Sclerosis Journal 2019;25:121-2. [Google Scholar]

Rodriguez 2012 {published data only}

  1. Rodriguez JD, Perez A, Arteta D, Tejedor D, Lozano JA. Using multidimensional Bayesian network classifiers to assist the treatment of multiple sclerosis. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews) 2012;42(6):1705-15. [DOI: 10.1109/TSMCC.2012.2217326] [DOI] [Google Scholar]

Rothman 2016 {published data only}

  1. Rothman AM, Button J, Balcer LJ, Frohman EM, Frohman TC, Reich DS, et al. Retinal measurements predict 10-year disability in multiple sclerosis. In: 32nd European Committee for Treatment and Research in Multiple Sclerosis (ECTRIMS); 2016 September 14-17; London (UK). ECTRIMS, 2016. Available at onlinelibrary.ectrims-congress.eu/ectrims/2016/32nd/146960.
  2. Rothman AM, Button J, Balcer LJ, Frohman EM, Frohman TC, Reich DS, et al. Retinal measurements predict 10-year disability in multiple sclerosis. Multiple Sclerosis Journal 2016;22:20-1. [Google Scholar]

Roura 2018 {published data only}

  1. Roura E, Maclair G, Martinez-Lapiscina EH, Andorra M, Villoslada P. Brain complexity and damage in patients with multiple sclerosis using fractal analysis: a new imaging outcome for monitoring MS severity. Multiple Sclerosis Journal 2018;24(2):210. [Google Scholar]

Sbardella 2011 {published data only}

  1. Sbardella E, Tomassini V, Stromillo ML, Filippini N, Battaglini M, Ruggieri S, et al. Pronounced focal and diffuse brain damage predicts short-term disease evolution in patients with clinically isolated syndrome suggestive of multiple sclerosis. Multiple Sclerosis Journal 2011;17(12):1432-40. [DOI] [PubMed] [Google Scholar]

Schlaeger 2012 {published data only}

  1. Schlaeger R, D'Souza M, Schindler C, Grize L, Dellas S, Radue EW, et al. Prediction of long-term disability in multiple sclerosis. Multiple Sclerosis Journal 2012;18(1):31-8. [DOI] [PubMed] [Google Scholar]

Srinivasan 2020 {published data only}

  1. Srinivasan J, Gudesblatt M. Multiple sclerosis management: predicting disease trajectory of multiple sclerosis on multi-dimensional data including digital cognitive assessments and patient reported outcomes using machine learning techniques. In: 5th Annual Americas Committee for Treatment and Research in Multiple Sclerosis (ACTRIMS); 2020 February 27-29; West Palm Beach (FL). West Palm Beach (FL): ACTRIMS, 2020.

Tintore 2015 {published data only}

  1. Tintoré M. Predicting MS extremes: benign and aggressive. Multiple Sclerosis 2015;23:56. [Google Scholar]

Tomassini 2019 {published data only}

  1. Tomassini V, Fanelli F, Prosperini L, Cerqua R, Cavalla P, Pozzilli C. Predicting the profile of increasing disability in multiple sclerosis. Multiple Sclerosis Journal 2019;25(9):1306-15. [DOI] [PMC free article] [PubMed] [Google Scholar]

Tossberg 2013 {published data only}

  1. Tossberg JT, Crooke PS, Henderson MA, Sriram S, Mrelashvili D, Vosslamber S, et al. Using biomarkers to predict progression from clinically isolated syndrome to multiple sclerosis. Journal of Clinical Bioinformatics 2013;3(1):18. [DOI] [PMC free article] [PubMed] [Google Scholar]

Uher 2017a {published data only}

  1. Uher T, Vaneckova M, Sobisek L, Tyblova M, Seidl Z, Krasensky J, et al. Combining clinical and magnetic resonance imaging markers enhances prediction of 12-year disability in multiple sclerosis. Multiple Sclerosis Journal 2017;23(1):51-61. [DOI] [PubMed] [Google Scholar]

Uher 2017b {published data only}

  1. Uher T, Vaneckova M, Sormani MP, Krasensky J, Sobisek L, Dusankova JB, et al. Identification of multiple sclerosis patients at highest risk of cognitive impairment using an integrated brain magnetic resonance imaging assessment approach. European Journal of Neurology 2017;24(2):292-301. [DOI] [PubMed] [Google Scholar]

Veloso 2014 {published data only}

  1. Veloso M. A web-based decision support tool for prognosis simulation in multiple sclerosis. Multiple Sclerosis and Related Disorders 2014;3(5):575-83. [DOI] [PubMed] [Google Scholar]

Vukusic 2006 {published data only}

  1. Vukusic S, Confavreux C. Pregnancy and multiple sclerosis: the children of PRIMS. Clinical Neurology and Neurosurgery 2006;108(3):266-70. [DOI] [PubMed] [Google Scholar]

Wahid 2019 {published data only}

  1. Wahid K, Charron O, Colen R, Shinohara RT, Kotrotsou A, Papadimitropoulos G, et al. Prediction of disability and treatment response from radiomic features: a machine learning analysis from the CombiRx multi-center cohort. Multiple Sclerosis Journal 2019;25:112-3. [Google Scholar]

Zephir 2009 {published data only}

  1. Zephir H, Lefranc D, Dubucquoi S, Seze J, Boron L, Prin L, et al. Serum IgG repertoire in clinically isolated syndrome predicts multiple sclerosis. Multiple Sclerosis Journal 2009;15(5):593-600. [DOI] [PubMed] [Google Scholar]

Ziemssen 2019 {published data only}

  1. Ziemssen T, Piani-Meier D, Bennett B, Johnson C, Tinsley K, Trigg A, et al. Validation of the scoring algorithm for a novel integrative MS progression discussion tool. European Journal of Neurology 2019;26:872. [Google Scholar]

References to studies awaiting assessment

Achiron 2007 {published data only}

  1. Achiron A, Gurevich M, Snir Y, Segal E, Mandel M. Zinc-ion binding and cytokine activity regulation pathways predicts outcome in relapsing-remitting multiple sclerosis. Clinical and Experimental Immunology 2007;149(2):235-42. [DOI: 10.1111/j.1365-2249.2007.03405.x] [DOI] [PMC free article] [PubMed] [Google Scholar]

Behling 2019 {published data only}

  1. Behling M, Bryant A, Brecht T, Cerf S, Gliklich R, Su Z. Predicting relapse episodes in patients with multiple sclerosis treated with disease modifying therapies in a large representative real-world cohort in the United States. Pharmacoepidemiology and Drug Safety 2019;28(Suppl 2):130. [Google Scholar]

Castellazzi 2019 {published data only}

  1. Castellazzi G, Martinelli D, Collorone S, Alhamadi A, Debernard L, Melzer TR, et al. A clinical decision system based on resting state fMRI-derived features to predict the conversion of CIS to RRMS. Multiple Sclerosis Journal 2019;25(Suppl 2):686-7. [Google Scholar]
  2. Castellazzi G, Martinelli D, Collorone S, Alhamadi A, Debernard L, Melzer TR, et al. A clinical decision system based on resting state fMRI-derived features to predict the conversion of CIS to RRMS. In: 35th European Committee for Treatment and Research in Multiple Sclerosis (ECTRIMS); 2019 September 11-13; Stockholm (Sweden). ECTRIMS, 2019. Available at onlinelibrary.ectrims-congress.eu/ectrims/2019/stockholm/278467.

Chaar 2019 {published data only}

  1. Chaar D, Kakara M, Razmjou S, Bernitsas E. Predicting EDSS in MS through imaging biomarkers using artificial neural networks. Neurology 2019;92(Suppl 15):P5.2-010. [Google Scholar]

Dalla Costa 2014 {published data only}

  1. Dalla Costa G, Moiola L, Leocani L, Furlan R, Filippi M, Comi G, et al. Artificial intelligence techniques in the diagnosis of clinically definite multiple sclerosis. Multiple Sclerosis Journal 2014;20(Suppl 1):170. [Google Scholar]

Ghosh 2009 {published data only}

  1. Ghosh P, Neuhaus A, Daumer M, Basu S. Joint modelling of multivariate longitudinal data for mixed responses and survival in multiple sclerosis. Multiple Sclerosis 2009;15:S157-8. [Google Scholar]

Kister 2015 {published data only}

  1. Kister I, Bacon T, Levinas M, Green R, Cutter G, Chamot E. Stability and prognostic utility of patient-derived MS severity score (P-MSSS) among MS clinic patients. In: 31st European Committee for Treatment and Research in Multiple Sclerosis (ECTRIMS); 2015 October 7-10; Barcelona (Spain). ECTRIMS, 2015. Available at onlinelibrary.ectrims-congress.eu/ectrims/2015/31st/115800.
  2. Kister I, Bacon T, Levinas M, Green R, Cutter G, Chamot E. Stability and prognostic utility of patient-derived MS severity score (P-MSSS) among MS clinic patients. Multiple Sclerosis Journal 2015;21(Suppl 11):410-1. [Google Scholar]
  3. Kister I, Cutter G, Salter A, Herbert J, Chamot E. Novel, easy-to-use prediction tool accurately estimates probability of “aggressive MS” at 2-year follow up. Neurology 2015;84(Suppl 14):P3.214. [Google Scholar]

Mallucci 2019 {published data only}

  1. Mallucci G, Trivelli L, Colombo E, Trojano M, Amato MP, Zaffaroni M, et al. The RECIS (risk estimate in CIS) study: a novel model to early predict clinically isolated syndrome evolution. Multiple Sclerosis Journal 2019;25(Suppl 2):405-6. [Google Scholar]

Medin 2016 {published data only}

  1. Medin J, Joyeux A, Braune S, Bergmann A, Rigg J, Wang L. Predicting disease activity for patients with relapsing remitting multiple sclerosis using electronic medical records. In: American Academy of Neurology Annual Meeting; 2016 April 15-21; Vancouver (Canada). 2016. Available at neurotransdata.com/images/publikationen/2016-predicting-disease-activity-aan.pdf.
  2. Medin J, Joyeux A, Braune S, Bergmann A, Rigg J, Wang L. Predicting disease activity for patients with relapsing remitting multiple sclerosis using electronic medical records. Neurology 2016;86(Suppl 16):P1.395. [Google Scholar]

Pareto 2017 {published data only}

  1. Pareto D, Garcia A, Huerga E, Auger C, Sastre-Garriga J, Tintore M, et al. Pattern recognition for neuroimaging toolbox PRoNTo: a pilot study in predicting clinically isolated syndrome conversion. Multiple Sclerosis Journal 2017;23(Suppl 3):231-2. [Google Scholar]

Sharmin 2020 {published data only}

  1. Sharmin S, Bovis F, Malpas C, Horakova D, Havrdova E, Ayuso GI, et al. Predicting long-term sustained disability progression in multiple sclerosis. Neurology 2020;94(Suppl 15):2002. [Google Scholar]
  2. Sharmin S, Bovis F, Sormani MP, Butzkueven H, Kalincik T. Predicting long-term sustained disability progression in multiple sclerosis: application in the clarity trial. Multiple Sclerosis Journal 2020;26(Suppl 3):181. [Google Scholar]
  3. Sharmin S, Malpas C, Horakova D, Havrdova EK, Izquierdo G, Eichau S, et al. Predicting long-term sustained disability progression in multiple sclerosis. In: 35th European Committee for Treatment and Research in Multiple Sclerosis (ECTRIMS); 2019 September 11-13; Stockholm (Sweden). ECTRIMS, 2019. Available at onlinelibrary.ectrims-congress.eu/ectrims/2019/stockholm/279563.
  4. Sharmin S, Malpas C, Horakova D, Havrdova EK, Izquierdo G, Eichau S, et al. Predicting long-term sustained disability progression in multiple sclerosis. Multiple Sclerosis Journal 2019;25(Suppl 2):119-21. [Google Scholar]
  5. Sharmin S. Follow-up for Cochrane review - prognostic prediction models in multiple sclerosis [pers comm]. Email to: On BI 12 April 2021.

Silva 2017 {published data only}

  1. Silva D, Meier DP, Ritter S, Davorka T, Medin J, Lange M, et al. Multiple sclerosis care optimization tool (MS-COT): a clinical application prototype to predict future disease activity. Neurology 2017;88(16):P1.368. [Google Scholar]
  2. Silva D, Meier DP, Ritter S, Tomic D, Medin J, Lange M, et al. Multiple sclerosis care optimization tool (MSCOT): a clinical application prototype to predict future disease activity. In: 69th Congress of the American Academy of Neurology; 2017 April 22-28; Boston (MA). Novartis Pharma AG, 2017. Available at novartis.medicalcongressposters.com/Default.aspx?doc=ac1bf.

Tayyab 2020 {published data only}

  1. Tam R. Follow-up for Cochrane review - prognostic prediction models in multiple sclerosis (Tayyab 2020) [pers comm]. Email to: K Reeve 20 July 2021.
  2. Tayyab M, Metz L, Dvorak A, Kolind S, Au S, Carruthers R, et al. Machine learning of deep grey matter volumes on MRI for predicting new disease activity after a first clinical demyelinating event. Multiple Sclerosis Journal 2020;26(Suppl 3):116-7. [Google Scholar]

Thiele 2009 {published data only}

  1. Thiele A, Lederer C, Neuhaus A, Strobl R, Fahrmeir L, Koch-Henriksen N, et al. Comparison of model-based and matching-based prediction of the annualised relapse-rate of MS-patients. Multiple Sclerosis 2009;15(9):S163. [Google Scholar]

Tintoré 2015 {published data only}

  1. Tintoré M, Río J, Otero-Romero S, Arrambide G, Tur C, Comabella M, et al. Dynamic model for predicting prognosis in CIS patients. In: 31st European Committee for Treatment and Research in Multiple Sclerosis (ECTRIMS); 2015 October 7-10; Barcelona (Spain). ECTRIMS, 2015. Available at onlinelibrary.ectrims-congress.eu/ectrims/2015/31st/116690.
  2. Tintoré M, Río J, Otero-Romero S, Arrambide G, Tur C, Comabella M, et al. Dynamic model for predicting prognosis in CIS patients. Multiple Sclerosis Journal 2015;21(Suppl 11):33. [Google Scholar]

Tommasin 2019 {published data only}

  1. Tommasin S, Taloni A, Farrelly FA, Petsas N, Ruggieri S, Gianni C, et al. Evaluation of 5-year disease progression in multiple sclerosis via magnetic-resonance-based deep learning techniques. In: 35th European Committee for Treatment and Research in Multiple Sclerosis (ECTRIMS); 2019 September 11-13; Stockholm (Sweden). ECTRIMS, 2019. Available at onlinelibrary.ectrims-congress.eu/ectrims/2019/stockholm/279263.
  2. Tommasin S, Taloni A, Farrelly FA, Petsas N, Ruggieri S, Gianni C, et al. Evaluation of 5-year disease progression in multiple sclerosis via magnetic-resonance-based deep learning techniques. Multiple Sclerosis Journal 2019;25(Suppl 2):468. [DOI] [PMC free article] [PubMed] [Google Scholar]

Wahid 2018 {published data only}

  1. Wahid K, Colen R, Kotrotsou A, Lincoln J, Narayana PA, Cofield SS, et al. Radiomic prediction of clinical outcome in multiple sclerosis patients from the CombiRx cohort. Multiple Sclerosis Journal 2018;24(Suppl 1):71-2. [Google Scholar]

Additional references

Adelman 2013

  1. Adelman G, Rane SG, Villa KF. The cost burden of multiple sclerosis in the United States: a systematic review of the literature. Journal of Medical Economics 2013;16(5):639-47. [DOI: 10.3111/13696998.2013.778268] [DOI] [PubMed] [Google Scholar]

Altman 2000

  1. Altman DG, Royston P. What do we mean by validating a prognostic model? Statistics in Medicine 2000;19(4):453-73. [DOI] [PubMed] [Google Scholar]

Altman 2014

  1. Altman DG. The time has come to register diagnostic and prognostic research. Clinical Chemistry 2014;60(4):580-2. [DOI: 10.1373/clinchem.2013.220335] [DOI] [PubMed] [Google Scholar]

Attfield 2022

  1. Attfield KE, Jensen LT, Kaufmann M, Friese MA, Fugger L. The immunology of multiple sclerosis. Nature Reviews Immunology 2022;22(12):734-50. [DOI: 10.1038/s41577-022-00718-z] [DOI] [PubMed] [Google Scholar]

Bakshi 2005

  1. Bakshi R, Dandamudi VS, Neema M, De C, Bermel RA. Measurement of brain and spinal cord atrophy by magnetic resonance imaging as a tool to monitor multiple sclerosis. Journal of Neuroimaging 2005;15(4 Suppl):30s-45s. [DOI] [PubMed] [Google Scholar]

Belbasis 2015

  1. Belbasis L, Bellou V, Evangelou E, Ioannidis JPA, Tzoulaki I. Environmental risk factors and multiple sclerosis: an umbrella review of systematic reviews and meta-analyses. Lancet Neurology 2015;14(3):263-73. [DOI: 10.1016/S1474-4422(14)70267-4] [DOI] [PubMed] [Google Scholar]

Bjornevik 2023

  1. Bjornevik K, Münz C, Cohen JI, Ascherio A. Epstein–Barr virus as a leading cause of multiple sclerosis: mechanisms and implications. Nature Reviews Neurology 2023;19(3):160-71. [DOI: 10.1038/s41582-023-00775-5] [DOI] [PubMed] [Google Scholar]

Bluemke 2020

  1. Bluemke DA, Moy L, Bredella MA, Ertl-Wagner BB, Fowler KJ, Goh VJ, et al. Assessing radiology research on artificial intelligence: a brief guide for authors, reviewers, and readers—from the Radiology editorial board. Radiology 2020;294(3):487-89. [DOI: 10.1148/radiol.2019192515] [DOI] [PubMed] [Google Scholar]

Bossuyt 2015

  1. Bossuyt PM, Reitsma JB, Bruns DE, Gatsonis CA, Glasziou PP, Irwig L, et al. STARD 2015: an updated list of essential items for reporting diagnostic accuracy studies. BMJ 2015;351:h5527. [DOI: 10.1136/bmj.h5527] [DOI] [PMC free article] [PubMed] [Google Scholar]

Boulesteix 2019

  1. Boulesteix A, Janitza S, Hornung R, Probst P, Busen H, Hapfelmeier A. Making complex prediction rules applicable for readers: current practice in random forest literature and recommendations. Biometrical Journal 2019;61(5):1314-28. [DOI: 10.1002/bimj.201700243] [DOI] [PubMed] [Google Scholar]

Bouwmeester 2012

  1. Bouwmeester W, Zuithoff NPA, Mallett S, Geerlings MI, Vergouwe Y, Steyerberg EW, et al. Reporting and methods in clinical prediction research: a systematic review. PLOS Medicine 2012;9(5):e1001221. [DOI: 10.1371/journal.pmed.1001221] [DOI] [PMC free article] [PubMed] [Google Scholar]

Bovis 2019

  1. Bovis F, Carmisciano L, Signori A, Pardini M, Steinerman JR, Li T, et al. Defining responders to therapies by a statistical modeling approach applied to randomized clinical trial data. BMC Medicine 2019;17:113. [DOI: 10.1186/s12916-019-1345-2] [DOI] [PMC free article] [PubMed] [Google Scholar]

Briggs 2019

  1. Briggs FB, Thompson NR, Conway DS. Prognostic factors of disability in relapsing remitting multiple sclerosis. Multiple Sclerosis and Related Disorders 2019;30:9-16. [DOI: 10.1016/j.msard.2019.01.045] [DOI] [PubMed] [Google Scholar]

Briscoe 2020

  1. Briscoe S, Bethel A, Rogers M. Conduct and reporting of citation searching in Cochrane systematic reviews: a cross-sectional study. Research Synthesis Methods 2020;11(2):169-80. [DOI: 10.1002/jrsm.1355] [DOI] [PMC free article] [PubMed] [Google Scholar]

Brown 2020

  1. Brown FS, Glasmacher SA, Kearns PKA, MacDougall N, Hunt D, Connick P, et al. Systematic review of prediction models in relapsing remitting multiple sclerosis. PLOS One 2020;15(5):e0233575. [DOI: 10.1371/journal.pone.0233575] [DOI] [PMC free article] [PubMed] [Google Scholar]

Chatfield 1995

  1. Chatfield C. Model uncertainty, data mining and statistical inference. Journal of the Royal Statistical Society. Series A (Statistics in Society) 1995;158(3):419-66. [Google Scholar]

Chen 2017

  1. Chen JH, Asch SM. Machine learning and prediction in medicine — beyond the peak of inflated expectations. New England Journal of Medicine 2017;376(26):2507-9. [DOI: 10.1056/NEJMp1702071] [DOI] [PMC free article] [PubMed] [Google Scholar]

Cochrane 2021

  1. Cochrane Multiple Sclerosis and Rare Diseases of the CNS. Our reviews. msrdcns.cochrane.org/our-review (accessed 30 October 2021).

Cohen 1988

  1. Cohen J. Statistical Power Analysis for the Behavioral Sciences. Hillsdale (NJ): L. Erlbaum Associates, 1988. [Google Scholar]

Collins 2015

  1. Collins GS, Reitsma JB, Altman DG, Moons KG. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the TRIPOD statement. Journal of Clinical Epidemiology 2015;68(2):112-21. [DOI: 10.1016/j.jclinepi.2014.11.010] [DOI] [PubMed] [Google Scholar]

Concato 1993

  1. Concato J, Feinstein AR, Holford TR. The risk of determining risk with multivariable models. Annals of Internal Medicine 1993;118(3):201-10. [DOI: 10.7326/0003-4819-118-3-199302010-00009] [DOI] [PubMed] [Google Scholar]

Correale 2012

  1. Correale J, Ysrraelit MC, Fiol MP. Benign multiple sclerosis: does it exist? Current Neurology and Neuroscience Reports 2012;12(5):601-9. [DOI: 10.1007/s11910-012-0292-5] [DOI] [PubMed] [Google Scholar]

Cree 2016

  1. Cree BAC, Gourraud P-A, Oksenberg JR, Bevan C, Crabtree-Hartman E, Gelfand JM, et al. Long-term evolution of multiple sclerosis disability in the treatment era. Annals of Neurology 2016;80(4):499-510. [DOI: 10.1002/ana.24747] [DOI] [PMC free article] [PubMed] [Google Scholar]

Cree 2019

  1. Cree BA, Hollenbach JA, Bove R, Kirkish G, Sacco S, Caverzasi E, et al. Silent progression in disease activity-free relapsing multiple sclerosis. Annals of Neurology 2019;85(5):653-66. [DOI: 10.1002/ana.25463] [DOI] [PMC free article] [PubMed] [Google Scholar]

Day 2018

  1. Day GS, Rae-Grant A, Armstrong MJ, Pringsheim T, Cofield SS, Marrie RA. Identifying priority outcomes that influence selection of disease-modifying therapies in MS. Neurology Clinical Practice 2018;8(3):179-85. [DOI: 10.1212/CPJ.0000000000000449] [DOI] [PMC free article] [PubMed] [Google Scholar]

Debray 2017

  1. Debray TP, Damen JA, Snell KI, Ensor J, Hooft L, Reitsma JB, et al. A guide to systematic review and meta-analysis of prediction model performance. BMJ 2017;356:i6460. [DOI: 10.1136/bmj.i6460] [DOI] [PubMed] [Google Scholar]

Debray 2019

  1. Debray TP, Damen JA, Riley RR, Snell K, Reitsma JB, Hooft L, et al. A framework for meta-analysis of prediction model studies with binary and time-to-event outcomes. Statistical Methods in Medical Research 2019;28(9):2768-86. [DOI: 10.1177/0962280218785504] [DOI] [PMC free article] [PubMed] [Google Scholar]

Derfuss 2012

  1. Derfuss T. Personalized medicine in multiple sclerosis: hope or reality? BMC Medicine 2012;10:116. [DOI: 10.1186/1741-7015-10-116] [DOI] [PMC free article] [PubMed] [Google Scholar]

Dhiman 2021

  1. Dhiman P, Ma J, Navarro CA, Speich B, Bullock G, Damen JAA, et al. Reporting of prognostic clinical prediction models based on machine learning methods in oncology needs to be improved. Journal of Clinical Epidemiology 2021;138:60-72. [DOI: 10.1016/j.jclinepi.2021.06.024] [DOI] [PMC free article] [PubMed] [Google Scholar]

Diamond 1989

  1. Diamond GA. Future imperfect: the limitations of clinical prediction models and the limits of clinical prediction. Journal of the American College of Cardiology 1989;14(3):A12-22. [DOI: 10.1016/0735-1097(89)90157-5] [DOI] [PubMed] [Google Scholar]

Diaz 2019

  1. Diaz C, Zarco LA, Rivera DM. Highly active multiple sclerosis: an update. Multiple Sclerosis and Related Disorders 2019;30:215-24. [DOI: 10.1016/j.msard.2019.01.039] [DOI] [PubMed] [Google Scholar]

Ferrazzano 2020

  1. Ferrazzano G, Crisafulli SG, Baione V, Tartaglia M, Cortese A, Frontoni M, et al. Early diagnosis of secondary progressive multiple sclerosis: focus on fluid and neurophysiological biomarkers. Journal of Neurology 2021;268(10):3626-45. [DOI: 10.1007/s00415-020-09964-4] [DOI] [PubMed] [Google Scholar]

Foroutan 2020

  1. Foroutan F, Guyatt G, Zuk V, Vandvik PO, Alba AC, Mustafa R, et al. GRADE Guidelines 28: Use of GRADE for the assessment of evidence about prognostic factors: rating certainty in identification of groups of patients with different absolute risks. Journal of Clinical Epidemiology 2020;121:62-70. [DOI: 10.1016/j.jclinepi.2019.12.023] [DOI] [PubMed] [Google Scholar]

Freedman 2016

  1. Freedman MS, Rush CA. Severe, highly active, or aggressive multiple sclerosis. Continuum 2016;22(3):761-84. [DOI: 10.1212/CON.0000000000000331] [DOI] [PubMed] [Google Scholar]

Gafson 2017

  1. Gafson A, Craner MJ, Matthews PM. Personalised medicine for multiple sclerosis care. Multiple Sclerosis Journal 2017;23(3):362-9. [DOI: 10.1177/1352458516672017] [DOI] [PubMed] [Google Scholar]

Gauthier 2007

  1. Gauthier SA, Mandel M, Guttmann CRG, Glanz BI, Khoury SJ, Betensky RA. Predicting short-term disability in multiple sclerosis. Neurology 2007;68(24):2059-65. [DOI: 10.1212/01.wnl.0000264890.97479.b1] [DOI] [PubMed] [Google Scholar]

Ge 2006

  1. Ge Y. Multiple sclerosis: the role of MR imaging. American Journal of Neuroradiology 2006;27(6):1165-76. [PMC free article] [PubMed] [Google Scholar]

Geersing 2012

  1. Geersing G-J, Bouwmeester W, Zuithoff P, Spijker R, Leeflang M, Moons K. Search filters for finding prognostic and diagnostic prediction studies in MEDLINE to enhance systematic reviews. PLOS One 2012;7(2):e32844. [DOI: 10.1371/journal.pone.0032844] [DOI] [PMC free article] [PubMed] [Google Scholar]

Hanley 1982

  1. Hanley JA, McNeil BJ. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 1982;143(1):29-36. [DOI] [PubMed] [Google Scholar]

Harrell 1996

  1. Harrell FE, Lee KL, Mark DB. Multivariable prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Statistics in Medicine 1996;15(4):361-87. [DOI] [PubMed] [Google Scholar]

Harrell 2001

  1. Harrell F. Regression Modeling Strategies: With Applications to Linear Models, Logistic Regression, and Survival Analysis. New York (NY): Springer-Verlag, 2001. [Google Scholar]

Hastie 2009

  1. Hastie T, Tibshirani R, Friedman J. The Elements of Statistical Learning. 2nd edition. New York (NY): Springer, 2009. [Google Scholar]

Havas 2020

  1. Havas J, Leray E, Rollot F, Casey R, Michel L, Lejeune F, et al. Predictive medicine in multiple sclerosis: a systematic review. Multiple Sclerosis and Related Disorders 2020;40:101928. [DOI: 10.1016/j.msard.2020.101928] [DOI] [PubMed] [Google Scholar]

Hemmer 2021

  1. Hemmer B, et al. Diagnosis and therapy of multiple sclerosis, neuromyelitis optica spectrum diseases and MOG-IgG-associated diseases, S2k guideline [Diagnose und Therapie der Multiplen Sklerose, Neuromyelitis-optica-Spektrum-Erkrankungen und MOG-IgG-assoziierten Erkrankungen, S2k-Leitlinie]. In: Deutsche Gesellschaft für Neurologie, editor(s). Leitlinien für Diagnostik und Therapie in der Neurologie; 2021. Available at www.dgn.org/leitlinien (accessed 17 June 2021).

Hempel 2017

  1. Hempel S, Graham GD, Fu N, Estrada E, Chen AY, Miake-Lye I, et al. A systematic review of modifiable risk factors in the progression of multiple sclerosis. Multiple Sclerosis Journal 2017;23(4):525-33. [DOI: 10.1177/1352458517690270] [DOI] [PubMed] [Google Scholar]

Hernández 2004

  1. Hernández AV, Steyerberg EW, Habbema JDF. Covariate adjustment in randomized controlled trials with dichotomous outcomes increases statistical power and reduces sample size requirements. Journal of Clinical Epidemiology 2004;57(5):454-60. [DOI: 10.1016/j.jclinepi.2003.09.014] [DOI] [PubMed] [Google Scholar]

Hohlfeld 2016a

  1. Hohlfeld R, Dornmair K, Meinl E, Wekerle H. The search for the target antigens of multiple sclerosis, part 1: autoreactive CD4+ T lymphocytes as pathogenic effectors and therapeutic targets. Lancet Neurology 2016;15(2):198-209. [DOI: 10.1016/S1474-4422(15)00334-8] [DOI] [PubMed] [Google Scholar]

Hohlfeld 2016b

  1. Hohlfeld R, Dornmair K, Meinl E, Wekerle H. The search for the target antigens of multiple sclerosis, part 2: CD8+ T cells, B cells, and antibodies in the focus of reverse-translational research. Lancet Neurology 2016;15(3):317-31. [DOI: 10.1016/S1474-4422(15)00313-0] [DOI] [PubMed] [Google Scholar]

Iorio 2015

  1. Iorio A, Spencer FA, Falavigna M, Alba C, Lang E, Burnand B, et al. Use of GRADE for assessment of evidence about prognosis: rating confidence in estimates of event rates in broad categories of patients. BMJ 2015;350:h870. [DOI: 10.1136/bmj.h870] [DOI] [PubMed] [Google Scholar]

Jarman 2010

  1. Jarman B, Pieter D, Veen AA, Kool RB, Aylin P, Bottle A, et al. The hospital standardised mortality ratio: a powerful tool for Dutch hospitals to assess their quality of care? BMJ Quality & Safety 2010;19(1):9-13. [DOI: 10.1136/qshc.2009.032953] [DOI] [PMC free article] [PubMed] [Google Scholar]

Justice 1999

  1. Justice AC. Assessing the generalizability of prognostic information. Annals of Internal Medicine 1999;130(6):515-24. [DOI: 10.7326/0003-4819-130-6-199903160-00016] [DOI] [PubMed] [Google Scholar]

Kalincik 2017

  1. Kalincik T, Manouchehrinia A, Sobisek L, Jokubaitis V, Spelman T, Horakova D, et al. Towards personalized therapy for multiple sclerosis: prediction of individual treatment response. Brain 2017;140(9):2426-43. [DOI: 10.1093/brain/awx185] [DOI] [PubMed] [Google Scholar]

Kalincik 2018

  1. Kalincik T. Reply: towards personalized therapy for multiple sclerosis: limitations of observational data. Brain 2018;141(5):e39. [DOI] [PubMed] [Google Scholar]

Kaufman 2011

  1. Kaufman S, Rosset S, Perlich C. Leakage in data mining: formulation, detection, and avoidance. In: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Vol. 6. 2011:556–63. [DOI: 10.1145/2020408.2020496] [DOI]

Korevaar 2020

  1. Korevaar DA, Salameh J-P, Vali Y, Cohen JF, McInnes MDF, Spijker R, et al. Searching practices and inclusion of unpublished studies in systematic reviews of diagnostic accuracy. Research Synthesis Methods 2020;11(3):343-53. [DOI] [PMC free article] [PubMed] [Google Scholar]

Kreuzberger 2020

  1. Kreuzberger N, Damen JAAG, Trivella M, Estcourt LJ, Aldin A, Umlauff L, et al. Prognostic models for newly-diagnosed chronic lymphocytic leukaemia in adults: a systematic review and meta-analysis. Cochrane Database of Systematic Reviews 2020, Issue 7. Art. No: CD012022. [DOI: 10.1002/14651858.CD012022.pub2] [DOI] [PMC free article] [PubMed] [Google Scholar]

Kurtzke 1977

  1. Kurtzke JF, Beebe GW, Nagler B, Kurland LT, Auth TL. Studies on the natural history of multiple sclerosis--8. Early prognostic features of the later course of the illness. Journal of Chronic Diseases 1977;30(12):819-30. [DOI: 10.1016/0021-9681(77)90010-8] [DOI] [PubMed] [Google Scholar]

Lorscheider 2016

  1. Lorscheider J, Buzzard K, Jokubaitis V, Spelman T, Havrdova E, Horakova D, et al. Defining secondary progressive multiple sclerosis. Brain 2016;139(Pt 9):2395-405. [DOI: 10.1093/brain/aww173] [DOI] [PubMed] [Google Scholar]

Lublin 1996

  1. Lublin FD, Reingold SC. Defining the clinical course of multiple sclerosis: results of an international survey. National Multiple Sclerosis Society (USA) Advisory Committee on Clinical Trials of New Agents in Multiple Sclerosis. Neurology 1996;46(4):907-11. [DOI] [PubMed] [Google Scholar]

Lublin 2014

  1. Lublin FD, Reingold SC, Cohen JA, Cutter GR, Sørensen PS, Thompson AJ, et al. Defining the clinical course of multiple sclerosis. Neurology 2014;83(3):278-86. [DOI: 10.1212/WNL.0000000000000560] [DOI] [PMC free article] [PubMed] [Google Scholar]

Mateen 2020

  1. Mateen BA, Liley J, Denniston AK, Holmes CC, Vollmer SJ. Improving the quality of machine learning in health applications and clinical research. Nature Machine Intelligence 2020;2:554-6. [DOI: 10.1038/s42256-020-00239-1] [DOI] [Google Scholar]

McDonald 2001

  1. McDonald WI, Compston A, Edan G, Goodkin D, Hartung HP, Lublin FD, et al. Recommended diagnostic criteria for multiple sclerosis: guidelines from the International Panel on the diagnosis of multiple sclerosis. Annals of Neurology 2001;50(1):121-7. [DOI: 10.1002/ana.1032] [DOI] [PubMed] [Google Scholar]

Meyer‐Moock 2014

  1. Meyer-Moock S, Feng Y-S, Maeurer M, Dippel F-W, Kohlmann T. Systematic literature review and validity evaluation of the Expanded Disability Status Scale (EDSS) and the Multiple Sclerosis Functional Composite (MSFC) in patients with multiple sclerosis. BMC Neurology 2014;14:58. [DOI: 10.1186/1471-2377-14-58] [DOI] [PMC free article] [PubMed] [Google Scholar]

Miller 2008

  1. Miller A, Avidan N, Tzunz-Henig N, Glass-Marmor L, Lejbkowicz I, Pinter RY, et al. Translation towards personalized medicine in multiple sclerosis. Journal of the Neurological Sciences 2008;274(1):68-75. [DOI: 10.1016/j.jns.2008.07.028] [DOI] [PubMed] [Google Scholar]

Moher 2009

  1. Moher D, Liberati A, Tetzlaff J, Altman DG, The PRISMA Group. Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. PLOS Medicine 2009;6(7):e1000097. [DOI: 10.1371/journal.pmed.1000097] [DOI] [PMC free article] [PubMed] [Google Scholar]

Montalban 2018

  1. Montalban X, Gold R, Thompson AJ, Otero-Romero S, Amato MP, Chandraratna D, et al. ECTRIMS/EAN Guideline on the pharmacological treatment of people with multiple sclerosis. Multiple Sclerosis Journal 2018;24(2):25. [DOI: 10.1177/1352458517751049] [DOI] [PubMed] [Google Scholar]

Montavon 2012

  1. Montavon G, Orr G, Müller KR. Neural Networks: Tricks of the Trade. Springer, 2012. [Google Scholar]

Moons 2014

  1. Moons KG, Groot JA, Bouwmeester W, Vergouwe Y, Mallett S, Altman DG, et al. Critical appraisal and data extraction for systematic reviews of prediction modelling studies: the CHARMS checklist. PLOS Medicine 2014;11(10):e1001744. [DOI: 10.1371/journal.pmed.1001744] [DOI] [PMC free article] [PubMed] [Google Scholar]

Moons 2019

  1. Moons KG, Wolff RF, Riley RD, Whiting PF, Westwood M, Collins GS, et al. PROBAST: a tool to assess risk of bias and applicability of prediction model studies: explanation and elaboration. Annals of Internal Medicine 2019;170(1):W1-33. [DOI: 10.7326/M18-1377] [DOI] [PubMed] [Google Scholar]

Newcombe 2006

  1. Newcombe RG. Confidence intervals for an effect size measure based on the Mann–Whitney statistic. Part 2: asymptotic methods and evaluation. Statistics in Medicine 2006;25(4):559-73. [DOI: 10.1002/sim.2324] [DOI] [PubMed] [Google Scholar]

Niculescu‐Mizil 2005

  1. Niculescu-Mizil A, Caruana R. Predicting good probabilities with supervised learning. In: Proceedings of the 22nd International Conference on Machine Learning. 2005:625-32. [DOI: 10.1145/1102351.1102430] [DOI]

Ontaneda 2019

  1. Ontaneda D, Tallantyre E, Kalincik T, Planchon SM, Evangelou N. Early highly effective versus escalation treatment approaches in relapsing multiple sclerosis. Lancet Neurology 2019;18(10):973-80. [DOI: 10.1016/S1474-4422(19)30151-6] [DOI] [PubMed] [Google Scholar]

Optic Neuritis Study Group 1991

  1. Optic Neuritis Study Group. The clinical profile of optic neuritis. Experience of the optic neuritis treatment trial. Archives of Ophthalmology 1991;109(12):1673-8. [DOI: 10.1001/archopht.1991.01080120057025] [DOI] [PubMed] [Google Scholar]

Ouzzani 2016

  1. Ouzzani M, Hammady H, Fedorowicz Z, Elmagarmid A. Rayyan - a web and mobile app for systematic reviews. Systematic Reviews 2016;5(1):210. [DOI] [PMC free article] [PubMed] [Google Scholar]

Page 2021

  1. Page MJ, McKenzie JE, Bossuyt PM, Boutron I, Hoffmann TC, Mulrow CD, et al. The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. PLOS Medicine 2021;18(3):e1003583. [DOI: 10.1371/journal.pmed.1003583] [DOI] [PMC free article] [PubMed] [Google Scholar]

Patsopoulos 2019

  1. Patsopoulos NA, Baranzini SE, Santaniello A, Shoostari P, Cotsapas C, Wong G, et al. Multiple sclerosis genomic map implicates peripheral immune cells and microglia in susceptibility. Science 2019;365(6460):eaav7188. [DOI: 10.1126/science.aav7188] [DOI] [PMC free article] [PubMed] [Google Scholar]

Peat 2014

  1. Peat G, Riley RD, Croft P, Morley KI, Kyzas PA, Moons KGM, for the PROGRESS Group. Improving the transparency of prognosis research: the role of reporting, data sharing, registration, and protocols. PLOS Medicine 2014;11(7):e1001671. [DOI: 10.1371/journal.pmed.1001671] [DOI] [PMC free article] [PubMed] [Google Scholar]

Platt 1999

  1. Platt J. Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods. Advances in Large Margin Classifiers 1999;10(3):61-74. [Google Scholar]

Polman 2005

  1. Polman CH, Reingold SC, Edan G, Filippi M, Hartung H-P, Kappos L, et al. Diagnostic criteria for multiple sclerosis: 2005 revisions to the "McDonald Criteria". Annals of Neurology 2005;58(6):840-6. [DOI: 10.1002/ana.20703] [DOI] [PubMed] [Google Scholar]

Polman 2011

  1. Polman CH, Reingold SC, Banwell B, Clanet M, Cohen JA, Filippi M, et al. Diagnostic criteria for multiple sclerosis: 2010 revisions to the McDonald criteria. Annals of Neurology 2011;69(2):292-302. [DOI: 10.1002/ana.22366] [DOI] [PMC free article] [PubMed] [Google Scholar]

Poser 1983

  1. Poser CM, Paty DW, Scheinberg L, McDonald WI, Davis FA, Ebers GC, et al. New diagnostic criteria for multiple sclerosis: guidelines for research protocols. Annals of Neurology 1983;13(3):227-31. [DOI: 10.1002/ana.410130302] [DOI] [PubMed] [Google Scholar]

Rae‐Grant 2018

  1. Rae-Grant A, Day GS, Marrie RA, Rabinstein A, Cree BA, Gronseth GS, et al. Comprehensive systematic review summary: disease-modifying therapies for adults with multiple sclerosis. Neurology 2018;90(17):789-800. [DOI: 10.1212/WNL.0000000000005345] [DOI] [PubMed] [Google Scholar]

Reich 2018

  1. Reich DS, Lucchinetti CF, Calabresi PA. Multiple sclerosis. New England Journal of Medicine 2018;378(2):169-80. [DOI: 10.1056/NEJMra1401483] [DOI] [PMC free article] [PubMed] [Google Scholar]

Riley 2019

  1. Riley RD, Snell KIE, Ensor J, Burke DL, Harrell Jr FE, Moons KGM, et al. Minimum sample size for developing a multivariable prediction model: part II - binary and time-to-event outcomes. Statistics in Medicine 2019;38(7):1276-96. [DOI: 10.1002/sim.7992] [DOI] [PMC free article] [PubMed] [Google Scholar]

Riley 2020

  1. Riley RD, Ensor J, Snell KIE, Harrell FE Jr, Martin GP, Reitsma JB, et al. Calculating the sample size required for developing a clinical prediction model. BMJ 2020;368:m441. [DOI: 10.1136/bmj.m441] [DOI] [PubMed] [Google Scholar]

Roozenbeek 2009

  1. Roozenbeek B, Maas AIR, Lingsma HF, Butcher I, Lu J, Marmarou A, et al. Baseline characteristics and statistical power in randomized controlled trials: selection, prognostic targeting, or covariate adjustment? Critical Care Medicine 2009;37(10):2683-90. [DOI: 10.1097/ccm.0b013e3181ab85ec] [DOI] [PubMed] [Google Scholar]

Rotstein 2019

  1. Rotstein D, Montalban X. Reaching an evidence-based prognosis for personalized treatment of multiple sclerosis. Nature Reviews Neurology 2019;15(5):287-300. [DOI: 10.1038/s41582-019-0170-8] [DOI] [PubMed] [Google Scholar]

Runmarker 1994

  1. Runmarker B, Andersson C, Odén A, Andersen O. Prediction of outcome in multiple sclerosis based on multivariate models. Journal of Neurology 1994;241(10):597-604. [DOI: 10.1007/BF00920623] [DOI] [PubMed] [Google Scholar]

Río 2009

  1. Río J, Comabella M, Montalban X. Predicting responders to therapies for multiple sclerosis. Nature Reviews Neurology 2009;5(10):553-60. [DOI: 10.1038/nrneurol.2009.139] [DOI] [PubMed] [Google Scholar]

Río 2016

  1. Río J, Ruiz-Peña JL. Short-term suboptimal response criteria for predicting long-term non-response to first-line disease modifying therapies in multiple sclerosis: a systematic review and meta-analysis. Journal of the Neurological Sciences 2016;361:158-67. [DOI: 10.1016/j.jns.2015.12.043] [DOI] [PubMed] [Google Scholar]

Sawcer 2011

  1. Sawcer S. The major cause of multiple sclerosis is environmental: genetics has a minor role--no. Multiple Sclerosis 2011;17(10):1174-5. [DOI: 10.1177/1352458511421106] [DOI] [PubMed] [Google Scholar]

Seccia 2021

  1. Seccia R, Romano S, Salvetti M, Crisanti A, Palagi L, Grassi F. Machine learning use for prognostic purposes in multiple sclerosis. Life 2021;11(2):122. [DOI: 10.3390/life11020122] [DOI] [PMC free article] [PubMed] [Google Scholar]

Sekula 2016

  1. Sekula P, Pressler JB, Sauerbrei W, Goebell PJ, Schmitz-Dräger BJ. Assessment of the extent of unpublished studies in prognostic factor research: a systematic review of p53 immunohistochemistry in bladder cancer as an example. BMJ Open 2016;6(8):e009972. [DOI] [PMC free article] [PubMed] [Google Scholar]

Simera 2008

  1. Simera I, Altman DG, Moher D, Schulz KF, Hoey J. Guidelines for reporting health research: the EQUATOR Network's survey of guideline authors. PLOS Medicine 2008;5(6):e139. [DOI: 10.1371/journal.pmed.0050139] [DOI] [PMC free article] [PubMed]

Snell 2020

  1. Snell KIE, Allotey J, Smuk M, Hooper R, Chan C, Ahmed A, et al. External validation of prognostic models predicting pre-eclampsia: individual participant data meta-analysis. BMC Medicine 2020;18(1):302. [DOI: 10.1186/s12916-020-01766-9] [DOI] [PMC free article] [PubMed] [Google Scholar]

Sormani 2013

  1. Sormani MP, Rio J, Tintorè M, Signori A, Li D, Cornelisse P, et al. Scoring treatment response in patients with relapsing multiple sclerosis. Multiple Sclerosis Journal 2013;19(5):605-12. [DOI: 10.1177/1352458512460605] [DOI] [PubMed] [Google Scholar]

Sormani 2016

  1. Sormani MP, Gasperini C, Romeo M, Rio J, Calabrese M, Cocco E, et al. Assessing response to interferon-β in a multicenter dataset of patients with MS. Neurology 2016;87(2):134-40. [DOI: 10.1212/WNL.0000000000002830] [DOI] [PubMed] [Google Scholar]

Sormani 2017

  1. Sormani MP. Prognostic factors versus markers of response to treatment versus surrogate endpoints: three different concepts. Multiple Sclerosis Journal 2017;23(3):378-81. [DOI] [PubMed] [Google Scholar]

Steyerberg 2013

  1. Steyerberg EW, Moons KG, Windt DA, Hayden JA, Perel P, Schroter S, et al. Prognosis research strategy (PROGRESS) 3: prognostic model research. PLOS Medicine 2013;10(2):e1001381. [DOI: 10.1371/journal.pmed.1001381] [DOI] [PMC free article] [PubMed] [Google Scholar]

Steyerberg 2018

  1. Steyerberg EW, Claggett B. Towards personalized therapy for multiple sclerosis: limitations of observational data. Brain 2018;141(5):e38. [DOI] [PubMed] [Google Scholar]

Steyerberg 2019

  1. Steyerberg E. Clinical Prediction Models: A Practical Approach to Development, Validation, and Updating. 2nd edition. New York (NY): Springer-Verlag, 2019. [Google Scholar]

Thompson 2000

  1. Thompson AJ, Montalban X, Barkhof F, Brochet B, Filippi M, Miller DH, et al. Diagnostic criteria for primary progressive multiple sclerosis: a position paper. Annals of Neurology 2000;47(6):831-35. [DOI] [PubMed] [Google Scholar]

Thompson 2018a

  1. Thompson AJ, Baranzini SE, Geurts J, Hemmer B, Ciccarelli O. Multiple sclerosis. Lancet 2018;391(10130):1622-36. [DOI: 10.1016/S0140-6736(18)30481-1] [DOI] [PubMed] [Google Scholar]

Thompson 2018b

  1. Thompson AJ, Banwell BL, Barkhof F, Carroll WM, Coetzee T, Comi G, et al. Diagnosis of multiple sclerosis: 2017 revisions of the McDonald criteria. Lancet Neurology 2018;17(2):162-73. [DOI: 10.1016/S1474-4422(17)30470-2] [DOI] [PubMed] [Google Scholar]

van der Ploeg 2014

  1. van der Ploeg T, Austin PC, Steyerberg EW. Modern modelling techniques are data hungry: a simulation study for predicting dichotomous endpoints. BMC Medical Research Methodology 2014;14:137. [DOI: 10.1186/1471-2288-14-137] [DOI] [PMC free article] [PubMed] [Google Scholar]

van Munster 2017

  1. van Munster CEP, Uitdehaag BMJ. Outcome measures in clinical trials for multiple sclerosis. CNS Drugs 2017;31(3):217-36. [DOI: 10.1007/s40263-017-0412-5] [DOI] [PMC free article] [PubMed] [Google Scholar]

van Smeden 2018

  1. van Smeden M. Should a risk prediction model be developed? 3 August 2018. https://twitter.com/maartenvsmeden/status/1025315100796899328 (accessed 26 November 2021).

von Elm 2007

  1. von Elm E, Altman DG, Egger M, Pocock SJ, Gøtzsche PC, Vandenbroucke JP. The Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) statement: guidelines for reporting observational studies. Annals of Internal Medicine 2007;147(8):573-7. [DOI: 10.7326/0003-4819-147-8-200710160-00010] [DOI] [PubMed] [Google Scholar]

Völler 2017

  1. Völler S, Flint RB, Stolk LM, Degraeuwe PLJ, Simons SHP, Pokorna P, et al. Model-based clinical dose optimization for phenobarbital in neonates: an illustration of the importance of data sharing and external validation. European Journal of Pharmaceutical Sciences 2017;109:S90-7. [DOI: 10.1016/j.ejps.2017.05.026] [DOI] [PubMed] [Google Scholar]

Walton 2020

  1. Walton C, King R, Rechtman L, Kaye W, Leray E, Marrie RA, et al. Rising prevalence of multiple sclerosis worldwide: insights from the Atlas of MS. Multiple Sclerosis 2020;26(14):1816-21. [DOI: 10.1177/1352458520970841] [DOI] [PMC free article] [PubMed] [Google Scholar]

Warnke 2019

  1. Warnke C, Havla J, Kitzrow M, Biesalski A-S, Knauss S. Inflammatory diseases [Entzündliche Erkrankungen]. In: Sturm D, Biesalski A-S, Höffken O, editor(s). Neurologische Pathophysiologie: Ursachen und Mechanismen neurologischer Erkrankungen. Berlin, Heidelberg: Springer, 2019:51-98. [DOI: 10.1007/978-3-662-56784-5_2] [DOI] [Google Scholar]

Weinshenker 1989a

  1. Weinshenker BG, Bass B, Rice GP, Noseworthy J, Carriere W, Baskerville J, et al. The natural history of multiple sclerosis: a geographically based study. I. Clinical course and disability. Brain 1989;112(1):133-46. [DOI: 10.1093/brain/112.1.133] [DOI] [PubMed] [Google Scholar]

Wiendl 2021

  1. Wiendl H, Gold R, Berger T, Derfuss T, Linker R, Mäurer M, et al. Multiple Sclerosis Therapy Consensus Group (MSTCG): position statement on disease-modifying therapies for multiple sclerosis (white paper). Therapeutic Advances in Neurological Disorders 2021;14:17562864211039648. [DOI: 10.1177/17562864211039648] [DOI] [PMC free article] [PubMed] [Google Scholar]

Wingerchuk 2016

  1. Wingerchuk DM, Weinshenker BG. Disease modifying therapies for relapsing multiple sclerosis. BMJ 2016;354:i3518. [DOI: 10.1136/bmj.i3518] [DOI] [PubMed] [Google Scholar]

Wolff 2019

  1. Wolff RF, Moons KG, Riley RD, Whiting PF, Westwood M, Collins GS, et al. PROBAST: a tool to assess the risk of bias and applicability of prediction model studies. Annals of Internal Medicine 2019;170(1):51-8. [DOI: 10.7326/M18-1376] [DOI] [PubMed] [Google Scholar]

Wynants 2017

  1. Wynants L, Collins GS, Van Calster B. Key steps and common pitfalls in developing and validating risk models. BJOG: An International Journal of Obstetrics and Gynaecology 2017;124(3):423-32. [DOI: 10.1111/1471-0528.14170] [DOI] [PubMed] [Google Scholar]

Wynants 2020

  1. Wynants L, Van Calster B, Collins GS, Riley RD, Heinze G, Schuit E, et al. Prediction models for diagnosis and prognosis of covid-19: systematic review and critical appraisal. BMJ 2020;369:m1328. [DOI: 10.1136/bmj.m1328] [DOI] [PMC free article] [PubMed] [Google Scholar]

Zadrozny 2001

  1. Zadrozny B, Elkan C. Obtaining calibrated probability estimates from decision trees and naive Bayesian classifiers. In: Proceedings of the Eighteenth International Conference on Machine Learning. 2001:609-16.

References to other published versions of this review

On Seker 2020

  1. On Seker BI, Reeve K, Havla J, Burns J, Gosteli MA, Lutterotti A, et al. Prognostic models for predicting clinical disease progression, worsening and activity in people with multiple sclerosis. Cochrane Database of Systematic Reviews 2020, Issue 5. Art. No: CD013606. [DOI: 10.1002/14651858.CD013606] [DOI] [PMC free article] [PubMed] [Google Scholar]

Data Availability Statement

The dataset summarised in this review is available as tables in the Appendices and in Characteristics of included studies. The R code used for the statistical description is available upon request from the authors.

