Abstract
Background
Multiple sclerosis (MS) is a chronic inflammatory disease of the central nervous system that affects millions of people worldwide. The disease course varies greatly across individuals and many disease‐modifying treatments with different safety and efficacy profiles have been developed recently. Prognostic models evaluated and shown to be valid in different settings have the potential to support people with MS and their physicians during the decision‐making process for treatment or disease/life management, allow stratified and more precise interpretation of interventional trials, and provide insights into disease mechanisms. Many researchers have turned to prognostic models to help predict clinical outcomes in people with MS; however, to our knowledge, no widely accepted prognostic model for MS is being used in clinical practice yet.
Objectives
To identify and summarise multivariable prognostic models, and their validation studies, for quantifying the risk of clinical disease progression, worsening, and activity in adults with MS.
Search methods
We searched MEDLINE, Embase, and the Cochrane Database of Systematic Reviews from January 1996 until July 2021. We also screened the reference lists of included studies and relevant reviews, and references citing the included studies.
Selection criteria
We included all statistically developed multivariable prognostic models aiming to predict clinical disease progression, worsening, and activity, as measured by disability, relapse, conversion to definite MS, conversion to progressive MS, or a composite of these in adult individuals with MS. We also included any studies evaluating the performance of (i.e. validating) these models. There were no restrictions based on language, data source, timing of prognostication, or timing of outcome.
Data collection and analysis
Pairs of review authors independently screened titles/abstracts and full texts, extracted data using a piloted form based on the Checklist for Critical Appraisal and Data Extraction for Systematic Reviews of Prediction Modelling Studies (CHARMS), assessed risk of bias using the Prediction Model Risk Of Bias Assessment Tool (PROBAST), and assessed reporting deficiencies based on the checklist items in Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD). The characteristics of the included models and their validations are described narratively. We planned to meta‐analyse the discrimination and calibration of models with at least three external validations outside the model development study but no model met this criterion. We summarised between‐study heterogeneity narratively but again could not perform the planned meta‐regression.
Main results
We included 57 studies, from which we identified 75 model developments, 15 external validations corresponding to only 12 (16%) of the models, and six author‐reported validations. Only two models were externally validated multiple times. None of the identified external validations were performed by researchers independent of those who developed the model. The outcome was related to disease progression in 39 (41%), relapses in 8 (8%), conversion to definite MS in 17 (18%), and conversion to progressive MS in 27 (28%) of the 96 models or validations. The disease‐ and treatment‐related characteristics of included participants, and the definitions of considered predictors and outcomes, were highly heterogeneous amongst the studies. Based on publication year, we observed an increase over time in the percentage of participants on treatment, diversification of the diagnostic criteria used, increased consideration of biomarkers or treatment as predictors, and increased use of machine learning methods.
Usability and reproducibility
All identified models contained at least one predictor requiring the skills of a medical specialist for measurement or assessment. Most of the models (44; 59%) contained predictors that require specialist equipment likely to be absent from primary care or standard hospital settings. Over half (52%) of the developed models were not accompanied by model coefficients, tools, or instructions, which hinders their application, independent validation, or reproduction. The data used in model developments were made publicly available, or reported to be available on request, in only a few studies (two and six, respectively).
Risk of bias
We rated all but one of the model developments or validations as having high overall risk of bias. The main reason for this was the statistical methods used for the development or evaluation of the prognostic models; we rated all but two of the included model developments or validations as having high risk of bias in the analysis domain. None of the externally validated model developments, nor any of their external validations, had low risk of bias. There were concerns about the applicability of the models to our research question in over one‐third (38%) of the models or their validations.
Reporting deficiencies
Reporting was poor overall and there was no observable increase in the quality of reporting over time. The items that were unclearly reported or not reported at all for most of the included models or validations were related to sample size justification, blinding of outcome assessors, details of the full model or how to obtain predictions from it, amount of missing data, and treatments received by the participants. Reporting of preferred model performance measures of discrimination and calibration was suboptimal.
Authors' conclusions
The current evidence is not sufficient to recommend the use of any of the published prognostic prediction models for people with MS in routine clinical practice, owing to the lack of independent external validations. The MS prognostic research community should adhere to current reporting and methodological guidelines and conduct many more state‐of‐the‐art external validation studies of existing or newly developed models.
Keywords: Adult, Humans, Disease Progression, Multiple Sclerosis, Prognosis, Reproducibility of Results, Systematic Reviews as Topic
Plain language summary
Which models exist for prediction of future disease outcomes in people with multiple sclerosis?
Why is it important to study multiple sclerosis?
Multiple sclerosis (MS) is a chronic disease of the brain, spine, and nerves. Millions of people worldwide suffer from this disease, but the disease and how it progresses can be very different from person to person. Although MS cannot be cured, different treatments are available that can help reduce symptoms and slow the worsening of the disease. These treatments work differently, with some having more severe side effects than others. Understanding the severity of an individual’s MS is important to patients and medical professionals.
Why are prognostic models important in the context of multiple sclerosis?
Prognostic models help patients and medical professionals understand how sick an individual is and will become. This understanding can support patients during life and treatment choices. Prognostic models can also help medical professionals make decisions about how to best treat an individual, better understand the disease, or to develop treatments. Prognostic models for MS might involve combining a range of different pieces of information about an individual to predict how their MS will continue to develop. Important pieces of information to include in a prognostic model could be, for example, information on personal characteristics (such as age, sex, body mass index), information on their behaviour (such as whether they smoke), and information about their MS (such as how long they have had the disease). Other clinical features or measurements may also be important.
What did we want to find out?
We wanted to search for and find all prognostic models that combine multiple pieces of information to predict how MS will continue to develop and worsen in adults.
What did we do?
We used different techniques to search for all studies that described prognostic models, which combine multiple pieces of information, developed in the context of MS. We were interested in studies showing how these prognostic models were developed, as well as studies evaluating how well they actually work in practice. Once we found all relevant studies, we summarised them and evaluated how well they reported their results and how well they were conducted.
What did we find?
We found 57 studies that described prognostic models combining multiple pieces of information to predict how MS will continue to develop and worsen in adults. These studies described the development of 75 different prognostic models. There were 15 instances in which the performance of specific prognostic models was evaluated.
We found that prognostic models focus on different outcomes; 41% looked at disease progression, 8% at relapses, 18% at moving from a first attack to definite MS, and 28% at moving from the early stages of MS to progressive MS. The prognostic models we found were very different from one another in many ways. The patients they used to develop the models, for example, were very different in terms of treatments. In addition, the pieces of information they used to predict the course of MS were very different from one another. We found that prognostic models have changed over time: the criteria used to diagnose MS and the use of treatments have shifted, and newer models draw on information measured with new techniques and use new modelling approaches. We also found that using these prognostic models requires information about the individual that would require a medical specialist and often specialist equipment, both of which may not be available in many clinics and hospitals.
What are the limitations of the evidence?
We found problems with most studies, meaning that we may not be able to trust their results. Common problems involved data and statistical methods used across studies. Additionally, many of the studies report results that may be very different if the prognostic models are applied to a new set of people with MS. We also found that the studies did a poor job of describing their methods and reporting their findings.
What does this mean?
The studies we found show that the evidence on prognostic models for predicting how MS will continue to develop and worsen in adults is not yet well‐developed. New research is needed that focusses on using methods recommended in guidelines to develop prognostic models and evaluate their performance. This research should also focus on describing their methods and results well, so that other researchers and medical professionals can use them for research and clinical practice.
Summary of findings
Summary of findings 1.

Population: adults with relapsing‐remitting multiple sclerosis
Setting: specialty clinical care
Model: models with more than one external validation
Outcome: conversion to progressive multiple sclerosis
Timing: prediction of time to outcome at disease onset

| Model name | External validations (study, if different) | Number of participants | Performance measure | Overall risk of bias assessment |
| --- | --- | --- | --- | --- |
| Manouchehrinia 2019 | British Columbia cohort | 3967 | c‐statistic 0.77 (95% CI 0.76 to 0.78) | High, due to use of a predictor measured at a time point after the time of model use, and lack of calibration assessment |
| Manouchehrinia 2019 | ACROSS trial | 175 | c‐statistic 0.77 (95% CI 0.70 to 0.85) | Same as above |
| Manouchehrinia 2019 | FREEDOMS and FREEDOMS II trial extensions | 2355 | c‐statistic 0.87 (95% CI 0.84 to 0.89) | Same as above |
| Bayesian Risk Estimate for Multiple Sclerosis score | Italian cohort (Bergamaschi 2007) | 535 | Cutoff 95%: sensitivity 0.17, specificity 0.99 | High, due to lack of discrimination or calibration assessment |
| Bayesian Risk Estimate for Multiple Sclerosis score | MSBase registry (Bergamaschi 2015) | 1131 | Cutoff 50%: sensitivity 0.35, specificity 0.80 | Same as above |

ACROSS: A CROSS‐Sectional Long‐term Follow‐up of Fingolimod Phase II Study Patients; CI: confidence interval; FREEDOMS: FTY720 Research Evaluating Effects of Daily Oral therapy in Multiple Sclerosis
Background
Description of the health condition and context
Multiple sclerosis (MS) is a chronic inflammatory disease of the central nervous system (CNS) that usually begins in young adulthood and affects 2.8 million people worldwide (Adelman 2013; Thompson 2018a; Walton 2020). The course of MS varies greatly and is characterised by clinical, radiological, genetic, and pathological heterogeneity. The exact aetiology of MS is still unclear, even though there are convincing arguments for an (auto‐)immunopathogenesis (Hohlfeld 2016a; Hohlfeld 2016b), triggered or driven by exposure to environmental risk factors (Attfield 2022). These arguments include the neuropathological findings, the various analogies to the autoimmune animal models, and, above all, the response of MS to various immunosuppressive therapies (Thompson 2018a). Genetic research also supports the (auto‐)immunopathogenesis theory, implicating peripheral immune cells and microglia in susceptibility (Attfield 2022; Patsopoulos 2019; Sawcer 2011). Environmental factors such as vitamin D deficiency or Epstein‐Barr virus infection have been shown to influence the development (Attfield 2022; Belbasis 2015; Bjornevik 2023) and course (Hempel 2017) of the disease. Modern imaging techniques, such as diffusion tensor imaging, as well as neuropathological investigations have shown that, in addition to demyelination, MS causes significant damage to axons (Thompson 2018a).
The current diagnosis of MS is based on the modified ‘McDonald criteria’ (Thompson 2018b), and further differentiation of disease course into subtypes is described by Lublin 2014. In relapsing MS, the disease may initially present as clinically isolated syndrome (CIS), a first disease attack of at least 24 hours with patient‐reported symptoms reflecting an inflammatory demyelinating event in the CNS without fever or infection. According to current diagnostic criteria, the first attack may already be definite relapsing‐remitting MS (RRMS) if there is temporal and spatial dissemination at the time of initial manifestation, as evidenced by magnetic resonance imaging (MRI), cerebrospinal fluid (CSF) diagnosis, and/or clinical presentation. RRMS is characterised by relapses and periods of remission with stable neurological disability (Thompson 2018b). According to natural history studies, 30% to 50% of untreated RRMS patients convert to secondary progressive MS (SPMS) within 10 to 15 years after disease onset (Weinshenker 1989a). Progressive MS is defined as a steadily increasing neurological disability without unequivocal recovery (Lorscheider 2016). About 15% of people with MS, however, have progressive disease from the start, the primary progressive MS (PPMS) subtype (Reich 2018). This classification was made at a time when few biomarkers were available and is still used in clinical practice, especially for communication with patients and the definition of study cohorts. However, in the current understanding of MS pathophysiology, both peripherally initiated and CNS compartmentalised inflammation processes are assumed to contribute to the disease progression. In addition, signs of neurodegeneration can be detected early in the disease. 
From a clinical point of view, this means that even people with relapsing MS may have a gradual progression independent of relapse activity in addition to accumulation of residual disability from relapses (relapse‐associated worsening). Similarly, people with progressive MS may continue experiencing relapses (Lublin 2014).
Although MS is still incurable, pharmacological treatment for MS, particularly RRMS, has developed with increasing speed since the introduction of the first interferon‐beta preparation more than 25 years ago. The arsenal of MS therapeutics includes various substances with different mechanisms of action. The main therapeutic goals of treatment are reduction in relapse rate, delaying onset, and slowing or stopping confirmed disability progression (Wingerchuk 2016). Therapeutic choice amongst highly effective therapeutic options has led to the expectation of no evidence of disease activity (NEDA) under immunotherapy. This is defined by the absence of relapses, disability progression, and active MRI lesions (Thompson 2018a). There are two established treatment strategies: the use of mild to moderately effective but safe medications from disease onset with escalation strategies as needed ('stepwise escalation'), or the use of higher efficacy medications from disease onset, which may be associated with higher risk of adverse events ('hit hard and early' concept) (Ontaneda 2019). Overtreatment or undertreatment should be avoided and the risk‐benefit balance be considered; however, refraining from treatment is also an option.
The current guidelines usually classify the available therapies as first‐, second‐, or third‐line according to their efficacy and safety profiles and recommend selection of a therapy based on the patient’s disease activity and preferences, reserving efficacious but high‐risk second‐line medications for highly active disease (Hemmer 2021; Montalban 2018; Rae‐Grant 2018). The definition of highly active disease varies across the literature, however (Diaz 2019; Freedman 2016), and how to define benign MS is unclear (Correale 2012). With its broad spectrum of clinical manifestations and an armamentarium of therapeutic approaches with different risk profiles, MS is a prime example of a disease that requires individualised medicine.
Description of the prognostic models
Many potential prognostic factors have been identified for predicting disease progression, worsening, and activity in people with MS. These include but are not limited to age, sex, body mass index, smoking history, and disease duration (Briggs 2019). Various biomarkers for MS have also been proposed, with those measured by MRI being the most commonly investigated (Rotstein 2019). However, prediction typically requires a combination of prognostic factors (Steyerberg 2013), especially for multifactorial diseases such as MS. Researchers in this clinical field have noted the strong focus on prognostic factors as opposed to prognostic modelling and have expressed the need for models for estimation of ‘individualised’ risk (de Groot 2009; Wottschel 2015).
A prognostic model is an empirical model that combines the effects of two or more predictors in order to estimate the risk of future clinical outcomes in individual patients within a specified length of time (Steyerberg 2013; Steyerberg 2019). As with prognosis research more generally, these models can serve many purposes, including improving the study design and analysis of randomised clinical trials (Hernández 2004; Roozenbeek 2009). For instance, Sormani and colleagues suggest the use of their model for participant selection in MS clinical trials (Sormani 2007). Adjusting for baseline risk in network meta‐analyses (Chalkou 2021) and health service research (Jarman 2010) are other application areas.
Ideally, prognostic models are developed using large high‐quality datasets, with subjects representative of the population to which the model should later be applied. Large samples may generally be required for more complex modelling tasks, such as model development including data‐driven predictor selection from a large set of candidate predictors. Sufficiently large datasets reduce the potential for overfitting and ensure that the overall risk can be precisely estimated (Riley 2019). Outcomes and their timing should be important to people with the health condition of interest and, along with predictors, be well‐defined prior to their assessment. When selecting predictors to consider, basic variables known to be related to prognosis, such as disease duration and sex, should always be included, in addition to novel biomarkers that may provide added value (Steyerberg 2013).
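To make the link between sample size and overfitting concrete, criteria in this line of work (e.g. Riley 2019) choose the sample size so that the expected uniform shrinkage factor of the new model stays close to 1. The sketch below implements one such criterion under the assumption that the number of candidate predictor parameters and an anticipated Cox‐Snell R‐squared can be specified in advance; the exact formula is our illustration of that approach, and real planning should apply the full set of published criteria (e.g. via the pmsampsize package).

```python
from math import ceil, log

def n_for_shrinkage(p, r2_cs, shrinkage=0.9):
    """Minimum sample size so that a model with p candidate predictor
    parameters and anticipated Cox-Snell R-squared r2_cs is expected
    to need a uniform shrinkage factor of at least `shrinkage`
    (illustrative implementation of a shrinkage-based criterion)."""
    return ceil(p / ((shrinkage - 1) * log(1 - r2_cs / shrinkage)))

# e.g. 20 candidate parameters, anticipated Cox-Snell R^2 of 0.2
print(n_for_shrinkage(20, 0.2))  # -> 796 participants
```

Note how the requirement grows with the number of candidate parameters: data‐driven selection from a large candidate set quickly demands very large datasets, which is exactly the point made above.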
Before a prognostic model is used in practice, it must be appropriately evaluated. This evaluation ideally has two components. One component, discrimination, assesses the success of a prognostic model in ranking those that experience the event versus those that do not. The second component, calibration, assesses the prognostic model’s ability to estimate event probabilities that are close to those actually observed. Good discriminative power is important to all prognostic model applications and may even be sufficient for some applications (Justice 1999), such as patient stratification in randomised controlled trials and adjustment in comparative healthcare research. However, people with MS and the clinicians advising them are interested in the absolute probability of outcomes in these individuals, as opposed to comparing risks with other people, hence model calibration is very important in this setting.
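These two components can be made concrete with a small sketch (a generic illustration with toy data, not code from any included study). The c‐statistic below is the probability that a randomly chosen person who experiences the event receives a higher predicted risk than a randomly chosen person who does not; calibration‐in‐the‐large is one basic calibration check comparing mean predicted risk with the observed event rate.

```python
from itertools import product

def c_statistic(y, p):
    """Concordance: fraction of case/non-case pairs in which the case
    has the higher predicted risk (ties count one half)."""
    pairs = [(pi, pj) for (yi, pi), (yj, pj) in product(zip(y, p), repeat=2)
             if yi == 1 and yj == 0]
    if not pairs:
        raise ValueError("need at least one case and one non-case")
    conc = sum(1.0 if pi > pj else 0.5 if pi == pj else 0.0
               for pi, pj in pairs)
    return conc / len(pairs)

def calibration_in_the_large(y, p):
    """Observed event rate minus mean predicted risk (0 = no average
    over- or under-estimation; a full assessment would also examine
    the calibration slope or a calibration plot)."""
    return sum(y) / len(y) - sum(p) / len(p)

# Toy data: predicted risks and observed binary outcomes
p = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]
y = [1,   1,   0,   1,   0,   0]
print(round(c_statistic(y, p), 3))             # ranking ability
print(round(calibration_in_the_large(y, p), 3))  # average-risk agreement
```

A model can discriminate well yet be badly calibrated (for instance if every predicted risk is doubled, the ranking is unchanged), which is why both components matter for the individual risk estimates discussed here.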
The data used for evaluation determine its usefulness and generalisability, i.e. how the model is expected to perform in new patients (Justice 1999). Internal validation is the evaluation of a model in the sample in which it was developed. If the internal validation is performed directly in the development sample without any resampling techniques (apparent validation), the model accuracy is expected to be overestimated, i.e. overoptimistic (Harrell 2001; Moons 2019; Steyerberg 2019). Resampling techniques, such as cross‐validation and bootstrapping, allow us to assess overfitting and account for overoptimism. However, even with correct internal validation procedures, we only learn about the accuracy of the model as applied to people from an identical underlying population. Therefore, a further prerequisite before use of a prognostic model in practice is external validation, i.e. its evaluation in a group of patients independent of those used in the model’s development. Such independence may be based on many qualities, such as time, location, and participant spectrum (patients with different disease severities or belonging to different disease subtypes). In MS, historical transportability is important, for example, because disease severity is likely to have changed over time with changes in diagnostic criteria. It is important to assess whether models developed under older diagnostic criteria are still accurate when applied to patients today. Before any clinical application, a prognostic model needs to show good discrimination and calibration in many different external validations, preferably performed by researchers independent of those who developed the model.
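The gap between apparent and resampling-based internal validation can be demonstrated with a deliberately overfit toy model on invented, purely random data (our illustration; Harrell's bootstrap optimism correction refits the model in each bootstrap sample and subtracts the average optimism from the apparent performance).

```python
import random

def fit_1nn(xs, ys):
    """A deliberately overfit 'model': predict the outcome of the
    nearest training point, so apparent accuracy is perfect."""
    data = list(zip(xs, ys))
    def predict(x):
        return min(data, key=lambda d: abs(d[0] - x))[1]
    return predict

def accuracy(model, xs, ys):
    return sum(model(x) == y for x, y in zip(xs, ys)) / len(ys)

def optimism_corrected(xs, ys, n_boot=200, seed=1):
    """Bootstrap optimism correction: for each bootstrap sample, refit
    the model and record (performance on the bootstrap sample) minus
    (performance on the original sample); subtract the average of
    these optimism estimates from the apparent performance."""
    rng = random.Random(seed)
    apparent = accuracy(fit_1nn(xs, ys), xs, ys)
    optimism = 0.0
    n = len(xs)
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        bx, by = [xs[i] for i in idx], [ys[i] for i in idx]
        m = fit_1nn(bx, by)
        optimism += accuracy(m, bx, by) - accuracy(m, xs, ys)
    return apparent, apparent - optimism / n_boot

rng = random.Random(0)
xs = [rng.random() for _ in range(60)]
ys = [rng.randint(0, 1) for _ in range(60)]  # pure noise: nothing to learn
apparent, corrected = optimism_corrected(xs, ys)
print(apparent, corrected)  # apparent looks perfect; corrected is lower
```

Even the corrected estimate only describes performance in the development population; it says nothing about transportability to a different setting, which is why external validation remains a separate requirement.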
To our knowledge, no widely accepted prognostic model for MS is being used in clinical practice yet. A systematic review is needed to understand the state of the MS prognostic modelling literature as a whole and whether any models are on the way towards translation into practice. In order to address this goal, the scope of our review will be broad in terms of outcomes, predictors, timing, and setting, as well as the form of MS addressed.
Health outcomes
According to a survey conducted by Day and colleagues, disability progression and relapses are the most important outcomes related to disease course for people with MS and clinical experts alike (Day 2018). Disease progression is characterised by a relapse‐independent accumulation of neurological deficits and usually manifests as a decrease in walking ability that occurs over varying time spans (Warnke 2019). Disease progression is most commonly measured by the ambulation functional score of the Expanded Disability Status Scale (EDSS). Neurological disability has also been operationalised by the Multiple Sclerosis Functional Composite (MSFC). The International Advisory Committee on Clinical Trials of MS has suggested that the term ‘progression’ should only be used for the progressive subtypes of the disease and that relapse‐related increases in disability be referred to as disease ‘worsening’ (Lublin 2014). Although a more consistent application of the Lublin Criteria and of the ‘progression independent of relapse activity’ and ‘relapse‐associated worsening’ terminology has become apparent in recent years, the terminology used in the literature may not exactly match these definitions. For the purposes of this review, increase in disability, either dependent on or independent of relapses, is relevant, as it is ranked as the highest priority outcome by people with MS (Day 2018).
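For concreteness, EDSS‐based worsening is often operationalised as a score increase whose required size depends on the baseline score. The thresholds below are one commonly used rule, stated here as an assumption for illustration only; the included studies used varying author definitions, which this review accepted.

```python
def edss_worsening(baseline, follow_up):
    """One common rule (an illustrative assumption, not a definition
    from this review): worsening = EDSS increase of >= 1.5 points from
    a baseline of 0, >= 1.0 from a baseline of 1.0-5.5, and >= 0.5
    from a baseline above 5.5."""
    delta = follow_up - baseline
    if baseline == 0:
        return delta >= 1.5
    if baseline <= 5.5:
        return delta >= 1.0
    return delta >= 0.5

print(edss_worsening(2.0, 3.0))  # True: 1.0-point increase at low EDSS
print(edss_worsening(6.0, 6.5))  # True: 0.5 suffices at high EDSS
print(edss_worsening(2.0, 2.5))  # False: 0.5 is too small at low EDSS
```

In trial settings such an increase is usually additionally required to be 'confirmed' at a later visit (e.g. after 3 or 6 months) to exclude transient relapse effects, which is one source of the heterogeneity in outcome definitions noted below.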
Relapses, another high‐priority clinical outcome indicative of disease activity, manifest as acute and transient episodes of neurological symptoms. Subacute episodes can lead to different neurological symptoms, which may remit completely in the course of the disease, but may also be accompanied by residual disability. Despite the fact that relapse rate is the primary outcome in most confirmatory clinical trials leading to market approval of RRMS therapies, it is not yet clear whether a reduction in relapse rate is associated with a better overall prognosis. For example, the strength of the effect of the reduction in relapse rate with regard to the prevention of long‐term disability progression remains controversial (Cree 2019).
Diagnostic transition to a more advanced disease stage, indicative of worsening and active disease, is also of interest to prognostic research in this field. For example, people initially diagnosed with CIS can meet the criteria of clinically definite MS by experiencing another relapse. The ability to predict whether or when the conversion to definite MS will occur might have substantial clinical impact on decisions to start or abstain from early treatment in people with CIS. Patients initially diagnosed with RRMS can be considered to have converted to a progressive course, SPMS, by retrospective assessment of sustained progression independent of relapses over a period of time, for example one year (Thompson 2018b).
As MS is a lifelong condition, we find the aforementioned outcomes to be relevant not only at various time points of prognostication during the disease course, but also for various prediction horizons. We also expect outcome definitions, timing, and measurement methods for clinical disease progression, worsening, and activity to be highly heterogeneous in the literature.
Why it is important to do this review
While there are more than 50 published Cochrane Reviews on interventions for MS or associated symptoms and more than 20 are ongoing, this is the first Cochrane Review of prognostic studies in MS (Cochrane 2021). Independent of the Cochrane network, Hempel and colleagues reviewed 59 studies of single modifiable prognostic factors in MS progression, such as vitamin D levels and smoking status (Hempel 2017). Also, Río and Ruiz‐Peña reviewed 45 studies that predict long‐term treatment response by short‐term response criteria, including both single factors and multivariable expert‐based algorithms (Río 2016). Both reviews found a wide variety of methods, timing, and outcome and prognostic factor definitions.
We aimed to conduct a systematic Cochrane Review of multivariable prognostic models for predicting future clinical outcomes indicative of disease progression, worsening, and activity in people with MS at any time point following diagnosis. The results from this review will provide a long‐sought comprehensive summary and assessment of the evidence base for all disease subtypes (not just RRMS) and across all statistical methodologies (not just machine learning (ML) studies). We aimed to thereby enhance the knowledge base described by the many non‐systematic reviews (Derfuss 2012; Gafson 2017; Miller 2008; Rotstein 2019) and focused systematic reviews (Brown 2020; Havas 2020; Seccia 2021) reported thus far. Identified models could potentially provide people with MS and their physicians with informative and clinically relevant tools for making decisions on disease management.
No review thus far presents changes in prognostic factors and methods over time, nor assessment of reporting deficiencies of the models in the literature. We also summarised the readiness of the models for translation into clinical practice in terms of external validation evidence, thereby identifying models that require further external validation or clinical impact assessment. This review forms a solid basis from which to make recommendations for future prognosis research in MS.
Objectives
To identify and summarise multivariable prognostic models for quantifying the risk of clinical disease progression, worsening, and activity in MS.
To this end we aimed to:
describe the characteristics of the identified multivariable prognostic models, including prognostic factors considered and evaluation measures used;
describe changes in outcome definitions, time frames, prognostic factors, and statistical methods over time;
summarise the validation performance of the models;
summarise model performance and synthesise across external validation studies via meta‐analysis, where possible;
investigate sources of heterogeneity between studies;
assess the risk of bias in the models;
evaluate moderating effects on model performance by meta‐regression, where possible; and
make recommendations for future MS prognostic research.
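Had any model accrued enough external validations, the planned synthesis could have taken a form like the following sketch: random‐effects pooling of logit‐transformed c‐statistics, a common choice for meta‐analysis of validation studies. The input c‐statistics echo the Manouchehrinia 2019 validations in the summary of findings table, but the logit‐scale standard errors are invented purely for illustration.

```python
from math import exp, log, sqrt

def logit(p):
    return log(p / (1 - p))

def pool_c_statistics(cs, ses_logit):
    """DerSimonian-Laird random-effects pooling of logit c-statistics.
    Returns the pooled c-statistic, its 95% CI (back-transformed),
    and the between-study variance tau^2 on the logit scale."""
    y = [logit(c) for c in cs]
    w = [1 / se**2 for se in ses_logit]
    # fixed-effect estimate and Cochran's Q
    yf = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - yf)**2 for wi, yi in zip(w, y))
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(y) - 1)) / c)   # between-study variance
    wr = [1 / (se**2 + tau2) for se in ses_logit]
    yr = sum(wi * yi for wi, yi in zip(wr, y)) / sum(wr)
    se_r = sqrt(1 / sum(wr))
    inv = lambda x: exp(x) / (1 + exp(x))     # back-transform to (0, 1)
    return inv(yr), (inv(yr - 1.96 * se_r), inv(yr + 1.96 * se_r)), tau2

# Hypothetical inputs: three external validations of one model,
# with illustrative (invented) logit-scale standard errors
pooled, ci, tau2 = pool_c_statistics([0.77, 0.77, 0.87], [0.05, 0.15, 0.08])
print(round(pooled, 3), tuple(round(v, 3) for v in ci))
```

As stated in the abstract, no model reached our threshold of three external validations outside the development study, so this synthesis (and the corresponding meta‐regression) could not be performed.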
Methods
Criteria for considering studies for this review
Defining the eligibility criteria was an iterative process, which involved multiple discussions within the review team based on our previous knowledge, as well as several studies we knew should be included, and borderline cases that we knew should be excluded. These criteria are described by the PICOTS table below and in the following sections.
| PICOTS element | Description |
| --- | --- |
| Population | Adults with MS, including all subtypes (CIS, RRMS, SPMS, PPMS) |
| Intervention | All multivariable prognostic models and their validation studies |
| Comparator | There are no comparators in this review |
| Outcome | Clinical disease progression, worsening, and activity, which are measured based on disability, relapses, conversion to a more advanced disease subtype (clinically definite, progressive), or a composite of these |
| Timing | The models are to be used any time following diagnosis for predicting future disease course |
| Setting | Any clinical setting where people with MS receive medical care |

CIS: clinically isolated syndrome; MS: multiple sclerosis; PPMS: primary progressive MS; RRMS: relapsing‐remitting MS; SPMS: secondary progressive MS
Types of studies
We included studies that aimed to develop, validate, extend, or update multivariable prognostic models of future disease outcomes in people with MS.
Study design: We included prognostic modelling studies that used data collected retrospectively or prospectively from the following sources: routine care, disease/patient registries, cohort studies, case‐control studies, and randomised controlled trials.
Data source and setting: We included studies based on both primary and secondary use of data. We included models intended for use in any clinical setting where people with MS receive medical care. We excluded studies that did not contain prediction of future outcomes in individuals.
Statistical methods: We included models developed with either traditional statistical methods or machine learning (ML). For the purpose of this review, a method is considered ML if it has at least one tuning parameter, excluding Bayesian priors, for controlling its architecture and, as a result, its performance.
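To illustrate this working definition with a toy example of our own (not drawn from any included study): a k‐nearest‐neighbours risk estimate has a tuning parameter k that controls its complexity and would be classed as ML here, whereas a fixed‐form estimate with no tuning parameter, like an unpenalised regression, would not.

```python
def knn_predict(train_x, train_y, x, k):
    """Predicted risk = mean outcome of the k nearest training points.
    k tunes model complexity, so under the review's working definition
    this method counts as machine learning."""
    nearest = sorted(zip(train_x, train_y), key=lambda d: abs(d[0] - x))[:k]
    return sum(y for _, y in nearest) / k

def mean_risk(train_y):
    """A fixed-form estimate with no tuning parameter: not ML under
    the same definition."""
    return sum(train_y) / len(train_y)

# With k = 1 the model memorises the data; larger k smooths it out.
print(knn_predict([0.0, 1.0, 2.0], [0, 1, 1], 1.9, k=1))  # -> 1.0
print(mean_risk([0, 1, 1, 1]))                            # -> 0.75
```

Note that, per the definition above, the number of parameters for controlling architecture is what matters, not whether the method is labelled "machine learning" by its authors; Bayesian priors are explicitly excluded from counting as tuning parameters.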
Validation: We included studies that evaluated a previously reported prognostic model in a different set of participants by reporting discrimination, calibration, or classification measures based on predictions from that model, even though the term ‘validation’ was not explicitly used. We also included studies that reported validation of a previously reported prognostic model, even though what was done did not constitute an external validation in its strictest sense (see 'Terms used for reporting' for details). Studies that did not meet the search or inclusion criteria themselves but described the development of models evaluated or validated in future eligible studies were also included in order to extract data from and assess the risk of bias in the model development.
Targeted population
We included studies on adult individuals, 18 years old or over, with a diagnosis of MS, irrespective of subtype or treatment status. We included studies that did not specify the disease subtype of their sample, as well as studies that included people with one or more of the MS subtypes CIS, relapsing, progressive, or any other category. When a study included people with a single episode of optic neuritis, we considered the event to constitute a CIS and considered the study eligible.
Types of prognostic models
Determining whether a study reporting a multivariable model is a prognostic model study for predicting future disease outcomes in individuals can be difficult (Kreuzberger 2020). In this review, a study was considered to develop a multivariable prognostic model if the aims, results, and discussion report on the model itself, and not just the individual predictors comprising the model or the methodology used. For example, we excluded a study that reported only adjusted predictor effect measures from a multivariable model and discussed these, but neither evaluated the predictive performance of the model using discrimination, calibration, or classification measures nor discussed the model as a whole.
Studies were not limited by their modelling method; i.e. inclusion did not depend on whether traditional statistical methods or ML methods were used for development. We excluded studies predicting outcomes only based on single prognostic factors. We also excluded studies reporting on models that aimed to predict treatment response, either beneficial or harmful. The use of treatment as a predictor in the model was by itself not considered to determine the aim of treatment response prediction. Rather, the reported aim of the study was the determining factor.
Types of outcomes to be predicted
We included clinical outcomes indicating disease progression, worsening, and activity. We accepted author definitions based on any of the following:
disability progression/worsening;
relapse/attack;
conversion to a more advanced disease subtype:
to definite MS; or
to progressive MS;
composite outcomes containing at least one of the above (such as NEDA).
We included studies with any of the above outcomes, including models validated for a different outcome than originally developed. We did not exclude studies based on the data type of the outcome, even though prognosis is usually interpreted as referring to the risk of an event, i.e. necessitating a binary outcome. We excluded models that predict only paraclinical outcomes, such as laboratory measurements or image findings, because their translation to patient‐relevant outcomes at the individual level is unclear and they are not prioritised by people with MS (Day 2018). We also excluded studies predicting only quality of life outcomes, due to the difficulty in interpreting their clinical meaning. We considered cognitive disability to constitute a domain of disability whereas fatigue, depression, and falls did not fit any of the aforementioned outcome categories and we considered them out of scope for this review aiming to be relevant to clinical practice.
We did not exclude any studies based on time point of prognostication or the time horizon for which the prognostic models apply because our preliminary review of the prognostic literature in MS indicated very liberally defined (in years) and heterogeneous time points of prognostication, both in relation to diagnosis and start of treatment. Defining a time horizon was considered too restrictive for the review objective. For clinically meaningful outcomes, however, we expected disability progression/worsening and conversion from RRMS to SPMS to be measured in years. Relapses and conversion from CIS to RRMS were expected to be measured in months to a couple of years.
Search methods for identification of studies
Electronic searches
To identify eligible studies, we searched the following databases on 2 July 2021 (Appendix 1).
MEDLINE (Ovid SP) (1996 to 1 July 2021)
Embase (embase.com) (1996 to 2 July 2021)
Cochrane Database of Systematic Reviews (CDSR 2021, Issue 6) (searched 2 July 2021, via www.cochranelibrary.com)
Cochrane Central Register of Controlled Trials (CENTRAL 2021, Issue 6) (searched 2 July 2021, via www.cochranelibrary.com)
The Embase search above included conference proceedings from the following organisations.
European Committee for Treatment and Research in Multiple Sclerosis (ECTRIMS)
Americas Committee for Treatment and Research in Multiple Sclerosis (ACTRIMS)
American Academy of Neurology (AAN)
European Academy of Neurology (EAN)
We restricted the search to studies published since 1996, the year of publication of an important tutorial on multivariable prognostic models in Statistics in Medicine (Harrell 1996). Before this time, methods were rapidly being developed, but at the same time concerns over the misuse of statistical modelling for prediction of health outcomes were being raised (Chatfield 1995; Concato 1993; Diamond 1989). We considered Harrell 1996 to be a turning point, after which many papers (Altman 2000), textbooks (Harrell 2001; Steyerberg 2019), and guidelines (e.g. the Enhancing the QUAlity and Transparency Of health Research (EQUATOR) network and Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD)) addressing proper analysis and reporting became readily available (Collins 2015; Simera 2008). We did not impose language restrictions on the search.
We used a search strategy for systematic reviews of prognostic models based on that of Geersing 2012 and further refined it for this review. We validated this strategy for our specific question by determining whether it could identify a list of 11 a priori defined studies of interest: six inclusions (Bejarano 2011; Bergamaschi 2007; Margaritella 2012; Pellegrini 2019; Tousignant 2019; Wottschel 2015) and five excluded (borderline) studies that required assessment at the full‐text stage (Bovis 2019; Cree 2016; Gauthier 2007; Kalincik 2017; Runmarker 1994). We also randomly selected 120 titles and abstracts retrieved by less stringent search criteria and screened them to check that no relevant studies were being missed.
We split this modified filter into three sub‐searches: search terms specific for prediction or prognostic models (2a); terms for general models (2b); and the statistical search terms (2c).
The search comprised two main parts, with each combining either two or three main concepts, as follows:
MS (1) and specific prognostic models (2a); or
MS (1) and general models or general statistical terms (2b or 2c) and clinical outcomes (3).
The search strategy used combinations of thesaurus search terms (MeSH or Emtree) and free‐text search terms including synonyms in the title and abstract. Animal studies and studies in children were excluded from the search.
Searching other resources
We performed backward reference searching of all included studies and all MS prognosis reviews identified during screening using Web of Science. We tracked citations of all the included studies (forward reference searching) via Web of Science. We performed the search in Web of Science between 13 October 2020 and 25 October 2020 for the studies/reviews from the initial database search, and on 16 August 2021 for the studies/reviews from the update to the database search. We also contacted authors of all included studies for further information on unpublished or ongoing studies.
Data collection
Selection of studies
Aiming to refine the eligibility criteria and ensure a common understanding amongst the review authors, we conducted a pilot title and abstract screening with a random subset of 200 results from the draft search strategy. This was followed by full‐text screening of the eight titles marked for inclusion. We selected eligible studies from the search results using the criteria outlined in 'Criteria for considering studies for this review' via the Rayyan web application (Ouzzani 2016). We used the same platform to document the exclusion reasons at the full‐text screening stage.
Pairs of review authors (BIO, KAR, AA, ZA, AG) performed title and abstract screening independently, and we included all titles marked for inclusion by at least one author at this stage in the full‐text screening. We also performed, independently and in duplicate, assessment of full texts for their inclusion in the review or reasons for their exclusion. When the record corresponded to a conference abstract, we searched its title and/or authors online (www.google.com, onlinelibrary.ectrims‐congress.eu/ectrims; accessed between 29 October 2021 and 22 April 2022) for any related articles, posters, or video presentations and, if available, considered these additional sources of information during our assessment. In addition, if the full text did not meet all the inclusion criteria but also could not be excluded with the available reported information, we contacted the authors of the respective studies for clarification. We resolved disagreements by involving a third review author (BIO, KAR) and, when necessary, through group discussion.
If a conference abstract meeting our inclusion criteria did not have an associated publication, such as a peer‐reviewed or preprint article, the data needed to inform the review and the risk of bias assessment could not be extracted. As stated by Kreuzberger and colleagues, this complicates assessment of both inclusion/exclusion and risk of bias (Kreuzberger 2020). Consultation with study authors would only provide sufficient information if an associated publication could also be supplied. Hence, we considered conference abstracts to be awaiting classification until a report with more information on them becomes available.
For the assessment of non‐English titles/abstracts, we used online translators (translate.google.com, www.deepl.com/en/translator) and included any record that seemed relevant for full‐text screening. At the full‐text stage, we (BIO and KAR) consulted native speakers of that language for their assessment and retrieved a translation of the full text for our independent assessment in duplicate.
We summarised the study selection process with a flowchart adapted from the PRISMA statement (Page 2021), showing the number of records we identified, the number of reports we excluded with reasons, and the total number of studies included.
Details regarding selection of studies
Due to the recency of the relevant reporting guidelines (Collins 2015), poor labelling of prognostic prediction studies, and the novelty of this review type (Kreuzberger 2020), we had regular meetings to clarify the boundaries and application of the selection criteria both at the title/abstract and full‐text screening levels. For transparency, we report the details of the recurrent themes and the decisions below.
The distinction between reports of multivariable prognostic prediction models and reports assessing the value of a single prognostic factor or searching for independent prognostic factors by multivariable modelling was not always clear (Kreuzberger 2020). We included records if there was any hint of individual‐level predictions, either verbally or in the measures reported for the multivariable models. We considered mention of overall model performance measures (e.g. R2, Brier score), discrimination measures (e.g. Harrell’s c‐index, area under the receiver operating characteristic curve (AUC)), classification measures (e.g. sensitivity, accuracy), or the terms calibration or validation in the context of prognosis sufficient for a record to be taken forward to full‐text screening. We excluded records that only reported effect estimates (e.g. hazard ratio, odds ratio), or performance measures for single factors or univariable models, at the title/abstract level.
We applied exclusion based on the eligibility criterion of aiming to develop or validate prognostic prediction models only at the full‐text screening level in order to take into account the totality of reporting.
We excluded multivariable combinations other than statistically developed prognostic prediction models, such as diagnostic criteria or expert scoring rules, even when they were used for individual prognostic prediction. Despite the potential usefulness of such combinations, the intentions behind their development are different.
Expecting prognostic prediction models to be based on statistical theory, we also excluded scores based on counts of prognostic factors selected via or simplified from multivariable models unless the full‐text report provided a reason for the simplification (e.g. all effect estimates being similar) or compared the prediction performance of the count score to the multivariable model generating it.
Our search also picked up records that reported prediction of treatment response either by multivariable models or scoring rules. We were aware that some of these reports were making static or dynamic prognostic predictions conditional on treatment (e.g. Kalincik 2017; Sormani 2013), rather than treatment response predictions (Kalincik 2018; Sormani 2017; Steyerberg 2018). We decided not to reinterpret the stated objective or the presented results and excluded such reports.
In order to assign a single agreed‐upon exclusion justification to a full‐text report that might fulfil multiple criteria, we used a hierarchy based on convenience of assessment. Using the eligibility headings as higher‐level exclusion reasons, we evaluated a study’s eligibility in the following order: study type or objective, population, outcome, model (intervention), and timing.
Data extraction and management
Pairs of review authors (KAR, BIO, AA, AG) independently extracted data from the included studies into a predefined, piloted electronic spreadsheet (see Appendix 2) based on the Critical Appraisal and Data Extraction for Systematic Reviews of Prediction Modelling Studies (CHARMS) checklist (Moons 2014) and the TRIPOD guidelines (Collins 2015), and resolved open disagreements jointly. If a study was associated with multiple reports and the data in them were inconsistent, we preferred to collect the data from:
journal article over other types of reports;
more recent journal article over an older one; and
main text of a journal article over its supplements/appendices.
Our data extraction form included the following items, with further explanation available in Appendix 3:
Article information (title, author, year, publication type).
Data sources (e.g. use of randomised trial/cohort/registry/case‐control data, primary/secondary data use).
Participants (e.g. inclusion/exclusion criteria, recruitment method, country, number of centres, setting, participant description, treatments received, MS subtype).
Outcomes (e.g. definitions and methods of measurement, categorisation into disability/relapse/conversion to clinically definite MS/conversion to SPMS/composite, duration of follow‐up or time of outcome assessment, blinding).
Candidate predictors (e.g. predictor definitions and method/timing of measurement, handling/transformations, categorisation into the following domains: demographics, symptoms, scores, CSF, imaging, electrophysiological, omics, environmental, non‐CSF samples, disease type, treatment, or other).
Sample size (e.g. number of participants, number of events, number of events per predictor).
Missing data (e.g. number of participants with missing predictor or outcome data, handling of missing data).
Model development (e.g. type of model, method for predictor consideration, model/predictor selection method, predictor selection criteria, tuning parameter details, data leakage prevention steps, shrinkage).
Model performance and evaluation (e.g. discrimination, calibration, and classification measures with standard errors or confidence intervals, internal or external validation).
Model presentation and interpretation (e.g. final models, alternative presentations, exploratory versus confirmatory research, comparison with other studies, generalisability, strengths, and limitations).
Factors related to model usability and reproducibility (sufficient explanation to allow for further use, skill and equipment specialisation required for predictor assessment, whether model/tool, code, and/or data were provided, whether absolute risks could be computed).
Assessment of reporting deficiencies
Deficiencies in methods and reporting in prognostic modelling studies are well known (Bouwmeester 2012; Brown 2020; Havas 2020; Kreuzberger 2020; Peat 2014). We described deficiencies in the MS prognostic modelling literature using Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) guideline items (Collins 2015). We assessed 20 items from the Methods (source of data, participants, outcome, predictors (only for developments), sample size, missing data, statistical analysis methods (only for developments)), Results (participants, model development, model specification (only for developments), model performance), and Discussion (limitations) sections of the checklist provided in the guideline. We used the categories 'reported', 'not reported', and 'unclearly/partially reported'. During this assessment, we only took into account the text present in the publications themselves or in the publications explicitly referenced in them for data source, definitions, or methods (referenced as auxiliary references in Characteristics of included studies), and ignored the information provided by the study authors during follow‐up correspondence.
Assessment of risk of bias in included studies
We performed risk of bias assessments independently and in duplicate (KAR, BIO, AA, AG) using the Prediction Model Risk of Bias Assessment Tool (PROBAST) (Wolff 2019). The tool consists of signalling questions in four domains (participants, predictors, outcome, analysis), covering sources of possible bias due to data sources, definition or measurements of predictors and outcomes, sample size and analysis sets, model development, and model performance evaluation. We graded each domain as having low, high, or unclear risk of bias, which formed the basis for the overall risk of bias assessment (as described in Moons 2019). A third review author (KAR or BIO) reconciled the duplicate assessments and resolved any remaining disagreements at the item and model/validation level by joint discussion with the respective raters. When insufficient information was reported to allow for clear assessment, resulting in an unclear rating at the domain or study level, we contacted the study authors via email to request further information. In order to develop a common understanding of the form, two review authors (KAR, BIO) piloted the tool, discussed discrepancies in use, and agreed on rules for further use.
When multiple models were developed in a single study, or development and external validation of a model were included in the same study, we assessed the quality of each model or external validation separately. We presented the risk of bias primarily at the analysis level in the Results and Discussion. However, in the Characteristics of included studies we presented the risk of bias at the study level for each domain. When domain‐level assessments differed across analyses within a single study, we assigned the domain the best rating amongst those analyses at the study level and noted the differences per analysis in the support for judgement.
In order to assess risk of bias, we needed to further refine the interpretation of the PROBAST items. The topics that required further refinement were related to the reporting in the literature, specifics of the disease area, and the application of the tool to studies employing non‐traditional prognostic modelling methods ‐ including ML and non‐binary outcomes. These issues were jointly discussed amongst review authors (BIO, KAR, HS, UM, UH, JH, JB) until consensus was reached. Additionally, this review was designed broadly in order to identify all prognostic prediction models of clinical outcomes in MS. This meant that studies may have been included that were not considered to be exactly applicable to the aim of the review, even though they met the selection criteria. We report our decisions regarding PROBAST interpretation and the assessment of applicability in Appendix 4.
Measures of association or predictive performance measures to be extracted
In our protocol we stated that predictor effect measures would be collected and standardised in order to describe changes in prognostic factors in the models over time (On Seker 2020). We extracted effect measures where possible; however, the variety of predictors and their definitions, in addition to the use of ML models for which effect measure reporting is unclear, made comparison of predictors based on effect measures impossible. Instead, we reported categories of predictors, both considered and included in the final models, and described changes in these categories over time (Differences between protocol and review).
We primarily extracted performance measures for discrimination and calibration, as well as their measures of uncertainty. Discrimination (e.g. c‐statistic, AUC, Harrell’s c‐index, Gönen and Heller’s concordance index, Royston and Sauerbrei’s D‐statistic) refers to a model’s ability to distinguish between participants developing and not developing the outcome of interest. We expected the c‐statistic (or, equivalently, the AUC) to be the most frequently reported measure of discrimination. It gives the proportion of randomly chosen pairs from the sample (one participant with the outcome and one without) in which the participant with the outcome has the higher predicted score/risk. A c‐statistic of 0.5 means that the model’s discriminative performance is no better than chance, while a value of 1.0 is considered perfect discrimination.
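The pairwise definition above can be made concrete with a small sketch. The following Python code is illustrative only (the review analyses themselves were performed in R, and the function name is ours); it computes the c‐statistic as the proportion of concordant (event, non‐event) pairs, counting ties as 0.5:

```python
from itertools import product

def c_statistic(scores, outcomes):
    """Proportion of (event, non-event) pairs in which the participant
    with the event has the higher predicted score; ties count as 0.5."""
    events = [s for s, y in zip(scores, outcomes) if y == 1]
    nonevents = [s for s, y in zip(scores, outcomes) if y == 0]
    pairs = list(product(events, nonevents))
    concordant = sum(1.0 if e > n else 0.5 if e == n else 0.0
                     for e, n in pairs)
    return concordant / len(pairs)

# Perfectly separating scores give c = 1.0; uninformative scores tend to 0.5.
print(c_statistic([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0]))  # 1.0
```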
Calibration (e.g. calibration slope, calibration‐in‐the‐large, observed‐to‐expected (O:E) ratio) refers to the extent to which the expected outcomes and observed outcomes agree. We expected calibration to be reported infrequently and therefore focused on possible extraction of the O:E ratio, which is strongly related to calibration‐in‐the‐large and is an average across the range of predicted risks (Debray 2017). Values close to 1 indicate a well‐calibrated model overall; however, this does not rule out poor calibration in some subgroups.
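The O:E ratio described above can be sketched as follows (illustrative Python, not part of the review analyses): the expected count is the sum of the model's predicted risks, and a ratio below 1 indicates that, on average, the model overestimates risk.

```python
def oe_ratio(observed_events, predicted_risks):
    """Observed-to-expected ratio: the observed event count divided by
    the sum of predicted risks (the expected event count)."""
    return observed_events / sum(predicted_risks)

# 30 observed events against predicted risks summing to 40:
print(round(oe_ratio(30, [0.4] * 100), 2))  # 0.75, i.e. risk overestimated
```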
We also extracted data on classification measures like sensitivity, specificity, predictive values, or accuracy. Such classification measures are based on categorisation of predicted probabilities at some cutoff. A cutoff may be predetermined based on clinical relevance, arbitrarily defined as the middle (0.5) of the theoretical range of probability (0–1), or calculated post hoc in a data‐driven manner.
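These cutoff‐based measures can be illustrated with a short sketch (illustrative Python; function and variable names are ours, not from any included study):

```python
def classification_measures(predicted_risks, outcomes, cutoff=0.5):
    """Dichotomise predicted risks at `cutoff` and compare the resulting
    classifications with the observed binary outcomes."""
    tp = fn = tn = fp = 0
    for p, y in zip(predicted_risks, outcomes):
        if p >= cutoff and y == 1:
            tp += 1            # predicted positive, outcome occurred
        elif p >= cutoff and y == 0:
            fp += 1            # predicted positive, no outcome
        elif y == 1:
            fn += 1            # predicted negative, outcome occurred
        else:
            tn += 1            # predicted negative, no outcome
    return {
        "sensitivity": tp / (tp + fn),      # true positive rate
        "specificity": tn / (tn + fp),      # true negative rate
        "accuracy": (tp + tn) / len(outcomes),
    }
```

Changing `cutoff` trades sensitivity against specificity, which is why data‐driven post hoc cutoffs tend to give optimistic estimates.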
The aforementioned performance measures can be evaluated in different contexts. Internal validation is the evaluation of a model’s performance in the same population used for development. External validation is the evaluation of a model’s performance in a population different from that of its development. The characteristics of the participants in a validation that make it external might be based on, for example, location (e.g. participants from different sites), time (e.g. temporal split of participants from a single site), or spectrum (e.g. participants with a different disease subtype or treatment status).
Dealing with missing data
We contacted the corresponding authors via email to request missing or unclear information required for study eligibility, basic study description, quantitative data synthesis, or risk of bias assessment. When the c‐statistic was provided without its standard error or confidence interval, we calculated its variance based on the combination of sample size and number of events, if available, according to the method of Hanley and McNeil and computed the corresponding confidence interval according to Newcombe and colleagues (Hanley 1982; Newcombe 2006 method 4).
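The variance calculation can be sketched as follows (illustrative Python; the review analyses were performed in R). The sketch implements the Hanley and McNeil (1982) standard error and, for brevity, pairs it with a simple symmetric Wald interval rather than the asymmetric interval of Newcombe 2006 (method 4) used in the review:

```python
import math

def hanley_mcneil_se(auc, n_events, n_nonevents):
    """Standard error of the c-statistic from the number of events and
    non-events, following Hanley and McNeil (1982)."""
    q1 = auc / (2 - auc)
    q2 = 2 * auc ** 2 / (1 + auc)
    variance = (auc * (1 - auc)
                + (n_events - 1) * (q1 - auc ** 2)
                + (n_nonevents - 1) * (q2 - auc ** 2)) / (n_events * n_nonevents)
    return math.sqrt(variance)

def wald_ci(auc, se, z=1.96):
    """Simple symmetric 95% interval; a placeholder for Newcombe's method."""
    return (auc - z * se, auc + z * se)

# e.g. a reported c-statistic of 0.75 with 40 events amongst 200 participants
se = hanley_mcneil_se(0.75, 40, 160)
print(round(se, 3), [round(x, 3) for x in wald_ci(0.75, se)])
```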
Terms used for reporting
A clarification of the terms used to differentiate between the various levels at which we report was necessary. We used the term 'record' to refer to the entries we retrieved from the databases and considered during title/abstract screening. All eligible records were associated with at least one scientific report, retrieved for full‐text screening mostly from the publishers but also by contacting the study authors or searching the Internet. A comprehensive prognostic prediction exercise with a clear goal performed by the same set of authors was called a 'study'. A single study may be associated with more than one 'report'. A report may be one of the following types, ordered according to the amount of information they contain: journal article, preprint article, dissertation, poster or video presentation, conference abstract.
The main unit of interest in this review was called the 'analysis'. A single study (or report) may contain many analyses: multiple model developments or validations with different outcomes, timing, predictor sets, modelling methods, or participant subsets. Analyses may be of the type model 'development' or model 'validation'. Model developments are interchangeably referred to as models or developments.
Several studies we included reported results from development of more than one prognostic model, but only a subset of these studies aimed to present multiple final prognostic models. When multiple models in a study were reported in an almost equivalent manner without any indication of a preferred one, we included all the models in our review. We extracted the data for each model separately and presented them individually. This decision is motivated by our aim of reviewing all prognostic models with potential clinical meaning in the disease area of MS.
When the reporting in an eligible study with multiple models indicated a preferred or selected model, we included only that model in our review. Model preference was communicated either directly (e.g. by discussing the superiority of one amongst the competing models) or indirectly (e.g. by using a bold font for select results or presenting figures for a single model) by those studies’ authors. The other models were considered to be by‐products of the modelling process and not meant to be presented as final models. We always reported all validations of included models as separate analyses.
For the purpose of clear reporting, we made a distinction between internal, external, and other author‐reported validations. 'Internal validation' is the evaluation method directly relevant to analyses of the type model development and is thus reported in that context. For calling a validation external, we expected faithful evaluation of a developed model in an independent set of participants. Even though authors who developed prognostic models and reported model evaluation measures using a different set of participants may have referred to their activities as "validation", we refrained from calling them 'external validation' if the set of participants was not independent of the development set, if the new set of participants was only used for model refitting, or if the model was improperly changed (e.g. predictors dropped without statistical re‐estimation). These exceptional cases are referred to as other 'author‐reported validations'. When describing the overall literature evaluating prediction performance in a separate set of participants, we referred to external validations and other author‐reported validations together as validations, unless a differentiation between them was deemed necessary. For example, when reporting or discussing clinical readiness, we concentrated only on external validations.
To differentiate between multiple analyses from a single study, we referred to them first by the study name (e.g. Zhao 2020). If multiple models were included from a single study, these models were differentiated from each other by the name/abbreviation the authors used or by a reference to what separates the models included from that study (e.g. the modelling method and considered set of predictors in Zhao 2020 XGB Common). Finally, if a model had a validation other than an internal one, we differentiated these separate analyses by adding 'Dev' for development, 'Ext Val' for external validations, and 'Val' for other author‐reported validations (e.g. Zhao 2020 XGB Common Val).
Data synthesis
This broad review intended to identify all prognostic models of clinical disease progression, worsening, and activity across all types of MS. We expected to identify numerous model development studies, but only a few external validation studies overall. As per the protocol, we summarised all identified multivariable models and the prognostic factors included in these models in narrative, graphical, and tabular formats. We had planned to apply methods to derive missing performance measures (the c‐statistic for discrimination and the O:E ratio for calibration) (Debray 2019); however, this was not possible due to limited reporting of alternative discrimination measures, descriptions of the linear predictor distribution, and the expected number of events. We did not meta‐analyse prognostic model performance statistics for single models externally validated in several independent samples because no single model had at least three independent external validation studies outside its development study. We also could not perform meta‐regression, due to heterogeneity in predictor and outcome definitions and the low number of studies with reported or derivable performance measures for a single outcome. Please see Differences between protocol and review for details.
Investigation of sources of heterogeneity between studies
We expected to find substantial heterogeneity, as diagnostic criteria for MS subtypes and available treatment options, as well as the technology used to assess disease activity, have evolved over time. Heterogeneity is also typically high in prognostic studies. We expected heterogeneity both between development studies and their corresponding validation studies for specific models and also between different development models for the same outcome. Potential sources of heterogeneity related to either or both of these include:
case mix (e.g. age, sex, disease duration, treatment status);
study design (e.g. follow‐up time, source of data, outcome definitions and prognostic factors); and
statistical analysis methods and reporting (e.g. number of prognostic factors included, traditional statistics versus ML, risk of bias, validation methods).
We aimed to extract relevant information and to include a narrative summary of these potential sources of heterogeneity. We had planned to further investigate heterogeneity statistically using I2 calculation and meta‐regression for random‐effects models for external validation performance measures of single models; however, as stated earlier, this was not possible (see Differences between protocol and review). Instead, as also discussed in the protocol, we discussed these potential sources of heterogeneity in the narrative text, and with tables and figures, as far as possible.
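Had the planned analysis been feasible, I² would have quantified the percentage of variability in performance estimates attributable to between‐study heterogeneity rather than chance. The following is a minimal fixed‐effect sketch (illustrative Python only; the review's planned analyses used R with the dmetar package):

```python
def i_squared(estimates, variances):
    """Higgins' I^2 (in %), computed from Cochran's Q with
    inverse-variance weights around a fixed-effect pooled mean."""
    weights = [1 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, estimates))
    df = len(estimates) - 1
    return max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

# Identical estimates show no heterogeneity:
print(i_squared([0.70, 0.70, 0.70], [0.01, 0.01, 0.01]))  # 0.0
```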
Synthesis
For synthesis, we used the median, interquartile range (IQR), or range to describe the quantitative measures reported for the included analyses. These consisted of participant characteristics (age, sex, disease duration, study timing), predictors (number considered, number included), sample size (number of participants, number of events), and performance measures (c‐statistic). The samples in some analyses belonging to the same study or using the same data source might overlap or even be identical. However, we ignored this correlation for reporting purposes because (1) the description is intended to give a sense of the state of the literature rather than precise estimates, (2) no single study or data source has undue influence over these measures, and (3) it is impossible to discern the extent of overlap between analyses utilising the same data source.
Tables were organised by model outcome, which aligns with the diagnostic subtype. Model outcomes were categorised into five groups: disability, relapse, conversion to definite MS, conversion to progressive MS, and composite outcomes. The models were summarised over several tables presenting certain aspects: study characteristics, participant characteristics, predictor domains, number of predictors, model development and validation, final model presentation, reporting items, and usability. External validation information was also included in these tables, where appropriate. Figures were organised by model outcome, development versus validation, or algorithm type, where appropriate. Algorithm type was categorised into two groups: traditional statistics and ML.
We used the statistical programming software R (version 4.0.1) and the following packages for all analyses: tidyverse (1.3.0), dmetar (0.0.9000).
Conclusions and summary of findings
A GRADE framework adaptation specific to prognostic model research, rather than general prognosis (Iorio 2015) or prognostic factor research (Foroutan 2020), remains a topic for future work; hence, we did not apply GRADE to our conclusions. Our conclusions highlighted the biases in the current literature, the usability of the currently available models, and areas in need of improved reporting. We also made recommendations for future research.
Results
In this section, we start by reporting the results from the search and screening process, including the reasons for exclusion. This is followed by an in‐depth review of the models with more than one external validation. Then, we describe the data extracted from all the included studies, and the respective analyses as appropriate, in the order of the CHARMS checklist (Moons 2014): data source, participants, outcomes, predictors, sample size and missing data, model development, model performance and evaluation, model presentation, and interpretation. We finalise this section with our assessment of the analyses based on the extracted data: usability and reproducibility, risk of bias, and reporting deficiencies.
Description of studies
Results of the search
We identified 13,046 records via our database search (4757 from MEDLINE, 7706 from Embase, and 583 from the Cochrane Library ‐ search updated on 2 July 2021), as summarised in Figure 1. Our backward and forward citation tracking of the included studies, and of reviews on MS prognosis identified during title/abstract screening, resulted in an additional 4727 records. Contact with the authors of included studies led us to a further 23 suggested records related to the topic. After deduplication of the 17,796 records from all sources, we screened the titles/abstracts of 12,258 unique records, of which 261 were found eligible for full‐text retrieval. We identified an additional 48 reports (preprints, dissertations, and poster or video presentations related to the conference abstracts) via searching the Internet or contacting the abstract authors. In total, we assessed 309 full‐text reports for eligibility.
Figure 1. Flow diagram based on the PRISMA 2020 guideline
At the full‐text screening stage, we excluded 180 reports (see Excluded studies). Furthermore, 21 reports of 11 studies were conference abstracts or presentations without any associated full‐text publication. Despite attempts to contact the authors for more information, a final judgement on eligibility could not be reached for an additional eight reports corresponding to six studies due to limited information (see Characteristics of studies awaiting classification). Thus, we included 100 reports corresponding to 57 studies in our review. Bergamaschi 2001 did not report any predictions for individuals, and Weinshenker 1991 was published before the dates covered by our search algorithm. These two studies were nevertheless included in our review because they described the development of models that were validated in subsequent eligible studies (Bergamaschi 2007; Bergamaschi 2015; Weinshenker 1996).
Excluded studies
We excluded 180 reports after full‐text screening for the following reasons, listed according to our hierarchy of exclusion reasons at the first level and decreasing number of studies at the second level.
-
Wrong study type (113)
112 records that do not aim to develop or validate prognostic models
1 record that is not an original study but a review
-
Wrong population (3)
3 records with prognostication applied to people without a diagnosis of MS
-
Wrong outcome (6)
6 records with outcomes other than disability, relapses, or conversion to a more advanced disease subtype
-
Wrong model (43)
13 records using multivariable combinations not derived from statistical prognostic models (e.g. diagnostic criteria, scoring rules)
12 records predicting treatment response
10 records that do not perform individual level predictions
8 records containing predictions from a model not multivariable in nature
-
Wrong timing (15)
15 records predicting concurrent or cross‐sectional outcomes
A representative selection of the excluded studies, with detailed reasons, is available in the section Characteristics of excluded studies.
Included studies
Of the 57 studies included in this review, 42 (74%) reported prognostic model development only (Aghdam 2021; Agosta 2006; Bendfeldt 2019; Bergamaschi 2001; Borras 2016; Brichetto 2020; De Brouwer 2021; de Groot 2009; Gout 2011; Kosa 2022; Kuceyeski 2018; Law 2019; Margaritella 2012; Martinelli 2017; Misicka 2020; Montolio 2021; Olesen 2019; Oprea 2020; Pellegrini 2019; Pinto 2020; Pisani 2021; Roca 2020; Rocca 2017; Rovaris 2006; Runia 2014; Seccia 2020; Skoog 2014; Sombekke 2010; Spelman 2017; Szilasiová 2020; Tacchella 2018; Tommasin 2021; Tousignant 2019; Vukusic 2004; Weinshenker 1991; Wottschel 2015; Wottschel 2019; Ye 2020; Yoo 2019; Yperman 2020; Zakharov 2013; Zhang 2019), and eight (14%) reported both development and external validation of prognostic models (Ahuja 2021; Calabrese 2013; Lejeune 2021; Malpas 2020; Mandrioli 2008; Manouchehrinia 2019; Sormani 2007; Vasconcelos 2020). Bergamaschi 2007 reported an external validation of a previously developed but not evaluated model (Bergamaschi 2001). Bejarano 2011 and probably Zhao 2020 replicated the modelling process in an independent set of participants instead of evaluating the final model derived from the development set. Hence, these two studies are considered to have reported development and other author‐reported validation of prognostic models.
The remaining four studies (7%) were a combination of the aforementioned types: the model developed in Bergamaschi 2001 (called Bayesian Risk Estimate for Multiple Sclerosis (BREMS) by its authors) was both externally validated and, after dropping post‐onset predictors without statistical justification (the result called BREMS onset (BREMSO) by its authors), validated for the original and a new outcome in Bergamaschi 2015. Gurevich 2009 developed two models of interest (called First Level Predictor (FLP) and Fine Tuning Predictor (FTP) by their authors) but externally validated only one of them (FLP). In Skoog 2019, a previously developed model (Skoog 2014) was both further evaluated internally in a subset of the development cohort and validated externally. Finally, Weinshenker 1996 reported both an external validation of a previously developed model (called Model 3 by its authors in Weinshenker 1991) and the development of a new model (short‐term outcome).
We contacted the authors of all 57 included studies to obtain missing information or clarification of the reported information. No response was received for 21 (37%) studies. Authors of six (10%) studies responded but provided no further information or clarification. Authors of the remaining 30 studies (53%) provided further details and clarifications.
Models with more than one external validation
We identified two models with more than one external validation (Table 1), both originally developed to predict time to conversion to progressive MS: the BREMS score (Bergamaschi 2001) and the survival model of Manouchehrinia 2019.
BREMS score
The BREMS score was developed using clinical data from 186 people with RRMS seen at a single MS clinic in Italy until December 1997. The mean follow‐up time was 7.5 years, ranging from 3 to 25 years. Bergamaschi and colleagues defined the time of onset of the secondary progressive phase as the earliest date of observation of a progressive worsening severe enough to induce an increase of at least one EDSS point and persisting for at least six months after the progression onset. The score was developed using Bayesian methods to jointly model relapses, Kurtzke’s Functional Systems scores, and EDSS until the primary outcome of time to SPMS conversion. The presented sum score contained nine predictors: age at onset, female sex, sphincter onset, pure motor onset, motor‐sensory onset, sequelae after onset, number of involved functional systems at onset, number of sphincter plus motor relapses, and EDSS greater than four outside relapse. This model was not internally validated in the development study, but was externally validated in two further studies (Bergamaschi 2007; Bergamaschi 2015). Additionally, the model was updated by dropping two predictors not available at onset (number of sphincter plus motor relapses and EDSS greater than four outside relapse) and renamed BREMSO (Bergamaschi 2015). This update did not, however, involve refitting the model using the subset of original predictors; rather, the original coefficients were retained without the two dropped ones. This updated model was evaluated for prediction of the original SPMS conversion outcome as well as for severe MS defined using the Multiple Sclerosis Severity Score. Because the BREMS model was externally validated only twice and no measures of discrimination or calibration were reported, we did not perform a meta‐analysis to summarise the performance of this model.
Manouchehrinia 2019
Manouchehrinia and colleagues developed their model using data from 8825 participants with RRMS seen up until May 2016 in the Swedish national MS registry. The mean (standard deviation (SD)) follow‐up time was 12.5 (8.7) years. In this parametric survival model, time to SPMS was defined as the earliest recognised date of SPMS onset determined by a neurologist at a routine clinic visit according to the Lublin 1996 criteria. The model was presented as a nomogram for computing 10‐, 15‐, and 20‐year conversion probability and additionally as a web application (https://aliman.shinyapps.io/SPMSnom/). The final model included five predictors: year of birth, age at onset, sex, first EDSS, and age at first EDSS.
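The review does not reproduce the model's coefficients, but the mechanics of such a nomogram or web application can be sketched generically: a parametric survival model converts a patient's predictor values into conversion probabilities at fixed horizons. The sketch below assumes a Weibull accelerated failure time form; every coefficient and predictor effect is a hypothetical placeholder, not the published model.

```python
# Generic sketch of how a parametric (Weibull AFT) survival model yields 10-,
# 15-, and 20-year SPMS conversion probabilities for one patient, as a
# nomogram or web app would. All numbers below are hypothetical placeholders.
import math

INTERCEPT = 3.6          # hypothetical log-scale location
SHAPE = 1.3              # hypothetical Weibull shape parameter
BETAS = {"age_at_onset": -0.015, "female": 0.10, "first_edss": -0.20}

def conversion_probability(covariates, years):
    """P(conversion by `years`) = 1 - S(years) under the Weibull AFT model."""
    lp = INTERCEPT + sum(BETAS[k] * v for k, v in covariates.items())
    scale = math.exp(lp)                          # patient-specific time scale
    survival = math.exp(-((years / scale) ** SHAPE))
    return 1.0 - survival

patient = {"age_at_onset": 32, "female": 1, "first_edss": 2.0}
for horizon in (10, 15, 20):
    print(f"{horizon}-year conversion probability: "
          f"{conversion_probability(patient, horizon):.2f}")
```

A nomogram is simply a graphical tabulation of this calculation: each predictor value maps to points, the point total maps to the linear predictor, and the linear predictor maps to the horizon‐specific probabilities.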
This model was internally validated using the bootstrap method, with both calibration and discrimination assessed. In the same publication as the model development study, it was reported that model validation was also performed using three external multi‐site datasets addressing temporal, geographic, and spectrum transportability. The British Columbia MS Cohort provided 3967 participants diagnosed with RRMS according to Poser 1983, who were enrolled between January 1980 and December 2004 and followed up for an average of 13.8 (SD 8.4) years. The second external validation analysis was performed using the 175 participants from the ACROSS (A CROSS‐Sectional Long‐term Follow‐up of Fingolimod Phase II Study Patients) randomised placebo‐controlled phase 2 trial of the disease‐modifying therapy fingolimod who returned for assessment at 10 years. The third external validation analysis used 2355 participants from the long‐term follow‐up extension study of the phase 3 trials FREEDOMS (FTY720 Research Evaluating Effects of Daily Oral therapy in Multiple Sclerosis) and FREEDOMS II, which also assessed fingolimod. RRMS diagnosis was made using the McDonald 2001 and 2005 criteria (Polman 2005) and mean follow‐up time was 18.6 (SD 7.9) years and 14 (SD 7.8) years, respectively, in ACROSS and FREEDOMS validation analyses.
The model development and external validations in Manouchehrinia 2019 were all found to be at high risk of bias. In the development study, people with MS were included from the registry based on the availability of an EDSS score. It was unclear how standardised the data collection was or whether the included sample differed from the general population with MS. The outcome, conversion to SPMS, was based on the Lublin 1996 criteria, which we considered subjective. The combination of retrospective use of registry data and a subjective outcome increases the risk of bias. During analysis, only complete cases were used, and it was unclear in which subset of participants the backward selection of predictors took place. The subsequent internal validation did not include the predictor selection process.
Age at onset, age at first recorded EDSS score, and the first recorded EDSS value were amongst the candidate predictors remaining in the final model. What is meant by 'onset' is not clearly defined here; in this review, however, 'onset' was liberally defined to include up to one year after onset. Given that the model was to be applied at onset, age at onset and age at first EDSS score should have been similar, if not equivalent. However, the first EDSS assessment occurred on average 6.5 years after onset, which meant that the development included predictors available only after the intended time of model application. This use of unavailable predictors is even clearer here than in other analyses because survival analysis was used, in which the start and stop of time are explicitly defined. Although EDSS may be available at onset in general, the estimated model would probably change due to the different range of EDSS scores actually seen at onset. The inclusion of predictors unavailable at the time of model application makes the model unusable, because, by definition, such a predictor will not be available when the model is applied to a future patient. It also inflates model performance, because a predictor measured closer in time to the outcome is probably more strongly associated with it (Moons 2019).
We rated the external validations at high risk of bias for similar reasons. Inclusion in the British Columbia validation analysis was based on data availability rather than multiple imputation of missing values, and the observation frequency for outcome measurement varied across participants. Only participants with complete follow‐up were included in the ACROSS validation analysis, even though a time‐to‐event analysis capable of dealing with censoring was employed, and this analysis included only 26 events, well below the recommended 100 events for validation studies. Only the FREEDOMS analysis used a clear definition of time of conversion to SPMS, based on increased EDSS sustained for at least six months. The three validation analyses assessed discrimination using Harrell's c‐statistic, but did not assess calibration, which is valuable to assess in external samples, not just the development set.
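Harrell's c‐statistic, the discrimination measure used in these validation analyses, generalises the area under the ROC curve to censored time‐to‐event data: a pair of patients is comparable when the one with the shorter observed time actually had the event, and concordant when that patient also received the higher predicted risk. A minimal sketch follows, with made‐up data; real analyses would use an established implementation rather than this illustration.

```python
# Minimal sketch of Harrell's c-statistic for right-censored survival data.
# Times, event indicators, and risk scores below are invented for illustration.
def harrell_c(times, events, risks):
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # pair is comparable only if i had an observed event before
            # j's (possibly censored) follow-up time
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1
                elif risks[i] == risks[j]:
                    concordant += 0.5      # ties in predicted risk count half
    return concordant / comparable

times  = [2.0, 5.0, 6.0, 8.0, 9.0]   # years to SPMS conversion or censoring
events = [1,   1,   0,   1,   0]     # 1 = converted, 0 = censored
risks  = [0.9, 0.6, 0.5, 0.7, 0.2]   # model-predicted risk scores
print(f"Harrell's c: {harrell_c(times, events, risks):.2f}")
```

A value of 0.5 indicates discrimination no better than chance and 1.0 perfect ranking; censored patients contribute only as the longer‐surviving member of comparable pairs.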
Although the Manouchehrinia 2019 model was evaluated using three external datasets, all three of these validations were conducted by the same study team within the development publication. Sources of confusion related to timing in model development were propagated across all the validation analyses. We therefore did not consider these evaluations to be independent external validation studies and decided against performing a meta‐analysis.
Characteristics of included models
In total, we extracted data from 75 models developed in 54 studies (see Appendix 5 for details). Of these, 35 (47%) models were developed using traditional statistical methods and the remaining 40 (53%) using ML methods. Of the studies that developed models, 42 (78%) contributed one model each, four (7%) contributed two models each (Gurevich 2009; Olesen 2019; Wottschel 2015; Ye 2020), seven (13%) contributed three models each (Bendfeldt 2019; de Groot 2009; Law 2019; Misicka 2020; Pinto 2020; Seccia 2020; Tacchella 2018), and Zhao 2020 contributed four models.
In the 12 studies from which multiple model developments were included, the models differed in the timing of the outcome measurement in five (42%) (Misicka 2020; Pinto 2020; Seccia 2020; Tacchella 2018; Wottschel 2015), in modelling method in three (25%) (Gurevich 2009; Law 2019; Zhao 2020), in outcome in two (17%) (de Groot 2009; Pinto 2020), in the predictors considered in four (33%) (Bendfeldt 2019; Olesen 2019; Ye 2020; Zhao 2020), and in participant subset in one study (Bendfeldt 2019); some studies differed in more than one of these respects.
We extracted data from 21 external or author‐reported validations in 15 studies. Of these studies, 11 (73%) contained one validation (Ahuja 2021; Bejarano 2011; Bergamaschi 2007; Calabrese 2013; Gurevich 2009; Lejeune 2021; Malpas 2020; Mandrioli 2008; Sormani 2007; Vasconcelos 2020; Weinshenker 1996), two (13%) contained two validations (Skoog 2019; Zhao 2020), and two (13%) contained three validations (Bergamaschi 2015; Manouchehrinia 2019).
Of all validations, 15 (71%) were external validations of 12 models (16%): 10 were externally validated once (Ahuja 2021 Dev; Calabrese 2013 Dev; Gurevich 2009 FLP Dev; Lejeune 2021 Dev; Malpas 2020 Dev; Mandrioli 2008 Dev; Skoog 2014 Dev; Sormani 2007 Dev; Vasconcelos 2020 Dev; Weinshenker 1991 M3 Dev), the model Bergamaschi 2001 BREMS Dev was externally validated twice in studies separate from the development but by the same research team, and the model Manouchehrinia 2019 Dev was externally validated three times in the same study of its development. The remaining six (29%) were other author‐reported validations (Bejarano 2011 Val; Bergamaschi 2015 BREMSO MSSS Val; Bergamaschi 2015 BREMSO SP Val; Skoog 2019 Val; Zhao 2020 LGBM Common Val; Zhao 2020 XGB Common Val). None of the validations were performed by researchers independent of that model’s development team.
Our main sources for data extraction were journal articles for 55 (96%) studies, a dissertation for Runia 2014, and a conference proceeding for Tousignant 2019. The number of published prognostic model studies, and of the analyses (both model developments and validations) they contain, has increased greatly in recent years (see Figure 2 left). Before 2001, two studies containing three analyses were published, whereas 36 studies containing 63 analyses have been published after 2015. Yet there seems to be no discernible time‐trend in the number of published validations relative to the number of published developments. Recently, ML methods have become increasingly popular for developing prognostic prediction models (see Figure 2 right).
Figure 2. Publication characteristics by year. Left: number of included studies (black outline) and the model developments (blue)/validations (orange) they contain, by year of publication; right: number of included developments using traditional (dark blue) or machine learning (yellow) methods, by year of publication. Data for the year 2021 are incomplete (only until July). ML: machine learning.
Data source
As a data source for model development or validation, 37 (39%) analyses used cohort studies, 18 (19%) used routine care sources, 14 (15%) used randomised trial participants, and 13 (14%) used disease registries. Four (4%) analyses used a combination of these: Ahuja 2021 used cohort study and routine care data (electronic health records), Kosa 2022 used cohort study and case‐control study data, Bergamaschi 2001 used registry and routine care data, and Kuceyeski 2018 used cohort study, registry, and routine care data. The source of data was not reported or unclear for 10 (10%) analyses (Gurevich 2009 FLP Dev; Gurevich 2009 FLP Ext Val; Gurevich 2009 FTP; Sombekke 2010; Tommasin 2021; Vasconcelos 2020 Dev; Vasconcelos 2020 Ext Val; Ye 2020 gene signature; Ye 2020 nomogram; Zakharov 2013).
The data used in the analyses were collected to conduct prognostic research in 27 (28%) analyses. In 61 (64%) analyses, data use was secondary, i.e. the data were collected for other reasons but were then repurposed. The data collection purpose for the remaining eight (8%) analyses was either unclear or not reported (Borras 2016; Gurevich 2009 FLP Dev; Gurevich 2009 FLP Ext Val; Gurevich 2009 FTP; Martinelli 2017; Vasconcelos 2020 Dev; Vasconcelos 2020 Ext Val; Zakharov 2013).
Of the 21 validations, the participants in the validation differed from those in the model development in terms of location in six (29%) (Bejarano 2011 Val; Bergamaschi 2007 BREMS Ext Val; Malpas 2020 Ext Val; Weinshenker 1996 M3 Ext Val; Zhao 2020 LGBM Common Val; Zhao 2020 XGB Common Val), in terms of time in three (14%) (Calabrese 2013 Ext Val; Mandrioli 2008 Ext Val; Vasconcelos 2020 Ext Val), and in terms of patient spectrum in two (10%) (Ahuja 2021 Ext Val; Sormani 2007 Ext Val). The difference was in multiple dimensions in eight (38%) validations: in terms of both location and time in four (Bergamaschi 2015 BREMS Ext Val; Bergamaschi 2015 BREMSO MSSS Val; Bergamaschi 2015 BREMSO SP Val; Skoog 2019 Ext Val), in terms of location, time, and spectrum in three (Manouchehrinia 2019 Ext Val 1; Manouchehrinia 2019 Ext Val 2; Manouchehrinia 2019 Ext Val 3), and in terms of location and spectrum in Lejeune 2021 Ext Val. The difference between the validation cohort and the derivation cohort was not explicitly reported in Gurevich 2009 FLP Ext Val, and Skoog 2019 Val was a further evaluation of the model in a subset of the derivation cohort.
Participants
The participants were recruited from a single site in 54 (56%) and from multiple sites in 40 (42%) analyses; the number of sites was not reported for the remaining two analyses. Of those 90 (94%) analyses for which country of participant recruitment could be extracted, recruitment was from centres in Europe in 68 (76%), in North America in 28 (31%), in Asia in 13 (14%), in South America in seven (8%), in Oceania in five (6%), and in Africa (country South Africa) in a single analysis (Pellegrini 2019).
In 86 (90%) analyses for which summary statistics on sex could be extracted (including studies, e.g. Zhao 2020, which did not report characteristics of the included sample but led to references on the source population), the proportion of females ranged from 50% (Rocca 2017; Rovaris 2006) to 100% (Vukusic 2004) with a median (IQR) of 69% (65% to 73%). The distribution of sex in included analyses did not seem to vary by category of the predicted outcome (Figure 3 top left).
Figure 3. Participant characteristics in included analyses by outcome. Top left: percentage of females; middle left: measure of centre of disease duration in years; bottom left: measure of centre of age in years, as reported at disease onset or at the time of analysis; top right: diagnostic criteria by publication year per outcome; middle right: diagnostic subtype by publication year; bottom right: percent treated, measured at baseline or during follow‐up, by publication year. Data for the year 2021 are incomplete (only until July). CDMS: conversion to clinically definite MS; CPMS: conversion to progressive MS.
The reported measure of centre (i.e. mean or median) for participant disease duration in 54 (56%) analyses ranged from 0.1 years in participants with a diagnosis of 100% CIS in Wottschel 2015 to 19 years in participants with a diagnosis of 100% RRMS in Seccia 2020. As expected, participants included in the analyses of conversion to progressive MS had been living with an MS diagnosis for longer than those included in the analyses of conversion to definite MS, whose first symptoms were very recent (Figure 3 middle left).
In 87 (91%) analyses that reported age, the reported measure of centre ranged from 24.8 years (Bergamaschi 2007 BREMS Ext Val) to 51.3 years (Rocca 2017; Rovaris 2006). Included in this summary are 40 (42%) analyses that reported age at the time of the analysis, 32 (33%) analyses that reported age at disease onset, and 15 (16%) analyses that reported age at an unclear time or for the source population. Participants were older at the time of analysis in analyses with disability‐related or composite outcomes than in those with relapse or diagnostic conversion outcomes. The distribution of age at onset did not appear to vary by category of predicted outcome (Figure 3 bottom left).
Of those 84 (88%) analyses that clearly reported the diagnostic subtype of the included participants, participants of a single subtype were recruited in 62 (74%): CIS in 17 (20%), RRMS in 40 (48%), PPMS in two (Rocca 2017; Rovaris 2006) analyses, and SPMS in three models from a single study (Law 2019 Ada; Law 2019 DT; Law 2019 RF). Participants with a mixture of the aforementioned diagnoses were included in eight (10%) analyses (Agosta 2006; Bejarano 2011 Val; Kosa 2022; Montolio 2021; Sombekke 2010; Szilasiová 2020; Vukusic 2004; Yperman 2020). The remaining 14 (17%) used a different diagnostic subtyping in describing the mixture of their participants. The models developed in participants with primary or secondary progressive subtypes were predicting disability outcomes. As expected, all models predicting conversion to definite MS were developed in participants with CIS and all models predicting conversion to progressive MS were developed in participants with RRMS (Figure 3 middle right).
Of those 68 (71%) analyses that clearly reported the diagnostic criteria at recruitment, 13 (19%) used a mixture of different criteria: 18 (26%) used Poser 1983, two (3%) used Thompson 2000, 41 (60%) used one or more versions of the McDonald criteria (17 used 2001 (McDonald 2001), 11 used 2005 (Polman 2005), 10 used 2010 (Polman 2011), six used 2017 (Thompson 2018b), and three used an unspecified version), seven (10%) analyses used their own definition (Bendfeldt 2019 Linear Placebo; Bendfeldt 2019 M7 Placebo; Bendfeldt 2019 M9 IFN; Law 2019 Ada; Law 2019 DT; Law 2019 RF; Runia 2014), and Olesen 2019 used criteria other than those mentioned above (Optic Neuritis Study Group 1991). The changing diagnostic criteria are reflected in the diversification of the criteria used over time (Figure 3 top right). Although newer criteria are increasingly used, some studies published after 2015 were conducted in participants diagnosed with McDonald 2001 (Manouchehrinia 2019 Ext Val 2; Montolio 2021; Pellegrini 2019; Szilasiová 2020; Vasconcelos 2020 Dev; Vasconcelos 2020 Ext Val; Ye 2020 gene signature; Ye 2020 nomogram) or even Poser 1983 (Manouchehrinia 2019 Ext Val 1; Skoog 2019 Ext Val; Skoog 2019 Val; Spelman 2017; Vasconcelos 2020 Dev; Vasconcelos 2020 Ext Val).
In the 45 (47%) analyses with clear reporting, the proportion of participants on treatment at recruitment ranged from 0% in 28 analyses to 100% (Calabrese 2013 Dev; Calabrese 2013 Ext Val; Pisani 2021), with a median (IQR) of 0% (0% to 10%); at least one participant was on treatment at recruitment in 17 (38%) analyses. In the 37 (39%) analyses with clear reporting, the proportion of participants on treatment during follow‐up ranged from 0% in 11 analyses to 100% (Bendfeldt 2019 M9 IFN; Calabrese 2013 Dev; Calabrese 2013 Ext Val; Manouchehrinia 2019 Ext Val 2; Manouchehrinia 2019 Ext Val 3; Oprea 2020; Pisani 2021; Szilasiová 2020), with a median (IQR) of 35% (0% to 68%). When the analyses reporting treatment or its timing unclearly are included, the median (IQR, range) proportion of participants receiving treatment during follow‐up becomes 50% (12% to 73%, 0% to 100%) in 58 analyses, of which at least one participant was on treatment in 47 (81%). As expected, the proportion of participants receiving treatment was higher during follow‐up than at recruitment. This trend is especially visible in the analyses published during the last 15 years. Regardless of the time point of measurement, the proportion treated increases with more recent publication (Figure 3 bottom right). It should be noted that some analyses were conducted on data from RCT arms, which partially explains percentages of 0% and 100% treated during follow‐up.
Year of observation start ranged from 1972 (Weinshenker 1991 M3 Dev) to 2014 (Brichetto 2020; Olesen 2019 Candidate; Olesen 2019 Routine) with a median of 2003 in 55 (57%) analyses clearly reporting when data collection or recruitment started. Year of observation end ranged from 1984 (Weinshenker 1991 M3 Dev) to 2021 (Kosa 2022) with a median of 2013 in 50 (52%) analyses clearly reporting when data collection or recruitment ended.
In 46 (48%) analyses that clearly reported both of the items, the median (IQR, range) duration of data collection was 7 (3 to 12, 0 to 33) years.
Outcomes
Although definitions in individual analyses might differ, we categorised the outcomes into one of the following domains in line with our PICOTS: disability, relapse, conversion to clinically definite MS, and conversion to progressive MS. Composite outcomes containing any one of the above were also included and categorised separately.
Disability progression
Of the 96 analyses, 31 model developments and eight validations (41%) defined outcomes related to disability progression. Most of these, 33 (85%), operationalised it using the EDSS, two using the DSS (Weinshenker 1991 M3 Dev; Weinshenker 1996 M3 Ext Val), and two using the MS Severity Score (MSSS) (Bergamaschi 2015 BREMSO MSSS Val; Sombekke 2010), a measure derived from the EDSS. The most common outcome definition based on EDSS, used in nine analyses, was disability progression or clinical worsening (sometimes confirmed, sometimes not) based on an increase in EDSS (at least a 1.0‐point increase if EDSS < 6 and at least a 0.5‐point increase if EDSS > 5.5). Other outcomes defined by different levels of, or just change in, EDSS included aggressive disease, severe MS, worsening, and residual disability after relapse. Apart from (E)DSS‐based outcomes, two analyses used other measures of disability: Kuceyeski 2018 used the Symbol Digit Modalities Test (SDMT) to measure cognitive disability and de Groot 2009 Dexterity used the 9‐Hole Peg Test (9‐HPT). Many of the analyses with outcomes based on disability, 22 (56%), were in participants with a mixture of diagnostic subtypes, 11 (28%) were only in RRMS participants, three (8%) were only in SPMS participants (Law 2019 Ada; Law 2019 DT; Law 2019 RF), two (5%) were only in PPMS participants (Rocca 2017; Rovaris 2006), and Roca 2020 did not report the diagnostic subtype of the participants. In those analyses that defined the timing of measurement, disease progression was measured at the earliest six months (as residual disability after relapse in Lejeune 2021), and at the latest 15 years (as EDSS score ≥ 5.0 in Szilasiová 2020), after the intended time of prognostication.
In those analyses that did not specify the timing of measurement or had time‐to‐event outcomes, either the follow‐up or the time of outcome occurrence was described as being at the earliest 5.25 years (as clinically worsened in Rovaris 2006) and at the latest 55 years (as mild MS in Bergamaschi 2015) after the intended time of prognostication.
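The conventional EDSS worsening rule (a 1.0‐point increase from a baseline below 6.0, or a 0.5‐point increase from 6.0 or above, reflecting the scale's coarser upper range) can be written as a small check. This is a sketch of the general convention only; individual included studies varied in their exact cut‐offs and confirmation windows.

```python
# Sketch of a conventional EDSS-based worsening rule: a 1.0-point increase is
# required when baseline EDSS is below 6.0, and a 0.5-point increase when
# baseline EDSS is 6.0 or above. Exact thresholds and confirmation windows
# varied between the included studies; this is the general convention only.
def edss_progression(baseline, follow_up):
    required = 1.0 if baseline < 6.0 else 0.5
    return follow_up - baseline >= required

print(edss_progression(2.0, 3.0))   # 1.0-point increase from EDSS 2.0
print(edss_progression(6.0, 6.5))   # 0.5-point increase from EDSS 6.0
print(edss_progression(4.0, 4.5))   # only 0.5 points from a baseline below 6.0
```

In 'confirmed' variants, the increase must also persist at a later visit (often three or six months) before counting as progression.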
Relapse
Six model developments and two validations (8%) defined outcomes based on relapses: the model developed in Sormani 2007 and its external validation were in participants with an RRMS diagnosis (Sormani 2007 Dev; Sormani 2007 Ext Val), the dataset used for the models in Gurevich 2009 and Ye 2020 comprised a mixture of participants with CIS and clinically definite MS (Gurevich 2009 FLP Dev; Gurevich 2009 FLP Ext Val; Gurevich 2009 FTP; Ye 2020 gene signature; Ye 2020 nomogram), and the model in Vukusic 2004 was developed in a mixture of participants with RRMS and SPMS. The relapse outcome in Vukusic 2004 had a defined time of measurement: three months after the intended time of prognostication, which was child delivery. In the other analyses, relapse was conceptualised as a time‐to‐event outcome, and the follow‐up was described as being at the earliest 10 months (Sormani 2007 Dev) and at the latest 16 months (Sormani 2007 Ext Val) after the intended time of prognostication.
Conversion to a more advanced disease subtype
Seventeen (18%) model developments defined outcomes of conversion to definite MS in participants with CIS. When defining definite MS, five (29%) of these referred to McDonald 2010 (Polman 2011) (Aghdam 2021; Olesen 2019 Candidate; Olesen 2019 Routine; Zakharov 2013; Zhang 2019), Yoo 2019 referred to McDonald 2005 (Polman 2005), four (24%) referred to Poser 1983 (Gout 2011; Martinelli 2017; Runia 2014; Spelman 2017), and Bendfeldt 2019 referred to modified Poser criteria (Bakshi 2005). Of the remaining analyses, Borras 2016 provided a definition of definite MS that included the Barkhof criteria, whereas no criteria were cited in Wottschel 2015 or Wottschel 2019. In those analyses that defined the timing of measurement, conversion to definite MS was measured at the earliest one year (Wottschel 2015 one year; Wottschel 2019) and at the latest three years (Wottschel 2015 three years; Zhang 2019) after the intended time of prognostication. In those analyses that did not specify the timing of measurement or had time‐to‐event outcomes, either the follow‐up or the time of outcome occurrence was described as being at the earliest 3.4 years (Olesen 2019) and at the latest 12.7 years (Gout 2011).
Seventeen model developments and 10 validations (28%) defined outcomes of conversion to progressive MS in participants with RRMS, except for Brichetto 2020, in which the diagnostic subtype of the model development population is unclear. When describing the secondary progression outcome, seven (26%) of these analyses referred to Lublin 1996 (Manouchehrinia 2019 Dev; Manouchehrinia 2019 Ext Val 1; Manouchehrinia 2019 Ext Val 2; Pisani 2021; Skoog 2014 Dev; Skoog 2019 Ext Val; Skoog 2019 Val), 12 (44%) did not cite any criteria but provided an outcome definition based on EDSS, and eight (30%) neither cited criteria nor provided an operationalised definition of the outcome (Brichetto 2020; Misicka 2020 10 years; Misicka 2020 20 years; Misicka 2020 Ever; Pinto 2020 SP; Seccia 2020 180 days; Seccia 2020 360 days; Seccia 2020 720 days). In those analyses that defined the timing of measurement, conversion to progressive MS was measured at the earliest six months (Seccia 2020 180 days; Tacchella 2018 180 days) and at the latest five years (Calabrese 2013) after the intended time of prognostication. In those analyses that did not specify the timing of measurement or had time‐to‐event outcomes, either the follow‐up or the time of outcome occurrence was described as being at the earliest 10 years (Misicka 2020 10 years) and at the latest 56.7 years (Skoog 2014) after the intended time of prognostication.
Composite
Finally, the remaining four model developments and one external validation (5%) had composite outcomes. Ahuja 2021 defined a relapse outcome as a clinical and/or a radiological event at one year in a participant group of mixed diagnostic subtypes. Kosa 2022 defined a model‐based outcome that included disability and imaging components in a participant group of mixed diagnostic subtypes, with no clear timing of measurement. de Groot 2009 defined a cognitive disability outcome within three years based on multiple clinical test results in a participant group of mixed diagnostic subtypes. Pellegrini 2019 defined a disability outcome based on multiple clinical scale or test results measured within two years in people with RRMS.
The numbers of developments and validations for different outcome types by publication year are shown in Figure 4. Interest in publishing models developed to predict conversion to a more advanced disease state (definite MS or progressive MS) appears to have increased during the last decade. This might be related to the changing diagnostic criteria and a desire to predict conversion as defined by the newly established criteria. Interestingly, while the models predicting conversion to progressive MS were validated the most in terms of relative frequency, there were no validations of models predicting conversion to definite MS.
Figure 4. Outcomes in included analyses by year of publication. Left: categories in model developments; right: categories in model validations. Data for the year 2021 are incomplete (only until July). CDMS: conversion to clinically definite MS; CPMS: conversion to progressive MS.
Predictors
Predictor domains
Of the 75 developments, demographic predictors were considered for inclusion in 65 (87%) and finally included in 49 (65%) models. Predictors related to disability scores or tests were considered for inclusion in 56 (75%) and finally included in 38 (51%) models. Predictors related to symptoms (relapses) were considered for inclusion in 55 (73%) and finally included in 37 (49%) models. Predictors derived from analyses of MR images were considered for inclusion in 42 (56%) and finally included in 36 (48%) models. Of the 27 models developed in participants not confined to a single diagnostic subtype, 17 (63%) considered diagnostic categories as predictors and nine (33%) finally included them in the model. Predictors related to MS treatment were considered for inclusion in 15 (20%) and finally included in nine (12%) models. Predictors from molecular analysis of proteins, transcripts, or genes were considered for inclusion in 10 (13%) models and were finally included in all of them. Predictors derived from cerebrospinal fluid (CSF) analysis were considered for inclusion in 10 (13%) and finally included in seven (9%) models. Electrophysiological predictors were considered for inclusion in five (7%) and finally included in four (5%) models. Serum 25‐OH‐vitamin D (Runia 2014) was the only non‐CSF laboratory parameter considered, but it was not selected in the final model. Only Aghdam 2021 considered a predictor from the environmental domain, season of attack (spring versus other), but it too was not selected in the final model.
The proportion of model developments that considered a specific domain across publication years is presented in Figure 5 (top). More recent models seem to increasingly consider para‐clinical predictors, such as those derived from the analysis of imaging, CSF, omics, and electrophysiological tests. This may be related to increasing interest in these biomarkers as prognostic factors, which is sometimes the main focus of the included studies, and to the increased availability of technological means to collect and analyse them. The consideration of MS treatment in prognostic model developments also shows an expected increase over time as treatment options increase and become widespread. Predictor domains considered and included in individual models are presented in Appendix 5.
Figure 5. Predictors in included models. Top: percent of models considering each predictor domain by year of publication; bottom left: number of models with selection considering (light blue) and including (dark blue) each predictor domain; bottom right: number of considered and included predictors (on log‐2 scale) per modelling method by publication year. Shaded regions depict the predictor number range. Data for the year 2021 are incomplete (only until July). CSF: cerebrospinal fluid; ML: machine learning.
Most of the model developments (71%) considered between three and five of the 11 domains reported above. Figure 5 (bottom left) compares the frequency of consideration and inclusion of predictor domains in the 47 (63%) models that considered more than one domain for inclusion and had predictor selection. When considered, para‐clinical biomarkers from the domains of imaging, omics, CSF, and electrophysiology seem to be included more frequently than predictors from other domains. There are probably two explanations for this observation. First, the authors considering these predictors in a prognostic model are likely to be interested in them and to select a final model that contains them (e.g. Martinelli 2017). Second, the number of possible predictors that can be derived from these measurements is high, so predictors from these domains tend to outnumber those from other domains and survive a selection procedure (e.g. Gurevich 2009).
Other predictors
Predictors that were considered for inclusion in a total of 28 (37%) developments from 18 studies, but that do not fit any of the above categories, were: administrative (duration of follow‐up, seen at onset, annualised visit density, hospitalisation, scanner, study identifier, presence of specific medical assessments, country, MRI site), medical history related (co‐treatment, concomitant diseases, procedures), pregnancy and post‐partum related, patient‐reported outcomes or symptoms, disability not measured by scores or tests, and output of another predictive model. All of these predictors were considered in only single studies except follow‐up time (six studies) and pregnancy (two studies).
Number of predictors
The number of considered predictors (in degrees of freedom) ranged from two (Zakharov 2013) to 852,167 (Kosa 2022) with a median (IQR) of 23 (12.5 to 124) in 67 (89%) model developments (20 of which reported it unclearly). In seven (9%) model developments, neural network algorithms were used with raw/unsummarised imaging or longitudinal data (De Brouwer 2021; Roca 2020; Seccia 2020 180 days; Seccia 2020 360 days; Seccia 2020 720 days; Tousignant 2019; Yoo 2019), making predictor number counts irrelevant. In Bendfeldt 2019 Linear Placebo, the number of considered predictors in the support vector machine (SVM) model was unclear and was reported as the number of voxels in the MRI images.
The number of predictors included in the final models (in degrees of freedom) ranged from two (Borras 2016; Sormani 2007 Dev; Zakharov 2013) to 703 (Kuceyeski 2018) with a median (IQR) of 6.5 (4 to 11.5) in 64 (85%) models (17 of which were unclear). For four (5%) model developments (Bendfeldt 2019 Linear Placebo; Pinto 2020 Severity 10 years; Pinto 2020 Severity 6 years; Pinto 2020 SP), there was insufficient information on the number of predictors in the final model.
The numbers of predictors considered for and included in the final models were clear for 31 (41%) model developments. Of these, 10 (32%) had no predictor selection, so the numbers before and at the end of the modelling process were equal. In the remaining developments, the difference between the number of considered and included predictors ranged from one (de Groot 2009 Cognitive; Olesen 2019 Routine) to 201 (Ye 2020 nomogram) with a median (IQR) of 14 (1 to 28), and the median (IQR) percent decrease in the number of predictors from considered to included was 77% (40% to 81%). The numbers of considered and included predictors by algorithm type are presented in Figure 5 on the log2‐scale. As expected, independent of time, models developed using ML methods seem to both consider and include higher numbers of predictors than those using traditional methods. There also seems to be a slight increase over time in the number of considered predictors for models developed with traditional statistics and in the number of included predictors in models developed with ML methods.
Bergamaschi 2015 was the only study in which the set of predictors was different in the validation than in the original model, BREMS, which had nine predictors (developed in Bergamaschi 2001 and initially evaluated in Bergamaschi 2007). Two predictors that were measured within one year of disease onset were dropped from the model without refitting, resulting in BREMSO with seven predictors.
Predictor handling
In four (5%) of the 75 models, at least one interaction between predictors was considered during development. In eight (11%) models, no interactions were considered during development. Modelling methods, e.g. random forests, that intrinsically accounted for interactions were used in 31 (41%) model developments. For the remaining 32 (43%) models, it was not reported whether interactions were considered during development. During the development of 44 (59%) models, there was no evidence of categorisation of continuous predictors. During the development of 17 (23%) models, at least one predictor was dichotomised or categorised. How the predictors were handled was unclear in 13 (17%) model developments. For the remaining model development (Zakharov 2013), there was insufficient information to deduce how the predictors were handled.
Timing of candidate predictor measurement was described as 'at disease onset' in 20 (27%) models. The predictors were measured at study baseline in 17 (23%) models using data from RCTs or cohort studies. At least 13 (17%) models considered predictors measured at multiple visits, and at least the models in Misicka 2020 and Oprea 2020 were based on predictor and outcome data collected at a single time point (Misicka 2020 10 years; Misicka 2020 20 years; Misicka 2020 Ever; Oprea 2020).
Sample size and missing data
In model developments, sample size ranged from 33 participants (Olesen 2019) to 8825 participants (Manouchehrinia 2019), with a median (IQR) size of 186 (84 to 664) participants. There were six developments (8%, from four studies) with visits as the unit of analysis. In these studies, visits per participant over time were treated as independent observations. The number of visits ranged from 527 (Tacchella 2018) to 2502 (Yperman 2020). Event number was not relevant for seven (9%) developments with continuous outcomes (Bejarano 2011; Gurevich 2009; Kosa 2022; Kuceyeski 2018; Margaritella 2012; Roca 2020; Rocca 2017). The remaining developments analysed a median of 80 events (IQR 37 to 165, range 16 to 1953), although five values were unclearly reported.
There were three developments that considered raw/unsummarised imaging data: Tousignant 2019 considered only imaging data, while Roca 2020 and Yoo 2019 also considered summary predictors, such as lesion load/volume, as well as patient demographics. For these studies, the maximum events per variable (EPV) was calculated excluding the raw/unsummarised imaging data. The EPV could not be computed for Tousignant 2019 due to predictor type and for one development (Bendfeldt 2019 Linear Placebo) due to the missing number of considered predictors. The median EPV in the remaining 73 developments was 3.9 (IQR 1 to 9.9, range 0.0002 to 122.1); however, the precise EPV was unclear in 22 developments, and the largest, i.e. most optimistic, EPV possible based on reported information was used. Of the 73 developments for which EPV could be computed, 17 (23%) had an EPV of 10 or greater and seven (10%) had an EPV of 20 or greater, the older and more recent rule‐of‐thumb thresholds, respectively, for the minimum EPV needed for prediction model development.
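The EPV calculation summarised above can be sketched as follows. This is a minimal illustration, not the review's extraction code; the example figures are the review's median event count (80) and median number of candidate parameters (23).

```python
def events_per_variable(n_events, candidate_df):
    """Events per variable (EPV): number of outcome events divided by the
    number of candidate predictor parameters (degrees of freedom), counting
    all predictors considered, not only those kept in the final model."""
    if candidate_df <= 0:
        raise ValueError("candidate_df must be positive")
    return n_events / candidate_df

# Median event count (80) and median number of candidate parameters (23)
# from the developments summarised above:
epv = events_per_variable(80, 23)   # about 3.5
meets_old_rule = epv >= 10          # older rule of thumb
meets_new_rule = epv >= 20          # more recent rule of thumb
```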
Model validations included between 10 (Gurevich 2009) and 14,211 participants (Bergamaschi 2015), with a median (IQR) of 217 (136 to 700) participants. The number of events was not reported in four validations (19%) and was not relevant in one study due to a continuous outcome. Validation in Ahuja 2021 was at the observation level, with an unreported number of observations from 186 participants. The median number of events in model validation was 76 (IQR 33 to 130, range 19 to 3567), below the minimum of 100 events suggested by PROBAST. Only seven (44%) of the 16 validations with clear reporting included at least the minimum recommended number of events.
The most common method for handling missing data, employed in 35 (36%) analyses, was to exclude participants from the study if data were missing for specific or any variables. Complete case analysis was used in 26 (27%) analyses. Predictors, instead of participants, with missing data were excluded in nine analyses (9%) from two studies (Seccia 2020; Zhao 2020). Imputation was reported for 18 analyses (19%), but only five of these reported using multiple imputation. Multiple methods for dealing with missing data were often combined, as reported in 25 (26%) analyses. The method of handling missing data was not reported for 25 (26%) analyses. Although reporting on the number of participants with missing data was often unclear, it was clear that, when using routine care or registry data, hundreds, even thousands, of participants were excluded from analysis due to missing or error‐prone data. De Brouwer 2021, for example, describes the exclusion steps that brought the MSBase analysis set from 55,409 participants down to 6682 participants, an 88% drop.
Model development
We identified 34 models developed using traditional statistical methods (45%), 40 models developed using ML methods (53%), and one model that selected predictors using ML but fit the final model using traditional statistical methods. The traditional statistical methods included 16 (46%) logistic regression models, 15 (43%) survival analyses (11 of them Cox models), one Bayesian model averaging model, and three (9%) linear regression models.
Of those using ML, three (8%) were developed using penalised regression (a LASSO penalty was applied to two logistic models and one Cox model), 10 (25%) using SVM, and 16 (40%) using tree‐based methods (two classification trees, nine random forests, and five using boosting). Of the random forest developments, one had a numeric outcome (Kosa 2022), and one a survival outcome (Pisani 2021). One model used partial least squares regression (Kuceyeski 2018). Another eight (20%) used neural networks and an additional two models were developed by combining ML methods.
The first identified development using ML was published in 2009. Gurevich 2009 used a multi‐class SVM to distinguish between three data‐driven categories of time until relapse. Two years later, Bejarano 2011 used a multilayer perceptron (a type of neural network) to predict change in EDSS. Since 2018, ML developments have been published in the literature every year and with increasing frequency (see Figure 2, right). As of the latest search in July 2021, only prediction model developments employing ML had been published in 2021. Please note that the decrease in the number of identified prediction modelling developments in 2021 is at least partially due to the search covering only the first half of the year.
Univariable predictor selection was reported in 17 developments (23%), while this was unreported or unclear in four developments (5%). While 22 developments (29%) took a full model approach, multivariable predictor selection in the remaining developments took several forms. Of these, eight (11%) based selection on coefficient hypothesis testing, 18 (24%) employed stepwise selection, seven (9%) selected from several models with different predictor sets, two (3%) relied on the selection properties of LASSO penalised regression, and another four (5%, Montolio 2021; Pinto 2020 three models) used LASSO for selection but not for prediction. Other multivariable predictor selection methods were used in nine developments (12%), including Bayesian methods, variable importance ranking, minimal depth in tree‐based methods, frequency of selection within cross‐validation, and combinations of methods. For five developments (7%), the use of multivariable selection methods was unclear or not reported.
In one study, uniform shrinkage was applied to each of its three final developed models (de Groot 2009). Some amount of shrinkage was induced in 38 (51%) developments due to modelling methods, including Bayesian methods, penalised estimation, and other ML methods. No shrinkage was applied in 31 (41%) developments, and it was unclear for two (3%) developments.
Of the 41 developments involving tuning parameters, 19 (46%) mentioned specific tuning parameters and the method used to tune them. There were six (15%) models from two studies for which the use of software defaults was reported upon correspondence (Pinto 2020; Tacchella 2018). Details were unclear in 10 (24%) developments in which tuning for parameters unrelated to the ML algorithm was mentioned but algorithm‐specific tuning was not. There was no reporting related to tuning in six model developments (15%).
Model performance and evaluation
Internal validation methods
Of the 73 model performance evaluations using development data, two relate to a single model whose performance was assessed on development data both in the development study and in a later validation study (Skoog 2014; Skoog 2019). There was a single development study in which model performance was evaluated only on an external validation set (Ahuja 2021). There were an additional two development studies in which model performance was not evaluated (Bergamaschi 2001; Weinshenker 1991); these two studies were included because their models were evaluated as prediction models in later studies. Apparent performance was reported in 16 (22%) internal validations, and a single random split of the data was used in nine (12%) evaluations. Cross‐validation and bootstrap procedures, the preferred approaches to internal validation, were conducted 34 (47%) and 10 (14%) times, respectively. Methods were unclear in four (5%) internal validations, in which bootstrap methods were used for some purpose during development, but not clearly for performance evaluation. The number of bootstrap samples ranged from 200 (Manouchehrinia 2019) to 1500 (Spelman 2017). Leave‐one‐out cross‐validation was reported 10 times, while k‐fold cross‐validation was reported 18 times, with the number of folds varying between 2 (Wottschel 2019) and 10 (Bejarano 2011; Law 2019; Montolio 2021; Pinto 2020; Zhao 2020). Additionally, Wottschel 2019 assessed the influence of cross‐validation methods on classification performance estimates by comparing 2‐fold, 5‐fold, 10‐fold, and leave‐one‐out cross‐validation. Cross‐validation based on leaving a percentage of the data out or based on shuffle split was reported in another five evaluations.
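The k‐fold procedure reported by these studies can be sketched as follows (an illustrative outline only; fold counts, models, and performance measures varied across studies): the data are partitioned into k disjoint folds, the model is refit k times on k − 1 folds, and performance is estimated on each held‐out fold.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Partition n sample indices into k disjoint, roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(n, k, fit, evaluate):
    """Refit on k-1 folds and evaluate on the held-out fold, k times.
    `fit` takes a list of training indices and returns a fitted model;
    `evaluate` takes (model, test_indices) and returns a performance value.
    The mean of the returned values estimates out-of-sample performance."""
    folds = k_fold_indices(n, k)
    scores = []
    for i, test_idx in enumerate(folds):
        train_idx = [j for f in folds[:i] + folds[i + 1:] for j in f]
        scores.append(evaluate(fit(train_idx), test_idx))
    return scores
```

Leave‐one‐out cross‐validation is the special case k = n, with each fold containing a single observation.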
Performance measures
There were a total of 93 model performance evaluations, 24 (26%) of which reported on model calibration, either with a plot, a table, a measure, or several of these. There were 16 evaluations with calibration plots, four with O:E tables, two with histograms or bar plots depicting differences between observed and predicted outcomes in some way, and one with a table of observed event frequencies across score levels. Calibration slopes were reported in four evaluations from two studies, and the P value from the Hosmer‐Lemeshow test in five evaluations from four studies. The Gronnesby and Borgan test P value was reported for one model evaluation, and the mean square error was reported once. The O:E ratio was reported twice in one study, in order to provide a recalibration factor based on the development data and on external validation data (Skoog 2019).
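For reference, the O:E ratio compares the observed number of events with the number expected under the model, i.e. the sum of predicted probabilities. The following is a generic sketch, not code from any included study:

```python
def observed_to_expected(outcomes, predicted_probs):
    """O:E ratio: observed event count divided by the expected event count,
    where the expected count is the sum of the model's predicted
    probabilities. A ratio near 1 indicates good calibration-in-the-large;
    its reciprocal can serve as a simple recalibration factor."""
    observed = sum(outcomes)
    expected = sum(predicted_probs)
    return observed / expected

# A model predicting 0.5 for everyone, in a sample where half the
# participants have the event, is calibrated in the large:
ratio = observed_to_expected([1, 0, 1, 0], [0.5, 0.5, 0.5, 0.5])  # 1.0
```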
We had intended to compute O:E ratios from reported information; however, the expected number of events or other expected outcomes were only rarely reported. Both of the evaluations rated at low risk of bias in the analysis domain assessed calibration in some way. Pellegrini 2019 presented bootstrap‐corrected calibration slopes of 1.08 (SE 0.17) and 0.97 (SE 0.15) for one‐ and two‐year composite disease progression outcomes. De Brouwer 2021 reported the use of Platt scaling on their deep learning model for EDSS‐based disease progression at two years. This procedure transforms the output of the classification model into a probability distribution via logistic regression (Platt 1999). Upon correspondence, the authors of De Brouwer 2021 provided a calibration plot with no evidence of major departures from calibration.
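In essence, Platt scaling fits a one‐feature logistic regression that maps a classifier's raw scores to probabilities. A minimal sketch using plain gradient descent (illustrative only; the function names and settings here are assumptions, not those of De Brouwer 2021):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def platt_scale(scores, labels, lr=0.1, n_iter=2000):
    """Fit p(y = 1 | s) = sigmoid(a*s + b) by gradient descent on log loss.
    This one-feature logistic regression on the classifier's raw scores is
    the core of Platt scaling."""
    a, b = 0.0, 0.0
    n = len(scores)
    for _ in range(n_iter):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, labels):
            err = sigmoid(a * s + b) - y   # residual for this observation
            grad_a += err * s / n
            grad_b += err / n
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b

def calibrated_prob(a, b, score):
    """Map a raw classifier score to a calibrated probability."""
    return sigmoid(a * score + b)
```

In practice the scaling parameters are estimated on held‐out data so that the calibration step does not reuse the data that trained the classifier.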
There were 85 evaluations for which discrimination and classification measures were applicable (models with survival or binary outcomes). A c‐statistic was reported in 47 (55%) of them, 31 (36%) with some measure of uncertainty. Reporting was unclear about the use of a c‐statistic for survival data in three evaluations (Runia 2014; Spelman 2017; Ye 2020 gene signature). Reported c‐statistics ranged from a minimum of 0.59 (Pellegrini 2019; Ye 2020) to a maximum of 0.92 (Pisani 2021; Tommasin 2021), with a median (IQR) value of 0.77 (0.71 to 0.82). Both of the evaluations rated at low risk of bias in the analysis domain reported c‐statistics below the median observed across the literature: Pellegrini 2019 reported the minimum c‐statistic of 0.59 and De Brouwer 2021 reported a c‐statistic of 0.66.
Classification measures were reported in 49 (58%) evaluations of survival or classification models. These evaluations reported accuracy or error measures 36 times, sensitivity or specificity 43 times, positive or negative predictive values 21 times, and other measures such as the F1 score eight times. There were nine evaluations reporting the use of 0.5 as the threshold value for estimating classification performance and seven reporting classification measures for more than one threshold value. Another three used some percentile of the data and nine used data‐driven methods to identify an optimal threshold. Classification measures were also applied to models of continuous outcomes in five evaluations (Bejarano 2011 Dev; Bejarano 2011 Val; Gurevich 2009 FTP; Margaritella 2012; Rocca 2017), with the threshold value unclearly reported for two evaluations from one study (Bejarano 2011) and the other three evaluations using some window around the observed value to be predicted as a threshold.
Model presentation
Models were presented in various ways across studies and modelling methods. For the 35 developments using traditional regression methods for fitting, eight (23%) full regression models with intercepts/baseline hazards were presented and another eight (23%) regression models were presented without intercepts/baseline hazards or some other model coefficients. Three (9%) regression models were simplified into sum scores (Gout 2011; Runia 2014; Vasconcelos 2020), two of which were unweighted sums, i.e. predictor counts (Runia 2014; Vasconcelos 2020). The tools presented included eight nomograms (Manouchehrinia 2019; three in Misicka 2020; two in Olesen 2019; Spelman 2017; Ye 2020), three score charts (all in de Groot 2009), one web application (Skoog 2014), and one heat map (Borras 2016). Manouchehrinia 2019 additionally presented their nomogram as a web application and the web application of Skoog 2014 was updated using the shrinkage factor estimated in Skoog 2019. Two developments were described by lists of included predictors. Malpas 2020 presented a chart of relative risks associated with various combinations of predictors from their simplified model. Two (6%) developments based on traditional methods did not present the final model in any way (Oprea 2020; Zakharov 2013).
Of the 40 models fit using ML, only five (12%) reported tools allowing other users to make predictions for new people with MS. Lejeune 2021 presented a web application, Aghdam 2021 presented a decision tree, and Pisani 2021 presented a tool based on the sum of a heat map‐derived value and a formula weighted by predictor random forest minimal depths. The other two studies provided model coefficients from penalised regression without intercepts/baseline hazards (Ahuja 2021; Ye 2020). Other presentations included one bar chart of predictor weights from a linear SVM although a non‐linear SVM was fit (Bendfeldt 2019), and eight ML developments presented the final model as a list of included predictors. Ten ML developments were not followed by final model presentation in any way. Independent of the model presentation described above, there were a total of 19 ML developments that reported some measure of variable importance.
Model interpretation
Of the 57 studies included, 26 (46%) primarily aimed to predict clinical outcomes in individual patients, as indicated by mentioning the intent to create or assess a model or tool in their abstract, introduction, and discussion. In another 21 studies (37%), outcome prediction was an aim of the study; however, the focus appeared to be on other aspects of the study, such as predictors and modelling methods. Outcome prognostication in individuals was not the primary aim in 10 studies (18%), all of which were instead mainly interested in predictor identification or the usefulness of specific predictors. Forty‐three studies (75%) were presented as exploratory research, indicating some need for further development or validation, while 14 studies (25%) were presented with confirmatory conclusions, eight of which were not associated with any external validation.
We assessed the presence of information on study strengths and limitations, generalisability of results, and comparisons with other modelling studies for the 57 included studies. Most studies discussed their strengths and limitations (49 (86%) and 51 (89%) studies, respectively), and just over half (31, 54%) discussed the generalisability of their results; however, only 16 (28%) studies mentioned other models in their discussions. These comparisons with other models focused on the predictors and modelling methods used, rather than comparing model performance with that of other MS prognostic models with similar outcomes. The most comprehensive comparison with other prognostic models, a table of performance measures for models from five other MS prognostic model studies with descriptions of outcome definitions and timing, was presented by Montolio 2021.
Usability and reproducibility
Model usability and reproducibility, as defined in Appendix 3, were assessed for each of the 75 developed models and are summarised in Table 2 (ordered by outcome). Usability was assessed in terms of the skill and equipment specialisation required for predictor collection, the model presentation, the ability of the presented model to estimate absolute risk, and the number of external validations performed for the model. Model reproducibility is summarised by the availability of the model/tool, code, and data.
Table 2. Model usability and reproducibility.
| Model | Outcome | Predictor timing | Equipment | Usability | Absolute risk | Ext. Val. | Reproducibility |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Agosta 2006 | Disability (EDSS) | From study entry to 1 year after study entry | Specialty centre | Unclear | No | 0 | Unclear |
| Bejarano 2011 | Disability (EDSS) | From disease onset (MS) to study entry | Specialty centre | No model | Not risk model | 1 external refit | None |
| De Brouwer 2021 | Disability (EDSS) | From onset to index date (including trajectories during the 3 years prior to index date) | No special equipment | No model | No | 0 | Code |
| de Groot 2009 Dexterity | Disability (9HPT) | At Poser MS diagnosis (within 6 months) | Specialty centre | Tool + instructions | No | 0 | Tool |
| de Groot 2009 Walking | Disability (EDSS) | At Poser MS diagnosis (within 6 months) | Standard hospital | Tool + instructions | No | 0 | Tool |
| Kuceyeski 2018 | Disability (cognitive ‐ SDMT) | From disease onset (undefined ‐ RRMS?) to final follow‐up | Specialty centre | No model | Not risk model | 0 | None |
| Law 2019 Ada | Disability (EDSS) | From disease onset (MS) to study entry | Specialty centre | No model | No | 0 | None |
| Law 2019 DT | Disability (EDSS) | From disease onset (MS) to study entry | Specialty centre | No model | No | 0 | None |
| Law 2019 RF | Disability (EDSS) | From disease onset (MS) to study entry | Specialty centre | No model | No | 0 | None |
| Lejeune 2021 | Disability (EDSS) | From disease onset (undefined ‐ RRMS?) to index relapse | No special equipment | Tool + instructions | Yes | 1 | Tool, DOR |
| Malpas 2020 | Disability (EDSS) | From symptom onset to 1 year after onset | No special equipment | Tool + instructions | No | 1 | Tool |
| Mandrioli 2008 | Disability (EDSS) | From disease onset (first attack) to diagnosis (CDMS) | Specialty centre | Unclear | Yes | 1 | Unclear |
| Margaritella 2012 | Disability (EDSS) | From disease onset (MS) to 1 year prior to outcome | Specialty centre | Unclear | Not risk model | 0 | Unclear |
| Montolio 2021 | Disability (EDSS) | At study entry, year 1 visit and year 2 visit | Specialty centre | No model | No | 0 | None |
| Oprea 2020 Disability | Disability (EDSS) | At study entry | No special equipment | No model | No | 0 | None |
| Pinto 2020 Severity 10 years | Disability (EDSS) | From onset to 5 years post‐prognostication | Not reported | No model | No | 0 | None |
| Pinto 2020 Severity 6 years | Disability (EDSS) | From onset to 2 years post‐prognostication | Not reported | No model | No | 0 | None |
| Roca 2020 | Disability (EDSS) | At FLAIR imaging (anytime) | Specialty centre | No model | Not risk model | 0 | None |
| Rocca 2017 | Disability (EDSS) | From study entry to 15 months after study entry | Specialty centre | Model | Not risk model | 0 | Model |
| Rovaris 2006 | Disability (EDSS) | From study entry (anytime during PPMS) | Specialty centre | Unclear | No | 0 | Unclear |
| Sombekke 2010 | Disability (MSSS) | At disease onset (MS) | Specialty centre | Model | No | 0 | Model |
| Szilasiova 2020 | Disability (EDSS) | At study entry | Standard hospital | Unclear | 0 | Unclear | |
| Tommasin 2021 | Disability (EDSS) | At imaging visit | Specialty centre | No model | No | 0 | None |
| Tousignant 2019 | Disability (EDSS) | At imaging visit | Specialty centre | No model | No | 0 | None |
| Weinshenker 1991 M3 | Disability (DSS) | From disease onset (initial symptom) to assessment (not defined) | No special equipment | Model + instructions | Yes | 1 | Model |
| Weinshenker 1996 Short‐term | Disability (EDSS) | From disease onset (initial symptom) to outcome measurement | No special equipment | Model | Yes | 0 | Model |
| Yperman 2020 | Disability (EDSS) | At clinical visit (unclear: at any time during MS) | Specialty centre | No model | No | 0 | DOR |
| Zhao 2020 LGBM All | Disability (EDSS) | 2‐year observation window | Specialty centre | No model | No | 0 | Code, DOR |
| Zhao 2020 LGBM Common | Disability (EDSS) | 2‐year observation window | Specialty centre | No model | No | 1 (unclear if refit) | Code, DOR |
| Zhao 2020 XGB All | Disability (EDSS) | 2‐year observation window | Specialty centre | No model | No | 0 | Code, DOR |
| Zhao 2020 XGB Common | Disability (EDSS) | 2‐year observation window | Specialty centre | No model | No | 1 (unclear if refit) | Code, DOR |
| Gurevich 2009 FLP | Relapse | During CIS or CDMS | Specialty centre | No model | No | 1 | None |
| Gurevich 2009 FTP | Relapse | During CIS or CDMS | Specialty centre | No model | Not risk model | 0 | None |
| Sormani 2007 | Relapse | From 2 years prior to baseline measurement (during RRMS) | Standard hospital | Model + instructions | Yes | 1 | Model |
| Vukusic 2004 | Relapse | From disease onset (MS) to delivery | No special equipment | Model + instructions | Yes | 0 | Model |
| Ye 2020 gene signature | Relapse | At study entry | Specialty centre | Model + instructions | No | 0 | Model, Data |
| Ye 2020 nomogram | Relapse | At study entry | Specialty centre | Model | Yes | 0 | Tool, Data |
| Aghdam 2021 | Conversion to definite MS | At ON event | Standard hospital | Model | Yes | 0 | Tool |
| Bendfeldt 2019 Linear Placebo | Conversion to definite MS | At CIS onset (within 60 days) | Specialty centre | No model | No | 0 | None |
| Bendfeldt 2019 M7 Placebo | Conversion to definite MS | At CIS onset (within 60 days) | Specialty centre | No model | No | 0 | None |
| Bendfeldt 2019 M9 IFN | Conversion to definite MS | At CIS onset (within 60 days) | Specialty centre | No model | No | 0 | None |
| Borras 2016 | Conversion to definite MS | At disease onset (CIS, up to 126 days after onset) | Specialty centre | Tool + instructions | Yes | 0 | Tool |
| Gout 2011 | Conversion to definite MS | At CIS onset (admission for CIS event) | Standard hospital | Tool + instructions | No | 0 | Tool |
| Martinelli 2017 | Conversion to definite MS | At CIS onset (within 3 months) | Specialty centre | No model | No | 0 | None |
| Olesen 2019 Candidate | Conversion to definite MS | At disease onset (ON, up to 38 days after onset) | Specialty centre | Tool + instructions | Yes | 0 | Tool, DOR |
| Olesen 2019 Routine | Conversion to definite MS | At disease onset (ON, up to 38 days after onset) | Specialty centre | Tool + instructions | Yes | 0 | Tool, DOR |
| Runia 2014 | Conversion to definite MS | At disease onset (CIS) | Standard hospital | Tool + instructions | No | 0 | Tool |
| Spelman 2017 | Conversion to definite MS | At disease onset (within 12 months) | Specialty centre | Tool + instructions | Yes | 0 | Tool |
| Wottschel 2015 1 year | Conversion to definite MS | At CIS onset (within a mean of 6.15 weeks) | Specialty centre | No model | No | 0 | None |
| Wottschel 2015 3 years | Conversion to definite MS | At CIS onset (within a mean of 6.15 weeks) | Specialty centre | No model | No | 0 | None |
| Wottschel 2019 | Conversion to definite MS | At CIS onset (within 14 weeks) | Specialty centre | No model | No | 0 | None |
| Yoo 2019 | Conversion to definite MS | At CIS onset (within 180 days) | Specialty centre | No model | No | 0 | None |
| Zakharov 2013 | Conversion to definite MS | At first MRI after CIS onset | Specialty centre | No model | No | 0 | None |
| Zhang 2019 | Conversion to definite MS | At CIS onset (primary clinical work‐up for CIS) | Specialty centre | No model | No | 0 | None |
| Bergamaschi 2001 BREMS | Conversion to progressive MS | From disease onset (RRMS) to 1 year after disease onset | No special equipment | Unclear | No | 2, simplified: 2* | Unclear |
| Brichetto 2020 | Conversion to progressive MS | At visit of interest | Standard hospital | No model | No | 0 | None |
| Calabrese 2013 | Conversion to progressive MS | At study entry (during RRMS) | Specialty centre | Model + instructions | Yes | 1 | Model |
| Manouchehrinia 2019 | Conversion to progressive MS | From disease onset (unclear: RRMS?) up to first EDSS recorded (several years after onset) | No special equipment | Tool + instructions | Yes | 3 | Tool |
| Misicka 2020 10 years | Conversion to progressive MS | At study interview | Specialty centre | Tool + instructions | Yes | 0 | Tool |
| Misicka 2020 20 years | Conversion to progressive MS | At study interview | Specialty centre | Tool + instructions | Yes | 0 | Tool |
| Misicka 2020 Ever | Conversion to progressive MS | At study interview | Specialty centre | Tool + instructions | Yes | 0 | Tool |
| Pinto 2020 SP | Conversion to progressive MS | From onset to 2 years post‐prognostication | Not reported | No model | No | 0 | None |
| Pisani 2021 | Conversion to progressive MS | From RRMS onset to 2 years post‐onset | Specialty centre | Model | No | 0 | Tool, DOR |
| Seccia 2020 180 days | Conversion to progressive MS | Patient trajectories until index visit (during RRMS) | Standard hospital | No model | No | 0 | Data |
| Seccia 2020 360 days | Conversion to progressive MS | Patient trajectories until index visit (during RRMS) | Standard hospital | No model | No | 0 | Data |
| Seccia 2020 720 days | Conversion to progressive MS | Patient trajectories until index visit (during RRMS) | Standard hospital | No model | No | 0 | Data |
| Skoog 2014 | Conversion to progressive MS | From last relapse to index date, repeatedly | No special equipment | Tool + instructions | Yes | 1 | Tool |
| Tacchella 2018 180 days | Conversion to progressive MS | From disease onset to the index visit of interest | Standard hospital | No model | No | 0 | None |
| Tacchella 2018 360 days | Conversion to progressive MS | From disease onset to the index visit of interest | Standard hospital | No model | No | 0 | None |
| Tacchella 2018 720 days | Conversion to progressive MS | From disease onset to the index visit of interest | Standard hospital | No model | No | 0 | None |
| Vasconcelos 2020 | Conversion to progressive MS | From onset (unclear) to at least 2 years (unclear) | No special equipment | Unclear | No | 1 | Unclear |
| Ahuja 2021 | Composite (relapse) | From 12 months prior to index date | Standard hospital | Model | No | 1 | Model, Code, DOR |
| Kosa 2020 | Composite (EDSS, SNRS, T25FW, NDH‐9HPT) | At lumbar puncture | Specialty centre | No model | Not risk model | 0 | None |
| de Groot 2009 Cognitive | Composite (cognitive tests) | At Poser MS diagnosis (within 6 months) | Specialty centre | Tool + instructions | No | 0 | Tool |
| Pellegrini 2019 | Composite (EDSS, T25FW, 9HPT, PASAT, VFT) | From disease onset (MS) to study entry | Standard hospital | Model | No | 0 | Model |
9HPT: 9‐hole peg test Ada: adaptive boosting BREMSO: Bayesian Risk Estimate for Multiple Sclerosis at Onset CDMS: clinically definite multiple sclerosis CIS: clinically isolated syndrome DOR: data available on request (as reported in the publication) DSS: Disability Status Scale DT: decision tree EDSS: Expanded Disability Status Scale FLP: first level predictor FTP: fine tuning predictor IFN: interferon LGBM: light gradient boosting machine MRI: magnetic resonance imaging MS: multiple sclerosis MSSS: multiple sclerosis severity score NDH‐9HPT: non‐dominant hand 9‐hole peg test ON: optic neuritis PASAT: Paced Auditory Serial Addition Test PPMS: primary progressive multiple sclerosis RF: random forest RRMS: relapsing‐remitting multiple sclerosis SDMT: symbol digit modalities test SNRS: Scripps neurological rating scale SP: secondary progressive T25FW: timed 25‐foot walk VFT: visual function test XGB: extreme gradient boosting
The timing of predictor collection varied across the final models. There were 38 models (51%) using information available at a single time point (20 at disease onset and 18 at an arbitrary point) and seven models (9%) using information from a specific timeframe (between 12 and 24 months) relative to disease onset or study entry. Twenty‐five models (33%) used data available over time and another four models from two studies specifically used predictor data longitudinally (De Brouwer 2021; Seccia 2020). Predictor assessment timing was unclear in the model of Vasconcelos 2020.
The level of skill and equipment specialisation could not be assessed for three developments from one study due to lack of information on selected predictors (Pinto 2020). All 72 models reporting details on included predictors were found to require a specialist in order to measure or assess predictors, 35 of which contained EDSS. For this reason, level of skill is omitted from Table 2. Greater variability in rating was observed for level of equipment specialisation: 11 models (14.7%) required no special equipment, relying only on demographics, disease subtype, symptoms, and treatments. Predictors from 17 models (22.7%) could be measured in a standard hospital, and 44 models (59%) required specialised equipment related to advanced imaging, omics, CSF markers, and evoked potential markers.
We identified 39 (52%) developed models that were not accompanied by model coefficients, tools, or instructions. Eight (11%) models were reported with basic model information, five (7%) with a model and instructions, and 16 (21%) were given as simple tools with instructions or explanation of use. This measure of usability was rated as unclear for seven (9%) models for which model components not considered to be predictors were not reported (for example, coefficients for follow‐up duration adjustment), when it was unclear if model coefficients were missing, or when coding of predictors was especially unclear. For two models, for example, coding of the basic demographic predictor sex was unclear (Margaritella 2012; Szilasiová 2020).
There were seven developed models that did not aim to predict the risk of a clinical outcome, but rather the value of a continuous measure. Of the 34 models for future disease risk that reported a final model in some way, absolute risk can be estimated with the reported information from 18 (53%) of them, but not for 15 (44%). Clarification of predictor coding would enable estimation of absolute risk from one further model (Szilasiová 2020).
Analysis data was made publicly available for five (7%) models from two studies (Gurevich 2009; Seccia 2020) and analysis code was publicly available for six (8%) models from three studies (Ahuja 2021; De Brouwer 2021; Zhao 2020). One further study developing two models (Ye 2020) reused the data provided by Gurevich and colleagues (Gurevich 2009). Six (8%) studies explicitly stated that data were available upon request (Ahuja 2021; Lejeune 2021; Olesen 2019; Pisani 2021; Yperman 2020; Zhao 2020). For 28 (37%) models from 19 studies, no models, code, or data were provided (three traditional regression models and 25 ML models). None of the studies provided a model/tool, code, and data or even just code and data.
Because multiple external validations should be performed before a model is deemed clinically useful, the number of external validations performed for each model is also given in Table 2. Of the identified models, only 12 (16%) were externally validated at least once. Of these, one model was externally validated twice (Bergamaschi 2001) and another three times (Manouchehrinia 2019).
Risk of bias
As depicted in Figure 6 (left), all but one of the 96 analyses (Pellegrini 2019) were found to have high risk of bias. This single study was co‐authored by a clinical prediction modelling methodologist from outside the MS field, who is part of the PROBAST group (Wolff 2019). The introduction of this study listed many of the substandard aspects of prediction model development in MS also identified in this review. It appeared to aim to depict correct prognostic model development and internal validation steps for the MS community.
6.
Risk of bias summary. Left: by domain; right: item‐wise for analysis domain.
The high risk of bias across the literature was driven mainly by the analysis domain, for which only two (2%) analyses, that of De Brouwer 2021 and Pellegrini 2019, were found to be at low risk of bias but 94 (98%) at high risk (see Figure 6 left), and, to a lesser extent, by the participants domain, for which 18 (19%) analyses were found to be at low risk but 78 at high or unclear risk of bias (59 (61%) and 19 (20%), respectively). Domain‐level risk of bias plots per analysis are provided in Appendix 6. Item‐level assessment details for the analysis domain are depicted in Figure 6 (right).
The high risk of bias related to the analysis domain was multi‐faceted but mainly driven by two PROBAST items: 80 (83%) analyses were found to have an insufficient number of participants and 81 (84%) analyses did not use relevant model performance measures, with most ignoring calibration. Besides the exclusion for missing data addressed in the participants domain, 32 (33%) analyses used other suboptimal methods for dealing with missing data. Predictor dichotomisation and univariable predictor selection are still used, as found in 13 (16%) analyses and 17 (23%) developments, respectively. A clear difference between studies using ML as opposed to traditional statistics can be seen in the reporting of final models. Of the 35 developments using traditional regression methods, 23 (66%) were reported in such a way that it was clear that the final model corresponded to the multivariable analysis. However, only three (8%) of the 40 ML developments reported final model details that correspond directly with the multivariable analysis. Most ML developments did not present tools or report enough information for understanding of the final models.
The two most common reasons for a high risk rating in the participants domain were the use of routine care or registry data (36 analyses, 38%) and inappropriate exclusion of participants (35 analyses, 36%). While registries are an important source of data for MS research, their quality and limitations should be reported and addressed. Data quality was rarely discussed, and the only reported method to deal with poor data quality was to exclude participants with missing/erroneous data. There was no mention of whether the excluded participants were otherwise similar to included participants with respect to observed covariates, and it was unclear whether study teams even had access to the excluded data in order to assess possible differences. Additionally, inappropriate inclusion of participants known to already have the outcome at baseline affected at least five developments from three studies (de Groot 2009; Malpas 2020; Szilasiová 2020). This is expected to result in overestimated performance estimates at internal validation (Moons 2019).
The use of problematic data sources also led to issues in the predictor and outcome domains. Combined with insufficient reporting, it was difficult to judge whether predictors and outcomes were assessed uniformly across participants and whether each was blinded to the other. The registry datasets cover long time periods and multiple sites, which makes it unlikely that predictors and outcomes were uniformly measured, especially given the rapid changes in diagnostic criteria and the poor generalisability of imaging predictors across machines (Seccia 2021). An important, independent issue to highlight within the predictors' domain though relates to timing. We identified 11 (11%) analyses using predictors only available after the intended time of model use, which makes the model unusable in practice. The intended time of model use was generally unclear, making it difficult to understand when the model is meant to be used and how far into the future it is meant to predict outcomes for.
Although our review question was broad in nature, we found only 36 (38%) analyses to be of low concern regarding applicability. The most common reason for concern related to participants was the inclusion of participants known to have the outcome of interest at the time the model was applied, jeopardising the categorisation of the model as a prognostic model. For example, one study defining disability as EDSS more than or equal to five at 15 years included an unknown number of participants with EDSS more than or equal to five already at baseline (Szilasiová 2020). We found the most frequent concern regarding predictors to be the inclusion of only a single predictor type (e.g. only imaging or genetic predictors) without consideration of more basic, easier‐to‐collect predictors. Only Kosa 2022 was rated at high concern due to unclear interpretation of the outcome. Kosa and colleagues modelled the outcome MS‐DSS, which is itself the output of another model, making interpretation difficult. The most common reason for high concern to overall applicability was a primary aim other than development or validation of a prognostic model for individual prediction, which was determined for 12 (12%) analyses. For 27 (28%) analyses lacking final model/tool presentation in a way allowing for application to new individuals, we considered the applicability unclear. This concern was especially frequent amongst the ML studies.
Reporting deficiencies
Across the included analyses, 54% of the 20 mandatory TRIPOD items we evaluated were reported. When unclear or partial reporting was included, the amount of reporting increased to 69%. Of the 19 mandatory TRIPOD items applicable to developments, fewer were reported in those using ML methods (49%) compared to those using traditional statistics (60%). An item‐wise summary of reporting is shown in Figure 7 both overall for all analyses types and by algorithm type for model developments.
7.
Summary of reporting deficiencies based on TRIPOD items. Top: Overall; bottom: For developments by modelling method. Val: validation
When we compared the percentage of reporting in the model developments using traditional statistics published before 2016 (the publication year of TRIPOD) and during or after 2016, we did not observe any difference (59% and 61%, respectively). Visual inspection did not indicate any time trends in median percentage reporting overall or in categories based on the algorithm or the analysis type (see Appendix 6).
When described analysis‐wise, the best reporting amongst developments was in all three models from de Groot 2009 with 84% of the 19 items reported, and the worst reporting was in Oprea 2020 with 16% reported. The best reporting amongst validations was at 73% of the 15 items in Lejeune 2021 Ext Val, Manouchehrinia 2019 Ext Val 2, and Manouchehrinia 2019 Ext Val 3, and the worst reporting was at 13% in Gurevich 2009 FLP Ext Val. Item‐wise reporting per analysis is displayed in Appendix 6.
Source of data and participants
At least one out of the five items related to source of data and participants was not reported in 70 (73%) of the analyses. The item with worst reporting under this heading was treatments (item 5c). Of the 96 included analyses, the treatments received by participants, either at baseline or during follow‐up, were somehow reported but not clearly in 25 (26%) analyses and not reported at all in 40 (42%) analyses. Of those that did not report treatments received by participants, eight (20%) were solely in people with CIS (Aghdam 2021; Olesen 2019 Candidate; Olesen 2019 Routine; Runia 2014; Wottschel 2019; Yoo 2019; Zakharov 2013; Zhang 2019). This item was reported less frequently in models developed with ML methods (20%) than with traditional statistics (46%).
The study start and end dates (item 4b) were the next item that most of the analyses failed to report. They were somehow reported but not clearly in three (3%) (Bergamaschi 2001 BREMS Dev; Borras 2016; Roca 2020), and not reported at all in 52 (54%) of the 96 included analyses. Although reported relatively better than most of the other items, the most fundamental information on the study design or source of data (item 4a) was reported in an unclear manner in almost every fourth (21) analysis, and was totally missing from three (3%) analyses (Kuceyeski 2018; Oprea 2020; Yoo 2019).
Predictors and outcome
At least one out of the three items related to predictors and outcome was not reported in 92 (96%) analyses. Of the 96 included analyses, the outcome definition (item 6a) was missing from five (5%) analyses with conversion to progressive MS outcomes (Brichetto 2020; Pinto 2020 SP; Tacchella 2018 180 days; Tacchella 2018 360 days; Tacchella 2018 720 days). Outcomes were not clear in Bejarano 2011, which reported AUC measures for change in EDSS (modelled as continuous), and Oprea 2020, which reported keeping an EDSS score with unclear thresholds and time points. Blinding of the outcome assessment to predictors (item 6b) was reported in only four analyses (Kosa 2022; Olesen 2019 Candidate; Olesen 2019 Routine; Rovaris 2006) and not reported at all in the remaining.
Of the 75 model developments in which the reporting of predictor definitions (item 7a) was assessed, predictor definitions were somehow reported but not clearly in 24 (32%) developments and not reported at all in 12 (16%) developments.
Sample size and missing data
At least one out of the three items related to sample size and missing data were not reported in 82 (85%) analyses. Of the 96 analyses, none reported details of sample size justification (item 8) to reach a level of certainty of the reported effect sizes. The presentation closest to a sample size justification was in Yperman 2020, which used a random forest classifier in nested cross‐validation. They plotted a learning curve of AUC as a function of different sizes of the training set to discuss any plateauing and its sufficiency. The limitations posed by a small sample size were somewhat discussed in 24 (25%) analyses. Model developments with ML methods were more likely (45%) to discuss their limited sample size or the drawbacks posed by it compared to those with traditional methods (14%).
Only 23 (24%) of the analyses reported the amount of missing data handled (item 13b) during study design or analysis. This is despite the fact that we considered a study to have reported the amount of missing data when the only information provided on this topic was the number of excluded participants due to lack of a predictor domain measurement (e.g. missing MR images). Thirty‐six analyses (38%) reported the amount of missing data in an unclear or inconsistent manner. The method of dealing with missing data (item 9) was somehow reported but not clearly in 16 (17%), and not reported at all in 25 (26%) analyses.
Statistical analysis methods
At least one out of the two items related to the statistical analysis was not reported in 18 (24%) developments. The type of model, model‐building procedures (including predictor selection and tuning parameter optimisation, as relevant), and method for internal validation (item 10b) were reported to a limited extent for 21 (28%), and not reported at all for 13 (17%) of the 75 model developments. The model‐building steps, expected to be relatively simpler for traditional methods, were reported more frequently in the model developments utilising traditional statistics (74%) than those utilising ML methods (38%).
Results and discussion
At least one out of the seven items related to results and discussion was not reported in 79 (82%) analyses. Of the 96 analyses, the number of participants and the number of events (items 13a/14a) were reported in an unclear manner in 11 (11%), and not reported at all in seven (7%) (Ahuja 2021 Dev; Ahuja 2021 Ext Val; Bergamaschi 2015 BREMS Ext Val; Gurevich 2009 FLP Ext Val; Oprea 2020; Sormani 2007 Ext Val; Szilasiová 2020). Information on basic baseline participant characteristics (item 13b: age, sex, diagnostic subtype) was missing from 17 (18%) analyses.
A comparison of the distribution of important variables with the development data (item 13c) was missing from 11 (55%) of the 20 validations, excluding Skoog 2019 Val using a subset of participants as the model development study (Skoog 2014 Dev). Also, none of the model developments that used a single random‐split for evaluation provided such a comparison.
The full prediction model including the intercept or baseline survival to allow for calculation of absolute risk (item 15a), was reported more or less clearly only in 16 (21%) of the 75 developments. An explanation on how to make predictions or assign an individual to a risk group based on the developed model (item 15b) was provided for 22 (29%) models. Although neither item 15a nor item 15b were reported, the discussion section of five (7%) model developments had confirmatory language that suggests implementation of the models to support clinical decisions (Malpas 2020 Dev; Pisani 2021; Roca 2020; Tousignant 2019; Ye 2020 gene signature). Models developed with traditional statistical methods were much more likely to present the final full models (40%) or how to calculate predictions from them (60%) than those developed by ML methods (5% and 2%, respectively).
A model performance measure (item 16, assessed by the presence of a discrimination measure) was reported in 33 (34%) analyses with its uncertainty and in 25 (26%) analyses without its uncertainty. Reporting of AUC in Szilasiová 2020 was unclear due to the inconsistency between the receiver operating characteristic figure associated with the AUC and the reported point sensitivity/specificity value. No discrimination or classification measures were reported in 10 (10%) analyses: three did not contain any evaluation of model performance and were included because of their validations (Ahuja 2021 Dev; Bergamaschi 2001 BREMS Dev; Weinshenker 1991 M3 Dev), three reported R2 (Misicka 2020 10 years; Misicka 2020 20 years; Misicka 2020 Ever), and the remaining four survival analyses reported Kaplan‐Meier or incidence plots for predicted risk groups (Gout 2011; Sormani 2007 Dev; Sormani 2007 Ext Val; Vasconcelos 2020 Dev). The discussion section of four (5%) models from two studies had confirmatory language, although no discrimination or classification measures were reported (Bergamaschi 2001 BREMS Dev; Misicka 2020 10 years; Misicka 2020 20 years; Misicka 2020 Ever).
Discussion
Summary of main results
The main objective of this review was to identify and summarise all multivariable prognostic model developments and validations for quantifying the risk of clinical disease progression, worsening, and activity in people with MS. We identified 75 models, 12 of which were externally validated at least once in a total of 15 validations by applying the model of interest as intended by its development to predict outcomes in new participants. Only a single model, Manouchehrinia 2019, was externally validated three times, all within the development study. There were six other author‐reported validations that did not meet our criteria for external validation. No external validation has yet occurred for the remaining 60 models, making them unsuitable for use in practice at this time.
Models with an external validation
Of the 12 models with any external validation, only two models were found to have been externally evaluated more than once. The BREMS score (Bergamaschi 2001) was evaluated with two external datasets by the development team before being simplified to the BREMSO score and being evaluated further using the second external development dataset. The model for conversion to progressive MS developed by Manouchehrinia and colleagues (Manouchehrinia 2019) was evaluated using three external datasets (one registry, two randomised controlled trials) within the development study. No validation studies were found to be performed by researchers external to the development team and almost all were part of the development publication. The lack of independent external validations hinders conclusions about the models’ generalisability. While these studies provide information on model performance when applied by the development authors to new people with MS, it is still unclear whether the reporting is sufficient to enable use by external researchers and clinicians. This is important because model performance will depend on the interpretation of unclearly reported models, predictors, and outcomes.
Models without an external validation
At 80%, models for which no external validation evidence of any kind exists make up the overwhelming majority of the MS prognostic modelling literature. However, this is not surprising considering that only 12 of these 60 models reported their full model or provided a tool and gave instructions on its use. Only one of these models explicitly stated that external validation was not pursued due to the poor discrimination of the developed model in the internal validation (Pellegrini 2019). It is worth noting that none of the identified models for conversion to definite MS were found to be externally validated. Due to its early position in the disease course, valid prognostic models addressing this outcome in people with CIS have the potential to exert the greatest impact on treatment decisions and thereby long‐term outcomes.
Before developing a new prognostic model, it is recommended that one should first review the literature in search of previously developed prognostic models predicting the outcome of interest (van Smeden 2018). This would ideally be followed by external validation of relevant existing models to test their generalisability. This validation process is meant to call attention to possible weaknesses of the model and to allow for an iterative process of improvement via model updating. When multiple models for the same outcome exist, it is important to compare their performance when applied to the population of interest, instead of just comparing the included predictors and modelling method. These recommended steps would channel the efforts of the scientific community towards the common goal of delivering a generalisable and clinically useful prognostic prediction model. Our review points to the fact that the recommended initial steps are omitted in MS prognostic research. Model developments dominate the literature and the model performances of newly developed models are not compared to that of other published prognostic models in a meaningful way. Some included studies, e.g. Seccia 2020, explicitly mentioned the lack of validated models for MS prognosis, and yet continued to develop new models without evaluating the performance of previously developed models for similar outcomes on their independent data.
The lack of external validations also points to the need for effective clinical data‐sharing. In order to make the best use of the resources allocated to medical research, independent researchers should be able to access existing datasets for external validation of published prognostic models (Völler 2017). Ideally, these datasets should be harmonised and provided through an infrastructure allowing individual‐patient‐level meta‐analysis across sources (Snell 2020) to reach a sufficient sample size. While there are several large MS registries and even a network between many of them (Big Multiple Sclerosis Data Network), the access provided to general researchers appears limited. The increasing use of large, long‐term registry data will also necessitate improved data quality measures and reporting across domains, especially participant characteristics. More attention to participant selection and how it affects model applicability will be needed.
Overall completeness, certainty of the evidence and study limitations of externally validated models
Overall completeness of the data
In the MS literature, reporting of prognostic prediction model studies was very poor, echoing the experience in other disease areas (Kreuzberger 2020; Wynants 2020). The situation was at least as dire in models developed with ML as in those developed with traditional methods, partially because the current EQUATOR Network guidelines do not seem directly applicable to these studies (Dhiman 2021). Although a reporting guideline for prediction modelling, TRIPOD (Collins 2015), has been available since 2015, most of the items that were poorly reported in the studies we included were also part of other reporting guidelines published earlier, like STROBE (von Elm 2007) or STARD (Bossuyt 2015). Additionally, most of the analyses (66%) in our review were published in or after 2016, and no temporal pattern could be observed in the proportion of items reported. Across the included analyses, just over half (54%) of the 20 mandatory TRIPOD items we evaluated were clearly reported. The state of the literature makes us doubt whether these reporting guidelines are being required, or at least recommended, by peer reviewers or publishers/editors.
None of the studies justified their sample size. The failure to consider this aspect during study design jeopardises the efforts put into prognostic research (Steyerberg 2019). Only three cohort studies reported that the outcome assessors were blinded to at least a subset of the predictors (e.g. image readings or lab analysis) (Kosa 2022; Olesen 2019 Candidate; Olesen 2019 Routine; Rovaris 2006). Because the purpose of data collection was not prognostic modelling in most of the analyses (64%), i.e. secondary data use, blinding of outcome assessment to the predictors was probably not even considered. Still, its presence (or absence) could have been reported, especially in a disease area like MS, in which the subjectivity and reliability of clinical outcome assessment are an ongoing issue (van Munster 2017).
Details of the full model and how to find the absolute or relative risk of an individual patient were either missing or unclear in most (79% and 71%, respectively) of the model developments. In our opinion, this indicates a failure to deliver on the objective of these studies. Reporting the performance of a newly developed prognostic prediction model serves no purpose unless the model is also reported in such a way as to enable future, preferably independent, external validations and further application to individual patients. Despite the anticipated difficulty of reporting models developed using ML methods, we consider this to be possible by, e.g. transporting model objects or developing web‐based applications/platforms to calculate predictions for individual patients (Boulesteix 2019). Failure to provide the model or a way to use it constitutes research waste without any tangible (potential) benefits. This failure also precludes any discussion suggesting the need for further validation or clinical application.
Clear reporting of the amount of missing data was another area for improvement (only 24% of analyses did so); without it, assessing potential bias due to overrepresentation of complete cases is difficult (Wynants 2017). Many studies also failed to report the disease‐modifying treatments received by the participants. Any treatment received at baseline is important for understanding the population to which the prognostic prediction is applicable. Treatment received during follow‐up poses another challenge to the prediction model because, as a post‐baseline factor, it is likely to change the outcome. The treatments received by study participants were reported clearly in only 32% of the analyses.
Finally, model performance measures were reported only to a limited extent and did not meet our expectations. Although we considered only a discrimination measure, e.g. a c‐statistic or AUC, with its associated uncertainty, to be sufficient for this reporting assessment item, only 34% of the model developments or validations fulfilled this criterion. The value of a model cannot be evaluated without an appropriate performance measure. The uncertainty around this measure is also critical for understanding whether the model actually performs better than random decisions, corresponding to a c‐statistic of 0.5.
Certainty of the evidence
At the time of this review’s submission, no GRADE tool was available for prognostic models. Hence, the certainty of evidence is not rated in this review.
Study limitations of prognostic model development studies
We found all but one development to have a high risk of bias according to PROBAST. The principal drivers of this result came from the analysis domain: the use of routine care or registry data combined with suboptimal methods for dealing with missing data, insufficient sample sizes with respect to the number of predictors of interest, incomplete model performance evaluation due to lack of assessment of calibration and sometimes even discrimination, and failure to account for overfitting and optimism, especially with regard to accounting for all modelling steps.
Besides being of sufficient quality and representative of the population of interest, a dataset should also be sufficiently large to develop robust models that precisely estimate risk overall, as well as across the spectrum of predicted risk, both in the development set and, more importantly, when applied to new people (Riley 2020). We found about three‐quarters of the developments to be at high risk due to insufficient sample size, a point also mentioned in Pellegrini 2019 as a limitation of the MS prognosis research literature. In smaller datasets with large numbers of considered predictor parameters relative to the number of events and total size, the risk of model overfitting increases, and there is no guarantee that further methods, such as shrinkage, can fully overcome the problem. Very few of the traditional statistical developments found here addressed overfitting and optimism by applying shrinkage.
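Riley 2020 sets out several minimum sample size criteria for prediction model development. As a rough illustration only, the simplest criterion, estimating the overall outcome proportion to a desired precision, can be sketched as follows (the function name and example figures are ours, not from the review, and the full Riley framework includes further shrinkage‐based criteria that typically demand larger samples):

```python
import math

def n_for_overall_risk(prevalence, margin, z=1.96):
    """Illustrative sketch of one Riley 2020 criterion: participants needed
    to estimate the overall outcome proportion to within +/- `margin`
    with 95% confidence (binomial approximation)."""
    return math.ceil((z / margin) ** 2 * prevalence * (1 - prevalence))

# e.g. an anticipated 30% event proportion estimated to within +/- 5%
n_for_overall_risk(0.30, 0.05)  # 323 participants
```

Because the other criteria (targeting small expected shrinkage and optimism given the number of candidate predictor parameters) are usually more demanding, a figure like this is a floor, not a recommendation.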
Overfitting and optimism were also neglected in other ways. Although resampling procedures are recommended as best practice in internal validation, around one‐third of the reviewed analyses conducted apparent validation or relied on a single random split of the data. Internal validations employing cross‐validation or bootstrapping methods, however, are not immune to overoptimism: the resampling scheme should encompass the entire modelling process, including, for example, predictor selection. Unfortunately, this is difficult to assess given the reporting pitfalls identified and the increasingly complex analysis structure, especially in developments using ML.
Large sample sizes and proper validation methods need to be used in combination with clinically meaningful performance measures to contribute information on the suitability of a model for practice. Assessment of both discrimination and calibration is widely recommended; however, we found only about half of the model evaluations to have reported discrimination and only one‐quarter to have assessed calibration. While the recommendation to assess calibration has been poorly followed in general, the situation seems worse in publications using ML models. This may be partly due to a focus on classification rather than risk estimation, or a lack of familiarity with model evaluation in the clinical setting. Given the increasing popularity of ML in the field of clinical prediction modelling, it is ever more important to identify and address the reasons for this shortcoming.
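For readers less familiar with calibration assessment, the standard recalibration check regresses observed outcomes on the logit of the predicted risks; an intercept near 0 and a slope near 1 indicate good calibration. A minimal sketch, using only the standard library (function and variable names are illustrative, not from any included study):

```python
import math

def calibration_slope_intercept(p_hat, y, iters=25):
    """Fit a logistic recalibration model of observed outcomes y (0/1)
    on the logit of predicted risks p_hat, via plain Newton-Raphson.
    An intercept near 0 and a slope near 1 indicate good calibration."""
    lp = [math.log(p / (1 - p)) for p in p_hat]  # linear predictor (logit)
    a, b = 0.0, 1.0  # start from perfect calibration
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, yi in zip(lp, y):
            mu = 1 / (1 + math.exp(-(a + b * x)))  # recalibrated risk
            w = mu * (1 - mu)
            g0 += yi - mu          # gradient w.r.t. intercept
            g1 += (yi - mu) * x    # gradient w.r.t. slope
            h00 += w               # Hessian entries
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det  # Newton update
        b += (h00 * g1 - h01 * g0) / det
    return a, b
```

A slope well below 1 signals overfitting: the model's predictions are too extreme relative to the observed outcomes, which is exactly the optimism that shrinkage and proper internal validation aim to detect.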
Additionally, we found the importance of timing in prognostic modelling to be underappreciated in the MS literature. Many studies only implicitly stated the time at which the model is meant to be applied to participants and the prediction horizon of interest, making assessment of all PROBAST domains difficult. This information is needed to understand whether the included participants align with the population of interest, whether predictors are available at the intended time of model use, whether the time interval between predictor collection and outcome assessment is appropriate, and whether complexities in the data related to time have been properly accounted for. As discussed by Pellegrini and colleagues, defining a true baseline time point in MS is difficult (Pellegrini 2019); however, even clear clinical landmarks, such as before or after treatment initiation, have been ignored especially when using registry data. Several studies have used predictors assessed years after the implicit baseline timing of the model, which has complicated interpretation and assessment. Furthermore, handling of differing observation periods across participants by ‘adjusting’ for this confounding has been unclearly reported and possibly inappropriate for prediction tasks. These issues suggest possible confusion about the use of prognostic modelling and the types of conclusions that can be drawn from these analyses.
Usability
While all models were found to require an MS specialist for predictor assessment, equipment specialisation varied, with some models relying on basic clinical predictors but many relying on advanced imaging, omics, or electrophysiological data. While these models may currently be limited to MS research centres, improving access may expand the applicable settings of use over time. This can be seen most readily in the rapid development of diagnostic capabilities in recent years: high‐resolution 3 Tesla MRI scanners are now far more widely available than they were 10 years ago. New measures, such as brain atrophy, are being added to standardised MS MRI protocols. Major advances have been observed in other areas as well. Laboratory chemistry can determine antibodies for differential diagnostic considerations that were widely unknown 10 years ago (e.g. aquaporin‐4 and myelin oligodendrocyte glycoprotein immunoglobulin G antibodies). New laboratory analyses are also on the horizon: for example, neurofilament light chain can now be detected in blood, whereas several years ago this was possible only in CSF.
Besides the lack of external validation, the other greatest threat to usability is the lack of clear reporting of final models or tools and instructions on their use; half of the identified models provided neither of these, while only approximately 20% reported both. The reporting of the other models contained inconsistencies, unclear predictor coding, or missing model components, making direct use on new participants difficult. Partial reporting also hampers the ability of future researchers to predict absolute risks using these models. These deficits in usability were not compensated for by measures of reproducibility, as only a handful of studies provided analysis data or code and none shared both. Sharing of both data and code could improve model transferability, especially for complex algorithms without simple representations. However, their use may require further specialised skills, and other forms of model presentation should be preferred when translation into clinical practice is the goal.
Potential bias in the review process
In order to reduce the bias at the search stage, we searched three major databases. We also searched the conference proceedings of the main organisations in the MS disease area and tried to access more information on the eligible ones by Internet search and author contact. The measures we took to prevent bias in the study selection and data extraction/risk of bias processes were 1) pre‐protocol pilots, 2) training of all contributors on these steps with the relevant methodological publications, protocol, and internal guidance, 3) performing these steps independently in duplicate, 4) resolving any disagreements with at least one other co‐author in group discussions, and 5) contacting the study authors for any missing or unclear data critical for risk of bias assessment or planned analysis. Despite these measures, due to the novelty of this review type, some methodological decisions needed to be made or elaborated on either at the protocol stage or during the review, which may have introduced bias.
Database search: Our database search strategy was constructed to be sensitive. Our decision not to search trial registries is not expected to introduce bias: prognostic research studies, like diagnostic ones, are unlikely to be pre‐registered (Korevaar 2020; Peat 2014; Sekula 2016) despite calls to do so (Altman 2014). The restriction of the search to publications after 1996 might have introduced bias. However, the fact that only two studies before 2001 (Weinshenker 1996 and the related model development study Weinshenker 1991) were found eligible for this review indicates that very few relevant publications, if any, are expected to have been missed because of this restriction.
Reference search: A post‐protocol change that may have introduced bias is the step of backward reference searching. For forward reference searching, we utilised the functionality of Web of Science, which is one of the available and commonly used platforms for this task (Briscoe 2020). Because Web of Science also offers an option for backward reference searching, we decided to access the titles/abstracts from the same platform instead of handsearching. This methodological change allowed us to screen the totality of titles/abstracts of the references as opposed to only titles, but it was less sensitive than handsearching because some references are not linked in Web of Science records. Still, we expect this pitfall to introduce little bias because most of the backward/forward references were linked to the records and only 5% of the included reports came from reference searching as opposed to database searches and other sources.
Study selection: The language used to report prognostic models varies across time and with the main speciality of the authors, e.g. clinicians versus methodologists. In order to be more representative of the literature, we inferred a study's objectives not only from the stated aims in the abstract or the main text but also from the focus of the Results and Discussion sections. This perspective led to the inclusion of studies with primary objectives other than prognostication in MS patients. We were also inclusive of various clinical outcome definitions (including timing), data types, and statistical methods. However, we consider this aspect not a bias but rather a strength of our review, which, due to the rarity of independent external validations of the developed models, was destined to turn into a systematic and comprehensive description of the state of the literature in this disease area. Expanding the outcomes of interest to fatigue, falls, or depression would have resulted in only a few, if any, additional reports, but it would have made the results of our review, which aims to be relevant to clinical practice and patients alike, less interpretable.
Model selection: In a model development study, fitting many models may be used, amongst others, as a means of selecting predictors, selecting the most predictable outcome, selecting the best algorithm type, or optimising tuning parameters. When the authors of a study indicated their preferred model in any way, we included only those models; otherwise, we refrained from making a selection ourselves. Although studies that reported multiple models and failed to select a final favourite model for presentation might even be considered not to have prognostic model development as an aim, we decided not to exclude them. The number of models from a single study reached up to four (Zhao 2020), but less than one‐quarter of the studies contributed more than one model. This may have introduced bias into the descriptive quantitative measures we reported across the analyses, e.g. median percentage of females or median sample size, due to the overrepresentation of some studies or samples. Because our intention and capability with the available dataset is not to summarise or analyse the models but just to describe them, we consider it appropriate to treat these as separate analyses for a general overview of the literature.
Risk of bias: Although a detailed explanation and elaboration publication for PROBAST exists (Moons 2019), we had to interpret some items to fit the needs of our review. These interpretations were aimed at adapting the responses to the specifics of MS or the variety of statistical methods not addressed by the narrow scope of the current PROBAST, which focusses on binary outcomes modelled with traditional regression methods. These decisions are summarised in the Methods and in detail in Appendix 4. Irrespective of our interpretations, almost all analyses rated as having high risk of bias would still have been rated the same at the analysis domain, and hence overall, due to their failure to assess model performance appropriately by reporting both discrimination and calibration while accounting for overoptimism. Thus, our interpretations are expected to introduce no substantial change in the overall risk of bias assessment, and any bias would be limited to the item/domain level.
Analysis: Because the number of independent external validations per model did not allow us to perform a meta‐analysis or meta‐regression, we had no analysis into which bias could be introduced. The only model with enough external validations for a meta‐analysis of its performance (Manouchehrinia 2019, three external validations) was not meta‐analysed due to the lack of independence of these validations (Kreuzberger 2020), as well as inherent design limitations causing high risk of bias in both its development and validations. Any derived quantitative measures in this review relate to the basic description of the studies. If reported for subgroups, the pooled mean and pooled SD of participant characteristics were calculated. Also, if missing, the variance of c‐statistics was calculated and used to build confidence intervals. We report these in the data tables or Characteristics of included studies.
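Reconstructing a missing c‐statistic variance from summary data can be done with the widely used Hanley‐McNeil (1982) approximation; whether this matches the exact formula applied in the review is not stated here, so the sketch below is purely illustrative (function name and example numbers are ours):

```python
import math

def c_statistic_ci(auc, n_events, n_nonevents, z=1.96):
    """Approximate 95% confidence interval for a reported c-statistic (AUC)
    using the Hanley-McNeil (1982) variance formula, which needs only the
    AUC and the numbers of participants with and without the outcome."""
    q1 = auc / (2 - auc)
    q2 = 2 * auc ** 2 / (1 + auc)
    var = (auc * (1 - auc)
           + (n_events - 1) * (q1 - auc ** 2)
           + (n_nonevents - 1) * (q2 - auc ** 2)) / (n_events * n_nonevents)
    se = math.sqrt(var)
    return auc - z * se, auc + z * se

# e.g. a reported c-statistic of 0.70 with 50 events among 250 participants
lo, hi = c_statistic_ci(0.70, 50, 200)  # roughly (0.61, 0.79): excludes 0.5
```

An interval whose lower bound stays above 0.5 is the minimal evidence that the model discriminates better than random decisions, which is why the surrounding text treats the uncertainty of the c‐statistic as essential reporting.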
In this review, publication bias was not assessed due to the lack of methodological guidance on this topic.
Applicability of findings to clinical practice and policy
We identified 75 models, 63 of which were not externally validated, meaning their performance measures were reported only in the same population used for model development. Although 11 of these models used language in their discussion sections suggesting their applicability to and implementation in clinical care, they cannot be recommended for application without demonstration of generalisability through independent external validations of their performance. The same is true for the remaining 12 models: none were externally validated by independent teams, and only three were externally validated in separate studies: Weinshenker 1991 M3 in Weinshenker 1996, Bergamaschi 2001 (BREMS) in Bergamaschi 2007 and Bergamaschi 2015, and Skoog 2014 in Skoog 2019. Also, of those with any external validation, 10 have only a single one. The development and validations of the model with the maximum number of external validations (Manouchehrinia 2019, three in the same study) were rated as having high risk of bias in two to four of the four domains addressed by PROBAST. Hence, none of the identified models are yet applicable in clinical practice.
Moreover, the heterogeneity of definitions and populations in the findings of this review highlights the challenges of developing or validating a prognostic prediction model in people with MS, given the difficulty of defining the clinical need in terms of the relevant population, outcome, and available predictors. The literature in this disease area lacks uniformity in the definitions of patient‐relevant clinical outcomes and the time points at which to measure them. Also, the fast‐changing landscape of diagnostic subtypes and their criteria, e.g. the McDonald criteria published and then revised three times during the last 20 years, not only makes the extrapolation of a model's applicability to a future patient population difficult, but also renders diagnostic conversion outcomes increasingly irrelevant to clinical practice over time. The lack of an objective, agreed‐upon, standardised definition of secondary progression is another factor hindering any research aiming to support clinical decision‐making that targets this outcome. The increasing number of treatment options for all diagnostic subtypes of MS during the last 25 years, and hence the increasing proportion of people treated with them, also raises questions about the applicability of prognostic models developed using data from the pre‐treatment or first‐line treatment era to people with MS today or in the future. This is evident in many domains, including relapse rates and the detection of paraclinical disease activity, as well as in the possibility of using advanced markers such as lesion volume and atrophy measurements in MRI or laboratory‐based biomarkers for prognosis assessment. With all of these changes, prognostic modelling in this field is truly like "chasing a moving target" (Chen 2017), and difficult even when applying the highest methodological standards (Pellegrini 2019).
Any conclusions regarding the applicability of prognostic models in this disease area require rigorous testing of the developed models with many and up‐to‐date external validation studies; however, this is currently lacking.
Implications of the rise of ML for research
The rising popularity of ML algorithms has also spread to MS. Since 2018 an increasing number of ML prognostic model developments have been published, and every single development identified in the first half of 2021 employed ML techniques. The Radiology Editorial Board suggests that artificial intelligence and ML will impact any medical application using imaging (Bluemke 2020). As MRI has been an important tool for depicting pathological features in MS since the 1980s (Ge 2006), this trend is not surprising. Although ML offers great potential for uncovering complex relationships in our ever‐growing data using fewer assumptions, this potential cannot be harnessed without greater attention to the needs of clinical practice and to good practice in prediction model development.
Because the use of ML for clinical prediction is still relatively new in MS, it is unsurprising that several publications are presented as pilot or proof‐of‐concept studies. As stated previously, many ML studies identified here also provide no model or tool for external use. There is, however, a looming threat of research waste if this trend continues. Across several specialities, the discrepancy between the number of developments and the number of tools used in practice has been noted. Proof‐of‐concept studies in clinical applications are important for highlighting methodological accomplishments, but this type of research should not be conflated with or replace actual attempts to create prediction models for clinical practice (Mateen 2020). These differing aims may partly explain the high number of studies with no selected final models and the low number of validation studies; the models identified were meant to depict methodological and technological advances rather than to provide individualised estimates of outcome risks.
Another possible reason for the lack of presented tools for prediction in new individuals may relate to the difference in cultures between ML and clinical research (Mateen 2020), and to the notion that clinical prediction modelling guidelines are not relevant to this body of ML research (Dhiman 2021). Mateen and colleagues argue that, in order for clinical practice to experience the greatest benefit from this work, greater collaboration between healthcare experts and the ML community is necessary. Our review suggests that these collaborations already exist, but that the guidelines put forth by clinical prediction modelling experts are still ignored. We argue that all researchers interested in clinical prediction need to not only work together, but also to be responsible for conducting research according to the current best practices. This entails, at a minimum, adherence to the reporting guidelines set out in TRIPOD and, once available, TRIPOD‐AI. The brief guide on assessing radiological research using artificial intelligence published by the Radiology Editorial Board may also prove valuable to the MS research community (Bluemke 2020), although this document is relevant to a wider range of radiological studies involving ML.
Comparisons with other reviews
We are aware of several related prognosis reviews in MS, including Brown 2020, Havas 2020, and Seccia 2021. Although it addresses a different clinical field, it is also worth comparing our review with Kreuzberger 2020, the first published Cochrane Review of prognostic models to date.
The review by Brown 2020 differed from ours in that it focused specifically on prognostic models intended to be used at diagnosis of RRMS. While its population was more specific, its definition of prediction models was broader, including all models using multiple predictors in combination to determine the probability of an outcome. This led to the inclusion of several models that were not developed with the intent of predicting individual outcomes, but rather as explanatory models of disease aetiology, and which were excluded from our review. This highlights the difficulty of distinguishing between studies aiming to develop prognostic models and those with other purposes, a problem encountered in our review as well as in Kreuzberger 2020. This point was also echoed in the review by Havas 2020, which reported that almost half of the over 6000 studies screened were not prediction modelling studies, but rather other study types that used the words prediction and association interchangeably.
Unlike our review and that of Brown 2020, however, Havas 2020 included models predicting treatment response, such as the Magnetic Resonance Imaging in MS (MAGNIMS) score (Sormani 2016), and scoring systems defined by expert knowledge, such as the Rio score (Río 2009), rather than statistical algorithms. Models predicting treatment response are very important to MS clinical practice; however, they were outside the scope of our review. Treatment response prediction, and causal prediction more generally, is an evolving field and its model development and performance assessment methods are an area of active research. Further methodological foundations are still needed in order to inform such a review task. While Havas and colleagues called on MS researchers to establish a consensus on the definition, development, and validation of prognostic models, we would like to emphasise that this consensus already exists within the clinical prediction modelling literature and only needs to be incorporated into the MS speciality.
Seccia 2021 reviewed recent ML models considering clinical data in their feature sets, making the scope of their review much narrower than ours. All studies identified in Seccia 2021 were also included in our review, and we identified 19 additional ML models. At least seven of these were published too late for consideration by Seccia and colleagues, and several others were excluded from that review for a strong focus on imaging or omics data over clinical data. The review authors highlighted the importance of sufficient data size but also data quality, relevant to ML prognostic model studies and traditional regression studies alike. They also mentioned the problems inherent in subjective disease measures and the non‐generalisability of some predictor types, such as imaging data specific to a single device, which we also identified to be problematic in the MS field. They further discussed issues specific to ML, including interpretable ML and the combination of tabular and non‐tabular data. They made a point of stating that no identified study had developed a prognostic model with performance suggestive of clinical use. Given that none of these studies were truly externally validated and that risk of bias was rated high in all of them due, at least in part, to small sample sizes and lack of calibration assessment, we would add that, even if the reported performance estimates were substantially higher, these models would still not be ready for clinical practice. Additionally, the models were not reported in a way that allows for external validation, as no tool or model fit was provided.
Unlike in Kreuzberger 2020, our results focus on all studies identified, not just those with external validation. Due to the rarity of multiple external validations for a single prognostic model in MS, we wanted to comprehensively describe and summarise the state of the field, not just the models most ready for translation into clinical practice. In light of this aim, we did not downgrade a model's applicability based on the age of the data or diagnostic criteria. Kreuzberger and colleagues rated applicability as unclear if eligibility criteria or recruitment period were not given, stating that they could not be sure whether the included individuals matched the review question and a current application of the model. This is certainly also an issue in MS, which has faced continual updating of diagnostic criteria since publication of the first McDonald criteria in 2001 (McDonald 2001). In fact, several studies we included defined conversion to clinically definite MS outcomes using the Poser 1983 criteria and used components of the later published criteria as predictors. The aim of these studies may have been more in line with validation of newly published criteria rather than development of a new prediction tool, again highlighting the need for better reporting. However, the models with multiple external validations in this review did not suffer from such problems.
Authors' conclusions
Implications for practice
The goal of prognostic modelling research must be to bring into routine clinical care the use of multivariable prognostic models for predicting future clinical outcomes in people with multiple sclerosis (MS). This is of particular interest because, although highly effective therapeutic options are available, they carry non‐negligible risks to the patient. Given the high variability in disease worsening or progression, it is imperative to tailor therapy to individual need.
The currently available evidence for predicting MS prognosis in clinical routine is not sufficient, judged against the quality standards that must be applied to the development and especially the validation of such prediction scores. Ideally, prediction models are developed using large, high‐quality datasets with participants representative of the population to which the model will later be applied. However, our results do not exclude the possibility of transferring the existing prediction models into clinical routine after successful external validations and demonstration of benefit in clinical practice by randomised controlled trials investigating their impact. Both the validation of currently available prediction models and the consistent application of quality standards in future studies are needed.
Implications for research
Our systematic review identified an abundance of models developed for prediction of disability, relapse, conversion to definite MS, and conversion to progressive MS, as well as models for composite outcomes based on these. As previously found within and beyond the scope of MS, these studies were generally not conducted or reported according to current standards and guidelines. We point the MS research community to the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) and the Prediction Model Risk of Bias Assessment Tool (PROBAST) for such guidance, and to the upcoming artificial intelligence updates to these tools for machine‐learning‐specific guidance.
Clinical prediction modelling studies should be conducted using longitudinal datasets collected for the purpose of prognostic research in order to minimise bias in terms of participants, predictors, and outcome. An understanding of when the predictor or outcome measurements can take place, and how these affect the interpretation of a prognostic model, is a point of note for researchers in this disease area. Appropriate methods should be used, in consultation with experts in clinical prediction modelling. Models should be reported in a form usable by other researchers, because developed models should be externally validated, preferably by independent researchers. Data sharing practices can support external validation efforts.
History
Protocol first published: Issue 5, 2020
Notes
Data and code availability
The dataset summarised in this review is available as tables in the Appendices and in Characteristics of included studies. The R code used for the statistical description is available upon request from the authors.
Role of sources of support
The funding sources did not have any influence on the planning, conduct, analysis, or reporting of this review.
Acknowledgements
We would like to thank Cochrane and several members for their support in the development of this review and those who conducted the editorial process for this review:
Graziella Filippini (Co‐ordinating Editor) and Ben Ridley (Managing Editor) of Cochrane Multiple Sclerosis and Rare Diseases of the CNS.
We would like to thank the following people who conducted the editorial process for this article:
Sign‐off Editor (final editorial decision): Robert Boyle, Imperial College London, Cochrane’s Editorial Board.
Managing Editor (selected peer reviewers, provided editorial guidance to authors, edited the article): Colleen Ovelman and Sam Hinsley, Central Editorial Service.
Editorial Assistant (conducted editorial policy checks, collated peer reviewer comments and supported editorial team): Lisa Wydrzynski, Central Editorial Service.
Copy Editor (copy editing and production): Jenny Bellorini, Cochrane Central Production Service; pre‐edit: Tori Capehart, Copy Editor, J&J Editorial; Margaret Silvers, Copy Editor, J&J Editorial; Sarah Hammond, Senior Copy Editor, J&J Editorial.
Peer reviewers (provided comments and recommended an editorial decision): Arman Eshaghi, University College London (clinical/content review), Bruce V Taylor, University of Tasmania (clinical/content review), Steve Simpson‐Yap, The University of Melbourne (clinical/content review), Iván Pérez‐Neri (consumer review), Nina Kreuzberger, Cochrane Haematology, Department I of Internal Medicine, Center for Integrated Oncology Aachen Bonn Cologne Duesseldorf, Faculty of Medicine and University Hospital Cologne, University of Cologne, Cologne, Germany (methods review), Robin Featherstone, Cochrane Central Editorial Service (search review).
We would also like to thank Johanna AA Damen (Co‐ordinator) of the Cochrane Prognosis Methods Group for responding to our questions. We are thankful to Liliya Eugenevna Ziganshina for the support during the eligibility assessment of a publication in Russian (Zakharov 2013). For the full translation of the same article and support during its data extraction, we are grateful to Larissa German.
We thank all the primary study authors who replied to our requests for further information. We are very grateful for the conversations with and guidance from Beate Sick regarding deep learning methods. We are also thankful to Anja Friedrichs for the diligent proofreading of critical sections of the review text.
Appendices
Appendix 1. Electronic search strategies
Database: Ovid MEDLINE(R) and Epub Ahead of Print, In‐Process, In‐Data‐Review & Other Non‐Indexed Citations and Daily 1946 to 1 July 2021
Date search conducted: 1 July 2021
Strategy:
| # | Concept | Searches | Results |
| 1 | 1 Multiple sclerosis | (exp Multiple Sclerosis/ OR ((multipl* OR disseminated OR insular) ADJ1 sclerosis).ti,ab.) NOT (animals NOT humans).sh. NOT (child NOT adult).sh. | 80,108 |
| 2 | 2a Prognostic/ prediction | (exp Prognosis/ AND (exp disease progression/ OR exp Remission, Spontaneous/ OR exp Recurrence/)) OR (predict OR prognos*).ti. OR ((predict* OR prognos*) ADJ3 (recurrence OR progression OR relaps* OR remission OR remitting OR 'multiple sclerosis' OR ms)).ti,ab. OR ((predict* OR prognos*) ADJ3 treat* ADJ3 response).ti,ab. OR ((predict* OR prognos*) ADJ3 disease ADJ3 activity).ti,ab. | 357,355 |
| 3 | 2b General models | ((model* OR decision* OR identif*) ADJ3 (history OR variable* OR multicomponent* OR multivariable* OR multivariate* OR covariate* OR criteria OR criterion OR scor* OR characteristic* OR finding* OR factor* OR rule*)).ti,ab. OR (decision* ADJ6 model*).ti,ab. | 420,767 |
| 4 | 2c Statistical terms | ((logistic OR statistic*) ADJ3 model*).ti,ab. OR (decision*.ti,ab. AND exp models, statistical/) OR (Stratification OR Discrimination OR Discriminate OR "c‐statistic" OR "c statistic" OR "Area under the curve" OR AUC OR Calibration OR Indices OR index OR Algorithm OR Multivariable* OR multivariate* OR covariate* OR valid*).ti. OR ((Stratification OR Discrimination OR Discriminate OR "c‐statistic" OR "c statistic" OR "Area under the curve" OR AUC OR Calibration OR Indices OR index OR Algorithm OR Multivariable* OR multivariate* OR covariate* OR valid*).ab. AND (prognos* OR predict*).ti,ab.) | 976,860 |
| 5 | 3 Outcomes | ((disease OR disability OR invalid* OR function* OR outcome OR impairment OR composite OR activity OR severity OR cognitive OR edss OR treatement OR ms OR 'multiple sclerosis' OR brems) ADJ6 (scor* OR scal* OR status OR assess* OR index OR classification)).ti,ab. OR (clinical ADJ3 (assess* OR activity)).ti,ab. OR ((disease OR disabilit* OR risk OR calculat*) ADJ3 (course OR progression)).ti,ab. OR (relaps* ADJ3 (rate OR frequen* OR time OR prognos* OR predict*)).ti,ab. OR (clinical* ADJ3 decision*).ti,ab. OR ((ms OR cdms OR 'multiple sclerosis') ADJ3 (develop* OR course OR progress* OR relaps* OR clinical*)).ti,ab. | 1,094,627 |
| 6 | | (1 and 2) or (1 and (3 or 4) and 5) | 5004 |
| 7 | | limit 6 to yr="1996 ‐Current" | 4764 |
Database: EMBASE via embase.com 1974 to 2 July 2021
Date search conducted: 2 July 2021
Strategy:
| # | Concept | Searches | Results |
| 1 | 1 Multiple sclerosis | ('multiple sclerosis'/exp/mj OR (ms:ti AND 'multiple sclerosis'/exp) OR (((multipl* OR disseminated OR insular) NEAR/1 sclerosis):ti,ab)) NOT ([animals]/lim NOT [humans]/lim) NOT ('nonhuman'/exp NOT 'human'/exp) NOT 'animal model'/exp NOT ('child'/exp NOT 'adult'/exp) NOT [conference abstract]/lim | 82,989 |
| 2 | 2a Prognostic/ prediction | ('predictive value'/exp AND 'model'/exp) OR ('prognosis'/exp AND ('disease exacerbation'/exp OR 'recurrent disease'/exp OR 'recurrence risk'/exp OR 'relapse'/exp OR 'remission'/exp)) OR predict*:ti OR prognos*:ti OR ((predict* OR prognos*) NEAR/3 (recurr* OR progress* OR relaps* OR remission OR remitting OR 'multiple sclerosis' OR ms)):ti,ab OR ((predict* OR prognos*) NEAR/3 treat* NEAR/3 response):ti,ab OR ((predict* OR prognos*) NEAR/3 disease NEAR/3 activity):ti,ab | 848,229 |
| 3 | 2b General models | ((model* OR decision* OR identif*) NEAR/3 (history OR variable* OR multicomponent* OR multivariable* OR multivariate* OR covariate* OR criteria OR criterion OR scor* OR characteristic* OR finding* OR factor* OR rule*)):ti,ab OR (decision* NEAR/6 model*):ti,ab | 596,768 |
| 4 | 2c Statistical terms | ((logistic OR statistic*) NEAR/3 model*):ti,ab OR (decision*:ti,ab AND 'statistical model'/exp) OR (Stratification OR Discrimination OR Discriminate OR "c‐statistic" OR "c statistic" OR "Area under the curve" OR AUC OR Calibration OR Indices OR index OR Algorithm OR Multivariable* OR multivariate* OR covariate* OR valid*):ti OR ((Stratification OR Discrimination OR Discriminate OR "c‐statistic" OR "c statistic" OR "Area under the curve" OR AUC OR Calibration OR Indices OR index OR Algorithm OR Multivariable* OR multivariate* OR covariate* OR valid*):ab AND (prognos* OR predict*):ti,ab) | 1,397,586 |
| 5 | 3 Outcomes | ((disease OR disability OR invalid* OR function* OR outcome OR impairment OR composite OR activity OR severity OR cognitive OR edss OR treatment OR ms OR 'multiple sclerosis' OR brems) NEAR/6 (scor* OR scal* OR status OR assess* OR index OR classification)):ti,ab OR (clinical NEAR/3 (assess* OR activity)):ti,ab OR ((disease OR disabilit* OR risk OR calculat*) NEAR/3 (course OR progression)):ti,ab OR (relaps* NEAR/3 (rate OR frequen* OR time OR prognos* OR predict*)):ti,ab OR (clinical* NEAR/3 decision*):ti,ab OR ((ms OR cdms OR 'multiple sclerosis') NEAR/3 (develop* OR course OR progress* OR relaps* OR clinical*)):ti,ab | 1,680,033 |
| 6 | | #1 AND #2 OR (#1 AND (#3 OR #4) AND #5) | 5041 |
| 7 | | (#1 AND #2 OR (#1 AND (#3 OR #4) AND #5)) AND [1996‐2021]/py | 4976 |
| 8 | Multiple sclerosis conference abstracts | ('multiple sclerosis'/exp/mj OR (ms:ti AND 'multiple sclerosis'/exp) OR (((multipl* OR disseminated OR insular) NEAR/1 sclerosis):ti,ab)) NOT ([animals]/lim NOT [humans]/lim) NOT ('nonhuman'/exp NOT 'human'/exp) NOT 'animal model'/exp NOT ('child'/exp NOT 'adult'/exp) AND [conference abstract]/lim | 33,377 |
| 9 | | #8 AND #2 OR (#8 AND (#3 OR #4) AND #5) | 4077 |
| 10 | Specific conference names | 'european committee for treatment and research in multiple sclerosis':nc OR ectrims:nc OR 'americas committee for treatment and research in multiple sclerosis':nc OR actrims:nc OR 'american academy of neurology':nc OR aan:nc OR 'european academy of neurology':nc OR ean:nc | 49,010 |
| 11 | | #9 AND #10 | 2730 |
| 12 | | #9 AND #10 AND [1996‐2021]/py | 2730 |
Databases: Cochrane Database of Systematic Reviews (CDSR; 2021, Issue 6) and Cochrane Central Register of Controlled Trials (CENTRAL; 2021, Issue 6) via www.cochranelibrary.com
Date search conducted: 2 July 2021
Strategy:
| # | Concept | Searches | Results |
| 1 | 1 Multiple sclerosis | ((multipl* OR disseminated OR insular) NEAR/1 sclerosis):ti,ab,kw | 10,654 |
| 2 | 2a Prognostic/ prediction | predict*:ti OR prognos*:ti OR ((predict* OR prognos*) NEAR/3 (recurr* OR progress* OR relaps* OR remission OR remitting OR 'multiple sclerosis' OR ms)):ti,ab,kw OR ((predict* OR prognos*) NEAR/3 treat* NEAR/3 response):ti,ab,kw OR ((predict* OR prognos*) NEAR/3 disease NEAR/3 activity):ti,ab,kw | 32,700 |
| 3 | 2b General models | ((model* OR decision* OR identif*) NEAR/3 (history OR variable* OR multicomponent* OR multivariable* OR multivariate* OR covariate* OR criteria OR criterion OR scor* OR characteristic* OR finding* OR factor* OR rule*)):ti,ab,kw OR (decision* NEAR/6 model*):ti,ab,kw | 25,595 |
| 4 | 2c Statistical terms | ((logistic OR statistic*) NEAR/3 model*):ti,ab,kw OR (Stratification OR Discrimination OR Discriminate OR "c‐statistic" OR "c statistic" OR "Area under the curve" OR AUC OR Calibration OR Indices OR index OR Algorithm OR Multivariable* OR multivariate* OR covariate* OR valid*):ti OR ((Stratification OR Discrimination OR Discriminate OR "c‐statistic" OR "c statistic" OR "Area under the curve" OR AUC OR Calibration OR Indices OR index OR Algorithm OR Multivariable* OR multivariate* OR covariate* OR valid*):ab AND (prognos* OR predict*):ti,ab,kw) | 70,797 |
| 5 | 3 Outcomes | ((disease OR disability OR invalid* OR function* OR outcome OR impairment OR composite OR activity OR severity OR cognitive OR edss OR treatment OR ms OR 'multiple sclerosis' OR brems) NEAR/6 (scor* OR scal* OR status OR assess* OR index OR classification)):ti,ab,kw OR (clinical NEAR/3 (assess* OR activity)):ti,ab,kw OR ((disease OR disabilit* OR risk OR calculat*) NEAR/3 (course OR progression)):ti,ab,kw OR (relaps* NEAR/3 (rate OR frequen* OR time OR prognos* OR predict*)):ti,ab,kw OR (clinical* NEAR/3 decision*):ti,ab,kw OR ((ms OR cdms OR 'multiple sclerosis') NEAR/3 (develop* OR course OR progress* OR relaps* OR clinical*)):ti,ab,kw | 340,882 |
| 6 | | (#1 AND #2) OR (#1 AND (#3 OR #4) AND #5) | 583 |
| 7 | | (#1 AND #2) OR (#1 AND (#3 OR #4) AND #5) with Cochrane Library publication date from Jan 1996 to Jul 2021 | 583 |
Appendix 2. Data extraction form
Adapted from CHARMS checklist of Moons 2014.
| Domain | Key items |
| Study information | |
| Source of data | |
| Participants | |
| Outcomes to be predicted | |
| Candidate predictors | |
| Sample size | |
| Missing data | |
| Model development | |
| Model performance | |
| Model evaluation | |
| Results | |
| Interpretation and discussion | |
| Usability and reproducibility of final model | |
AIC: Akaike information criterion; BIC: Bayesian information criterion; N: sample size; NPV: negative predictive value; PPV: positive predictive value; TP: true positive; TN: true negative
Appendix 3. Definitions used for data extraction
In order to ensure a uniform data extraction from included studies with various reporting styles, we had some working definitions. These are listed below:
Data sources
-
Cohort study: Many studies reported collecting data from a cohort of patients, although other details in their report implicitly or explicitly suggested sources of data other than a cohort study. This suggests that the word 'cohort' is used to refer to a group of patients on whom some longitudinal data are available, rather than a longitudinal study with pre‐defined data collection times and items. After attempting to resolve any ambiguity with the study authors, for practical purposes we proceeded as follows.
If words indicating other types of sources (e.g. a well‐known registry like MSBase, or health records) were also used in relation to the data source without explicit definition of a cohort study, we considered the data source to be the other type.
If no other words related to the data source were used while referring to the data, but no specific cohort study was referred to, we assumed the data source to be a cohort study, even though there were clues (e.g. irregular follow‐up times) against it.
In both of the cases above, the reporting of the data source in Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) was marked as unclear.
Primary data use: In line with the suggestion by Wynants 2017, we refrained from using 'retrospective' or 'prospective' for describing the data source. The data source types for which prognostic prediction modelling could be considered as primary data use were case‐control or prospective cohort studies. When the data collection of an included study had vague objectives, like natural history (e.g. Weinshenker 1991) or research on certain predictor domains (e.g. Agosta 2006), we assumed the data collection purpose to be primary, unless it was explicitly reported to be a retrospective data collection (e.g. Wottschel 2015).
Participants
Participants description: When age, disease duration, or sex was reported for subsets of the included patients but not overall (e.g. by outcome or diagnosis type), we calculated and reported weighted averages and pooled standard deviations according to Cohen 1988.
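As a concrete illustration of this step, the weighted mean and pooled standard deviation can be computed as below; a minimal Python sketch (the helper name `pooled_summary` is ours for illustration, not from the review), pooling within‐subgroup variability as in Cohen 1988:

```python
import math

def pooled_summary(groups):
    """Combine subgroup summaries into an overall weighted mean and
    pooled standard deviation.

    `groups` is a list of (n, mean, sd) tuples, one per subgroup
    (e.g. participants split by outcome or diagnosis type).
    The pooled SD weights each subgroup variance by its degrees of
    freedom (n - 1), i.e. it pools within-subgroup variability only.
    """
    n_total = sum(n for n, _, _ in groups)
    weighted_mean = sum(n * m for n, m, _ in groups) / n_total
    pooled_var = (
        sum((n - 1) * sd ** 2 for n, _, sd in groups)
        / (n_total - len(groups))
    )
    return weighted_mean, math.sqrt(pooled_var)

# Two subgroups of 10 patients each, reported separately in a study
mean, sd = pooled_summary([(10, 30.0, 5.0), (10, 40.0, 5.0)])  # → (35.0, 5.0)
```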
Treatments received: We collected data on disease‐modifying therapies and ignored symptomatic treatments for relapses. If the reported eligibility criteria specified inclusion of only treatment‐naive patients or required a wash‐out period, irrespective of its length, then treatment received at recruitment was considered to be none. No assumptions were made based on the diagnostic subtype of the included population or on inclusion time relative to disease onset. For instance, because the literature includes proponents of treating people with clinically isolated syndrome (CIS) (e.g. Wiendl 2021), we preferred not to assume they were treatment‐free unless this was explicitly reported.
Outcomes
Blinded assessment: Analyses utilising randomised trial participants as the data source usually referred to source articles reporting trials of drug interventions with blinding. However, this blinding concerned only the intervention under investigation, not the predictors considered in the prognostic prediction modelling study making secondary use of the data. Hence, outcome assessment was only considered blinded if it was explicitly reported that outcome assessors were blinded to the baseline status of the study population.
Candidate predictors
Considered predictors were all predictors used in univariable or multivariable analyses of any model with the outcome of interest; included predictors were all predictors presented as part of the final model.
In line with Moons 2019, predictors were counted in terms of degrees of freedom.
We assumed dummy‐coding of categorical predictors (instead of e.g. one‐hot encoding) for all modelling methods unless the number of features or another type of coding was explicitly reported. Hence, when counting the degrees of freedom of the predictors considered or included in the models, we might have underestimated the number.
Data on the number of considered interactions were deemed irrelevant for the following modelling methods, which we assumed to account for interactions intrinsically: tree‐based methods (e.g. boosting, random forests), neural networks, and support vector machines with nonlinear kernels.
Sample size
For continuous outcomes, we considered the number of events to be the number of observations and calculated the events per variable (EPV) accordingly. For models considering both tabular and non‐tabular (imaging) predictors, the EPV was computed using only the number of tabular predictors and number of events. For models considering only non‐tabular predictors, EPV could not be computed. When models used longitudinal data, predictor trajectories were counted as single predictors.
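Under these conventions, EPV reduces to a simple ratio; a minimal sketch (function names are illustrative, not from the review), with dummy‐coded categorical predictors contributing levels − 1 degrees of freedom as described above:

```python
def predictor_dof(levels=None):
    """Degrees of freedom for one predictor: 1 for a continuous
    predictor (or a longitudinal trajectory counted as a single
    predictor), levels - 1 for a dummy-coded categorical predictor."""
    return 1 if levels is None else levels - 1

def events_per_variable(n_events, predictors):
    """EPV = events / total degrees of freedom of tabular predictors.

    `predictors` lists None for continuous entries and the number of
    levels for categorical ones; non-tabular (imaging) predictors are
    excluded, mirroring the convention described above.
    """
    total_dof = sum(predictor_dof(levels) for levels in predictors)
    return n_events / total_dof

# 120 events, three continuous predictors and one 4-level categorical:
# total df = 1 + 1 + 1 + 3 = 6, so EPV = 20.0
epv = events_per_variable(120, [None, None, None, 4])
```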
Missing data
We considered exclusion of participants with missing data from the study to be a method for handling missing data distinct from complete case analysis, because these exclusions were listed in the study exclusion criteria, implying that differences between excluded and included participants could not be explored.
Model development
Shrinkage: We extracted whether a specific shrinkage method was used (e.g. uniform shrinkage) or whether the modelling method induced some form of shrinkage. Penalised regression, support vector machines, random forests, boosting, and Bayesian methods were considered to induce shrinkage. Neural networks were considered to induce shrinkage only if dropout, early stopping, or other regularisation methods were mentioned.
Model usability and reproducibility
Skill required for predictor measurement was categorised into three levels: predictors a patient could assess alone, predictors for which a primary care clinician would be qualified to measure, and predictors requiring a specialist for measurement or interpretation.
Equipment specialisation required for predictor measurement was also categorised into three levels, based on equipment standards in Western Europe: predictors that required no special equipment or that could be measured with equipment found in a typical primary care clinic, equipment found in a standard hospital (for example, magnetic resonance imaging (MRI)), and equipment only found in a speciality centre (for example, optical coherence tomography or multimodal electrophysiological equipment).
We assessed whether a model can realistically be expected to be used in practice based on whether the model was reported, whether it was provided in a form enabling easy application to future people with multiple sclerosis (MS), and whether instructions for use were given. The categories are: lack of both model and instructions, reporting of only a model for prognostication, reporting of a model and instructions on use, and reporting of a tool for easy use and instructions.
A model’s reproducibility was described by the components given: ‘None’ if no model, tool, data, or code was provided, ‘Model’ if a model was reported, ‘Tool’ if a tool was given (for example, a nomogram or web application), ‘Code’ if analysis code was provided, and ‘Data’ if data were provided (‘DOR’ if the authors explicitly stated that data are available on request).
If coefficients other than intercepts or baseline hazards were not reported or if it was unclear whether all necessary coefficients were reported, the usability and reproducibility measures were rated as unclear.
Appendix 4. Decisions related to risk of bias and applicability
Decisions related to risk of bias
Participants domain
Prediction Model Risk of Bias Assessment Tool (PROBAST) includes items (4.3, 4.4) on the handling of missing data in the analysis domain. However, studies often explicitly used availability of data as an eligibility criterion and excluded participants with missing data. Similarly to Kreuzberger 2020, we decided to address the exclusion of participants with missing data in the participants domain (PROBAST item 1.2) if a study mentioned it as part of the inclusion/exclusion criteria and we addressed it in the analysis domain otherwise. If selection criteria were based on complete examinations and further predictor‐level missing data was addressed in the analysis, ratings in both domains were affected.
We considered registry data sources as being at high risk of bias (PROBAST item 1.1) unless the study authors reported a specific cohort study within the registry. This was in line with the PROBAST tool, which considers a data source to be appropriate when defined methods are consistently applied for participant inclusion/exclusion, predictor assessment, and outcome determination. This is not expected to be true of registries receiving data from many clinics over long periods of time. There may also be issues related to data quality and availability. For instance, Kalincik 2017 describes implementation of quality assessments for MSBase, a popular international multiple sclerosis (MS) registry. However, it is not only unclear whether these assessments are used to improve data quality in or sampling from the database or any of the prognostic studies based on this database, but they also do not address all possible limitations inherent in observational data.
Predictors and outcome domains
Objectively defining the diagnostic conversion from relapsing‐remitting MS (RRMS) to secondary progressive MS (SPMS) is difficult (Ferrazzano 2020). Currently, it is based on retrospective evaluation of gradual worsening in clinical and radiological assessments, independent of relapses (Lublin 2014). While some studies operationalised the definition of conversion to SPMS, e.g. using a priori defined changes in Expanded Disability Status Scale (EDSS), other studies left conversion unclearly defined. We considered the definition of conversion to SPMS using only clinical judgement to be subjective and therefore at high risk of bias, especially when used in studies relying on retrospective data collected across many sites and over long periods of time.
Based on the rationale that validated disability scales and scores (e.g. EDSS), relapses, or functional systems of symptoms are the most commonly used and accepted clinical parameters in MS practice and research, measurements or definitions based on these were generally considered to be objective and not greatly affected by interrater variability or blinding. Hence, these were considered to be at low risk of bias unless there was an indication to the contrary. The EDSS, for example, is a valid measure of MS severity and progression. Although this measure has documented drawbacks, such as greater interrater variability for lower scores, it is robust for measurements over long time periods and is internationally accepted as a primary endpoint in clinical trials (Meyer‐Moock 2014).
Analysis domain
Nonparametric techniques make fewer assumptions and therefore require more data. Machine learning (ML) modelling methods thus require at least as much data as traditional modelling methods, possibly over 200 events per predictor (van der Ploeg 2014). Clear guidance on this topic is lacking, so we used the current recommendation for PROBAST item 4.1, which requires at least 20 events per predictor. For learning methods using non‐tabular input without prior feature extraction, e.g. deep learning models taking raw images as input, events per variable (EPV) could not be defined, and we rated this item as ‘no information’ (NI), unless the sample size was clearly insufficient, as evidenced by the number of inputs and events.
Some studies dealt with heterogeneity in participant observation times by adjusting for follow‐up duration or numbers of visits during follow‐up, without specifying exactly how this was done. In these situations, we considered the study to be at high risk of bias regarding the methods for accounting for the complexity in the data (PROBAST item 4.6). However, when follow‐up duration was considered as a predictor, this was rather considered to be a predictor measured after time of intended prognostication and was addressed in PROBAST item 2.3.
Although an established outcome measure in MS, the EDSS is not without criticism. For example, the EDSS exhibits greater variability for lower scores than for higher scores, has unequal interval distances, and its rate of change depends on baseline values (Meyer‐Moock 2014). Outcome definitions addressing its weaknesses are recognised; however, not all studies within the review used these preferable outcomes. When the ordinal EDSS was predicted as a continuous outcome in a parametric linear regression, we also assessed baseline EDSS range and any use of interactions. If the range was large and interactions were not tested, we considered the study to be at high risk of bias due to insufficiently accounting for the complexity in the data (PROBAST item 4.6).
Calibration is just as important as discrimination in assessing prognostic models in medicine (Steyerberg 2019). For ML algorithms that output class assignments rather than probabilities, calibration measures may seem less appropriate than classification measures. However, many ML methods are known to produce poor predicted probabilities, making calibration assessment even more important (Niculescu‐Mizil 2005; Zadrozny 2001). ML models can be calibrated using standard software, just as traditional regression models can, and this should be expected in the biomedical setting. Hence, we did not change the interpretation of PROBAST item 4.7 for different modelling methods and judged studies lacking a calibration assessment to be at high risk of bias.
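As a concrete illustration (not the review's code), overall calibration can be summarised from predicted probabilities with simple, dependency‐free checks such as calibration‐in‐the‐large and a grouped observed‐versus‐expected comparison:

```python
def calibration_in_the_large(y_true, p_pred):
    """Observed event rate minus mean predicted risk.
    Values near zero indicate good overall calibration."""
    n = len(y_true)
    return sum(y_true) / n - sum(p_pred) / n

def grouped_calibration(y_true, p_pred, n_groups=10):
    """Mean predicted risk vs observed event rate per risk group
    (groups formed by sorting observations on predicted risk)."""
    order = sorted(range(len(p_pred)), key=lambda i: p_pred[i])
    size = max(1, len(order) // n_groups)
    groups = [order[i:i + size] for i in range(0, len(order), size)]
    return [
        (sum(p_pred[i] for i in g) / len(g),   # mean predicted risk
         sum(y_true[i] for i in g) / len(g))   # observed event rate
        for g in groups
    ]
```

For a well‐calibrated model the per‐group (predicted, observed) pairs lie close to the diagonal; for example, `grouped_calibration([0, 0, 1, 1], [0.1, 0.2, 0.8, 0.9], n_groups=2)` yields approximately `[(0.15, 0.0), (0.85, 1.0)]`.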
Assessment of whether overfitting and performance optimism were accounted for, especially in ML studies, required information on data pre‐processing and tuning parameter selection, which can both lead to data leakage. Data leakage is the use of information in model training which is not expected to be available at the time of prognostication, leading to overestimation of model performance (Kaufman 2011). Preprocessing steps such as predictor standardisation are done to improve model fit and were therefore treated as a model tuning step. It is best practice to tune, select, and evaluate the model performance on different data, as in, e.g. a nested cross‐validation structure (Hastie 2009; Steyerberg 2019). We rated a model development as at high risk of bias (PROBAST item 4.8) if there was evidence of data leakage. This is related to the PROBAST guidance on all modelling steps being accounted for appropriately during internal validation.
Many ML studies that reported aiming to develop a clinical prediction model stopped short of clearly selecting a final combination of tuning parameters, predictors, and algorithm and then fitting the selected combination to the full dataset. These studies instead focused on presenting the process of model development. In these cases, it was impossible to determine whether the final presented model corresponded to the results from the multivariable analysis, as the final presented model did not seem to exist. Accordingly, we considered there to be no information with which to respond to PROBAST item 4.9.
Decisions related to applicability
We rated a study as having high concern regarding applicability if:
Participants domain: Participants with the outcome at baseline were included. This review is interested in prognostic models rather than diagnostic ones.
Outcomes domain: Outcomes did not have a clear clinical interpretation. This review included studies with clinical outcomes, which are relevant to people with MS and measure their symptoms, functioning and health status. For example, we considered the well‐known composite outcome no evidence of disease activity to be a simple interpretable combination of relapse, disease progression, and magnetic resonance imaging (MRI) activity. On the other hand, outcomes based on complex weighting of many clinical and paraclinical measures were considered to be difficult to interpret as it is not clear what a specific value of such an outcome means for people with MS.
Predictors domain: Only one type of predictor was considered. To be useful, clinical prediction models should use simple and cost‐effective predictors and add more complex predictors when they offer information above and beyond that offered by simple, available predictors such as demographics and disease characteristics (Steyerberg 2019). Studies using only MRI images, for example, are rated as at high concern for applicability. This review is interested in multivariable prediction models. While such studies may technically be multivariable, they ignore the prognostic value of other, possibly easier to collect, predictors.
Overall: The main objective according to the study report was not development or validation of a model for predicting future clinical outcomes in individuals with MS. The distinction between multivariable models used for prognostication and those used for other purposes can be unclear even when considering the full text, making exclusion of all models for other purposes difficult. When prognostication is not the main aim, the methods may not be optimal for this purpose.
Additionally, we rated study applicability as unclear if:
Overall: The study did not include sufficient details on a final model to allow for validation by unrelated researchers. Model coefficients, nomograms, scores, score charts, and web‐based tools and calculators were considered sufficient, while a list of important predictors was considered insufficient. Studies not reporting a final model are likely to be interested in the importance of methods or predictors, not prognostication of outcomes in individuals. Some studies mentioned the word pipeline, but we did not consider a pipeline to be a complete model/tool directly usable by clinicians and people with MS.
Participants: No eligibility criteria other than a diagnosis of MS were reported for the study population. Having no eligibility criteria other than MS diagnosis seemed unreasonably broad and suggested an underreporting issue.
Appendix 5. Data tables
2. Study characteristics.
| Analysis | Outcome | Study type | Data source | Recruitment from |
| Agosta 2006 | Disability | Development | Cohort (primary use) | Italy (single site) |
| Bejarano 2011 Dev | Disability | Development + validation | Cohort (primary use) | Spain (single site) |
| Bejarano 2011 Val | Disability | Development + validation (location): model refit | Cohort (primary use) | Italy (single site) |
| Bergamaschi 2015 BREMSO MSSS Val | Disability | Validation (location, time): predictors dropped and different outcome | Registry (secondary use) | Italy, Canada, Australia, Spain, Netherlands, Argentina, Iran, Kuwait, Turkey, Denmark, Czech Republic, Portugal, France, Belgium, UK, Germany, Cuba, Israel, Hungary, USA, India, Mexico, Malta, Macedonia, Romania, Brazil |
| De Brouwer 2021 | Disability | Development | Registry (secondary use) | ND (multisite) |
| de Groot 2009 Dexterity | Disability | Development | Cohort (primary use) | Netherlands (multisite) |
| de Groot 2009 Walking | Disability | Development | Cohort (primary use) | Netherlands (multisite) |
| Kuceyeski 2018 | Disability | Development | Mixed: cohort, registry, routine care (secondary use) | ND (ND site) |
| Law 2019 Ada | Disability | Development | Randomised trial participants (secondary use) | Canada, United Kingdom, Netherlands, Sweden, Denmark, Finland, Germany, Estonia, Latvia, Spain |
| Law 2019 DT | Disability | Development | Randomised trial participants (secondary use) | Canada, United Kingdom, Netherlands, Sweden, Denmark, Finland, Germany, Estonia, Latvia, Spain |
| Law 2019 RF | Disability | Development | Randomised trial participants (secondary use) | Canada, United Kingdom, Netherlands, Sweden, Denmark, Finland, Germany, Estonia, Latvia, Spain |
| Lejeune 2021 Dev | Disability | Development + external validation | Randomised trial participants (secondary use) | France (multisite) |
| Lejeune 2021 Ext Val | Disability | Development + external validation (location, spectrum) | Routine care (secondary use) | France (single site) |
| Malpas 2020 Dev | Disability | Development + external validation | Registry (secondary use) | ND (multisite) |
| Malpas 2020 Ext Val | Disability | Development + external validation (location) | Registry (secondary use) | Sweden (multisite) |
| Mandrioli 2008 Dev | Disability | Development + external validation | Cohort (secondary use) | Italy (single site) |
| Mandrioli 2008 Ext Val | Disability | Development + external validation (time) | Cohort (secondary use) | Italy (single site) |
| Margaritella 2012 | Disability | Development | Routine care (secondary use) | Italy (single site) |
| Montolio 2021 | Disability | Development | Routine care (secondary use) | Spain (single site) |
| Oprea 2020 | Disability | Development | Routine care (secondary use) | Romania (single site) |
| Pinto 2020 severity 10 years | Disability | Development | Routine care (secondary use) | Portugal (single site) |
| Pinto 2020 severity 6 years | Disability | Development | Routine care (secondary use) | Portugal (single site) |
| Roca 2020 | Disability | Development | Registry (secondary use) | France (multisite) |
| Rocca 2017 | Disability | Development | Cohort (primary use) | Italy (multisite) |
| Rovaris 2006 | Disability | Development | Cohort (primary use) | Italy (multisite) |
| Sombekke 2010 | Disability | Development | Unclear (secondary use) | Netherlands (single site) |
| Szilasiova 2020 | Disability | Development | Cohort (secondary use) | Slovak Republic (single site) |
| Tommasin 2021 | Disability | Development | Unclear (secondary use) | Italy (multisite) |
| Tousignant 2019 | Disability | Development | Randomised trial participants (secondary use) | ND (multisite) |
| Weinshenker 1991 M3 Dev | Disability | Development | Cohort (primary use) | Canada (single site) |
| Weinshenker 1996 M3 Ext Val | Disability | External validation (location) | Routine care (secondary use) | Canada (single site) |
| Weinshenker 1996 short‐term | Disability | Development | Routine care (secondary use) | Canada (single site) |
| Yperman 2020 | Disability | Development | Routine care (secondary use) | Belgium (single site) |
| Zhao 2020 LGBM All | Disability | Development + validation | Cohort (primary use) | USA (single site) |
| Zhao 2020 LGBM Common | Disability | Development + validation | Cohort (primary use) | USA (single site) |
| Zhao 2020 LGBM Common Val | Disability | Development + validation (location): unclear if model refit | Cohort (primary use) | USA (single site) |
| Zhao 2020 XGB All | Disability | Development + validation | Cohort (primary use) | USA (single site) |
| Zhao 2020 XGB Common | Disability | Development + validation | Cohort (primary use) | USA (single site) |
| Zhao 2020 XGB Common Val | Disability | Development + validation (location): unclear if model refit | Cohort (primary use) | USA (single site) |
| Gurevich 2009 FLP Dev | Relapse | Development + external validation | Unclear (unclear use) | Israel (single site) |
| Gurevich 2009 FLP Ext Val | Relapse | Development + external validation | Unclear (unclear use) | Israel (single site) |
| Gurevich 2009 FTP | Relapse | Development | Unclear (unclear use) | Israel (single site) |
| Sormani 2007 Dev | Relapse | Development + external validation | Randomised trial participants (secondary use) | Argentina, Australia, Austria, Belgium, Canada, Denmark, France, Germany, Hungary, Israel, Italy, Netherlands, New Zealand, Spain, Sweden, Switzerland, UK, USA |
| Sormani 2007 Ext Val | Relapse | Development + external validation (spectrum) | Randomised trial participants (secondary use) | Europe (undefined), Canada |
| Vukusic 2004 | Relapse | Development | Cohort (primary use) | Unclear (total PRIMS cohort): France, Austria, Belgium, Netherlands, Italy, Denmark, Spain, Germany, United Kingdom, Portugal, Switzerland, Ireland |
| Ye 2020 gene signature | Relapse | Development | Unclear (secondary use) | Israel (single site) |
| Ye 2020 nomogram | Relapse | Development | Unclear (secondary use) | Israel (single site) |
| Aghdam 2021 | Conversion to definite MS | Development | Cohort (secondary use) | Iran (single site) |
| Bendfeldt 2019 Linear Placebo | Conversion to definite MS | Development | Randomised trial participants (secondary use) | Austria, Belgium, Czech Republic, Denmark, France, Finland, Germany, Hungary, Italy, Netherlands, Norway, Poland, Portugal, Slovenia, Spain, Sweden, Switzerland, United Kingdom, Israel, Canada |
| Bendfeldt 2019 M7 Placebo | Conversion to definite MS | Development | Randomised trial participants (secondary use) | Austria, Belgium, Czech Republic, Denmark, France, Finland, Germany, Hungary, Italy, Netherlands, Norway, Poland, Portugal, Slovenia, Spain, Sweden, Switzerland, United Kingdom, Israel, Canada |
| Bendfeldt 2019 M9 IFN | Conversion to definite MS | Development | Randomised trial participants (secondary use) | Austria, Belgium, Czech Republic, Denmark, France, Finland, Germany, Hungary, Italy, Netherlands, Norway, Poland, Portugal, Slovenia, Spain, Sweden, Switzerland, United Kingdom, Israel, Canada |
| Borras 2016 | Conversion to definite MS | Development | Cohort (unclear use) | Spain (single site) |
| Gout 2011 | Conversion to definite MS | Development | Registry (secondary use) | France (single site) |
| Martinelli 2017 | Conversion to definite MS | Development | Routine care (unclear use) | Italy (single site) |
| Olesen 2019 candidate | Conversion to definite MS | Development | Cohort (primary use) | Denmark (multisite) |
| Olesen 2019 routine | Conversion to definite MS | Development | Cohort (primary use) | Denmark (multisite) |
| Runia 2014 | Conversion to definite MS | Development | Cohort (primary use) | Netherlands (single site) |
| Spelman 2017 | Conversion to definite MS | Development | Cohort (primary use) | ND (multisite) |
| Wottschel 2015 1 year | Conversion to definite MS | Development | Cohort (secondary use) | UK (single site) |
| Wottschel 2015 3 year | Conversion to definite MS | Development | Cohort (secondary use) | UK (single site) |
| Wottschel 2019 | Conversion to definite MS | Development | Cohort (secondary use) | Spain, Denmark, Austria, UK, Italy |
| Yoo 2019 | Conversion to definite MS | Development | Randomised trial participants (secondary use) | Canada, United States |
| Zakharov 2013 | Conversion to definite MS | Development | Unclear (unclear use) | Russia (single site) |
| Zhang 2019 | Conversion to definite MS | Development | Cohort (primary use) | Germany (single site) |
| Bergamaschi 2001 BREMS Dev | Conversion to progressive MS | Development | Mixed: registry, routine care (secondary use) | Italy (single site) |
| Bergamaschi 2007 BREMS Ext Val | Conversion to progressive MS | External validation (location) | Cohort (secondary use) | Italy (multisite) |
| Bergamaschi 2015 BREMS Ext Val | Conversion to progressive MS | External validation (location, time) | Registry (secondary use) | Italy, Canada, Australia, Spain, Netherlands, Argentina, Iran, Kuwait, Turkey, Denmark, Czech Republic, Portugal, France, Belgium, UK, Germany, Cuba, Israel, Hungary, USA, India, Mexico, Malta, Macedonia, Romania, Brazil |
| Bergamaschi 2015 BREMSO SP Val | Conversion to progressive MS | Validation (location, time): predictors dropped | Registry (secondary use) | Italy, Canada, Australia, Spain, Netherlands, Argentina, Iran, Kuwait, Turkey, Denmark, Czech Republic, Portugal, France, Belgium, UK, Germany, Cuba, Israel, Hungary, USA, India, Mexico, Malta, Macedonia, Romania, Brazil |
| Brichetto 2020 | Conversion to progressive MS | Development | Cohort (primary use) | Italy (multisite) |
| Calabrese 2013 Dev | Conversion to progressive MS | Development + external validation | Cohort (primary use) | Italy (single site) |
| Calabrese 2013 Ext Val | Conversion to progressive MS | Development + external validation (time) | Cohort (primary use) | Italy (single site) |
| Manouchehrinia 2019 Dev | Conversion to progressive MS | Development + external validation | Registry (secondary use) | Sweden (multisite) |
| Manouchehrinia 2019 Ext Val 1 | Conversion to progressive MS | Development + external validation (location, time, spectrum) | Cohort (secondary use) | Canada (multisite) |
| Manouchehrinia 2019 Ext Val 2 | Conversion to progressive MS | Development + external validation (location, time, spectrum) | Randomised trial participants (secondary use) | Canada, Denmark, France, Germany, Italy, Poland, Portugal, Spain, Switzerland, United Kingdom |
| Manouchehrinia 2019 Ext Val 3 | Conversion to progressive MS | Development + external validation (location, time, spectrum) | Randomised trial participants (secondary use) | ND (multisite) |
| Misicka 2020 10 years | Conversion to progressive MS | Development | Registry (secondary use) | USA (multisite) |
| Misicka 2020 20 years | Conversion to progressive MS | Development | Registry (secondary use) | USA (multisite) |
| Misicka 2020 ever | Conversion to progressive MS | Development | Registry (secondary use) | USA (multisite) |
| Pinto 2020 SP | Conversion to progressive MS | Development | Routine care (secondary use) | Portugal (single site) |
| Pisani 2021 | Conversion to progressive MS | Development | Cohort (secondary use) | Italy (single site) |
| Seccia 2020 180 days | Conversion to progressive MS | Development | Routine care (secondary use) | Italy (single site) |
| Seccia 2020 360 days | Conversion to progressive MS | Development | Routine care (secondary use) | Italy (single site) |
| Seccia 2020 720 days | Conversion to progressive MS | Development | Routine care (secondary use) | Italy (single site) |
| Skoog 2014 Dev | Conversion to progressive MS | Development | Cohort (primary use) | Sweden (single site) |
| Skoog 2019 Ext Val | Conversion to progressive MS | External validation (location, time) | Registry (secondary use) | Sweden (single site) |
| Skoog 2019 Val | Conversion to progressive MS | Validation | Cohort (primary use) | Sweden (single site) |
| Tacchella 2018 180 days | Conversion to progressive MS | Development | Routine care (secondary use) | Italy (single site) |
| Tacchella 2018 360 days | Conversion to progressive MS | Development | Routine care (secondary use) | Italy (single site) |
| Tacchella 2018 720 days | Conversion to progressive MS | Development | Routine care (secondary use) | Italy (single site) |
| Vasconcelos 2020 Dev | Conversion to progressive MS | Development + external validation | Unclear (unclear use) | Brazil (single site) |
| Vasconcelos 2020 Ext Val | Conversion to progressive MS | Development + external validation (time) | Unclear (unclear use) | Brazil (single site) |
| Ahuja 2021 Dev | Composite | Development + external validation | Mixed: routine care (electronic health records), cohort (secondary use) | United States (single site) |
| Ahuja 2021 Ext Val | Composite | Development + external validation (spectrum) | Routine care: electronic health records (secondary use) | United States (multisite) |
| de Groot 2009 cognitive | Composite | Development | Cohort (primary use) | Netherlands (multisite) |
| Kosa 2022 | Composite | Development | Mixed: case‐control, cohort (primary use) | USA (ND site) |
| Pellegrini 2019 | Composite | Development | Randomised trial participants (secondary use) | Australia, Austria, Belarus, Belgium, Bosnia and Herzegovina, Bulgaria, Canada, Chile, Colombia, Costa Rica, Croatia, Czech Republic, Estonia, France, Georgia, Germany, Greece, Guatemala, India, Ireland, Israel, Latvia, Mexico, Macedonia, Netherlands, Moldova, New Zealand, Peru, Poland, Romania, Russian Federation, Puerto Rico, Serbia, Slovakia, South Africa, Switzerland, Spain, Ukraine, United Kingdom, United States, Virgin Islands (USA) |
Ada: adaptive boosting; BREMSO: Bayesian Risk Estimate for Multiple Sclerosis at Onset; Dev: development; DT: decision tree; Ext: external; FLP: first level predictor; FTP: fine tuning predictor; IFN: interferon; LGBM: light gradient boosting machine; MS: multiple sclerosis; MSSS: multiple sclerosis severity score; ND: no data available; PRIMS: pregnancy in multiple sclerosis; RF: random forest; SP: secondary progressive; Val: validation; XGB: extreme gradient boosting
3. Participant characteristics.
| Analysis | Outcome | Females | Age (years) | Diagnosis (criteria) | Disease duration (years) | Treated | Clinical description |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Agosta 2006 | Disability | 70% | Mean: 33.5 | 27.4% CIS, 46.6% RRMS, 26.0% SPMS (Lublin 1996; Poser 1983) | Range: 0 to 25 | Recruitment: 0%, follow‐up: 55% | EDSS median (range): CIS 0.0 (0.0 to 1.5), RRMS 2.5 (1.0 to 5.5), SPMS 5.5 (3.5 to 6.5) |
| Bejarano 2011 Dev | Disability | 65% | Mean: 35.1 | 31.4% CIS, 51.0% RRMS, 5.9% SPMS, 7.8% PPMS, 3.9% PRMS (McDonald 2005) | Mean: 5.9, SD: 7.4 | Recruitment: 55%, follow‐up: ND/unclear | EDSS median (range): 2.0 (0 to 6), number of relapses in previous 2 years mean (SD): 1.29 (1.51) |
| Bejarano 2011 Val | Disability | 67% | Mean: 37 | 88.5% RRMS, 11.5% SPMS (McDonald 2005) | Mean: 9, SD: 6 | Recruitment: ND/unclear, follow‐up: ND/unclear | EDSS median (range): 1.5 (0 to 6.5) |
| Bergamaschi 2015 BREMSO MSSS Val | Disability | 71% | Mean: 31.1 | 100% RRMS (McDonald 2001) | ND/unclear | Recruitment: ND/unclear, follow‐up: 72% | ND |
| De Brouwer 2021 | Disability | 71% | Mean (at onset): 32.2 | 85.6% RRMS, 4.9% SPMS, 3.3% PPMS, 1.4% PRMS, 4.8% unknown (Lublin 1996) | Mean: 6.88, range: 3 to 25 | Recruitment: ND/unclear, follow‐up: ND/unclear | Prior 3‐year EDSS per patient mean (SD, range): 2.38 (1.48, 0 to 8.5) |
| de Groot 2009 dexterity | Disability | 64% | Mean: 37.4 | 82% relapse onset, 18% non‐relapse onset (Poser 1983) | Max: 0.5 | Recruitment: 6%, follow‐up: 30% | EDSS median (IQR): 2.5 (2.0 to 3.0) |
| de Groot 2009 walking | Disability | 64% | Mean: 37.4 | 82% relapse onset, 18% non‐relapse onset (Poser 1983) | Max: 0.5 | Recruitment: 6%, follow‐up: 30% | EDSS median (IQR): 2.5 (2.0 to 3.0) |
| Kuceyeski 2018 | Disability | 73% | Mean (unclear when): 36.8 | 100% RRMS (McDonald 2010; McDonald 2017) | Mean: 1.5, SD: 1.3 | Recruitment: 95%, follow‐up: ND/unclear | EDSS mean (SD): 1.1 (1.1) |
| Law 2019 Ada | Disability | 64% | Mean: 50.9 | 100% SPMS (own definition) | Mean: 9.3, SD: 5 | Recruitment: ND/unclear, follow‐up: 50% | EDSS median (IQR): 6.0 (4.5 to 6.5) |
| Law 2019 DT | Disability | 64% | Mean: 50.9 | 100% SPMS (own definition) | Mean: 9.3, SD: 5 | Recruitment: ND/unclear, follow‐up: 50% | EDSS median (IQR): 6.0 (4.5 to 6.5) |
| Law 2019 RF | Disability | 64% | Mean: 50.9 | 100% SPMS (own definition) | Mean: 9.3, SD: 5 | Recruitment: ND/unclear, follow‐up: 50% | EDSS median (IQR): 6.0 (4.5 to 6.5) |
| Lejeune 2021 Dev | Disability | 76% | Mean (unclear when): 35.3 | 100% RRMS (McDonald 2005) | Mean: 7.32, SD: 5.5 | Recruitment: 55%, follow‐up: unclear, 32.8% therapeutic escalation, 59.1% no DMT change | EDSS mean (SD): 3.45 (0.96) |
| Lejeune 2021 Ext Val | Disability | 77% | Mean (unclear when): 36.2 | 100% RRMS (McDonald 2005) | Mean: 7.62, SD: 6.56 | Recruitment: 59%, follow‐up: unclear, 48% therapeutic escalation, 49.1% no DMT change | EDSS Mean (SD): 2.93 (1.00) |
| Malpas 2020 Dev | Disability | 71% | Mean (at onset): 31.7 | 100% RRMS (McDonald 2010) | Mean: 0.33, SD: 0.3 | Recruitment: unclear number of participants, mean percentage of time on treatment 1st year, first‐line 17.1%, second‐line 0.50%, follow‐up: unclear number of participants, mean percentage of time on treatment 10th year, 46% first‐line, 5.3% second‐line | First year EDSS mean (SD): 1.78 (1.26), number of relapses mean (SD): 0.74 (0.93) |
| Malpas 2020 Ext Val | Disability | ND | Mean (at onset): 33.4 | 100% RRMS (McDonald 2010) | ND/unclear | Recruitment: ND/unclear, follow‐up: ND/unclear | First year EDSS mean (SD): 1.51 (1.28) |
| Mandrioli 2008 Dev | Disability | 61% | Mean (at onset): 27.6 | 100% RRMS (ND) | ND/unclear | Recruitment: ND/unclear, follow‐up: 70% | EDSS at diagnosis mean (SD): BMS 1.76 (0.24), SMS 2.17 (0.18) |
| Mandrioli 2008 Ext Val | Disability | 62% | Mean (at onset): 33 | 100% RRMS (ND) | ND/unclear | Recruitment: ND/unclear, follow‐up: 68% | EDSS at diagnosis mean (SD): BMS 1.65 (0.10), SMS 2.45 (0.23) |
| Margaritella 2012 | Disability | 79% | Mean (at onset): 28.6 | 89.7% RRMS, 3.4% PPMS, 6.9% Benign MS (McDonald 2001; McDonald 2005) | Mean: 10.1, SD: 7.3 | Recruitment: ND/unclear, follow‐up: ND/unclear | EDSS mean (SD): 2.1 (1.5) |
| Montolio 2021 | Disability | 67% | Mean: 42.4 | 92.7% RRMS, 6.1% SPMS, 1.2% PPMS (McDonald 2001) | Mean: 10.1, pooled SD: 7.74 | Recruitment: ND/unclear, follow‐up: 70% | EDSS mean: 2.6 (SD between 1.27 to 2.02) |
| Oprea 2020 | Disability | 62% | Mean: 40.3 | Unclear: RRMS, PPMS (ND) | Mean: 10.2 | Recruitment: ND/unclear, follow‐up: 100% | ND |
| Pinto 2020 severity 10 years | Disability | 78% | Mean (at onset): 32.3 | 100% RRMS (McDonald (undefined)) | ND/unclear | Recruitment: ND/unclear, follow‐up: ND/unclear | ND |
| Pinto 2020 severity 6 years | Disability | 70% | Mean (at onset): 30.3 | 100% RRMS (McDonald (undefined)) | ND/unclear | Recruitment: ND/unclear, follow‐up: ND/unclear | ND |
| Roca 2020 | Disability | ND | ND/unclear | ND (ND) | ND/unclear | Recruitment: ND/unclear, follow‐up: ND/unclear | ND |
| Rocca 2017 | Disability | 50% | Mean: 51.3 | 100% PPMS (Thompson 2000) | Median: 10, range: 2 to 26 | Recruitment: 33%, follow‐up: 18% | EDSS median (IQR): 6.0 (4.5 to 6.5) |
| Rovaris 2006 | Disability | 50% | Mean: 51.3 | 100% PPMS (Thompson 2000) | Median: 10, range: 2 to 26 | Recruitment: 33%, follow‐up: 18% | EDSS median (range): 5.5 (2.5 to 7.5) |
| Sombekke 2010 | Disability | 64% | Mean (at onset): 32.4 | 51.2% RRMS, 31.4% SPMS, 17.4% PPMS (Poser 1983; McDonald 2006) | Mean: 13.1, SD: 8.3 | Recruitment: ND/unclear, follow‐up: ND/unclear | EDSS median (IQR): 4.0 (3.5) |
| Szilasiova 2020 | Disability | 65% | ND/unclear | 63.5% RRMS, 29.4% SPMS, 7.1% PPMS (McDonald 2001) | Mean: 6.7, range: 0.5 to 30 | Recruitment: ND/unclear, follow‐up: 100% | EDSS mean (SD, range): 3.03 (1.5, 1.0 to 7.0) |
| Tommasin 2021 | Disability | 64% | Mean: 39.7 | 74.8% RRMS, 25.2% PMS (McDonald 2010; McDonald 2017) | Mean: 9.9, SD: 8.06 | Recruitment: ND/unclear, follow‐up: 72% | EDSS median (range): 3.0 (0.0 to 7.5) |
| Tousignant 2019 | Disability | ND | ND/unclear | 100% RRMS (ND) | ND/unclear | Recruitment: 0%, follow‐up: 0% | ND |
| Weinshenker 1991 M3 Dev | Disability | 66% | Mean (at onset): 30.5 | 65.8% RRMS, 14.8% relapsing progressive, 18.7% chronically progressive, 0.9% unknown/83.3% diagnosis probable, 16.4% diagnosis possible (Poser 1983) | Mean: 11.9, SE: 0.3 | Recruitment: 0%, follow‐up: 0% | ND |
| Weinshenker 1996 M3 Ext Val | Disability | 69% | Mean: 44.1 | 84.3% RRMS, 2.0% relapsing progressive, 13.7% chronically progressive (ND) | Mean: 12 | Recruitment: ND/unclear, follow‐up: ND/unclear | Unclear |
| Weinshenker 1996 short‐term | Disability | 69% | Mean: 44.1 | 84.3% RRMS, 2.0% relapsing progressive, 13.7% chronically progressive (ND) | Mean: 12 | Recruitment: ND/unclear, follow‐up: ND/unclear | Unclear |
| Yperman 2020 | Disability | 72% | Mean: 45 | CIS 1.7%, RRMS 53.2%, SPMS 10.7%, PPMS 2.9%, unknown 32.9% (unrecorded in the dataset) | ND/unclear | Recruitment: 74%, follow‐up: 79% | EDSS mean (SD): 3.0 (1.8) |
| Zhao 2020 LGBM All | Disability | 76% | Mean (unclear when): 39 | Unclear: 16.4% CIS, 65.6% RRMS, 2.6% SPMS, 4.3% PPMS (ND) | Median: 2, range: 0 to 44 | Recruitment: unclear, for source cohort, 92.6% DMT first line, 0.8% DMT oral, 3.9% DMT high, 0.4% experimental, 2.3% MS other, follow‐up: unclear, for source cohort, treatment at last visit, 34.6% DMT first line, 39.1% DMT oral, 9.6% DMT high, 3.1% experimental, 0.2% immune, 13.3% MS other, 4.5% never on treatment | Unclear, for source cohort, EDSS median (IQR; range): 1.5 (1 to 2; 0 to 7.5) |
| Zhao 2020 LGBM Common | Disability | 76% | Mean (unclear when): 39 | Unclear: 16.4% CIS, 65.6% RRMS, 2.6% SPMS, 4.3% PPMS (ND) | Median: 2, range: 0 to 44 | Recruitment: unclear, for source cohort, 92.6% DMT first line, 0.8% DMT oral, 3.9% DMT high, 0.4% experimental, 2.3% MS other, follow‐up: unclear, for source cohort, treatment at last visit, 34.6% DMT first line, 39.1% DMT oral, 9.6% DMT high, 3.1% experimental, 0.2% immune, 13.3% MS other, 4.5% never on treatment | Unclear, for source cohort, EDSS median (IQR; range): 1.5 (1 to 2; 0 to 7.5) |
| Zhao 2020 LGBM Common Val | Disability | 69% | Mean (unclear when): 42.5 | Unclear: 15.9% CIS, 70.8% RRMS, 9.3% SPMS, 3.9% PPMS (ND) | Median: 6, range: 0 to 45 | Recruitment: unclear, for source cohort, 93.4% DMT first line, 0.7% DMT oral, 1.6% DMT high, 0.7% experimental, 0.5% immune, 1.6% steroid, 1.6% MS other, follow‐up: unclear, for source cohort, treatment at last visit, 57.4% DMT first line, 19.8% DMT oral, 9.6% DMT high, 3% experimental, 0.5% immune, 5% steroid, 4.8% MS other, 15.1% never on treatment | Unclear, for source cohort, EDSS median (IQR; range): 1.5 (1 to 3; 0 to 7) |
| Zhao 2020 XGB All | Disability | 76% | Mean (unclear when): 39 | Unclear: 16.4% CIS, 65.6% RRMS, 2.6% SPMS, 4.3% PPMS (ND) | Median: 2, range: 0 to 44 | Recruitment: unclear, for source cohort, 92.6% DMT first line, 0.8% DMT oral, 3.9% DMT high, 0.4% experimental, 2.3% MS other, follow‐up: unclear, for source cohort, treatment at last visit, 34.6% DMT first line, 39.1% DMT oral, 9.6% DMT high, 3.1% experimental, 0.2% immune, 13.3% MS other, 4.5% never on treatment | Unclear, for source cohort, EDSS median (IQR; range): 1.5 (1 to 2; 0 to 7.5) |
| Zhao 2020 XGB Common | Disability | 76% | Mean (unclear when): 39 | Unclear: 16.4% CIS, 65.6% RRMS, 2.6% SPMS, 4.3% PPMS (ND) | Median: 2, range: 0 to 44 | Recruitment: unclear, for source cohort, 92.6% DMT first line, 0.8% DMT oral, 3.9% DMT high, 0.4% experimental, 2.3% MS other, follow‐up: unclear, for source cohort, treatment at last visit, 34.6% DMT first line, 39.1% DMT oral, 9.6% DMT high, 3.1% experimental, 0.2% immune, 13.3% MS other, 4.5% never on treatment | Unclear, for source cohort, EDSS median (IQR; range): 1.5 (1 to 2; 0 to 7.5) |
| Zhao 2020 XGB Common Val | Disability | 69% | Mean (unclear when): 42.5 | Unclear: 15.9% CIS, 70.8% RRMS, 9.3% SPMS, 3.9% PPMS (ND) | Median: 6, range: 0 to 45 | Recruitment: unclear, for source cohort, 93.4% DMT first line, 0.7% DMT oral, 1.6% DMT high, 0.7% experimental, 0.5% immune, 1.6% steroid, 1.6% MS other, follow‐up: unclear, for source cohort, treatment at last visit, 57.4% DMT first line, 19.8% DMT oral, 9.6% DMT high, 3% experimental, 0.5% immune, 5% steroid, 4.8% MS other, 15.1% never on treatment | Unclear, for source cohort, EDSS median (IQR; range): 1.5 (1 to 3; 0 to 7) |
| Gurevich 2009 FLP Dev | Relapse | 64% | Mean (unclear when): 36.3 | 34.0% CIS, 66.0% CDMS (McDonald 2001) | Mean: 5.67, pooled SD: 0.89 | Recruitment: 0%, follow‐up: 35% | EDSS (unclear if mean and SD): CIS 0.9 (0.2) CDMS 2.4 (0.2), annualised relapse rate (unclear if mean and SD): CDMS 0.92 (0.1) |
| Gurevich 2009 FLP Ext Val | Relapse | ND | ND/unclear | Unclear: CIS 60%, CDMS 40% (McDonald 2001) | ND/unclear | Recruitment: 0%, follow‐up: unclear, 9 on IMD | Unclear due to inconsistencies in the published data, EDSS (unclear if mean and SD): CIS 2.58 (0.15) CDMS 5.3 (2.39), annualised relapse rate (unclear if mean and SD): CIS 6.1 (2.05) CDMS 1 (0.51) |
| Gurevich 2009 FTP | Relapse | 64% | Mean (unclear when): 36.3 | 34.0% CIS, 66.0% CDMS (McDonald 2001) | Mean: 5.67, pooled SD: 0.89 | Recruitment: 0%, follow‐up: 35% | EDSS (unclear if mean and SD): CIS 0.9 (0.2) CDMS 2.4 (0.2), annualised relapse rate (unclear if mean and SD): CDMS 0.92 (0.1) |
| Sormani 2007 Dev | Relapse | ND | Median: 37 | 100% RRMS (Poser 1983) | Median: 5.9, range: 0.6 to 30 | Recruitment: 0%, follow‐up: 1% | EDSS median (range): 2.0 (0.0 to 5.0), prior 2‐year number of relapses (range): 2 (1 to 11) |
| Sormani 2007 Ext Val | Relapse | ND | Median: 34 | 100% RRMS (Poser 1983) | Median: 3.8, range: 0.5 to 22 | Recruitment: 0%, follow‐up: 0% | EDSS median (range): 2.0 (0.0 to 4.0), prior 2‐year number of relapses (range): 2 (1 to 8) |
| Vukusic 2004 | Relapse | 100% | Mean: 30 | 96% RRMS, 4% SPMS (Poser 1983) | Mean: 6, SD: 4 | Recruitment: 0%, follow‐up: 2% | DSS at beginning of pregnancy mean (SD): 1.3 (1.4), annualised relapse rate during year before pregnancy (95% CI): 0.7 (0.6 to 0.8) |
| Ye 2020 gene signature | Relapse | 64% | Mean (unclear when): 36.3 | 34.0% CIS, 66.0% CDMS (McDonald 2001) | Mean: 5.67, pooled SD: 0.89 | Recruitment: ND/unclear, follow‐up: 31% | EDSS (unclear if mean and SD): CIS 0.9 (0.2) CDMS 2.4 (0.2), annualised relapse rate (unclear if mean and SD): CDMS 0.92 (0.1) |
| Ye 2020 nomogram | Relapse | 64% | Mean (unclear when): 36.3 | 34.0% CIS, 66.0% CDMS (McDonald 2001) | Mean: 5.67, pooled SD: 0.89 | Recruitment: ND/unclear, follow‐up: 31% | EDSS (unclear if mean and SD): CIS 0.9 (0.2) CDMS 2.4 (0.2), annualised relapse rate (unclear if mean and SD): CDMS 0.92 (0.1) |
| Aghdam 2021 | Conversion to definite MS | 74% | Mean (unclear when): 40 | 100% CIS (McDonald 2010) | ND/unclear | Recruitment: 0%, follow‐up: ND/unclear | History of ON: 11.9% |
| Bendfeldt 2019 linear placebo | Conversion to definite MS | 71% | Mean: 30.8 | 100% CIS (own definition) | Max: 0.16 | Recruitment: 0%, follow‐up: 0% | EDSS median (range): conv‐ 1.5 (1.0 to 2.0), conv+ 1.5 (1.0 to 2.0) |
| Bendfeldt 2019 M7 placebo | Conversion to definite MS | 70% | Mean: 29.7 | 100% CIS (own definition) | Max: 0.16 | Recruitment: 0%, follow‐up: 0% | EDSS median (range): conv‐ 1.0 (0.0 to 2.0), conv+ 1.5 (1.0 to 2.0) |
| Bendfeldt 2019 M9 IFN | Conversion to definite MS | 66% | Mean: 29.6 | 100% CIS (own definition) | Max: 0.16 | Recruitment: 0%, follow‐up: 100% | EDSS median (range): conv‐ 2.0 (1.0 to 2.0), conv+ 2.0 (1.0 to 2.5) |
| Borras 2016 | Conversion to definite MS | 66% | Median (unclear when): 35.5 | 100% CIS (ND) | Median: 0.22, range: 0.01 to 0.35 | Recruitment: ND/unclear, follow‐up: 8% | EDSS median (range): 1.5 (0 to 5) |
| Gout 2011 | Conversion to definite MS | 70% | Median: 31 | 100% CIS (ND) | ND/unclear | Recruitment: 0%, follow‐up: 0% | EDSS median (range): 2 (0 to 6) |
| Martinelli 2017 | Conversion to definite MS | 68% | Mean: 32 | 100% CIS (ND) | Max: 0.25 | Recruitment: ND/unclear, follow‐up: 40% | ND |
| Olesen 2019 candidate | Conversion to definite MS | 68% | Median: 36 | 100% CIS (Optic Neuritis Study Group criteria 1991) | ND/unclear | Recruitment: 0%, follow‐up: ND/unclear | ND |
| Olesen 2019 routine | Conversion to definite MS | 68% | Median: 36 | 100% CIS (Optic Neuritis Study Group criteria 1991) | ND/unclear | Recruitment: 0%, follow‐up: ND/unclear | ND |
| Runia 2014 | Conversion to definite MS | 73% | ND/unclear | 100% CIS (own definition) | Max: 0.5 | Recruitment: ND/unclear, follow‐up: ND/unclear | ND |
| Spelman 2017 | Conversion to definite MS | 71% | Median (at MS onset): 31.6 | 100% CIS (Poser 1983) | Max: 1 | Recruitment: ND/unclear, follow‐up: 28% | EDSS median (IQR): 2 (1 to 2.5) |
| Wottschel 2015 1 year | Conversion to definite MS | 66% | Mean: 33.1 | 100% CIS (ND) | Mean: 0.12, SD: 0.07 | Recruitment: 0%, follow‐up: 0% | EDSS median (range): 1 (0 to 8) |
| Wottschel 2015 3 years | Conversion to definite MS | 67% | Mean: 33.2 | 100% CIS (ND) | Mean: 0.12, SD: 0.07 | Recruitment: 0%, follow‐up: 0% | EDSS median (range): 1 (0 to 8) |
| Wottschel 2019 | Conversion to definite MS | 66% | Mean (at onset): 32.7 | 100% CIS (ND) | Max: 0.27 | Recruitment: ND/unclear, follow‐up: ND/unclear | EDSS median (range): 2 (0 to 8) |
| Yoo 2019 | Conversion to definite MS | 69% | Mean (at onset): 35.9 | 100% CIS (ND) | Median: 0.23, range: 0.06 to 0.52 | Recruitment: 0%, follow‐up: 50% | EDSS median (range): 1.5 (0 to 4.5) |
| Zakharov 2013 | Conversion to definite MS | 70% | Mean: 25.1 | 100% CIS (ND) | ND/unclear | Recruitment: ND/unclear, follow‐up: ND/unclear | EDSS ≤ 2 |
| Zhang 2019 | Conversion to definite MS | 70% | Mean (unclear when): 42.4 | 100% CIS (McDonald 2010) | ND/unclear | Recruitment: 1%, follow‐up: ND/unclear | EDSS median: 1 |
| Bergamaschi 2001 BREMS Dev | Conversion to progressive MS | 63% | Mean: 28.5 | 100% RRMS (Poser 1983; Lublin 1996) | ND/unclear | Recruitment: 10%, follow‐up: 10% | ND |
| Bergamaschi 2007 BREMS Ext Val | Conversion to progressive MS | 69% | Median: 24.8 | 100% RRMS (Poser 1983) | ND/unclear | Recruitment: 3%, follow‐up: 57% | ND |
| Bergamaschi 2015 BREMS Ext Val | Conversion to progressive MS | 71% | Mean: 31.1 | 100% RRMS (McDonald 2001) | ND/unclear | Recruitment: ND/unclear, follow‐up: 72% | ND |
| Bergamaschi 2015 BREMSO SP Val | Conversion to progressive MS | 71% | Mean: 31.1 | 100% RRMS (McDonald 2001) | ND/unclear | Recruitment: ND/unclear, follow‐up: 72% | ND |
| Brichetto 2020 | Conversion to progressive MS | ND | ND/unclear | Unclear: unclear, RRMS, SPMS (ND) | ND/unclear | Recruitment: ND/unclear, follow‐up: ND/unclear | ND |
| Calabrese 2013 Dev | Conversion to progressive MS | 67% | Mean: 35.3 | 100% RRMS (McDonald 2001) | Mean: 11.3, range: 5 to 23 | Recruitment: 100%, follow‐up: 100% | EDSS median (range): 2.5 (0 to 4.5) |
| Calabrese 2013 Ext Val | Conversion to progressive MS | 60% | Mean: 34.5 | 100% RRMS (McDonald 2001) | Mean: 10.5, range: 10 to 21 | Recruitment: 100%, follow‐up: 100% | ND |
| Manouchehrinia 2019 Dev | Conversion to progressive MS | 72% | Mean (at onset): 31.5 | 100% RRMS (McDonald (undefined)) | ND/unclear | Recruitment: unclear (a minority), follow‐up: unclear number of participants (median duration of exposure: first‐line 3, second‐line 0.8) | First recorded EDSS median (IQR): 2 (1 to 3) |
| Manouchehrinia 2019 Ext Val 1 | Conversion to progressive MS | 74% | Mean (at onset): 31.1 | 100% RRMS (Poser 1983) | ND/unclear | Recruitment: ND/unclear, follow‐up: unclear number of participants, median 0 | First recorded EDSS median (IQR): 2 (1 to 3) |
| Manouchehrinia 2019 Ext Val 2 | Conversion to progressive MS | 67% | Mean (at onset): 29.5 | 100% RRMS (McDonald 2001) | ND/unclear | Recruitment: 0%, follow‐up: 100% | First recorded EDSS median (IQR): 2 (1.5 to 3) |
| Manouchehrinia 2019 Ext Val 3 | Conversion to progressive MS | 74% | Mean (at onset): 29.9 | 100% RRMS (McDonald 2005) | ND/unclear | Recruitment: 0%, follow‐up: 100% | First recorded EDSS median (IQR): 2 (1.5 to 3.5) |
| Misicka 2020 10 years | Conversion to progressive MS | 78% | Median (at MS onset): 32 | 100% RRMS (McDonald 2005; McDonald 2010) | Median: 11, IQR: 5 to 19 | Recruitment: 0%, follow‐up: ND/unclear | Time to second relapse: 0 years to 1 year 48.1%, 2 years to 5 years 31.8%, ≥ 6 years 20.1% |
| Misicka 2020 20 years | Conversion to progressive MS | 78% | Median (at MS onset): 32 | 100% RRMS (McDonald 2005; McDonald 2010) | Median: 11, IQR: 5 to 19 | Recruitment: 0%, follow‐up: ND/unclear | Time to second relapse: 0 years to 1 year 48.1%, 2 years to 5 years 31.8%, ≥ 6 years 20.1% |
| Misicka 2020 ever | Conversion to progressive MS | 78% | Median (at onset): 32 | 100% RRMS (McDonald 2005; McDonald 2010) | Median: 11, IQR: 5 to 19 | Recruitment: 0%, follow‐up: ND/unclear | Time to second relapse: 0 years to 1 year 48.1%, 2 years to 5 years 31.8%, ≥ 6 years 20.1% |
| Pinto 2020 SP | Conversion to progressive MS | 73% | Mean (at onset): 31.1 | 100% RRMS (McDonald (undefined)) | ND/unclear | Recruitment: ND/unclear, follow‐up: ND/unclear | ND |
| Pisani 2021 | Conversion to progressive MS | 58% | Mean: 33.5 | 100% RRMS (McDonald 2005) | ND/unclear | Recruitment: 100%, follow‐up: 100% | EDSS median (range): 1.5 (0 to 3.5) |
| Seccia 2020 180 days | Conversion to progressive MS | 70% | Mean (at onset): 29 | 100% RRMS (latest criteria at time of diagnosis) | Mean: 19 | Recruitment: ND/unclear, follow‐up: 73% | ND |
| Seccia 2020 360 days | Conversion to progressive MS | 70% | Mean (at onset): 29 | 100% RRMS (latest criteria at time of diagnosis) | Mean: 19 | Recruitment: ND/unclear, follow‐up: 73% | ND |
| Seccia 2020 720 days | Conversion to progressive MS | 70% | Mean (at onset): 29 | 100% RRMS (latest criteria at time of diagnosis) | Mean: 19 | Recruitment: ND/unclear, follow‐up: 73% | ND |
| Skoog 2014 Dev | Conversion to progressive MS | 65% | Mean: 33.5 | 100% RRMS (Poser 1983) | Median: 2 | Recruitment: 0%, follow‐up: 0% | ND |
| Skoog 2019 Ext Val | Conversion to progressive MS | 76% | Mean (at CDMS onset (2nd attack)): 33 | 100% RRMS (Poser 1983) | ND/unclear | Recruitment: 0%, follow‐up: unclear (few patients received first‐generation DMT (IFN‐beta or glatiramer acetate); 99 of 1762 patient‐years) | ND |
| Skoog 2019 Val | Conversion to progressive MS | 65% | Mean (at CDMS onset (2nd attack)): 33 | 100% RRMS (Poser 1983) | Median: 2 | Recruitment: 0%, follow‐up: 0% | ND |
| Tacchella 2018 180 days | Conversion to progressive MS | ND | ND/unclear | 100% RRMS (McDonald 2017) | ND/unclear | Recruitment: ND/unclear, follow‐up: 89% | ND |
| Tacchella 2018 360 days | Conversion to progressive MS | ND | ND/unclear | 100% RRMS (McDonald 2017) | ND/unclear | Recruitment: ND/unclear, follow‐up: 89% | ND |
| Tacchella 2018 720 days | Conversion to progressive MS | ND | ND/unclear | 100% RRMS (McDonald 2017) | ND/unclear | Recruitment: ND/unclear, follow‐up: 89% | ND |
| Vasconcelos 2020 Dev | Conversion to progressive MS | 76% | Mean (at onset): 28.7 | 100% RRMS (Poser 1983; McDonald 2001) | Mean: 16, SD: 9.42 | Recruitment: ND/unclear, follow‐up: 58% | Patients with more than one relapse at first year of disease: 74% |
| Vasconcelos 2020 Ext Val | Conversion to progressive MS | 78% | Mean (at onset): 28.5 | 100% RRMS (Poser 1983; McDonald 2001) | Mean: 13.22, SD: 9.72 | Recruitment: ND/unclear, follow‐up: 77% | Patients with more than one relapse at first year of disease: 74% |
| Ahuja 2021 Dev | Composite | 74% | Median (unclear when): 43.3 | Unclear: approximately 70% to 80% RRMS, approximately 10% PPMS, 10% to 20% SPMS (ND) | Median: 5.12, IQR: 2.03 | Recruitment: ND/unclear, follow‐up: 55% | ND |
| Ahuja 2021 Ext Val | Composite | 74% | Median (unclear when): 43.3 | Unclear: approximately 70% to 80% RRMS, approximately 10% PPMS, 10% to 20% SPMS (ND) | Median: 4.37, IQR: 2.82 | Recruitment: ND/unclear, follow‐up: 55% | ND |
| de Groot 2009 cognitive | Composite | 64% | Mean: 37.4 | 82% relapse onset, 18% non‐relapse onset (Poser 1983) | Max: 0.5 | Recruitment: 6%, follow‐up: 30% | EDSS median (IQR): 2.5 (2.0 to 3.0) |
| Kosa 2022 | Composite | 54% | Mean: 49.6 | 30.8% RRMS, 24.2% SPMS, 44.9% PPMS (McDonald 2010; McDonald 2017) | Mean: 12.2, pooled SD: 8.51 | Recruitment: 0%, follow‐up: ND/unclear | EDSS mean (SD): development set; RRMS 1.8 (1.2), SPMS 5.9 (1.2), PPMS 5.3 (1.6)/validation set RRMS 2.2 (1.6), SPMS 5.5 (1.5), PPMS 5.2 (1.6) |
| Pellegrini 2019 | Composite | 71% | Mean: 37.1 | 100% RRMS (McDonald 2001; McDonald 2005) | Mean: 7.5, SD: 6.5 | Recruitment: 34%, follow‐up: 0% | EDSS mean (SD): 2.5 (1.2), number of relapses 1 year prior to study entry mean (SD): 1.4 (0.7) |
Ada: adaptive boosting; BREMS: Bayesian Risk Estimate for Multiple Sclerosis; BREMSO: Bayesian Risk Estimate for Multiple Sclerosis Onset; CDMS: clinically definite multiple sclerosis; CI: confidence interval; CIS: clinically isolated syndrome; conv‐: did not convert to CDMS; conv+: converted to CDMS; Dev: development; DMT: disease‐modifying treatment; DSS: disability status scale; DT: decision tree; EDSS: Expanded Disability Status Scale; Ext: external; FLP: first level predictor; FTP: fine tuning predictor; IFN: interferon; IMD: immunomodulatory drug; IQR: interquartile range; LGBM: light gradient boosting machine; Max: maximum; Min: minimum; MS: multiple sclerosis; ND: no data available; ON: optic neuritis; PPMS: primary progressive MS; PRMS: progressive‐relapsing MS; RF: random forest; RRMS: relapsing‐remitting MS; SD: standard deviation; SP: secondary progressive; SPMS: secondary progressive MS; Val: validation
4. Number of predictors.
| Model | Outcome | Number considered | Number included | Timing |
| Agosta 2006 | Disability | 26 | 3 (2 or 3 (unclear if follow‐up duration included)) | At study baseline (cohort entry at least 2 years after diagnosis of definite MS or 3 months after CIS), 12 months (± 10 days) after baseline, at final follow‐up (outcome measurement) |
| Bejarano 2011 | Disability | 23 (22 or 23 (unclear transformation)) | 5 | At study baseline (cohort entry) |
| De Brouwer 2021 | Disability | 24 + EDSS trajectories | 19 predictors + EDSS trajectories | At multiple visits, at least 6 in 3‐year period |
| de Groot 2009 dexterity | Disability | 5 | 5 | At disease onset (definite MS) (study baseline within 6 months after diagnosis) |
| de Groot 2009 walking | Disability | 5 | 3 | At disease onset (definite MS) (study baseline within 6 months after diagnosis) |
| Kuceyeski 2018 | Disability | 965 | 703 (703 predictors transformed to 6 principal components) | At baseline image (early RRMS within 5 years of their first neurologic symptom), at follow‐up (unclear: outcome measurement) |
| Law 2019 Ada | Disability | 9 | 9 | At study baseline (RCT) |
| Law 2019 DT | Disability | 9 | 9 | At study baseline (RCT) |
| Law 2019 RF | Disability | 9 | 9 | At study baseline (RCT) |
| Lejeune 2021 | Disability | 19 (≤ 19 and ≥ 14 (unclear transformations)) | 6 (7 df) | At study baseline (RCT, relapse) or retrospectively at screening |
| Malpas 2020 | Disability | 17 | 3 | At symptom onset, at visits up to 1 year following symptom onset, and at final follow‐up |
| Mandrioli 2008 | Disability | 15 | 4 | At disease onset (RRMS) |
| Margaritella 2012 | Disability | 8 (≥ 8 (unclear transformations)) | 6 | At multiple assessments consecutively for 3 years until 1 year prior to outcome |
| Montolio 2021 | Disability | 39 | 5 (4 of them longitudinal) | At 3 visits over 2 years (not defined baseline and annual visits 1 and 2) |
| Oprea 2020 | Disability | 6 | 6 | At a single time point during outcome determination |
| Pinto 2020 severity 10 years | Disability | 1306 | Unclear which predictors make up the final model | At multiple visits dependent on which 1‐year to 5‐year model |
| Pinto 2020 severity 6 years | Disability | 1306 | Unclear which predictors make up the final model | At multiple visits dependent on which 1‐year to 5‐year model |
| Roca 2020 | Disability | Unstructured data + 65 | Unstructured data + 65 | At FLAIR imaging (initial in the dataset) |
| Rocca 2017 | Disability | 26 | 5 | At study baseline (cohort entry), at median 15 months after baseline, and at median 56 months (called 5 years) after baseline |
| Rovaris 2006 | Disability | 25 | 3 (2 or 3 (unclear if follow‐up time included)) | At study baseline (cohort entry), at 15 months post‐baseline (follow‐up), at final follow‐up (outcome measurement) |
| Sombekke 2010 | Disability | 72 | 9 (13 df) | At baseline (already available or retrospectively collected) |
| Szilasiova 2020 | Disability | 11 | 6 (7 df) | At study baseline (cohort entry) |
| Tommasin 2021 | Disability | 16 | 4 | At assessment (not defined), at follow‐up |
| Tousignant 2019 | Disability | Unstructured data | Unstructured data | At imaging |
| Weinshenker 1991 M3 | Disability | 13 (≥ 13 (unclear if complete list)) | 7 | At assessment (not defined), at follow‐up |
| Weinshenker 1996 short‐term | Disability | 5 (≥ 5 (unclear if complete list)) | 5 | At assessment (not defined), at follow‐up (unclear: outcome measurement) |
| Yperman 2020 | Disability | 5893 | 9 (≤ 9 (unclear subset)) | At visit of interest |
| Zhao 2020 LGBM All | Disability | 198 | 198 | At multiple assessments every 6 months from baseline (undefined) to year 2 |
| Zhao 2020 LGBM Common | Disability | 105 (≤ 105 (unclear subset)) | 105 (≤ 105 (unclear subset)) | At multiple assessments every year from baseline (undefined) to year 2 |
| Zhao 2020 XGB All | Disability | 198 | 198 | At multiple assessments every 6 months from baseline (undefined) to year 2 |
| Zhao 2020 XGB Common | Disability | 105 (≤ 105 (unclear subset)) | 105 (≤ 105 (unclear subset)) | At multiple assessments every year from baseline (undefined) to year 2 |
| Gurevich 2009 FLP | Relapse | 10,602 | 10 (df unclear) | At study baseline (cohort entry) |
| Gurevich 2009 FTP | Relapse | 10,602 | 9 (df unclear) | At study baseline (cohort entry) |
| Sormani 2007 | Relapse | 12 (≥ 12 (unclear transformations)) | 2 | At study baseline (RCT, entry at least 1 year after disease onset) |
| Vukusic 2004 | Relapse | 11 | 3 | At study baseline (cohort entry during pregnancy week 4 to 36), at examinations at 20, 28, 36 weeks of gestation, and also post‐partum |
| Ye 2020 gene signature | Relapse | 202 | 5 | At study baseline (cohort entry) |
| Ye 2020 nomogram | Relapse | 206 | 5 | At study baseline (cohort entry) |
| Aghdam 2021 | Conversion to definite MS | 10 (≥ 10 (unclear transformations)) | 4 | At presentation due to ON |
| Bendfeldt 2019 linear placebo | Conversion to definite MS | Number of voxels in the cortical GM mask | NA | At disease onset (CIS) (RCT baseline within 60 days after onset) |
| Bendfeldt 2019 M7 placebo | Conversion to definite MS | 301 | 25 (df unclear (reported predictors do not add up to 25)) | At disease onset (CIS) (RCT baseline within 60 days after onset) |
| Bendfeldt 2019 M9 IFN | Conversion to definite MS | 301 | 15 (df unclear) | At disease onset (CIS) (RCT baseline within 60 days after onset) |
| Borras 2016 | Conversion to definite MS | 32 (≤ 32 and ≥ 17 (discrepant lists)) | 2 | At disease onset (CIS) reported as 'first relapse' (4 to 126 days between CIS and lumbar puncture) |
| Gout 2011 | Conversion to definite MS | 15 (≥ 15 (unclear how many interactions tested)) | 3 | At disease onset (CIS) leading to admission |
| Martinelli 2017 | Conversion to definite MS | 36 (≥ 24 and ≤ 36 (unclear adjustments and transformations)) | 7 (5 or 7 (unclear adjustment)) | At disease onset (CIS) and up to 3 months after disease onset |
| Olesen 2019 candidate | Conversion to definite MS | 14 | 3 | At disease onset (ON), from ON onset median (range): 14 days (2 to 38 days) |
| Olesen 2019 routine | Conversion to definite MS | 4 | 3 | At disease onset (ON), from ON onset median (range): 14 days (2 to 38 days) |
| Runia 2014 | Conversion to definite MS | 21 (≥ 16 or 21 (unclear transformations)) | 5 | At disease onset (CIS) (at study baseline within 6 months after onset) |
| Spelman 2017 | Conversion to definite MS | 16 (≥ 16 (unclear how many interactions tested)) | 7 (11 df) | At disease onset (CIS) (up to 12 months after disease onset) |
| Wottschel 2015 1 year | Conversion to definite MS | 14 | 3 (df unclear) | At disease onset (CIS) and up to a mean of 6.15 weeks (SD 3.4) after disease onset |
| Wottschel 2015 3 years | Conversion to definite MS | 14 | 6 (df unclear) | At disease onset (CIS) and up to a mean of 6.15 weeks (SD 3.4) after disease onset |
| Wottschel 2019 | Conversion to definite MS | 214 | 36 (for 2‐fold CV) | At disease onset (CIS) and up to 14 weeks after disease onset |
| Yoo 2019 | Conversion to definite MS | Unstructured data + 11(user‐defined) | Unstructured data + 11 | At disease onset (CIS) (RCT baseline within 180 days after disease onset) |
| Zakharov 2013 | Conversion to definite MS | 2 (≥ 2 (unclear if complete list)) | 2 | Unclear, at first MRI after CIS onset (timing distribution unknown) |
| Zhang 2019 | Conversion to definite MS | 30 | 18 | At disease onset (CIS) (during primary clinical work‐up for CIS) |
| Bergamaschi 2001 BREMS | Conversion to progressive MS | 9 (> 9 (unclear if complete list)) | 9 | At disease onset (RRMS) and regular visits up to 1 year after onset (baseline) |
| Brichetto 2020 | Conversion to progressive MS | 143 | 33 | Unclear, at multiple assessments every 4 months |
| Calabrese 2013 | Conversion to progressive MS | 16 (≥ 16 (unclear df of initial symptoms)) | 3 | At study baseline (cohort entry at least 5 years after disease onset) |
| Manouchehrinia 2019 | Conversion to progressive MS | 20 | 5 (6 df) | From disease onset (RRMS) to first EDSS recorded (median 2 years) |
| Misicka 2020 10 years | Conversion to progressive MS | 35 | 6 (7 df) | At study interview (the same as the time of outcome reporting) |
| Misicka 2020 20 years | Conversion to progressive MS | 35 | 6 (7 df) | At study interview (the same as the time of outcome reporting) |
| Misicka 2020 ever | Conversion to progressive MS | 35 | 6 (7 df) | At study interview (the same as the time of outcome reporting) |
| Pinto 2020 SP | Conversion to progressive MS | 1306 | Unclear which predictors make up the final model | At multiple visits dependent on which 1‐year to 5‐year model |
| Pisani 2021 | Conversion to progressive MS | 13 (12 or 13 (unclear adjustment)) | 7 | At diagnosis (RRMS) and up to 2 years after diagnosis |
| Seccia 2020 180 days | Conversion to progressive MS | 21 predictor trajectories | 18 predictor trajectories | At multiple visits comprising patient history to the current visit of interest |
| Seccia 2020 360 days | Conversion to progressive MS | 21 predictor trajectories | 18 predictor trajectories | At multiple visits comprising patient history to the current visit of interest |
| Seccia 2020 720 days | Conversion to progressive MS | 21 predictor trajectories | 18 predictor trajectories | At multiple visits comprising patient history to the current visit of interest |
| Skoog 2014 | Conversion to progressive MS | 15 (≥ 15 (unclear transformations)) | 3 (4 df) | At last relapse, at time of prognostication |
| Tacchella 2018 180 days | Conversion to progressive MS | 46 | 46 | At visit of interest |
| Tacchella 2018 360 days | Conversion to progressive MS | 46 | 46 | At visit of interest |
| Tacchella 2018 720 days | Conversion to progressive MS | 46 | 46 | At visit of interest |
| Vasconcelos 2020 | Conversion to progressive MS | 8 | 5 | At multiple visits (unclear if CIS or RR onset) to at least 2 years post‐onset |
| Ahuja 2021 | Composite | 2730 | 114 (model 1: 111, model 2: 3) | From 1 year ago to the index encounter (unspecified) |
| de Groot 2009 cognitive | Composite | 5 | 4 | At disease onset (definite MS) (study baseline within 6 months after diagnosis) |
| Kosa 2022 | Composite | 852,167 (852,167 or 852,165 (unclear adjustment for age and sex)) | 23 (23 or 21 (unclear if age and sex are predictors)) | At lumbar puncture |
| Pellegrini 2019 | Composite | 23 | 3 | At study baseline (RCT) |
Ada: adaptive boosting; BREMS: Bayesian Risk Estimate for Multiple Sclerosis; CIS: clinically isolated syndrome; CV: cross‐validation; df: degrees of freedom; DT: decision tree; EDSS: Expanded Disability Status Scale; FLAIR: fluid‐attenuated inversion recovery; FLP: first level predictor; FTP: fine tuning predictor; IFN: interferon; LGBM: light gradient boosting machine; MRI: magnetic resonance imaging; MS: multiple sclerosis; NA: not applicable; ND: no data available; ON: optic neuritis; RCT: randomised controlled trial; RF: random forest; RR: relapsing‐remitting; RRMS: relapsing‐remitting MS; SD: standard deviation; SP: secondary progressive; XGB: extreme gradient boosting
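The performance table that follows reports classification measures (accuracy, sensitivity, specificity, PPV, NPV, balanced accuracy) and events per variable (EPV). As a reader aid only — not part of the review's methods — the sketch below shows how these quantities follow from a 2 × 2 confusion matrix; the counts are back‐calculated from the Agosta 2006 row, which reports accuracy = 46/70, sensitivity = 30/41, and specificity = 16/29 (implying TP = 30, FN = 11, TN = 16, FP = 13). Function names are illustrative, not from any cited study.

```python
# Illustrative sketch: deriving the tabulated classification measures
# from a 2x2 confusion matrix (TP, FN, TN, FP).

def classification_metrics(tp, fn, tn, fp):
    """Standard 2x2-table measures shown in the 'Classification' column."""
    total = tp + fn + tn + fp
    sens = tp / (tp + fn)   # true positive rate
    spec = tn / (tn + fp)   # true negative rate
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": sens,
        "specificity": spec,
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
        "balanced_accuracy": (sens + spec) / 2,
    }

def events_per_variable(events, predictor_df):
    """Naive EPV: outcome events per candidate predictor degree of freedom."""
    return events / predictor_df

# Counts back-calculated from the Agosta 2006 row of the table below.
m = classification_metrics(tp=30, fn=11, tn=16, fp=13)
print(round(m["accuracy"], 2))     # 46/70 ≈ 0.66
print(round(m["sensitivity"], 2))  # 30/41 ≈ 0.73
```

Note that the EPV values tabulated below may rest on degrees‐of‐freedom counts that differ from this naive events‐over‐predictors ratio, since transformations and category codings change the effective number of parameters.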
5. Development and performance details.
| Analysis | Outcome | Algorithm | Sample size (number of events) | EPV | Evaluation details | Number of external validations | Calibration | Discrimination | Classification |
| Agosta 2006 | Disability (EDSS) | Logistic regression | 70 (44) | 1 | Cross‐validation | 0 | ND | ND | Accuracy = 46/70, sensitivity = 30/41, specificity = 16/29 |
| Bejarano 2011 | Disability (EDSS) | Neural network | 51 (NA) | 2 | Cross‐validation | 1 external refit | ND | AUC computed for continuous outcome | Unclear how classification measures were produced for a numeric outcome; accuracy = 0.80 (SD 0.14), sensitivity = 0.92, specificity = 0.61, PPV = 0.80, NPV = 0.80 |
| Bejarano 2011 Val | Disability (EDSS) | NA | 96 (NA) | 4 | Validation; location | NA | ND | NA | Unclear how classification measures were produced for a numeric outcome; accuracy = 0.81 |
| Bergamaschi 2015 BREMSO MSSS Val | Disability (MSSS) | NA | 14,211 (3567) | NA | Validation; multiple (location, time); predictors dropped and different outcome | NA | ND | ND | Sensitivity = 0.36, specificity = 0.79 |
| De Brouwer 2021 | Disability (EDSS) | Neural network | 6682 (1114) | 46 | Cross‐validation | 0 | Calibration plot upon request | 0.66 (0.64 to 0.68)B | ND |
| de Groot 2009 dexterity | Disability (9HPT) | Logistic regression | 146 (46) | 9 | Bootstrap | 0 | Calibration plot, calibration slope 0.85 | 0.77 (0.69 to 0.86) | ND |
| de Groot 2009 walking | Disability (EDSS) | Logistic regression | 146 (37) | 7 | Bootstrap | 0 | Calibration plot, calibration slope 0.93 | 0.89 (0.83 to 0.95) | ND |
| Kuceyeski 2018 | Disability (cognitive ‐ SDMT) | Partial least squares regression | 60 (NA) | 10 | Unclear | 0 | Calibration plot | NA | NA |
| Law 2019 Ada | Disability (EDSS) | Boosting | 485 (115) | 13 | Cross‐validation | 0 | ND | 0.6 (0.54 to 0.66)B | Cutoff (0.527) identified by convex hull method, sensitivity = 53.0 (SD 4.7), specificity = 62.4 (SD 2.5), PPV = 30.5 (SD 1.6), NPV = 81.1 (1.9) |
| Law 2019 DT | Disability (EDSS) | Classification tree | 485 (115) | 13 | Cross‐validation | 0 | ND | 0.62 (0.56 to 0.68)B | Cutoff (0.537) identified by convex hull method, sensitivity = 58.3 (SD 4.6), specificity = 62.2 (SD 2.5), PPV = 32.4 (SD 2.0), NPV = 82.7 (SD 1.8) |
| Law 2019 RF | Disability (EDSS) | Random forest | 485 (115) | 13 | Cross‐validation | 0 | ND | 0.61 (0.55 to 0.67)B | Cutoff (0.531) identified by convex hull method, sensitivity = 59.1 (SD 4.6), specificity = 61.1 (SD 2.5), PPV = 32.1 (SD 2.1), NPV = 82.8 (SD 1.7) |
| Lejeune 2021 | Disability (EDSS) | Penalised regression | 186 (53) | 4 | Bootstrap | 1 | ND | 0.82 (0.73 to 0.91) | Cutoff = 0.5, PPV 0.73 (95% CI 0.53 to 0.92), NPV 0.70 (95% CI 0.50 to 0.88) |
| Lejeune 2021 Ext Val | Disability (EDSS) | NA | 175 (55) | NA | External validation; multiple (location, spectrum) | NA | Calibration plot, Hosmer‐Lemeshow test | 0.71 (0.62 to 0.80) | Cutoff = 0.5, PPV 0.83 (95% CI 0.76 to 0.92), NPV 0.74 (95% CI 0.67 to 0.81) |
| Malpas 2020 | Disability (EDSS) | Bayesian model averaging | 2403 (145) | 8 | Apparent | 1 | ND | 0.8 (0.75 to 0.84) | Full model: cutoff = 0.05, sensitivity = 0.78, specificity = 0.71, PPV = 0.15, NPV = 0.98. Reduced model: cutoff = 0.06, sensitivity = 0.72, specificity = 0.73, PPV = 0.15, NPV = 0.98 |
| Malpas 2020 Ext Val | Disability (EDSS) | NA | 556 (34) | NA | External validation; location | NA | ND | 0.75 (0.66 to 0.84) | Cutoff determined in development set (0.06), PPV = 0.15, NPV = 0.97 |
| Mandrioli 2008 | Disability (EDSS) | Logistic regression | 64 (26) | 2 | Apparent | 1 | ND | ND | Error = 0.0937, sensitivity = 0.8846, specificity = 0.9211, PPV = 0.8846, NPV = 0.9211 |
| Mandrioli 2008 Ext Val | Disability (EDSS) | NA | 65 (20) | NA | External validation; time | NA | ND | ND | Error = 0.1231, sensitivity = 0.8000, specificity = 0.9111, PPV = 0.8000, NPV = 0.9111 |
| Margaritella 2012 | Disability (EDSS) | Other regression | 58 (NA) | 22 | Apparent | 0 | Histogram of differences between measured and predicted values | NA | Percent predictions within ± 0.5 of observed = 0.72 |
| Montolio 2021 | Disability (EDSS) | Neural network | 82 (37) | 1 | Cross‐validation | 0 | ND | 0.82 (0.72 to 0.92)B | Accuracy = 0.817, sensitivity = 0.811, specificity = 0.822, PPV = 0.789 |
| Oprea 2020 | Disability (EDSS) | Logistic regression | 151 (ND) | 13 | Cross‐validation | 0 | ND | 0.82 NA | Accuracy = 0.7662, sensitivity = 0.7775, PPV = 0.8145, F1 = 0.7806 |
| Pinto 2020 severity 10 years | Disability (EDSS) | Support vector machine | 67 (30) | < 1 | Cross‐validation | 0 | ND | 0.85 (0.75 to 0.95)B | Sensitivity = 0.77 (0.13), specificity = 0.79 (0.09), F1 score = 0.72 (0.09), geometric mean = 0.78 (0.08) |
| Pinto 2020 severity 6 years | Disability (EDSS) | Support vector machine | 145 (38) | < 1 | Cross‐validation | 0 | ND | 0.89 (0.83 to 0.95)B | Sensitivity = 0.84 (0.11), specificity = 0.81 (0.05), F1 score = 0.53 (0.07), geometric mean = 0.82 (0.06) |
| Roca 2020 | Disability (EDSS) | ML combination | 1427 (NA) | 22A | Random split | 0 | Other: plot of MSE per EDSS category, MSE 2.21 (validation), 3 (test) | NA | NA |
| Rocca 2017 | Disability (EDSS) | Other regression | 49 (NA) | 2 | Cross‐validation | 0 | ND | NA | EDSS change precision within one point = 0.776 |
| Rovaris 2006 | Disability (EDSS) | Logistic regression | 52 (35) | 1 | Cross‐validation | 0 | ND | ND | Accuracy = 0.808, sensitivity = 31/35 = 0.89, specificity = 11/17 = 0.65 |
| Sombekke 2010 | Disability (MSSS) | Logistic regression | 605 (86) | 1 | Unclear | 0 | Hosmer‐Lemeshow test | 0.78 (0.75 to 0.84) | Sensitivity = 0.37, specificity = 0.953, LR+ = 7.9 |
| Szilasiova 2020 | Disability (EDSS) | Logistic regression | 85 (ND) | 4 | Apparent | 0 | ND | Unclear due to mismatch between the ROC curve and the reported statistics: 0.94 (95% CI 0.89 to 0.98) | Unclear because these values do not correspond to a point on the plot: sensitivity = 0.94, specificity = 0.89 |
| Tommasin 2021 | Disability (EDSS) | Random forest | 163 (58) | 4 | Cross‐validation | 0 | ND | 0.92 (0.88 to 0.96)B | Accuracy = 0.92, sensitivity = 0.92, specificity = 0.91 |
| Tousignant 2019 | Disability (EDSS) | Neural network | 1083 (103) | NA | Cross‐validation | 0 | ND | 0.7 (0.64 to 0.76)B | ND |
| Weinshenker 1991 M3 | Disability (DSS) | Survival analysis | 1060 (498) | 38 | None | 1 | ND | ND | ND |
| Weinshenker 1996 M3 Ext Val | Disability (DSS) | NA | 259 (66) | NA | External validation; location | NA | ND | ND | ND |
| Weinshenker 1996 short‐term | Disability (EDSS) | Logistic regression | 174 (28) | 9 | Apparent | 0 | ND | ND | Cutoff = 0.5: accuracy = 0.75, sensitivity = 0.21, specificity = 0.93; cutoff = 0.3: accuracy = 0.67, sensitivity = 0.54, specificity = 0.72 |
| Yperman 2020 | Disability (EDSS) | Random forest | 2502 (275) | < 1 | Cross‐validation | 0 | Calibration plot upon request | 0.75 (0.71 to 0.79)B | ND |
| Zhao 2020 LGBM All | Disability (EDSS) | Boosting | 724 (165) | 1 | Cross‐validation | 0 | ND | 0.78 (0.74 to 0.82)B | Cutoff = 0.4 (results also reported for 0.5, 0.45, 0.35, and 0.3), accuracy = 0.77, sensitivity = 0.58, specificity = 0.82 |
| Zhao 2020 LGBM Common | Disability (EDSS) | Boosting | 724 (165) | 2 | Cross‐validation | 1 (unclear if refit) | ND | 0.76 (0.72 to 0.8)B | Cutoff = 0.4 (results also reported for 0.5, 0.45, 0.35, and 0.3), accuracy = 0.64, sensitivity = 0.75, specificity = 0.61 |
| Zhao 2020 LGBM Common Val | Disability (EDSS) | NA | 400 (130) | NA | Validation; location | NA | ND | 0.82 (0.78 to 0.86)B | Cutoff = 0.4 (results also reported for 0.5, 0.45, 0.35, and 0.3), accuracy = 0.73, sensitivity = 0.73, specificity = 0.73 |
| Zhao 2020 XGB All | Disability (EDSS) | Boosting | 724 (165) | 1 | Cross‐validation | 0 | ND | 0.78 (0.74 to 0.82)B | Cutoff = 0.4 (results also reported for 0.5, 0.45, 0.35, and 0.3), accuracy = 0.74, sensitivity = 0.68, specificity = 0.76 |
| Zhao 2020 XGB Common | Disability (EDSS) | Boosting | 724 (165) | 2 | Cross‐validation | 1 (unclear if refit) | ND | 0.76 (0.72 to 0.8)B | Cutoff = 0.4 (results also reported for 0.5, 0.45, 0.35, and 0.3), accuracy = 0.65, sensitivity = 0.75, specificity = 0.62 |
| Zhao 2020 XGB Common Val | Disability (EDSS) | NA | 400 (130) | NA | Validation; location | NA | ND | 0.82 (0.78 to 0.86)B | Cutoff = 0.4 (results also reported for 0.5, 0.45, 0.35, and 0.3), accuracy = 0.68, sensitivity = 0.85, specificity = 0.60 |
| Gurevich 2009 FLP | Relapse | Support vector machine | 94 (19) | < 1 | Cross‐validation | 1 | ND | ND | Categories determined in data, error = 0.079 |
| Gurevich 2009 FLP Ext Val | Relapse | NA | 10 (ND) | NA | External validation; ND | NA | ND | ND | Error = 0.25 (but 10 patients reported) |
| Gurevich 2009 FTP | Relapse | Other regression | 40 (NA) | < 1 | Cross‐validation | 0 | Calibration plot | NA | Prediction more than 50 days from observed = 0.345 |
| Sormani 2007 | Relapse | Survival analysis | 539 (270) | 22 | Apparent | 1 | ND | ND | ND |
| Sormani 2007 Ext Val | Relapse | NA | 117 (ND) | NA | External validation; spectrum | NA | ND | ND | ND |
| Vukusic 2004 | Relapse | Logistic regression | 223 (63) | 6 | Apparent | 0 | ND | 0.72 (0.64 to 0.8)B | Cutoff = 0.5, accuracy = 160/223 = 0.72 |
| Ye 2020 gene signature | Relapse | Penalised regression | 94 (64) | < 1 | Random split | 0 | ND | 0.73 (0.61 to 0.85)B | ND |
| Ye 2020 nomogram | Relapse | Survival analysis | 94 (64) | < 1 | Random split | 0 | ND | 0.59 (0.47 to 0.71)B | ND |
| Aghdam 2021 | Conversion to definite MS | Classification tree | 277 (117) | 12 | Random split | 0 | ND | ND | Accuracy = 0.74, sensitivity = 0.71, specificity = 0.76, PPV = 0.65, NPV = 0.79 |
| Bendfeldt 2019 linear placebo | Conversion to definite MS | Support vector machine | 69 (25) | NA | Cross‐validation | 0 | ND | ND | Accuracy = 0.712 (95% CI 0.707 to 0.716), sensitivity = 0.64, specificity = 0.783 |
| Bendfeldt 2019 M7 placebo | Conversion to definite MS | Support vector machine | 61 (22) | < 1 | Cross‐validation | 0 | ND | ND | Balanced accuracy = 0.676 (95% CI 0.559 to 0.793) |
| Bendfeldt 2019 M9 IFN | Conversion to definite MS | Support vector machine | 99 (49) | < 1 | Cross‐validation | 0 | ND | ND | Balanced accuracy = 0.704 (95% CI 0.614 to 0.794) |
| Borras 2016 | Conversion to definite MS | Logistic regression | 49 (24) | 1 | Unclear | 0 | ND | 0.79 (0.65 to 0.93)B | Sensitivity = 0.84, specificity = 0.83 |
| Gout 2011 | Conversion to definite MS | Survival analysis | 208 (141) | 9 | Apparent | 0 | ND | ND | ND |
| Martinelli 2017 | Conversion to definite MS | Survival analysis | 243 (108) | 4 | Apparent | 0 | Other: Gronnesby and Borgan statistic | 0.7 (0.64 to 0.75) | Categories defined as low: 0% to 33.3%, moderate: 33.3% to 66.7%, high: 66.6% to 100%, net reclassification improvement = 0.3 |
| Olesen 2019 candidate | Conversion to definite MS | Logistic regression | 33 (16) | 1 | Bootstrap | 0 | Calibration plot, Hosmer‐Lemeshow test | 0.89 (0.77 to 1.00) | ND |
| Olesen 2019 routine | Conversion to definite MS | Logistic regression | 38 (16) | 4 | Bootstrap | 0 | Calibration plot, Hosmer‐Lemeshow test | 0.86 (0.74 to 0.98) | ND |
| Runia 2014 | Conversion to definite MS | Survival analysis | 431 (109) | 7 | Bootstrap | 0 | ND | 0.66 (0.6 to 0.72)B | ND |
| Spelman 2017 | Conversion to definite MS | Survival analysis | 3296 (1953) | 122 | Bootstrap | 0 | Calibration plot | 0.81 (0.79 to 0.83)B | ND |
| Wottschel 2015 1 year | Conversion to definite MS | Support vector machine | 74 (22) | 2 | Cross‐validation | 0 | ND | ND | Accuracy = 0.714 (95% CI 0.58 to 0.84), sensitivity = 0.77, specificity = 0.66, PPV = 0.70, NPV = 0.74 |
| Wottschel 2015 3 years | Conversion to definite MS | Support vector machine | 70 (31) | 2 | Cross‐validation | 0 | ND | ND | Accuracy = 0.68 (95% CI 0.61 to 0.73), sensitivity = 0.60, specificity = 0.76, PPV = 0.72, NPV = 0.65 |
| Wottschel 2019 | Conversion to definite MS | Support vector machine | 400 (91) | < 1 | Cross‐validation | 0 | ND | ND | 2‐fold CV: accuracy = 0.648 (95% CI 0.646 to 0.651), sensitivity = 0.641, specificity = 0.656; also reported for 5‐fold, 10‐fold CV, and LOOCV |
| Yoo 2019 | Conversion to definite MS | Neural network | 140 (80) | 7A | Cross‐validation | 0 | ND | 0.75 (0.67 to 0.83)B | Accuracy = 0.75 (SD = 0.113), sensitivity = 0.787 (SD = 0.122), specificity = 0.704 (SD = 0.154) |
| Zakharov 2013 | Conversion to definite MS | Logistic regression | 102 (23) | 12 | Apparent | 0 | ND | ND | Sensitivity = 0.727, specificity = 0.345 |
| Zhang 2019 | Conversion to definite MS | Random forest | 84 (66) | 1 | Cross‐validation | 0 | ND | ND | Accuracy = 0.85 (95% CI 0.75 to 0.91), sensitivity = 0.94 (95% CI 0.85 to 0.98), specificity = 0.50 (95% CI 0.26 to 0.74), PPV = 0.87 (95% CI 0.81 to 0.91), NPV = 0.69 (95% CI 0.44 to 0.87), DOR = 15.50 (95% CI 3.93 to 60.98), balanced accuracy = 0.72 (posterior probability interval 0.60 to 0.82) |
| Bergamaschi 2001 BREMS | Conversion to progressive MS | Survival analysis | 186 (34) | 4 | None | 2, simplified: 2 | ND | ND | ND |
| Bergamaschi 2007 BREMS Ext Val | Conversion to progressive MS | NA | 535 (87) | NA | External validation; location | NA | ND | ND | Cutoff at 95th percentile (score ≥ 2.0): sensitivity = 0.17, specificity = 0.99, PPV = 0.86, NPV = 0.83; cutoff at 5th percentile (score ≤ 0.63): sensitivity = 0.08, specificity = 1.00, PPV = 1.00, NPV = 0.18 (the event is defined as conversion to secondary progression for the 95th percentile cutoff and as non‐conversion for the 5th percentile cutoff) |
| Bergamaschi 2015 BREMS Ext Val | Conversion to progressive MS | NA | 1131 (ND) | NA | External validation; multiple (location, time) | NA | ND | ND | Sensitivity = 0.35, specificity = 0.80 |
| Bergamaschi 2015 BREMSO SP Val | Conversion to progressive MS | NA | 14,211 (1954) | NA | Validation; multiple (location, time); predictors dropped | NA | ND | ND | Sensitivity = 0.28, specificity = 0.76 |
| Brichetto 2020 | Conversion to progressive MS | ML combination | 810 (1451) | 10 | Unclear | 0 | ND | ND | Accuracy FCA = 0.826, CCA = 0.860 |
| Calabrese 2013 | Conversion to progressive MS | Logistic regression | 334 (66) | 4 | Cross‐validation | 1 | ND | ND | Accuracy = 0.928, sensitivity = 0.878, specificity = 0.94 |
| Calabrese 2013 Ext Val | Conversion to progressive MS | NA | 83 (19) | NA | External validation; time | NA | ND | ND | Accuracy = 0.916, sensitivity = 0.842, specificity = 0.937 |
| Manouchehrinia 2019 | Conversion to progressive MS | Survival analysis | 8825 (1488) | 74 | Bootstrap | 3 | Calibration plot | 0.84 (0.83 to 0.85) | ND |
| Manouchehrinia 2019 Ext Val 1 | Conversion to progressive MS | NA | 3967 (888) | NA | External validation; multiple (location, time, spectrum) | NA | ND | 0.77 (0.76 to 0.78) | ND |
| Manouchehrinia 2019 Ext Val 2 | Conversion to progressive MS | NA | 175 (26) | NA | External validation; multiple (location, time, spectrum) | NA | ND | 0.77 (0.70 to 0.85) | ND |
| Manouchehrinia 2019 Ext Val 3 | Conversion to progressive MS | NA | 2355 (126) | NA | External validation; multiple (location, time, spectrum) | NA | ND | 0.87 (0.84 to 0.89) | ND |
| Misicka 2020 10 years | Conversion to progressive MS | Survival analysis | 1166 (55) | 2 | Apparent | 0 | ND | ND | ND |
| Misicka 2020 20 years | Conversion to progressive MS | Survival analysis | 1166 (128) | 4 | Apparent | 0 | ND | ND | ND |
| Misicka 2020 ever | Conversion to progressive MS | Survival analysis | 1166 (177) | 5 | Apparent | 0 | ND | ND | ND |
| Pinto 2020 SP | Conversion to progressive MS | Support vector machine | 187 (21) | < 1 | Cross‐validation | 0 | ND | 0.86 (0.78 to 0.94)B | Sensitivity = 0.76 (0.14), specificity = 0.77 (0.05), F1 score = 0.20 (0.05), geometric mean = 0.76 (0.08) |
| Pisani 2021 | Conversion to progressive MS | Random survival forest | 262 (69) | 5 | Random split | 0 | ND | Reported for RF, not final model | Cutoff = 17.7, accuracy = 0.88 (95% CI 0.75 to 0.96), sensitivity = 0.92 (95% CI 0.70 to 1.00), specificity = 0.87 (95% CI 0.70 to 0.96), PPV = 0.75 (95% CI 0.48 to 0.93), NPV = 0.96 (95% CI 0.81 to 1.00) from evaluation of final tool using random split |
| Seccia 2020 180 days | Conversion to progressive MS | Neural network | 1515 (207) | 10 | Random split | 0 | ND | ND | Cutoff = 0.5, accuracy = 0.98, sensitivity = 0.385, specificity = 0.988, PPV = 0.308 |
| Seccia 2020 360 days | Conversion to progressive MS | Neural network | 1449 (207) | 10 | Random split | 0 | ND | ND | Cutoff = 0.5, accuracy = 0.975, sensitivity = 0.50, specificity = 0.982, PPV = 0.295 |
| Seccia 2020 720 days | Conversion to progressive MS | Neural network | 1375 (207) | 10 | Random split | 0 | ND | ND | Cutoff = 0.5, accuracy = 0.98, sensitivity = 0.673, specificity = 0.985, PPV = 0.427 |
| Skoog 2014 | Conversion to progressive MS | Survival analysis | 157 (118) | 8 | Apparent | 1 | O:E table | ND | ND |
| Skoog 2019 Ext Val | Conversion to progressive MS | NA | 145 (54) | NA | External validation; multiple (location, time) | NA | Calibration plot, O:E table, O:E 0.599 | ND | ND |
| Skoog 2019 Val | Conversion to progressive MS | NA | 144 (100) | NA | Apparent validation in new publication, some participants from the development cohort excluded | NA | Calibration plot, O:E table, O:E 0.829 | ND | ND |
| Tacchella 2018 180 days | Conversion to progressive MS | Random forest | 527 (65) | 1 | Cross‐validation | 0 | ND | 0.71 (0.66 to 0.76) | ND |
| Tacchella 2018 360 days | Conversion to progressive MS | Random forest | 527 (125) | 3 | Cross‐validation | 0 | ND | 0.67 (0.62 to 0.71) | ND |
| Tacchella 2018 720 days | Conversion to progressive MS | Random forest | 527 (211) | 5 | Cross‐validation | 0 | ND | 0.68 (0.64 to 0.72) | ND |
| Vasconcelos 2020 | Conversion to progressive MS | Survival analysis | 287 (88) | 11 | Apparent | 1 | Other: events per score level | ND | ND |
| Vasconcelos 2020 Ext Val | Conversion to progressive MS | NA | 142 (31) | NA | External validation; time | NA | O:E table (unclear), Hosmer‐Lemeshow test | ND | ND |
| Ahuja 2021 | Composite (relapse) | Penalised regression | 1435 (ND) | 1 | None | 1 | ND | ND | ND |
| Ahuja 2021 Ext Val | Composite (relapse) | NA | 186 (ND) | NA | External validation; spectrum | NA | Plots comparing observed and predicted relapse proportions stratified by disease duration and age separately | 0.71 (0.69 to 0.71) | Sensitivity = 0.499, specificity = 0.719, PPV = 0.223, NPV = 0.900, F1 = 0.307 |
| de Groot 2009 cognitive | Composite (cognitive tests) | Logistic regression | 146 (44) | 9 | Bootstrap | 0 | Calibration plot, calibration slope 0.88 | 0.74 (0.65 to 0.83) | ND |
| Kosa 2022 | Composite (EDSS, SNRS, T25FW, NDH‐9HPT) | Random forest | 227 (NA) | < 1 | Random split | 0 | Calibration plot | NA | NA |
| Pellegrini 2019 | Composite (EDSS, T25FW, 9HPT, PASAT, VFT) | Survival analysis | 1582 (434) | 19 | Bootstrap | 0 | Calibration slope 1 year: 1.10 (bootstrap = 1.08, SE 0.17), 2 years: 1.00 (bootstrap = 0.97, SE 0.15) | 0.59 (0.57 to 0.61)B | ND |
9HPT: 9‐hole peg test; AUC: area under the curve; Ada: adaptive boosting; BREMS: Bayesian Risk Estimate for Multiple Sclerosis; BREMSO: Bayesian Risk Estimate for Multiple Sclerosis at Onset; CI: confidence interval; Dev: development; DSS: Disability Status Scale; DT: decision tree; EDSS: Expanded Disability Status Scale; EPV: events per variable; Ext: external; FLP: first level predictor; FTP: fine tuning predictor; IFN: interferon; LGBM: light gradient boosting machine; LOOCV: leave‐one‐out cross‐validation; MS: multiple sclerosis; MSE: mean squared error; MSSS: multiple sclerosis severity score; NA: not applicable; ND: no data available; NDH‐9HPT: non‐dominant hand 9‐hole peg test; NPV: negative predictive value; O:E: observed to expected ratio; PASAT: Paced Auditory Serial Addition Test; PPV: positive predictive value; RF: random forest; ROC: receiver operating characteristic; SD: standard deviation; SE: standard error; SP: secondary progressive; T25FW: timed 25‐foot walk; Val: validation; VFT: visual function test; XGB: extreme gradient boosting
A: Events per variable was computed using only tabular predictors, but non‐tabular predictors were also considered. B: Confidence interval was not reported but was computed from reported information.
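Several columns in the table above report threshold-based classification metrics (sensitivity, specificity, PPV, NPV) and the calibration measure O:E (observed to expected ratio). As an illustrative aside only (not part of the review), these quantities follow directly from 2×2 confusion-matrix counts and from predicted risks; the sketch below uses made-up counts, not data from any included study:

```python
def classification_metrics(tp, fp, fn, tn):
    """Threshold-based metrics as reported in the table,
    from 2x2 confusion-matrix counts (hypothetical data)."""
    return {
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
    }


def oe_ratio(observed_events, predicted_risks):
    """O:E ratio: observed event count divided by the sum of
    model-predicted risks (the expected event count)."""
    return observed_events / sum(predicted_risks)


# Hypothetical example: 100 events, 100 non-events
m = classification_metrics(tp=17, fp=1, fn=83, tn=99)
# m["sensitivity"] == 0.17, m["specificity"] == 0.99

# A model predicting 30% risk for 300 people expects 90 events;
# observing 54 gives O:E = 0.6 (overprediction)
oe = oe_ratio(observed_events=54, predicted_risks=[0.3] * 300)
```

An O:E ratio below 1, as in the Skoog 2019 rows, indicates that the model predicts more events than are observed in the validation cohort.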
6. Final model and presentation.
| Model | Outcome | Definition | Timing | Predictors | Presentation |
| Agosta 2006 | Disability (EDSS) | Clinical worsening confirmed after a 3‐month, relapse‐free period (EDSS increase (for EDSS baseline) ≥ 1.0 (< 6.0), ≥ 0.5 (≥ 6.0)) | Follow‐up median: 8 years, mean: 7.7 years | Baseline GM histogram peak height, average lesion MTR percentage change after 12 months, follow‐up duration | Regression coefficients without the intercept or coefficient for 'adjustment for follow‐up duration' |
| Bejarano 2011 | Disability (EDSS) | Change in EDSS | 2 years | Age, worst central motor conduction time of both arms, worst central motor conduction time of both legs, at least 1 abnormal MEP, motor score of EDSS at baseline | ND |
| De Brouwer 2021 | Disability (EDSS) | Disability progression confirmed at least 6 months later (EDSS increase (baseline EDSS) ≥ 1.5 (0), 1 (≤ 5.5), or 0.5 (> 5.5)) | 2 years | Gender, age at onset, MS course at time t = 0 (RRMS, SPMS, PPMS, or CIS), disease duration at time t = 0, EDSS at t = 0, last used DMT at t = 0 (none, interferons, natalizumab, fingolimod, teriflunomide, dimethyl‐fumarate, glatiramer, alemtuzumab, rituximab, cladribine, ocrelizumab, other (contains stem cells therapy, siponimod and daclizumab)), EDSS trajectories | ND |
| de Groot 2009 dexterity | Disability (9HPT) | Impaired dexterity (abnormal score (mean – 1.96 SD, healthy Dutch reference population) for the 9HPT) | 3 years | How well can you use your hands?, impairment of sensory tract, impairment of pyramidal tract, impairment of cerebellar tract, T2‐weighted infratentorial lesion load | Score chart |
| de Groot 2009 walking | Disability (EDSS) | Inability to walk 500 m (EDSS ≥ 4) | 3 years | How well can you walk?, impairment of cerebellar tract, number of lesions in spinal cord | Score chart |
| Kuceyeski 2018 | Disability (cognitive ‐ SDMT) | Processing speed measured by Symbol Digits Modality Test | Mean (SD): 28.6 months (10.3 months) | Age, sex, disease duration, treatment duration, baseline SDMT, baseline EDSS, regional GM atrophy (86 regions), NEMO pairwise disconnection measures (610 considered), number of months between time points | ND |
| Law 2019 Ada | Disability (EDSS) | Confirmed disability progression sustained for 6 months: EDSS increase (for EDSS baseline) ≥ 1.0 (≤ 5.5), ≥ 0.5 (≥ 6) | 2 years | T25FW, 9HPT, PASAT, EDSS, disease duration, age, sex, T2LV, BPF | ND |
| Law 2019 DT | Disability (EDSS) | Confirmed disability progression sustained for 6 months: EDSS increase (for EDSS baseline) ≥ 1.0 (≤ 5.5), ≥ 0.5 (≥ 6) | 2 years | T25FW, 9HPT, PASAT, EDSS, disease duration, age, sex, T2LV, BPF | ND |
| Law 2019 RF | Disability (EDSS) | Confirmed disability progression sustained for 6 months: EDSS increase (for EDSS baseline) ≥ 1.0 (≤ 5.5), ≥ 0.5 (≥ 6) | 2 years | T25FW, 9HPT, PASAT, EDSS, disease duration, age, sex, T2LV, BPF | ND |
| Lejeune 2021 | Disability (EDSS) | Residual disability after relapse (EDSS increase ≥ 1) | 6 months | Increased EDSS during relapse, pre‐relapse EDSS at 0, age, proprioceptive ataxia, subjective sensory disorder, disease duration | Web app at https://shiny.idbc.fr/SMILE/ |
| Malpas 2020 | Disability (EDSS) | Aggressive MS (EDSS ≥ 6 reached within 10 years of symptom onset, sustained over ≥ 6 months, and sustained until end of follow‐up) | 10 years, time from onset to aggressive disease mean (SD, range): 6.05 years (2.79 years, 0 years to 9.89 years) | Onset age, median EDSS in first year, pyramidal signs | Relative risk of aggressive disease by number of positive signs in simplified model (dichotomised based on individual optimal thresholds) |
| Mandrioli 2008 | Disability (EDSS) | Severe MS (EDSS ≥ 4 by 10 years disease duration, EDSS progression confirmed in 2 consecutive examinations) | Follow‐up from onset mean (SD): BMS 16.03 years (0.92 years), SMS 13.62 years (0.80 years), time between unclear | CSF IgM OB presence, motor symptoms at onset, sensory symptoms at onset, time to second relapse in months | Full regression model |
| Margaritella 2012 | Disability (EDSS) | EDSS | 1 year after assessment of the included mEPS and EDSS predictors | EDSS, mEPS, age at onset, gender, benign course, PP course | Full regression model |
| Montolio 2021 | Disability (EDSS) | Worsening (EDSS increase ≥ 1) | 10 years (8 years) from baseline (last predictors) | Disease duration, relapse in preceding year, EDSS, temporal RNFL thickness, superior RNFL thickness | List of selected predictors |
| Oprea 2020 | Disability (EDSS) | Keeping EDSS score ≤ threshold (chosen threshold: 2.5) | ND | Gender, age at diagnosis, age, EDSS at onset, disease duration, number of treatments | ND |
| Pinto 2020 severity 10 years | Disability (EDSS) | Severe disease (EDSS > 3 based on the mean EDSS from all clinical visits in prediction horizon year) | 10 years from baseline (with predictors at 5 years from baseline) | ND | ND |
| Pinto 2020 severity 6 years | Disability (EDSS) | Severe disease (EDSS > 3 based on the mean EDSS from all clinical visits in prediction horizon year) | 6 years from baseline (with predictors at 2 years from baseline) | ND | ND |
| Roca 2020 | Disability (EDSS) | EDSS | 2 years from the initial imaging | Unstructured: FLAIR images, lesion masks from white matter hyperintensities segmentation from FLAIR images, structured: 60 tracts of interest from the ICBM‐DTI 81 white matter labels and sensorimotor tracts atlases in MNI space, whole‐brain lesion load, volume of the lateral ventricles, age, gender, 3D/2D nature of FLAIR sequence | ND |
| Rocca 2017 | Disability (EDSS) | EDSS change from baseline confirmed after 3 months | 15 years, median (IQR): 15.1 years (13.9 years to 15.4 years) | Baseline EDSS, 15‐month EDSS change, 15‐month new T1 hypointense lesions, percentage brain volume change, baseline grey matter mean diffusivity | Regression coefficients without the intercept |
| Rovaris 2006 | Disability (EDSS) | Clinical worsening confirmed after 3 months (EDSS increase (for baseline EDSS) ≥ 1.0 (< 6.0), ≥ 0.5 (≥ 6.0)) | Follow‐up median (range): 56.0 months (35 months to 63 months) | Baseline EDSS, grey matter mean diffusivity, follow‐up | Regression coefficients without intercept and follow‐up time |
| Sombekke 2010 | Disability (MSSS) | MSSS ≥ 2.5 | ND | Age at onset, male gender, progressive onset type, NOS2 level, PITPNC1 level, IL2 level, CCL5 level, IL1RN level, PNMT level | Regression coefficients without intercept |
| Szilasiova 2020 | Disability (EDSS) | EDSS ≥ 5.0 | 15 years | Sex, age, MS form, EDSS, MS duration, P300 latency (ms) | Full regression model |
| Tommasin 2021 | Disability (EDSS) | Disability progression (EDSS increase (for baseline EDSS) ≥ 1.5 (0), 1 (≤ 5.5), or 0.5 (> 5.5)) | Follow‐up mean (SD, range): 3.93 years (0.95 years, 2 years to 6 years) | T2 lesion load, cerebellar volume, thalamic volume, fractional anisotropy of normal appearing WM | List of predictors (model selected) |
| Tousignant 2019 | Disability (EDSS) | EDSS increase (for baseline EDSS) ≥ 1.5 (0), ≥ 1 (0.5 to 5.5), ≥ 0.5 (≥ 6) sustained for ≥ 12 weeks | 1 year | MRI channels: volumes from T1‐weighted pre‐contrast, T1‐weighted post‐contrast, T2w, proton density‐weighted, FLAIR; T2 lesion masks; Gadolinium enhanced lesion masks | List of predictors (no selection) |
| Weinshenker 1991 M3 | Disability (DSS) | Time to reach DSS 6 (EDSS 6.0 or 6.5) | Follow‐up for 12 years | Age at onset, seen at MS onset, motor (insidious), brainstem, cerebellar, cerebral, pyramidal | Full regression model |
| Weinshenker 1996 short‐term | Disability (EDSS) | Short‐term progression (change in EDSS) | Definition 1 year to 3 years, follow‐up summarised for 2 years | Duration, EDSS, progression index, predicted time to DSS 6 from model 1, follow‐up | Full regression model |
| Yperman 2020 | Disability (EDSS) | Disability progression (EDSS increase (for baseline EDSS) ≥ 1.0 (≤ 5.5), ≥ 0.5 (> 5.5)) | Baseline to outcome EDSS median (IQR): 1.98 years (1.84 years to 2.08 years) (similar for baseline MEP) | Selected predictors unclear, at least latencies, EDSS at T0, age | ND |
| Zhao 2020 LGBM All | Disability (EDSS) | Worsening: EDSS increase ≥ 1.5 | 5 years | Unclear if it is the complete list. Common: age, ethnicity, longitudinal features, attack previous 6 months, attack previous 2 years, bowel–bladder function, brainstem function, cerebellar function, gender, race, disease category, EDSS, lesion volume, mental function, pyramidal function, smoking history, sensory function, total GD, visual function, walk 25 ft time | ND |
| Zhao 2020 LGBM Common | Disability (EDSS) | Worsening: EDSS increase ≥ 1.5 | 5 years | Unclear if it is the complete list. Common: age, ethnicity, longitudinal features, attack previous 6 months, attack previous 2 years, bowel–bladder function, brainstem function, cerebellar function, gender, race, disease category, EDSS, lesion volume, mental function, pyramidal function, smoking history, sensory function, total GD, visual function, walk 25 ft time | ND |
| Zhao 2020 XGB All | Disability (EDSS) | Worsening: EDSS increase ≥ 1.5 | 5 years | Unclear if it is the complete list. Common: age, ethnicity, longitudinal features, attack previous 6 months, attack previous 2 years, bowel–bladder function, brainstem function, cerebellar function, gender, race, disease category, EDSS, lesion volume, mental function, pyramidal function, smoking history, sensory function, total GD, visual function, walk 25 ft time | ND |
| Zhao 2020 XGB Common | Disability (EDSS) | Worsening: EDSS increase ≥ 1.5 | 5 years | Unclear if it is the complete list. Common: age, ethnicity, longitudinal features, attack previous 6 months, attack previous 2 years, bowel–bladder function, brainstem function, cerebellar function, gender, race, disease category, EDSS, lesion volume, mental function, pyramidal function, smoking history, sensory function, total GD, visual function, walk 25 ft time | ND |
| Gurevich 2009 FLP | Relapse | Time from baseline gene expression analysis until next relapse (3 categories: < 500 days, 500 days to 1264 days, > 1264 days) | Unclear reporting | FLJ10201, PDCD2, IL24, MEFV, CA2, SLM1, CLCN4, SMARCA1, TRIM22, TGFB2 | List of selected genes |
| Gurevich 2009 FTP | Relapse | Time from baseline gene expression analysis to next acute relapse (given this time < 500 days) | Unclear reporting | KIAA1043, LOC51145, PPFIA1, MGC8685, DNCH2, PCOLCE2, FPRL1, G3BP, RHBG | List of selected genes |
| Sormani 2007 | Relapse | Time to first relapse (≥ 1 neurological symptoms (causing EDSS increase ≥ 0.5 or 1 grade in the score of 2 or more functional systems or 2 grades in 1 functional system) lasting at least 48 hours and preceded by a relatively stable or improving neurological state in the prior 30 days) | Follow‐up median (range): 14 months (0.4 months to 16 months), time to outcome from study entry mean (SD): 47 weeks (0.9 weeks) | Previous 2 years' relapses, number of enhancing lesions | Regression model formula with survival probability for 6 months and 1 year |
| Vukusic 2004 | Relapse | Postpartum relapse | 3 months after delivery | Number of relapses in pre‐pregnancy year, Number of relapses during pregnancy, MS duration | Full regression model |
| Ye 2020 gene signature | Relapse | Relapse‐free survival | Follow‐up mean (SD): 1.97 years (1.3 years) | FTH1, GBP2, MYL6, NCOA4, SRP9 | Regression coefficients without baseline hazard |
| Ye 2020 nomogram | Relapse | Relapse‐free survival | Follow‐up mean (SD): 1.97 years (1.3 years) | Age, gender, disease type, DMT, risk score | Nomogram |
| Aghdam 2021 | Conversion to definite MS | McDonald 2010 | Follow‐up mean (SD): 5.30 years (2.94 years), no precise time point of measurement | Presence of plaque in MRI, history of previous optic neuritis attack, type of optic neuritis, gender | Decision tree |
| Bendfeldt 2019 linear placebo | Conversion to definite MS | Modified Poser diagnosis confirmed by a central committee (relapse with clinical evidence of at least one CNS lesion (distinct from lesion responsible for CIS presentation if monofocal) or EDSS increase ≥ 1.5 reaching total EDSS ≥ 2.5 and confirmed 3 months later) | 2 years | Cortical grey matter segmentation masks, age, sex, scanner | ND |
| Bendfeldt 2019 M7 placebo | Conversion to definite MS | Modified Poser diagnosis confirmed by a central committee (relapse with clinical evidence of at least one CNS lesion (distinct from lesion responsible for CIS presentation if monofocal) or EDSS increase ≥ 1.5 reaching total EDSS ≥ 2.5 and confirmed 3 months later) | 2 years | Age, sex, EDSS, GM volume ratio, whole brain summaries (total, mean, median, minimum, maximum, SD) of lesion volume, surface area, and mean breadth, unclear if 3 more imaging predictors | ND |
| Bendfeldt 2019 M9 IFN | Conversion to definite MS | Modified Poser diagnosis confirmed by a central committee (relapse with clinical evidence of at least one CNS lesion (distinct from lesion responsible for CIS presentation if monofocal) or EDSS increase ≥ 1.5 reaching total EDSS ≥ 2.5 and confirmed 3 months later) | 2 years | Age, sex, EDSS, GM volume ratio, lesion count, whole brain summaries (total, mean, SD) of lesion volume, surface area, and mean breadth, Euler‐Poincare characteristic | ND |
| Borras 2016 | Conversion to definite MS | Presence of IgG oligoclonal bands and an abnormal brain MRI at baseline (2, 3, or 4 Barkhof criteria) | Follow‐up median (SD): CIS 3.25 years (1.32 years), CDMS 4.08 years (2.48 years) | CH3L1, CNDP1 | Heat maps |
| Gout 2011 | Conversion to definite MS | Time to Poser 1983 diagnosis | Follow‐up median (range): 3.5 years (1.0 year to 12.7 years), time to outcome in those who experience it median (range): 16.6 months (1.1 months to 112.5 months) | Age (≤ 31 years), 3 to 4 positive MR Barkhof criteria, CSF white blood cell count > 4 per cubic millimetre | Sum score: 1 if age at onset ≤ 31 years, 3 if 3 to 4 Barkhof criteria present, 1 if > 4 white blood cells per cubic millimetre in CSF |
| Martinelli 2017 | Conversion to definite MS | Time to Poser 1983 diagnosis | Follow‐up median (IQR): 7.3 years (3.5 years to 10.2 years) | 2010 DIS criteria fulfilled, 2010 DIT criteria fulfilled, age, T1 lesions, CSF oligoclonal bands (steroid use in 4 weeks prior to study, DMT use during follow‐up) | List of selected predictors |
| Olesen 2019 candidate | Conversion to definite MS | McDonald 2010 | Follow‐up median (range): 29.6 months (19 months to 41 months) | IL‐10, NF‐L, CXCL13 | Nomogram |
| Olesen 2019 routine | Conversion to definite MS | McDonald 2010 | Follow‐up median (range): 29.6 months (19 months to 41 months) | OCB, leukocytes, IgG index | Nomogram |
| Runia 2014 | Conversion to definite MS | Time from start of first symptoms to CDMS (Poser 1983) | Unclear | 2010 DIS + DIT criteria, corpus callosum lesions, oligoclonal bands, fatigue, abnormal MRI | Unweighted sum score from 0 to 5 |
| Spelman 2017 | Conversion to definite MS | Time to first relapse following CIS (Poser 1983) | Follow‐up median (IQR): 1.92 years (0.90 years to 3.71 years) | Sex, age, EDSS, first symptom location, T2 infratentorial lesions, T2 periventricular lesions, OCB in CSF | Nomogram for 1‐year outcomes (nomograms for 6‐month, 2, 3, 4, and 5‐year outcomes) |
| Wottschel 2015 1 year | Conversion to definite MS | Occurrence of a second clinical attack attributable to demyelination of more than 24 hours in duration and at least 4 weeks from the initial attack | 1 year | Type of presentation, gender, lesion load | List of selected predictors and kernel degree |
| Wottschel 2015 3 year | Conversion to definite MS | Occurrence of a second clinical attack attributable to demyelination of more than 24 hours in duration and at least 4 weeks from the initial attack | 3 years | Lesion count, average lesion PD intensity, average distance of lesions from the centre of the brain, shortest horizontal distance of a lesion from the vertical axis, age, EDSS at onset | List of selected predictors and kernel degree |
| Wottschel 2019 | Conversion to definite MS | Occurrence of a second clinical episode | 1 year | Type of CIS, WM lesion load ‐ whole brain, WM lesion load ‐ frontal, WM lesion load ‐ limbic, WM lesion load ‐ temporal, WM lesion load ‐ dGM, WM lesion load ‐ WM, GM ‐ cerebellum, GM ‐ thalamus, GM ‐ frontal operculum, GM ‐ middle cingulate gyrus, GM ‐ precentral gyrus medial segment, GM ‐ posterior cingulate gyrus, GM ‐ praecuneus, GM ‐ parietal operculum, GM ‐ post‐central gyrus, GM ‐ planum polare, GM ‐ subcallosal area, GM ‐ supplementary motor cortex, GM ‐ superior occipital gyrus, cortical thickness ‐ central operculum, cortical thickness ‐ cuneus, cortical thickness ‐ fusiform gyrus, cortical thickness ‐ inferior temporal gyrus, cortical thickness ‐ middle occipital gyrus, cortical thickness ‐ post‐central gyrus medial segment, cortical thickness ‐ occipital pole, cortical thickness ‐ opercular part of the inferior frontal gyrus, cortical thickness ‐ orbital part of the inferior frontal gyrus, cortical thickness ‐ planum temporale, cortical thickness ‐ superior occipital gyrus, volume ‐ whole brain, volume ‐ ventral diencephalon, volume ‐ middle temporal gyrus, volume ‐ supramarginal gyrus, volume ‐ limbic | List of selected predictors for peak accuracy when using 2‐fold CV |
| Yoo 2019 | Conversion to definite MS | McDonald 2005 | 2 years | Unstructured: MRI mask images, structured: T2w lesion volume, brain parenchymal fraction, diffusely abnormal white matter, gender, initial CIS event cerebrum, initial CIS event optic nerve, initial CIS event cerebellum, initial CIS event brainstem, initial CIS event spinal cord, EDSS, CIS monofocal or multifocal type at onset | ND |
| Zakharov 2013 | Conversion to definite MS | Development of CDMS (second attack) | Follow‐up for 8 years | Age at disease onset, size of the foci of demyelination | ND |
| Zhang 2019 | Conversion to definite MS | Demonstration of dissemination in time by clinical relapse or new MRI lesions | 3 years | Total lesion number, total lesion volume, minimum, maximum, mean, SD for surface area, sphericity, surface‐volume‐ratio, and volume of individual lesions | ND |
| Bergamaschi 2001 BREMS | Conversion to progressive MS | Time to earliest date of observation of progressive worsening (EDSS increase ≥ 1) persistent for ≥ 6 months | Follow‐up mean (SD, range): 7.5 years (5.7 years, 3 years to 25 years) | Age at onset, female, sphincter onset, pure motor onset, motor‐sensory onset, sequelae after onset, number of involved FS at onset, number of sphincter plus motor relapses, EDSS ≥ 4 outside relapse | Regression model without baseline hazard or proneness to failure |
| Brichetto 2020 | Conversion to progressive MS | ND | Unclear reporting | ABILHAND item 12, ABILHAND total, HADS item 7, HADS sub1, HADS sub2, HADS total, Life Satisfaction Index total, MFIS item 2, MFIS sub1, MFIS sub2, MFIS sub3, MFIS total, Overactive Bladder Questionnaire item 1, Overactive Bladder Questionnaire item 4, Overactive Bladder Questionnaire total, Functional Independence Measure item 10, Functional Independence Measure item 11, Functional Independence Measure item 12, Functional Independence Measure item 14, Functional Independence Measure sub3, Functional Independence Measure sub4, Functional Independence Measure sub5, Functional Independence Measure sub6, Functional Independence Measure total, Montreal Cognitive Assessment item 1, Montreal Cognitive Assessment item 9, Montreal Cognitive Assessment tot1, Montreal Cognitive Assessment tot2, PASAT, SDMT, years of education, height, weight | List of selected predictors |
| Calabrese 2013 | Conversion to progressive MS | EDSS increase ≥ 1.0 EDSS not related to a relapse and confirmed at 6 months | Up to 5 years, time to outcome median (range): 52 months (29 months to 64 months) | Age, cortical lesion volume, cerebellar cortical volume | Full regression model |
| Manouchehrinia 2019 | Conversion to progressive MS | Time to earliest recognised date of SPMS onset determined by neurologist at routine visit | Follow‐up mean (SD): 12.5 years (8.7 years) | Calendar year of birth, male sex, onset age, first‐recorded EDSS score, age at the first‐recorded EDSS score | Nomograms for calculating probabilities of 10, 15, and 20 year risk (web app at https://aliman.shinyapps.io/SPMSnom/) |
| Misicka 2020 10 years | Conversion to progressive MS | Time from participant‐reported age of RRMS onset to participant‐reported age of SPMS onset | Up to 10 years | Age of MS onset, male sex, time to second relapse, cancer, brainstem/bulbar, HLA‐A*02:01 | Nomogram |
| Misicka 2020 20 years | Conversion to progressive MS | Time from participant‐reported age of RRMS onset to participant‐reported age of SPMS onset | Up to 20 years | Age of MS onset, male sex, time to second relapse, obesity, neurological disorders, HLA‐A*02:01 | Nomogram |
| Misicka 2020 ever | Conversion to progressive MS | Time from participant‐reported age of RRMS onset to participant‐reported age of SPMS onset | ND | Age of MS onset, male sex, time to second relapse, neurological disorders, spasticity, HLA‐A*02:01 | Nomogram |
| Pinto 2020 SP | Conversion to progressive MS | SPMS diagnosis by clinician | Unclear (with predictors at 2 years from baseline) | ND | ND |
| Pisani 2021 | Conversion to progressive MS | Time to occurrence of continuous disability accumulation independent of relapses confirmed 12 months later (transitory plateaus in the progressive course were allowed) | Follow‐up mean (range): 9.55 years (6.8 years to 13.13 years) | At onset: cortical lesion number, age, EDSS, white matter lesion number; difference (between 0 years and 2 years): global cortical thickness, cerebellar cortical volume, new cortical lesion number | Combination of heat map value for 2 predictors plus other predictor values weighted by their minimal depth |
| Seccia 2020 180 days | Conversion to progressive MS | Assessed by treating clinician | 180 days from the index visit | Longitudinal trajectories of: age at onset, gender, age at visit, EDSS, number relapses from last visit, pregnancy, relapses frequency, time from last relapse, spinal cord, supratentorial, optic pathway, brainstem‐cerebellum, relapse treatment drugs, first‐line DMT, immunosuppressant, MS symptomatic treatment drugs, second‐line DMT, other concomitant diagnosis drugs | ND |
| Seccia 2020 360 days | Conversion to progressive MS | Assessed by treating clinician | 360 days from the index visit | Longitudinal trajectories of: age at onset, gender, age at visit, EDSS, number relapses from last visit, pregnancy, relapses frequency, time from last relapse, spinal cord, supratentorial, optic pathway, brainstem‐cerebellum, relapse treatment drugs, first‐line DMT, immunosuppressant, MS symptomatic treatment drugs, second‐line DMT, other concomitant diagnosis drugs | ND |
| Seccia 2020 720 days | Conversion to progressive MS | Assessed by treating clinician | 720 days from the index visit | Longitudinal trajectories of: age at onset, gender, age at visit, EDSS, number relapses from last visit, pregnancy, relapses frequency, time from last relapse, spinal cord, supratentorial, optic pathway, brainstem‐cerebellum, relapse treatment drugs, first‐line DMT, immunosuppressant, MS symptomatic treatment drugs, second‐line DMT, other concomitant diagnosis drugs | ND |
| Skoog 2014 | Conversion to progressive MS | Time from RRMS onset to retrospectively‐determined continuous progression for at least 1 year without remission | Time to outcome median (range): 11.5 years (0.7 years to 56.7 years) | Age, attack grade, time since last relapse (interaction with attack grade) | Web app at http://msprediction.com |
| Tacchella 2018 180 days | Conversion to progressive MS | Gradual worsening of RRMS course determined by change in EDSS independent of relapses over a period of at least 6 or 12 months | 180 days after visit of interest | Age at onset, cognitive impairment, age at visit, visual impairment, routine visit, visual field deficits/scotoma, suspected relapse, diplopia (oculomotor nerve palsy), ambulation index, nystagmus, 9HPT (right), trigeminal nerve impairment, 9HPT (left), hemifacial spasm, PASAT, dysarthria, T25FW, dysphagia, impairment in daily living activities, facial nerve palsy, upper‐limb motor deficit, lower‐limb motor deficit, upper limb ataxia, lower limb ataxia, dysaesthesia, hypoesthesia, paraesthesia, Lhermitte's sign, urinary dysfunction, fatigue, bowel dysfunction, mood disorders, sexual dysfunction, tremor, ataxic gait, headache, paretic gait, spastic gait, EDSS, pyramidal functions, cerebellar functions, brainstem functions, sensory functions, bowel and bladder function, visual function, cerebral (or mental) functions, ambulation score | ND |
| Tacchella 2018 360 days | Conversion to progressive MS | Gradual worsening of RRMS course determined by change in EDSS independent of relapses over a period of at least 6 or 12 months | 360 days after visit of interest | Age at onset, cognitive impairment, age at visit, visual impairment, routine visit, visual field deficits/scotoma, suspected relapse, diplopia (oculomotor nerve palsy), ambulation index, nystagmus, 9HPT (right), trigeminal nerve impairment, 9HPT (left), hemifacial spasm, PASAT, dysarthria, T25FW, dysphagia, impairment in daily living activities, facial nerve palsy, upper‐limb motor deficit, lower‐limb motor deficit, upper limb ataxia, lower limb ataxia, dysaesthesia, hypoesthesia, paraesthesia, Lhermitte's sign, urinary dysfunction, fatigue, bowel dysfunction, mood disorders, sexual dysfunction, tremor, ataxic gait, headache, paretic gait, spastic gait, EDSS, pyramidal functions, cerebellar functions, brainstem functions, sensory functions, bowel and bladder function, visual function, cerebral (or mental) functions, ambulation score | ND |
| Tacchella 2018 720 days | Conversion to progressive MS | Gradual worsening of RRMS course determined by change in EDSS independent of relapses over a period of at least 6 or 12 months | 720 days after visit of interest | Age at onset, cognitive impairment, age at visit, visual impairment, routine visit, visual field deficits/scotoma, suspected relapse, diplopia (oculomotor nerve palsy), ambulation index, nystagmus, 9HPT (right), trigeminal nerve impairment, 9HPT (left), hemifacial spasm, PASAT, dysarthria, T25FW, dysphagia, impairment in daily living activities, facial nerve palsy, upper‐limb motor deficit, lower‐limb motor deficit, upper limb ataxia, lower limb ataxia, dysaesthesia, hypoesthesia, paraesthesia, Lhermitte's sign, urinary dysfunction, fatigue, bowel dysfunction, mood disorders, sexual dysfunction, tremor, ataxic gait, headache, paretic gait, spastic gait, EDSS, pyramidal functions, cerebellar functions, brainstem functions, sensory functions, bowel and bladder function, visual function, cerebral (or mental) functions, ambulation score | ND |
| Vasconcelos 2020 | Conversion to progressive MS | Time until confirmed progressive and sustained worsening for at least 6 months (irreversible EDSS increase (for EDSS baseline) ≥ 1.0 (≤ 5.5), 0.5 point (> 5.5) independent of relapse) | Time to outcome mean (SD): 13.70 years (8.88 years) | Pyramidal and cerebellar impairment at onset of the disease, treatment before EDSS 3, age at disease onset, African descent, time between first and second relapses (unclear if the coefficient for 'recovery' is needed or the model fit without recovery is presented) | Unweighted sum score from 0 to 5 (unclear if based on refit minus 'recovery,' found to be insignificant at multivariable analysis) |
| Ahuja 2021 | Composite (relapse) | Clinical/radiological relapse (radiological relapse: new T1‐enhancing lesion or new/enlarging T2‐FLAIR hyperintense lesion on brain, orbit, or spinal cord MRI) | 1 year | Step 1 model: 111 features (12 Current Procedural Terminology (CPT) codes, 60 CUIs derived from free‐text, and 35 PheCodes from ICD data, age, sex, race, disease duration), step 2 model: age, disease duration, relapse history estimated by model 1 | Regression coefficients without intercept |
| de Groot 2009 cognitive | Composite (cognitive ‐ see definition) | Cognitive impairments: score of mean – SD for 1 or more subtests of a cognitive screening test (subscales of consistent long‐term retrieval and long‐term storage of the selective reminding test, 10/36 spatial recall test, symbol digit modalities test, PASAT, and word list generation) | 3 years | Age, gender, how well can you concentrate?, T2‐weighted supratentorial lesion load | Score chart |
| Kosa 2022 | Composite (see definition) | MS‐DSS (model output based on measured CombiWISE (EDSS, SNRS, T25FW, NDH‐9HPT), therapy adjusted CombiWISE, COMRIS‐CTD (lesion/atrophy measures), time from disease onset to first therapy, difference between adjusted and unadjusted CombiWISE, age, and family history of MS) | Follow‐up mean: 4.3 years | SOMAmer ratios, age, sex | ND |
| Pellegrini 2019 | Composite (EDSS, T25FW, 9HPT, PASAT, VFT) | Time to disability progression (EDSS increase (for EDSS baseline) ≥ 1 (≥ 1), 1.5 (< 1) or 20% worsening on either T25FW or 9HPT or PASAT or 10‐letter worsening on VFT) confirmed at 24 weeks | 2 years | PASAT, SF‐36 physical component summary, visual function test | Regression coefficients without baseline hazard |
2D: 2‐dimensional 3D: 3‐dimensional 9HPT: 9‐hole peg test ABILHAND: interview‐based assessment of a patient‐reported measure of the perceived difficulty in using their hand to perform manual activities Ada: adaptive boosting BMS: benign MS BPF: brain parenchymal fraction BREMS: Bayesian Risk Estimate for Multiple Sclerosis CDMS: conversion to clinically definite MS CLCN4: chloride voltage‐gated channel 4 CombiWISE: Combinatorial Weight‐adjusted Disability Score COMRIS‐CTD: Combinatorial MRI scale of CNS tissue destruction CPT: current procedural terminology CSF: cerebrospinal fluid CUIs: concept unique identifiers CXCL13: chemokine ligand 13 CV: cross‐validation dGM: deep grey matter DIS: dissemination in space DIT: dissemination in time DIT2010: dissemination in time according to McDonald 2010 criteria DMT: disease‐modifying therapy DSS: Disability Status Scale DT: decision tree EDSS: Expanded Disability Status Scale FLAIR: fluid‐attenuated inversion recovery FLJ10201: anti‐YEATS2 antibody FLP: first level predictor FTP: fine tuning predictor GD: gadolinium GM: grey matter HADS: Hospital Anxiety and Depression Scale HLA: human leukocyte antigen ICBM‐DTI: International Consortium for Brain Mapping diffusion tensor imaging ICD: International Classification of Diseases IFN: interferon IgG: immunoglobulin G IL2: interleukin‐2 IL1RN: interleukin‐1 receptor antagonist IQR: interquartile range KIAA1043: a gene LGBM: light gradient boosting machine MEFV: Mediterranean fever gene MEP/mEPS: motor evoked potentials MFIS: Modified Fatigue Impact Scale MNI: Montreal Neurological Institute MR: magnetic resonance MRI: magnetic resonance imaging MS: multiple sclerosis MS‐DSS: MS disease severity scale MTR: magnetisation transfer ratio ND: no data available NDH‐9HPT: non‐dominant hand 9‐hole peg test NEMO: network modification tool NF‐L: neurofilament light chain NOS2: nitric oxide synthase 2 OB: oligoclonal bands OCB: oligoclonal bands PASAT: Paced Auditory Serial Addition Test PD:
patient‐determined PDCD2: human programmed cell death protein 2 PITPNC1: phosphatidylinositol transfer protein PNMT: phenylethanolamine‐N‐methyltransferase gene PP: primary progressive PPMS: primary progressive MS RF: random forest RNFL: retinal nerve fibre layer RRMS: relapsing‐remitting multiple sclerosis SD: standard deviation SDMT: symbol digit modalities test SF‐36: 36‐Item Short Form Health Survey SLM: a gene SMARCA1: SWI/SNF related, matrix associated, actin dependent regulator of chromatin, subfamily a, member 1 SMS: severe multiple sclerosis SNRS: Scripps neurologic rating scale SOMAmer: slow off‐rate modified aptamer (short, single‐stranded deoxyoligonucleotides) SP: secondary progression SPMS: secondary progressive MS T25FW: timed 25‐foot walk T2LV: T2 lesion volume T2w: T2‐weighted TGFB2: transforming growth factor beta 2 TRIM22: tripartite motif containing 22 VFT: visual function test WM: white matter XGB: extreme gradient boosting
Appendix 6. Additional figures
8.
Tables of predictor domains included or considered in included models. Top: models developed with machine learning; bottom: models developed with traditional methods. CSF: cerebrospinal fluid, Dev: development, ND: no data available
9.
Risk of bias assessments per analysis grouped by machine learning developments, traditional statistics developments, and all validations. Dev: development, ML: machine learning, Val: validation, y: year(s), d: days
10.
Percent of study items reported over time by analysis type. Data for the year 2021 are incomplete (only until July). ML: machine learning.
11.
Tables of TRIPOD items in included analyses. Top: models developed with machine learning; middle: models developed with traditional methods; bottom: model validations. The white box indicates that item 13c, which pertains to external validations, was not applicable for the Skoog 2019 Val analysis, which used the development participants. Dev: development, ML: machine learning, Val: validation, y: year(s), d: days
Characteristics of studies
Characteristics of included studies [ordered by study ID]
Aghdam 2021.
| Study characteristics | ||
| General information |
Model name Not applicable Primary source Journal Data source Cohort, secondary Study type Development |
|
| Participants |
Inclusion criteria
Exclusion criteria
Recruitment Patients admitted to the ophthalmology and neurology departments of Rassoul Akram Hospital, a tertiary referral centre in Tehran, Iran Age (years) Mean 40.0 (at ON) Sex (%F) 74.1 Disease duration (years) Not reported Diagnosis 100% CIS Diagnostic criteria McDonald 2010 (Polman 2011) Treatment
Disease description History of ON: 11.9% Recruitment period 2008 to 2018 |
|
| Predictors |
Considered predictors Age (as continuous and/or dichotomised), gender, season of attack (spring vs other), best corrected visual acuity (as continuous or dichotomised as logMAR ≤ or > 1), optic disc swelling (type of ON), ocular pain, ON history, plaque positive (white matter lesions ≥ 3 mm in diameter in juxtacortical, periventricular, infratentorial or spinal cord regions), dissemination in space (hyperintense T2 lesions in ≥ 2 of juxtacortical, periventricular, infratentorial or spinal cord regions), treatment with prednisolone Number of considered predictors ≥ 10 Timing of predictor measurement At presentation due to ON Predictor handling Unclear, all might be dichotomised and/or continuous |
|
| Outcome |
Outcome definition Conversion to definite MS (Polman 2011): CDMS based on 2010 revised McDonald criteria Timing of outcome measurement Follow‐up mean (SD): 5.30 years (2.94 years), no precise time point of measurement |
|
| Missing data |
Number of participants with any missing value 60 Missing data handling Exclusion |
|
| Analysis |
Number of participants (number of events) 277 (117) Modelling method Classification tree Predictor selection method
Hyperparameter tuning Not reported Shrinkage of predictor weights None Performance evaluation dataset Development Performance evaluation method Random split: 70% training, 30% test Calibration estimate Not reported Discrimination estimate Not reported Classification estimate Accuracy = 0.74, sensitivity = 0.71, specificity = 0.76, PPV = 0.65, NPV = 0.79 Overall performance Not reported Risk groups Not reported |
|
| Model |
Model presentation Decision tree Number of predictors in the model 4 Predictors in the model Presence of plaque in MRI, history of previous optic neuritis attack, type of optic neuritis, gender Effect measure estimates Tree given Predictor influence measure Not reported Validation model update or adjustment Not applicable |
|
| Interpretation |
Aim of the study To evaluate the predisposing factors of conversion to MS in the Iranian population with ON to organise a decision tree for predicting the probability of conversion to MS Primary aim The primary aim of this study is the prediction of individual outcomes. Model interpretation Probably confirmatory Suggested improvements A planned (prospective) study; survival analysis; incorporating other factors such as CSF components or serum vitamin D level; adding visual outcomes; using the McDonald 2017 criteria (Thompson 2018b)
|
| Notes |
Applicability overall Low Applicability overall rationale Study authors confirmed that participants who had already experienced the outcome at baseline were not included in the development set. |
|
| Item | Authors' judgement | Support for judgement |
| Participants | Unclear | Participants were excluded for having missing data. |
| Predictors | Yes | The predictors are collected by fellows and are standard enough. |
| Outcome | No | Although some predictors could have been known while assessing parts of the outcome, we consider the outcome to be robust to such information. However, a specific time point for outcome assessment was not used and its variability is high. Also, dissemination in time and space are amongst the predictors, which form part of the outcome definition. |
| Analysis | No | The EPV was around 10 for the entire dataset. Predictors were dichotomised and selected prior to multivariable modelling. The differing outcome time was not addressed. Neither discrimination nor calibration was assessed. A random split was used for validation. |
| Overall | No | At least one domain is at high risk of bias. |
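The EPV cited in the Analysis judgement above can be checked against the extracted counts (117 events among 277 participants, at least 10 candidate predictors). A minimal sketch of that calculation, not part of the review, using the common convention of counting the rarer outcome class per candidate predictor:

```python
def events_per_variable(n_events, n_nonevents, n_candidate_predictors):
    """Events per variable (EPV) as used in PROBAST-style appraisal:
    the count of the rarer outcome class divided by the number of
    candidate predictors considered for the model."""
    return min(n_events, n_nonevents) / n_candidate_predictors

# Aghdam 2021: 117 events out of 277 participants, >= 10 candidate predictors
epv = events_per_variable(117, 277 - 117, 10)  # 11.7, i.e. "around 10"
```

With more than 10 candidate predictors (the review records "≥ 10"), the EPV would fall below this figure, which is why the judgement treats it as borderline.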
Agosta 2006.
| Study characteristics | ||
| General information |
Model name Not applicable Primary source Journal Data source Cohort, primary Study type Development |
|
| Participants |
Inclusion criteria
Exclusion criteria Not reported Recruitment All participants were recruited previously by Filippi (2000) and followed up in the short term by Rovaris (2003), at the University Hospital San Raffaele in Milan, Italy (unclear, based on Ethics Committee approval) Age (years) Mean 33.5 Sex (%F) 69.9 Disease duration (years) Range: 0 to 25 Diagnosis 27.4% CIS, 46.6% RRMS, 26.0% SPMS Diagnostic criteria Mixed: Poser 1983, Lublin 1996 Treatment
Disease description EDSS median (range): CIS 0.0 (0.0 to 1.5), RRMS 2.5 (1.0 to 5.5), SPMS 5.5 (3.5 to 6.5) Recruitment period Not reported |
|
| Predictors |
Considered predictors Age, disease duration, clinical phenotype (CIS+RR vs SP), baseline EDSS, baseline T2 lesion volume, baseline T1 lesion volume, baseline brain parenchymal fraction, baseline grey matter fraction, baseline white matter fraction, baseline average whole‐brain magnetisation transfer ratio, baseline average grey matter magnetisation transfer ratio, baseline average normal‐appearing white matter magnetisation transfer ratio, baseline average lesion magnetisation transfer ratio, baseline whole‐brain magnetisation transfer ratio histogram peak height, baseline grey matter histogram peak height, baseline normal‐appearing white matter histogram peak height, brain parenchymal fraction percentage change, grey matter fraction percentage change, white matter fraction percentage change, average whole‐brain magnetisation transfer ratio percentage change, average grey matter magnetisation transfer ratio percentage change, average normal‐appearing white matter magnetisation transfer ratio percentage change, average lesion magnetisation transfer ratio percentage change, (unclear adjustment for follow‐up duration) Number of considered predictors 26 Timing of predictor measurement At study baseline (cohort entry at least 2 years after diagnosis of definite MS or 3 months after CIS), 12 months (± 10 days) after baseline, at final follow‐up (outcome measurement) Predictor handling Continuously
|
| Outcome |
Outcome definition Disability (EDSS): clinically worsened defined as an EDSS score increase ≥ 1.0, when baseline EDSS was < 6.0, or an EDSS score increase ≥ 0.5 when baseline EDSS was ≥ 6.0; EDSS changes had to be confirmed by a second visit after a 3‐month, relapse‐free period Timing of outcome measurement Follow‐up median: 8 years, mean: 7.7 years |
|
| Missing data |
Number of participants with any missing value 6, only missing outcome reported Missing data handling Mixed: last value carried forward, complete case |
|
| Analysis |
Number of participants (number of events) 70 (44) Modelling method Logistic regression Predictor selection method
Hyperparameter tuning Not applicable Shrinkage of predictor weights No Performance evaluation dataset Development Performance evaluation method Cross‐validation: LOOCV Calibration estimate Not reported Discrimination estimate Not reported Classification estimate Accuracy = 46/70, sensitivity = 30/41, specificity = 16/29 Overall performance Nagelkerke's R2 = 0.28 Risk groups Not reported |
|
| Model |
Model presentation Regression coefficients without the intercept or coefficient for "adjustment for follow‐up duration" Number of predictors in the model 2 or 3 (unclear if follow‐up duration included) Predictors in the model Baseline GM histogram peak height, average lesion MTR percentage change after 12 months, follow‐up duration Effect measure OR (95% CI): baseline GM histogram peak height 0.97 (0.94 to 0.99), average lesion MTR percentage change after 12 months 0.88 (0.80 to 0.98), follow‐up duration (not reported) Predictor influence Not applicable Validation model update or adjustment Not applicable |
|
| Interpretation |
Aim of the study To assess the value of MT MRI quantities and their short‐term changes in predicting the long‐term accumulation of disability in multiple sclerosis patients Primary aim The primary aim of this study is not the prediction of individual outcomes. Rather, the focus is on the usefulness of MRI measures. Model interpretation Exploratory Suggested improvements Not reported |
|
| Notes |
Applicability overall High Applicability overall rationale Although this study contained a model and its assessment, the main aim was not to create a model for prediction of individual outcomes but was rather to show the usefulness of new MRI measures. Auxiliary references Filippi M, Inglese M, Rovaris M, Sormani MP, Horsfield P, Iannucci PG, et al. Magnetization transfer imaging to monitor the evolution of MS: a 1‐year follow‐up study. Neurology 2000;55(7):940‐6. Lechner‐Scott J, Kappos L, Hofman M, Polman CH, Ronner H, Montalban X, et al. Can the expanded disability status scale be assessed by telephone? Mult Scler 2003;9(2):154‐9. Rovaris M, Agosta F, Sormani MP, Inglese M, Martinelli V, Comi G, et al. Conventional and magnetisation transfer MRI predictors of clinical multiple sclerosis evolution: a medium‐term follow‐up study. Brain 2003;126(Pt 10):2323‐32. |
|
| Item | Authors' judgement | Support for judgement |
| Participants | Yes | Patients were included probably from a prospectively designed cohort study with clear eligibility criteria. |
| Predictors | Yes | Even though there is no clear indication of predictors being collected similarly across patients, the authors described when changes were made from the outcome, suggesting they would describe if there were different assessments of variables. |
| Outcome | Yes | Although EDSS was determined differently either in person or by phone, EDSS assessment by phone has been shown to be valid. It is unclear if the outcome assessment was blinded to predictors, but we consider EDSS to be an objective measure. |
| Analysis | No | The EPV was far less than 10. Predictors were included based on univariable analyses. Calibration and discrimination were not assessed. Although cross‐validation was used, it is unclear whether the variable selection process was included within this procedure. It is unclear if all participants were analysed and how missing data were handled. Follow‐up time was added as a predictor instead of using methods to deal with different observation times. |
| Overall | No | At least one domain is at high risk of bias. |
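The Analysis judgement above flags that the univariable predictor selection may not have been repeated inside the cross‐validation loop. A minimal numpy sketch, not the study's analysis (hypothetical data, and a nearest‐centroid stand‐in rather than logistic regression), of how screening is kept within each leave‐one‐out fold so the held‐out case never informs selection:

```python
import numpy as np

def loocv_accuracy(X, y, k=2, select_inside=True):
    """Leave-one-out CV for a nearest-centroid classifier with
    univariable screening (keep the k features most correlated with the
    binary outcome). select_inside=True repeats the screening within
    every fold; select_inside=False reproduces the leaky variant in
    which features are chosen once on the full dataset."""
    n, p = X.shape
    if not select_inside:  # leaky: screen on all cases, held-out included
        corr = np.abs([np.corrcoef(X[:, j], y)[0, 1] for j in range(p)])
        keep_global = np.argsort(corr)[-k:]
    correct = 0
    for i in range(n):
        train = np.arange(n) != i
        if select_inside:  # screen on the training fold only
            corr = np.abs([np.corrcoef(X[train, j], y[train])[0, 1]
                           for j in range(p)])
            keep = np.argsort(corr)[-k:]
        else:
            keep = keep_global
        mu0 = X[train][y[train] == 0][:, keep].mean(axis=0)
        mu1 = X[train][y[train] == 1][:, keep].mean(axis=0)
        pred = int(np.linalg.norm(X[i, keep] - mu1)
                   < np.linalg.norm(X[i, keep] - mu0))
        correct += pred == y[i]
    return correct / n
```

When screening sits outside the loop, each "held‐out" case has already influenced which predictors are kept, so the cross‐validated estimate is optimistic — the concern underlying the high risk‐of‐bias rating.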
Ahuja 2021.
| Study characteristics | ||
| General information |
Model name
Primary source Journal Data source
Study type Development + external validation (spectrum) |
|
| Participants |
Inclusion criteria
Exclusion criteria
Recruitment
Age (years) Median 43.3 (at first MS code) Sex (%F)
Disease duration (years)
Diagnosis Approximately 70% to 80% RRMS, 10% PPMS, 10% to 20% SPMS Diagnostic criteria Not reported Treatment
Disease description Not reported Recruitment period 2006 to 2016 |
|
| Predictors |
Considered predictors
Number of considered predictors
Timing of predictor measurement
Predictor handling
|
|
| Outcome |
Outcome definition Composite (relapse): a relapse event as a clinical and/or radiological relapse; clinical relapse: new or recurrence of neurological symptoms lasting persistently for ≥ 24 h without fever or infection; radiological relapse: either a new T1‐enhancing lesion and/or a new or enlarging T2‐FLAIR hyperintense lesion on brain, orbit, or spinal cord MRI on clinical radiology report. Timing of outcome measurement 1 year |
|
| Missing data |
Number of participants with any missing value Not reported Missing data handling Not reported |
|
| Analysis |
Number of participants (number of events)
Modelling method
Predictor selection method
Hyperparameter tuning
Shrinkage of predictor weights
Performance evaluation dataset
Performance evaluation method Not applicable Calibration estimate
Discrimination estimate
Classification estimate Sensitivity = 0.499, specificity = 0.719, PPV = 0.223, NPV = 0.900, F1 = 0.307 Overall performance Not reported Risk groups
|
|
| Model |
Model presentation
Number of predictors in the model
Predictors in the model
Effect measure estimates
Predictor influence measure
Validation model update or adjustment
|
|
| Interpretation |
Aim of the study To develop and test a clinically deployable model for predicting 1‐year relapse risk in MS patients Primary aim The primary aim of this study is the prediction of individual outcomes Model interpretation Probably exploratory Suggested improvements To incorporate MRI features |
|
| Notes |
Applicability overall Low |
|
| Item | Authors' judgement | Support for judgement |
| Participants | No | EHRs contain data collected for a different purpose; the data are collected without a protocol, and various users enter data into the records. Therefore, the data are extremely heterogeneous. |
| Predictors | Yes |
|
| Outcome | Yes |
|
| Analysis | No |
|
| Overall | No | At least one domain is at high risk of bias. |
Bejarano 2011.
| Study characteristics | ||
| General information |
Model name
Primary source Journal Data source Cohort, primary Study type Development + validation (model refit), location |
|
| Participants |
Inclusion criteria
Exclusion criteria
Recruitment
Age (years)
Sex (%F)
Disease duration (years)
Diagnosis
Diagnostic criteria McDonald 2005 (Polman 2005) Treatment
Disease description
Recruitment period Not reported |
|
| Predictors |
Considered predictors
Number of considered predictors
Timing of predictor measurement
Predictor handling
|
|
| Outcome |
Outcome definition Disability (EDSS): change in EDSS as numeric delta Timing of outcome measurement At 2 years |
|
| Missing data |
Number of participants with any missing value
Missing data handling
|
|
| Analysis |
Number of participants (number of events)
Modelling method
Predictor selection method
Hyperparameter tuning
Shrinkage of predictor weights
Performance evaluation dataset
Performance evaluation method
Calibration estimate Not reported Discrimination estimate
Classification estimate
Overall performance Not reported Risk groups Not reported |
|
| Model |
Model presentation
Number of predictors in the model
Predictors in the model
Effect measure estimates
Predictor influence measure
Validation model update or adjustment
|
|
| Interpretation |
Aim of the study To evaluate the usefulness of clinical, imaging and neurophysiological variables for predicting short‐term disease outcomes in MS patients Primary aim The primary aim of this study is the prediction of individual outcomes Model interpretation Exploratory Suggested improvements Incorporating GM atrophy or other new MRI metrics |
|
| Notes |
Applicability overall Low |
|
| Item | Authors' judgement | Support for judgement |
| Participants | Yes | A cohort was formed for prediction purposes, and inclusion/exclusion criteria seem appropriate. |
| Predictors | Yes | The predictors were collected in a prospective way at a single clinic. |
| Outcome | Yes | The outcome is considered objective, so risk of bias from knowledge of predictors is not expected. The outcome was conceptualised as a change in EDSS and treated as a score that can be subtracted although EDSS is an ordinal measure. Yet, the modelling method of neural networks can accommodate interactions amongst baseline predictors, including baseline EDSS. |
| Analysis | No | The number of participants was low. Calibration was not assessed. |
| Overall | No | At least one domain is at high risk of bias. |
Bendfeldt 2019.
| Study characteristics | ||
| General information |
Model name
Primary source Journal Data source Randomised trial participants, secondary Study type Development |
|
| Participants |
Inclusion criteria
Exclusion criteria
Recruitment
Age (years)
Sex (%F)
Disease duration (years) Up to 0.16 Diagnosis 100% CIS Diagnostic criteria Own definition Treatment
Disease description
Recruitment period 2002 to 2005 |
|
| Predictors |
Considered predictors
Number of considered predictors
Timing of predictor measurement At disease onset (CIS) (RCT baseline within 60 days after onset) Predictor handling
|
|
| Outcome |
Outcome definition Conversion to definite MS (modified Poser): CDMS diagnosis within 2 years and confirmed by a central committee based on modified Poser; modified Poser defined as 1) a relapse with clinical evidence of at least 1 CNS lesion (and, if the first presentation was monofocal, distinct from the lesion responsible for the CIS presentation), or 2) sustained progression by ≥ 1.5 points on the EDSS reaching a total EDSS score of ≥ 2.5 and confirmed at a consecutive visit 3 months later Timing of outcome measurement Median (IQR) in days
|
|
| Missing data |
Number of participants with any missing value
Missing data handling Not reported |
|
| Analysis |
Number of participants (number of events)
Modelling method Support vector machine, radial kernel Predictor selection method
Hyperparameter tuning
Shrinkage of predictor weights Modelling method Performance evaluation dataset Development Performance evaluation method
Calibration estimate Not reported Discrimination estimate Not reported Classification estimate
Overall performance Not reported Risk groups Not reported |
|
| Model |
Model presentation Not reported Number of predictors in the model
Predictors in the model
Effect measure estimates Not reported Predictor influence measure
Validation model update or adjustment Not applicable |
|
| Interpretation |
Aim of the study To determine whether pattern classification using SVMs facilitates predicting conversion to clinically definite multiple sclerosis (CDMS) from clinically isolated syndrome (CIS) Primary aim The primary aim of this study is not the prediction of individual outcomes. The focus is on identifying advanced MRI features capable of improving prediction. Model interpretation Exploratory Suggested improvements Feature selection methods; larger, independent test data; ensembles of classifiers; other para‐clinical markers (synthesis of oligoclonal bands and genetic factors); other lesional or degenerative MRI features |
|
| Notes |
Applicability overall High Applicability overall rationale Although this study contained models, the main aim was not to create a model for prediction of individual outcomes but was rather to identify advanced imaging features capable of improving prediction. Auxiliary references Bakshi R, Dandamudi VS, Neema M, De C, Bermel RA. Measurement of brain and spinal cord atrophy by magnetic resonance imaging as a tool to monitor multiple sclerosis. J Neuroimaging 2005;15(4 Suppl):30s‐45s. Barkhof F, Polman CH, Radue EW, Kappos L, Freedman MS, Edan G, et al. Magnetic resonance imaging effects of interferon beta‐1b in the BENEFIT study: integrated 2‐year results. Arch Neurol 2007;64(9):1292‐8. Kappos L, Polman C H, Freedman MS, Edan G, Hartung HP, Miller DH, et al. Treatment with interferon beta‐1b delays conversion to clinically definite and McDonald MS in patients with clinically isolated syndromes. Neurology 2006;67(7):1242‐9. |
|
| Item | Authors' judgement | Support for judgement |
| Participants | Yes | The data source was an RCT, expected to be of high quality and with clear eligibility assessment. The numbers of participants reported in the Methods section and described in the Results did not match, yet no exclusion criteria were reported to explain the difference. Hence, this discrepancy was addressed in the Analysis domain. |
| Predictors | Yes | The predictors were derived from an RCT, expected to have sufficient standardisation, and there is no reason to believe that the feature extraction/processing was different. Although some of the predictors were created after the outcome was assessed, the predictor creation is automated, and the risk of bias is considered to be low. |
| Outcome | Yes | Imaging was assessed at a central location, and the outcome should be robust to the knowledge of demographics and baseline EDSS. The outcome is not common, but it was probably pre‐specified. |
| Analysis | No | M7 placebo arm and M9 IFNb arm: The number of patients included in Table 2 was inconsistent with the number of trial participants and what was reported in the Introduction. There was no mention of missing data, making it likely that a complete‐case analysis was performed. The number of observations per variable was low. Accuracy was computed, but not discrimination or calibration. Standardisation was done on the full dataset instead of within the resampling structure. The performance measures were calculated at the same level as model selection would probably occur but not tuning parameter selection. The presentation of a final selected model is unclear. Linear SVM: The number of patients included in Table 2 was inconsistent with the number of trial participants and what was reported in the Introduction. There was no mention of missing data, making it likely that a complete‐case analysis was performed. The number of observations per variable was low. Accuracy was computed, but not discrimination or calibration. Standardisation was done on the full dataset instead of within the resampling structure. There was no mention of SVM tuning, so performance was probably evaluated in the same data as tuning. The presentation of a final selected model is unclear. |
| Overall | No | At least one domain is at high risk of bias. |
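The Analysis judgement above flags that standardisation was done on the full dataset rather than within the resampling structure. A minimal numpy sketch, not the study's pipeline (hypothetical data), of the non‐leaky order of operations — fit the scaling on the training fold only, then apply those same statistics unchanged to the held‐out fold:

```python
import numpy as np

def standardise_fold(X_train, X_test):
    """Standardise features using mean/SD estimated from the training
    fold only; the held-out fold is transformed with those same
    statistics. Estimating mean/SD on the pooled data before splitting
    lets the held-out cases leak into the training-set transform."""
    mu = X_train.mean(axis=0)
    sd = X_train.std(axis=0)
    sd = np.where(sd == 0, 1.0, sd)  # guard against constant features
    return (X_train - mu) / sd, (X_test - mu) / sd
```

Within cross‐validation this step sits inside the loop and is recomputed for every resampling split; doing it once on the pooled data is the leakage the rating refers to.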
Bergamaschi 2001.
| Study characteristics | ||
| General information |
Model name BREMS Dev Primary source Journal Data source Mixed (registry, routine care), secondary Study type Development |
|
| Participants |
Inclusion criteria
Exclusion criteria Not reported Recruitment Patients at the Centre for Multiple Sclerosis of Fondazione C. Mondino (Pavia), the only facility for MS patients in the district, Italy Age (years) Mean 28.5 Sex (%F) 62.9 Disease duration (years) Not reported Diagnosis 100% RRMS Diagnostic criteria Mixed: Poser 1983, Lublin 1996 Treatment
Disease description Not reported Recruitment period Until 1997 |
|
| Predictors |
Considered predictors Unclear if it is the complete list: gender, age at onset, type of initially involved functional systems (FSs), number of initially involved FSs, whether initial relapse was followed by sequelae, interval between first 2 attacks, pre‐1 year relapse counts by type of involved FSs, maximum neurological score reached in each distinct FS, whether EDSS ≥ 4 during or outside of relapse in first year, (intermediate predictors: relapses in each neurological systems, FS‐specific impairment scores, EDSS evolution, use of preventive therapies) Number of considered predictors > 9 Timing of predictor measurement At disease onset (RRMS) and regular visits up to 1 year after onset (baseline) Predictor handling EDSS dichotomised |
|
| Outcome |
Outcome definition Conversion to progressive MS: time of onset of secondary progressive phase, defined as the earliest date of observation of a progressive worsening, severe enough to determine an increase of at least 1 point on the EDSS; the worsening had to persist for at least 6 months after the onset of progression in order to be confirmed Timing of outcome measurement Follow‐up mean (SD, range): 7.5 years (5.7 years, 3 years to 25 years)/follow‐up visit frequency every 6 months on average, but the frequency depended on the course of the disease: patients with 'active' relapsing disease were followed every 3 months, patients with 'stable' relapsing disease every 6 to 12 months |
|
| Missing data |
Number of participants with any missing value Not reported Missing data handling Not reported |
|
| Analysis |
Number of participants (number of events) 186 (34) Modelling method Survival, Bayesian joint survival model using Monte Carlo particle filtering Predictor selection method
Hyperparameter tuning Not applicable Shrinkage of predictor weights Modelling method Performance evaluation dataset None Performance evaluation method Not applicable Calibration estimate Not reported Discrimination estimate Not reported Classification estimate Not reported Overall performance Not reported Risk groups Not reported |
|
| Model |
Model presentation Regression model without baseline hazard or proneness to failure Number of predictors in the model 9 Predictors in the model Age at onset, female, sphincter onset, pure motor onset, motor‐sensory onset, sequelae after onset, number of involved FS at onset, number of sphincters plus motor relapses, EDSS ≥ 4 outside relapse Effect measure estimates Local relative risks (95% credible interval): age at onset 1.05 (1.02 to 1.09), female 0.39 (0.17 to 0.78), sphincter onset 2.98 (1.10 to 6.10), pure motor onset 2.11 (0.90 to 4.20), motor‐sensory onset 2.4 (1.15 to 4.41), sequelae after onset 1.76 (1.04 to 2.88), number of involved FS at onset 1.39 (1.16 to 1.64), number of sphincter plus motor relapses 2.10 (1.56 to 2.89), EDSS ≥ 4 outside relapse 2.28 (0.40 to 6.50), to be understood as hazard ratios Predictor influence measure Not applicable Validation model update or adjustment Not applicable |
|
| Interpretation |
Aim of the study With the aid of a Bayesian statistical model of the natural course of relapsing‐remitting MS, to identify short‐term clinical predictors of long‐term evolution of the disease, with particular focus on predicting the onset of a secondary progressive course (the failure event) on the basis of patient information available at an early stage of disease. Primary aim The primary aim of this study is not the prediction of individual outcomes. Rather, the focus is on predictor identification. Model interpretation Probably confirmatory Suggested improvements Not reported
|
| Notes |
Applicability overall High Applicability overall rationale Although this study contained a model, the main aim was not to create a model for prediction of individual outcomes but rather to search for predictors. |
|
| Item | Authors' judgement | Support for judgement |
| Participants | No | This study used routine care data, which may introduce risk of bias. |
| Predictors | Yes | Because the data were collected at clinical visits, the predictors were probably collected without knowledge about the outcome. A group of neurologists saw the participants at the same clinic, so the predictors were probably assessed in a similar way for all patients. The proposed model only used the early predictors, which were collected within 1 year of disease onset. |
| Outcome | No | The frequency of visits depended on the disease course and was different for patients with active relapsing disease, every 3 to 6 months, vs patients with stable relapsing disease, every 6 to 12 months, which causes differential assessment of the outcome in different patients. |
| Analysis | No | The EPV was low, and neither the model nor its optimism was evaluated in any way. Many other details, including the methods of dealing with missing data and overfitting, were not reported. EDSS was dichotomised. |
| Overall | No | At least one domain is at high risk of bias. |
Bergamaschi 2007.
| Study characteristics | ||
| General information |
Model name BREMS Primary source Journal Data source Cohort, secondary Study type External validation (initial validation), location |
|
| Participants |
Inclusion criteria
Exclusion criteria Not reported Recruitment MS centres in Pavia (Northern Italy), Florence (Central Italy), Bari (Southern Italy) Age (years) Median 24.8 Sex (%F) 69.3 Disease duration (years) Not reported Diagnosis 100% RRMS Diagnostic criteria Poser 1983 Treatment
Disease description Not reported Recruitment period Not reported |
|
| Predictors |
Considered predictors Not applicable Number of considered predictors Not applicable Timing of predictor measurement Not applicable Predictor handling Not applicable |
|
| Outcome |
Outcome definition Conversion to progressive MS: time at which the patient reached the confirmed SP, defined as the earliest date of observation of a progressive worsening, severe enough to lead to an increase of at least 1 point on the EDSS, and confirmed at least 1 year after progression Timing of outcome measurement Follow‐up mean (SD, range): 17.1 years (2.1 years, 10 years to 48 years), time to endpoint median (range): 10.5 years (2 years to 44 years)/follow‐up visit frequency every 6 months on average, but the frequency depended on the course of the disease: patients with 'active' relapsing disease were followed every 3 months, patients with 'stable' relapsing disease every 6 to 12 months. |
|
| Missing data |
Number of participants with any missing value Not reported Missing data handling Not reported |
|
| Analysis |
Number of participants (number of events) 535 (87 within 10 years) Modelling method Not applicable Predictor selection method Not applicable Hyperparameter tuning Not applicable Shrinkage of predictor weights Not applicable Performance evaluation dataset External validation Performance evaluation method Not applicable Calibration estimate Not reported Discrimination estimate Not reported Classification estimate
Overall performance Not reported Risk groups Very high risk: 95th percentile (score ≥ 2.0), very low risk: 5th percentile (score ≤ −0.63) |
|
| Model |
Model presentation Not applicable Number of predictors in the model Not applicable Predictors in the model Not applicable Effect measure estimates Not applicable Predictor influence measure Not applicable Validation model update or adjustment None |
|
| Interpretation |
Aim of the study To test the trustworthiness of the Bayesian risk score on the basis of a new and larger sample of patients Primary aim The primary aim of this study is the prediction of individual outcomes. Model interpretation Confirmatory Suggested improvements Incorporation of additional clinical aspects of disease (cognitive impairment and fatigue), genetic, neuroimmunological, neuroradiological, and neurophysiological findings |
|
| Notes |
Applicability overall Low |
|
| Item | Authors' judgement | Support for judgement |
| Participants | Yes | This study probably used cohort data and the eligibility criteria were clear. |
| Predictors | Yes | Because the data were collected at clinical visits, the predictors were probably collected without knowledge about the outcome. A group of neurologists saw the participants at the same clinic, so the predictors were probably assessed in a similar way for all patients. The proposed model only used the early predictors, which were collected within 1 year of disease onset. |
| Outcome | No | The frequency of visits depended on the disease course and was different for patients with active relapsing disease, every 3 to 6 months, vs patients with stable relapsing disease, every 6 to 12 months, which causes differential assessment of the outcome in different patients. |
| Analysis | No | The subset on which performance measures were estimated contained fewer than 100 events. Only classification measures were addressed. Methods of dealing with missing data were not reported. |
| Overall | No | At least one domain is at high risk of bias. |
Bergamaschi 2015.
| Study characteristics | ||
| General information |
Model name
Primary source Journal Data source Registry, secondary Study type
|
|
| Participants |
Inclusion criteria
Exclusion criteria
Recruitment
Age (years) Mean 31.1 Sex (%F) 71.3 Disease duration (years) Not reported Diagnosis 100% RRMS Diagnostic criteria McDonald 2001 Treatment Unclear timing, 72.2% on treatment Disease description Not reported Recruitment period Not reported |
|
| Predictors |
Considered predictors
Number of considered predictors
Timing of predictor measurement
Predictor handling
|
|
| Outcome |
Outcome definition
Timing of outcome measurement
|
|
| Missing data |
Number of participants with any missing value 2965 Missing data handling Exclusion |
|
| Analysis |
Number of participants (number of events)
Modelling method Not applicable Predictor selection method Not applicable Hyperparameter tuning Not applicable Shrinkage of predictor weights Not applicable Performance evaluation dataset External validation Performance evaluation method Not applicable Calibration estimate Not reported Discrimination estimate Not reported Classification estimate
Overall performance Not reported Risk groups Quartiles by risk score, the first (< −0.58) and third quartiles (> 0.52) |
|
| Model |
Model presentation
Number of predictors in the model
Predictors in the model Not applicable Effect measure estimates Not applicable Predictor influence measure Not applicable Validation model update or adjustment
|
|
| Interpretation |
Aim of the study To predict the natural course of MS using the Bayesian Risk Estimate for MS at Onset (BREMSO), which gives an individual risk score calculated from demographic and clinical variables collected at disease onset Primary aim The primary aim of this study is the prediction of individual outcomes. Model interpretation Confirmatory Suggested improvements Not reported |
|
| Notes |
Applicability overall Low |
|
| Item | Authors' judgement | Support for judgement |
| Participants | No | This study used registry data, and there was a substantial amount of missing data, on which the exclusion of participants was based. |
| Predictors | Yes | The data were collected prospectively, and only early predictors that were collected within 1 year of disease onset were included in the model to be used early in the disease. It is a multicentre study but with well‐defined tools. |
| Outcome | Yes | BREMS Ext Val and BREMSO for SP: We rated this domain for these analyses as having an unclear risk of bias. Due to the data source, the outcome was probably assessed with knowledge of the predictors, but we consider the definition for conversion to secondary progressive MS based on EDSS to be a rather hard outcome. It is unclear if the frequency of visits at which the outcome was assessed differed from patient to patient, which is likely due to the nature of the data source. BREMSO MSSS: We rated this domain for this analysis as having a low risk of bias. Due to the data source, the outcome was probably assessed with knowledge of the predictors, but we consider EDSS to be a rather hard outcome. |
| Analysis | No | BREMS Ext Val: This was a validation study without any reported discrimination or calibration measures, especially those for censored data. Exclusion for missing data was handled in the Participants section. BREMSO for SP and BREMSO MSSS: This validation study did not assess discrimination or calibration. Variables were dropped from the developed model, and the coefficients for the rest of the predictors were used as if this did not occur. |
| Overall | No | At least one domain is at high risk of bias. |
Borras 2016.
| Study characteristics | ||
| General information |
Model name CH3L1 + CNDP1 Primary source Journal Data source Cohort, unclear Study type Development |
|
| Participants |
Inclusion criteria
Exclusion criteria Not reported Recruitment Hospital Ramon and Cajal in Madrid, Spain Age (years) Median 35.5 (unclear when) Sex (%F) 66.0 Disease duration (years) Median 0.22 (range: 0.01 to 0.35) Diagnosis 100% CIS Diagnostic criteria Not reported Treatment Unclear timing, and no specific treatment details, 8% on treatment Disease description EDSS median (range): 1.5 (0 to 5) Recruitment period From 2001 onward |
|
| Predictors |
Considered predictors CH3L1, CNDP1, CLUS, A1AG1, 2‐AACT, CNTN1, AACT, SEM7A, HPT, PGCB, 3‐AACT, OSTP, CMGA, SCG2, A2MG, A1AG1, TTHY Number of considered predictors Between 17 and 32 (discrepant lists) Timing of predictor measurement At disease onset (CIS) reported as 'first relapse' (4 to 126 days between CIS and lumbar puncture) Predictor handling Continuously (log2 transformed) |
|
| Outcome |
Outcome definition Conversion to definite MS: conversion to CDMS defined as the presence of IgG oligoclonal bands and an abnormal brain MRI at baseline (2, 3, or 4 Barkhof criteria) Timing of outcome measurement Follow‐up median (SD): CIS 3.25 years (1.32 years), CDMS 4.08 years (2.48 years) |
|
| Missing data |
Number of participants with any missing value 1, only missing predictor reported Missing data handling Single value imputation of predictors (a minimum estimated log2‐transformed abundance for a given protein across runs) |
|
| Analysis |
Number of participants (number of events) 49 (24) Modelling method Logistic regression Predictor selection method
Hyperparameter tuning Not applicable Shrinkage of predictor weights No Performance evaluation dataset Development Performance evaluation method Unclear, point estimates for full dataset, plots also depict median performance and measure of uncertainty for subset of 500 repeats of training‐validation split Calibration estimate Not reported Discrimination estimate c‐Statistic = 0.858, optimism‐corrected = 0.785 Classification estimate Sensitivity = 0.84, specificity = 0.83 Overall performance Not reported Risk groups Not reported |
|
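The optimism‐corrected c‐statistic reported in the Analysis block above is an internal‐validation quantity. The study's exact procedure is unclear, but the standard Harrell‐style bootstrap optimism correction can be sketched as follows; this is an illustrative example only (the toy `fit` scoring rule and all function names are assumptions, not the study's method):

```python
import random

def auc(scores, labels):
    # c-statistic via the Mann-Whitney formulation: probability that a
    # randomly chosen event case scores higher than a non-event case.
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def fit(xs, ys):
    # Toy univariable "model": orient a single marker so that higher scores
    # point towards the event class (a stand-in for a fitted classifier).
    m1 = sum(x for x, y in zip(xs, ys) if y == 1) / sum(ys)
    m0 = sum(x for x, y in zip(xs, ys) if y == 0) / (len(ys) - sum(ys))
    sign = 1.0 if m1 >= m0 else -1.0
    return lambda x: sign * x

def optimism_corrected_auc(xs, ys, b=200, seed=0):
    # Harrell's bootstrap: apparent AUC minus the average optimism, where
    # optimism = (AUC of a bootstrap-fitted model on its own bootstrap
    # sample) - (AUC of that same model on the original data).
    rng = random.Random(seed)
    apparent = auc([fit(xs, ys)(x) for x in xs], ys)
    n, optimism, used = len(xs), 0.0, 0
    for _ in range(b):
        idx = [rng.randrange(n) for _ in range(n)]
        bx, by = [xs[i] for i in idx], [ys[i] for i in idx]
        if len(set(by)) < 2:  # skip degenerate single-class resamples
            continue
        model = fit(bx, by)
        optimism += auc([model(x) for x in bx], by) - auc([model(x) for x in xs], ys)
        used += 1
    return apparent - optimism / used
```

Crucially, PROBAST expects the bootstrap to replay every modelling step (including predictor selection) inside each resample; validating only the final pair of predictors, as appears to have happened here, understates the optimism.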
| Model |
Model presentation Heat maps Number of predictors in the model 2 Predictors in the model CH3L1, CNDP1 Effect measure estimates Not reported Predictor influence measure Not applicable Validation model update or adjustment Not applicable |
|
| Interpretation |
Aim of the study To establish a diagnostic molecular classifier with high sensitivity and specificity able to differentiate between clinically isolated syndrome patients with a high and a low risk of developing multiple sclerosis over time. To build a statistical model able to assign to each patient a precise probability of conversion to clinically defined multiple sclerosis. Primary aim The primary aim of this study is the prediction of individual outcomes. Model interpretation Probably confirmatory Suggested improvements Not reported |
|
| Notes |
Applicability overall High Applicability overall rationale The predictors used were proteins and no other predictor domain was considered for use in the model. |
|
| Item | Authors' judgement | Support for judgement |
| Participants | Unclear | No details were provided about the eligibility criteria other than the diagnostic subtype and the data source is not clearly reported. |
| Predictors | Yes | The predictors are relatively objective to assess, available at the intended time of prognostication. Although it is unclear when the predictor assessment was done relative to outcome data collection, there is nothing to indicate different assessments for participants. |
| Outcome | Unclear | No exact timing of the outcome assessment was specified. Some patients were followed up for short periods and others for years. |
| Analysis | No | The number of participants was much lower than necessary, and EPV was less than 10. Only discrimination was addressed, but not calibration. A bootstrap procedure was used, but the variability in AUC only accounted for training samples that chose those predictors. The time for which predictions were to be made was never addressed; therefore, participants had different follow‐up times, and this was not accounted for. It is unclear whether the weights of the predictors corresponded to a final selected model or not. Although not all patients were included in the analysis, only a single patient was excluded, which is less than 5%. |
| Overall | No | At least one domain is at high risk of bias. |
Brichetto 2020.
| Study characteristics | ||
| General information |
Model name Future course assignment Primary source Journal Data source Cohort, primary Study type Development |
|
| Participants |
Inclusion criteria Unclear, as reported in auxiliary reference
Exclusion criteria Unclear, as reported in auxiliary reference
Recruitment Patients followed as outpatients or at‐home by Italian Multiple Sclerosis Society (AISM) Rehabilitation Centres of Genoa, Padua and Vicenza, Italy Age (years) Not reported Sex (%F) Not reported Disease duration (years) Not reported Diagnosis Unclear, RRMS, SPMS Diagnostic criteria Not reported Treatment Not reported Disease description Not reported Recruitment period 2014 to 2017 |
|
| Predictors |
Considered predictors ABILHAND, Edinburgh Handedness Inventory, Hospital Anxiety and Depression Scale, Life Satisfaction Index, Modified Fatigue Impact Scale, Overactive Bladder Questionnaire, Functional Independence Measure, Montreal Cognitive Assessment, Paced Auditory Serial Addition Task, Symbol Digit Modalities Test, education (years), number of relapses in past 4 months, height, weight Number of considered predictors 143 Timing of predictor measurement Unclear, at multiple assessments every 4 months Predictor handling Unclear, probably continuously |
|
| Outcome |
Outcome definition Conversion to progressive MS Timing of outcome measurement Unclear if next visit or 4 months |
|
| Missing data |
Number of participants with any missing value Not reported Missing data handling Unclear, K‐nearest neighbour data imputing strategy |
|
| Analysis |
Number of participants (number of events) ≤ 3398 evaluations of 810 participants (unclear how many were used in the FCA model; 1451 evaluations were RR and 1947 were SP) Modelling method Unspecified ML techniques/multitask elastic net (for prediction of disease descriptors) followed by gradient boosting (for classification based on predicted predictors) in auxiliary reference Predictor selection method
Hyperparameter tuning Unclear, according to auxiliary reference, parameter tuning done using inner parameter optimisation via grid‐search in cross‐validation, modelling/tuning not reported in Brichetto 2020 Shrinkage of predictor weights Modelling method Performance evaluation dataset Development Performance evaluation method Unclear, methods not reported, unclear how much to rely on auxiliary reference Calibration estimate Not reported Discrimination estimate Not reported Classification estimate Accuracy FCA = 0.826, CCA = 0.860 Overall performance Not reported Risk groups Not reported |
|
| Model |
Model presentation List of selected predictors Number of predictors in the model 33 Predictors in the model ABILHAND item 12, ABILHAND total, HADS item 7, HADS sub1, HADS sub2, HADS total, Life Satisfaction Index total, MFIS item 2, MFIS sub1, MFIS sub2, MFIS sub3, MFIS total, Overactive Bladder Questionnaire item 1, Overactive Bladder Questionnaire item 4, Overactive Bladder Questionnaire total, Functional Independence Measure item 10, Functional Independence Measure item 11, Functional Independence Measure item 12, Functional Independence Measure item 14, Functional Independence Measure sub3, Functional Independence Measure sub4, Functional Independence Measure sub5, Functional Independence Measure sub6, Functional Independence Measure total, Montreal Cognitive Assessment item 1, Montreal Cognitive Assessment item 9, Montreal Cognitive Assessment tot1, Montreal Cognitive Assessment tot2, PASAT, SDMT, years of education, height, weight Effect measure estimates Not reported Predictor influence measure Not reported Validation model update or adjustment Not applicable |
|
| Interpretation |
Aim of the study To confirm the important role of applying ML to PROs and CAOs of people with relapsing‐remitting (RR) and secondary progressive (SP) form of multiple sclerosis (MS), to promptly identify information useful to predict disease progression Primary aim The primary aim of this study is somehow the prediction of individual outcomes. The focus is on showing the relevancy of PRO and CAO to MS prediction. Model interpretation Exploratory Suggested improvements Including data on therapy and MRIs |
|
| Notes |
Applicability overall Unclear Applicability overall rationale Due to the lack of reporting on outcomes and their assessment, applicability is unclear. Additionally, it is unclear whether some patients had already experienced the outcome at baseline. Auxiliary references Bebo BF Jr, Fox RJ, Lee K, Utz U, Thompson AJ. Landscape of MS patient cohorts and registries: recommendations for maximising impact. Mult Scler 2018;24(5):579‐86. Fiorini S, Verri A, Barla A, Tacchino A, Brichetto G. Temporal prediction of multiple sclerosis evolution from patient‐centred outcomes. In: Proceedings of the 2nd Machine Learning for Healthcare Conference; 2017 August 18‐19; Boston MA. Boston MA: Proceedings of Machine Learning Research, 2017. |
|
| Item | Authors' judgement | Support for judgement |
| Participants | Unclear | The data were prospectively collected from a cohort. Although references cited in the article have some study inclusion/exclusion criteria, the number of patients used in this article does not match them. Thus, the inclusion/exclusion criteria used in the article are unclear. |
| Predictors | No | There are patient‐reported outcomes that could be influenced by the current diagnoses conveyed to patients by clinicians. It is clear that at 1 stage (CCA) of the FCA modelling, patients with different diagnoses (RR, SP) were included. It is unclear whether assessors of clinical predictors at the different clinics have the same level of experience. |
| Outcome | Unclear | The outcomes were not clearly defined and their assessments were not described. |
| Analysis | No | EPV of the compound FCA model was at most 10.1. There was no mention of the complexities and uncertainties of two‐stage modelling, or of the inclusion of different time points from the same patients in the training/validation/test sets, being taken into account. Neither calibration nor discrimination measures were reported. It was unclear how the missing data were handled. The method of internal validation was unclear. Model selection and evaluation did not appear to be properly separated. |
| Overall | No | At least one domain is at high risk of bias. |
Calabrese 2013.
| Study characteristics | ||
| General information |
Model name
Primary source Journal Data source Cohort, primary Study type Development + external validation, time |
|
| Participants |
Inclusion criteria
Exclusion criteria Not reported Recruitment Consecutive patients receiving medical care at the outpatient rooms of the MS Centre of Veneto Region–First Neurological Clinic at University Hospital of Padua, Italy Age (years)
Sex (%F)
Disease duration (years)
Diagnosis 100% RRMS Diagnostic criteria McDonald 2001 Treatment
Disease description
Recruitment period
|
|
| Predictors |
Considered predictors
Number of considered predictors
Timing of predictor measurement
Predictor handling
|
|
| Outcome |
Outcome definition Conversion to progressive MS: SPMS defined as an increase of at least 1.0 EDSS point compared to T0, not related to a relapse, observed at any time of the follow‐up and confirmed at 6 months; EDSS scored every 6 months and in case of a relapse Timing of outcome measurement
|
|
| Missing data |
Number of participants with any missing value
Missing data handling Complete case |
|
| Analysis |
Number of participants (number of events)
Modelling method
Predictor selection method
Hyperparameter tuning Not applicable Shrinkage of predictor weights
Performance evaluation dataset
Performance evaluation method
Calibration estimate Not reported Discrimination estimate Not reported Classification estimate
Overall performance Not reported Risk groups
|
|
| Model |
Model presentation
Number of predictors in the model
Predictors in the model
Effect measure estimates
Predictor influence measure Not applicable Validation model update or adjustment
|
|
| Interpretation |
Aim of the study A prospective 5‐year longitudinal study to assess demographic, clinical, and magnetic resonance imaging (MRI) parameters that could predict the changing clinical course of MS. Primary aim The primary aim of this study is somehow the prediction of individual outcomes. The focus is on predictor identification. Model interpretation Probably exploratory Suggested improvements Confirmation in different MS populations and with a longer follow‐up |
|
| Notes |
Applicability overall Low |
|
| Item | Authors' judgement | Support for judgement |
| Participants | Yes | The data were reported to be from a cohort study with predefined data collection times. |
| Predictors | Yes | Because of the prospective nature of data collection, there is no reason to suspect assessment of predictors differently or with knowledge of outcome data. The predictors were collected by a small group of clinicians at a single centre, and only the variables at the time of baseline were used. |
| Outcome | Yes | The outcome is standard and well‐reported. The authors used a 6‐month confirmation period to ensure that the EDSS increase is stable, and found that the results were also stable at 12 months in all patients. |
| Analysis | No | Dev: The EPV was below 10 in the development. No calibration or discrimination measures were reported. The internal validation did not address the whole model selection procedure, but an external validation was done. However, the need for shrinkage was not addressed. A low percentage of patients were lost to follow‐up, and complete case analysis was done after the reason was reported, so we do not consider this a large source of possible bias. However, these patients could have been included if time‐to‐event data were used instead. Ext Val: The number of events was extremely low in the validation. No calibration or discrimination measures were reported. 1 participant was excluded due to a missing outcome, but this is not considered a large possible source of bias. |
| Overall | No | At least one domain is at high risk of bias. |
De Brouwer 2021.
| Study characteristics | ||
| General information |
Model name GRU‐ODE‐Bayes Primary source Journal Data source Registry, secondary Study type Development |
|
| Participants |
Inclusion criteria
Exclusion criteria
Recruitment MSBase registry Age (years) Mean 32.2 (onset) Sex (%F) 71.1 Disease duration (years) Mean 6.9 (range: 3 to 25) Diagnosis 85.6% RRMS, 4.9% SPMS, 3.3% PPMS, 1.4% PRMS, 4.8% unknown Diagnostic criteria Lublin 1996 Treatment Not reported Disease description Prior 3‐year EDSS per patient mean (SD, range): 2.38 (1.48, 0 to 8.5) Recruitment period Not reported |
|
| Predictors |
Considered predictors Gender, age at onset, MS course at time t = 0 (RRMS, SPMS, PPMS, or CIS), disease duration at time t = 0, EDSS at t = 0, Last used DMT at t = 0 (none, interferons, natalizumab, fingolimod, teriflunomide, dimethyl‐fumarate, glatiramer, alemtuzumab, rituximab, cladribine, ocrelizumab, other (contains stem cells therapy, siponimod and daclizumab)), RF only: EDSS closest to t‐3 (first EDSS in dataset), maximum EDSS within t‐3 to t0, difference between maximum and minimum EDSS between t‐3 and t0, number of visits between t‐3 and t0, number of relapses between t‐3 and t0, BPTF/NN: EDSS trajectories Number of considered predictors 24+EDSS trajectories Timing of predictor measurement At multiple visits, at least 6 in 3‐year period Predictor handling Continuously |
|
| Outcome |
Outcome definition Disability (EDSS): disability progression after 2 years, defined as a minimum increase in EDSS of 1.5 (baseline EDSS of 0), 1.0 (baseline EDSS ≤ 5.5), or 0.5 (baseline EDSS > 5.5); needed to be confirmed at least 6 months later Timing of outcome measurement Closest observation time to 2 years (t2*, between t1 and t3) and confirmed with a measurement after t2* (at least 6 months later), median (IQR) = 1.995 years (1.887 years to 2.112 years) |
|
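The baseline‐dependent EDSS thresholds in the outcome definition above amount to a simple step rule. A minimal sketch (function names are illustrative, and the ≥ 6‐month confirmation step is deliberately omitted):

```python
def min_increase_for_progression(baseline_edss: float) -> float:
    """Minimum EDSS increase that counts as progression, per the
    baseline-dependent thresholds in the outcome definition."""
    if baseline_edss == 0:
        return 1.5
    if baseline_edss <= 5.5:
        return 1.0
    return 0.5  # baseline EDSS > 5.5

def is_progression(baseline_edss: float, follow_up_edss: float) -> bool:
    # Note: the real definition also requires confirmation by a second
    # measurement at least 6 months later, which this sketch skips.
    return (follow_up_edss - baseline_edss
            >= min_increase_for_progression(baseline_edss))
```

For example, a rise from EDSS 3.0 to 4.0 qualifies, while a rise from 3.0 to 3.5 does not; above EDSS 5.5 a half‐point increase suffices.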
| Missing data |
Number of participants with any missing value ≥ 48,520; unclear exactly how many participants had any missing values Missing data handling Exclusion |
|
| Analysis |
Number of participants (number of events) 6882 (1114) Modelling method Neural network, continuous‐time gated recurrent unit variant of recurrent neural network Predictor selection method
Hyperparameter tuning Validation set (separate from train and test) used for tuning parameter selection during 5‐fold CV optimising binary cross‐entropy Shrinkage of predictor weights Modelling method Performance evaluation dataset Development Performance evaluation method Cross‐validation, 5‐fold (train/validation/test) Calibration estimate Calibration plot upon request Discrimination estimate c‐Statistic = 0.66 (SD = 0.02) Classification estimate Not reported Overall performance Not reported Risk groups Not reported |
|
| Model |
Model presentation Not reported Number of predictors in the model 19+EDSS trajectories Predictors in the model Gender, age at onset, MS course at time t = 0 (RRMS, SPMS, PPMS, or CIS), disease duration at time t = 0, EDSS at t = 0, last used DMT at t = 0 (none, interferons, natalizumab, fingolimod, teriflunomide, dimethyl‐fumarate, glatiramer, alemtuzumab, rituximab, cladribine, ocrelizumab, other (contains stem cells therapy, siponimod and daclizumab)), EDSS trajectories Effect measure estimates Not reported Predictor influence measure Average AUC degradation after random shuffling of each predictor's values Validation model update or adjustment Not applicable |
|
| Interpretation |
Aim of the study To predict disability progression on the EDSS using longitudinal clinical patient data. Primary aim The primary aim of this study is somehow the prediction of individual outcomes. The focus is on utilising patient trajectories. Model interpretation Exploratory Suggested improvements Not reported |
|
| Notes |
Applicability overall Unclear Applicability overall rationale Due to the lack of reporting on the final model or tool, it is unclear whether this study aimed to develop a model/tool for the prediction of individual MS outcomes. |
|
| Item | Authors' judgement | Support for judgement |
| Participants | No | The data source was a registry, and inclusion/exclusion criteria were based on predictor/outcome data availability. |
| Predictors | No | Predictors were probably collected prior to outcome assessment and were available when the model was used. Most predictors were basic enough that we are not concerned about them being assessed in different ways across patients. However, disease type was used as a predictor. This predictor was probably not measured in the same way across patients or across time, as the diagnostic criteria changed. Also, the category progressive‐relapsing was probably used heterogeneously. |
| Outcome | Yes | The outcome is standard and was assessed similarly across patients. It did not contain predictors. Although some predictors could have been known while assessing parts of the outcome, we consider the outcome to be robust to such information. The reported assessment time was 1 year to 3 years, but upon follow‐up with the author, it was stated that the IQR was 1.9 years to 2.1 years. |
| Analysis | Yes | Calibration was not explicitly assessed in the report, but the model was calibrated using Platt scaling, and a calibration plot was provided during correspondence. A final model/tool was not provided, but given the model reporting, there is no reason to believe the final model differs from the multivariable analysis. |
| Overall | No | At least one domain is at high risk of bias. |
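Platt scaling, mentioned in the Analysis judgement above, recalibrates a model by fitting a two‐parameter logistic function that maps raw scores to probabilities on held‐out (score, outcome) pairs. A minimal gradient‐descent sketch under that general description (all names are illustrative; the study's actual implementation is not reported, and production code would normally use a library routine):

```python
import math

def platt_scale(scores, labels, lr=0.5, epochs=5000):
    """Fit p = sigmoid(a * s + b) to (score, label) pairs by minimising
    log-loss with plain gradient descent; returns the calibration map."""
    a, b, n = 0.0, 0.0, len(scores)
    for _ in range(epochs):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, labels):
            p = 1.0 / (1.0 + math.exp(-(a * s + b)))
            grad_a += (p - y) * s / n  # dLoss/da
            grad_b += (p - y) / n      # dLoss/db
        a -= lr * grad_a
        b -= lr * grad_b
    return lambda s: 1.0 / (1.0 + math.exp(-(a * s + b)))
```

The fitted map leaves the model's ranking (and hence its c‐statistic) unchanged; it only rescales scores into better‐calibrated probabilities, which is why the calibration plot could look reasonable while discrimination stayed modest.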
de Groot 2009.
| Study characteristics | ||
| General information |
Model name
Primary source Journal Data source Cohort, primary Study type Development |
|
| Participants |
Inclusion criteria
Exclusion criteria
Recruitment Consecutive patients visiting the outpatient clinics of 5 neurology departments in Amsterdam and Rotterdam, Netherlands Age (years) Mean 37.4 Sex (%F) 63.7 Disease duration (years) Up to 6 months Diagnosis 82% relapse onset, 18% non‐relapse onset Diagnostic criteria Poser 1983 Treatment
Disease description EDSS median (IQR): 2.5 (2.0 to 3.0) Recruitment period 1998 to 2000 |
|
| Predictors |
Considered predictors
Number of considered predictors 5 Timing of predictor measurement At disease onset (definite MS) (study baseline within 6 months after diagnosis) Predictor handling
|
|
| Outcome |
Outcome definition
Timing of outcome measurement 3 years |
|
| Missing data |
Number of participants with any missing value
Missing data handling Mixed: complete case for outcome, multiple imputation (twice) for predictors |
|
| Analysis |
Number of participants (number of events)
Modelling method Logistic regression Predictor selection method
Hyperparameter tuning Not applicable Shrinkage of predictor weights Uniform shrinkage Performance evaluation dataset Development Performance evaluation method Bootstrap, B = 250 Calibration estimate
Discrimination estimate
Classification estimate Not reported Overall performance Not reported Risk groups 3 risk categories: high (probability of adverse outcome > 75%), moderate (probability of adverse outcome 25% to 75%), and low (probability of adverse outcome < 25%) |
|
| Model |
Model presentation
Number of predictors in the model
Predictors in the model
Effect measure estimates
Predictor influence measure Not applicable Validation model update or adjustment Not applicable |
|
| Interpretation |
Aim of the study To predict functioning after 3 years in patients with recently diagnosed multiple sclerosis (MS) Primary aim The primary aim of this study is the prediction of individual outcomes. Model interpretation Exploratory Suggested improvements New cohort recruited in a different geographic area, at a different point in time, or, assessed with different diagnostic criteria |
|
| Notes |
Applicability overall High Applicability overall rationale This study included participants who had already experienced the outcome at baseline. |
|
| Item | Authors' judgement | Support for judgement |
| Participants | No | Participants known to already have the outcome at baseline were included. |
| Predictors | Yes | Predictors available early in disease were used. There is no reason to believe the predictor assessments were made with knowledge of outcome, as the collection was prospective. |
| Outcome | Yes | It is unclear if the outcome was determined blinded to predictors or not, but the outcomes are relatively objective, which reduces the risk of bias. |
| Analysis | No | Even though the number of predictors was limited and shrinkage was used, the EPV was below or around 10. More than 5% of the participants were removed due to missing outcomes. The bootstrap procedure did not include the predictor selection step. |
| Overall | No | At least one domain is at high risk of bias. |
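The analysis judgements in these tables repeatedly invoke events per variable (EPV), e.g. "below or around 10" above. As a minimal illustration (hypothetical helper, not the review's tooling), EPV is simply the number of outcome events divided by the number of candidate predictor parameters:

```python
# Minimal illustration of the events-per-variable (EPV) heuristic used in
# the PROBAST analysis judgements. The helper name is our own.
def events_per_variable(n_events: int, n_candidate_parameters: int) -> float:
    """EPV = outcome events / candidate predictor parameters considered."""
    if n_candidate_parameters <= 0:
        raise ValueError("need at least one candidate parameter")
    return n_events / n_candidate_parameters

# E.g. 141 events and 15 candidate predictors (the Gout 2011 figures below)
epv = events_per_variable(141, 15)
print(round(epv, 1))  # → 9.4
```

Low EPV (common rules of thumb demand at least 10 to 20) signals a risk of overfitting, which is why it recurs in the "Analysis" risk-of-bias rationale.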
Gout 2011.
| Study characteristics | ||
| General information |
Model name Not applicable Primary source Journal Data source Registry, secondary Study type Development |
|
| Participants |
Inclusion criteria
Exclusion criteria
Recruitment Consecutive patients admitted to the neurology department of the Fondation A. de Rothschild in Paris, France Age (years) Median 31.0 Sex (%F) 70.2 Disease duration (years) Not reported Diagnosis 100% CIS Diagnostic criteria Not reported Treatment 0% Disease description EDSS median (range): 2 (0 to 6) Recruitment period 1994 to 2006 |
|
| Predictors |
Considered predictors Gender, age, family history, previous symptoms suggestive of CNS involvement, initial involvement (optic nerve (ref), spinal cord, brainstem/cerebellum, polyregional/cerebrum), initial Expanded Disability Status Scale ≥ 2.5, ≥ 2 T2 lesions (MR), 3‐4+ Barkhof criteria, CSF white blood cell count > 4/mm3, IgG oligoclonal band, positive CSF (> 4 WBC/mm3 or IgG OB), ≥ 2 T2 lesions + IgG OB, McDonald DIS (3‐4+ BC or ≥ 2 T2 lesions + IgG OB) Number of considered predictors ≥ 15 (unclear how many interactions tested) Timing of predictor measurement At disease onset (CIS) leading to admission Predictor handling
|
|
| Outcome |
Outcome definition Conversion to definite MS (Poser 1983): date of occurrence of a second demyelinating event defined as the occurrence of a symptom or symptoms of neurological dysfunction lasting more than 24 hours with objective confirmation at least 1 month after initial event, or the last follow‐up date in the case of patients remaining event‐free Timing of outcome measurement Follow‐up median (range): 3.5 years (1.0 year to 12.7 years), time to outcome in those who experience it median (range) 16.6 months (1.1 months to 112.5 months) |
|
| Missing data |
Number of participants with any missing value 213 Missing data handling Exclusion |
|
| Analysis |
Number of participants (number of events) 208 (141) Modelling method Survival, Cox Predictor selection method
Hyperparameter tuning Not applicable Shrinkage of predictor weights None Performance evaluation dataset Development Performance evaluation method Apparent Calibration estimate Not reported Discrimination estimate Not reported Classification estimate Not reported Overall performance Not reported Risk groups 3 risk groups: low‐risk group (score = 0), intermediate‐risk group (0 < score < 5), and high‐risk group (score = 5), based on Kaplan‐Meier plots/estimates. |
|
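The Gout 2011 score assigns points for three dichotomised predictors and groups patients by total score. As a hypothetical reconstruction only (the point values of 1.5, 2, and 1.5 are an assumption inferred from the rounded hazard ratios and the reported maximum score of 5; the paper's exact values may differ):

```python
# Hypothetical reconstruction of the Gout 2011 risk score. The point
# weights below are ASSUMED (rounded from the HRs, consistent with a
# maximum score of 5), not taken verbatim from the paper.
def gout_score(age_le_31: bool, barkhof_3_4: bool, csf_wbc_gt_4: bool) -> float:
    score = 0.0
    score += 1.5 if age_le_31 else 0.0    # HR 1.44 -> assumed 1.5 points
    score += 2.0 if barkhof_3_4 else 0.0  # HR 2.07 -> assumed 2 points
    score += 1.5 if csf_wbc_gt_4 else 0.0 # HR 1.44 -> assumed 1.5 points
    return score

def risk_group(score: float) -> str:
    # Grouping as reported: 0 -> low, (0, 5) -> intermediate, 5 -> high
    if score == 0:
        return "low"
    if score < 5:
        return "intermediate"
    return "high"

print(risk_group(gout_score(True, True, True)))  # all predictors positive → high
```

This mirrors how the study's Kaplan‐Meier plots per risk group were derived from the simplified score rather than from the full Cox model.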
| Model |
Model presentation
Number of predictors in the model 3 Predictors in the model Age (≤ 31 years), 3‐4+ MR Barkhof Criteria, CSF white blood cell count > 4/mm3 Effect measure estimates HR (95% CI): age ≤ 31 years 1.44 (1.02 to 2.01), 3‐4+ MR Barkhof criteria 2.07 (1.47 to 2.91), CSF white blood cell count > 4/mm3 1.44 (1.03 to 2.02) Predictor influence measure Not applicable Validation model update or adjustment Not applicable |
|
| Interpretation |
Aim of the study To assess whether CSF analysis at the time of a first demyelinating event is a useful tool to predict CDMS. Specifically: first, to assess the predictive value of CSF analysis independently of the other known prognostic factors, and, second, to provide a simple classification for predicting CDMS based on a multivariate Cox model. Primary aim The primary aim of this study is in part the prediction of individual outcomes. The focus is on the CSF analysis. Model interpretation Exploratory Suggested improvements Validation in another cohort |
|
| Notes |
Applicability overall Low |
|
| Item | Authors' judgement | Support for judgement |
| Participants | No | The data were used for a second time, with study inclusion based on the availability of data, specifically the availability of both MRI and CSF measures. |
| Predictors | Yes | The predictor assessments were probably performed before the outcome due to the prospective nature of data collection, and all predictors are expected to be collected at the onset of the disease, which is the time of intended use. It is a single‐centre study, so predictor collection and assessment should be similar in all patients. |
| Outcome | Yes | Although the blinding of outcome assessment was not reported, the outcome definition based on new symptoms is relatively objective. The new event had to be a month apart from the first event to ensure they were separate events. |
| Analysis | No | The EPV was less than 20. Predictors were selected based on univariable analysis. The predictors were dichotomised, sometimes using clinically meaningful cutoffs and sometimes at the sample median. No model performance measures were reported other than cumulative incidence plots per risk group in the development set. The multivariable model coefficients were rounded to simplify the model into a score, but the steps are clear and reproducible. There was no assessment of the model before it was simplified, and no examination of the need for shrinkage. |
| Overall | No | At least one domain is at high risk of bias. |
Gurevich 2009.
| Study characteristics | ||
| General information |
Model name
Primary source Journal Data source Unclear Study type
|
|
| Participants |
Inclusion criteria
Exclusion criteria
Recruitment Unclear, Sheba Medical Centre, Israel Age (years)
Sex (%F)
Disease duration (years)
Diagnosis
Diagnostic criteria McDonald 2001 Treatment
Disease description
Recruitment period Not reported |
|
| Predictors |
Considered predictors
Number of considered predictors
Timing of predictor measurement
Predictor handling
|
|
| Outcome |
Outcome definition
Timing of outcome measurement
|
|
| Missing data |
Number of participants with any missing value
Missing data handling
|
|
| Analysis |
Number of participants (number of events)
Modelling method
Predictor selection method
Hyperparameter tuning
Shrinkage of predictor weights
Performance evaluation dataset
Performance evaluation method
Calibration estimate
Discrimination estimate
Classification estimate
Overall performance Not reported Risk groups Not reported |
|
| Model |
Model presentation
Number of predictors in the model
Predictors in the model
Effect measure estimates
Predictor influence measure
Validation model update or adjustment
|
|
| Interpretation |
Aim of the study To determine if subsets of genes can predict the time to the next acute relapse in patients with MS Primary aim The primary aim of this study is in part the prediction of individual outcomes. The focus is on the use of genetic information. Model interpretation Probably exploratory Suggested improvements To find sets of predictive genes that give significant results when their gene expression is measured by cheaper, small‐scale technologies such as kinetic RT‐PCR; to predict radiological MRI lesions (possibly clinically silent) from gene expression in PBMC |
|
| Notes |
Applicability overall Unclear Applicability overall rationale
|
|
| Item | Authors' judgement | Support for judgement |
| Participants | Unclear | FLP Dev and FTP: The data source was not clearly reported. 100 patients were sampled from a larger population of unclear source. Although 6 of the samples were dropped due to QC issues, missingness is expected to be at random. FLP Ext Val: The recruitment of this additional cohort was not described at all. |
| Predictors | Yes | Although microarray analysis of the transcriptome can be affected by batch effects, there were efforts to mitigate them. Even though the microarray analysis might have occurred after the outcomes became known, the procedure was relatively automated and is not expected to be affected by this information. The intended time of model use relative to the patient's disease history is unclear, but it may be any time that blood was drawn. |
| Outcome | Yes | FLP Dev: We rated this domain for this analysis as having a high risk of bias. This outcome was defined after seeing the outcome information, so it is not standard. FTP: We rated this domain for this analysis as having a low risk of bias. It is unclear which predictors were known at outcome assessment, but we consider the relapse definition to be robust. FLP Ext Val: We rated this domain for this analysis as having a high risk of bias. This outcome was defined after seeing the outcome information in the development set, so it is not standard. |
| Analysis | No | FLP Dev and FTP: The authors chose to use multi‐class modelling for a time‐to‐event analysis, unnecessarily categorising the outcome. The outcome groups were based on the development data distribution. The number of observations and the number of events were low. Calibration and discrimination were not addressed. Patients were excluded for poor quality in transcription data. It seems that the same data were used for predictor selection and for model evaluation. The final model is unclear. FLP Ext Val: The authors chose to use multi‐class modelling for a time‐to‐event analysis, unnecessarily categorising the outcome. The number of observations and the number of events were low. Calibration and discrimination were not addressed. |
| Overall | No | At least one domain is at high risk of bias. |
Kosa 2022.
| Study characteristics | ||
| General information |
Model name Not applicable Primary source Journal Data source Mixed (case‐control, cohort), primary Study type Development |
|
| Participants |
Inclusion criteria
Exclusion criteria
Recruitment Prospectively recruited in the study 'Comprehensive Multimodal Analysis of Neuroimmunological Diseases of the Central Nervous System' (NCT00794352), unclear which centre(s), USA Age (years) Mean 49.6 Sex (%F) 54.2 Disease duration (years) Mean 12.2 (pooled SD: 8.51) Diagnosis 30.8% RRMS, 24.2% SPMS, 44.9% PPMS Diagnostic criteria Mixed: McDonald 2010 (Polman 2011), McDonald 2017 (Thompson 2018b) Treatment
Disease description EDSS mean (SD): development set; RRMS 1.8 (1.2), SPMS 5.9 (1.2), PPMS 5.3 (1.6)/validation set RRMS 2.2 (1.6), SPMS 5.5 (1.5), PPMS 5.2 (1.6) Recruitment period 2004 to 2021 |
|
| Predictors |
Considered predictors All possible Somamer ratios from 1305 Somamers (unclear adjustment for age and sex) along with individual markers Number of considered predictors 852,167 or 852,165 (unclear adjustment for age and sex) Timing of predictor measurement At lumbar puncture Predictor handling Continuously (transformed into ratios) |
|
| Outcome |
Outcome definition Composite (EDSS, SNRS, T25FW, NDH‐9HPT): MS‐DSS, a model output based on measured CombiWISE (which contains EDSS, SNRS, T25FW, NDH‐9HPT), therapy adjusted CombiWISE (which includes a treatment efficacy model), COMRIS‐CTD (including several lesion and atrophy measures), time from disease onset to first therapy, difference between adjusted and unadjusted CombiWISE, age, and family history of MS Timing of outcome measurement Mean: 4.3 years |
|
| Missing data |
Number of participants with any missing value Not reported Missing data handling
|
|
| Analysis |
Number of participants (number of events) 227 (continuous outcome) Modelling method Random forest, numeric outcome Predictor selection method
Hyperparameter tuning Unclear, number of predictors to include chosen by out of bag error, random forest tuning parameters not reported Shrinkage of predictor weights Modelling method Performance evaluation dataset Development Performance evaluation method Random split Calibration estimate Calibration plot Discrimination estimate Not applicable Classification estimate Not applicable Overall performance Not reported Risk groups Not reported |
|
| Model |
Model presentation Not reported Number of predictors in the model 21 or 23 (unclear if age and sex are predictors) Predictors in the model Somamer ratios, age, sex Effect measure estimates R2 = 0.264 Predictor influence measure Variable importance Validation model update or adjustment Not applicable |
|
| Interpretation |
Aim of the study To test the hypothesis that CSF biomarker models provide insight into MS pathophysiology, identify molecular disease heterogeneity, and lead to an independent‐cohort validated prognostic test Primary aim The primary aim of this study is in part the prediction of individual outcomes. The focus is on the prognostic value of CSF biomarkers. Model interpretation Probably exploratory Suggested improvements Further mechanistic research |
|
| Notes |
Applicability overall High Applicability overall rationale The outcome was not a clinical measure but rather a value produced by another model with unclear interpretation. Auxiliary references Calle ML, Urrea V, Boulesteix AL, Malats N. AUC‐RF: a new strategy for genomic profiling with random forest. Hum Hered 2011;72(2):121‐32. Kosa P, Komori M, Waters R, Wu T, Cortese I, Ohayon J, et al. Novel composite MRI scale correlates highly with disability in multiple sclerosis patients. Mult Scler Relat Disord 2015;4(6):526‐35. Roxburgh RH, Seaman SR, Masterman T, Hensiek AE, Sawcer SJ, Vukusic S, et al. Multiple Sclerosis Severity Score: using disability and disease duration to rate disease severity. Neurology 2005;64(7):1144‐51. Weideman AM, Barbour C, Tapia‐Maltos MA, Tran T, Jackson K, Kosa P, et al. New multiple sclerosis disease severity scale predicts future accumulation of disability. Front Neurol 2017;8:598. NCT00794352. Comprehensive multimodal analysis of neuroimmunological diseases of the central nervous system. https://clinicaltrials.gov/show/NCT00794352 (first received 20 November 2008). |
|
| Item | Authors' judgement | Support for judgement |
| Participants | No | Although the study categorised itself as a case‐control study, the model we are interested in used prospectively measured predictors and outcomes of interest. In addition, the inclusion criteria depended on the availability of some tests, which is likely to introduce bias. |
| Predictors | Yes | Predictors were collected prospectively according to a standard operating procedure by investigators blinded to clinical and MRI outcomes. The predictors were available at the intended time of use, reported as first lumbar puncture. |
| Outcome | No | During the study, the calculation of neurological scales changed from manual calculation to an app, which is likely to have introduced variability. The timing of the outcome was not well defined and, despite the prospective design of the study, follow‐up time varied greatly. |
| Analysis | No | The sample size was small. Participants with missing outcome data were excluded from the analysis via exclusion criteria. The model performance was assessed suboptimally in a random‐split sample. There was no indication of a final selected model that could be used by others. |
| Overall | No | At least one domain is at high risk of bias. |
Kuceyeski 2018.
| Study characteristics | ||
| General information |
Model name Pairwise disconnection and GM atrophy Primary source Journal Data source Mixed (cohort, registry, routine care), secondary Study type Development |
|
| Participants |
Inclusion criteria
Exclusion criteria Not reported Recruitment Not reported Age (years) Mean 36.8 (unclear when) Sex (%F) 73.3 Disease duration (years) Mean 1.5 (SD 1.3) Diagnosis 100% RRMS Diagnostic criteria Mixed: McDonald 2010 (Polman 2011), McDonald 2017 (Thompson 2018b) Treatment
Disease description EDSS mean (SD): 1.1 (1.1) Recruitment period Not reported |
|
| Predictors |
Considered predictors Age, sex, disease duration, treatment duration, baseline SDMT, baseline EDSS, regional GM atrophy (86 regions), NEMO pairwise disconnection measures (610 of 3655 considered), (other models: JHU‐MNI atlas overlap (176 regions), regional disconnection (86 ChaCo scores)), number of months between time points Number of considered predictors 965 Timing of predictor measurement At baseline image (early RRMS within 5 years of their first neurologic symptom), at follow‐up (unclear: outcome measurement) Predictor handling Continuously |
|
| Outcome |
Outcome definition Disability (cognitive, SDMT): future processing speed measured using Symbol Digits Modality Test (SDMT) scores Timing of outcome measurement Mean (SD): 28.6 months (10.3 months) |
|
| Missing data |
Number of participants with any missing value Not reported Missing data handling Not reported |
|
| Analysis |
Number of participants (number of events) 60 (continuous outcome) Modelling method Partial least squares regression Predictor selection method
Hyperparameter tuning Ten‐fold cross‐validation to identify number of components that minimised predicted residual sum of squares Shrinkage of predictor weights Modelling method Performance evaluation dataset Development Performance evaluation method Unclear (cross‐validation and bootstrap used for model selection/fitting) Calibration estimate Calibration plot Discrimination estimate Not applicable Classification estimate Not applicable Overall performance R2 = 0.79 (95% CI 0.80 to 0.97) Risk groups Not reported |
|
| Model |
Model presentation Not reported Number of predictors in the model 703 predictors transformed into 6 principal components Predictors in the model Age, sex, disease duration, treatment duration, baseline SDMT, baseline EDSS, regional GM atrophy (86 regions), NEMO pairwise disconnection measures (610 considered), number of months between time points Effect measure estimates Not reported Predictor influence measure Median and bootstrapped 95% confidence intervals for coefficients of statistically significant predictors Validation model update or adjustment Not applicable |
|
| Interpretation |
Aim of the study
Primary aim The primary aim of this study is not the prediction of individual outcomes. The focus is on the usefulness of MRI measures and identifying promising MRI features. Model interpretation Exploratory Suggested improvements Increase the sample size, use scores addressing SDMT domains as outcome measures, add WM damage measures |
|
| Notes |
Applicability overall High Applicability overall rationale Although this study contained a model and its assessment, the main aim was not to create a model for prediction of individual outcomes but was rather to show the usefulness of MRI‐based connectome measures. |
|
| Item | Authors' judgement | Support for judgement |
| Participants | No | A combination of data from a cohort study, a registry, and routine care was used. No information was reported about the eligibility criteria. |
| Predictors | Yes | The predictors are objective measures or scores. All except for number of months between time points could be collected at the time of the first image. |
| Outcome | Yes | It is unclear whether the outcome was assessed blinded to the predictors, but we consider the outcome based on SDMT to be objective. SDMT is a validated measure of cognitive function. |
| Analysis | No | Even when based on the number of principal components, the EPV was low. No information was provided on missing data or how they were handled. Details of the model were not reported. The post‐baseline time variable (number of months between time points) was included in the models. Although bootstrapping was used for confidence interval calculation, there was no indication that any optimism correction was performed. |
| Overall | No | At least one domain is at high risk of bias. |
Law 2019.
| Study characteristics | ||
| General information |
Model name
Primary source Journal Data source Randomised trial participants, secondary Study type Development |
|
| Participants |
Inclusion criteria
Exclusion criteria
Recruitment
Age (years) Mean 50.9 Sex (%F) 64.1 Disease duration (years) Mean 9.3 (SD 5.0) Diagnosis 100% SPMS Diagnostic criteria Own definition Treatment
Disease description EDSS median (IQR): 6.0 (4.5 to 6.5) Recruitment period 2004 to 2009 |
|
| Predictors |
Considered predictors Timed 25‐foot walk (T25W), 9HPT, Paced Auditory Serial Addition Test (PASAT), EDSS, disease duration, age, sex, T2 lesion volume (T2LV), brain parenchymal fraction (BPF) Number of considered predictors 9 Timing of predictor measurement At study baseline (RCT) Predictor handling Continuously |
|
| Outcome |
Outcome definition Disability (EDSS): confirmed disability progression defined as an increase in EDSS (≥ 1.0 or ≥ 0.5 for baseline EDSS ≤ 5.5 or ≥ 6, respectively) sustained for 6 months Timing of outcome measurement At 2 years |
|
| Missing data |
Number of participants with any missing value Unclear exactly how many participants had any missing values
Missing data handling Mixed: mean imputation for a single patient's disease duration, exclusion |
|
| Analysis |
Number of participants (number of events) 485 (115) Modelling method
Predictor selection method
Hyperparameter tuning
Shrinkage of predictor weights
Performance evaluation dataset Development Performance evaluation method Cross‐validation, 10‐fold Calibration estimate Not reported Discrimination estimate
Classification estimate
Overall performance Not reported Risk groups Not reported |
|
| Model |
Model presentation Not reported Number of predictors in the model 9 Predictors in the model Timed 25‐foot walk (T25W), 9HPT, Paced Auditory Serial Addition Test (PASAT), EDSS, disease duration, age, sex, T2 lesion volume (T2LV), brain parenchymal fraction (BPF) Effect measure estimates Not reported Predictor influence measure Mean % feature contribution/importance Validation model update or adjustment Not applicable |
|
| Interpretation |
Aim of the study To evaluate individual and ensemble model performance built using decision tree (DT)‐based algorithms compared to logistic regression (LR) and support vector machines (SVMs) for predicting SPMS disability progression Primary aim The primary aim of this study is in part the prediction of individual outcomes. The focus is on modelling methods. Model interpretation Exploratory Suggested improvements Bigger samples, more predictors with non‐linear relationships with progression, using random trees instead of simple DTs in AdaBoost |
|
| Notes |
Applicability overall Unclear Applicability overall rationale Due to the lack of reporting on the final model or tool, it is unclear whether this study aimed to develop a model/tool for the prediction of individual MS outcomes. Auxiliary references Freedman MS, Bar‐Or A, Oger J, Traboulsee A, Patry D, Young C, et al. A phase III study evaluating the efficacy and safety of MBP8298 in secondary progressive MS. Neurology 2011;77(16):1551‐60. |
|
| Item | Authors' judgement | Support for judgement |
| Participants | No | Data from an RCT were used, but only complete cases were included. Around 10% of patients were excluded due to missing values, and it is unclear if the excluded patients differed from the included patients. |
| Predictors | Yes | The predictors were collected during an RCT; therefore, they are expected to be collected in the same way across all patients. |
| Outcome | Yes | The outcome was standard and was assessed during an RCT. We expect EDSS assessment to be objective and do not think that predictor knowledge influences results. |
| Analysis | No | The EPV was close to 10. Discrimination was addressed, but not calibration. The final model is unclear. |
| Overall | No | At least one domain is at high risk of bias. |
Lejeune 2021.
| Study characteristics | ||
| General information |
Model name
Primary source Journal Data source
Study type Development + external validation, location |
|
| Participants |
Inclusion criteria
Exclusion criteria
Recruitment
Age (years)
Sex (%F)
Disease duration (years)
Diagnosis 100% RRMS Diagnostic criteria McDonald 2005 (Polman 2005) Treatment
Disease description
Recruitment period
|
|
| Predictors |
Considered predictors
Number of considered predictors
Timing of predictor measurement
Predictor handling
|
|
| Outcome |
Outcome definition Disability (EDSS): residual disability at 6 months after relapse defined as an increase of at least 1 EDSS point compared with pre‐relapse EDSS Timing of outcome measurement At 6 months |
|
| Missing data |
Number of participants with any missing value
Missing data handling Exclusion |
|
| Analysis |
Number of participants (number of events)
Modelling method
Predictor selection method
Hyperparameter tuning
Shrinkage of predictor weights
Performance evaluation dataset
Performance evaluation method
Calibration estimate
Discrimination estimate
Classification estimate
Overall performance Not reported Risk groups Not reported |
|
| Model |
Model presentation
Number of predictors in the model
Predictors in the model
Effect measure estimates
Predictor influence measure
Validation model update or adjustment
|
|
| Interpretation |
Aim of the study To develop and validate a clinical‐based model for predicting the risk of residual disability at 6 months post‐relapse in MS Primary aim The primary aim of this study is the prediction of individual outcomes. Model interpretation Confirmatory Suggested improvements Not reported |
|
| Notes |
Applicability overall Low Auxiliary references NCT00984984. Efficacy and safety of methylprednisolone per os versus IV for the treatment of multiple sclerosis (MS) relapses. https://ClinicalTrials.gov/show/NCT00984984 (first received 25 September 2009). |
|
| Item | Authors' judgement | Support for judgement |
| Participants | No | Dev: RCT data, considered to be a valid source, were used for modelling. However, > 5% of participants were excluded for missing data. Ext Val: The use of data from routine clinical care may introduce bias. Of the 978 people in the registry, 781 were excluded for missing data. Due to the exclusion of such a large number of participants, it is unclear whether the model results are generalisable. |
| Predictors | Yes | Dev: Due to the RCT nature, predictors should have been defined and assessed in a similar way across participants. Due to the prospective nature of the RCT, predictors were collected without knowledge of the outcome. The authors specifically set out to create a prediction model in which all predictors were readily available at baseline. Ext Val: There is no reason to suspect differential or post‐outcome assessment of the predictors in this routine hospital data set from a single centre. |
| Outcome | Yes | We consider the outcome, which is based on EDSS, to be robust to sources of bias, such as knowledge of predictors at outcome assessment. |
| Analysis | No | Dev: The EPV was less than 10. Continuous predictors were dichotomised or possibly categorised without clear explanation. It was unclear how missing predictor data were handled, other than exclusion (handled in Participants section). Although calibration measures for the development set were not reported, they were reported for the external validation set of the same publication. Ext Val: The number of events was fewer than 100. It was unclear how missing predictor data were handled other than exclusion, which was handled in the Participants section. |
| Overall | No | At least one domain is at high risk of bias. |
Malpas 2020.
| Study characteristics | ||
| General information |
Model name
Primary source Journal Data source Registry, secondary Study type Development + external validation, location |
|
| Participants |
Inclusion criteria
Exclusion criteria Not reported Recruitment
Age (years)
Sex (%F)
Disease duration (years)
Diagnosis 100% RRMS Diagnostic criteria McDonald 2010 (Polman 2011) Treatment
Disease description
Recruitment period Not reported |
|
| Predictors |
Considered predictors
Number of considered predictors
Timing of predictor measurement
Predictor handling
|
|
| Outcome |
Outcome definition Disability (EDSS): aggressive MS defined as all of (i) EDSS ≥ 6 reached within 10 years of symptom onset, (ii) EDSS ≥ 6 confirmed and sustained over ≥ 6 months, and (iii) EDSS ≥ 6 sustained until the end of follow‐up (≥ 10 years) Timing of outcome measurement
|
|
| Missing data |
Number of participants with any missing value
Missing data handling Exclusion |
|
| Analysis |
Number of participants (number of events)
Modelling method
Predictor selection method
Hyperparameter tuning Not applicable Shrinkage of predictor weights
Performance evaluation dataset
Performance evaluation method
Calibration estimate Not reported Discrimination estimate
Classification estimate
Overall performance Not reported Risk groups Not reported |
|
| Model |
Model presentation
Number of predictors in the model
Predictors in the model
Effect measure estimates
Predictor influence measure Not applicable Validation model update or adjustment
|
|
| Interpretation |
Aim of the study To evaluate whether patients who will develop aggressive multiple sclerosis can be identified based on early clinical markers Primary aim The primary aim of this study is in part the prediction of individual outcomes. The focus is on predictor identification. Model interpretation Probably confirmatory Suggested improvements Add MRI and CSF data |
|
| Notes |
Applicability overall
Applicability overall rationale
Auxiliary references Butzkueven H, Chapman J, Cristiano E, Grand'Maison F, Hoffmann M, Izquierdo G, et al. MSBase: an international, online registry and platform for collaborative outcomes research in multiple sclerosis. 12(6):769‐74. |
|
| Item | Authors' judgement | Support for judgement |
| Participants | No | Dev: The data source was reported unclearly: it was called a registry but also a cohort study. Although the authors referred to a quality assurance paper for the data source, the cited article only described the quality summary system, and it was not reported what happened when centres/observations deviated from quality standards. Inappropriate inclusion of participants with outcomes at baseline may lead to bias in predictions and complicate the interpretation, regardless of whether the model estimated change or not. The sensitivity analysis only addressed whether the same predictors were included, not whether the predictions changed. Ext Val: The data source was a registry with inclusion/exclusion depending on the length of follow‐up. Also, it is unclear whether participants had the outcome at baseline, as in the development. |
| Predictors | Yes | The final model is simple and timely. The included predictors are simple enough to be considered objective. The included predictors were collected up to 1 year after symptom onset. We consider the period up to 1 year after onset to still be onset. For this reason, and because the logistic model, as opposed to the survival model, does not require a starting point, the predictors are considered to be available at the time of model application. |
| Outcome | Yes | The outcome was pre‐specified. It was based on EDSS, which we consider to be relatively robust, so we are not concerned about the possible lack of blinding of the outcome assessor to the patient history. Participants with the outcome at baseline were included, but this was addressed in the Participants section. |
| Analysis | No | Dev: The EPV was less than 10. Complete case analysis was performed. No calibration measures were reported, and only apparent discrimination was reported. External validation was done in the same paper, but also without calibration. No assessment of the need for shrinkage was done. The model coefficients were provided on correspondence, but in the paper the reduced model was presented only as a chart of relative risks based on combinations of predictors. Ext Val: The number of events was fewer than 100. Complete case analysis was performed. No calibration measures were reported. |
| Overall | No | At least one domain is at high risk of bias. |
Mandrioli 2008.
| Study characteristics | ||
| General information |
Model name
Primary source Journal Data source Cohort, secondary Study type Development + external validation, time |
|
| Participants |
Inclusion criteria
Exclusion criteria
Recruitment Consecutive patients identified during regular follow‐ups at the Neurology Clinic of Modena University Hospital, Italy Age (years)
Sex (%F)
Disease duration (years) Not reported Diagnosis 100% RRMS Diagnostic criteria Not reported Treatment
Disease description
Recruitment period
|
|
| Predictors |
Considered predictors
Number of considered predictors
Timing of predictor measurement
Predictor handling
|
|
| Outcome |
Outcome definition Disability (EDSS): severe MS (SMS) defined as an EDSS score of 4 or more after a disease duration of 10 years or less, benign MS (BMS) otherwise (Kurtzke 1977 criteria); progression to a new EDSS score had to be confirmed in 2 consecutive examinations Timing of outcome measurement
|
|
| Missing data |
Number of participants with any missing value
Missing data handling Exclusion |
|
| Analysis |
Number of participants (number of events)
Modelling method
Predictor selection method
Hyperparameter tuning Not applicable Shrinkage of predictor weights
Performance evaluation dataset
Performance evaluation method
Calibration estimate Not reported Discrimination estimate Not reported Classification estimate
Overall performance Not reported Risk groups Not reported |
|
| Model |
Model presentation
Number of predictors in the model
Predictors in the model
Effect measure estimates
Predictor influence measure Not applicable Validation model update or adjustment
|
|
| Interpretation |
Aim of the study To create a multifactorial prognostic index (MPI) providing the probability of a severe MS course at diagnosis based on clinical and immunological CSF parameters Primary aim The primary aim of this study is the prediction of individual outcomes. Model interpretation Probably exploratory Suggested improvements Include MRI data; validate in a large, prospective cohort |
|
| Notes |
Applicability overall Low |
|
| Item | Authors' judgement | Support for judgement |
| Participants | No | Data were retrospectively collected and the data source is not clearly reported. Exclusion criteria included follow‐up and predictor availability. |
| Predictors | Yes | The study authors reported analysing immunological and clinical data blindly. CSF immunological assessments were performed twice by 2 neurologists. They specifically chose to include only predictors available at RRMS diagnosis in the final model. |
| Outcome | Yes | The outcome was defined in an independent study (Kurtzke 1977) and was based on EDSS, so we consider it a hard outcome with little risk of bias. Furthermore, the timing of 10 years is a reasonable amount of time for reaching EDSS 4. |
| Analysis | No | Dev: EPV was less than 10. Missing data were addressed by exclusion and handled in the Participants section. Neither discrimination nor calibration was addressed. Univariate analyses were used for variable selection. It was unclear whether the predictors and their assigned weights in the final model corresponded to the results from multivariable analysis because the OR measures provided in the results table had different signs from those in the model formula provided in the text. Although an external validation dataset was used in the same study, only classification measures related to it were reported. The need for shrinkage was not assessed. Ext Val: The number of participants was fewer than 100. Neither discrimination nor calibration was addressed. Participants with missing data were excluded and handled in the Participants section. |
| Overall | No | At least one domain is at high risk of bias. |
Manouchehrinia 2019.
| Study characteristics | ||
| General information |
Model name
Primary source Journal Data source
Study type Development + external validation, multiple (location, time, spectrum) |
|
| Participants |
Inclusion criteria
Exclusion criteria Not reported Recruitment
Age (years)
Sex (%F)
Disease duration (years) Unclear Diagnosis 100% RRMS Diagnostic criteria
Treatment
Disease description
Recruitment period
|
|
| Predictors |
Considered predictors
Number of considered predictors
Timing of predictor measurement
Predictor handling
|
|
| Outcome |
Outcome definition
Timing of outcome measurement
|
|
| Missing data |
Number of participants with any missing value
Missing data handling
|
|
| Analysis |
Number of participants (number of events)
Modelling method
Predictor selection method
Hyperparameter tuning Not applicable Shrinkage of predictor weights
Performance evaluation dataset
Performance evaluation method
Calibration estimate
Discrimination estimate
Classification estimate Not reported Overall performance Not reported Risk groups Not reported |
|
| Model |
Model presentation
Number of predictors in the model
Predictors in the model
Effect measure estimates
Predictor influence measure Not applicable Validation model update or adjustment
|
|
| Interpretation |
Aim of the study To design a nomogram, a prediction tool, to predict the individual’s risk of conversion to secondary progressive multiple sclerosis (SPMS) at the time of multiple sclerosis (MS) onset Primary aim The primary aim of this study is the prediction of individual outcomes. Model interpretation Probably confirmatory Suggested improvements Not reported |
|
| Notes |
Applicability overall Low Auxiliary references Calabresi PA, Radue EW, Goodin D, Jeffery D, Rammohan KW, Reder AT, et al. Safety and efficacy of fingolimod in patients with relapsing‐remitting multiple sclerosis (FREEDOMS II): a double‐blind, randomised, placebo‐controlled, phase 3 trial. Lancet Neurol 2014;13(6):545‐56. Derfuss T, Sastre‐Garriga J, Montalban X, Rodegher M, Wuerfel J, Gaetano L, et al. The ACROSS study: long‐term efficacy of fingolimod in patients with relapsing‐remitting multiple sclerosis. Mult Scler J Exp Transl Clin 2020;6(1):2055217320907951. Hillert J, Stawiarz L. The Swedish MS registry – clinical support tool and scientific resource. Acta Neurol Scand 2015;132(199):11‐9. Kappos L, Antel J, Comi G, Montalban X, O'Connor P, Polman C H, et al. Oral fingolimod (FTY720) for relapsing multiple sclerosis. N Engl J Med 2006;355(11):1124‐40. Kappos L, Radue EW, O'Connor P, Polman C, Hohlfeld R, Calabresi P, et al. A placebo‐controlled trial of oral fingolimod in relapsing multiple sclerosis. N Engl J Med 2010;362(5):387‐401. NCT02307838. Long‐term follow‐up of fingolimod phase II study patients (ACROSS). https://clinicaltrials.gov/ct2/show/NCT02307838 (first received 4 December 2014). |
|
| Item | Authors' judgement | Support for judgement |
| Participants | Yes | Dev: We rated this domain for this analysis as having a high risk of bias. The data source was a nationwide registry; hence, it is expected to be heterogeneous. The inclusion criteria were based on the presence of a predictor. Ext Val 1: We rated this domain for this analysis as having a high risk of bias. The inclusion criteria were based on the presence of a predictor. The data source is not very clear, although it was referred to as a cohort. Ext Val 2: We rated this domain for this analysis as having a high risk of bias. The data source was a secondary use of an RCT. Although the initial inclusion/exclusion criteria were not reported, we expect them to be appropriate due to the inherent prospective nature. However, only participants with complete follow‐up were included for this analysis, even though survival analysis was used. Ext Val 3: We rated this domain for this analysis as having a low risk of bias. The data source was a secondary use of an RCT. Although the initial inclusion/exclusion criteria were not reported, we expect them to be appropriate due to the inherent prospective nature. The number of patients completing the FREEDOMS studies and the number included here are the same. |
| Predictors | No | Onset age and age at the first‐recorded EDSS score were predictors in the final model. The intended time of model use was stated to be RRMS onset. However, the age at the first‐recorded EDSS score was available only several years after onset, rather than at the time of model use. |
| Outcome | Yes | Dev: We rated this domain for this analysis as having a high risk of bias. The participants were seen 4 to 7 times in 5 years to 10 years, considered to be close to the expected frequency of yearly visits. However, the outcome is not clearly operationalised in the report or in the criteria referred to. Ext Val 1: We rated this domain for this analysis as having a high risk of bias. The participants were seen 3 to 5 times in 5 years to 10 years, less than a visit per year. Since the outcome was time‐to‐event, the varying density of observations might introduce a bias. Furthermore, the outcome is not clearly operationalised in the report or in the criteria referred to. Ext Val 2: We rated this domain for this analysis as having a high risk of bias. The outcome is not clearly operationalised in the report or in the criteria referred to. Ext Val 3: We rated this domain for this analysis as having a low risk of bias. The outcome assessment was made outside the trial based on on‐trial EDSS information. The outcome assessment was EDSS‐based and therefore relatively robust to bias due to lack of blinding. |
| Analysis | No | Dev: Many candidate predictor values were missing, and it was not reported in which subset of patients the backward selection took place. Complete case analysis was used. The candidate predictors of the number of lesions were categorised, but EDSS was handled using linear splines. The bootstrap procedure for performance measures did not include predictor selection, but external validation was done. However, the external validations did not address calibration. Ext Val 1: Calibration was not assessed. The amount of missing data and how it was handled was not reported. Ext Val 2: There were too few events in this validation set, and calibration was not assessed. Ext Val 3: Calibration was not assessed. No information was reported on the handling of missing data. |
| Overall | No | At least one domain is at high risk of bias. |
Margaritella 2012.
| Study characteristics | ||
| General information |
Model name Not applicable Primary source Journal Data source Routine care, secondary Study type Development |
|
| Participants |
Inclusion criteria
Exclusion criteria
Recruitment Patients referred to a single MS centre in Milan, Italy Age (years) Mean 28.6 (onset) Sex (%F) 79.3 Disease duration (years) Mean 10.1 (SD 7.3) Diagnosis 89.7% RRMS, 3.4% PPMS, 6.9% benign MS Diagnostic criteria Mixed: McDonald 2001, McDonald 2005 (Polman 2005) Treatment Not reported Disease description EDSS mean (SD): 2.1 (1.5) Recruitment period 2005 to 2008 |
|
| Predictors |
Considered predictors mEPS (1 year lag), age, age at onset, gender, disease course type (RR, SP, PP, benign), EDSS (1 year lag) Number of considered predictors ≥ 8 (unclear transformations) Timing of predictor measurement At multiple assessments consecutively for 3 years until 1 year prior to outcome Predictor handling Continuously, unclear: mEPS as square root |
|
| Outcome |
Outcome definition Disability (EDSS): EDSS score Timing of outcome measurement At 1 year after included mEPS and EDSS predictors (probably occurring over multiple yearly periods, up to 3 per patient) |
|
| Missing data |
Number of participants with any missing value 163 Missing data handling Exclusion |
|
| Analysis |
Number of participants (number of events) 58 participants, ≤ 174 observations (continuous outcome) Modelling method Linear regression Predictor selection method
Hyperparameter tuning Not applicable Shrinkage of predictor weights None Performance evaluation dataset Development Performance evaluation method Apparent Calibration estimate Histogram of differences between measured and predicted values Discrimination estimate Not applicable Classification estimate % predictions within ± 0.5 of observed = 0.72 Overall performance R2 = 0.8 Risk groups Not reported |
|
| Model |
Model presentation Full regression model Number of predictors in the model 6 Predictors in the model EDSS, mEPS, age at onset, gender, benign course, PP course Effect measure estimates Linear model coefficients (SE): EDSS 0.86 (0.589), mEPS 0.11 (0.038), age at onset −0.009 (0.014), gender 0.25 (0.201), benign course −0.26 (0.186), PP course −0.98 (0.594), intercept 19.86 (27.93) Predictor influence measure Not applicable Validation model update or adjustment Not applicable |
|
| Interpretation |
Aim of the study To re‐evaluate the usefulness of mEP for short‐term prediction of the EDSS by considering mEP not as a single predictor but within a multivariate statistical approach derived from economics that can be easily implemented and tested Primary aim The primary aim of this study is somehow the prediction of individual outcomes. The focus is on the prognostic value of multimodal EP. Model interpretation Probably confirmatory Suggested improvements Test the model on more heterogeneous patient groups and test the ability to predict beyond 1 year, including motor evoked potentials |
|
| Notes |
Applicability overall Low |
|
| Item | Authors' judgement | Support for judgement |
| Participants | No | Exclusions were based on data availability because of the data source, routine clinical data. |
| Predictors | Yes | Predictors were collected according to recommended protocols at a single centre by a limited number of technicians. Even if post‐processing might have occurred after the outcome, the predictor definitions seem objective. |
| Outcome | Yes | The outcome was based on EDSS, which is considered to be an objective measure. |
| Analysis | No | Whether or not the sample size was sufficient could not be judged. There was no appropriate calibration plot. Overfitting and optimism were not addressed. EDSS score was treated as a continuous, normally distributed variable, although it is an ordinal measure. Participants with missing EP and EDSS data were excluded, but further missing data handling was not reported. |
| Overall | No | At least one domain is at high risk of bias. |
Martinelli 2017.
| Study characteristics | ||
| General information |
Model name MRI criteria + all significant Primary source Journal Data source Routine care Study type Development |
|
| Participants |
Inclusion criteria
Exclusion criteria Not reported Recruitment Patients admitted to the MS centre at San Raffaele Hospital in Milan, Italy Age (years) Mean 32.0 Sex (%F) 67.9 Disease duration (years) Up to 3 months Diagnosis 100% CIS Diagnostic criteria Not reported Treatment
Disease description Not reported Recruitment period 2000 to 2013 |
|
| Predictors |
Considered predictors 2010 DIS criteria fulfilled, 2010 DIT criteria fulfilled; age at onset; sex; multifocal or monofocal type of onset; partial or complete recovery; brainstem, optic neuritis, spinal cord, or other type of CIS; binary T2 lesions; binary T1 lesions; binary Gd‐enhancing lesions; binary CSF cells; binary CSF proteins; CSF oligoclonal bands present or absent; binary Link Index; binary Tourtellotte Index; binary Reiber Index; binary blood‐brain barrier damage index; abnormal or normal visual evoked potentials; abnormal or normal auditory evoked potentials; abnormal or normal somatosensory evoked potentials; abnormal or normal motor evoked potentials; abnormal or normal overall evoked potential score (adjusted for steroids use in 4 weeks prior to examinations, DMDs during follow‐up) Number of considered predictors Between 24 and 36 (unclear adjustments and transformations) Timing of predictor measurement At disease onset (CIS) and up to 3 months after disease onset Predictor handling
|
|
| Outcome |
Outcome definition Conversion to definite MS (Poser 1983): time‐to‐CDMS defined as interval between onset of the first neurological event and last neurological visit or CDMS, new symptoms or signs occurring after an interval of at least 1 month from the onset of CIS only when other diagnoses are excluded Timing of outcome measurement Follow‐up median (IQR): 7.3 years (3.5 years to 10.2 years) |
|
| Missing data |
Number of participants with any missing value 224 Missing data handling Exclusion |
|
| Analysis |
Number of participants (number of events) 243 (108) Modelling method Survival, Cox Predictor selection method
Hyperparameter tuning Not applicable Shrinkage of predictor weights None Performance evaluation dataset Development Performance evaluation method Apparent Calibration estimate Gronnesby and Borgan statistic Discrimination estimate Pencina's c‐statistic 5‐year: 0.695 (95% CI 0.635 to 0.753), 2‐year: 0.74 (95% CI 0.677 to 0.804) Classification estimate Categories defined as low: 0% to 33.3%, moderate: 33.3% to 66.7%, high: 66.7% to 100%, net reclassification improvement = 0.3 Overall performance Not reported Risk groups 3 risk groups low: 0% to 33.3%, moderate: 33.3% to 66.7%, high: 66.7% to 100% |
|
| Model |
Model presentation List of selected predictors Number of predictors in the model 5 or 7 (unclear adjustment) Predictors in the model 2010 DIS criteria fulfilled, 2010 DIT criteria fulfilled, age, T1 lesions, CSF oligoclonal bands (steroid use in 4 weeks prior to study, DMT use during follow‐up) Effect measure estimates Not reported Predictor influence measure Not applicable Validation model update or adjustment Not applicable |
|
| Interpretation |
Aim of the study To determine whether multiple biomarkers improved the prediction of MS in patients with CIS in a real‐world clinical practice Primary aim The primary aim of this study is somehow the prediction of individual outcomes. The focus is on the added value of considering multiple biomarkers as opposed to univariate prediction. Model interpretation Probably exploratory Suggested improvements Multicentric prospective studies, enrolling a larger number of patients with CIS and taking into consideration all the possible biomarkers (e.g. comorbidities, spinal MRI) of CDMS risk |
|
| Notes |
Applicability overall Unclear Applicability overall rationale Due to the lack of reporting on the final model or tool, it is unclear whether this study aimed to develop a model/tool for the prediction of individual MS outcomes. |
|
| Item | Authors' judgement | Support for judgement |
| Participants | No | The data were from medical records, probably from routine care. According to the flow chart, the data were retrospectively identified from a database but described as a cohort in Table 1. Inclusion was based on the length of follow‐up (approximately 6% excluded for this reason) and availability of all routine workup measures (n = 195). |
| Predictors | No | The clinical predictors can be measured relatively objectively and were measured during the inclusion hospitalisation. The EP and MRI assessments were reported to be blinded to the follow‐up data and outcome. Although the time of intended model use is not explicit, the inclusion criteria indicated that it is 3 months from symptom onset, and all predictors were reported as measured at baseline examinations. However, all models were adjusted for treatment during follow‐up, information that was not available at the time of prediction. |
| Outcome | Yes | Although the blinding of outcome assessment was not reported, the outcome definition based on new symptoms is relatively objective. |
| Analysis | No | The details of the model itself were not explicitly reported. The number of events per variable was low. Complete case analysis was used. Only P values were reported to address calibration. Assessment occurred only in the full development set. |
| Overall | No | At least one domain is at high risk of bias. |
Misicka 2020.
| Study characteristics | ||
| General information |
Model name
Primary source Journal Data source Registry, secondary Study type Development |
|
| Participants |
Inclusion criteria
Exclusion criteria
Recruitment Participants in the Accelerated Cure Project for MS, a repository of biological samples and epidemiological data for persons with demyelinating diseases, recruited from the patient base or the surrounding communities from 10 MS speciality clinics, USA Age (years)
Sex (%F) 78.1 Disease duration (years) Median 11.0 (IQR: 5 to 19) Diagnosis 100% RRMS Diagnostic criteria Mixed: McDonald 2005 (Polman 2005) and McDonald 2010 (Polman 2011) Treatment
Disease description Time to second relapse: 0 years to 1 year 48.1%, 2 years to 5 years 31.8%, ≥ 6 years 20.1% Recruitment period 2006 to 2013 |
|
| Predictors |
Considered predictors Age of MS onset, sex, years of education, history of infectious mononucleosis prior to MS onset, tobacco smoking within 5 years prior to MS onset, obesity, high cholesterol, high blood pressure, type II diabetes, cancer, neurological disease, physical disease, psychological disorders, other autoimmune diseases; y/n for impaired functional domains: motor, cerebellar, spasticity, optic nerve, facial (motor), facial (sensory), brainstem and bulbar, cognitive, sexual, bladder and bowel, affect mood, fatigue; time to second relapse (TT2R; ≤ 1 year, 2 years to 5 years, and ≥ 6 years), the number of relapses experienced in the first 2 years after MS onset (NR2Y; ≤ 1, 2 to 3, ≥ 4 relapses, and NA), HLA‐A*02:01 alleles (0, 1, 2, NA), HLA‐DRB1*15:01 alleles (0, 1, 2, NA), Genetic Risk Score Number of considered predictors 35 Timing of predictor measurement At study interview (the same as the time of outcome reporting) Predictor handling Continuously except time to second relapse and the number of relapses in the first 2 years after MS onset, which were categorised |
|
| Outcome |
Outcome definition Conversion to progressive MS: time to SPMS defined as the difference between participant‐reported age of onset of RRMS, age of first symptom or exacerbation, and age of onset of SPMS Timing of outcome measurement
|
|
| Missing data |
Number of participants with any missing value ≥ 323, unclear exactly how many participants had any missing value Missing data handling Mixed: complete case for genetic variables, single imputation with a forest for other predictors, single imputation with category NA for NR2Y |
|
| Analysis |
Number of participants (number of events)
Modelling method Survival, Cox Predictor selection method
Hyperparameter tuning Not applicable Shrinkage of predictor weights None Performance evaluation dataset Development Performance evaluation method Apparent Calibration estimate Not reported Discrimination estimate Not reported Classification estimate Not reported Overall performance
Risk groups Not reported |
|
| Model |
Model presentation Nomogram Number of predictors in the model 6 (7 df) Predictors in the model
Effect measure estimates
Predictor influence measure Not applicable Validation model update or adjustment Not applicable |
|
| Interpretation |
Aim of the study To construct prediction models for SPMS using sociodemographic and self‐reported clinical measures that would be available at or near MS onset, with specific considerations for MS genetic risk factors Primary aim The primary aim of this study is the prediction of individual outcomes. Model interpretation Probably confirmatory Suggested improvements Not reported |
|
| Notes |
Applicability overall Unclear Applicability overall rationale Due to the lack of reporting on outcomes and their definition, applicability is unclear. Auxiliary references Saroufim P, Zweig SA, Conway DS, Briggs FBS. Cardiovascular conditions in persons with multiple sclerosis, neuromyelitis optica and transverse myelitis. Mult Scler Relat Disord 2018;25:21‐5. |
|
| Item | Authors' judgement | Support for judgement |
| Participants | No | The clinical data were collected cross‐sectionally by asking the patients about their medical history. Therefore, there is a high chance of recall bias or length‐time bias. |
| Predictors | No | The nature of clinical data collection by medical history taken from patients introduces recall bias. For example, the patients who had SPMS might remember the details or the diseases they had more vividly. Alternatively, patients with a shorter disease duration at the time of the interview might remember the details at disease onset more accurately. |
| Outcome | No | The outcome was based on patient‐reported time of RRMS diagnosis, and while CDMS was confirmed by a neurologist, the authors did not report that the timing was also confirmed. A definition of what was considered SPMS was not given. This makes the outcome assessment non‐standard and non‐uniform. Also, the patients knew all their clinical history while reporting the outcome. |
| Analysis | No | The EPV was less than 10. Time to second relapse was categorised. Neither calibration nor discrimination was addressed. Evaluation occurred in the full development set only. Missing values for non‐genetic variables were handled with multiple imputation. Participants not contributing genetic data were excluded. |
| Overall | No | At least one domain is at high risk of bias. |
Montolio 2021.
| Study characteristics | ||
| General information |
Model name Disability Course ‐ LSTM Primary source Journal Data source Routine care, secondary Study type Development |
|
| Participants |
Inclusion criteria
Exclusion criteria
Recruitment Miguel Servet University Hospital in Zaragoza, Spain Age (years) Mean 42.4 Sex (%F) 67.1 Disease duration (years) Mean 10.1 (pooled SD 7.74) Diagnosis 92.7% RRMS, 6.1% SPMS, 1.2% PPMS Diagnostic criteria McDonald 2001 Treatment Unclear timing, 40% on IFN, 30% on immunomodulators, 30% none Disease description EDSS mean 2.6 (SD between 1.27 and 2.02) Recruitment period Not reported |
|
| Predictors |
Considered predictors Baseline visit: age, sex, MS duration, MS subtype, ON antecedent; at baseline and the following 2 annual visits: BCVA, relapse in past year, EDSS, peripapillary thickness, superior thickness, nasal thickness, inferior thickness, temporal thickness, foveal thickness Number of considered predictors 39 Timing of predictor measurement At 3 visits over 2 years (not defined baseline and annual visits 1 and 2) Predictor handling Continuously, one‐hot encoding for categories |
|
| Outcome |
Outcome definition Disability (EDSS): worsening defined as at least a 1‐point increase in EDSS between visit 2 and the 10‐year follow‐up Timing of outcome measurement Follow‐up for 10 years from baseline, 8 years from the last predictors |
|
| Missing data |
Number of participants with any missing value Not reported Missing data handling Not reported |
|
| Analysis |
Number of participants (number of events) 82 (37) Modelling method Long short‐term memory recursive neural network Predictor selection method
Hyperparameter tuning Search for optimal number of hidden layers (30), epochs (30), and mini‐batch size (20) in cross validation Shrinkage of predictor weights Not reported Performance evaluation dataset Development Performance evaluation method Cross‐validation, 10‐fold Calibration estimate Not reported Discrimination estimate c‐Statistic = 0.8165 Classification estimate Accuracy = 0.817, sensitivity = 0.811, specificity = 0.822, PPV = 0.789 Overall performance Not reported Risk groups Not reported |
|
| Model |
Model presentation List of selected predictors Number of predictors in the model 5 (4 of them longitudinal) Predictors in the model Disease duration, relapse in preceding year, EDSS, temporal RNFL thickness, superior RNFL thickness Effect measure estimates Not reported Predictor influence measure Not reported Validation model update or adjustment Not applicable |
|
| Interpretation |
Aim of the study To improve the MS diagnosis and predict the long‐term course of disability in MS patients based on clinical data and retinal nerve fibre layer (RNFL) thickness, measured by optical coherence tomography (OCT) Primary aim The primary aim of this study is only partially about the prediction of individual outcomes. The focus is on OCT measures and machine learning. Model interpretation Probably exploratory Suggested improvements Use of OCT devices in combination with other techniques such as MRI, EP or CSF analysis, used in combination with clinical data, such as the EDSS |
|
| Notes |
Applicability overall Unclear Applicability overall rationale Due to the lack of reporting on the final model or tool, it is unclear whether this study aimed to develop a model/tool for the prediction of individual MS outcomes. |
|
| Item | Authors' judgement | Support for judgement |
| Participants | No | The data source was routine care even though there were clearly defined inclusion and exclusion criteria. |
| Predictors | Yes | Although the predictors were collected up to year 2 to predict the 10‐year outcome from baseline, the situation is clarified by reporting the prediction as being for 8 years. |
| Outcome | No | The outcome was based on a 1‐point increase in EDSS. However, the meaning of a 1‐point change depends on the baseline value. This study included participants of different MS subtypes and a range of EDSS at baseline, which are expected to have different patterns of change due to disease. The outcome was not reported to be confirmed at a later point. |
| Analysis | No | The EPV was very low. Information on missing data and their handling was not reported. Calibration was not assessed. Parameter tuning, modelling method selection, and final performance resulted from non‐nested cross‐validation. No model was provided. |
| Overall | No | At least one domain is at high risk of bias. |
Olesen 2019.
| Study characteristics | ||
| General information |
Model name
Primary source Journal Data source Cohort, primary Study type Development |
|
| Participants |
Inclusion criteria
Exclusion criteria
Recruitment 3 hospital units with ophthalmology departments and 44 ophthalmologists in general practice (Primary Care Ophthalmology) in the administrative unit Region of Southern Denmark Age (years) Median 36.0 Sex (%F) 67.5 Disease duration (years) Not reported Diagnosis 100% CIS (isolated optic neuritis) Diagnostic criteria Optic Neuritis Study Group criteria 1991 Treatment
Disease description Not reported Recruitment period 2014 to 2016 |
|
| Predictors |
Considered predictors
Number of considered predictors
Timing of predictor measurement At disease onset (ON), from ON onset median (range): 14 days (2 days to 38 days) Predictor handling Continuously |
|
| Outcome |
Outcome definition Conversion to definite MS (McDonald 2010, Polman 2011): MS diagnosed according to McDonald 2010 Timing of outcome measurement Follow‐up median (range): 29.6 months (19 months to 41 months) |
|
| Missing data |
Number of participants with any missing value
Missing data handling Not reported |
|
| Analysis |
Number of participants (number of events)
Modelling method Logistic regression Predictor selection method
Hyperparameter tuning Not applicable Shrinkage of predictor weights None Performance evaluation dataset Development Performance evaluation method Bootstrap, B = 500 Calibration estimate Calibration plot, Hosmer‐Lemeshow test Discrimination estimate c‐Statistic
Classification estimate Not reported Overall performance
Risk groups Not reported |
|
| Model |
Model presentation Nomogram Number of predictors in the model 3 Predictors in the model
Effect measure estimates Not reported Predictor influence measure Not applicable Validation model update or adjustment Not applicable |
|
| Interpretation |
Aim of the study We propose that markers of inflammation and of neurodegeneration (a) may differ between patients with MS‐related ON and patients with ON unrelated to MS and (b) may predict development of MS in patients with acute ON. Primary aim The primary aim of this study is somehow the prediction of individual outcomes. The focus is on the association of CSF markers with CDMS. Model interpretation Probably exploratory Suggested improvements Validation in larger, well‐designed cohorts including differential diagnoses and other ethnicities from multiple centres |
|
| Notes |
Applicability overall High Applicability overall rationale The predictors used were CSF biomarkers and no other predictor domain was considered for use in the model. Auxiliary references Soelberg K, Jarius S, Skejoe H, Engberg H, Mehlsen JJ, Nilsson AC, et al. A population‐based prospective study of optic neuritis. Mult Scler 2017;23(14):1893‐901. Soelberg K, Skejoe HPB, Grauslund J, Smith TJ, Lillevang ST, Jarius S, et al. Magnetic resonance imaging findings at the first episode of acute optic neuritis. Mult Scler Relat Disord 2018;20:30‐6. |
|
| Item | Authors' judgement | Support for judgement |
| Participants | Unclear | A prospectively collected population‐based cohort was used, but it is unclear whether participants who had already experienced the outcome at the time of predictor measurement were included: 12 participants were diagnosed with clinically definite MS in less than 2 months. |
| Predictors | Yes | Predictors were collected shortly after onset and were collected without knowledge of the outcome due to the prospective collection. |
| Outcome | No | From the 40 patients included and 16 events of clinically definite MS, 12 were diagnosed with MS at the acute stage of optic neuritis in less than 2 months, while the predictors, venous blood and CSF, from all included patients were collected within 38 days of ON onset (median, 14 days; range 2 to 38). Thus, the time difference between predictor collection and outcome seems to be too short. |
| Analysis | No | The number of participants was too low. Not all participants were included in the analysis, and a complete case analysis was probably applied; however, the proportion of patients with missing data was about 5% and was not expected to increase the risk of bias. Univariate analyses were used to select candidate predictors for multivariate analysis. Logistic regression was applied even though there was no defined timing of the outcome and follow‐up duration varied amongst participants. The resampling procedure did not include the variable selection process. Effect estimates were not reported, so it is unclear whether the final model corresponds to the multivariable analysis. |
| Overall | No | At least one domain is at high risk of bias. |
Oprea 2020.
| Study characteristics | ||
| General information |
Model name Mixed treatment ‐ disability Primary source Journal Data source Routine care, secondary Study type Development |
|
| Participants |
Inclusion criteria
Exclusion criteria Not reported Recruitment Neurology Department of the Bucharest Emergency University Hospital (BEUH), Romania Age (years) Mean 40.3 Sex (%F) 61.6 Disease duration (years) Mean 10.2 Diagnosis RRMS, PPMS Diagnostic criteria Not reported Treatment
Disease description Not reported Recruitment period Not reported |
|
| Predictors |
Considered predictors Gender, age at diagnosis, age, EDSS at onset, disease duration, number of treatments Number of considered predictors 6 Timing of predictor measurement At a single time point during outcome determination Predictor handling Unclear, continuously or EDSS at onset categorised |
|
| Outcome |
Outcome definition Disability (EDSS): keeping an EDSS score less than or equal to EDSS score threshold (chosen model with EDSS threshold ≤ 2.5) at final visit Timing of outcome measurement Not reported |
|
| Missing data |
Number of participants with any missing value Not reported Missing data handling Not reported |
|
| Analysis |
Number of participants (number of events) 151 (not reported) Modelling method Logistic regression Predictor selection method
Hyperparameter tuning Not applicable Shrinkage of predictor weights None Performance evaluation dataset Development Performance evaluation method Cross‐validation, 10,000 shuffle split, train‐test: 14:1 Calibration estimate Not reported Discrimination estimate c‐Statistic = 0.8221 Classification estimate Accuracy = 0.7662, sensitivity = 0.7775, PPV = 0.8145, F1 = 0.7806 Overall performance Brier score = 0.1754 Risk groups Not reported |
|
| Model |
Model presentation Not reported Number of predictors in the model 6 Predictors in the model Gender, age at diagnosis, age, EDSS at onset, disease duration, number of treatments Effect measure estimates Not reported Predictor influence measure Not applicable Validation model update or adjustment Not applicable |
|
| Interpretation |
Aim of the study To develop a disability and outcome prediction algorithm in MS patients Primary aim The primary aim of this study is the prediction of individual outcomes. Model interpretation Exploratory Suggested improvements More patients, more relevant predictors, online platform |
|
| Notes |
Applicability overall Unclear Applicability overall rationale Due to ambiguities and the lack of reporting on participants, predictors, and outcomes, applicability is unclear. |
|
| Item | Authors' judgement | Support for judgement |
| Participants | No | The study data come from routine care and eligibility criteria are unclear. |
| Predictors | No | Predictors and the outcome were measured at the same time point. Hence, the predictors are probably not available at the intended time of model use. |
| Outcome | No | The outcome definition is unclear and does not mention confirmation. Also, the timing of the outcome assessment with respect to the prognostication is unclear, probably making the outcome highly variable for different patients with different periods between onset and assessment visit. |
| Analysis | No | The number of events was unclear, but even in the best case scenario, the number of events per predictor was lower than 15. No information on missing data and its handling was reported. Timing of predictor and outcome assessment was not considered. The final model was not presented. Although cross‐validation was used for internal validation, the need for shrinkage was not assessed. |
| Overall | No | At least one domain is at high risk of bias. |
Pellegrini 2019.
| Study characteristics | ||
| General information |
Model name Final model with 3 predictors Primary source Journal Data source Randomised trial participants, secondary Study type Development |
|
| Participants |
Inclusion criteria
Exclusion criteria
Recruitment
Age (years) Mean 37.1 Sex (%F) 71.0 Disease duration (years) Mean 7.5 (SD 6.5) Diagnosis 100% RRMS Diagnostic criteria Mixed: McDonald 2001, McDonald 2005 Treatment
Disease description EDSS mean (SD): 2.5 (1.2), number of relapses 1 year prior to study entry mean (SD): 1.4 (0.7) Recruitment period Not reported |
|
| Predictors |
Considered predictors Age (in years), gender (male vs female), ethnicity (white vs other), number of relapses 1 year prior to study entry, number of relapses 3 years prior to study entry, MS disease duration (in years), time since pre‐study relapse (in months), prior treatment (yes vs no), EDSS, T25FW, 9HPT, PASAT, VFT 2.5%, gadolinium‐enhancing lesion number, T1 lesion volume (log‐scale), T2 lesion volume (log‐scale), brain volume standardised Z‐score, brain parenchymal fraction, SF‐36 Physical Component Summary, SF‐36 Mental Component Summary, study identifier (as fixed term adjustment) Number of considered predictors 23 Timing of predictor measurement At study baseline (RCT) Predictor handling
|
|
| Outcome |
Outcome definition Composite (EDSS, T25FW, 9HPT, PASAT, VFT): time to disability progression confirmed at 24 weeks on either EDSS (≥ 1 point increase if baseline EDSS ≥ 1.0 or 1.5 point increase otherwise) or any of timed 25‐foot walk (T25FW) test, 9HPT, Paced Auditory Serial Addition Test (PASAT), and visual function test (VFT; 2.5% contrast level) components (20% worsening on either T25FW or 9HPT or PASAT or 10‐letter worsening on VFT) Timing of outcome measurement Up to 2 years |
|
| Missing data |
Number of participants with any missing value Missing MRI data for 44% and 48% by design (DEFINE and CONFIRM) Missing data handling Multiple imputation, 10 MCMC‐based imputation sets |
|
| Analysis |
Number of participants (number of events) 1582 (434) Modelling method Survival, Cox Predictor selection method
Hyperparameter tuning Parameter tuning of ML models leading to predictor selection well‐described in supplementary material Shrinkage of predictor weights None Performance evaluation dataset Development Performance evaluation method Bootstrap Calibration estimate Calibration slope 1 year = 1.10 (bootstrap = 1.08, SE 0.17), 2 years = 1.00 (bootstrap = 0.97, SE 0.15) Discrimination estimate
Classification estimate Not reported Overall performance Not reported Risk groups Not reported |
|
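The calibration slope reported for this model comes from regressing the observed outcome on the model's linear predictor in the evaluation data: a slope near 1 indicates good calibration, while a slope below 1 suggests overfitting. Pellegrini 2019 uses the Cox‐model analogue; the sketch below, with entirely hypothetical simulated data, illustrates the binary‐outcome version via a small Newton–Raphson logistic fit.

```python
import math
import random

def calibration_slope(y, lp, iters=25):
    # Fit logit P(y = 1) = a + b * lp by Newton-Raphson and return the slope b.
    a, b = 0.0, 1.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for yi, xi in zip(y, lp):
            p = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            w = p * (1.0 - p)
            g0 += yi - p              # gradient w.r.t. intercept
            g1 += (yi - p) * xi       # gradient w.r.t. slope
            h00 += w                  # observed-information entries
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return b

# Hypothetical well-calibrated data: outcomes simulated from the linear predictor.
rng = random.Random(7)
lp = [rng.uniform(-2.0, 2.0) for _ in range(2000)]
y = [1 if rng.random() < 1.0 / (1.0 + math.exp(-x)) else 0 for x in lp]
```

Because the outcomes here are simulated from the linear predictor itself, `calibration_slope(y, lp)` should land close to 1; for an overfitted model evaluated on new data it would typically fall below 1.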
| Model |
Model presentation Regression coefficients without baseline hazard Number of predictors in the model 3 Predictors in the model PASAT, SF‐36 physical component summary, visual function test Effect measure estimates HR (95% CI): PASAT 0.94 (0.90 to 0.98), SF‐36 physical component summary 0.92 (0.88 to 0.97), visual function test 0.95 (0.92 to 0.99) Predictor influence measure Relative importance ranking Validation model update or adjustment Not applicable |
|
| Interpretation |
Aim of the study To compare the aforementioned regression and machine learning methods in their ability to assess the ranking of common prognostic factors for MS progression and to generate consistent risk predictions in clinical trial data settings Primary aim The primary aim of this study is the prediction of individual outcomes. The focus is on the factors. Model interpretation Exploratory Suggested improvements Explore alternative predictors and their change over time, sensitivity of the endpoints’ definition to a set of baseline characteristics using a multivariate (i.e. joint) endpoint assessment based on variance components |
|
| Notes |
Applicability overall Low Auxiliary references Calabresi PA, Kieseier BC, Arnold DL, Balcer LJ, Boyko A, Pelletier J, et al. Pegylated interferon beta‐1a for relapsing‐remitting multiple sclerosis (ADVANCE): a randomised, phase 3, double‐blind study. Lancet Neurol 2014;13(7):657‐65. Fox RJ, Miller DH, Phillips JT, Hutchinson M, Havrdova E, Kita M, et al. Placebo‐controlled phase 3 study of oral BG‐12 or glatiramer in multiple sclerosis. N Engl J Med 2012;367(12):1087‐97. Gold R, Kappos L, Arnold DL, Bar‐Or A, Giovannoni G, Selmaj K, et al. Placebo‐controlled phase 3 study of oral BG‐12 for relapsing multiple sclerosis. N Engl J Med 2012;367(12):1098‐107. Polman CH, O'Connor PW, Havrdova E, Hutchinson M, Kappos L, Miller DH, et al. A randomized, placebo‐controlled trial of natalizumab for relapsing multiple sclerosis. N Engl J Med 2006;354(9):899‐910. NCT00906399. Efficacy and safety study of peginterferon beta‐1a in participants with relapsing multiple sclerosis (ADVANCE). https://clinicaltrials.gov/ct2/show/NCT00906399 (first received 21 May 2009). NCT00027300. Safety and efficacy of natalizumab in the treatment of multiple sclerosis. https://clinicaltrials.gov/ct2/show/NCT00027300 (first received 3 December 2001). NCT00420212. Efficacy and safety of oral BG00012 in relapsing‐remitting multiple sclerosis (DEFINE). https://clinicaltrials.gov/ct2/show/NCT00420212 (first received 11 January 2007). NCT00451451. Efficacy and safety study of oral BG00012 with active reference in relapsing‐remitting multiple sclerosis (CONFIRM). https://clinicaltrials.gov/ct2/show/NCT00451451 (first received 23 March 2007). |
|
| Item | Authors' judgement | Support for judgement |
| Participants | Yes | Data from an RCT were used. Although the inclusion and exclusion criteria for the prediction study were not described, the number of patients per study matched up with the original RCT publications; hence, there is no reason to assume that there were additional eligibility criteria for the prediction study. |
| Predictors | Yes | The predictors were collected during an RCT; therefore, they are expected to be collected in the same way across all patients. |
| Outcome | Yes | The outcome was composite with clear components of clinical interest, which are considered to be objective measurements. Assessments occurred during RCTs and are expected to be standardised. |
| Analysis | Yes | The EPV was around 20. Overfitting and optimism were accounted for. Calibration and discrimination were assessed. |
| Overall | Yes | All domains are at low risk of bias. |
Pinto 2020.
| Study characteristics | ||
| General information |
Model name
Primary source Journal Data source Routine care, secondary Study type Development |
|
| Participants |
Inclusion criteria
Exclusion criteria Patients with PPMS Recruitment Neurology Department of Centro Hospitalar e Universitario de Coimbra, Portugal Age (years)
Sex (%F)
Disease duration (years) Not reported Diagnosis 100% RRMS Diagnostic criteria McDonald (undefined) Treatment Not reported Disease description Not reported Recruitment period Not reported |
|
| Predictors |
Considered predictors
Number of considered predictors 1306 Timing of predictor measurement At multiple visits dependent on which n‐year model (n = 1 to 5) Predictor handling
|
|
| Outcome |
Outcome definition
Timing of outcome measurement
|
|
| Missing data |
Number of participants with any missing value Unclear exactly how many participants had any missing values Missing data handling Mixed: single imputation of the feature mean for predictors, exclusion for outcome |
|
| Analysis |
Number of participants (number of events)
Modelling method Support vector machine Predictor selection method
Hyperparameter tuning Default parameters of MatLab function Shrinkage of predictor weights Modelling method Performance evaluation dataset Development Performance evaluation method Cross‐validation, (10 times) 10‐fold Calibration estimate Not reported Discrimination estimate
Classification estimate
Overall performance Not reported Risk groups Not applicable |
|
| Model |
Model presentation Not reported Number of predictors in the model Unclear which predictors make up the final model Predictors in the model Not reported Effect measure estimates Not reported Predictor influence measure Predictive power (% of iterations predictor selected in) Validation model update or adjustment Not applicable |
|
| Interpretation |
Aim of the study To predict MS progression, based on the clinical characteristics of the first 5 years of the disease Primary aim The primary aim of this study is the prediction of individual outcomes. Model interpretation Exploratory Suggested improvements Include more information such as MRI examination; use and compare different disease severity criteria; consider disease phenotypes as an interaction |
|
| Notes |
Applicability overall High Applicability overall rationale Approximately half of the participants had already experienced the outcome before the measurement of predictors, which included the baseline measure of the outcome itself. |
|
| Item | Authors' judgement | Support for judgement |
| Participants | No | The data source was routine care, and inclusion was based on the availability of data. |
| Predictors | No | SP: The intended time of prediction is unclear and was defined by availability in the dataset. Hence, the predictors measured at 2 years appear to be unavailable at baseline. There is not enough information to judge whether predictors were defined and assessed in a similar way across patients, especially predictors related to instruments, since the timing of data collection is unclear. Severity 6 years and severity 10 years: The prognosis was presented as a 6‐year prediction while the predictors were from the second year, effectively shortening the prediction window. There is not enough information to judge whether predictors were defined and assessed in a similar way across patients, especially predictors related to instruments, since the timing of data collection is unclear. |
| Outcome | No | SP: The outcome was SPMS from a routine care database, which is expected to be not standardised or operationalised. Also, the timing of the outcome was not clearly defined but was limited to availability in the database. At 2 years, which was the chosen model, half of the events had already occurred. Severity 6 years and severity 10 years: EDSS scores were included in the predictors, and almost half of the participants had already experienced the event, the definition of which is based on EDSS, by year 2. Also, the EDSS change was not reported to be confirmed. |
| Analysis | No | The amount of missing data was substantial and was handled by mean imputation within the cross‐validation structure. The sample size was too small. Univariable predictor selection was used. Calibration was not assessed. Parameter tuning for the main chosen model, SVM, was not reported; in correspondence, the authors reported that the default settings were used. It is unclear whether model selection and evaluation were properly separated. The final model is unclear. |
| Overall | No | At least one domain is at high risk of bias. |
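The analysis concern noted above, single mean imputation combined with cross‐validation, hinges on whether the imputation means are estimated inside each training fold or on the full dataset; the latter leaks information from the test fold into model fitting. A minimal fold‐safe sketch, with hypothetical helper names and `None` marking missing values:

```python
def fold_means(train_rows):
    # Column means computed from the training fold only, ignoring missing (None).
    means = []
    for col in zip(*train_rows):
        observed = [v for v in col if v is not None]
        means.append(sum(observed) / len(observed))
    return means

def impute(rows, means):
    # Replace missing entries with the training-fold column means.
    return [[m if v is None else v for v, m in zip(row, means)]
            for row in rows]

# Toy fold: means come from the training rows, then are applied to the test row.
train = [[1.0, None], [3.0, 4.0], [5.0, 6.0]]
test = [[None, None]]
m = fold_means(train)            # [3.0, 5.0]
imputed_test = impute(test, m)   # [[3.0, 5.0]]
```

Re-running `fold_means` per fold keeps the test fold untouched during imputation; fitting the means once on all rows before splitting is the leakage PROBAST flags.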
Pisani 2021.
| Study characteristics | ||
| General information |
Model name SP‐RiSc Primary source Journal Data source Cohort, secondary Study type Development |
|
| Participants |
Inclusion criteria
Exclusion criteria Not reported Recruitment MS specialist centre of Verona University Hospital, Italy Age (years) Mean 33.5 Sex (%F) 58.4 Disease duration (years) Not reported Diagnosis 100% RRMS Diagnostic criteria McDonald 2005 (Polman 2005) Treatment
Disease description EDSS median (range): 1.5 (0 to 3.5) Recruitment period 2005 to 2018 |
|
| Predictors |
Considered predictors
Number of considered predictors 12 or 13 (unclear adjustment) Timing of predictor measurement At diagnosis (RRMS) and up to 2 years after diagnosis Predictor handling Continuously |
|
| Outcome |
Outcome definition Conversion to progressive MS (Lublin 1996): time to the occurrence of continuous disability accumulation independently of relapses, confirmed 12 months later, transitory plateaus in the progressive course were allowed, steady progression was the rule Timing of outcome measurement Examination every 6 months or when a relapse occurred; mean (range) follow‐up: 9.55 years (6.8 years to 13.13 years) |
|
| Missing data |
Number of participants with any missing value Not reported Missing data handling Not reported |
|
| Analysis |
Number of participants (number of events) 262 (69) Modelling method Random survival forest Predictor selection method
Hyperparameter tuning Parameters, but not tuning methods, mentioned in appendix Shrinkage of predictor weights Modelling method Performance evaluation dataset Development Performance evaluation method Random split for tool performance; development out‐of‐bag for random forest performance Calibration estimate Not reported Discrimination estimate Harrell’s c‐index on development out‐of‐bag for evaluating random forest at:
Classification estimate Cutoff = 17.7, accuracy = 0.88 (95% CI 0.75 to 0.96), sensitivity = 0.92 (95% CI 0.70 to 1.00), specificity = 0.87 (95% CI 0.70 to 0.96), PPV = 0.75 (95% CI 0.48 to 0.93), NPV = 0.96 (95% CI 0.81 to 1.00) from evaluation of final tool using random split Overall performance Brier score (95% CI) using development out‐of‐bag for evaluating RF at:
Risk groups 3 risk groups (high for those with ensemble mortality > third quartile, medium, low for those with ensemble mortality < first quartile) |
|
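The Brier score used in the out‐of‐bag evaluation above is the mean squared difference between predicted risk and the observed 0/1 outcome: 0 is perfect, and 0.25 corresponds to an uninformative constant prediction of 0.5 (survival variants additionally weight for censoring). A one‐function sketch with hypothetical values:

```python
def brier_score(y, p):
    # Mean squared error between predicted probabilities and 0/1 outcomes.
    return sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / len(y)

print(brier_score([1, 0, 1, 0], [0.9, 0.1, 0.8, 0.3]))  # 0.0375
print(brier_score([1, 0], [0.5, 0.5]))                  # 0.25 (uninformative)
```

Lower is better, and unlike the c‐statistic the Brier score is sensitive to both discrimination and calibration.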
| Model |
Model presentation Combination of heat map value for 2 predictors plus other predictor values weighted by their minimal depth Number of predictors in the model 7 Predictors in the model
Effect measure estimates Not reported Predictor influence measure Minimal depth Validation model update or adjustment Not applicable |
|
| Interpretation |
Aim of the study To develop the secondary progressive risk score (SP‐RiSc), which integrates demographic, clinical, and MRI data collected from a cohort of RRMS patients during the first 2 years after the disease diagnosis Primary aim The primary aim of this study is the prediction of individual outcomes. Model interpretation Probably confirmatory Suggested improvements An additional validation, especially on a larger independent cohort with neuroimaging data from different field strength MRI scanners |
|
| Notes |
Applicability overall Low |
|
| Item | Authors' judgement | Support for judgement |
| Participants | Unclear | Although the data source is reported to be a cohort study, there are no details related to eligibility criteria. |
| Predictors | No | Images were produced at a single centre and analysed by 2 experienced clinicians. The included predictors are relatively objective. The model is meant to be used at RRMS diagnosis, and the survival model counts time from diagnosis only. This means that the predictors measured at 2 years from diagnosis should be considered unavailable. |
| Outcome | No | The secondary progression conversion outcome was clearly defined, but it was not operationalised; hence, the application of this definition might vary greatly depending on assessors and their level of experience. |
| Analysis | No | The number of events was low. Missing data and their handling were not mentioned. Discrimination and overall performance of the original RF were evaluated internally with the out‐of‐bag error, but the final model was only assessed with classification measures. There is no mention of parameter tuning methods. The final prediction tool does not correspond to the multivariable model. |
| Overall | No | At least one domain is at high risk of bias. |
Roca 2020.
| Study characteristics | ||
| General information |
Model name Aggregated model Primary source Journal Data source Registry, secondary Study type Development |
|
| Participants |
Inclusion criteria
Exclusion criteria Not reported Recruitment Subset of the OFSEP (Observatoire français de la sclérose en Plaques) registry from 37 institutions in 13 French cities, France Age (years) Not reported Sex (%F) Not reported Disease duration (years) Not reported Diagnosis Not reported Diagnostic criteria Not applicable Treatment Unclear Disease description Not reported Recruitment period From 2008 onward |
|
| Predictors |
Considered predictors
Number of considered predictors Non‐tabular data + 65 Timing of predictor measurement At FLAIR imaging (initial in the dataset) Predictor handling Unclear, probably continuously |
|
| Outcome |
Outcome definition Disability (EDSS): EDSS score Timing of outcome measurement At 2 years from the initial imaging |
|
| Missing data |
Number of participants with any missing value 19 Missing data handling Not reported |
|
| Analysis |
Number of participants (number of events) 1427 (continuous outcome) Modelling method Ensemble: convolutional neural network (linear and non‐linear registration), random forest (single and dual), and manifold learning Predictor selection method
Hyperparameter tuning Not reported Shrinkage of predictor weights Unclear Performance evaluation dataset Development Performance evaluation method Random split of approximately 1/3 for test set, further random split of remaining data (90% training, 10% validation) Calibration estimate Plot of MSE per EDSS category, MSE 2.21 (validation), 3 (test) Discrimination estimate Not applicable Classification estimate Not applicable Overall performance Not reported Risk groups Not applicable |
|
| Model |
Model presentation Not reported Number of predictors in the model Unstructured data + 65 Predictors in the model
Effect measure estimates Not reported Predictor influence measure Most informative features by RF variable importance Validation model update or adjustment Not applicable |
|
| Interpretation |
Aim of the study To create an algorithm that combines multiple machine‐learning techniques to predict the expanded disability status scale (EDSS) score of patients with multiple sclerosis at 2 years solely based on age, sex and fluid‐attenuated inversion recovery (FLAIR) MRI data Primary aim The primary aim of this study is the prediction of individual outcomes. Model interpretation Probably confirmatory Suggested improvements Using additional factors such as baseline EDSS score, including quantitative metrics coming from T1‐weighted‐based segmentation, a larger cohort or oversampling of high EDSS score examples or generating synthetic data, and further validation on a larger external test cohort |
|
| Notes |
Applicability overall Unclear Applicability overall rationale Due to the lack of reporting on the final model or tool, it is unclear whether this study aimed to provide a model/tool for the prediction of individual MS outcomes. Auxiliary references Vukusic S, Casey R, Rollot F, Brochet B, Pelletier J, Laplaud DA, et al. Observatoire Francais de la Sclerose en Plaques (OFSEP): a unique multimodal nationwide MS registry in France. Mult Scler 2020;26(1):118‐22. NCT02889965. The French multiple sclerosis registry (OFSEP). https://clinicaltrials.gov/ct2/show/NCT02889965(first received 7 September 2016). |
|
| Item | Authors' judgement | Support for judgement |
| Participants | No | The data, although collected prospectively in a registry, were known to have inclusion biases. The full dataset (DS1, DS2, DS3) corresponded to all the MRI scans that were recorded in the OFSEP database, meaning inclusion was defined by availability of data. |
| Predictors | No | The final features were based on heterogeneously collected imaging data. It was unclear whether outcomes were known when features were created, but we did not believe this to be a source of bias. |
| Outcome | Yes | The outcome was based on EDSS, which we assume to be standard and robust to predictor knowledge. The visit frequency was reported to be about yearly. |
| Analysis | No | Calibration was not fully explored and reported (the bar chart and MSE did not allow for an understanding of the direction of the errors). Missing data were addressed in the Participant section. An additional 19 people were dropped due to data quality, but this number was very small (~1%) compared to the total amount. Random splits of the data were used for evaluation. Hyperparameter tuning details were unclear. The number of participants per predictor may be low given the complex modelling techniques used. The internal evaluation used the validation set to weight the models in the aggregate and then again to assess performance of this model. Presentation of the final model was unclear. |
| Overall | No | At least one domain is at high risk of bias. |
Rocca 2017.
| Study characteristics | ||
| General information |
Model name 15‐month clinical and MR Primary source Journal Data source Cohort, primary Study type Development |
|
| Participants |
Inclusion criteria
Exclusion criteria Not reported Recruitment Consecutively from the outpatient populations attending MS clinics of the participating institutions, unclear which, Italy Age (years) Mean 51.3 Sex (%F) 50.0 Disease duration (years) Median (range) 10 (2 to 26) Diagnosis 100% PPMS Diagnostic criteria Thompson 2000 Treatment
Disease description EDSS median (IQR): 6.0 (4.5 to 6.5) Recruitment period Not reported |
|
| Predictors |
Considered predictors Age, log disease duration, baseline EDSS, baseline MS severity score, change in EDSS at 15 months, log baseline T2 lesion volume, T2 lesion volume percentage change, log baseline T1 lesion volume, T1 lesion volume percentage change, number of new T2 lesions, number of new T1 lesions, normalised brain volume, percentage brain volume change, cervical cord cross‐sectional area, cervical cord cross‐sectional area percentage change, average lesion mean diffusivity, average lesion mean diffusivity percentage change, average lesion fractional anisotropy, average lesion fractional anisotropy percentage change, average normal‐appearing white matter mean diffusivity, average normal‐appearing white matter mean diffusivity percentage change, average normal‐appearing white matter fractional anisotropy, average normal‐appearing white matter fractional anisotropy percentage change, average grey matter mean diffusivity, average grey matter mean diffusivity percentage change, (in another model: change in EDSS at 5 years) Number of considered predictors 26 Timing of predictor measurement At study baseline (cohort entry), at median 15 months after baseline, and at median 56 months (called 5 years) after baseline Predictor handling Continuously |
|
| Outcome |
Outcome definition Disability (EDSS): EDSS change between baseline and the 15‐year follow‐up; any EDSS change is always confirmed by a second visit after a further 3 months Timing of outcome measurement Median (IQR): 15.1 years (13.9 years to 15.4 years) |
|
| Missing data |
Number of participants with any missing value 5, only missing outcome reported Missing data handling
|
|
| Analysis |
Number of participants (number of events) 49 (continuous outcome) Modelling method Linear regression Predictor selection method
Hyperparameter tuning Not applicable Shrinkage of predictor weights None Performance evaluation dataset Development Performance evaluation method Cross‐validation, LOOCV Calibration estimate Not reported Discrimination estimate Not applicable Classification estimate EDSS change precision within 1 point = 0.776 Overall performance R2 = 0.61 Risk groups Not reported |
|
| Model |
Model presentation Regression coefficients without the intercept Number of predictors in the model 5 Predictors in the model Baseline EDSS, 15‐month EDSS change, 15‐month new T1 hypointense lesions, percentage brain volume change, baseline grey matter mean diffusivity Effect measure estimates Linear model coefficients (P value): baseline EDSS −0.54 (< 0.001), 15‐month EDSS change 0.39 (0.09), 15‐month new T1 hypointense lesions 0.28 (0.003), percentage brain volume change −0.24 (0.05), baseline grey matter mean diffusivity 3.86 (0.03) Predictor influence measure Not applicable Validation model update or adjustment Not applicable |
|
| Interpretation |
Aim of the study To investigate the added value of magnetic resonance imaging measures of brain and cervical cord damage in predicting long‐term clinical worsening of primary progressive multiple sclerosis compared to simple clinical assessment Primary aim The primary aim of this study is not the prediction of individual outcomes. Rather, the focus is on the usefulness of MRI measures. Model interpretation Exploratory Suggested improvements To widen clinical measures and include further MRI measures |
|
| Notes |
Applicability overall High Applicability overall rationale Although this study contained a model and its assessment, the main aim was not to create a model for prediction of individual outcomes but was rather to show the usefulness of new MRI measures. Auxiliary references Rovaris M, Gallo A, Valsasina P, Benedetti B, Caputo D, Ghezzi A, et al. Short‐term accrual of gray matter pathology in patients with progressive multiple sclerosis: an in vivo study using diffusion tensor MRI. Neuroimage 2005;24(4):1139‐46. Rovaris M, Judica E, Gallo A, Benedetti B, Sormani MP, Caputo D, et al. Grey matter damage predicts the evolution of primary progressive multiple sclerosis at 5 years. Brain 2006;129(Pt 10):2628‐34. |
|
| Item | Authors' judgement | Support for judgement |
| Participants | Yes | The data source was a cohort study collected with the aim of searching for imaging predictors, but no eligibility criteria were mentioned other than the diagnosis. It was explicitly stated that inclusion did not depend on disease duration, progression rate, or disability level. |
| Predictors | Unclear | The stated interest was in predicting 15‐year outcomes, but the data used were from 15 months after baseline, which brought the prediction window down to less than 14 years. The intended time of model use is unclear. |
| Outcome | Unclear | The outcome was based on EDSS, which is considered to be measured objectively. It was conceptualised as a change in EDSS and treated as a score that can be subtracted, and the change was treated as a continuous outcome in a linear regression. However, EDSS is not a numeric scale; it is accepted to be an ordinal scale, and it is unclear whether treating it as numeric is appropriate. |
| Analysis | No | The 5 participants (> 10% of the sample size) lost to follow‐up were excluded from the analysis, without any mention of how they compared with the remaining patients. The number of events per predictor was far lower than 10. Change in EDSS was modelled linearly, without any interaction terms, implicitly assuming a normal distribution; whether this assumption was violated was not reported, and the EDSS is an ordinal rather than a linear scale. Although cross‐validation was used, it is unclear whether the variable selection process was included within this procedure. There is no predicted‐versus‐observed plot or similar measure of calibration. |
| Overall | No | At least one domain is at high risk of bias. |
Rovaris 2006.
| Study characteristics | ||
| General information |
Model name Not applicable Primary source Journal Data source Cohort, primary Study type Development |
|
| Participants |
Inclusion criteria
Exclusion criteria
Recruitment Consecutively from the outpatient populations attending MS clinics of the participating institutions (unclear which), Italy Age (years) Mean 51.3 Sex (%F) 50.0 Disease duration (years) Median 10 (range: 2 to 26) Diagnosis 100% PPMS, 45 definite, 9 probable Diagnostic criteria Thompson 2000 Treatment
Disease description EDSS median (range): 5.5 (2.5 to 7.5) Recruitment period Not reported |
|
| Predictors |
Considered predictors Age, gender, disease duration, EDSS, baseline T2 LV, T2 LV percent change, baseline T1 LV, T1 LV percent change, number of new T2 lesions, number of new T1 lesions, normalised brain volume, percentage brain volume change, cervical cord cross‐sectional area, cervical cord cross‐sectional area percent change, average lesion mean diffusivity, average lesion MD percent change, average lesion fractional anisotropy, average lesion fractional anisotropy percent change, average normal‐appearing white matter mean diffusivity, average normal‐appearing white matter mean diffusivity percent change, average normal‐appearing white matter fractional anisotropy, average normal‐appearing white matter fractional anisotropy percent change, average grey matter mean diffusivity, average grey matter mean diffusivity percent change, (adjustment for follow‐up time) Number of considered predictors 25 Timing of predictor measurement At study baseline (cohort entry), at 15 months post‐baseline (follow‐up), at final follow‐up (outcome measurement) Predictor handling Continuously |
|
| Outcome |
Outcome definition Disability (EDSS): clinically worsened defined as an EDSS score increase ≥ 1.0, when baseline EDSS was < 6.0, or an EDSS score increase ≥ 0.5, when baseline EDSS was ≥ 6.0; confirmed by a second visit after a 3‐month interval Timing of outcome measurement Follow‐up median (range): 56.0 months (35 months to 63 months) |
|
| Missing data |
Number of participants with any missing value ≤ 11, unclear exactly how many participants had any missing values Missing data handling Complete case; the details are not explicitly reported |
|
| Analysis |
Number of participants (number of events) 52 (35) Modelling method Logistic regression Predictor selection method
Hyperparameter tuning Not applicable Shrinkage of predictor weights None Performance evaluation dataset Development Performance evaluation method Cross‐validation, LOOCV Calibration estimate Not reported Discrimination estimate Not reported Classification estimate Accuracy = 0.808, sensitivity = 31/35 = 0.89, specificity = 11/17 = 0.65 Overall performance Nagelkerke's R2 = 0.44 Risk groups Not reported |
|
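The classification estimates reported for this model can be reproduced from the implied 2 × 2 confusion matrix. This is a minimal sketch, not the study's code: the cell counts (TP = 31, FN = 4, TN = 11, FP = 6) are inferred from the reported fractions 31/35 and 11/17, not stated directly in the study.

```python
# Reconstructing the reported classification estimates from the implied
# confusion matrix (cell counts inferred from the reported fractions).
tp, fn = 31, 4   # 35 participants who clinically worsened
tn, fp = 11, 6   # 17 participants who remained stable

sensitivity = tp / (tp + fn)                 # 31/35
specificity = tn / (tn + fp)                 # 11/17
accuracy = (tp + tn) / (tp + fn + tn + fp)   # 42/52

print(round(sensitivity, 2), round(specificity, 2), round(accuracy, 3))
# → 0.89 0.65 0.808, matching the reported estimates
```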
| Model |
Model presentation Regression coefficients without intercept and follow‐up time Number of predictors in the model 2 or 3 (unclear if follow‐up time included) Predictors in the model Baseline EDSS, grey matter mean diffusivity, follow‐up Effect measure estimates OR (95% CI): baseline EDSS 0.48 (0.26 to 0.91), average grey matter mean diffusivity 1.21 (1.06 to 1.38), follow‐up not reported Predictor influence measure Not applicable Validation model update or adjustment Not applicable |
|
| Interpretation |
Aim of the study To investigate whether conventional and DT‐MRI‐derived measures can predict the long‐term clinical evolution of PP multiple sclerosis Primary aim The primary aim of this study is not the prediction of individual outcomes. The focus is on the usefulness of MRI measures. Model interpretation Exploratory Suggested improvements Not reported |
|
| Notes |
Applicability overall High Applicability overall rationale Although this study contained a model and its assessment, the main aim was not to create a model for prediction of individual outcomes but was rather to show the usefulness of MRI measures. Auxiliary references Rovaris M, Gallo A, Valsasina P, Benedetti B, Caputo D, Ghezzi A, et al. Short‐term accrual of gray matter pathology in patients with progressive multiple sclerosis: an in vivo study using diffusion tensor MRI. Neuroimage 2005;24(4):1139‐46. |
|
| Item | Authors' judgement | Support for judgement |
| Participants | Yes | The data source was a cohort study, and the data seem to have been collected with the aim of identifying imaging predictors, but no eligibility criteria were mentioned other than the diagnosis. It was specifically stated that inclusion did not depend on disease duration, progression rate, or disability level. |
| Predictors | Yes | The predictors in the final model were based on baseline measurements. Due to the prospective nature of data collection and automated analysis of images, predictors are considered to be assessed without knowledge of outcome data. Both automated MR analysis and EDSS measurements are considered to be objective. |
| Outcome | Yes | Even though the same physician assessed EDSS, EDSS is considered to be an objective measure. There was approximately a full 2‐year range in outcome assessment time, but the clinical authors did not find this problematic. |
| Analysis | No | The EPV (events per variable) was very low. Predictor selection started with univariate analyses. Neither calibration nor discrimination was addressed for the final model. Cross‐validation was used, but it did not cover all modelling steps. Final model coefficients were provided for EDSS and average GM MD, but not for follow‐up time. At least 6 participants (> 10%) had missing data, and complete case analysis was probably used. The model was adjusted for follow‐up time, which we consider to be an inappropriate use of post‐baseline data. |
| Overall | No | At least one domain is at high risk of bias. |
Runia 2014.
| Study characteristics | ||
| General information |
Model name Not applicable Primary source Dissertation Data source Cohort, primary Study type Development |
|
| Participants |
Inclusion criteria
Exclusion criteria
Recruitment Consecutive patients at the Rotterdam MS Centre, Netherlands Age (years) Unclear Sex (%F) 72.9 Disease duration (years) Up to 0.5 Diagnosis 100% CIS Diagnostic criteria Own definition Treatment Not reported Disease description Not reported Recruitment period Not reported |
|
| Predictors |
Considered predictors Age (unclear if linear or 3 categories), sex, optic nerve (binary), fatigue, presence of first‐ or second‐degree relatives with MS, abnormal MRI (1 or more lesions), number of T2 lesions (0 lesions/1 to 9 lesions/> 9 lesions), gadolinium enhancement, presence of a lesion in the corpus callosum, modified Barkhof criteria (at least 3 of 4 criteria fulfilled), Swanton criteria, DIS + DIT2010 (the baseline scan fulfils criteria for dissemination in time and place according to the 2010 revised McDonald criteria (Polman 2011)), IgG index, presence of oligoclonal bands, serum 25‐OH‐vitamin D (fatigue as continuous and localisation of first symptoms as optic nerve, spinal cord, brainstem, or other were chosen to be included otherwise due to 'discriminating ability') Number of considered predictors ≥ 16 or 21 (unclear transformations) Timing of predictor measurement At disease onset (CIS) (at study baseline within 6 months after onset) Predictor handling All categorised or dichotomised in Table 2/FSS (justified by comparison to the continuous version based on discriminative ability) and number of T1 lesions dichotomised, number of T2 lesions categorised/unclear: age categorised, 25‐OH‐vitamin D dichotomised, IgG Index dichotomised |
|
| Outcome |
Outcome definition Conversion to definite MS (Poser 1983): time from start of first symptoms to CDMS diagnosed in case of clinical evidence for dissemination in space and time Timing of outcome measurement Unclear, up to > 90 months |
|
| Missing data |
Number of participants with any missing value ≥ 356, unclear exactly how many participants had any missing values Missing data handling Mixed: complete case for outcome, multiple imputation for predictors |
|
| Analysis |
Number of participants (number of events) 431 (109 by 2 years) Modelling method Survival, Cox Predictor selection method
Hyperparameter tuning Not applicable Shrinkage of predictor weights None Performance evaluation dataset Development Performance evaluation method Bootstrap Calibration estimate Not reported Discrimination estimate c‐Statistic:
Classification estimate Not reported Overall performance Not reported Risk groups 3 risk categories from the sum score: low (0 to 1), intermediate (2 to 3), and high (4 to 5) |
|
| Model |
Model presentation
Number of predictors in the model 5 Predictors in the model DIS + DIT2010, corpus callosum lesions, oligoclonal bands, fatigue, abnormal MRI Effect measure estimates HR (95% CI): DIS + DIT2010 2.2 (1.4 to 3.3), corpus callosum lesions 1.9 (1.2 to 2.9), oligoclonal bands 1.7 (1.1 to 2.6), fatigue 2.3 (1.4 to 3.9), abnormal MRI 2.3 (0.9 to 6.0) Predictor influence measure Not applicable Validation model update or adjustment Not applicable |
|
| Interpretation |
Aim of the study To develop a simple and reliable prediction model for MS in patients with CIS Primary aim The primary aim of this study is the prediction of individual outcomes. Model interpretation Probably exploratory Suggested improvements External validation |
|
| Notes |
Applicability overall Low |
|
| Item | Authors' judgement | Support for judgement |
| Participants | Yes | The study used data collected for the Predicting the Outcome of a Demyelinating Event (PROUD) study protocol, and the inclusion/exclusion criteria were based on the baseline status. |
| Predictors | Yes | Predictors were collected before the outcome and therefore blinded, and all are available at onset. |
| Outcome | Yes | A standard definition of conversion to definite MS was used. The outcome was probably not blinded to predictors due to the clinical setting, but the outcome is considered relatively objective. The predictors dissemination in time (DIT) and dissemination in space (DIS) in the McDonald criteria include MRI, but the Poser criteria do not, so the predictors were not included in the outcome definition. |
| Analysis | No | The EPV was low. Patients lost to follow‐up were excluded, with no reported reason or comparison with the remaining cohort, even though it was a survival analysis. Calibration was not assessed. Bootstrap methods were used to account for optimism but probably did not cover the whole modelling process. Predictors were selected based on univariate analyses. Many continuous predictors seem to have been categorised, although the rationale is unclear. |
| Overall | No | At least one domain is at high risk of bias. |
Seccia 2020.
| Study characteristics | ||
| General information |
Model name
Primary source Journal Data source Routine care, secondary Study type Development |
|
| Participants |
Inclusion criteria
Exclusion criteria
Recruitment Sant’Andrea University hospital in Rome, Italy Age (years) Mean 29.0 (onset) Sex (%F) 69.8 Disease duration (years) Mean 19.0 Diagnosis 100% RRMS Diagnostic criteria Latest criteria at time of diagnosis Treatment Unclear timing, 73% on DMT at some point Disease description Not reported Recruitment period 1985 to 2018 |
|
| Predictors |
Considered predictors Longitudinal trajectories of age at onset, gender, age at visit, EDSS, number of relapses since last visit, pregnancy, relapse frequency, time from last relapse, spinal cord, supratentorial, optic pathway, brainstem‐cerebellum, relapse treatment drugs, first‐line DMT, immunosuppressant, MS symptomatic treatment drugs, second‐line DMT, other concomitant diagnosis drugs; for the feature‐saving analysis only: T1 status, T2 status, oligoclonal bands Number of considered predictors 21 predictor trajectories Timing of predictor measurement At multiple visits comprising patient history up to the current visit of interest Predictor handling Continuously |
|
| Outcome |
Outcome definition Conversion to progressive MS: transition from the RR to the SP phase within 180 days as assessed by the treating clinician Timing of outcome measurement
|
|
| Missing data |
Number of participants with any missing value 0 Missing data handling Exclusion of 3 variables with missing values |
|
| Analysis |
Number of participants (number of events)
Modelling method Long short‐term memory recurrent neural network Predictor selection method
Hyperparameter tuning Number of neurones chosen through a trial‐and‐error procedure, dropout probability set to 0.2 Shrinkage of predictor weights Via the modelling method (dropout) Performance evaluation dataset Development Performance evaluation method Random split, train‐test splits preserving outcome proportions with balance‐inducing bagging Calibration estimate Not reported Discrimination estimate Not reported Classification estimate
Overall performance Not reported Risk groups Not reported |
|
| Model |
Model presentation Not reported Number of predictors in the model 18 predictor trajectories Predictors in the model Longitudinal trajectories of age at onset, gender, age at visit, EDSS, number of relapses since last visit, pregnancy, relapse frequency, time from last relapse, spinal cord, supratentorial, optic pathway, brainstem‐cerebellum, relapse treatment drugs, first‐line DMT, immunosuppressant, MS symptomatic treatment drugs, second‐line DMT, other concomitant diagnosis drugs Effect measure estimates Not reported Predictor influence measure Not reported Validation model update or adjustment Not applicable |
|
| Interpretation |
Aim of the study To explore the possibility of predicting whether a patient will pass from the RR to the SP phase in a given time window, using a real‐world dataset built in close collaboration between computer experts and neurologists Primary aim The primary aim of this study is the prediction of individual outcomes. Model interpretation Exploratory Suggested improvements Using the LSTM model with different, less unbalanced endpoints; using large and well‐maintained clinical databases |
|
| Notes |
Applicability overall Unclear Applicability overall rationale Due to the lack of reporting on the final model or tool, it is unclear whether this study aimed to provide a model/tool for the prediction of individual MS outcomes. |
|
| Item | Authors' judgement | Support for judgement |
| Participants | No | The data source was routine medical records, and there were no eligibility criteria other than the MS subtype. |
| Predictors | No | The data were collected between 1978 and 2018. The first patient entering analysis was seen in 1985, probably due to the missingness of predictors prior to that time. Due to changing diagnostic criteria and technology, predictors such as age at onset, T1/T2 status, and treatment options are expected to be heterogeneous over time. |
| Outcome | No | The outcome was SPMS from a routine care database, which is not expected to be standardised or operationalised. Given that the diagnostic criteria changed over time, the outcome definition is expected to have differed somewhat over time. |
| Analysis | No | The sample size and number of events were low. No discrimination or calibration measures were assessed. Many participants were dropped in the feature‐saving analysis, but here we focused on the record‐saving analysis. For computational reasons, a random split was used for assessment. There was no separation of data used for parameter tuning and data used to estimate performance in future patients. A final model did not appear to be selected, fitted, and presented. |
| Overall | No | At least one domain is at high risk of bias. |
Skoog 2014.
| Study characteristics | ||
| General information |
Model name MSPS Primary source Journal Data source Cohort, primary Study type Development |
|
| Participants |
Inclusion criteria
Exclusion criteria
Recruitment Medical records from the Sahlgrenska Neurology Department and outpatient clinic, the only neurological service in the Gothenburg area, Sweden Age (years) Mean 33.5 Sex (%F) 65.0 Disease duration (years) Median 2 Diagnosis 100% RRMS Diagnostic criteria Poser 1983 Treatment 0% Disease description Not reported Recruitment period Not reported |
|
| Predictors |
Considered predictors Age at onset attack, current age (spline), gender, time from the second attack, number of previous attacks, monofocal symptoms at onset attack, afferent symptoms at onset attack, complete remission from the onset attack, monofocal symptoms at the most recent attack, afferent symptoms at the most recent attack, complete remission from the most recent attack, time since the most recent attack, severity grade of attack (0 to 2, number of unfavourable 'no' responses to afferent symptoms and complete remission), interaction term between the attack grade and the interval between the most recent attack and current time Number of considered predictors ≥ 15 (unclear transformations) Timing of predictor measurement At last relapse, at time of prognostication Predictor handling
|
|
| Outcome |
Outcome definition Conversion to progressive MS (Lublin 1996): continuous progression for at least 1 year without remission and detectable at time intervals of months or years, determined retrospectively after 1 year of observation, with the probable year of onset recorded retrospectively; observation terminated at onset of secondary progression, or at censoring due to competing causes of death, other disabling diseases, migration, or the end of follow‐up; time since RRMS onset Timing of outcome measurement Time from the first relapse to censoring or outcome median (range): 11.5 years (0.7 years to 56.7 years) |
|
| Missing data |
Number of participants with any missing value 171 attacks, unclear exactly how many participants Missing data handling Mixed, complete case for attacks, and regression methods for loss to follow‐up |
|
| Analysis |
Number of participants (number of events) 157 (118 participants, unit of analysis is participants, 749 attacks) Modelling method Survival, Poisson Predictor selection method
Hyperparameter tuning Not applicable Shrinkage of predictor weights None Performance evaluation dataset Development Performance evaluation method Apparent Calibration estimate O:E table Discrimination estimate Not reported Classification estimate Not reported Overall performance Not reported Risk groups Low‐risk periods: score < 0.04, high‐risk periods: score > 0.06 |
|
| Model |
Model presentation
Number of predictors in the model 3 (4 df) Predictors in the model Age, attack grade, time since last relapse (interaction with attack grade) Effect measure estimates log HR (SE): constant −11.5081 (4.0138), lower age predictor 0.3167 (0.1507), upper age predictor −0.0199 (0.0088), attack grade 0.7164 (0.1467), attack grade × time since last relapse −0.0457 (0.0158) Predictor influence measure Not applicable Validation model update or adjustment Not applicable |
|
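To illustrate how the reported MSPS coefficients combine into a hazard‐scale score, a hedged sketch follows. The exact spline basis behind the "lower age" and "upper age" terms and the time units are not fully reported in this extraction, so the function signature and the age basis terms below are assumptions, not the authors' implementation.

```python
import math

# Hypothetical sketch: combining the published MSPS log-hazard coefficients
# into a score. The age spline basis ("lower"/"upper" age terms) is assumed.
COEF = {
    "constant": -11.5081,
    "lower_age": 0.3167,      # first spline term for current age (assumed basis)
    "upper_age": -0.0199,     # second spline term for current age (assumed basis)
    "attack_grade": 0.7164,   # 0 to 2 unfavourable responses at the last attack
    "grade_x_time": -0.0457,  # attack grade x years since the most recent attack
}

def msps_score(lower_age_term, upper_age_term, grade, years_since_relapse):
    """Hazard-scale score: exp(linear predictor) built from the reported log HRs."""
    lp = (COEF["constant"]
          + COEF["lower_age"] * lower_age_term
          + COEF["upper_age"] * upper_age_term
          + COEF["attack_grade"] * grade
          + COEF["grade_x_time"] * grade * years_since_relapse)
    return math.exp(lp)
```

A recent, severe attack (high grade, short interval) raises the score, and the negative interaction term shrinks its contribution as time since the attack grows; per the extraction above, periods with a score below 0.04 were labelled low risk and above 0.06 high risk.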
| Interpretation |
Aim of the study To search for independent demographic and clinical factors that contributed to the risk of transition to SP and to simplify these complex relationships into a continuous individualised prediction based on repeated assessments expressed as a clinically and scientifically useful score Primary aim The primary aim of this study is the prediction of individual outcomes. Model interpretation Probably exploratory Suggested improvements Investigation and replication in an independent patient cohort, taking into account the therapy |
|
| Notes |
Applicability overall Low Auxiliary references Runmarker B, Andersen O. Prognostic factors in a multiple sclerosis incidence cohort with twenty‐five years of follow‐up. Brain 1993;116 (Pt 1):117‐34. Skoog B, Runmarker B, Winblad S, Ekholm S, Andersen O. A representative cohort of patients with non‐progressive multiple sclerosis at the age of normal life expectancy. Brain 2012;135(Pt 3):900‐11. |
|
| Item | Authors' judgement | Support for judgement |
| Participants | No | The exclusion criteria included a second attack in the year of SP onset, but this was not explained. The authors described the data source as a cohort, but the diagnoses and other categorisations probably needed to be performed retrospectively. Thus, it is unclear whether it was a cohort study or not. |
| Predictors | Yes | Although the recruitment lasted 14 years and the median time of follow‐up until event/censoring was over 11 years, the data were from a single centre, and the predictor definitions seem to be clear. Hence, the assessment is considered to be similar amongst patients. The predictor assessments might have been performed retrospectively, but the definitions seem clear, leaving no room for subjective judgement. |
| Outcome | No | The outcome was clearly defined but not operationalised. Assessors, their experience level, and blinding were not explicitly reported; hence, the application of this definition might vary greatly. |
| Analysis | No | The EPV was low, whether assessed for a binary or a continuous outcome. No information on missing data was provided. All the reported measures were evaluated in the full development set. Discrimination and optimism were not addressed. |
| Overall | No | At least one domain is at high risk of bias. |
Skoog 2019.
| Study characteristics | ||
| General information |
Model name
Primary source Journal Data source
Study type
|
|
| Participants |
Inclusion criteria
Exclusion criteria
Recruitment
Age (years) Mean 33.0 (CDMS onset, i.e. 2nd attack) Sex (%F)
Disease duration (years)
Diagnosis 100% RRMS Diagnostic criteria Poser 1983 Treatment
Disease description Not reported Recruitment period
|
|
| Predictors |
Considered predictors Not applicable Number of considered predictors Not applicable Timing of predictor measurement Not applicable Predictor handling Not applicable |
|
| Outcome |
Outcome definition Conversion to progressive MS (Lublin 1996): continuous progression for at least 1 year, without remission, and detectable at time intervals of months or years, determined retrospectively after 1 year of observation, with the probable year of onset recorded retrospectively Timing of outcome measurement
|
|
| Missing data |
Number of participants with any missing value
Missing data handling Not reported |
|
| Analysis |
Number of participants (number of events)
Modelling method Not applicable Predictor selection method Not applicable Hyperparameter tuning Not applicable Shrinkage of predictor weights Not applicable Performance evaluation dataset
Performance evaluation method
Calibration estimate
Discrimination estimate Not reported Classification estimate Not reported Overall performance Not reported Risk groups For the calibration measures: periods with predetermined MSPS strata < 0.025, 0.025 to 0.05, 0.05 to 0.075, 0.075 to 0.10, 0.10 to 0.125, > 0.125 (simplified to < 0.05, 0.05 to 0.075, 0.075 to 0.10, > 0.10) |
|
| Model |
Model presentation
Number of predictors in the model Not applicable Predictors in the model Not applicable Effect measure estimates Not applicable Predictor influence measure Not applicable Validation model update or adjustment Recalibration |
|
| Interpretation |
Aim of the study To validate this model with an essentially untreated Swedish cohort Primary aim The primary aim of this study is the prediction of individual outcomes. Model interpretation Probably exploratory Suggested improvements Demonstrating generalisability in non‐Swedish cohorts collected with different methods, considering DMT use |
|
| Notes |
Applicability overall Low |
|
| Item | Authors' judgement | Support for judgement |
| Participants | No | Val: The exclusion criteria included a second attack in the year of SP onset, but this was not explained. The authors described the data source as a cohort, but the diagnoses and other categorisations probably needed to be performed retrospectively. Thus, it is unclear whether it was a cohort study or not. Ext Val: The data source was a registry into which patient data were entered retrospectively. Also, the diagnoses and other categorisations probably needed to be performed retrospectively. |
| Predictors | Yes | The data were from a single centre and the predictor definitions seem to be clear. Hence, the assessment is considered to be similar amongst patients. The predictor assessments might have been performed retrospectively, but the definitions seem clear, leaving no room for subjective judgement. |
| Outcome | No | The outcome was clearly defined but it is not an operationalised one. Assessors, experience level, and blinding were not explicitly reported, hence the application of this definition might vary greatly. |
| Analysis | No | Val: Missing values and how they were treated were not clearly discussed. Discrimination was not addressed. Ext Val: The number of events in the validation was low. Missing data were not clearly discussed. Discrimination was not addressed. |
| Overall | No | At least one domain is at high risk of bias. |
Sombekke 2010.
| Study characteristics | ||
| General information |
Model name Outcome dichotomous MSSS, predictors clinical + genetics Primary source Journal Data source Unclear, secondary Study type Development |
|
| Participants |
Inclusion criteria
Exclusion criteria Not reported Recruitment Natural history studies at the MS Centre of the VU University Medical Centre in Amsterdam, Netherlands Age (years) Mean 32.4 (onset) Sex (%F) 63.8 Disease duration (years) Mean 13.1 (SD 8.3) Diagnosis 51.2% RRMS, 31.4% SPMS, 17.4% PPMS Diagnostic criteria Mixed: Poser 1983, McDonald 2005 (Polman 2005) Treatment Not reported Disease description EDSS median (IQR): 4.0 (3.5) Recruitment period Not reported |
|
| Predictors |
Considered predictors Gender, onset type, age at onset, SNPs (69) Number of considered predictors 72 Timing of predictor measurement At baseline (already available or retrospectively collected) Predictor handling Age continuously, SNPs categorised |
|
| Outcome |
Outcome definition Disability (MSSS): MSSS ≥ 2.5; MSSS denotes the speed of disability accumulation of an individual patient compared with a large patient cohort Timing of outcome measurement Not reported |
|
| Missing data |
Number of participants with any missing value Not reported Missing data handling Exclusion |
|
| Analysis |
Number of participants (number of events) 605 (86) Modelling method Logistic regression Predictor selection method
Hyperparameter tuning Not applicable Shrinkage of predictor weights None Performance evaluation dataset Development Performance evaluation method Bootstrap (B = 500), unclear if for optimism correction or just confidence intervals Calibration estimate Hosmer‐Lemeshow test Discrimination estimate c‐Statistic = 0.78 (95% CI 0.75 to 0.84) Classification estimate Sensitivity = 0.37, specificity = 0.953, LR+ = 7.9 Overall performance Nagelkerke's R2 = 0.219 Risk groups Not reported |
|
| Model |
Model presentation Regression coefficients without intercept Number of predictors in the model 9 (13 df) Predictors in the model Age at onset, male gender, progressive onset type, NOS2 level, PITPNC1 level, IL2 level, CCL5 level, IL1RN level, PNMT level Effect measure estimates OR (95% CI): age at onset 1.05 (1.02 to 1.08), male gender 2.02 (1.14 to 3.57), progressive onset type 4.69 (1.32 to 16.63), NOS2 level AG 0.53 (0.32 to 0.89), NOS2 level AA 0.24 (0.09 to 0.67), PITPNC1 level AG 0.45 (0.27 to 0.75), PITPNC1 level GG 0.59 (0.18 to 1.95), IL2 level GT 0.39 (0.22 to 0.70), IL2 level TT 0.38 (0.17 to 0.84), CCL5 level CT 2.04 (1.12 to 3.70), CCL5 level TT 1.47 (0.38 to 5.67), IL1RN level CT/TT 0.60 (0.36 to 0.99), PNMT level GG 0.52 (0.29 to 0.92) Predictor influence measure Not applicable Validation model update or adjustment Not applicable |
|
| Interpretation |
Aim of the study To evaluate the additional prognostic value of genetic information from a DNA chip, containing a set of candidate genes previously correlated to MS (either susceptibility or phenotypes), over available demographic and clinical characteristics, aiming to improve the prediction of the expected disease severity for future patients Primary aim The primary aim of this study is, to some extent, the prediction of individual outcomes. The focus is on the prognostic value of genetic data. Model interpretation Exploratory Suggested improvements Test on patients with longer disease duration, use SNPs assessed during the GWAS era, include MRI parameters, yet‐to‐be‐discovered genes, and environmental factors |
|
| Notes |
Applicability overall Low |
|
| Item | Authors' judgement | Support for judgement |
| Participants | No | The data source on which the prediction model study relied is unclear. Some predictors were collected retrospectively, and the inclusion criteria included the availability of DNA and a clinical assessment of disability. |
| Predictors | Yes | Genetic data are not likely to be biased. Clinical data were simple and easy to collect. The predictors are objective measures and could be available at the time of model use. |
| Outcome | Yes | Assuming that MSSS is a relatively standard outcome, it accounts for the difference in time from disease onset in patients, and the outcome was collected at a single point in time. |
| Analysis | No | The EPV was less than 10. Only discrimination was assessed. It is unclear how missing information was handled other than by exclusion, which is addressed in the Participants domain. Univariate analyses appear to have been used to select the predictors. It is unclear whether model overfitting and optimism in model performance were accounted for. |
| Overall | No | At least one domain is at high risk of bias. |
Sormani 2007.
| Study characteristics | ||
| General information |
Model name
Primary source Journal Data source Randomised trial participants, secondary Study type Development + external validation, spectrum |
|
| Participants |
Inclusion criteria
Exclusion criteria
Recruitment
Age (years)
Sex (%F) Not reported Disease duration (years)
Diagnosis 100% RRMS Diagnostic criteria Poser 1983 Treatment
Disease description
Recruitment period
|
|
| Predictors |
Considered predictors
Number of considered predictors
Timing of predictor measurement
Predictor handling
|
|
| Outcome |
Outcome definition Relapse: time of first relapse occurrence defined as appearance of one or more new neurological symptoms or the reappearance of one or more previously experienced neurological symptoms; neurological deterioration had to last at least 48 hours and be preceded by a relatively stable or improving neurological state in the prior 30 days; the symptoms had to be accompanied by objective changes in the neurological examination corresponding to an increase of at least 0.5 points on the EDSS, or one grade in the score of 2 or more functional systems or 2 grades in 1 functional system; deterioration associated with fever or infections that can cause transient, secondary impairment of neurological function or change in bowel, bladder, or cognitive function alone was not accepted as a relapse Timing of outcome measurement
|
|
| Missing data |
Number of participants with any missing value 9, not explicitly reported in this report Missing data handling Exclusion |
|
| Analysis |
Number of participants (number of events) Dev: 539 (unclear, approximately 270) Ext Val: 117 (not reported) Modelling method
Predictor selection method
Hyperparameter tuning Not applicable Shrinkage of predictor weights
Performance evaluation dataset
Performance evaluation method
Calibration estimate Not reported Discrimination estimate Not reported Classification estimate Not reported Overall performance Not reported Risk groups
|
|
| Model |
Model presentation
Number of predictors in the model
Predictors in the model
Effect measure estimates
Predictor influence measure Not applicable Validation model update or adjustment
|
|
| Interpretation |
Aim of the study To generate and validate a composite (clinical and MRI‐based) score able to identify individual patients with relapsing‐remitting multiple sclerosis (RRMS) with a high risk of experiencing relapses in the short term Primary aim The primary aim of this study is the prediction of individual outcomes. Model interpretation Probably exploratory Suggested improvements A validation in natural history cohorts (but this is not feasible because current patients are treated) |
|
| Notes |
Applicability overall Low Auxiliary references Comi G, Filippi M, Wolinsky JS. European/Canadian multicentre, double‐blind, randomized, placebo‐controlled study of the effects of glatiramer acetate on magnetic resonance imaging‐measured disease activity and burden in patients with relapsing multiple sclerosis. European/Canadian Glatiramer Acetate Study Group. Ann Neurol 2001;49(3):290‐7. Filippi M, Wolinsky JS, Comi G. Effects of oral glatiramer acetate on clinical and MRI‐monitored disease activity in patients with relapsing multiple sclerosis: a multicentre, double‐blind, randomised, placebo‐controlled study. Lancet Neurol 2006;5(3):213‐20. |
|
| Item | Authors' judgement | Support for judgement |
| Participants | Yes | Although participants with missing predictor measurements were excluded from the current study, they probably comprised only a small percentage (< 5%) of the eligible population from the original RCT cohort. |
| Predictors | Yes | Dev: The predictors appear to have been collected at baseline. The predictors were not explicitly named in the text, but Table 2 lists the predictors entering the univariable and multivariable analyses. The data source is an RCT, so assessment is assumed to be similar across patients. Val: The data from this trial were collected using different MRI machines of various strengths, but contrast‐enhancing lesions should be robust to the use of different machines. |
| Outcome | Yes | The details of the outcome definition were not explicitly reported in the prediction model study but can be found in the RCT. We expect the outcome to be standardised and determined appropriately due to the data source. The outcome may or may not be determined with the knowledge of predictors, but the outcome is considered an objective one. |
| Analysis | No | Dev: Variable selection began with univariable analysis. No discrimination or calibration measures were reported. Although external validation was done, there was no indication of model shrinkage or other attempts at addressing overfitting and optimism. Val: The number of events was not reported but was expected to be at most 56.5. No relevant performance measures were reported. |
| Overall | No | At least one domain is at high risk of bias. |
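Several development studies in this review, including Sormani 2007 above, reported no discrimination estimate. The c-statistic such a report would supply is the probability that a randomly chosen patient who had the event received a higher predicted risk than a randomly chosen patient who did not. A from-scratch sketch on made-up risks and outcomes:

```python
# Concordance (c-)statistic for a binary outcome, computed directly from
# event/non-event pairs (ties count as 0.5). Illustrative data only; the
# Sormani 2007 report itself gave no discrimination estimate.

def c_statistic(risks, outcomes):
    """Proportion of concordant pairs among all event/non-event pairs."""
    events = [r for r, y in zip(risks, outcomes) if y == 1]
    non_events = [r for r, y in zip(risks, outcomes) if y == 0]
    pairs = concordant = 0.0
    for e in events:
        for ne in non_events:
            pairs += 1
            if e > ne:
                concordant += 1
            elif e == ne:
                concordant += 0.5
    return concordant / pairs

risks = [0.9, 0.8, 0.4, 0.7, 0.2, 0.1]
outcomes = [1, 1, 0, 1, 0, 0]
print(c_statistic(risks, outcomes))  # 1.0: every event outranks every non-event
```

A value of 0.5 corresponds to predictions no better than chance; published MS prognostic models in this review typically report values between 0.6 and 0.9.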
Spelman 2017.
| Study characteristics | ||
| General information |
Model name Not applicable Primary source Journal Data source Cohort, primary Study type Development |
|
| Participants |
Inclusion criteria
Exclusion criteria
Recruitment Fifty MS clinics participating in MSBase Incident Study (MSBASIS), a substudy within MSBase registry Age (years) Median 31.6 (at MS onset) Sex (%F) 70.5 Disease duration (years) Up to 1 year Diagnosis 100% CIS Diagnostic criteria Poser 1983 Treatment
Disease description EDSS median (IQR): 2 (1 to 2.5) Recruitment period From 2004 onward |
|
| Predictors |
Considered predictors Sex, age at onset, EDSS, first symptom location (categorical with optic pathways as reference, supratentorial, brainstem or spinal cord), T1 gadolinium lesions (binary), T2 hyperintense lesions (3 levels), infratentorial lesions (binary), juxtacortical (binary), periventricular (3 levels), number of spinal T1 gadolinium lesions (binary), number of spinal T2 lesions (binary), oligoclonal bands (binary), (unclear adjustment for country) Number of considered predictors ≥ 16 (unclear how many interactions tested) Timing of predictor measurement At disease onset (CIS) (up to 12 months after disease onset) Predictor handling
|
|
| Outcome |
Outcome definition Conversion to definite MS (Poser 1983): time to first relapse following CIS, i.e. CDMS, defined as examination evidence of a symptomatic second neurological episode attributable to demyelination of more than 24 hours duration and more than 4 weeks from the initial attack; follow‐up time was defined as the time that lapsed between the date of CIS onset (baseline) and either the date of first post‐CIS relapse or, where no subsequent post‐CIS relapse was observed, the date of the last recorded clinic visit Timing of outcome measurement Follow‐up median (IQR): 1.92 years (0.90 years to 3.71 years) |
|
| Missing data |
Number of participants with any missing value ≤ 1017, unclear how many of the exclusions are due to missing Missing data handling Exclusion |
|
| Analysis |
Number of participants (number of events) 3296 (1953) Modelling method Survival, Cox Predictor selection method
Hyperparameter tuning Not applicable Shrinkage of predictor weights Unclear Performance evaluation dataset Development Performance evaluation method Bootstrap (B = 1500) Calibration estimate Calibration plot Discrimination estimate
Classification estimate Not reported Overall performance Not reported Risk groups Not reported |
|
| Model |
Model presentation
Number of predictors in the model 7 (11 df) Predictors in the model Sex, age, EDSS, first symptom location, T2 infratentorial lesions, T2 periventricular lesions, OCB in CSF Effect measure estimates Not reported Predictor influence measure Not applicable Validation model update or adjustment Not applicable |
|
| Interpretation |
Aim of the study To examine determinants of second attack and validate a prognostic nomogram for individualised risk assessment of clinical conversion Primary aim The primary aim of this study is the prediction of individual outcomes. Model interpretation Exploratory Suggested improvements External validation with a larger sample, more patients with 0 T2 lesions |
|
| Notes |
Applicability overall Unclear Applicability overall rationale It is unclear whether some patients had already experienced the outcome at the time of predictor collection. |
|
| Item | Authors' judgement | Support for judgement |
| Participants | No | Although the data source is appropriate, eligibility criteria contained both baseline predictor measurement and regular follow‐up, which may cause risk of bias. |
| Predictors | No | Due to the prospective collection of the data, predictors were probably assessed without knowledge of outcomes, and because only baseline variables were used, all predictors should be available at the intended time of prediction. No information is reported on whether predictors were defined and assessed in a similar way for all patients. In particular, the imaging predictors from the many centres and countries participating in the MSBase registry are likely to introduce risk of bias. |
| Outcome | Unclear | We consider relapses to be a relatively objective outcome; therefore, we believe that the assessment with knowledge of predictor information does not increase the risk of bias. However, the predictors were collected within 12 months of onset, and according to the survival curves, a substantial proportion of patients (between 0.2 and 0.7) may already have had the event at the time of predictor collection. |
| Analysis | No | Some continuous predictors were categorised with only 2 to 3 levels. Over 1000 enrolled patients were excluded from the study without any description of the reasons or of how they differed from those included; thus, it is unclear whether complete case analysis was appropriate. The authors mentioned adjusting for country, but the methods were not described, making it unclear whether hierarchical models, a categorical predictor, or some other method was used. The method of arriving at the weights in the nomogram, and whether any optimism correction was done, are unclear. |
| Overall | No | At least one domain is at high risk of bias. |
Szilasiová 2020.
| Study characteristics | ||
| General information |
Model name Not applicable Primary source Journal Data source Cohort, secondary Study type Development |
|
| Participants |
Inclusion criteria
Exclusion criteria
Recruitment Department of Neurology of Louis Pasteur University Hospital in Kosice, Slovak Republic Age (years) Unclear Sex (%F) 64.7 Disease duration (years) Mean 6.7 (range 0.5 to 30) Diagnosis 63.5% RRMS, 29.4% SPMS, 7.1% PPMS Diagnostic criteria McDonald 2001 Treatment Reported for the original cohort of 110 patients, unclear timing: 64.7% interferon‐beta and 35.3% some DMT Disease description EDSS mean (SD, range): 3.03 (1.5, 1.0 to 7.0) Recruitment period 2003 to 2018 |
|
| Predictors |
Considered predictors Age, sex, disease duration, EDSS, MS form (SP vs R or P), P300 latency, P300 amplitude, lesion load (# T2 lesions), education (primary, secondary, university) Number of considered predictors 11 Timing of predictor measurement At study baseline (cohort entry) Predictor handling Continuously |
|
| Outcome |
Outcome definition Disability (EDSS): clinically worsened defined as an EDSS ≥ 5 Timing of outcome measurement 15 years |
|
| Missing data |
Number of participants with any missing value 25 Missing data handling Complete case |
|
| Analysis |
Number of participants (number of events) 85 (not reported) Modelling method Logistic regression Predictor selection method
Hyperparameter tuning Not applicable Shrinkage of predictor weights None Performance evaluation dataset Development Performance evaluation method Apparent Calibration estimate Not reported Discrimination estimate Unclear due to mismatch between ROC curve and reported statistics: 0.94 (95% CI 0.889 to 0.984) Classification estimate Unclear because these values do not correspond to a point on the plot, sensitivity = 0.94, specificity = 0.89 Overall performance Not reported Risk groups Not reported |
|
| Model |
Model presentation Full regression model Number of predictors in the model 6 (7 df) Predictors in the model Sex, age, MS form, EDSS, MS duration, P300 latency (ms) Effect measure estimates OR (95% CI); sex: 0.17 (0.02 to 1.295), age: 0.87 (0.74 to 1.040), RRMS: 3,156,828,983.597 (0.000 to NA), PMS: 751,474,054.21 (0.000 to NA), EDSS: 3.06 (1.028 to 9.139), MS duration: 1.21 (1.007 to 1.451), P300 latency (ms): 1.06 (1.008 to 1.110), constant: 0.0 (NA to NA) Predictor influence measure Not applicable Validation model update or adjustment Not applicable |
|
| Interpretation |
Aim of the study To determine whether ERPs have a prognostic significance for a patient’s future disability Primary aim The primary aim of this study is not the prediction of individual outcomes. The focus is on usefulness of ERPs. Model interpretation Probably exploratory Suggested improvements Not reported |
|
| Notes |
Applicability overall High Applicability overall rationale Although this study contained a model and its assessment, the main aim was not to create a model for prediction of individual outcomes but was rather to identify predictors. Additionally, this study included participants who had already experienced the outcome at baseline. |
|
| Item | Authors' judgement | Support for judgement |
| Participants | No | The data source is a cohort study. But, according to Table 1, the EDSS range at study entry was from 1.0 to 7.0, which means participants with the outcome at entry were included in the analysis. |
| Predictors | Yes | This was a single‐centre study with well described procedures for electrophysiological predictor collection. The other predictors were standard and/or easy to assess. Predictors were assessed at study entry. |
| Outcome | Yes | The outcome was based on an EDSS landmark and was assessed at 15‐year follow‐up. Predictor information was probably known, but we consider EDSS to be a robust outcome measure. |
| Analysis | No | The sample size was too small and the number of events was not reported. Participants lost to follow‐up were excluded from the analysis instead of being accounted for in a time‐to‐event analysis. Calibration was not assessed. Shrinkage was not applied and only apparent performance measures were reported. |
| Overall | No | At least one domain is at high risk of bias. |
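The astronomical odds ratios with confidence intervals running from 0 to NA in the Szilasiová 2020 model (e.g. OR ≈ 3.2 × 10⁹ for RRMS) are consistent with (quasi-)separation in small-sample logistic regression: when a predictor (nearly) perfectly splits events from non-events, the maximum-likelihood coefficient is unbounded. A minimal illustration with plain gradient ascent on perfectly separated, hypothetical data:

```python
# (Quasi-)separation sketch: with a binary predictor that perfectly
# separates the outcome, the fitted log-odds ratio grows without bound
# as estimation proceeds instead of converging. Hypothetical data.
import math

x = [0, 0, 0, 1, 1, 1]  # predictor perfectly separates the outcome
y = [0, 0, 0, 1, 1, 1]

def fit_logistic(x, y, steps, lr=1.0):
    """Plain gradient ascent on the logistic log-likelihood; returns slope."""
    b0 = b1 = 0.0
    for _ in range(steps):
        for xi, yi in zip(x, y):
            p = 1 / (1 + math.exp(-(b0 + b1 * xi)))
            b0 += lr * (yi - p)
            b1 += lr * (yi - p) * xi
    return b1

# More iterations -> ever larger slope: the OR estimate is unbounded,
# which is what a reported OR in the billions with a 0-to-NA CI suggests.
print(fit_logistic(x, y, 100) < fit_logistic(x, y, 1000))  # True
```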
Tacchella 2018.
| Study characteristics | ||
| General information |
Model name
Primary source Journal Data source Routine care, secondary Study type Development |
|
| Participants |
Inclusion criteria
Exclusion criteria Not reported Recruitment Outpatients of the MS service of Sant'Andrea hospital in Rome, Italy Age (years) Not reported Sex (%F) Not reported Disease duration (years) Not reported Diagnosis 100% RRMS Diagnostic criteria McDonald 2017 (Thompson 2018b) Treatment Unclear timing and distribution, 89.3% on DMTs, 43% on first‐line treatments, 57% on second‐line treatments Disease description Not reported Recruitment period Not reported |
|
| Predictors |
Considered predictors Age at onset, cognitive impairment, age at visit, visual impairment, routine visit, visual field deficits/scotoma, suspected relapse, diplopia (oculomotor nerve palsy), ambulation index, nystagmus, 9HPT (right), trigeminal nerve impairment, 9HPT (left), hemifacial spasm, PASAT, dysarthria, timed 25‐foot walk, dysphagia, impairment in daily living activities, facial nerve palsy, upper‐limb motor deficit, lower‐limb motor deficit, upper limb ataxia, lower limb ataxia, dysaesthesia, hypoesthesia, paraesthesia, Lhermitte's sign, urinary dysfunction, fatigue, bowel dysfunction, mood disorders, sexual dysfunction, tremor, ataxic gait, headache, paretic gait, spastic gait, EDSS, pyramidal functions, cerebellar functions, brainstem functions, sensory functions, bowel and bladder function, visual function, cerebral (or mental) functions, ambulation score Number of considered predictors 46 Timing of predictor measurement At visit of interest Predictor handling Continuously |
|
| Outcome |
Outcome definition Conversion to progressive MS: SP stage defined as a history of gradual worsening following the initial RR course determined by objective measure of change of disability (EDSS score) independent of relapses over a period of at least 6 or 12 months Timing of outcome measurement
|
|
| Missing data |
Number of participants with any missing value Not reported Missing data handling Not reported |
|
| Analysis |
Number of participants (number of events)
Modelling method Random forest Predictor selection method
Hyperparameter tuning Default parameters of SciKit library Shrinkage of predictor weights Modelling method Performance evaluation dataset Development Performance evaluation method Cross‐validation, (a form of) LOOCV Calibration estimate Not reported Discrimination estimate c‐Statistic:
Classification estimate Not reported Overall performance Not reported Risk groups Not reported |
|
| Model |
Model presentation Not reported Number of predictors in the model 46 Predictors in the model Age at onset, cognitive impairment, age at visit, visual impairment, routine visit, visual field deficits/scotoma, suspected relapse, diplopia (oculomotor nerve palsy), ambulation index, nystagmus, 9HPT (right), trigeminal nerve impairment, 9HPT (left), hemifacial spasm, PASAT, dysarthria, timed 25‐foot walk, dysphagia, impairment in daily living activities, facial nerve palsy, upper‐limb motor deficit, lower‐limb motor deficit, upper limb ataxia, lower limb ataxia, dysaesthesia, hypoesthesia, paraesthesia, Lhermitte's sign, urinary dysfunction, fatigue, bowel dysfunction, mood disorders, sexual dysfunction, tremor, ataxic gait, headache, paretic gait, spastic gait, EDSS, pyramidal functions, cerebellar functions, brainstem functions, sensory functions, bowel and bladder function, visual function, cerebral (or mental) functions, ambulation score Effect measure estimates Not reported Predictor influence measure Not reported Validation model update or adjustment Not applicable |
|
| Interpretation |
Aim of the study To obtain predictions on the probability that MS patients in the RR phase will convert to a SP form within a certain time frame Primary aim The primary aim of this study is only partly the prediction of individual outcomes. The focus is on collective intelligence rather than individual prediction. Model interpretation Exploratory Suggested improvements (For hybrid model) to investigate the best ways to combine predictions of different agents, to recruit more expert opinions |
|
| Notes |
Applicability overall Unclear Applicability overall rationale Due to the lack of reporting on the final model or tool, it is unclear whether this study aimed to develop a model/tool for the prediction of individual MS outcomes. |
|
| Item | Authors' judgement | Support for judgement |
| Participants | No | Routine care data were used with no reported inclusion/exclusion criteria other than diagnostic subtype. |
| Predictors | Yes | The study was conducted on data from a single centre, and data were collected according to international standards. |
| Outcome | Yes | The outcome was defined based on gradual increase in EDSS. |
| Analysis | No | 180 days: Calibration was not assessed. Parameter and model tuning was not described. The number of events was low. The final model is unclear. 360 days and 720 days: Calibration was not assessed. Parameter and model tuning was not described. The number of events was low. There is no indication that the model was fitted to the entire dataset to finalise it. |
| Overall | No | At least one domain is at high risk of bias. |
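The analysis concern above, that the unit of analysis in Tacchella 2018 was probably the visit rather than the patient, matters for its leave-one-out cross-validation: if other visits from the same patient remain in the training fold, performance estimates can be optimistic. Holding out all of a patient's visits together avoids this leak. A sketch of such a grouped split, on hypothetical records:

```python
# Leave-one-group-out splitting for visit-level data nested in patients.
# Patient IDs and visit labels below are hypothetical, for illustration.

def leave_one_group_out(records):
    """Yield (train, test) splits, holding out one patient's visits at a time.

    records: list of (patient_id, payload) tuples.
    """
    patients = sorted({pid for pid, _ in records})
    for held_out in patients:
        train = [r for r in records if r[0] != held_out]
        test = [r for r in records if r[0] == held_out]
        yield train, test

visits = [("p1", "v1"), ("p1", "v2"), ("p2", "v1"), ("p3", "v1")]
splits = list(leave_one_group_out(visits))
print(len(splits))  # 3: one split per patient, not one per visit
```

scikit-learn offers the same idea as `LeaveOneGroupOut`/`GroupKFold`; the pure-Python version above just makes the grouping explicit.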
Tommasin 2021.
| Study characteristics | ||
| General information |
Model name Radiological Primary source Journal Data source Unclear, secondary Study type Development |
|
| Participants |
Inclusion criteria
Exclusion criteria Not reported Recruitment The Human Neuroscience Department of Sapienza University, the MS centre of the Federico II University, Italy Age (years) Mean 39.7 Sex (%F) 63.8 Disease duration (years) Mean 9.9 (SD 8.06) Diagnosis 74.8% RRMS, 25.2% PMS Diagnostic criteria Mixed: McDonald 2010 (Polman 2011), McDonald 2017 (Thompson 2018b) Treatment Unclear timing, 32.5% first line, 39.9% 2nd line, 27.6% none Disease description EDSS median (range): 3.0 (0.0 to 7.5) Recruitment period 2003 to 2018 |
|
| Predictors |
Considered predictors Clinical: disease duration, age, sex, disease phenotype, EDSS at baseline, therapy, time‐to‐follow‐up; radiological: mean diffusivity of normal appearing WM, GM volume, WM volume, T2 lesion load, cerebellar volume, thalamic volume, fractional anisotropy of normal appearing WM, site, random feature Number of considered predictors 16 Timing of predictor measurement At assessment (not defined), at follow‐up Predictor handling Continuously |
|
| Outcome |
Outcome definition Disability (EDSS): disability progression defined as a minimum increase in EDSS (since baseline EDSS) of 1.5 (0), 1 (≤ 5.5), or 0.5 (> 5.5) Timing of outcome measurement Follow‐up mean (SD, range): 3.93 years (0.95 years, 2 years to 6 years) |
|
| Missing data |
Number of participants with any missing value Not reported Missing data handling Not reported |
|
| Analysis |
Number of participants (number of events) 163 (58) Modelling method Random forest Predictor selection method
Hyperparameter tuning Not reported Shrinkage of predictor weights Modelling method Performance evaluation dataset Development Performance evaluation method Cross‐validation, 1000 random splits (only those with accuracy difference < 0.02 between training and validation considered) Calibration estimate Not reported Discrimination estimate c‐Statistic = 0.92 Classification estimate Accuracy = 0.92, sensitivity = 0.92, specificity = 0.91 Overall performance Not reported Risk groups Not reported |
|
| Model |
Model presentation List of predictors (model selected) Number of predictors in the model 4 Predictors in the model T2 lesion load, cerebellar volume, thalamic volume, fractional anisotropy of normal appearing WM Effect measure estimates Not reported Predictor influence measure Feature importance (percentage of classifiers in which predictor more important than random feature) Validation model update or adjustment Not applicable |
|
| Interpretation |
Aim of the study To evaluate the accuracy of a data‐driven approach, such as machine learning classification, in predicting disability progression in MS Primary aim The primary aim of this study is only partly the prediction of individual outcomes. The focus is on imaging and machine learning. Model interpretation Exploratory Suggested improvements Prospective studies to evaluate other aspects of brain involvement, as well as other CNS structures (e.g. spinal cord) using additional techniques (e.g. fMRI, MTR, qMRI) |
|
| Notes |
Applicability overall Unclear Applicability overall rationale Due to the lack of reporting on the final model or tool, it is unclear whether this study aimed to develop a model/tool for the prediction of individual MS outcomes. |
|
| Item | Authors' judgement | Support for judgement |
| Participants | No | The data source is unclear. Participants were included based on availability of follow‐up data at least 2 years later. |
| Predictors | Unclear | The final model contains radiological predictors that were assessed by trained experts at 2 centres at study entry. However, it is unclear if follow‐up time was included in the final model as a predictor. |
| Outcome | No | The timing of the outcome assessment was any time between 2 years and 6 years, making assessment different across patients. |
| Analysis | No | The sample size was small. No information on missing data was reported. It was unclear if the differing follow‐up time among the patients was appropriately accounted for. Only discrimination was assessed. It was not clear that the methods used optimally accounted for overfitting and optimism. A final model was not presented. |
| Overall | No | At least one domain is at high risk of bias. |
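Tommasin 2021 screened predictors by adding a random noise feature to the random forest and retaining only predictors that out-ranked it (the "feature importance" measure above: the percentage of classifiers in which a predictor was more important than the random feature). The report's exact retention threshold is not given in this table, so the sketch below assumes a simple "wins in at least a given share of fitted classifiers" rule, on hypothetical importance scores:

```python
# Random-feature screening: keep a predictor only if its importance beats
# a pure-noise feature in at least `threshold` of the fitted classifiers.
# Feature names and scores below are hypothetical.

def beats_random_feature(importances_per_run, threshold=0.95):
    """importances_per_run: list of dicts mapping feature -> importance,
    each dict containing a 'random' noise feature as the benchmark."""
    runs = len(importances_per_run)
    names = [n for n in importances_per_run[0] if n != "random"]
    kept = []
    for name in names:
        wins = sum(imp[name] > imp["random"] for imp in importances_per_run)
        if wins / runs >= threshold:
            kept.append(name)
    return kept

runs = [
    {"T2_lesion_load": 0.40, "thalamic_volume": 0.30, "sex": 0.01, "random": 0.05},
    {"T2_lesion_load": 0.50, "thalamic_volume": 0.20, "sex": 0.04, "random": 0.06},
]
print(beats_random_feature(runs, threshold=1.0))
# ['T2_lesion_load', 'thalamic_volume']
```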
Tousignant 2019.
| Study characteristics | ||
| General information |
Model name 3D CNN + lesion masks Primary source Conference proceeding Data source Randomised trial participants, secondary Study type Development |
|
| Participants |
Inclusion criteria
Exclusion criteria
Recruitment Participants in 2 large proprietary, multi‐scanner, multi‐centre clinical trials (names not reported) Age (years) Not reported Sex (%F) Not reported Disease duration (years) Not reported Diagnosis 100% RRMS Diagnostic criteria Not reported Treatment 0% Disease description Not reported Recruitment period Not reported |
|
| Predictors |
Considered predictors MRI channels: volumes from T1‐weighted pre‐contrast (T1c), T1‐weighted post‐contrast (T1p), T2‐weighted (T2w), proton density‐weighted (Pdw), fluid‐attenuated inversion (FLAIR); T2‐weighted lesion masks; gadolinium enhanced lesion masks Number of considered predictors Non‐tabular data Timing of predictor measurement At imaging Predictor handling Continuously |
|
| Outcome |
Outcome definition Disability (EDSS): increase in EDSS score within 1 year and sustained for ≥ 12 weeks (baseline EDSS of 0: increase of ≥ 1.5; baseline EDSS of 0.5 to 5.5: increase of ≥ 1; baseline EDSS of ≥ 6: increase of ≥ 0.5) Timing of outcome measurement 1 year |
|
| Missing data |
Number of participants with any missing value Not reported Missing data handling Exclusion |
|
| Analysis |
Number of participants (number of events) 1083 observations (probably the unit of analysis) from 465 participants (103) Modelling method 3D convolutional neural network Predictor selection method
Hyperparameter tuning Unclear, tuning parameters and cross‐validation mentioned, but not tuning details Shrinkage of predictor weights Modelling method Performance evaluation dataset Development Performance evaluation method Cross‐validation, 4‐fold (75% training, 15% validation, 10% test) Calibration estimate Not reported Discrimination estimate c‐Statistic = 0.701 (SD 0.027) Classification estimate Not reported Overall performance Not reported Risk groups Not reported |
|
| Model |
Model presentation List of predictors (no selection) Number of predictors in the model Unstructured data Predictors in the model MRI channels: volumes from T1‐weighted pre‐contrast (T1c), T1‐weighted post‐contrast (T1p), T2‐weighted (T2w), proton density‐weighted (Pdw), fluid‐attenuated inversion (FLAIR); T2‐weighted lesion masks; gadolinium enhanced lesion masks Effect measure estimates Not reported Predictor influence measure Not reported Validation model update or adjustment Not applicable |
|
| Interpretation |
Aim of the study To present the first automatic end‐to‐end deep learning framework for the prediction of future patient disability progression (1 year from baseline) based on multi‐modal brain magnetic resonance images (MRI) of patients with multiple sclerosis (MS) Primary aim The primary aim of this study is the prediction of individual outcomes. Model interpretation Probably confirmatory Suggested improvements Alternative ways of quantifying uncertainty, adapting architecture to leverage longitudinal clinical information (e.g. age, disability stage) |
|
| Notes |
Applicability overall High Applicability overall rationale The predictors used were imaging features and no other predictor domain was considered for use in the model. |
|
| Item | Authors' judgement | Support for judgement |
| Participants | Unclear | The data source was unspecified randomised clinical trials, but participants were excluded based on incomplete follow‐up without any specification of the reasons or the number excluded. |
| Predictors | Yes | The predictors were collected during a clinical trial, so we expect them to be defined and assessed homogeneously. Several scanners across multiple sites were used, but standardisation across sites is mentioned. Expert raters were used in semi‐automated procedures. |
| Outcome | Yes | We consider standard EDSS outcomes rather objective. The outcome was assessed within clinical trials. |
| Analysis | No | The sample size was probably small (the highest possible number of events was 103, with 7 inputs and a very complex model). Discrimination was addressed but not calibration. It was unclear whether complexities in the data were appropriately addressed, as the analysis appeared to be at the visit level. It was unclear whether optimism in performance was accounted for, given that the outcome assessment periods might overlap. No model was provided for future use. |
| Overall | No | At least one domain is at high risk of bias. |
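Both Tousignant 2019 and Tommasin 2021 define EDSS worsening with a baseline-dependent threshold. Since EDSS moves in 0.5-point steps, the two formulations (">= 6" versus "> 5.5" for the lowest required increase) coincide, and the rule can be encoded directly:

```python
# Baseline-dependent EDSS worsening rule, as reported by Tousignant 2019:
# baseline 0 requires an increase of >= 1.5, baseline 0.5-5.5 requires
# >= 1.0, and baseline >= 6 requires >= 0.5. (Sustained-confirmation
# requirements, e.g. ">= 12 weeks", are a separate check not shown here.)

def edss_worsened(baseline: float, follow_up: float) -> bool:
    """True when the follow-up EDSS meets the baseline-dependent threshold."""
    if baseline == 0:
        required = 1.5
    elif baseline <= 5.5:
        required = 1.0
    else:
        required = 0.5
    return follow_up - baseline >= required

print(edss_worsened(0.0, 1.5))  # True
print(edss_worsened(2.0, 2.5))  # False: +0.5 is below the 1.0 threshold
print(edss_worsened(6.0, 6.5))  # True
```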
Vasconcelos 2020.
| Study characteristics | ||
| General information |
Model name
Primary source Journal Data source Unclear Study type Development + external validation, time |
|
| Participants |
Inclusion criteria
Exclusion criteria
Recruitment MS Centre of the Hospital da Lagoa in Rio de Janeiro, Brazil Age (years)
Sex (%F)
Disease duration (years)
Diagnosis 100% RRMS Diagnostic criteria Mixed: Poser 1983, McDonald 2001 Treatment
Disease description Patients with more than one relapse at first year of disease: 74% Recruitment period 1993 to 2017 |
|
| Predictors |
Considered predictors
Number of considered predictors
Timing of predictor measurement
Predictor handling
|
|
| Outcome |
Outcome definition Conversion to progressive MS: time elapsed until the year of confirmed progressive and sustained worsening for at least 6 months and not associated with the occurrence of acute relapse, an irreversible increase of at least 1.0 points in the EDSS when its value was ≤ 5.5 or 0.5 point when it was > 5.5 (independent of relapses and corticosteroid treatment) Timing of outcome measurement
|
|
| Missing data |
Number of participants with any missing value
Missing data handling Exclusion |
|
| Analysis |
Number of participants (number of events)
Modelling method
Predictor selection method
Hyperparameter tuning Not applicable Shrinkage of predictor weights
Performance evaluation dataset
Performance evaluation method
Calibration estimate
Discrimination estimate Not reported Classification estimate Not reported Overall performance Not reported Risk groups 2 risk categories: high (> 2 points), low (≤ 2 points) |
|
| Model |
Model presentation
Number of predictors in the model
Predictors in the model
Effect measure estimates
Predictor influence measure Not applicable Validation model update or adjustment
|
|
| Interpretation |
Aim of the study To construct a clinical risk score for MS long‐term progression that could be easily applied in clinical practice Primary aim The primary aim of this study is the prediction of individual outcomes. Model interpretation Exploratory Suggested improvements Validation in different cohorts, especially those with greater diversity concerning the genetic background, and exploration of other factors capable of influencing disease progression (e.g. neuroimaging data) |
|
| Notes |
Applicability overall Low |
|
| Item | Authors' judgement | Support for judgement |
| Participants | No | The data source is unclear. Excluding participants without complete follow‐up may introduce selection bias. Also, at least 248 patients were excluded for missing data. |
| Predictors | Unclear | There is no reason to believe predictors were assessed differently across patients. Predictors were collected from onset up to at least 2 years later. It is not clearly stated at what point the model was applied and whether onset referred to CIS onset or RR onset. |
| Outcome | Yes | The outcome was based on observing an EDSS increase that was confirmed at a later time point. It is unclear whether the outcome was assessed blinded to the predictors, but we do not consider this to be problematic because EDSS assessment is relatively objective, and the definition required confirmation at 6 months. Participants have regular follow‐ups due to inclusion criteria, so assessment timing is likely homogenous. |
| Analysis | No | Dev: The EPV was 11, which is relatively low. Continuous variables such as age were treated as binary variables. Univariable predictor selection was used. Discrimination and calibration were not addressed properly. The statistical model was simplified into an unweighted sum score (by unclear rounding rules) without the performance of this simplified model being assessed. Besides the large number of participants excluded for irregularly timed data, only 1 participant was reported as being excluded after enrolment, which probably had little effect on results. Although an external validation set was reported, no assessment of the need for shrinkage was performed. Ext Val: The number of events was low, and discrimination was not addressed. Complete case analysis was used for enrolled participants, but this led to a drop of only 2 participants. |
| Overall | No | At least one domain is at high risk of bias. |
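Vasconcelos 2020 simplified its regression model into an unweighted clinical sum score, dichotomised at 2 points (high risk > 2, low risk ≤ 2). Applying such a score is trivial once the items are fixed; the feature names below are hypothetical, since the actual score items are not listed in this table, but the dichotomisation rule is the one reported:

```python
# Unweighted sum score with the reported 2-point cutoff. The item names
# below are placeholders, NOT the actual Vasconcelos 2020 score items.

def sum_score(features: dict) -> int:
    """One point per adverse clinical feature present."""
    return sum(1 for present in features.values() if present)

def risk_group(score: int) -> str:
    """Reported rule: > 2 points = high risk, <= 2 points = low risk."""
    return "high" if score > 2 else "low"

patient = {"early_relapses": True, "motor_onset": True, "older_age": True}
score = sum_score(patient)
print(score, risk_group(score))  # 3 high
```

Collapsing weighted coefficients into equal points makes a score easy to use at the bedside, but, as the analysis judgement above notes, the simplified score's own performance then needs to be assessed separately.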
Vukusic 2004.
| Study characteristics | ||
| General information |
Model name Not applicable Primary source Journal Data source Cohort, primary Study type Development |
|
| Participants |
Inclusion criteria
Exclusion criteria Not reported Recruitment
Age (years) Mean 30.0 Sex (%F) 100.0 Disease duration (years) Mean 6 (SD 4) Diagnosis 96% RRMS, 4% SPMS Diagnostic criteria Poser 1983 Treatment
Disease description DSS at beginning of pregnancy mean (SD): 1.3 (1.4), annualised relapse rate during the year before pregnancy (95% CI): 0.7 (0.6 to 0.8) Recruitment period 1993 to 1995 |
|
| Predictors |
Considered predictors Number of relapses in pre‐pregnancy year, number of relapses during pregnancy, DSS at pregnancy onset, epidural analgesia (ref: no), breast‐feeding (ref: no), total number of relapses before pregnancy, disease duration, age at multiple sclerosis onset, age at pregnancy onset, number of previous pregnancies, child gender (ref: male) Number of considered predictors 11 Timing of predictor measurement At study baseline (cohort entry during pregnancy week 4 to 36), at examinations at 20, 28, 36 weeks of gestation, and also post‐partum Predictor handling Continuously |
|
| Outcome |
Outcome definition Relapse: a post‐partum relapse, defined as the appearance, reappearance or worsening of symptoms of neurological dysfunction lasting > 24 hours; fatigue alone not considered as a relapse Timing of outcome measurement During 3 months after delivery |
|
| Missing data |
Number of participants with any missing value ≥ 17, unclear exactly how many participants had any missing value Missing data handling Complete case |
|
| Analysis |
Number of participants (number of events) 223 (63) Modelling method Logistic regression Predictor selection method
Hyperparameter tuning Not applicable Shrinkage of predictor weights None Performance evaluation dataset Development Performance evaluation method Apparent Calibration estimate Not reported Discrimination estimate c‐Statistic = 0.72 Classification estimate Accuracy = 0.72 (cutoff = 0.5) Overall performance Not reported Risk groups Not reported |
|
| Model |
Model presentation Full regression model Number of predictors in the model 3 Predictors in the model Number of relapses in pre‐pregnancy year, number of relapses during pregnancy, MS duration Effect measure estimates OR (95% CI): number of relapses in pre‐pregnancy year 1.94 (1.32 to 2.80), number of relapses during pregnancy 1.87 (1.12 to 3.13), MS duration 1.11 (1.03 to 1.20) Predictor influence measure Not applicable Validation model update or adjustment Not applicable |
|
| Interpretation |
Aim of the study To report the 2‐year post‐partum follow‐up and to analyse the factors predictive of relapse in the 3 months after delivery Primary aim The primary aim of this study is not the prediction of individual outcomes. The focus is on predictor identification. Model interpretation Exploratory Suggested improvements Not reported |
|
| Notes |
Applicability overall Low Auxiliary references Confavreux C, Hutchinson M, Hours MM, Cortinovis‐Tourniaire P, Moreau T. Rate of pregnancy‐related relapse in multiple sclerosis. Pregnancy in multiple sclerosis group. N Engl J Med 1998;339(5):285‐91. |
|
| Item | Authors' judgement | Support for judgement |
| Participants | Yes | The study used cohort study data collected to assess the effect of pregnancy on MS courses, and the inclusion criteria are appropriate. |
| Predictors | No | Almost 50% of patients were not followed up prospectively, and nearly 25% were not known to their neurologists before recruitment. Hence, the number of relapses before pregnancy was probably collected non‐uniformly from a mixture of patients and neurologists in a retrospective or prospective manner. |
| Outcome | Yes | The outcome is a relatively objective one, so even if the predictor information was available at the time of its assessment, it would not introduce risk of bias. |
| Analysis | No | The EPV was below 10. Calibration was not addressed, and only apparent validation was reported. Participants lost to follow‐up were excluded from the analysis. Reporting of missing data handling was ambiguous but probably based on complete case analysis. |
| Overall | No | At least one domain is at high risk of bias. |
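The analysis judgements above repeatedly invoke the events-per-variable (EPV) rule of thumb, under which an EPV below 10 raises concern about overfitting. As an illustrative sketch (not part of the review itself), the EPV for the Vukusic 2004 development analysis can be computed from the figures extracted above (63 events, 11 candidate predictors):

```python
# Events per variable (EPV) for the Vukusic 2004 development analysis,
# using the extracted figures: 63 events, 11 candidate predictors.
# A common rule of thumb flags EPV below 10 as a risk-of-bias concern.
events = 63
candidate_predictors = 11
epv = events / candidate_predictors
print(round(epv, 1))  # 5.7 — well below 10, supporting the judgement above
```

This matches the domain judgement that "the EPV was below 10" for this study.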
Weinshenker 1991.
| Study characteristics | ||
| General information |
Model name M3 Dev Primary source Journal Data source Cohort, primary Study type Development |
|
| Participants |
Inclusion criteria MS Exclusion criteria Not reported Recruitment Consecutive patients referred to the MS Clinic at the University Hospital in London, Ontario, Canada Age (years) Mean 30.5 (onset) Sex (%F) 65.7 Disease duration (years) Mean 11.9 (SE 0.3) Diagnosis Other: 65.8% RRMS, 14.8% relapsing progressive, 18.7% chronically progressive, 0.9% unknown; 83.3% probable diagnosis, 16.4% possible diagnosis Diagnostic criteria Poser 1983 Treatment 0% Disease description Not reported Recruitment period 1972 to 1984 |
|
| Predictors |
Considered predictors (unclear if complete list) Age at onset, sex, seen at onset of MS, initial symptoms ‐ motor, systems involved ‐ brainstem, systems involved ‐ cerebellar, systems involved ‐ cerebral, systems involved ‐ pyramidal, (in other models: initial symptoms ‐ limb ataxia and balance, remitting at onset, first interattack interval, number of attacks in first 2 years, DSS at 2 years, DSS at 5 years) Number of considered predictors ≥ 13 (unclear if complete list) Timing of predictor measurement At assessment (not defined), at follow‐up Predictor handling Continuously |
|
| Outcome |
Outcome definition Disability (DSS): time to reach DSS 6 Timing of outcome measurement Follow‐up for 12 years |
|
| Missing data |
Number of participants with any missing value 38 Missing data handling Complete case |
|
| Analysis |
Number of participants (number of events) 1060 (498) Modelling method Survival, Weibull Predictor selection method
Hyperparameter tuning Not applicable Shrinkage of predictor weights None Performance evaluation dataset None Performance evaluation method Not applicable Calibration estimate Not reported Discrimination estimate Not reported Classification estimate Not reported Overall performance Not reported Risk groups Not reported |
|
| Model |
Model presentation Full regression model Number of predictors in the model 7 Predictors in the model Age at onset, seen at MS onset, motor (insidious), brainstem, cerebellar, cerebral, pyramidal Effect measure Log HR (SE): intercept 4.25 (0.132), age at onset −0.030 (0.003), seen at MS onset −0.568 (0.104), motor (insidious) −0.224 (0.077), brainstem −0.184 (0.061), cerebellar −0.430 (0.073), cerebral −0.255 (0.100), pyramidal −0.230 (0.090), scale 0.648 (0.022) Predictor influence measure Not applicable Validation model update or adjustment Not applicable |
|
| Interpretation |
Aim of the study A multivariate hierarchical analysis to assess the significance of several demographic and clinical factors in multiple sclerosis patients (analysis similar to multiple regression was used to generate predictive models which permit the calculation of the median time to DSS 6 for patients with a given set of covariates). Primary aim The primary aim of this study is somehow the prediction of individual outcomes. The focus is on the factors. Model interpretation Exploratory Suggested improvements Not reported |
|
| Notes |
Applicability overall Low Auxiliary references Weinshenker BG, Bass B, Rice GP, Noseworthy J, Carriere W, Baskerville J, et al. The natural history of multiple sclerosis: a geographically based study. I. Clinical course and disability. Brain 1989;112 (Pt 1):133‐46. |
|
| Item | Authors' judgement | Support for judgement |
| Participants | Yes | The authors described collecting a clinical cohort that intended to include all MS patients in the geographical area. They followed up with the patients regularly, and the study data were separate from the routine clinical charts. No inclusion criteria were discussed explicitly, but the study aimed to include all patients with MS in the entire area and called itself a natural history study. |
| Predictors | No | Although this study is a population‐based cohort and standardised data fields were created with MS research in mind, almost 4/5 of the patients were not seen from onset onwards. Thus, data on onset‐related predictors were collected differently across patients, either retrospectively or prospectively. |
| Outcome | Yes | The outcome was probably defined with knowledge of the predictors because only a few clinicians saw the patients in a routine care setting. DSS 6 is a relatively 'hard' outcome in which patients become dependent on a walking aid, so we judge the risk of bias due to knowledge of predictors to be low. |
| Analysis | No | Although not all enrolled participants were included in modelling due to complete case analysis, the missing data were in less than 5% of patients and hence are not expected to introduce risk of bias. Calibration and discrimination were both not addressed; also, model optimism was not addressed. The evaluation only included patients experiencing the outcome instead of using methods that account for the censoring. |
| Overall | No | At least one domain is at high risk of bias. |
Weinshenker 1996.
| Study characteristics | ||
| General information |
Model name
Primary source Journal Data source Routine care, secondary Study type
|
|
| Participants |
Inclusion criteria Not reported Exclusion criteria Not reported Recruitment Consecutive participants seen by first author at Ottawa Regional MS clinic, Canada Age (years) Mean 44.1 Sex (%F) 69.1 Disease duration (years) Mean 12 Diagnosis Other: 84.3% RRMS, 2.0% relapsing progressive, 13.7% chronically progressive Diagnostic criteria Not reported Treatment Not reported Disease description Unclear Recruitment period Not reported |
|
| Predictors |
Considered predictors
Number of considered predictors
Timing of predictor measurement
Predictor handling
|
|
| Outcome |
Outcome definition
Timing of outcome measurement
|
|
| Missing data |
Number of participants with any missing value
Missing data handling Complete case |
|
| Analysis |
Number of participants (number of events)
Modelling method
Predictor selection method
Hyperparameter tuning Not applicable Shrinkage of predictor weights
Performance evaluation dataset
Performance evaluation method
Calibration estimate Not reported Discrimination estimate Not reported Classification estimate
Overall performance
Risk groups Not reported |
|
| Model |
Model presentation
Number of predictors in the model
Predictors in the model
Effect measure
Predictor influence measure Not applicable Validation model update or adjustment
|
|
| Interpretation |
Aim of the study
Primary aim
Model interpretation Exploratory Suggested improvements Implicitly suggests that the model should be applied to the correct patient population (based on temporal course and baseline disability) as opposed to any available patients |
|
| Notes |
Applicability overall
Applicability overall rationale
|
|
| Item | Authors' judgement | Support for judgement |
| Participants | No | Although the authors discussed the probable lack of referral bias, the data were collected for reasons other than this study, and no inclusion/exclusion criteria other than the diagnosis were reported. |
| Predictors | No | Short‐term: The stated interest was in predicting 15‐year outcomes, but the data used were at 15 months after baseline, which brought the window down to less than 14 years. The intended time of model use is unclear. M3 Ext Val: The fact that 'seen at MS onset' was a predictor in the model made it likely that the participants' data were collected in a mixture of retrospective and prospective fashion, just as in Weinshenker 1991. |
| Outcome | Yes | Short‐term: We rated this domain for this analysis as having a high risk of bias. The short‐term progression was not confirmed, even though the EDSS might fluctuate. Although this study probably pre‐dates the standard definition of progression, an outcome based on EDSS is standard and considered to be objective. M3 Ext Val: We rated this domain for this analysis as having a low risk of bias. Although the outcome was probably assessed with knowledge of the predictors, DSS 6 can be considered a hard outcome; thus, knowledge of predictors introduces little risk of bias. |
| Analysis | No | Short‐term: The EPV was low. Disease duration was dichotomised, justified by clinical knowledge, but the nonlinearity could have been more thoroughly explored. Many participants were excluded without reasons being reported or these participants being compared to those included. Complete case analysis was used. Neither calibration nor discrimination was assessed, and classification measures were assessed in‐sample. Follow‐up time was added as a predictor instead of using methods to deal with different observation times. M3 Ext Val: The number of events was far below 100, only complete case analysis was done, and the models were not evaluated using calibration and discrimination measures accounting for censoring. |
| Overall | No | At least one domain is at high risk of bias. |
Wottschel 2015.
| Study characteristics | ||
| General information |
Model name
Primary source Journal Data source Cohort, secondary Study type Development |
|
| Participants |
Inclusion criteria
Exclusion criteria Not reported Recruitment UK Age (years)
Sex (%F)
Disease duration (years) Mean 0.1 (SD 0.07) Diagnosis 100% CIS Diagnostic criteria Not reported Treatment 0% Disease description EDSS median (range): 1 (0 to 8) Recruitment period 1995 to 2004 |
|
| Predictors |
Considered predictors Age, gender, type of CIS (brainstem/cerebellum, spinal cord, optic neuritis, other), EDSS, lesion count, lesion load, average lesion PD intensity, average lesion T2 intensity, average distance of lesions from the centre of the brain, presence of lesions in proximity of the centre of the brain, the shortest horizontal distance of a lesion from the vertical axis of the brain, lesion size profile Number of considered predictors 14 Timing of predictor measurement At disease onset (CIS) and up to a mean of 6.15 weeks (SD 3.4) after disease onset Predictor handling Continuously (polynomial kernel) |
|
| Outcome |
Outcome definition Conversion to definite MS: clinical conversion to MS due to the occurrence of a second clinical attack attributable to demyelination of more than 24 hours in duration and at least 4 weeks from the initial attack Timing of outcome measurement
|
|
| Missing data |
Number of participants with any missing value
Missing data handling
|
|
| Analysis |
Number of participants (number of events) 1 year: 74 (22) 3 years: 70 (31) Modelling method Support vector machine, polynomial kernel Predictor selection method
Hyperparameter tuning Several values for polynomial degree considered in cross‐validation, other tuning parameters not reported Shrinkage of predictor weights Modelling method Performance evaluation dataset Development Performance evaluation method Cross‐validation, LOOCV repeated on 100 balanced bootstrap samples Calibration estimate Not reported Discrimination estimate Not reported Classification estimate
Overall performance Not reported Risk groups Not reported |
|
| Model |
Model presentation List of selected predictors and kernel degree Number of predictors in the model
Predictors in the model
Effect measure estimates Not reported Predictor influence measure Not reported Validation model update or adjustment Not applicable |
|
| Interpretation |
Aim of the study To determine if machine learning techniques, such as support vector machines (SVMs), can predict the occurrence of a second clinical attack, which leads to the diagnosis of clinically definite multiple sclerosis (CDMS) in patients with a clinically isolated syndrome (CIS), on the basis of single patient's lesion features and clinical/demographic characteristics Primary aim The primary aim of this study is the prediction of individual outcomes. Model interpretation Probably exploratory Suggested improvements Use automatically derived features (instead of semi‐automated/manual features), features containing information on different aspects of imaging data (scale, directionality), imaging features not related to lesions (magnetism transfer imaging), other para‐clinical predictors (OCB, grey matter atrophy, genetic factors, spinal cord lesions, cortical lesions, Gd enhancing lesions), larger independent dataset, including temporal ordering of events, novel algorithms |
|
| Notes |
Applicability overall Unclear Applicability overall rationale Due to the lack of reporting on the final model or tool, it is unclear whether this study aimed to develop a model/tool for the prediction of individual MS outcomes. |
|
| Item | Authors' judgement | Support for judgement |
| Participants | No | The data source was a secondary use of cohort study data, but only participants with complete data were included. |
| Predictors | Yes | It is unclear if the neurologist circling the lesions was informed of the outcome. Still, we do not believe this would induce any considerable bias as imaging is considered to be an objective predictor. Other predictors are basic and objective. |
| Outcome | Yes | Although the outcome might have been measured with knowledge of the predictors, clinical attacks are considered objective. |
| Analysis | No | 1 year: The EPV was very low. Only classification measures were evaluated. Model selection and model evaluation occurred on the same data. The final model is unclear. 3 years: The EPV was very low. 4 participants were lost to follow‐up by 3 years, but this was only a small fraction of the total. Only classification measures were evaluated. Model selection and model evaluation occurred on the same data. The final model is unclear. |
| Overall | No | At least one domain is at high risk of bias. |
Wottschel 2019.
| Study characteristics | ||
| General information |
Model name BCGLMS Primary source Journal Data source Cohort, secondary Study type Development |
|
| Participants |
Inclusion criteria
Exclusion criteria Not reported Recruitment
Age (years) Mean 32.7 (onset) Sex (%F) 66.3 Disease duration (years) Up to 0.27 Diagnosis 100% CIS Diagnostic criteria Not reported Treatment Not reported Disease description EDSS median (range): 2 (0 to 8) Recruitment period Not reported |
|
| Predictors |
Considered predictors Global features: (whole‐brain measures) GM volume, WM volume, brain volume as a percentage of the intracranial volume, age, sex, CIS type (brainstem/optic nerve/spinal cord/other), EDSS; region of interest (ROI) features: 143 ROIs (excluding ROIs describing ventricles, skull and background) based on the Neuromorphometrics atlas, each ROI from the brain parcellation used to mask each patient's GM probability map, CT map, lesion segmentation and T1 scan (to estimate the volume); lobe features: (ROIs were merged into nine larger areas according to their anatomical location) limbic, insular, frontal, parietal, temporal, occipital, cerebellum, GM and WM, deep grey matter defined as thalamus, hippocampus, nucleus accumbens, amygdala, caudate nucleus, pallidum, putamen and basal ganglia Number of considered predictors 214 Timing of predictor measurement At disease onset (CIS) and up to 14 weeks after disease onset Predictor handling Continuously |
|
| Outcome |
Outcome definition Conversion to definite MS: occurrence of a second clinical episode Timing of outcome measurement 1 year |
|
| Missing data |
Number of participants with any missing value Not reported Missing data handling Exclusion |
|
| Analysis |
Number of participants (number of events) 400 (91) Modelling method Support vector machine, linear kernel Predictor selection method
Hyperparameter tuning Not applicable Shrinkage of predictor weights Modelling method Performance evaluation dataset Development Performance evaluation method Cross‐validation, k‐fold CV (k = 2, 5, 10 (where possible), LOO) repeated on 100 balanced bootstrap samples Calibration estimate Not reported Discrimination estimate Not reported Classification estimate 5‐fold CV: accuracy = 0.685 (95% CI 0.683 to 0.687), sensitivity = 0.678, specificity = 0.693, LOOCV: accuracy = 0.708 (95% CI 0.706 to 0.71), sensitivity = 0.703, specificity = 0.713 Overall performance Not reported Risk groups Not reported |
|
| Model |
Model presentation List of selected predictors for peak accuracy when using 2‐fold CV Number of predictors in the model 36 (for 2‐fold CV) Predictors in the model Type of CIS, WM lesion load ‐ whole brain, WM lesion load ‐ frontal, WM lesion load ‐ limbic, WM lesion load ‐ temporal, WM lesion load ‐ dGM, WM lesion load ‐ WM, GM ‐ cerebellum, GM ‐ thalamus, GM ‐ frontal operculum, GM ‐ middle cingulate gyrus, GM ‐ precentral gyrus medial segment, GM ‐ posterior cingulate gyrus, GM ‐ praecuneus, GM ‐ parietal operculum, GM ‐ post‐central gyrus, GM ‐ planum polare, GM ‐ subcallosal area, GM ‐ supplementary motor cortex, GM ‐ superior occipital gyrus, cortical thickness ‐ central operculum, cortical thickness ‐ cuneus, cortical thickness ‐ fusiform gyrus, cortical thickness ‐ inferior temporal gyrus, cortical thickness ‐ middle occipital gyrus, cortical thickness ‐ post‐central gyrus medial segment, cortical thickness ‐ occipital pole, cortical thickness ‐ opercular part of the inferior frontal gyrus, cortical thickness ‐ orbital part of the inferior frontal gyrus, cortical thickness ‐ planum temporale, cortical thickness ‐ superior occipital gyrus, volume ‐ whole brain, volume ‐ ventral diencephalon, volume ‐ middle temporal gyrus, volume ‐ supramarginal gyrus, volume ‐ limbic Effect measure estimates Not reported Predictor influence measure Not reported Validation model update or adjustment Not applicable |
|
| Interpretation |
Aim of the study To distinguish CIS converters from non‐converters at onset of a CIS, using recursive feature elimination and weight averaging with support vector machines. Also, to assess the influence of cohort size and cross‐validation methods on the accuracy estimate of the classification Primary aim The primary aim of this study is somehow the prediction of individual outcomes. The focus is on the influence of sample size and CV methods on results. Model interpretation Probably exploratory Suggested improvements To compare 2 or more cross‐validation schemes to estimate potential biases when it is not possible to use completely distinct data sets for training and testing, advanced imaging techniques such as magnetisation transfer imaging (MTR) or double or phase‐shifted inversion recovery (DIR/PSIR), genetic or environmental predictors, larger cohort, longitudinal MRI data, prospective harmonised imaging protocols |
|
| Notes |
Applicability overall Unclear Applicability overall rationale Due to the lack of reporting on the final model or tool, it is unclear whether this study aimed to develop a model/tool for the prediction of individual MS outcomes. |
|
| Item | Authors' judgement | Support for judgement |
| Participants | No | A multicentre cohort was used, but patients with missing data were excluded. |
| Predictors | Yes | Even though imaging predictors might have been a result of preprocessing after the outcome, we do not expect such information to affect an automatised procedure. Imaging data from several sites were used, but they were all MAGNIMS sites and collaborated in defining imaging protocols for the field. |
| Outcome | Yes | Although the outcome might have been measured with knowledge of the predictors, clinical attacks are considered objective. |
| Analysis | No | The EPV was very low. Only classification measures were evaluated. Selection and assessment occurred at the same resampling level. The final model is unclear. |
| Overall | No | At least one domain is at high risk of bias. |
Ye 2020.
| Study characteristics | ||
| General information |
Model name
Primary source Journal Data source Unclear, secondary Study type Development |
|
| Participants |
Inclusion criteria
Exclusion criteria
Recruitment Unclear, Sheba Medical Centre, Israel Age (years) Mean 36.3 (unclear when) Sex (%F) 63.8 Disease duration (years) Mean 5.7 (pooled SD 0.89) Diagnosis 34.0% CIS, 66.0% CDMS Diagnostic criteria McDonald 2001 Treatment Unclear timing, 30.9% on DMT Disease description EDSS (unclear if mean and SD): CIS 0.9 (0.2) CDMS 2.4 (0.2), annualised relapse rate (unclear if mean and SD): CDMS 0.92 (0.1) Recruitment period Not reported |
|
| Predictors |
Considered predictors
Number of considered predictors
Timing of predictor measurement At study baseline (cohort entry) Predictor handling
|
|
| Outcome |
Outcome definition Relapse: relapse‐free survival (relapse defined as the onset of new objective neurological symptoms and signs or worsening of existing neurological disability not accompanied by metabolic changes, fever or other signs of infection, and lasting for a period of at least 48 hours accompanied by objective change of at least 0.5 in the EDSS score) Timing of outcome measurement Follow‐up mean (SD): 1.97 (1.3) |
|
| Missing data |
Number of participants with any missing value Not reported Missing data handling Single imputation, using k‐nearest neighbours |
|
| Analysis |
Number of participants (number of events) 94 (64) Modelling method
Predictor selection method
Hyperparameter tuning
Shrinkage of predictor weights
Performance evaluation dataset Development Performance evaluation method
Calibration estimate Not reported Discrimination estimate
Classification estimate Not reported Overall performance Not reported Risk groups
|
|
| Model |
Model presentation
Number of predictors in the model 5 Predictors in the model
Effect measure estimates
Predictor influence measure
Validation model update or adjustment Not applicable |
|
| Interpretation |
Aim of the study To develop and validate an effective and noninvasive prognostic gene signature for predicting the probability of relapse and remission period in MS patients via an integrated analysis of blood microarrays Primary aim The primary aim of this study is the prediction of individual outcomes. Model interpretation Probably confirmatory Suggested improvements Include Asian participants |
|
| Notes |
Applicability overall
Applicability overall rationale
Auxiliary references Gurevich M, Tuller T, Rubinstein U, Or‐Bach R, Achiron A. Prediction of acute multiple sclerosis relapses by transcription levels of peripheral blood cells. BMC Medical Genomics [Electronic Resource] 2009;2:46. |
|
| Item | Authors' judgement | Support for judgement |
| Participants | Unclear | The data source was not clearly reported in this study, nor in the original study from which the data came. |
| Predictors | Yes | According to the study from which the data came, although microarray analysis of the transcriptome can be affected by batch effects, there were efforts to circumvent them. Even though the microarray analysis might have occurred after the outcomes became known, the procedure was relatively automated and expected not to be affected by the information. The intended time of model use in relation to the patient's disease history is unclear, but it may be any time that blood was drawn. |
| Outcome | Yes | Information related to the outcome was retrieved from Gurevich 2009. A standard definition of relapse was used, which we considered robust to possible predictor knowledge. |
| Analysis | No | 5‐gene signature: The number of predictors relative to the number of events was too large. Univariable analysis was used for predictor selection. Although no information related to missing data was reported, the number of included patients matches the publication on the data source. Only discrimination, which did not appear to account for censored data, was assessed. A random split was used for assessment. Parameter tuning was not discussed, and the plots corresponding to the Cox LASSO model selection do not correspond to the final model presented (the plots indicate an optimal number of 11 predictors, not 5). Nomogram: The number of predictors relative to the number of events was too large. Although no information related to missing data was reported, the number of included patients matches the publication on the data source. Only discrimination was assessed. A random split was used for assessment, in addition to a bootstrap procedure in the training set that did not correct for optimism. |
| Overall | No | At least one domain is at high risk of bias. |
Yoo 2019.
| Study characteristics | ||
| General information |
Model name CNN EDT, pretraining, all user‐defined features Primary source Journal Data source Randomised trial participants, secondary Study type Development |
|
| Participants |
Inclusion criteria
Exclusion criteria
Recruitment Minocycline RCT participants recruited from MS clinics in Canada and USA:
Age (years) Mean 35.9 (onset) Sex (%F) 69.0 Disease duration (years) Median 0.2 (range 0.06 to 0.52) Diagnosis 100% CIS Diagnostic criteria Not reported Treatment
Disease description EDSS median (range): 1.5 (0 to 4.5) Recruitment period 2009 to 2013 |
|
| Predictors |
Considered predictors MRI mask images/user‐defined: T2w lesion volume, brain parenchymal fraction, diffusely abnormal white matter, gender, initial CIS event cerebrum, initial CIS event optic nerve, initial CIS event cerebellum, initial CIS event brainstem, initial CIS event spinal cord, EDSS, CIS monofocal or multifocal type at onset Number of considered predictors Non‐tabular data + 11 (user‐defined) Timing of predictor measurement At disease onset (CIS) (RCT baseline within 180 days after disease onset) Predictor handling Continuously except for DAWM, which was dichotomised (justified as binary being more reliable) |
|
| Outcome |
Outcome definition Conversion to definite MS (McDonald 2005 (Polman 2005)): MS at the end of 2 years determined by new T2 lesions, new T1 gadolinium enhancing lesions and/or new clinical relapse Timing of outcome measurement 2 years |
|
| Missing data |
Number of participants with any missing value 9, only missing outcome reported Missing data handling Not reported |
|
| Analysis |
Number of participants (number of events) 140 (80) Modelling method Convolutional neural network Predictor selection method
Hyperparameter tuning Empirically determined L1 and L2‐norm parameters (Montavon 2012), early stopping convergence target found by test error increase during cross‐validation, grid search over several values for replication and scale factors using cross‐validation accuracy Shrinkage of predictor weights Modelling method Performance evaluation dataset Development Performance evaluation method Cross‐validation, 7‐fold Calibration estimate Not reported Discrimination estimate c‐Statistic = 0.746 (SD 0.114) Classification estimate Accuracy = 0.75 (SD 0.113), sensitivity = 0.787 (SD 0.122), specificity = 0.704 (SD 0.154) Overall performance Not reported Risk groups Not reported |
|
| Model |
Model presentation Not reported Number of predictors in the model Non‐tabular data + 11 Predictors in the model Non‐tabular: MRI mask images, tabular: T2w lesion volume, brain parenchymal fraction, diffusely abnormal white matter, gender, initial CIS event cerebrum, initial CIS event optic nerve, initial CIS event cerebellum, initial CIS event brainstem, initial CIS event spinal cord, EDSS, CIS monofocal or multifocal type at onset Effect measure estimates Not reported Predictor influence measure Average relative importance of the user‐defined features Validation model update or adjustment Not applicable |
|
| Interpretation |
Aim of the study To determine whether deep learning can extract latent MS lesion features that, when combined with user‐defined radiological and clinical measurements, can predict conversion to MS ... in patients with early MS symptoms (clinically isolated syndrome), a prodromal stage of MS, more accurately than imaging biomarkers that have been used in clinical studies to evaluate overall disease state, such as lesion volume and brain volume Primary aim The primary aim of this study is not entirely on the prediction of individual outcomes. The focus is on the ability of deep learning to extract latent features. Model interpretation Exploratory Suggested improvements Examine more sophisticated strategies such as augmenting input feature vectors with the squared values or by taking polynomial combinations of feature vectors to increase feature dynamic range and creating an augmented network that has the ability to learn higher order features. |
|
| Notes |
Applicability overall High Applicability overall rationale Although this study contained a model, the main aim was not to create a model for prediction of individual outcomes but rather to show the ability of deep learning algorithms to extract prognostic factors. Auxiliary references Metz LM, Li DKB, Traboulsee AL, Duquette P, Eliasziw M, Cerchiaro G, et al. Trial of minocycline in a clinically isolated syndrome of multiple sclerosis. N Engl J Med 2017;376(22):2122‐33. |
|
| Item | Authors' judgement | Support for judgement |
| Participants | Unclear | The data came from an RCT with well‐explained, appropriate inclusion/exclusion criteria. It is unclear why cerebrospinal fluid oligoclonal bands, or spinal MRI changes typical of demyelination, were required for participants over age 50 years. This means that the patients were known to have the outcome by more current diagnostic criteria. It is unclear if this introduces any risk of bias. |
| Predictors | Yes | Even though imaging predictors might have been a result of preprocessing after the outcome, we do not expect such information to affect an automated procedure. The seeds were set by a single expert and checked by another single expert. Other predictors were assessed by MS clinicians. |
| Outcome | Yes | Although the outcome might have been measured with knowledge of the predictors, the diagnostic criteria are considered objective. |
| Analysis | No | The number of participants and events relative to the number of tabular features was very low. It was unclear how missing data were handled, but 12‐month outcomes were used for 9 participants. Calibration was not assessed. Evaluation and tuning occurred at the same level, with no nested resampling structure. No final model/tool was given. |
| Overall | No | At least one domain is at high risk of bias. |
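The nested‐resampling issue flagged in the Analysis row above (hyperparameter tuning and performance evaluation performed at the same level of cross‐validation) can be illustrated with a small sketch. This is purely illustrative and not any included study's actual code; `fit_and_score` is a hypothetical callback standing in for model fitting and evaluation on index sets.

```python
import random

def k_folds(indices, k, seed=0):
    """Shuffle the indices and split them into k disjoint folds."""
    idx = list(indices)
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def nested_cv_score(indices, params_grid, fit_and_score, k_outer=5, k_inner=3):
    """Nested CV: tune on inner folds only, then evaluate the chosen
    configuration once on the held-out outer fold, so the outer test
    fold never influences model selection (avoiding optimism)."""
    outer = k_folds(indices, k_outer)
    outer_scores = []
    for i, test_fold in enumerate(outer):
        train = [j for f, fold in enumerate(outer) if f != i for j in fold]

        def inner_score(params):
            # Mean validation score of `params` across the inner folds.
            inner = k_folds(train, k_inner, seed=i + 1)
            scores = []
            for v, val_fold in enumerate(inner):
                inner_train = [j for g, f2 in enumerate(inner) if g != v for j in f2]
                scores.append(fit_and_score(params, inner_train, val_fold))
            return sum(scores) / len(scores)

        best_params = max(params_grid, key=inner_score)
        # The outer test fold is used exactly once, for evaluation only.
        outer_scores.append(fit_and_score(best_params, train, test_fold))
    return sum(outer_scores) / k_outer
```

Tuning and evaluating in the same (non‐nested) loop would instead report the best inner‐loop score, which is optimistically biased.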
Yperman 2020.
| Study characteristics | ||
| General information |
Model name RF literature + time series predictors Primary source Journal Data source Routine care, secondary Study type Development |
|
| Participants |
Inclusion criteria
Exclusion criteria
Recruitment Rehabilitation & MS Centre in Overpelt, Belgium Age (years) Mean 45.0 Sex (%F) 71.8 Disease duration (years) Not reported Diagnosis CIS 1.7%, RRMS 53.2%, SPMS 10.7%, PPMS 2.9%, unknown 32.9% Diagnostic criteria Unclear, unrecorded in the dataset Treatment
Disease description EDSS mean (SD): 3.0 (1.8) Recruitment period Not reported |
|
| Predictors |
Considered predictors Latencies, EDSS at T0, age, peak‐to‐peak amplitude (L and R), gender, type of MS, around 5885 time series features extracted from the EPTS Number of considered predictors 5893 Timing of predictor measurement At visit of interest Predictor handling Continuously |
|
| Outcome |
Outcome definition Disability (EDSS): disability progression defined as EDSS(T1) ‐ EDSS(T0) ≥ 1.0 for EDSS(T0) ≤ 5.5, or EDSS(T1) ‐ EDSS(T0) ≥ 0.5 for EDSS(T0) > 5.5 Timing of outcome measurement Time from EDSS_baseline measurement to EDSS_outcome median (IQR): 1.98 years (1.84 to 2.08), time from MEP_baseline measurement to EDSS_outcome median (IQR): 1.99 years (1.87 to 2.08) |
|
| Missing data |
Number of participants with any missing value 3717, though exactly how many participants had any missing value is unclear Missing data handling Mixed: exclusion, complete case analysis (at the visit level), and complete feature analysis |
|
| Analysis |
Number of participants (number of events) 2502 visits (unit of analysis is visit) of 419 participants (275) Modelling method Random forest Predictor selection method
Hyperparameter tuning Unclear, maximum number of features, number of trees and minimum samples for split chosen in cross‐validation Shrinkage of predictor weights Modelling method Performance evaluation dataset Development Performance evaluation method Cross‐validation, 100 grouped stratified shuffle split within 1000 grouped stratified shuffle split Calibration estimate Calibration plot upon request Discrimination estimate c‐Statistic = 0.75 (SD 0.07) Classification estimate Not reported Overall performance Not reported Risk groups Not reported |
|
| Model |
Model presentation Not reported Number of predictors in the model ≤ 9 (unclear subset) Predictors in the model Selected predictors unclear, at least latencies, EDSS at T0, age Effect measure estimates Not reported Predictor influence measure 20 highest ranking features across all splits Validation model update or adjustment Not applicable |
|
| Interpretation |
Aim of the study To investigate whether a machine learning approach that includes extra features from the EPTS can increase the predictive performance of EP in MS (progression in 2 years) Primary aim The primary aim of this study is not the prediction of individual outcomes. The focus is on the value of EP features. Model interpretation Exploratory Suggested improvements Data augmentation to expand the size of the training set and stabilise the performance estimate, analysing the whole longitudinal trajectory of the patient, using TS algorithms not included in HCTSA, using short‐timescale EPTS changes (e.g. 6 months) to predict EDSS changes on longer timescales or to detect non‐response to treatment, incorporating the left/right symmetry in a more advanced way, adding other variables such as MRI, cerebrospinal fluid, and genomic data, evaluation in larger (preferably multicentre) datasets, and including VEP and SEP in the prediction process |
|
| Notes |
Applicability overall High Applicability overall rationale Although this study contained a model and its assessment, the main aim was not to create a model for prediction of individual outcomes but was rather to show the added usefulness of EP with extra time series features using machine learning. Additionally, no final model was reported. Auxiliary references Fulcher BD, Little MA, Jones NS. Highly comparative time‐series analysis: the empirical structure of time series and their methods. J R Soc Interface 2013;10(83):20130048. |
|
| Item | Authors' judgement | Support for judgement |
| Participants | No | The data source was routine care, and the exclusion of visits/measurements was dependent on the quality of measurements and the availability of the outcome measurement. |
| Predictors | No | Two different machines were used; however, our clinical authors do not find this to be problematic. The predictors were probably assessed without outcome knowledge and are available at prediction model use. However, the disease type variable did not exist in the original data source and was inferred based on other variables only for a subset of patients. |
| Outcome | No | Progression was not confirmed. Table 1 in the paper suggested similar rates of worsening across disease subtypes, which is unexpected and could be due to the lack of confirmation. |
| Analysis | No | The sample size was small relative to the large number of predictors. Exclusion of patients for missing data was addressed in the Participants section, but further exclusion due to missing data seems likely. The analysis was done at the visit level. Group stratified internal validation was used to address this, but it is unclear if this was enough. The correlation between observations was not addressed in the fitting. It is unclear how it would be addressed had a final model been selected and fit to the entire dataset. Univariable selection was used. The feature extraction and standardisation were done on the entire dataset, instead of within cross‐validation, making data leakage possible. A calibration plot was provided at follow‐up and showed severe miscalibration. No final model seems to be selected, fit, and presented. |
| Overall | No | At least one domain is at high risk of bias. |
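The data‐leakage issue noted in the Analysis row above (feature standardisation performed on the entire dataset rather than within cross‐validation) can be sketched as follows. This is an illustrative fragment, not the study's code:

```python
from statistics import mean, stdev

def standardise_fold(train, test):
    """Fit the scaling parameters (mean/SD) on the training fold only,
    then apply them to both folds. Estimating mean/SD on the pooled
    data before splitting lets test-fold information leak into the
    features used for training, inflating cross-validated performance."""
    mu, sigma = mean(train), stdev(train)
    scale = lambda xs: [(x - mu) / sigma for x in xs]
    return scale(train), scale(test)
```

The same principle applies to any data‐dependent preprocessing step, including the feature extraction mentioned above: it should be refit inside each cross‐validation training fold.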
Zakharov 2013.
| Study characteristics | ||
| General information |
Model name Not applicable Primary source Journal Data source Unclear Study type Development |
|
| Participants |
Inclusion criteria Patients with monofocal CIS Exclusion criteria Not reported Recruitment Department of Neurology and Neurosurgery of the Samara State Medical University and at the Centre for MS at the Samara Regional Clinical Hospital, Russia Age (years) Mean 25.1 Sex (%F) 70.0 Disease duration (years) Not reported Diagnosis 100% CIS Diagnostic criteria Not reported Treatment Not reported Disease description EDSS ≤ 2 Recruitment period 2004 to 2012 |
|
| Predictors |
Considered predictors Unclear if it is the complete list, age, number of foci, location of foci, size of the demyelination foci Number of considered predictors ≥ 2 (unclear if complete list) Timing of predictor measurement Unclear, at first MRI after CIS onset (timing distribution unknown) Predictor handling Not reported |
|
| Outcome |
Outcome definition Conversion to definite MS (McDonald 2010 (Polman 2011)): development of CDMS defined as the time of the onset of the second attack Timing of outcome measurement Follow‐up for 8 years |
|
| Missing data |
Number of participants with any missing value Not reported Missing data handling Not reported |
|
| Analysis |
Number of participants (number of events) 102 (23) Modelling method Logistic regression Predictor selection method Not reported Hyperparameter tuning Not applicable Shrinkage of predictor weights None Performance evaluation dataset Development Performance evaluation method Apparent Calibration estimate Not reported Discrimination estimate Not reported Classification estimate Sensitivity = 0.727, specificity = 0.345 Overall performance Not reported Risk groups Not reported |
|
| Model |
Model presentation Not reported Number of predictors in the model 2 Predictors in the model Age at disease onset, size of the foci of demyelination Effect measure estimates Not reported Predictor influence measure Not applicable Validation model update or adjustment Not applicable |
|
| Interpretation |
Aim of the study To study the clinical and instrumental parameters of the patient population with the first attack of the demyelinating process and involvement of only one functional system, most relevant to the term 'monofocal CIS' Primary aim The primary aim of this study is, to some extent, the prediction of individual outcomes. The focus is on the predictors. Model interpretation Probably exploratory Suggested improvements Increase the number of variables involved in the model, with variables such as immunological indicators and data from neurophysiological methods (multimodal evoked potentials) |
|
| Notes |
Applicability overall Unclear Applicability overall rationale Due to the lack of reporting on predictors and model timing, applicability is unclear. |
|
| Item | Authors' judgement | Support for judgement |
| Participants | Unclear | The data source is unclear without an indication of an associated study or registry. There were no detailed exclusion/inclusion criteria other than the diagnosis of monofocal CIS. |
| Predictors | Unclear | There are few details on how predictors were assessed or when they were assessed. |
| Outcome | Unclear | The definition of a second attack for a CDMS is standard and is expected to be measured relatively objectively. It is unclear how much time there was between the predictor assessment and outcome determination because the timing of the predictor measurement is unclear. It is unclear if there were regular visits for outcome assessment or it was assessed whenever/if a patient came in. |
| Analysis | No | Although many details of the analysis were not reported, there are clear indicators to assess the risk of bias of this domain as high. EPV is at most 11.5, based on the number of variables in the final model, not the unknown number of variables considered. There was no information on missing data, including censoring, during the 8‐year follow‐up period. The only model performance measures reported were sensitivity and specificity evaluated in the development set. A final model is not presented. |
| Overall | No | At least one domain is at high risk of bias. |
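The events‐per‐variable (EPV) figure cited in the Analysis row above is simple arithmetic: 23 events divided by the 2 predictors in the final model gives 11.5, an upper bound because any additional candidate predictors screened before selection would lower the effective EPV. A minimal sketch:

```python
def events_per_variable(n_events: int, n_candidate_predictors: int) -> float:
    """EPV = number of outcome events / number of candidate predictors.
    Counting only the predictors retained in the final model (as done
    here, since the full candidate list is unknown) yields an upper
    bound on the true EPV."""
    return n_events / n_candidate_predictors

# Zakharov 2013: 102 participants, 23 events, 2 predictors in the final model
epv_upper_bound = events_per_variable(23, 2)  # 11.5
```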
Zhang 2019.
| Study characteristics | ||
| General information |
Model name Shape Primary source Journal Data source Cohort, primary Study type Development |
|
| Participants |
Inclusion criteria
Exclusion criteria Not reported Recruitment Prospectively from a single centre (unclear), Germany Age (years) Mean 42.4 (unclear when) Sex (%F) 69.9 Disease duration (years) Not reported Diagnosis 100% CIS Diagnostic criteria McDonald 2010 (Polman 2011) Treatment
Disease description EDSS median 1 Recruitment period 2009 to 2013 |
|
| Predictors |
Considered predictors Total lesion number, total lesion volume, minimum, maximum, mean, standard deviation for surface area, sphericity, surface‐volume‐ratio, and volume of individual lesions, (other models: minimum, maximum, mean, standard deviation for skewness, kurtosis, entropy of intensity histograms) Number of considered predictors 30 Timing of predictor measurement At disease onset (CIS) (during primary clinical work‐up for CIS) Predictor handling Continuously (by summary statistics of parameters from multiple lesions within single patients) |
|
| Outcome |
Outcome definition Conversion to definite MS (McDonald 2010): demonstration of dissemination in time by a clinical relapse or the occurrence of new MRI lesions Timing of outcome measurement 3 years |
|
| Missing data |
Number of participants with any missing value 2 Missing data handling Exclusion |
|
| Analysis |
Number of participants (number of events) 84 (66) Modelling method Random forest, oblique ‐ linear multivariable model splitting Predictor selection method
Hyperparameter tuning Number of variables considered at each node (considered 3, sqrt(number of variables), and 7 variables) and number of trees (considered 100, 200, and 300 trees) were optimised on out‐of‐bag error during 3‐fold CV Shrinkage of predictor weights Modelling method Performance evaluation dataset Development Performance evaluation method Cross‐validation, 3‐fold Calibration estimate Not reported Discrimination estimate Not reported Classification estimate Accuracy = 0.85 (95% CI 0.75 to 0.91), sensitivity = 0.94 (95% CI 0.85 to 0.98), specificity = 0.50 (95% CI 0.26 to 0.74), PPV = 0.87 (95% CI 0.81 to 0.91), NPV = 0.69 (95% CI 0.44 to 0.87), DOR = 15.50 (95% CI 3.93 to 60.98), balanced accuracy = 0.72 (posterior probability interval 0.60 to 0.82) Overall performance Not reported Risk groups Not reported |
|
| Model |
Model presentation Not reported Number of predictors in the model 18 Predictors in the model Total lesion number, total lesion volume, minimum, maximum, mean, standard deviation for surface area, sphericity, surface‐volume‐ratio, and volume of individual lesions Effect measure estimates Not reported Predictor influence measure Bootstrapped importance scores Validation model update or adjustment Not applicable |
|
| Interpretation |
Aim of the study To predict the conversion from CIS to multiple sclerosis (MS) based on the baseline MRI scan by studying image features of these lesions Primary aim The primary aim of this study is, to some extent, the prediction of individual outcomes. The focus is on the new MRI features. Model interpretation Probably exploratory Suggested improvements Independent validation, other features (texture, advanced deep learning, clinical, paraclinical), predicting the disease course rather than only conversion |
|
| Notes |
Applicability overall High Applicability overall rationale The predictors used were imaging features and no other predictor domain was considered for use in the model. Auxiliary references Filippi M, Preziosa P, Meani A, Ciccarelli O, Mesaros S, Rovira A, et al. Prediction of a multiple sclerosis diagnosis in patients with clinically isolated syndrome using the 2016 MAGNIMS and 2010 McDonald criteria: a retrospective study. Lancet Neurol 2018;17(2):133‐42. |
|
| Item | Authors' judgement | Support for judgement |
| Participants | No | The data source was described as a cohort, and patients could not have the outcome at baseline according to the information received during follow‐up. However, the amount of follow‐up was an inclusion criterion, which may introduce risk of bias. |
| Predictors | Yes | The predictors were collected at a single centre, and sensitivity to the lesion extraction method was examined. |
| Outcome | Yes | The outcome was based on well‐defined standard diagnostic criteria and we believe it is robust to knowledge of predictor information. |
| Analysis | No | The EPV was low. Neither calibration nor discrimination was addressed. On follow‐up with the authors, it was confirmed that patients were included in the analysis based on the availability of follow‐up; however, the proportion of patients excluded from the analysis due to missing outcome data was less than 5% and hence is not expected to increase the risk of bias. The final model was unclear. |
| Overall | No | At least one domain is at high risk of bias. |
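For reference, the classification measures reported for this model (sensitivity, specificity, PPV, NPV, diagnostic odds ratio, balanced accuracy) are all derived from a single 2 × 2 confusion matrix. A minimal sketch with hypothetical counts:

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Derive the reported classification measures from the counts of
    true/false positives and negatives in a 2x2 confusion matrix."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "PPV": tp / (tp + fp),
        "NPV": tn / (tn + fn),
        "DOR": (tp * tn) / (fp * fn),          # diagnostic odds ratio
        "balanced accuracy": (sensitivity + specificity) / 2,
    }
```

Note that none of these threshold‐based measures assesses calibration, which is why the Analysis rows in this review record calibration separately.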
Zhao 2020.
| Study characteristics | ||
| General information |
Model name
Primary source Journal Data source Cohort, primary Study type Development + validation (unclear if model refit), location |
|
| Participants |
Inclusion criteria
Exclusion criteria Not reported Recruitment
Age (years)
Sex (%F)
Disease duration (years)
Diagnosis
Diagnostic criteria Not reported Treatment
Disease description
Recruitment period From 2000 onward |
|
| Predictors |
Considered predictors
Number of considered predictors
Timing of predictor measurement
Predictor handling
|
|
| Outcome |
Outcome definition Disability (EDSS): worsening defined as an increase in EDSS ≥ 1.5 Timing of outcome measurement Up to 5 years |
|
| Missing data |
Number of participants with any missing value Not reported Missing data handling
|
|
| Analysis |
Number of participants (number of events)
Modelling method
Predictor selection method
Hyperparameter tuning
Shrinkage of predictor weights
Performance evaluation dataset
Performance evaluation method
Calibration estimate Not reported Discrimination estimate
Classification estimate
Overall performance Not reported Risk groups Not reported |
|
| Model |
Model presentation Not reported Number of predictors in the model
Predictors in the model
Effect measure estimates
Predictor influence measure
Validation model update or adjustment
|
|
| Interpretation |
Aim of the study To apply machine learning techniques to predict the disability level of MS patients at the 5‐year time point using the first 2 years of clinical and neuroimaging longitudinal data Primary aim The primary aim of this study is, to some extent, the prediction of individual outcomes. The focus is on machine learning methods. Model interpretation Probably exploratory Suggested improvements Time series models to better capture the temporal dependencies, incorporating genetic information and additional biomarkers |
|
| Notes |
Applicability overall Unclear Applicability overall rationale Due to the lack of reporting on the final model or tool, it is unclear whether this study aimed to develop a model/tool for the prediction of individual MS outcomes. Additionally, it is unclear how many participants had already experienced the outcome by 2 years, at which time point the predictors were still being collected. Auxiliary references Bove R, Chitnis T, Cree BA, Tintoré M, Naegelin Y, Uitdehaag BM, et al. SUMMIT (serially unified multicenter multiple sclerosis investigation): creating a repository of deeply phenotyped contemporary multiple sclerosis cohorts. Mult Scler 2018;24(11):1485‐98. Gauthier SA, Glanz B I, Mandel M, Weiner HL. A model for the comprehensive investigation of a chronic autoimmune disease: the multiple sclerosis CLIMB study. Autoimmun Rev 2006;5(8):532‐6. |
|
| Item | Authors' judgement | Support for judgement |
| Participants | Unclear | The data were prospectively collected from a cohort. Although references cited in the article have some study inclusion/exclusion criteria, the number of patients used in this article does not match them. Thus, the inclusion/exclusion criteria used in the article are unclear. |
| Predictors | Unclear | Because the data were from cohort studies, the predictors are expected to be similar. The intended time of model use is unclear, and predictors from the first 2 years were used to predict the 5‐year outcome. |
| Outcome | No | A pre‐specified outcome was probably used, but the EDSS change was not confirmed. We are not concerned that the outcome assessment could be biased by knowledge of the predictors' values. However, it is unclear how many patients had already experienced the outcome by 2 years, while predictors were still being collected. |
| Analysis | No | XGBoost All, XGBoost Common, LightGBM All and LightGBM Common: The number of events per variable was low compared to the unknown number of predictors considered. Calibration was not assessed. Parameter tuning was reported to occur over a grid of values within an inner CV loop, so optimism was probably addressed. The final model was unclear. XGBoost Common and LightGBM Common: Calibration was not assessed. The model appears to be refit in the external validation set. |
| Overall | No | At least one domain is at high risk of bias. |
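Calibration, recorded as unassessed for most of the models above, is typically examined by grouping predictions into risk bins and comparing the mean predicted risk with the observed event rate in each bin. An illustrative sketch under simplifying assumptions (equal‐sized bins, binary outcomes), not taken from any included study:

```python
def calibration_table(probs, outcomes, n_bins=5):
    """Sort predictions, split them into n_bins equal-sized risk groups,
    and return (mean predicted risk, observed event rate) per group.
    Large gaps between the two columns indicate miscalibration."""
    pairs = sorted(zip(probs, outcomes))
    size = len(pairs) // n_bins
    rows = []
    for b in range(n_bins):
        # The last bin absorbs any remainder after integer division.
        chunk = pairs[b * size:] if b == n_bins - 1 else pairs[b * size:(b + 1) * size]
        rows.append((sum(p for p, _ in chunk) / len(chunk),
                     sum(y for _, y in chunk) / len(chunk)))
    return rows
```

Plotting the observed rate against the mean predicted risk per bin gives the familiar calibration plot; points far from the diagonal correspond to the "severe miscalibration" described for Yperman 2020.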
25FW (also seen as T25FW): timed 25‐foot walk 2D: 2‐dimensional 3D: 3‐dimensional 3:4‐DAP: 3,4‐diaminopyridine 4‐AP: 4‐aminopyridine 9‐HPT (also seen as 9HPT): 9‐hole peg test ABILHAND: interview‐based assessment of a patient‐reported measure of the perceived difficulty in using their hand to perform manual activities ACTH: adrenocorticotropic hormone Ada (AdaBoost): adaptive boosting ADL: activities of daily living AH: abductor hallucis AIC: Akaike information criterion AISM: Italian Multiple Sclerosis Society APB: abductor pollicis brevis AUC: area under the curve BCVA: best corrected visual acuity BENEFIT: Betaferon/Betaseron in Newly Emerging MS for Initial Treatment BIC: Bayesian information criterion BMA: Bayesian model averaging BMS: benign MS BPF: brain parenchymal fraction BPTF: Bayesian probabilistic tensor factorisation BREMS: Bayesian Risk Estimate for Multiple Sclerosis BREMSO: Bayesian Risk Estimate for Multiple Sclerosis at Onset CAO: clinician‐assessed outcomes CCA: current course assignment CCV: cerebral cortical volume CDMS: clinically definite multiple sclerosis CEL: contrast‐enhancing lesion CI: confidence interval CIS: clinically isolated syndrome CL: cortical lesion CLCN4: chloride voltage‐gated channel 4 CLIMB: Comprehensive Longitudinal Investigation of Multiple Sclerosis at Brigham and Women's CMCT: central motor conduction time CombiWISE: Combinatorial Weight‐adjusted Disability Score COMRIS‐CTD: Combinatorial MRI scale of CNS tissue destruction CNN: convolutional neural network CNS: central nervous system COPOUSEP: Corticothérapie Orale dans les Poussées de Sclérose en Plaques CPT: current procedural terminology CSF: cerebrospinal fluid CT: computed tomography CTh: cortical thickness CUIs: concept unique identifiers CXCL13: chemokine ligand 13 CV: cross‐validation DAWM: diffusely abnormal white matter Dev: development df: degrees of freedom Dgm: deep grey matter DIR: double inversion recovery DIS: dissemination in space DIT2010: 
dissemination in time according to McDonald 2010 criteria DMD: disease‐modifying drug DMT: disease‐modifying treatment DNA: deoxyribonucleic acid DSS: Disability Status Scale DT: decision time; decision tree EDSS: expanded disability status scale EDT: Euclidean distance transform EHR: electronic health record EP: evoked potential EPIC: expression, proteomics, imaging, clinical EPTS: evoked potential time series EPV: events per variable Ext: external F: female F1: F‐score FCA: future course assignment FLAIR: fluid‐attenuated inversion recovery FLP: first level predictor FREEDOMS: FTY720 Research Evaluating Effects of Daily Oral therapy in Multiple Sclerosis FS: functional systems FTP: fine tuning predictor GA: glatiramer acetate Gd (also seen as “GD”): gadolinium Gd‐DTPA: gadolinium diethylenetriamine penta‐acetic acid GM: grey matter GRU‐ODE‐Bayes: Gated Recurrent Unit‐Ordinary Differential Equation‐Bayes HLA: human leukocyte antigen HR: hazard ratio ICBM‐DTI: International Consortium of Brain Mapping diffusion tensor imaging ICD: International Classification of Disease IFN: interferon IgG: immunoglobulin G IL2: interleukin‐2 ILIRN: interleukin‐1 receptor antagonist IQR: interquartile range JHU‐MNI: Johns Hopkins University‐Montreal Neurological Institute KFSS: Kurtzke Functional Systems Scores LASSO: least absolute shrinkage and selection operator LGBM: light gradient‐boosting machine logMAR: logarithm of the minimum angle of resolution LOO: leave‐one‐out LOOCV: leave‐one‐out cross‐validation LR: logistic regression LSTM: long short‐term memory MAGNIMS: Magnetic Resonance Imaging in MS MEP (also seen as “mEPS”): motor evoked potentials mEPS: motor evoked potentials MF: motor function MFIS: Modified Fatigue Impact Scale ML: machine learning MNI: Montreal Neurological Institute MPI: multifactorial prognostic index MR: magnetic resonance MRI: magnetic resonance imaging MS: multiple sclerosis MSBASIS: MSBase Incident Study MS‐DSS: MS disease severity scale MSE: mean 
squared error MSFC: multiple sclerosis functional composite MSPS: multiple sclerosis prediction score MSSS: MS severity score MT/MTR: magnetisation transfer imaging NA: not applicable NDH‐9HPT: non‐dominant hand 9‐hole peg test NEMO: network modification tool NF‐L: neurofilament light chain level NHPT: nine‐hole peg test NMO: neuromyelitis optica NMOSD: neuromyelitis optica spectrum disorder NN: neural network NPV: negative predictive value NR: not reported NR2Y: number of relapses experienced in the first 2 years after MS onset O:E: observed to expected ratio OB: oligoclonal bands OCB: oligoclonal bands OCT: optical coherence tomography OFSEP: Observatoire Français de la Sclérose en Plaques ON: optic neuritis OND: other neurologic disease OR: odds ratio PASAT: Paced Auditory Serial Addition Test PBMC: peripheral blood mononuclear cells PD: patient‐determined PDCD2: human programmed cell death‐2 gene Pdw: proton density‐weighted PP: primary progressive PPMS: primary progressive MS PPV: positive predictive value PRIMS: pregnancy in MS PRO: patient‐reported outcome PSIR: phase‐shifted inversion recovery QOL: quality of life RCT: randomised controlled trial RF: random forest RH: relapse history RNA: ribonucleic acid RNRL: retinal nerve fibre layer ROC: receiver operating characteristic ROI: region of interest RR: relapsing‐remitting RRMS: relapsing‐remitting multiple sclerosis RT‐PCR: reverse transcription polymerase chain reaction SCL: spinal CL SD: standard deviation SDMT: symbol digits modality test SE: standard error SF‐36: 36‐Item Short Form Survey SMS: severe multiple sclerosis SMSreg: Swedish MS registry SNP: single nucleotide polymorphism SNRS: Scripps neurological rating scale SP: secondary progression SPMS: secondary progressive multiple sclerosis SUMMIT: Serially Unified Multicenter Multiple Sclerosis Investigation SVM: support vector machine T1c: T1‐weighted pre‐contrast T1p: T1 weighted post‐contrast T2LV (also seen as “T2 LV”): T2 lesion volume T2w: 
T2‐weighted TT2R: time to second relapse TWT: timed walk test Val: validation VFT: visual function test WBC: white blood cell WM: white matter XGB: extreme gradient boosting
Characteristics of excluded studies [ordered by study ID]
| Study | Reason for exclusion |
|---|---|
| Achiron 2006 | Ineligible model: there is no multivariable prognostic prediction model in this study. Rather, there is the description of EDSS evolution in a cohort and its comparison with new patients. |
| Ahlbrecht 2016 | Ineligible study type: the objective of this study is to assess associations between microRNAs detected in the CSF and conversion from CIS to RRMS. No model is developed or validated for prognostic prediction. |
| Andersen 2015 | Ineligible study type: the aim of this study is not to create a prognostic prediction model but to describe the natural history of the disease. |
| Azevedo 2019 | Ineligible study type: the aim of this conference abstract is to identify minimum clinically meaningful differences in brain atrophy rather than using multivariable models for prediction of future outcomes in individuals. |
| Barkhof 1997 | Ineligible model: this is a count‐score study. The multivariable logistic regression is used for predictor selection. This is followed by counting the abnormal variables and forming a univariable logistic regression with the count to derive predicted risk. |
| Brettschneider 2006 | Ineligible model: this study aims to assess whether cerebrospinal fluid biomarkers can improve diagnostic criteria for prediction of conversion from CIS to CDMS. However, no statistically developed multivariable models are used for predicting future conversion in individual patients; rather, diagnostic criteria are combined. |
| Bsteh 2021 | Ineligible model: the models aim to predict outcomes after treatment withdrawal, which can be considered treatment response. |
| Castellaro 2015 | Ineligible study type: based on the presented aim, results, and conclusion, the aim of this conference abstract is to show that specific brain measures are predictive of conversion. |
| Chalkou 2021 | Ineligible model: the objective of this study is treatment effect prediction, and the prognostic model is only a step towards that goal. |
| Costa 2017 | Ineligible study type: this poster presents a prognostic factor study that aims to investigate the prognostic role of different biomarkers. |
| Cutter 2014 | Ineligible study type: the objective of this poster presentation is not prognostic prediction but an indirect comparison of different treatment regimens. |
| Damasceno 2019 | Ineligible study type: the aim of this study is to analyse cognitive trajectories using longitudinal models. Hence, there is no prediction of outcomes in individuals. |
| Daumer 2007 | Ineligible model: there is no prediction in this study. A matching algorithm based on similarity is used, which is followed by a description of the data. |
| Dekker 2019 | Ineligible study type: the objective of this study is to show the predictive value of brain measures. The multivariable models fit in the study are not used for predictions and are not interpreted as prognostic models in the discussion section. |
| Esposito 2011 | Ineligible outcome: the outcome, classification of lesions as normal or abnormal, is not a clinical outcome. |
| Filippi 2010 | Ineligible study type: the objective of this conference abstract is to develop diagnostic criteria for MS. |
| Filippi 2013 | Ineligible study type: the objective of this study is to identify MRI predictors. Although random forests are used, it is to assess the importance of predictors for future outcomes. |
| Fuchs 2021 | Ineligible study type: the aim of this study is to compare the use of imaging features extracted from routine clinical data with modified methods to those collected according to research standards. |
| Gasperini 2021 | Ineligible model: the developed score is not statistically derived, and it is unclear whether the aim of the study is the prediction of treatment responses. |
| Gomez‐Gonzalez 2010 | Ineligible study type: the aim of this study is to demonstrate the use of an automated tool for oligoclonal band analysis and to show that the information extracted relates to patient subgroups. |
| Hakansson 2017 | Ineligible study type: the objective is to search for prognostic markers in CSF, and there is no individual‐level prediction with a multivariable model. |
| Ho 2013 | Ineligible population: this study aims to predict the risk of MS diagnosis in the general population, not the risk of future MS outcomes. |
| Ignatova 2018 | Ineligible study type: the aim of this study is to find predictors of progression. |
| Invernizzi 2011 | Ineligible model: this study investigates the prognostic value of an evoked potentials score, which is not a multivariable model for prognostic prediction. |
| Jackson 2020 | Ineligible timing: the outcome in this study is based on cross‐sectionally collected data, precluding prognostic prediction. |
| Kalincik 2013 | Ineligible study type: the stated aim of this study is to evaluate associations between genetic susceptibility markers and MS phenotypes. |
| Leocani 2017 | Ineligible study type: the objective of this conference abstract is to demonstrate the prognostic value of evoked potentials, not prognostic prediction. |
| Morelli 2020 | Ineligible study type: the objective of this study is to show the predictive value of putamen hypertrophy for cognitive impairment rather than prognostic prediction. |
| Palace 2013 | Ineligible study type: the objective of this study is to assess cost‐effectiveness. |
| Pappalardo 2020 | Ineligible outcome: the study looks at endpoints that are unrelated to relapse, disability, or conversion to a more advanced disease stage. |
| Petrou 2018 | Ineligible study type: the aim of this conference abstract is to assess correlations between biomarkers and clinical outcomes. Also, the study contains no multivariable prognostic model for the prediction of future outcomes but rather assesses a non‐statistical combination of two biomarkers. |
| Preziosa 2015 | Ineligible study type: this study aims to show the value of MRI measures and uses a multivariable model to this end. |
| Rajda 2019 | Ineligible population: at the moment of prognostication, the people included are those yet to be diagnosed, and the outcome is differentiation of people with MS vs controls. |
| Rio 2019 | Ineligible model: this conference presentation compares different treatment response scores and a count score, which is not a multivariable model, with the stated intention of treatment response prediction. |
| Rodriguez 2012 | Ineligible study type: the aim is to apply a novel model to an MS dataset. The focus is not clinical prediction but demonstration of the methodology. |
| Rothman 2016 | Ineligible study type: in this study, multivariable models are used to assess the association between retinal measurements, visual function, and future disease disability rather than predicting individual outcomes. |
| Roura 2018 | Ineligible timing: this study aims to evaluate the longitudinal changes in brain fractal geometry and its association with disability worsening. Correspondence with the authors has confirmed that the models are not predicting future outcomes but current states. |
| Sbardella 2011 | Ineligible study type: the objective of this study is to demonstrate the predictive value of diffuse brain damage as opposed to prognostic prediction. |
| Schlaeger 2012 | Ineligible study type: the objective of this study is to demonstrate the predictive value of evoked potentials. |
| Srinivasan 2020 | Ineligible outcome: in this abstract, the presented outcomes (QoL, fatigue, depression, falls) are not related to clinical disability with respect to the definition we are using in our review. |
| Tintore 2015 | Ineligible model: this conference presentation contains no prediction but rather categorisation into different groups by analysis of time‐to‐event data and description of these groups' characteristics. |
| Tomassini 2019 | Ineligible model: this is a count score. In this study, Cox regression is used to select predictors that are later counted to give a discrete score. This score is used in a univariate model as a factor to report risk stratification. |
| Tossberg 2013 | Ineligible model: the model in this study is not used for prognostic prediction but for diagnostic purposes. The score, developed for diagnosis, is used for prognostic prediction only in those who convert to MS. |
| Uher 2017a | Ineligible model: in this study, multivariable models are used to select adjusted predictors, followed by counting the positive predictors to create a score. |
| Uher 2017b | Ineligible study type: the purpose is not prognostic prediction but to demonstrate the concurrent predictive value of MRI measures for cognitive impairment. |
| Veloso 2014 | Ineligible model: this publication presents a simulation interface based on previously published studies, most relevant to our review being BREMS, but does not perform any new prediction and only describes or reports correlations for the included study participants. |
| Vukusic 2006 | Ineligible study type: this is a review unrelated to prognostic models. |
| Wahid 2019 | Ineligible model: the two models in this conference abstract are not longitudinal in nature, and the only longitudinal model is presented as a treatment response prediction tool. |
| Zephir 2009 | Ineligible study type: the objective of this study is to demonstrate the usefulness of IgG as a biomarker of pathology. |
| Ziemssen 2019 | Ineligible outcome: in this poster presentation, the objective is to differentiate between relapsing and progressive diagnoses rather than prognostic prediction. |
BREMS: Bayesian Risk Estimate for Multiple Sclerosis; CDMS: clinically definite multiple sclerosis; CIS: clinically isolated syndrome; CSF: cerebrospinal fluid; EDSS: Expanded Disability Status Scale; IgG: immunoglobulin G; MRI: magnetic resonance imaging; MS: multiple sclerosis; QoL: quality of life; RNA: ribonucleic acid; RRMS: relapsing–remitting multiple sclerosis
Characteristics of studies awaiting classification [ordered by study ID]
Achiron 2007.
| General information |
Reason for awaiting classification It is unclear whether the study design is longitudinal or whether the sampling is performed at the same time as the outcome assessment. Model name Not reported Primary source Journal Data source Cohort, primary Study type Development |
| Participants |
Inclusion criteria
Exclusion criteria Not reported Recruitment Israel Age (years) Mean 43 Sex (%F) 69.8 Disease duration (years) Mean 10.5 (pooled SD 2.4) Diagnosis 100% RRMS Diagnostic criteria Not reported Treatment 100% on interferon beta‐1a Disease description (For the source population including all outcomes) EDSS (unclear if mean and SD): development 2.0 (1.0), validation 2.5 (0.2); mean annualised relapse rate: 1.1 Recruitment period Not reported |
| Predictors |
Considered predictors PBMC RNA microarray analysis of gene transcripts |
| Outcome |
Outcome definition Composite (includes relapse and scores (EDSS)): good outcome defined as no deterioration in neurological disability and no relapse, poor outcome as EDSS score change ≥ 0.5 that needed to be confirmed at 3 months during 2‐year follow‐up Timing of outcome measurement Unclear, follow‐up for 2 years |
| Analysis |
Number of participants (number of events) 56 (unclear how many events in the validation set, ≥ 9) Modelling method Support vector machine Performance evaluation dataset Development Performance evaluation method Random split Calibration estimate Not reported Discrimination estimate Not reported Classification estimate Classification rate = 88.9% Predictors in the model 34 gene transcripts from the following 29 genes: ADD1, CA11, CCL17, CD44, COL11A2, CRYGD, DNM1, DR1, GNMT, GPP3, GSTA1, HAB1, HSPA8, IGLJ3, IGLVJ, IL3RA, KIAA0980, KLF4, KLK1, MUC4, NY‐REN‐24, ODZ2, PTN, RRN3, S100B, TCRBV, TOP3B, TPSB2, VEGFB |
| Interpretation |
Aim of the study To evaluate whether gene expression profiling can differentiate RRMS patients according to their clinical course – either favourable or poor |
| Notes | — |
Behling 2019.
| General information |
Reason for awaiting classification Conference abstract Model name Not reported Primary source Abstract Data source Mixed (routine care, claims), secondary Study type Development |
| Participants |
Inclusion criteria
Exclusion criteria Not reported Recruitment Patients from a variety of provider practice types across the USA included in the OM1 Data cloud, USA Age (years) Median 54 Sex (%F) Not reported Disease duration (years) Not reported Diagnosis Not reported Diagnostic criteria Not reported Treatment
Disease description Not reported Recruitment period 2015 to 2018 |
| Predictors |
Considered predictors Probably not the complete list, most significant predictors: the number of relapses in the previous 12 months, antiemetic medication use, skeletal muscle relaxants, MS‐related fatigue symptoms |
| Outcome |
Outcome definition Relapse: MS‐related inpatient stay, emergency room visit, or outpatient visit with documented MS and a corticosteroid prescription within 7 days Timing of outcome measurement Within 6 months after index |
| Analysis |
Number of participants (number of events) 18,137 (unclear, calculated from reported event rate, 1415) Modelling method Random forest Performance evaluation dataset Development Performance evaluation method Random split, 80% training, 20% test Calibration estimate Not reported Discrimination estimate c‐Statistic > 0.70 Classification estimate Cutoff determined from data, PPV = 0.203, 1‐NPV = 0.058; unclear whether other reported measure (0.84) is accuracy or sensitivity Predictors in the model Probably not the complete list, most significant predictors: the number of relapses in the previous 12 months, antiemetic medication use, skeletal muscle relaxants, MS‐related fatigue symptoms |
| Interpretation |
Aim of the study To use advanced analytics to predict relapses amongst MS patients treated with DMTs identified from a large, representative database of linked EMR and claims data |
| Notes | — |
Castellazzi 2019.
| General information |
Reason for awaiting classification The age range of the included patients is not reported, and it is unclear whether the objective is the development of a diagnostic or prognostic model. Model name Not reported Primary source Poster Data source Not reported Study type Development |
| Participants |
Inclusion criteria RRMS patients and healthy controls for developing the classifier, and CIS patients for applying it Exclusion criteria Not reported Recruitment Not reported Age (years) Not reported Sex (%F) Not reported Disease duration (years) Not reported Diagnosis 39.6% CIS, 30.2% RRMS, and 30.2% healthy controls Diagnostic criteria McDonald (undefined) Treatment Not reported Disease description EDSS (unclear if mean and SD): CIS 1.3 (0.8), RRMS 1.7 (1.2) Recruitment period Not reported |
| Predictors |
Considered predictors Thresholded and processed cross‐correlation matrix of mean rs‐fMRI signals of parcellated preprocessed rs‐fMRI images |
| Outcome |
Outcome definition Conversion to definite MS (McDonald, undefined): RRMS Timing of outcome measurement 12 months |
| Analysis |
Number of participants (number of events) 106 (unclear how many events in the prediction group of CIS patients, ≥ 32) Modelling method
Performance evaluation dataset External validation Performance evaluation method Model developed to differentiate healthy controls from RRMS is used to predict RRMS conversion in CIS patients Calibration estimate Not reported Discrimination estimate Not reported Classification estimate Accuracy = 69% (SVM), 56% (ANFIS) Predictors in the model 10 features from 10 distinct AAL areas, including cuneus, pallidum, calcarine, fusiform, cbl‐6/7b/8, supp motor area, sup/mid occip gyri |
| Interpretation |
Aim of the study To predict the conversion to RRMS in participants with CIS |
| Notes | — |
Chaar 2019.
| General information |
Reason for awaiting classification The time points used in the model are not reported, and it is unclear whether the design was longitudinal in nature. Also, the age range of included patients is not described. Model name Not reported Primary source Abstract Data source Cohort, primary Study type Development |
| Participants |
Inclusion criteria Patients on fingolimod Exclusion criteria Not reported Recruitment Not reported Age (years) Not reported Sex (%F) Not reported Disease duration (years) Not reported Diagnosis Not reported Diagnostic criteria Not reported Treatment 100% on fingolimod Disease description Not reported Recruitment period Not reported |
| Predictors |
Considered predictors Unclear features extracted from magnetic resonance imaging, magnetic resonance spectroscopy, magnetisation transfer ratio, diffusion tensor imaging, and optical coherence tomography |
| Outcome |
Outcome definition Disability (EDSS) Timing of outcome measurement Unclear, 3 time points at 1‐year intervals |
| Analysis |
Number of participants (number of events) Unclear unit of analysis, 50 participants, 135 time points (not reported) Modelling method Neural network, single hidden‐layered feed‐forward ANN with Bayesian regularisation Performance evaluation dataset Development Performance evaluation method Random split, 85% training, 15% test Calibration estimate Not reported Discrimination estimate Not reported Classification estimate Mean squared error = 1.213, accuracy = 77.9% Predictors in the model Unclear, possibly non‐tabular data from MRI, MRS, MTR, DTI, and OCT |
| Interpretation |
Aim of the study To predict the clinical disability based on multiple important imaging biomarkers |
| Notes | — |
Dalla Costa 2014.
| General information |
Reason for awaiting classification It is unclear whether all the predictors are used to predict an outcome in the future or concurrent to the predictor measurement. Model name Not reported Primary source Abstract Data source Not reported Study type Development |
| Participants |
Inclusion criteria Admittance within 3 months from the onset of a CIS Exclusion criteria Not reported Recruitment Patients admitted to the San Raffaele Hospital, Neurological Department, Italy Age (years) Not reported Sex (%F) Not reported Disease duration (years) Not reported Diagnosis 100% CIS Diagnostic criteria Not reported Treatment Not reported Disease description Not reported Recruitment period Not reported |
| Predictors |
Considered predictors Unclear features from clinical data as well as MRI, multimodal EP, and CSF data |
| Outcome |
Outcome definition Conversion to definite MS Timing of outcome measurement Unclear, follow‐up mean 6.82 (SD 2.78) |
| Analysis |
Number of participants (number of events) 227 (120) Modelling method Neural network, multilayer perceptron with a back propagation algorithm Performance evaluation dataset Development Performance evaluation method Random split, 80% training, 20% validation Calibration estimate Not reported Discrimination estimate Not reported Classification estimate Accuracy = 87% Predictors in the model Clinical, MRI, CSF, and EP data |
| Interpretation |
Aim of the study To develop an ANN‐based diagnostic model integrating both clinical and paraclinical baseline data |
| Notes | — |
Ghosh 2009.
| General information |
Reason for awaiting classification The age range of included patients is not reported. It is not clear whether individual prediction occurred. Also, the multivariable nature of the model cannot be determined from the limited information. Model name Not reported Primary source Abstract Data source Randomised trial participants, secondary Study type Development |
| Participants |
Inclusion criteria
Exclusion criteria Not reported Recruitment Ian McDonald database Age (years) Mean 27.9 (at onset) Sex (%F) Not reported Disease duration (years) Mean 7.5 Diagnosis 100% RRMS Diagnostic criteria Not reported Treatment Not reported Disease description EDSS mean 3 Recruitment period Not reported |
| Predictors |
Considered predictors Number of Gd‐enhancing lesions, T2 lesion volume |
| Outcome |
Outcome definition Relapse Timing of outcome measurement Unclear, follow‐up ≤ 129 weeks |
| Analysis |
Number of participants (number of events) 108 (58) Modelling method Joint longitudinal model, 3 models connected via random effects, parameter estimates by Markov chain Monte Carlo Performance evaluation dataset Not reported Performance evaluation method Not reported Calibration estimate Not reported Discrimination estimate Not reported Classification estimate Not reported Predictors in the model Not reported |
| Interpretation |
Aim of the study To establish a model that allows the prediction of occurrence of relapses by including longitudinal information on the number of Gd‐enhancing lesions and T2 lesion volume simultaneously |
| Notes | — |
Kister 2015.
| General information |
Reason for awaiting classification Conference abstract Model name Not reported Primary source Poster Data source Secondary
Study type Development and validation (unclear whether predictions adapted), location |
| Participants |
Inclusion criteria
Exclusion criteria Not reported Recruitment
Age (years)
Sex (%F)
Disease duration (years)
Diagnosis
Diagnostic criteria Not reported Treatment
Disease description Not reported Recruitment period Not reported |
| Predictors |
Considered predictors Gender, age, baseline P‐MSSS |
| Outcome |
Outcome definition
Timing of outcome measurement
|
| Analysis |
Number of participants (number of events)
Modelling method Logistic regression Performance evaluation dataset
Performance evaluation method
Calibration estimate
Discrimination estimate
Classification estimate
Predictors in the model Gender, age, P‐MSSS |
| Interpretation |
Aim of the study
|
| Notes |
Auxiliary references Charlson R, Herbert J, Kister I. CME/CNE article: severity grading in multiple sclerosis: a proposal. Int J MS Care 2016;18(5):265‐70. Kister I, Chamot E, Salter AR, Cutter GR, Bacon TE, Herbert J. Disability in multiple sclerosis: a reference for patients and clinicians. Neurology 2013;80(11):1018‐24. Kister I, Bacon TE, Cutter GR. Short‐term disability progression in two multiethnic multiple sclerosis centers in the treatment era. Ther Adv Neurol Disord 2018;11:1756286418793613. Kister I, Kantarci OH. Multiple sclerosis severity score: concept and applications. Mult Scler 2020;26(5):548‐53. Learmonth YC, Motl RW, Sandroff BM, Pula JH, Cadavid D. Validation of patient determined disease steps (PDDS) scale scores in persons with multiple sclerosis. BMC Neurol 2013;13:37. |
Mallucci 2019.
| General information |
Reason for awaiting classification Conference abstract Model name Not reported Primary source Abstract Data source Not reported Study type Development |
| Participants |
Inclusion criteria CIS patients Exclusion criteria Not reported Recruitment Not reported Age (years) Median 32.3 Sex (%F) 65.6 Disease duration (years) Unclear, upper limit 1 year Diagnosis 100% CIS Diagnostic criteria Not reported Treatment
Disease description Not reported Recruitment period Not reported |
| Predictors |
Considered predictors Not reported |
| Outcome |
Outcome definition Composite (includes symptoms, disability): no evidence of disease activity (NEDA3) status in which NEDA3 maintenance is defined by no relapses, no disability progression, and no MRI activity Timing of outcome measurement Unclear, 12 months |
| Analysis |
Number of participants (number of events) 279 (not reported) Modelling method Logistic regression, Bayesian Performance evaluation dataset Development Performance evaluation method Apparent Calibration estimate Not reported Discrimination estimate c‐Statistic = 0.83 Classification estimate
Predictors in the model Age, onset with optic neuritis, abnormal upper sensory EPs, abnormal visual EPs, therapy with DMD |
| Interpretation |
Aim of the study To define a prognostic model for the early forecast of losing NEDA3 status (no relapses, no disability progression, no MRI activity) in CIS patients within 12 months from disease onset |
| Notes | — |
Medin 2016.
| General information |
Reason for awaiting classification Conference abstract Model name Composite Primary source Poster Data source Routine care, secondary Study type Development |
| Participants |
Inclusion criteria
Exclusion criteria Not reported Recruitment Neuro Trans Data, a group of neurology practices, Germany Age (years) Not reported Sex (%F) Not reported Disease duration (years) Not reported Diagnosis 100% RRMS Diagnostic criteria Not reported Treatment
Disease description Not reported Recruitment period 2010 to 2015 |
| Predictors |
Considered predictors Unclear whether it is the complete list, demographics (born in Central Europe, aged < 30 years at index date, aged ≥ 30 years and < 40 years at index date), diagnostic history, treatment (fingolimod was available at the index date, teriflunomide was available at the index date), disability status (EDSS score of 0 earlier than 360 days prior to index date), disability history (at least one relapse in the 180 days to 360 days prior to index date, at least one relapse in the 360 days to 720 days prior to index date), cranial and spinal lesion count |
| Outcome |
Outcome definition Relapse: relapse defined as binary over the 12‐month follow‐up period defined as patient‐reported or objectively observed events typical of an acute inflammatory demyelinating event in the central nervous system, current or historical, with duration of at least 24 hours, in the absence of fever or infection Timing of outcome measurement 12 months; the period is randomly chosen |
| Analysis |
Number of participants (number of events) 4129 (751 or 752, calculated from reported event rate) Modelling method Logistic regression, elastic net Performance evaluation dataset Development Performance evaluation method Cross‐validation, k‐fold Calibration estimate Quintiles of predicted probability of relapse vs actual relapse rate Discrimination estimate c‐Statistic = 0.69 (95% CI 0.67 to 0.71) Classification estimate Not reported Predictors in the model Whether the patient experienced at least 1 relapse in the 180 days to 360 days prior to index date, whether the patient was aged < 30 years at index date, whether the patient experienced at least one relapse in the 360 days to 720 days prior to index date, whether the patient was aged ≥ 30 years and < 40 years at index date, whether the patient was born in Central Europe, whether Gilenya was available at the index date, whether Aubagio was available at the index date, whether the patient has an EDSS score of 0 earlier than 360 days prior to index date |
| Interpretation |
Aim of the study To predict disease activity for patients with RRMS using EMR |
| Notes | — |
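The performance measures recorded for this model (an elastic‐net penalised logistic regression evaluated by a c‐statistic and by quintiles of predicted probability versus observed relapse rate) can be sketched as follows. This is an illustrative example on synthetic data only, not the study's code; all variable names and parameter values are assumptions.

```python
# Illustrative sketch (synthetic data): c-statistic and quintile-based
# calibration for an elastic-net logistic regression, mirroring the
# measures reported for this model. Not the study's actual analysis.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
X = rng.normal(size=(4000, 8))          # stand-ins for relapse history, age bands, EDSS flags
y = (X[:, 0] + rng.normal(size=4000) > 1).astype(int)   # synthetic relapse indicator

# Elastic-net penalised logistic regression (saga solver supports l1_ratio)
model = LogisticRegression(penalty="elasticnet", solver="saga",
                           l1_ratio=0.5, C=1.0, max_iter=5000).fit(X, y)
p = model.predict_proba(X)[:, 1]

# Discrimination: c-statistic (area under the ROC curve)
c_stat = roc_auc_score(y, p)

# Calibration: quintiles of predicted probability vs observed relapse rate
quintile = np.digitize(p, np.quantile(p, [0.2, 0.4, 0.6, 0.8]))
for q in range(5):
    mask = quintile == q
    print(q, round(p[mask].mean(), 3), round(y[mask].mean(), 3))
```

With well‐calibrated predictions, the mean predicted probability and the observed event rate should agree within each quintile; systematic divergence indicates miscalibration of the kind such a plot is designed to reveal.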
Pareto 2017.
| General information |
Reason for awaiting classification Conference abstract Model name Converter and nonconverter Primary source Abstract Data source Not reported Study type Development |
| Participants |
Inclusion criteria Not reported Exclusion criteria Not reported Recruitment Consecutively Age (years) Not reported Sex (%F) Not reported Disease duration (years) Not reported Diagnosis 100% CIS Diagnostic criteria Not reported Treatment Not reported Disease description Not reported Recruitment period Not reported |
| Predictors |
Considered predictors PRoNTo‐based imaging parameters from segmented grey matter masks |
| Outcome |
Outcome definition Conversion to definite MS (McDonald 2010 (Polman 2011)): either MRI or clinical demonstration of dissemination in space and time Timing of outcome measurement Follow‐up for 3 years |
| Analysis |
Number of participants (number of events) 90 (45) Modelling method Support vector machine Performance evaluation dataset Development Performance evaluation method Cross‐validation, LOOCV Calibration estimate Not reported Discrimination estimate Not reported Classification estimate Sensitivity (converters) = 0.65 Specificity (nonconverters) = 0.63 Predictive values = 0.65 (converters) and 0.64 (nonconverters) Predictors in the model PRoNTo‐based imaging parameters from segmented grey matter masks |
| Interpretation |
Aim of the study To test whether 3D‐T1‐weighted structural images in conjunction with the pattern recognition tool PRoNTo could differentiate between CIS patients that converted and CIS patients that did not convert to MS |
| Notes |
Auxiliary references Schrouff J, Rosa MJ, Rondina JM, Marquand AF, Chu C, Ashburner J, et al. PRoNTo: pattern recognition for neuroimaging toolbox. Neuroinformatics 2013;11(3):319‐37. |
Sharmin 2020.
| General information |
Reason for awaiting classification Conference abstract Model name Not reported Primary source Presentation Data source Registry, secondary Study type Development |
| Participants |
Inclusion criteria Not reported Exclusion criteria
Recruitment MSBase registry Age (years) Not reported Sex (%F) Not reported Disease duration (years) Not reported Diagnosis Unclear, 16.99% CIS, 66.63% RRMS, 7.46% SPMS, 7.34% PPMS, 1.57% PRMS Diagnostic criteria Mixed: McDonald 2005 (Polman 2005), McDonald 2010 (Polman 2011) Treatment Not reported Disease description Not reported Recruitment period Not reported |
| Predictors |
Considered predictors Age (years); sex (female ref); MS course (CIS ref, RR, SP, PP, PR); disease duration (years); EDSS (0 to 5.5 ref, 6+); change in EDSS; recency of relapse (> 2 months ref, < 1 month, 1 month to 2 months); number of affected FSSs; separate predictor for worsening in each of pyramidal, cerebellar, brainstem, sensory, bowel‐bladder, visual, and cerebral systems; 2‐way interaction between disease duration and each of FSS worsening predictors; annualised visit density |
| Outcome |
Outcome definition Disability (unclear): risk of 6‐month confirmed disability progression event being sustained over the long term Timing of outcome measurement Median (IQR): 9.48 years (6.02 years to 13.32 years) |
| Analysis |
Number of participants (number of events) 14,802, unit of analysis is event of 8741 participants (not reported) Modelling method Survival (Cox) Performance evaluation dataset Development Performance evaluation method Random split Calibration estimate Not reported Discrimination estimate Harrell's c‐statistic = 0.89 Classification estimate Not reported Predictors in the model Age, male, primary progressive, relapsing‐remitting, relapse in previous month, EDSS ≥ 6, EDSS change, number of affected FSSs, worsening pyramidal FSS, worsening in cerebellar FSS, worsening in brainstem FSS, worsening in sensory FSS, worsening in visual FSS, worsening in cerebral FSS, worsening in pyramidal FSS: disease duration, worsening in sensory FSS: disease duration, worsening in cerebral FSS: disease duration, (other: annualised visit density) |
| Interpretation |
Aim of the study To identify those 6‐month confirmed disability progression events that are more likely to represent a long‐term disability worsening |
| Notes |
Auxiliary references Giovannoni G, Comi G, Cook S, Rammohan K, Rieckmann P, Soelberg Sørensen P, et al. A placebo‐controlled trial of oral cladribine for relapsing multiple sclerosis. N Engl J Med 2010;362(5):416‐26. NCT00641537. CLARITY extension study. https://ClinicalTrials.gov/show/NCT00641537 (first received 28 March 2008). |
Silva 2017.
| General information |
Reason for awaiting classification Conference abstract Model name MS‐COT
Primary source Poster Data source Randomised trial participants Study type Development |
| Participants |
Inclusion criteria
Exclusion criteria
Recruitment
Age (years) Mean 38.7 Sex (%F) 73.6 Disease duration (years) 9.3 (range 0 to 37) Diagnosis 100% RRMS Diagnostic criteria McDonald 2005 (Polman 2005) Treatment
Disease description EDSS mean 2.4, previous year number of relapses 1.5 Recruitment period 2006 to 2011 |
| Predictors |
Considered predictors Treatment (fingolimod or placebo), T1 hypointense volume, gender, T2 lesion volume rate, age, NBV, number of relapses in the last year, individualised NBV, number of relapses in the last 2 years, number of Gd+ T1 lesions, EDSS, T2 lesion volume, duration of MS since the first symptom, total number of relapses since the first diagnosis, number of previous DMTs, progression index |
| Outcome |
Outcome definition
Timing of outcome measurement Unclear which of the models is reported, at 1 year or at 2 years |
| Analysis |
Number of participants (number of events)
Modelling method
Performance evaluation dataset Development Performance evaluation method Random split with CV within training for predictor ranking Calibration estimate Not reported Discrimination estimate c‐Statistic
Classification estimate Not reported Predictors in the model Not reported |
| Interpretation |
Aim of the study To develop an educational predictor tool based on machine learning techniques to help physicians identify clinical and imaging parameters that influence and contribute to long‐term outcomes in patients with RMS |
| Notes |
Auxiliary references Calabresi PA, Radue EW, Goodin D, Jeffery D, Rammohan KW, Reder AT, et al. Safety and efficacy of fingolimod in patients with relapsing‐remitting multiple sclerosis (FREEDOMS II): a double‐blind, randomised, placebo‐controlled, phase 3 trial. Lancet Neurol 2014;13(6):545‐56. Kappos L, Radue EW, O'Connor P, Polman C, Hohlfeld R, Calabresi P, et al. A placebo‐controlled trial of oral fingolimod in relapsing multiple sclerosis. N Engl J Med 2010;362(5):387‐401. |
Tayyab 2020.
| General information |
Reason for awaiting classification Conference abstract Model name Not reported Primary source Presentation Data source Randomised trial participants, secondary Study type Development |
| Participants |
Inclusion criteria
Exclusion criteria
Recruitment Participants in the placebo‐controlled randomised trial of minocycline, Canada Age (years) Mean 35.9 (onset) Sex (%F) 69.0 Disease duration (years) Median 0.23 (range 0.06 to 0.52) Diagnosis 100% CIS Diagnostic criteria McDonald 2005 (Polman 2005) Treatment
Disease description EDSS median (range): 1.5 (0 to 4.5) Recruitment period 2009 to 2013 |
| Predictors |
Considered predictors Unclear whether it is the complete list: individual DGM nuclei volumes, minocycline vs placebo, CIS type (monofocal vs multifocal), NBV, sex, EDSS, variable for each location of initial CIS event: cerebrum, optic nerve, cerebellum, brainstem, spinal cord, brain parenchymal fraction |
| Outcome |
Outcome definition Composite (includes relapse): new disease activity (clinical or MRI) within 2 years of a first clinical demyelinating event, defined by the McDonald 2005 criteria (Polman 2005) for conversion to definite MS Timing of outcome measurement 2 years |
| Analysis |
Number of participants (number of events) 140 (60) Modelling method Random forest Performance evaluation dataset Development Performance evaluation method Cross‐validation, 3‐fold Calibration estimate Not reported Discrimination estimate c‐Statistic = 0.76 Classification estimate Accuracy = 0.821, sensitivity = 0.81, PPV = 0.87, F1 = 0.84 Predictors in the model DGM volumes |
| Interpretation |
Aim of the study To develop a machine learning model for predicting new disease activity (clinical or MRI) within 2 years of a first clinical demyelinating event, using baseline DGM volumes |
| Notes |
Auxiliary references Metz LM, Li DKB, Traboulsee AL, Duquette P, Eliasziw M, Cerchiaro G, et al. Trial of minocycline in a clinically isolated syndrome of multiple sclerosis. N Engl J Med 2017;376(22):2122‐33. |
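The evaluation recorded for this entry (a random forest assessed by 3‐fold cross‐validation, reporting c‐statistic, accuracy, sensitivity, and PPV) can be sketched as below. The data are synthetic and the feature stand‐ins (e.g. for deep grey matter volumes) are assumptions; this is not the study's code.

```python
# Hypothetical sketch: cross-validated random-forest evaluation producing the
# measures reported here (c-statistic, accuracy, sensitivity, PPV). Synthetic data.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.metrics import (roc_auc_score, accuracy_score,
                             recall_score, precision_score)

rng = np.random.default_rng(1)
X = rng.normal(size=(140, 6))           # stand-in for deep grey matter volumes
y = (X[:, 0] - X[:, 1] + rng.normal(size=140) > 0).astype(int)

cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=1)
clf = RandomForestClassifier(n_estimators=200, random_state=1)

# Out-of-fold predicted probabilities, thresholded at 0.5 for class labels
proba = cross_val_predict(clf, X, y, cv=cv, method="predict_proba")[:, 1]
pred = (proba >= 0.5).astype(int)

c_stat = roc_auc_score(y, proba)        # discrimination
acc = accuracy_score(y, pred)
sens = recall_score(y, pred)            # sensitivity
ppv = precision_score(y, pred)          # positive predictive value
```

Using out‐of‐fold predictions, rather than refitting and scoring on the full sample, avoids the optimism that makes apparent performance estimates (common among the entries above) unreliable.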
Thiele 2009.
| General information |
Reason for awaiting classification Conference abstract Model name Model‐based approach Primary source Abstract Data source Registry, secondary Study type Development |
| Participants |
Inclusion criteria RRMS Exclusion criteria Not reported Recruitment Danish MS register, Denmark Age (years) Mean 32.3 (at onset) Sex (%F) 69 Disease duration (years) Mean 5.6 (range 0 to 30) Diagnosis 100% RRMS Diagnostic criteria Not reported Treatment Not reported Disease description EDSS mean (range): 2.63 (0 to 7.5), 24 months pre‐study attacks mean (range): 2.64 (1 to 10) Recruitment period 1997 to 2001 |
| Predictors |
Considered predictors Sex, age at onset, disease duration, number of attacks in the 24 months prior to study, and baseline EDSS |
| Outcome |
Outcome definition Relapse: annualised relapse rates Timing of outcome measurement Not reported |
| Analysis |
Number of participants (number of events) 1202 (continuous outcome) Modelling method Count data GLM (quasi‐Poisson, negative‐binomial, zero‐inflated Poisson) Performance evaluation dataset Development Performance evaluation method Cross‐validation, LOOCV Calibration estimate Other: mean prediction error 0.53 to 0.54 Discrimination estimate Not applicable Classification estimate Not applicable Predictors in the model Sex, age at onset, disease duration, number of attacks in the 24 months prior to study, baseline EDSS |
| Interpretation |
Aim of the study To compare the performance of a matching‐based approach to predict annualised relapse rates of MS patients versus statistical models |
| Notes | — |
Tintoré 2015.
| General information |
Reason for awaiting classification It is unclear whether the study is longitudinal in nature. Model name Not reported Primary source Presentation Data source Cohort, primary Study type Development |
| Participants |
Inclusion criteria
Exclusion criteria Not reported Recruitment Spain Age (years) Not reported Sex (%F) Not reported Disease duration (years) Not reported Diagnosis 100% CIS Diagnostic criteria Not reported Treatment Not reported Disease description Not reported Recruitment period 1996 to 2014 |
| Predictors |
Considered predictors Unclear whether the list is complete: gender, age (40 years to 49 years, 30 years to 39 years, 20 years to 29 years, 0 years to 19 years), optic neuritis, number of T2 lesions (0, 1 to 3, 4 to 9, ≥ 10), DMT before second attack, DMT after second attack, topography, CSF: OB, 12‐month number of T2 lesions, 12‐month Gd+, treatment, relapse during first year |
| Outcome |
Outcome definition Unclear, composite: conversion to definite MS, EDSS ≥ 3 Timing of outcome measurement Unclear, follow‐up every 12 months and 5 years |
| Analysis |
Number of participants (number of events) 1059 (unclear, different numbers reported in abstract and presentation) Modelling method Multiple models: decision tree based on survival model (Cox) Performance evaluation dataset Development Performance evaluation method Apparent Calibration estimate Not reported Discrimination estimate Harrell's c‐statistic (for 12‐month model) CDMS 0.76, EDSS 0.75 Classification estimate Not reported Predictors in the model At baseline: number of T2 lesions, oligoclonal bands, optic neuritis, sex, age; at first year: number of new T2 lesions, onset of DMD during first year, relapse during the first year |
| Interpretation |
Aim of the study To elaborate a dynamic model for predicting long‐term prognosis |
| Notes | — |
Tommasin 2019.
| General information |
Reason for awaiting classification Conference abstract Model name Not reported Primary source Poster Data source Not reported Study type Development |
| Participants |
Inclusion criteria
Exclusion criteria
Recruitment Not reported Age (years) Mean 38.3 Sex (%F) 76.2 Disease duration (years) Not reported Diagnosis 81% RRMS, 19% SPMS Diagnostic criteria Mixed: McDonald 2010, McDonald 2017 Treatment 30.5% first line, 29.5% second line, 40% none Disease description EDSS median 2.0 (range 0.0 to 7.5) Recruitment period Not reported |
| Predictors |
Considered predictors 3D T1 images (slices of the sagittal, axial, coronal projections) |
| Outcome |
Outcome definition Disability (EDSS): 5‐year disease progression defined as 1.5‐point increase for patients with a baseline EDSS of 0, 1 point for scores from 1.0 to 5.0, and 0.5 points for scores equal to or higher than 5.5; confirmed at 6 months Timing of outcome measurement 4 to 6 years |
| Analysis |
Number of participants (number of events) 105 (36) Modelling method Convolutional neural network Performance evaluation dataset Development Performance evaluation method Random split, 90% training, 10% validation Calibration Not reported Discrimination estimate Not reported Classification estimate Cutoff = 0.5, sensitivity and specificity reported for unclear selections Predictors in the model 3D T1 images |
| Interpretation |
Aim of the study To investigate the efficacy of deep learning models to accurately predict those patients who will have disability progression in the following 5 years and those who will be stable, based on 3D T1 MRI images acquired at 3T |
| Notes |
Auxiliary references Rio J, Rovira A, Tintore M, Otero‐Romero S, Comabella M, Vidal‐Jordana A, et al. Disability progression markers over 6‐12 years in interferon‐beta‐treated multiple sclerosis patients. Mult Scler 2018;24(3):322‐30. |
Wahid 2018.
| General information |
Reason for awaiting classification Conference abstract Model name Not reported Primary source Abstract Data source Randomised trial participants, secondary Study type Development |
| Participants |
Inclusion criteria
Exclusion criteria Not reported Recruitment Subset of participants in CombiRx RCT, USA Age (years) Not reported Sex (%F) Not reported Disease duration (years) Not reported Diagnosis 100% RRMS Diagnostic criteria Mixed: Poser 1983, McDonald (undefined) Treatment Unclear number of participants on interferon beta, glatiramer acetate, their combination Disease description Not reported Recruitment period Not reported |
| Predictors |
Considered predictors Radiomics (shape, intensity, texture), age, sex, baseline EDSS, lesion volume |
| Outcome |
Outcome definition Disability (EDSS): EDSS < 2 vs EDSS ≥ 2 Timing of outcome measurement 3 years |
| Analysis |
Number of participants (number of events) 33 (not reported) Modelling method Gradient boosting Performance evaluation dataset Development Performance evaluation method Cross‐validation, repeated Calibration Not reported Discrimination estimate Not reported Classification estimate Accuracy = 0.867 (SD 0.024) Predictors in the model Radiomic shape, intensity, and texture measures |
| Interpretation |
Aim of the study To evaluate the predictive performance of machine learning models constructed from MRI radiomic features at baseline to predict clinical outcomes at 3 years in RRMS |
| Notes |
Auxiliary references Bhanushali MJ, Gustafson T, Powell S, Conwit RA, Wolinsky JS, Cutter GR, et al. Recruitment of participants to a multiple sclerosis trial: the CombiRx experience. Clinical Trials 2014;11(2):159‐66. NCT00211887. Combination therapy in patients with relapsing‐remitting multiple sclerosis (MS) CombiRx. https://clinicaltrials.gov/ct2/show/NCT00211887 (first received 21 September 2005). |
3T: 3 Tesla; AAL: automated anatomical labelling; ANFIS: adaptive‐neuro‐fuzzy‐inference system; ANN: artificial neural network; BRACE: Betaseron (interferon beta‐1b), Rebif (interferon beta‐1a), Avonex (interferon beta‐1a), Copaxone (glatiramer acetate), and Extavia (interferon beta‐1b); CDMS: clinically definite multiple sclerosis; CIS: clinically isolated syndrome; CSF: cerebrospinal fluid; CV: cross‐validation; DGM: deep grey matter; DMD: disease‐modifying drug; DMT: disease‐modifying therapy; DTI: diffusion tensor imaging; EDSS: Expanded Disability Status Scale; EMR: electronic medical records; EP: evoked potential; FLAIR: fluid‐attenuated inversion recovery; FSS: Functional Systems Score; Gd: gadolinium; IQR: interquartile range; LASSO: least absolute shrinkage and selection operator; LOOCV: leave‐one‐out cross‐validation; MRI: magnetic resonance imaging; MRS: magnetic resonance spectroscopy; MS: multiple sclerosis; MS‐COT: multiple sclerosis care optimisation tool; MTR: magnetisation transfer ratio; NARCOMS: North American Research Consortium on Multiple Sclerosis; NBV: normalised brain volume; NEDA3: no evidence of disease activity 3; NPV: negative predictive value; OB: oligoclonal bands; OCT: optical coherence tomography; P‐MSSS: patient‐derived MS Severity Score; PBMC: peripheral blood mononuclear cell; PDDS: patient‐determined disease steps; PPMS: primary progressive MS; PPV: positive predictive value; PRMS: progressive‐relapsing multiple sclerosis; PRoNTo: Pattern Recognition for Neuroimaging Toolbox; RCT: randomised controlled trial; RNA: ribonucleic acid; RRMS: relapsing–remitting multiple sclerosis; rs‐fMRI: resting state functional magnetic resonance imaging; SD: standard deviation; SPMS: secondary progressive multiple sclerosis; SVM: support vector machine
Differences between protocol and review
Objectives
We relocated the details on the investigation of sources of heterogeneity between studies from the Objectives to the Methods for conciseness and readability.
Criteria for considering studies for this review
In 'Types of studies', the eligibility criterion of aiming to develop or validate a prognostic model was already present at the protocol stage. We further operationalised its implementation in the review text. We also clarified that the statistical method used to develop the prognostic model was not a criterion for selection, but that studies on prognostic factors or treatment response prediction were excluded. The possible data sources for prognostic model studies and what is meant by validation were also defined in the review text rather than in the protocol.
During the review, we came across eligible prognostic model validation studies of models whose development studies would not meet the eligibility criteria outlined in the protocol. In order to have the necessary details on these models, we added a new eligibility criterion to include studies that developed models which were validated in other eligible prognostic prediction studies.
In 'Targeted population', we clarified that we included prognostic model studies in people with MS regardless of the MS subtyping they reported. For transparency, we also reported that we considered an episode of optic neuritis to be a clinically isolated syndrome, so that studies of people with this condition were eligible.
In 'Types of outcomes', we clarified that the data type of the outcome was not a criterion for selection. We also further detailed what was considered to constitute one of the five outcomes (four clinical outcome categories plus their composite) as defined in the protocol, giving no evidence of disease activity as an example of the composite outcome and clarifying that cognitive disability fitted into one of those categories but that fatigue, depression, or falls did not.
Search
In line with the recently published updated PRISMA statement (Page 2021), we provided details on the platforms used to search the databases and on the studies used to validate the search.
Originally, we had planned to perform backward citation tracking by handsearching the references of related studies. While using Web of Science for the forward search, we realised that it offers similar functionality for backward searching. We decided to use this functionality because it allowed not only deduplication but also simultaneous screening of the titles and abstracts of the references, which would not have been possible with handsearching.
Selection of studies
We reported the details of how the pilot screening was conducted, which were absent in the protocol text.
During screening, we additionally searched the Internet for, or contacted the authors of, studies that could not be included or excluded based on the reported information, including all conference abstracts.
At the protocol stage, we had not planned how to proceed with eligible conference abstracts lacking any full‐text report. During the review, it became clear that the information contained in an abstract was not sufficient for selection or for assessment of risk of bias. Hence, we decided to present the data extracted from conference abstracts without a full‐text report in Characteristics of studies awaiting classification.
How we were going to screen non‐English abstracts (using online translators) and full texts (with support from native speakers) was missing from the protocol and is clarified in the review text.
We reported the study selection based on the flow diagram of the recently published updated PRISMA statement (Page 2021), rather than that of the PRISMA statement (Moher 2009) proposed in the protocol.
For transparency, we elaborated on the details of how we operationalised and interpreted study eligibility criteria in a new subsection titled 'Details regarding selection of studies' of the review text.
Data extraction and management
During the review, we came across multiple reports from a single study that sometimes contained conflicting information. Our prioritisation in such cases was not defined in the protocol but is defined in the review text.
Due to the range of studies we came across, there were minor changes to the extracted data items during the review, e.g. adding a tuning parameter item in order to collect important details on models developed with machine learning (ML), or using the terms primary/secondary data use rather than prospective/retrospective because of the confusion about and misuse of the latter terms in the literature. These changes are reflected in the list of items in this section and elaborated in the Appendices.
Assessment of reporting deficiencies
In the protocol, this section came after 'Dealing with missing data'. For a better flow of the text, it is reported after 'Data extraction and management' in the review.
In the protocol we had only mentioned that TRIPOD would be used for the assessment of reporting. In the review text we gave the details of our operationalisation, based on the domains and items we used for this task.
Assessment of risk of bias in included studies
In this section of the protocol we had referred to PROBAST as the risk of bias and applicability assessment tool for prognostic model studies and had briefly summarised its domains. Given the importance of the risk of bias assessment, the challenges encountered by studies in people with MS, and the limited applicability of the current tool to models developed using machine learning, we had to interpret the items in PROBAST. For transparency, in the review we elaborated on our interpretations and on the assessment of risk of bias and applicability in the included analyses.
Measures of association or predictive performance measures to be extracted
In the protocol we had proposed describing the adjusted effect measures of prognostic factors in models developed over time. Although we extracted data on the effect measures and their uncertainties from studies that could and did report them, comparing them was not possible due to the variety of the predictors considered, differences in their definitions, and the considerable number of included ML methods for which traditional effect measures may not be applicable.
For clarity, we operationalised the classification measures and validation categories we collected.
Dealing with missing data
In the protocol we had proposed to contact the authors for missing information needed for quantitative data synthesis or risk of bias assessment. In the review we reported that we also contacted them for unclear or missing information needed not only for the aforementioned purposes but also for study eligibility and basic study description.
In the protocol we had proposed applying methods to derive missing performance measures (the c‐statistic for discrimination and the O:E ratio for calibration) and their precision from the reported information. The data reported in the studies did not allow for calculation of missing c‐statistics or missing calibration measures, specifically O:E ratios. Thus, we have changed this in the review to only describe the method we used to derive the missing precision of a reported c‐statistic.
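The derivation mentioned above can be illustrated with a short sketch. The review does not state which formula it used to derive the missing precision of a reported c‐statistic, so the use of the Hanley and McNeil (1982) approximation here is an assumption for illustration only; it recovers an approximate standard error from the c‐statistic and the numbers of events and non‐events:

```python
import math

def hanley_mcneil_se(auc: float, n_events: int, n_nonevents: int) -> float:
    """Approximate standard error of a c-statistic (AUC), given the
    reported AUC and the numbers of events and non-events, using the
    Hanley & McNeil (1982) formula."""
    q1 = auc / (2 - auc)                 # P(two events ranked above one non-event)
    q2 = 2 * auc ** 2 / (1 + auc)        # P(one event ranked above two non-events)
    variance = (auc * (1 - auc)
                + (n_events - 1) * (q1 - auc ** 2)
                + (n_nonevents - 1) * (q2 - auc ** 2)) / (n_events * n_nonevents)
    return math.sqrt(variance)

# Illustrative numbers only: a c-statistic of 0.76 with 60 events
# and 80 non-events yields a standard error of about 0.04.
se = hanley_mcneil_se(0.76, 60, 80)
ci_low, ci_high = 0.76 - 1.96 * se, 0.76 + 1.96 * se
```

With these illustrative inputs, the approximate 95% confidence interval is roughly 0.68 to 0.84; a meta‐analysis of c‐statistics would typically apply such a derived precision on the logit scale.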
Data synthesis
We had intended to perform meta‐analysis for models with at least three external validations and had described the methods under the subheading 'Data synthesis and meta‐analysis approaches' in the protocol. However, no model had at least three independent external validation studies outside its development study, so we decided against performing a meta‐analysis and removed this subheading from the review text. Instead, we added a subheading called 'Synthesis' to explain how we described and summarised the findings in this review.
Because there was no meta‐analysis, we did not perform any sensitivity analysis and removed the subsection 'Sensitivity analysis'.
Investigation of sources of heterogeneity between studies
In the protocol there were two subheadings giving details on the assessment of heterogeneity: 'Assessment of heterogeneity' under 'Data collection' and 'Subgroup analysis and investigation of heterogeneity' under 'Data synthesis'. We had planned to report heterogeneity measures from the meta‐analysis and to perform meta‐regression across different models with the same outcome when at least 10 models for that outcome were identified. Due to the large variability in outcome definitions and the poor reporting of performance measures, we were not able to perform this meta‐regression and could only describe the heterogeneity qualitatively. Hence, we reduced the space allocated to this topic to one subsection under 'Data synthesis'.
Terms used for reporting and synthesis
For clarity, we added this section to define the terms we used in reporting the review.
Contributions of authors
| Task | Authors responsible |
| Draft the protocol | BIO, KAR, JH, MG, JB, UH, UM |
| Develop and run the search strategy | MG, KAR, BIO, ZA |
| Obtain copies of studies | AA, BIO, KAR, MG |
| Select which studies to include | BIO, KAR, AA, ZA, AG |
| Provide consultation on which studies to include | UH, JH, JB, UM |
| Extract data from the studies | KAR, BIO, AA, AG |
| Provide consultation on data extraction | HS, UH, UM, JB, JH |
| Assess risk of bias | KAR, BIO, AA, AG |
| Provide consultation on risk of bias assessment | HS, UH, UM, JB, JH |
| Enter data into RevMan 5 | BIO, KAR, ZA, AG |
| Carry out the analysis | KAR, BIO, ZA |
| Interpret the analysis | KAR, BIO, JB, UH, UM, HS, JH |
| Draft the final review | BIO, KAR, JB, JH, UH, UM, MG, HS, AG, ZA, AA |
| Update the review | UM, UH |
Sources of support
Internal sources
-
DIFUTURE Project at Ludwig‐Maximilians‐Universität München, Germany
DIFUTURE is funded by the German Federal Ministry of Education and Research under 01ZZ1804B and 01ZZ1804C.
-
Clinical Research Priority Program (CRPP), University of Zurich, Switzerland
The CRPP funded the project PrecisionMS: Implementing Precision Medicine in Multiple Sclerosis.
-
Privatdozenten‐Stiftung, University of Zurich, Switzerland
Privatdozenten‐Stiftung provided partial financial support for project costs including electronic search consulting and research assistant help.
External sources
No sources of support provided
Declarations of interest
JH reports a grant for OCT research from the Friedrich‐Baur‐Stiftung and Merck, personal fees and non‐financial support from Alexion, Bayer HealthCare Pharmaceuticals, Biogen, Celgene, F. Hoffman‐La Roche, Janssen Biotech, Merck, Novartis, and Sanofi Genzyme and non‐financial support from the Guthy‐Jackson Charitable Foundation, all outside the submitted work.
UH received financial compensation once for a lecture organised by CSL Behring, after submission of the manuscript, and outside the submitted work.
BIO has provided consultancy to Roche once on a topic outside the submitted work.
KAR, JB, MG, AA, ZA, AG, HS, UM: nothing to declare
These authors should be considered joint first author
These authors contributed equally to this work
References
References to studies included in this review
Aghdam 2021 {published data only}
- Abri Aghdam K, Aghajani A, Kanani F, Soltan Sanjari M, Chaibakhsh S, Shirvaniyan F, et al. A novel decision tree approach to predict the probability of conversion to multiple sclerosis in Iranian patients with optic neuritis. Multiple Sclerosis and Related Disorders 2021;47:102658.
Agosta 2006 {published data only}
- Agosta F, Rovaris M, Pagani E, Sormani MP, Comi G, Filippi M. Magnetization transfer MRI metrics predict the accumulation of disability 8 years later in patients with multiple sclerosis. Brain 2006;129(Pt 10):2620-7. [DOI: 10.1093/brain/awl208]
Ahuja 2021 {published data only}
- Ahuja Y, Kim N, Liang L, Cai T, Dahal K, Seyok T, et al. Leveraging electronic health records data to predict multiple sclerosis disease activity. Annals of Clinical and Translational Neurology 2021;8(4):800-10.
Bejarano 2011 {published data only}
- Bejarano B, Bianco M, Gonzalez-Moron D, Sepulcre J, Goni J, Arcocha J, et al. Computational classifiers for predicting the short-term course of multiple sclerosis. BMC Neurology 2011;11:67. [DOI: 10.1186/1471-2377-11-67]
Bendfeldt 2019 {published data only}
- Bendfeldt K, Taschler B, Gaetano L, Madoerin P, Kuster P, Mueller-Lenke N, et al. MRI-based prediction of conversion from clinically isolated syndrome to clinically definite multiple sclerosis using SVM and lesion geometry. Brain Imaging and Behavior 2019;13(5):1361-74. [DOI: 10.1007/s11682-018-9942-9]
- Bendfeldt K, Taschler B, Gaetano L, Madoerin P, Kuster P, Mueller-Lenke N, et al. Predicting conversion to clinically definite multiple sclerosis using machine learning on the basis of cerebral grey matter segmentations. In: 31st European Committee for Treatment and Research in Multiple Sclerosis (ECTRIMS); 2015 October 7-10; Barcelona (Spain). ECTRIMS, 2015. Available at onlinelibrary.ectrims-congress.eu/ectrims/2015/31st/116222.
- Bendfeldt K, Taschler B, Gaetano L, Madoerin P, Kuster P, Mueller-Lenke N, et al. Predicting conversion to clinically definite multiple sclerosis using machine learning on the basis of cerebral grey matter segmentations. Multiple Sclerosis Journal 2015;23(Suppl 11):498-9.
- Bendfeldt K, Taschler B, Gaetano L, Madoerin P, Kuster P, Mueller-Lenke N, et al. MRI-based prediction of conversion from clinically isolated syndrome to clinically definite multiple sclerosis using SVM and lesion geometry. Brain Imaging and Behavior 2018;13(5):1361-74.
Bergamaschi 2001 {published data only}
- Bergamaschi R, Berzuini C, Romani A, Cosi V. Predicting secondary progression in relapsing-remitting multiple sclerosis: a Bayesian analysis. Journal of the Neurological Sciences 2001;189(1-2):13-21.
Bergamaschi 2007 {published data only}
- Bergamaschi R, Quaglini S, Trojano M, Amato MP, Tavazzi E, Paolicelli D, et al. Early prediction of the long term evolution of multiple sclerosis: the Bayesian risk estimate for multiple sclerosis (BREMS) score. Journal of Neurology, Neurosurgery and Psychiatry 2007;78(7):757-9. [DOI: 10.1136/jnnp.2006.107052]
Bergamaschi 2015 {published data only}
- Bergamaschi R, Montomoli C, Mallucci G, Lugaresi A, Izquierdo G, Grand'Maison F, et al. BREMSO: a simple score to predict early the natural course of multiple sclerosis. European Journal of Neurology 2015;22(6):981-9. [DOI: 10.1111/ene.12696]
- Bergamaschi R, Montomoli C, Mallucci G. Bayesian risk estimate for multiple sclerosis at onset (BREMSO): a simple clinical score for the early prediction of multiple sclerosis long-term evolution. In: 29th European Committee for Treatment and Research in Multiple Sclerosis (ECTRIMS); 2013 October 2-5; Copenhagen (Denmark). ECTRIMS, 2013. Available at onlinelibrary.ectrims-congress.eu/ectrims/2013/copenhagen/34238.
- Bergamaschi R, Montomoli C, Mallucci G. Bayesian risk estimate for multiple sclerosis at onset (BREMSO): a simple clinical score for the early prediction of multiple sclerosis long-term evolution. Multiple Sclerosis Journal 2013;19(Suppl 1):338.
Borras 2016 {published data only}
- Borras E, Canto E, Choi M, Maria Villar L, Alvarez-Cermeno JC, Chiva C, et al. Protein-based classifier to predict conversion from clinically isolated syndrome to multiple sclerosis. Molecular and Cellular Proteomics 2016;15(1):318-28. [DOI: 10.1074/mcp.M115.053256]
- Comabella M, Borràs E, Cantó E, Choi M, Villar LM, Álvarez-Cermeño JC, et al. Protein-based biomarker predicts conversion from clinically isolated syndrome to multiple sclerosis. Multiple Sclerosis Journal 2015;21(Suppl 11):634.
Brichetto 2020 {published data only}
- Brichetto G, Monti Bragadin M, Fiorini S, Battaglia MA, Konrad G, Ponzio M, et al. The hidden information in patient-reported outcomes and clinician-assessed outcomes: multiple sclerosis as a proof of concept of a machine learning approach. Neurological Sciences 2020;41(2):459-62. [DOI: 10.1007/s10072-019-04093-x]
- Tacchino A, Fiorini S, Ponzio M, Barla A, Verri A, Battaglia MA, et al. Multiple sclerosis disease course prediction: a machine learning model based on patient reported and clinician assessed outcomes. In: 7th Joint European Committee for Treatment and Research in Multiple Sclerosis-Americas Committee for Treatment and Research in Multiple Sclerosis (ECTRIMS-ACTRIMS); 2017 October 25-28; Paris (France). ECTRIMS, 2017. Available at onlinelibrary.ectrims-congress.eu/ectrims/2017/ACTRIMS-ECTRIMS2017/202553.
- Tacchino A, Fiorini S, Ponzio M, Barla A, Verri A, Battaglia MA, et al. Multiple sclerosis disease course prediction: a machine learning model based on patient reported and clinician assessed outcomes. Multiple Sclerosis Journal 2017;23(Suppl 3):58-9.
Calabrese 2013 {published data only}
- Calabrese M, Poretto V, Favaretto A, Seppi D, Alessio S, Rinaldi F, et al. The grey matter basis of disability progression in multiple sclerosis. Multiple Sclerosis Journal 2012;18(Suppl 4):121-2.
- Calabrese M, Romualdi C, Poretto V, Favaretto A, Morra A, Rinaldi F, et al. The changing clinical course of multiple sclerosis: a matter of gray matter. Annals of Neurology 2013;74(1):76-83. [DOI: 10.1002/ana.23882]
De Brouwer 2021 {published data only}
- De Brouwer E, Becker T, Moreau Y, Havrdova EK, Trojano M, Eichau S, et al. Longitudinal machine learning modeling of MS patient trajectories improves predictions of disability progression. Computer Methods and Programs in Biomedicine 2021;208:106180.
- De Brouwer E, Peeters L, Becker T, Altintas A, Soysal A, Van Wijmeersch B, et al. Introducing machine learning for full MS patient trajectories improves predictions for disability score progression. In: 35th European Committee for Treatment and Research in Multiple Sclerosis (ECTRIMS); 2019 September 11-13; Stockholm (Sweden). ECTRIMS, 2019. Available at onlinelibrary.ectrims-congress.eu/ectrims/2019/stockholm/279466.
- De Brouwer E, Peeters L, Becker T, Altintas A, Soysal A, Van Wijmeersch B, et al. Introducing machine learning for full MS patient trajectories improves predictions for disability score progression. Multiple Sclerosis Journal 2019;25(Suppl 2):63-5.
de Groot 2009 {published data only}
- de Groot V, Beckerman H, Uitdehaag BM, Hintzen RQ, Minneboo A, Heymans MW, et al. Physical and cognitive functioning after 3 years can be predicted using information from the diagnostic process in recently diagnosed multiple sclerosis. Archives of Physical Medicine and Rehabilitation 2009;90(9):1478-88. [DOI: 10.1016/j.apmr.2009.03.018]
Gout 2011 {published data only}
- Gout O, Bouchareine A, Moulignier A, Deschamps R, Papeix C, Gorochov G, et al. Prognostic value of cerebrospinal fluid analysis at the time of a first demyelinating event. Multiple Sclerosis Journal 2011;17(2):164-72. [DOI: 10.1177/1352458510385506]
Gurevich 2009 {published data only}
- Gurevich M, Tuller T, Rubinstein U, Or-Bach R, Achiron A. Prediction of acute multiple sclerosis relapses by transcription levels of peripheral blood cells. BMC Medical Genomics 2009;2:46. [DOI: 10.1186/1755-8794-2-46]
Kosa 2022 {published data only}
- Barbour C, Kosa P, Greenwood M, Bielekova B. Constructing a molecular model of disease severity in multiple sclerosis. Neurology 2019;92(Suppl 15):P3.2-006.
- Barbour C, Kosa P, Varosanec M, Greenwood M, Bielekova B. Molecular models of multiple sclerosis severity identify heterogeneity of pathogenic mechanisms. medRxiv 2020 May 22 [Epub ahead of print].
- Barbour CR, Kosa P, Greenwood M, Bielekova B. Constructing a molecular model of disease severity in multiple sclerosis. Multiple Sclerosis Journal 2019;25:23.
- Kosa P, Barbour C, Varosanec M, Wichman A, Sandford M, Greenwood M, et al. Molecular models of multiple sclerosis severity identify heterogeneity of pathogenic mechanisms. Nature Communications 2022;13(1):7670. [DOI: 10.1038/s41467-022-35357-4]
Kuceyeski 2018 {published data only}
- Kuceyeski A, Monohan E, Morris E, Fujimoto K, Vargas W, Gauthier SA. Baseline biomarkers of connectome disruption and atrophy predict future processing speed in early multiple sclerosis. NeuroImage: Clinical 2018;19:417-24. [DOI: 10.1016/j.nicl.2018.05.003]
Law 2019 {published data only}
- Law MT, Traboulsee AL, Li DK, Carruthers RL, Freedman MS, Kolind SH, et al. Machine learning in secondary progressive multiple sclerosis: an improved predictive model for short-term disability progression. Multiple Sclerosis Journal Experimental Translational and Clinical 2019;5(4):2055217319885983. [DOI: 10.1177/2055217319885983]
- Law MT, Traboulsee AL, Li DK, Carruthers RL, Freedman MS, Kolind SH, et al. Machine learning outperforms linear regression for predicting disability progression in SPMS. In: 34th European Committee for Treatment and Research in Multiple Sclerosis (ECTRIMS); 2018 October 10-12; Berlin (Germany). ECTRIMS, 2018. Available at onlinelibrary.ectrims-congress.eu/ectrims/2018/ectrims-2018/228174.
- Law MT, Traboulsee AL, Li DK, Carruthers RL, Freedman MS, Kolind SH, et al. Machine learning outperforms linear regression for predicting disability progression in SPMS. Multiple Sclerosis Journal 2018;24(Suppl 2):1025.
Lejeune 2021 {published data only}
- Lejeune F, Chatton A, Laplaud D, Wiertlewski S, Edan G, Le Page E, et al. SMILE: a predictive model for scoring the severity of relapses in multIple sclerosis. Multiple Sclerosis Journal 2019;25(Suppl 2):198.
- Lejeune F, Chatton A, Laplaud DA, Le Page E, Wiertlewski S, Edan G, et al. SMILE: a predictive model for scoring the severity of relapses in multIple sclerosis. Journal of Neurology 2021;268(2):669-79.
- Lejeune F, Chatton A, Laplaud DA, Wiertlewski S, Edan G, Lepage E, et al. SCOPOUSEP: a predictive model for scoring the severity of relapses in multiple sclerosis. In: 34th European Committee for Treatment and Research in Multiple Sclerosis (ECTRIMS); 2018 October 10-12; Berlin (Germany). ECTRIMS, 2018. Available at onlinelibrary.ectrims-congress.eu/ectrims/2018/ectrims-2018/229235.
- Lejeune F, Chatton A, Laplaud DA, Wiertlewski S, Edan G, Lepage E, et al. SCOPOUSEP: a predictive model for scoring the severity of relapses in multiple sclerosis. Multiple Sclerosis Journal 2018;24(Suppl 2):791-2.
Malpas 2020 {published data only}
- Malpas CB, Manouchehrinia A, Sharmin S, Roos I, Horakova D, Havrdova EK, et al. Aggressive form of multiple sclerosis can be predicted early after disease onset. Multiple Sclerosis Journal 2019;25(Suppl 2):605-7.
- Malpas CB, Manouchehrinia A, Sharmin S, Roos I, Horakova D, Havrdova EK, et al. Early clinical markers of aggressive multiple sclerosis. Brain 2020;143(5):1400-13. [DOI: 10.1093/brain/awaa081]
Mandrioli 2008 {published data only}
- Mandrioli J, Sola P, Bedin R, Gambini M, Merelli E. A multifactorial prognostic index in multiple sclerosis. Cerebrospinal fluid IgM oligoclonal bands and clinical features to predict the evolution of the disease. Journal of Neurology 2008;255(7):1023-31. [DOI: 10.1007/s00415-008-0827-5]
Manouchehrinia 2019 {published data only}
- Manouchehrinia A, Zhu F, Piani-Meier D, Lange M, Silva DG, Carruthers R, et al. Predicting risk of secondary progression in multiple sclerosis: a nomogram. Multiple Sclerosis Journal 2019;25(8):1102-12. [DOI: 10.1177/1352458518783667] [DOI] [PubMed] [Google Scholar]
Margaritella 2012 {published data only}
- Margaritella N, Mendozzi L, Garegnani M, Colicino E, Gilardi E, Deleonardis L, et al. Sensory evoked potentials to predict short-term progression of disability in multiple sclerosis. Neurological Sciences 2012;33(4):887-92. [DOI: 10.1007/s10072-011-0862-3]
Martinelli 2017 {published data only}
- Martinelli V, Dalla Costa G, Messina MJ, Di Maggio G, Moiola L, Rodegher M, et al. Use of multiple biomarkers to improve the prediction of multiple sclerosis in patients with clinically isolated syndromes. Multiple Sclerosis Journal 2015;21(Suppl 11):370-1.
- Martinelli V, Dalla Costa G, Messina MJ, Di Maggio G, Sangalli F, Moiola L, et al. Multiple biomarkers improve the prediction of multiple sclerosis in clinically isolated syndromes. Acta Neurologica Scandinavica 2017;136(5):454-61. [DOI: 10.1111/ane.12761]
Misicka 2020 {published data only}
- Misicka E, Sept C, Briggs FBS. Predicting onset of secondary-progressive multiple sclerosis using genetic and non-genetic factors. Journal of Neurology 2020;267(8):2328-39. [DOI: 10.1007/s00415-020-09850-z]
Montolio 2021 {published data only}
- Montolio A, Martin-Gallego A, Cegonino J, Orduna E, Vilades E, Garcia-Martin E, et al. Machine learning in diagnosis and disability prediction of multiple sclerosis using optical coherence tomography. Computers in Biology and Medicine 2021;133:104416.
Olesen 2019 {published data only}
- Olesen MN, Soelberg K, Debrabant B, Nilsson AC, Lillevang ST, Grauslund J, et al. Cerebrospinal fluid biomarkers for predicting development of multiple sclerosis in acute optic neuritis: a population-based prospective cohort study. Journal of Neuroinflammation 2019;16(1):59. [DOI: 10.1186/s12974-019-1440-5]
Oprea 2020 {published data only}
- Oprea S, Văleanu A, Negreș S. The development and validation of a disability and outcome prediction algorithm in multiple sclerosis patients. Farmacia 2020;68(6):1147-54.
Pellegrini 2019 {published data only}
- Copetti M, Fontana A, Freudensprung U, De Moor C, Hyde R, Bovis F, et al. Predicting MS disease progression remains a significant challenge: results from advanced statistical models of RCT placebo arms. In: 7th Joint European Committee for Treatment and Research in Multiple Sclerosis-Americas Committee for Treatment and Research in Multiple Sclerosis (ECTRIMS-ACTRIMS); 2017 October 25-28; Paris (France). ECTRIMS, 2017. Available at onlinelibrary.ectrims-congress.eu/ectrims/2017/ACTRIMS-ECTRIMS2017/199979.
- Copetti M, Fontana A, Freudensprung U, De Moor C, Hyde R, Bovis F, et al. Predicting MS disease progression remains a significant challenge: results from advanced statistical models of RCT placebo arms. Multiple Sclerosis Journal 2017;23(Suppl 3):113.
- Pellegrini F, Copetti M, Sormani MP, Bovis F, Moor C, Debray TP, et al. Predicting disability progression in multiple sclerosis: insights from advanced statistical modeling. Multiple Sclerosis Journal 2019;26(14):1828-36. [DOI: 10.1177/1352458519887343]
Pinto 2020 {published data only}
- Pinto MF, Oliveira H, Batista S, Cruz L, Pinto M, Correia I, et al. Prediction of disease progression and outcomes in multiple sclerosis with machine learning. Scientific Reports 2020;10(1):21038. [DOI: 10.1038/s41598-020-78212-6]
Pisani 2021 {published data only}
- Pisani AI, Scalfari A, Crescenzo F, Romualdi C, Calabrese M. A novel prognostic score to assess the risk of progression in relapsing-remitting multiple sclerosis patients. European Journal of Neurology 2021;28(8):2503-12.
- Pisani AI, Scalfari A, Romualdi C, Calabrese M. The progressive multiple sclerosis score: a prognostic assistant tool in multiple sclerosis disease. In: 35th European Committee for Treatment and Research in Multiple Sclerosis (ECTRIMS); 2019 September 11-13; Stockholm (Sweden). ECTRIMS, 2019. Available at onlinelibrary.ectrims-congress.eu/ectrims/2019/stockholm/279464.
- Pisani AI, Scalfari A, Romualdi C, Calabrese M. The progressive multiple sclerosis score: a prognostic assistant tool in multiple sclerosis disease. Multiple Sclerosis Journal 2019;25(Suppl 2):62.
Roca 2020 {published data only}
- Roca P, Attye A, Colas L, Tucholka A, Rubini P, Cackowski S, et al. Artificial intelligence to predict clinical disability in patients with multiple sclerosis using FLAIR MRI. Diagnostic and Interventional Imaging 2020;101(12):795-802. [DOI: 10.1016/j.diii.2020.05.009]
Rocca 2017 {published data only}
- Filippi M, Rovaris MG, Sormani MP, Caputo D, Ghezzi A, Montanari E, et al. Earlier prognostication in primary progressive multiple sclerosis using MRI: a 15-year longitudinal study. European Journal of Neurology 2017;24(Suppl 1):43.
- Rocca MA, Sormani MP, Rovaris M, Caputo D, Ghezzi A, Montanari E, et al. Anticipation of long-term disability progression in PPMS using MRI: a 15-year longitudinal study. Multiple Sclerosis Journal 2017;23(Suppl 3):292-3.
- Rocca MA, Sormani MP, Rovaris M, Caputo D, Ghezzi A, Montanari E, et al. Long-term disability progression in primary progressive multiple sclerosis: a 15-year study. Brain 2017;140(11):2814-9. [DOI: 10.1093/brain/awx250]
Rovaris 2006 {published data only}
- Rovaris M, Judica E, Gallo A, Benedetti B, Sormani MP, Caputo D, et al. Grey matter damage predicts the evolution of primary progressive multiple sclerosis at 5 years. Brain 2006;129(Pt 10):2628-34.
Runia 2014 {published data only}
- Runia TF, Jafari N, Siepman DAM, Nieboer D, Steyerberg E, et al. A clinical prediction model for definite multiple sclerosis in patients with clinically isolated syndrome. Multiple Sclerosis 2014;20(Suppl 1):404.
- Runia TF. Multiple Sclerosis - Predicting the Next Attack [Dissertation]. Rotterdam (Netherlands): Erasmus University Rotterdam, 2015.
Seccia 2020 {published data only}
- Seccia R, Gammelli D, Dominici F, Romano S, Landi AC, Salvetti M, et al. Considering patient clinical history impacts performance of machine learning models in predicting course of multiple sclerosis. PLOS One 2020;15(3):e0230219. [DOI: 10.1371/journal.pone.0230219]
Skoog 2014 {published data only}
- Skoog B, Runmarker B, Oden A, Andersen O. Multiple sclerosis: a method to identify high risk for secondary progression. Neurology 2012;78(Suppl 1):P05.089.
- Skoog B, Tedeholm H, Runmarker B, Oden A, Andersen O. Continuous prediction of secondary progression in the individual course of multiple sclerosis. Multiple Sclerosis and Related Disorders 2014;3(5):584-92. [DOI: 10.1016/j.msard.2014.04.004]
- Tedeholm H, Skoog B, Andersen O. A method to identify the risk of transition to the secondary progressive course in multiple sclerosis patients. Neurology 2013;80(Suppl 7):P04.131.
- Tedeholm H, Skoog B, Runmarker B, Oden A, Andersen O. A new method to identify multiple sclerosis patients with a high risk for secondary progression. Multiple Sclerosis Journal 2012;18(Suppl 4):91.
Skoog 2019 {published data only}
- Skoog B, Link J, Tedeholm H, Longfils M, Nerman O, Fagius J, et al. Short-term prediction of secondary progression in a sliding window: a test of a predicting algorithm in a validation cohort. Multiple Sclerosis Journal - Experimental, Translational and Clinical 2019;5(3):2055217319875466. [DOI: 10.1177/2055217319875466]
Sombekke 2010 {published data only}
- Sombekke MH, Arteta D, de Wiel MA, Crusius JB, Tejedor D, Killestein J, et al. Analysis of multiple candidate genes in association with phenotypes of multiple sclerosis. Multiple Sclerosis 2010;16(6):652-9. [DOI: 10.1177/1352458510364633]
Sormani 2007 {published data only}
- Sormani MP, Rovaris M, Comi G, Filippi M. A composite score to predict short-term disease activity in patients with relapsing-remitting MS. Neurology 2007;69(12):1230-5. [DOI: 10.1212/01.wnl.0000276940.90309.15]
Spelman 2017 {published data only}
- Spelman T, Meyniel C, Rojas JI, Lugaresi A, Izquierdo G, Grand'Maison F, et al. Quantifying risk of early relapse in patients with first demyelinating events: prediction in clinical practice. Multiple Sclerosis Journal 2017;23(10):1346-57. [DOI: 10.1177/1352458516679893]
Szilasiová 2020 {published data only}
- Szilasiová J, Rosenberger J, Mikula P, Vitková M, Fedičová M, Gdovinová Z. Cognitive event-related potentials-the P300 wave is a prognostic factor of long-term disability progression in patients with multiple sclerosis. Journal of Clinical Neurophysiology 2020 Oct 05 [Epub ahead of print]. [DOI: 10.1097/WNP.0000000000000788]
Tacchella 2018 {published data only}
- Tacchella A, Romano S, Ferraldeschi M, Salvetti M, Zaccaria A, Crisanti A, et al. Collaboration between a human group and artificial intelligence can improve prediction of multiple sclerosis course: a proof-of-principle study. F1000Research 2017;6:2172. [DOI: 10.12688/f1000research.13114.1]
- Tacchella A, Romano S, Ferraldeschi M, Salvetti M, Zaccaria A, Crisanti A, et al. Collaboration between a human group and artificial intelligence can improve prediction of multiple sclerosis course: a proof-of-principle study. F1000Research 2018;6:2172. [DOI: 10.12688/f1000research.13114.2]
Tommasin 2021 {published data only}
- Tommasin S, Cocozza S, Taloni A, Gianni C, Petsas N, Pontillo G, et al. Machine learning classifier to identify clinical and radiological features relevant to disability progression in multiple sclerosis. Journal of Neurology 2021;268(12):4834-45.
Tousignant 2019 {published data only}
- Tousignant A, Lemaître P, Precup D, Arnold DL, Arbel T. Prediction of disease progression in multiple sclerosis patients using deep learning analysis of MRI data. Proceedings of Machine Learning Research 2019;102:483-92.
Vasconcelos 2020 {published data only}
- Aurenção JCK, Vasconcelos CCF, Thuler LCS, Alvarenga RMP. Validation of a clinical risk score for long-term progression of MS. Multiple Sclerosis Journal 2017;23(Suppl 3):740.
- Vasconcelos CCF, Aurenção JCK, Alvarenga RMP, Thuler LCS. Long-term MS secondary progression: derivation and validation of a clinical risk score. Clinical Neurology and Neurosurgery 2020;194:105792. [DOI: 10.1016/j.clineuro.2020.105792]
- Vasconcelos CCF, Thuler LCS, Calvet Kallenbach Aurenção JCK, Papais-Alvarenga RM. A proposal for a risk score for long-term progression of multiple sclerosis. In: 31st European Committee for Treatment and Research in Multiple Sclerosis (ECTRIMS); 2015 October 7-10; Barcelona (Spain). ECTRIMS, 2015. Available at onlinelibrary.ectrims-congress.eu/ectrims/2015/31st/115269.
- Vasconcelos CCF, Thuler LCS, Calvet Kallenbach Aurenção JCK, Papais-Alvarenga RM. A proposal for a risk score for long-term progression of multiple sclerosis. Multiple Sclerosis Journal 2015;21(Suppl 11):732.
Vukusic 2004 {published data only}
- Vukusic S, Hutchinson M, Hours M, Moreau T, Cortinovis-Tourniaire P, Adeleine P, et al. Erratum: Pregnancy and multiple sclerosis (the PRIMS study) - clinical predictors of post-partum relapse. Brain 2004;127(Pt 8):1912.
- Vukusic S, Hutchinson M, Hours M, Moreau T, Cortinovis-Tourniaire P, Adeleine P, et al. Pregnancy and multiple sclerosis (the PRIMS study) - clinical predictors of post-partum relapse. Brain 2004;127(Pt 6):1353-60. [DOI: 10.1093/brain/awh152]
Weinshenker 1991 {published data only}
- Weinshenker BG, Rice GPA, Noseworthy JH, Carriere W, Baskerville J, Ebers GC. The natural history of multiple sclerosis: a geographically based study. 3. Multivariate analysis of predictive factors and models of outcome. Brain 1991;114(Pt 2):1045-56. [DOI: 10.1093/brain/114.2.1045]
Weinshenker 1996 {published data only}
- Weinshenker BG, Issa M, Baskerville J. Long-term and short-term outcome of multiple sclerosis: a 3-year follow-up study. Archives of Neurology 1996;53(4):353-8. [DOI: 10.1001/archneur.1996.00550040093018]
Wottschel 2015 {published data only}
- Ciccarelli O, Kwok PP, Wottschel V, Chard D, Stromillo ML, De Stefano N, et al. Predicting clinical conversion to multiple sclerosis in patients with clinically isolated syndrome using machine learning techniques. Multiple Sclerosis Journal 2012;18(Suppl 4):30-1.
- Wottschel V, Alexander DC, Kwok PP, Chard DT, Stromillo ML, De Stefano N, et al. Predicting outcome in clinically isolated syndrome using machine learning. NeuroImage: Clinical 2015;7:281-7. [DOI: 10.1016/j.nicl.2014.11.021]
- Wottschel V, Ciccarelli O, Chard DT, Miller DH, Alexander DC. Prediction of second neurological attack in patients with clinically isolated syndrome using support vector machines. In: 2013 International Workshop on Pattern Recognition in Neuroimaging. 2013:82-5.
Wottschel 2019 {published data only}
- Wottschel V, Chard DT, Enzinger C, Filippi M, Frederiksen JL, Gasperini C, et al. SVM recursive feature elimination analyses of structural brain MRI predicts near-term relapses in patients with clinically isolated syndromes suggestive of multiple sclerosis. NeuroImage: Clinical 2019;24:102011. [DOI: 10.1016/j.nicl.2019.102011]
Ye 2020 {published data only}
- Ye F, Liang J, Li J, Li H, Sheng W. Development and validation of a five-gene signature to predict relapse-free survival in multiple sclerosis. Frontiers in Neurology 2020;11:579683.
Yoo 2019 {published data only}
- Yoo Y, Tang LW, Brosch T, Li DKB, Metz L, Traboulsee A, et al. Deep Learning and Data Labeling for Medical Applications. Springer, 2016.
- Yoo Y, Tang LYW, Li DKB, Metz L, Kolind S, Traboulsee AL, et al. Deep learning of brain lesion patterns and user-defined clinical and MRI features for predicting conversion to multiple sclerosis from clinically isolated syndrome. Computer Methods in Biomechanics and Biomedical Engineering: Imaging and Visualization 2019;7(3):250-9. [DOI: 10.1080/21681163.2017.1356750]
Yperman 2020 {published data only}
- Yperman J, Becker T, Valkenborg D, Popescu V, Hellings N, Van Wijmeersch B, et al. Machine learning analysis of motor evoked potential time series to predict disability progression in multiple sclerosis. BMC Neurology 2020;20(1):105. [DOI: 10.1186/s12883-020-01672-w]
- Yperman J, Becker T, Valkenborg D, Popescu V, Hellings N, Van Wijmeersch B, et al. Machine learning analysis of motor evoked potential time series to predict disability progression in multiple sclerosis. Multiple Sclerosis Journal 2019;25(Suppl 2):874-5.
Zakharov 2013 {published data only}
- Zakharov AV, Khinivtseva EV, Poverennova IE, Gindullina EA, Vlasov Ia V, Sineok EV. Assessment of the risk of the transition of a monofocal clinically isolated syndrome to clinically definite multiple sclerosis. Zhurnal Nevrologii i Psikhiatrii Imeni S.S. Korsakova 2013;113(2 Pt 2):28-31.
Zhang 2019 {published data only}
- Zhang H, Alberts E, Pongratz V, Mühlau M, Zimmer C, Wiestler B, et al. Predicting conversion from clinically isolated syndrome to multiple sclerosis - an imaging-based machine learning approach. NeuroImage: Clinical 2019;21:101593. [DOI: 10.1016/j.nicl.2018.11.003]
Zhao 2020 {published data only}
- Chitnis T, Zhao Y, Healy BC, Rotstein D, Guttmann CRG, Bakshi R, et al. Predicting clinical course in multiple sclerosis using machine learning. In: 6th Joint European Committee for Treatment and Research in Multiple Sclerosis-Americas Committee for Treatment and Research in Multiple Sclerosis (ECTRIMS-ACTRIMS); 2014 September 10-13; Boston (MA). ECTRIMS, 2014. Available at onlinelibrary.ectrims-congress.eu/ectrims/2014/ACTRIMS-ECTRIMS2014/64470.
- Chitnis T, Zhao Y, Healy BC, Rotstein D, Guttmann CRG, Bakshi R, et al. Predicting clinical course in multiple sclerosis using machine learning. Multiple Sclerosis Journal 2014;20(Suppl 1):404.
- Zhao Y, Chitnis T, Doan T. Ensemble learning for predicting multiple sclerosis disease course. Multiple Sclerosis Journal 2019;25(Suppl 1):160-1.
- Zhao Y, Healy BC, Rotstein D, Guttmann CR, Bakshi R, Weiner HL, et al. Exploration of machine learning techniques in predicting multiple sclerosis disease course. PLOS One 2017;12(4):e0174866. [DOI: 10.1371/journal.pone.0174866]
- Zhao Y, Wang T, Bove R, Cree B, Henry R, Lokhande H, et al. Ensemble learning predicts multiple sclerosis disease course in the SUMMIT study. NPJ Digital Medicine 2020;3:135. [DOI: 10.1038/s41746-020-00361-9]
References to studies excluded from this review
Achiron 2006 {published data only}
- Achiron A. Measuring disability progression in multiple sclerosis. Journal of Neurology 2006;253(6):vi31-6.
Ahlbrecht 2016 {published data only}
- Ahlbrecht J, Martino F, Pul R, Skripuletz T, Suhs KW, Schauerte C, et al. Deregulation of microRNA-181c in cerebrospinal fluid of patients with clinically isolated syndrome is associated with early conversion to relapsing-remitting multiple sclerosis. Multiple Sclerosis Journal 2016;22(9):1202-14. [DOI: 10.1177/1352458515613641]
Andersen 2015 {published data only}
- Andersen O, Skoog B, Runmarker B, Lisovskaja V, Nerman O, Tedeholm H. Fifty years untreated prognosis of multiple sclerosis based on an incidence cohort. European Journal of Neurology 2015;22(Suppl 1):25.
Azevedo 2019 {published data only}
- Azevedo C, Cen S, Zheng L, Jaberzadeh A, Pelletier D. Minimum clinically important difference for brain atrophy measures in multiple sclerosis. Multiple Sclerosis Journal 2019;25(Suppl 2):697-8.
Barkhof 1997 {published data only}
- Barkhof F, Filippi M, Miller DH, Scheltens P, Campi A, Polman CH, et al. Comparison of MRI criteria at first presentation to predict conversion to clinically definite multiple sclerosis. Brain 1997;120(Pt 11):2059-69. [DOI: 10.1093/brain/120.11.2059]
Brettschneider 2006 {published data only}
- Brettschneider J, Petzold A, Junker A, Tumani H. Axonal damage markers in the cerebrospinal fluid of patients with clinically isolated syndrome improve predicting conversion to definite multiple sclerosis. Multiple Sclerosis Journal 2006;12(2):143-8.
Bsteh 2021 {published data only}
- Bsteh G, Hegen H, Riedl K, Altmann P, Auer M, Berek K, et al. Quantifying the risk of disease reactivation after interferon and glatiramer acetate discontinuation in multiple sclerosis: the VIAADISC score. European Journal of Neurology 2021;28(5):1609-16.
Castellaro 2015 {published data only}
- Castellaro M, Bertoldo A, Morra A, Monaco S, Calabrese M, Doyle O. Prediction of conversion to secondary progression phase in multiple sclerosis. Multiple Sclerosis Journal 2015;23:198-9.
Chalkou 2021 {published data only}
- Chalkou K, Steyerberg E, Egger M, Manca A, Pellegrini F, Salanti G. A two-stage prediction model for heterogeneous effects of treatments. Statistics in Medicine 2021;40(20):4362-75.
Costa 2017 {published data only}
- Costa GD, Di Maggio G, Sangalli F, Moiola L, Colombo B, Comi G, et al. Prognostic factors for multiple sclerosis in patients with spinal isolated syndromes. European Journal of Neurology 2017;24:62.
Cutter 2014 {published data only}
- Cutter G, Wolinsky JS, Comi G, Ladkani D, Knappertz V, Vainstein A, et al. Indirect comparison of glatiramer acetate 40mg/mL TIW and 20mg/mL QD dosing regimen effects on relapse rate: results of a predictive statistical model. Multiple Sclerosis Journal 2014;20:112.
Damasceno 2019 {published data only}
- Damasceno A, Pimentel-Silva LR, Damasceno BP, Cendes F. Cognitive trajectories in relapsing–remitting multiple sclerosis: a longitudinal 6-year study. In: 35th European Committee for Treatment and Research in Multiple Sclerosis (ECTRIMS); 2019 September 11-13; Stockholm (Sweden). ECTRIMS, 2019. Available at onlinelibrary.ectrims-congress.eu/ectrims/2019/stockholm/278685.
- Damasceno A, Pimentel-Silva LR, Damasceno BP, Cendes F. Cognitive trajectories in relapsing–remitting multiple sclerosis: a longitudinal 6-year study. Multiple Sclerosis Journal 2019;26(13):1740-51. [DOI: 10.1177/1352458519878685]
Daumer 2007 {published data only}
- Daumer M, Neuhaus A, Lederer C, Scholz M, Wolinsky JS, Heiderhoff M, et al. Prognosis of the individual course of disease - steps in developing a decision support tool for multiple sclerosis. BMC Medical Informatics and Decision Making 2007;7:11. [DOI: 10.1186/1472-6947-7-11]
Dekker 2019 {published data only}
- Dekker I, Eijlers AJC, Popescu V, Balk LJ, Vrenken H, Wattjes MP, et al. Predicting clinical progression in multiple sclerosis after 6 and 12 years. European Journal of Neurology 2019;26(6):893-902.
Esposito 2011 {published data only}
- Esposito M, De Falco I, De Pietro G. An evolutionary-fuzzy DSS for assessing health status in multiple sclerosis disease. International Journal of Medical Informatics 2011;80(12):e245-54.
Filippi 2010 {published data only}
- Filippi M, Rocca MA, Calabrese M, Sormani MP, Rinaldi F, Perini P, et al. Intracortical lesions and new magnetic resonance imaging diagnostic criteria for multiple sclerosis. Multiple Sclerosis Journal 2010;16:S42.
Filippi 2013 {published data only}
- Filippi M, Preziosa P, Copetti M, Riccitelli G, Horsfield MA, Martinelli V, et al. Gray matter damage predicts the accumulation of disability 13 years later in MS. Neurology 2013;81(20):1759-67.
Fuchs 2021 {published data only}
- Fuchs TA, Dwyer MG, Jakimovski D, Bergsland N, Ramasamy DP, Weinstock-Guttman B, et al. Quantifying disease pathology and predicting disease progression in multiple sclerosis with only clinical routine T2-FLAIR MRI. NeuroImage: Clinical 2021;31:102705.
Gasperini 2021 {published data only}
- Gasperini C, Prosperini L, Rovira A, Tintore M, Sastre-Garriga J, Tortorella C, et al. Scoring the 10-year risk of ambulatory disability in multiple sclerosis: the RoAD score. European Journal of Neurology 2021;28(8):2533-42.
- Gasperini C, Prosperini L, Tortorella C, Haggiag S, Ruggieri S, Mancinelli CR, et al. Scoring the 10-year risk of ambulatory disability in DMD treated multiple sclerosis patients: the RoAD score. In: 34th European Committee for Treatment and Research in Multiple Sclerosis (ECTRIMS); 2018 October 10-12; Berlin (Germany). ECTRIMS, 2018. Available at onlinelibrary.ectrims-congress.eu/ectrims/2018/ectrims-2018/231905.
- Gasperini C, Prosperini L, Tortorella C, Haggiag S, Ruggieri S, Mancinelli CR, et al. Scoring the 10-year risk of ambulatory disability in DMD treated multiple sclerosis patients: the RoAD score. Multiple Sclerosis Journal 2018;24(Suppl 2):58.
Gomez‐Gonzalez 2010 {published data only}
- Gomez-Gonzalez E, Garcia-Sanchez MI, Izquierdo-Ayuso G, Coca De La Torre A, Ramirez-Martinez D, Marco-Ramirez AM, et al. Application of image and signal processing algorithms to oligoclonal IgG bands classification. Multiple Sclerosis Journal 2010;16:341-2.
Hakansson 2017 {published data only}
- Hakansson I, Tisell A, Cassel P, Blennow K, Zetterberg H, Lundberg P, et al. Neurofilament light chain in cerebrospinal fluid and prediction of disease activity in clinically isolated syndrome and relapsing-remitting multiple sclerosis. European Journal of Neurology 2017;24(5):703-12.
Ho 2013 {published data only}
- Ho J, Ghosh J, Unnikrishnan K. Risk prediction of a multiple sclerosis diagnosis. In: 2013 IEEE International Conference on Healthcare Informatics. 2013:175-83.
Ignatova 2018 {published data only}
- Ignatova V, Todorova L, Haralanov L. Predictors of long term disability progression in patients with relapsing remitting multiple sclerosis. Multiple Sclerosis Journal 2018;24(Suppl 2):788-9.
Invernizzi 2011 {published data only}
- Invernizzi P, Bertolasi L, Bianchi MR, Turatti M, Gajofatto A, Benedetti MD. Prognostic value of multimodal evoked potentials in multiple sclerosis: the EP score. Journal of Neurology 2011;258(11):1933-9.
Jackson 2020 {published data only}
- Jackson KC, Sun K, Barbour C, Hernandez D, Kosa P, Tanigawa M, et al. Genetic model of MS severity predicts future accumulation of disability. Annals of Human Genetics 2020;84(1):1-10. [DOI: 10.1111/ahg.12342]
Kalincik 2013 {published data only}
- Kalincik T, Guttmann CR, Krasensky J, Vaneckova M, Lelkova P, Tyblova M, et al. Multiple sclerosis susceptibility loci do not alter clinical and MRI outcomes in clinically isolated syndrome. Genes & Immunity 2013;14(4):244-8. [DOI: 10.1038/gene.2013.17]
Leocani 2017 {published data only}
- Leocani L, Pisa M, Bianco M, Guerrieri S, Di Maggio G, Romeo M, et al. Multimodal EPs predict no evidence of disease activity at two years of first line multiple sclerosis treatment. Neurology 2017;88(Suppl 16):P4.386.
Morelli 2020 {published data only}
- Morelli ME, Baldini S, Sartori A, D'Acunto L, Dinoto A, Bosco A, et al. Early putamen hypertrophy and ongoing hippocampus atrophy predict cognitive performance in the first ten years of relapsing-remitting multiple sclerosis. Neurological Sciences 2020;41(10):2893-904.
Palace 2013 {published data only}
- Palace J, Bregenzer T, Tremlett H, Duddy M, Boggild M, Zhu F, et al. Modelling natural history for the UK multiple sclerosis risk-sharing scheme. Multiple Sclerosis Journal 2013;19(Suppl 1):339.
Pappalardo 2020 {published data only}
- Pappalardo F, Russo G, Pennisi M, Parasiliti Palumbo GA, Sgroi G, Motta S, et al. The potential of computational modeling to predict disease course and treatment response in patients with relapsing multiple sclerosis. Cells 2020;9(3):586.
Petrou 2018 {published data only}
- Petrou P, Yagmour N, Karussis D. Biomarkers for diagnosis and prognosis in multiple sclerosis. Multiple Sclerosis Journal 2018;24:15.
Preziosa 2015 {published data only}
- Preziosa P, Rocca M, Mesaros S, Copetti M, Petrolini M, Drulovic J, et al. Different MRI measures predict clinical deterioration and cognitive impairment in MS: a 5 year longitudinal study. In: 31st European Committee for Treatment and Research in Multiple Sclerosis (ECTRIMS); 2015 October 7-10; Barcelona (Spain). ECTRIMS, 2015. Available at onlinelibrary.ectrims-congress.eu/ectrims/2015/31st/116658.
Rajda 2019 {published data only}
- Rajda C, Galla Z, Polyák H, Maróti Z, Babarczy K, Pukoli D, et al. High neurofilament light chain and high quinolinic acid levels in the CSF of patients with multiple sclerosis are independent predictors of active, disabling disease. Multiple Sclerosis Journal 2019;25:856.
Rio 2019 {published data only}
- Rio J, Rovira A, Gasperini C, Tintore M, Prosperini L, Otero-Romero S, et al. Treatment response scoring systems to assess long term prognosis in relapsing-remitting multiple sclerosis patients. In: 35th European Committee for Treatment and Research in Multiple Sclerosis (ECTRIMS); 2019 September 11-13; Stockholm (Sweden). ECTRIMS, 2019. Available at onlinelibrary.ectrims-congress.eu/ectrims/2019/stockholm/279564.
- Rio J, Rovira A, Gasperini C, Tintore M, Prosperini L, Otero-Romero S, et al. Treatment response scoring systems to assess long term prognosis in relapsing-remitting multiple sclerosis patients. Multiple Sclerosis Journal 2019;25:121-2.
Rodriguez 2012 {published data only}
- Rodriguez JD, Perez A, Arteta D, Tejedor D, Lozano JA. Using multidimensional bayesian network classifiers to assist the treatment of multiple sclerosis. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews) 2012;42(6):1705-15. [DOI: 10.1109/TSMCC.2012.2217326]
Rothman 2016 {published data only}
- Rothman AM, Button J, Balcer LJ, Frohman EM, Frohman TC, Reich DS, et al. Retinal measurements predict 10-year disability in multiple sclerosis. In: 32nd European Committee for Treatment and Research in Multiple Sclerosis (ECTRIMS); 2016 September 14-17; London (UK). ECTRIMS, 2016. Available at onlinelibrary.ectrims-congress.eu/ectrims/2016/32nd/146960.
- Rothman AM, Button J, Balcer LJ, Frohman EM, Frohman TC, Reich DS, et al. Retinal measurements predict 10-year disability in multiple sclerosis. Multiple Sclerosis Journal 2016;22:20-1.
Roura 2018 {published data only}
- Roura E, Maclair G, Martinez-Lapiscina EH, Andorra M, Villoslada P. Brain complexity and damage in patients with multiple sclerosis using fractal analysis: a new imaging outcome for monitoring MS severity. Multiple Sclerosis Journal 2018;24(2):210.
Sbardella 2011 {published data only}
- Sbardella E, Tomassini V, Stromillo ML, Filippini N, Battaglini M, Ruggieri S, et al. Pronounced focal and diffuse brain damage predicts short-term disease evolution in patients with clinically isolated syndrome suggestive of multiple sclerosis. Multiple Sclerosis Journal 2011;17(12):1432-40.
Schlaeger 2012 {published data only}
- Schlaeger R, D'Souza M, Schindler C, Grize L, Dellas S, Radue EW, et al. Prediction of long-term disability in multiple sclerosis. Multiple Sclerosis Journal 2012;18(1):31-8.
Srinivasan 2020 {published data only}
- Srinivasan J, Gudesblatt M. Multiple sclerosis management: predicting disease trajectory of multiple sclerosis on multi-dimensional data including digital cognitive assessments and patient reported outcomes using machine learning techniques. In: 5th Annual Americas Committee for Treatment and Research in Multiple Sclerosis (ACTRIMS); 2020 February 27-29; West Palm Beach (FL). West Palm Beach (FL): ACTRIMS, 2020.
Tintore 2015 {published data only}
- Tintoré M. Predicting MS extremes: benign and aggressive. Multiple Sclerosis 2015;23:56. [Google Scholar]
Tomassini 2019 {published data only}
- Tomassini V, Fanelli F, Prosperini L, Cerqua R, Cavalla P, Pozzilli C. Predicting the profile of increasing disability in multiple sclerosis. Multiple Sclerosis Journal 2019;25(9):1306-15. [DOI] [PMC free article] [PubMed] [Google Scholar]
Tossberg 2013 {published data only}
- Tossberg JT, Crooke PS, Henderson MA, Sriram S, Mrelashvili D, Vosslamber S, et al. Using biomarkers to predict progression from clinically isolated syndrome to multiple sclerosis. Journal of Clinical Bioinformatics 2013;3(1):18. [DOI] [PMC free article] [PubMed] [Google Scholar]
Uher 2017a {published data only}
- Uher T, Vaneckova M, Sobisek L, Tyblova M, Seidl Z, Krasensky J, et al. Combining clinical and magnetic resonance imaging markers enhances prediction of 12-year disability in multiple sclerosis. Multiple Sclerosis Journal 2017;23(1):51-61. [DOI] [PubMed] [Google Scholar]
Uher 2017b {published data only}
- Uher T, Vaneckova M, Sormani MP, Krasensky J, Sobisek L, Dusankova JB, et al. Identification of multiple sclerosis patients at highest risk of cognitive impairment using an integrated brain magnetic resonance imaging assessment approach. European Journal of Neurology 2017;24(2):292-301. [DOI] [PubMed] [Google Scholar]
Veloso 2014 {published data only}
- Veloso M. A web-based decision support tool for prognosis simulation in multiple sclerosis. Multiple Sclerosis and Related Disorders 2014;3(5):575-83. [DOI] [PubMed] [Google Scholar]
Vukusic 2006 {published data only}
- Vukusic S, Confavreux C. Pregnancy and multiple sclerosis: the children of PRIMS. Clinical Neurology and Neurosurgery 2006;108(3):266-70. [DOI] [PubMed] [Google Scholar]
Wahid 2019 {published data only}
- Wahid K, Charron O, Colen R, Shinohara RT, Kotrotsou A, Papadimitropoulos G, et al. Prediction of disability and treatment response from radiomic features: a machine learning analysis from the combirx multi-center cohort. Multiple Sclerosis Journal 2019;25:112-3. [Google Scholar]
Zephir 2009 {published data only}
- Zephir H, Lefranc D, Dubucquoi S, Seze J, Boron L, Prin L, et al. Serum IgG repertoire in clinically isolated syndrome predicts multiple sclerosis. Multiple Sclerosis Journal 2009;15(5):593-600. [DOI] [PubMed] [Google Scholar]
Ziemssen 2019 {published data only}
- Ziemssen T, Piani-Meier D, Bennett B, Johnson C, Tinsley K, Trigg A, et al. Validation of the scoring algorithm for a novel integrative MS progression discussion tool. European Journal of Neurology 2019;26:872. [Google Scholar]
References to studies awaiting assessment
Achiron 2007 {published data only}
- Achiron A, Gurevich M, Snir Y, Segal E, Mandel M. Zinc-ion binding and cytokine activity regulation pathways predicts outcome in relapsing-remitting multiple sclerosis. Clinical and Experimental Immunology 2007;149(2):235-42. [DOI: 10.1111/j.1365-2249.2007.03405.x]
Behling 2019 {published data only}
- Behling M, Bryant A, Brecht T, Cerf S, Gliklich R, Su Z. Predicting relapse episodes in patients with multiple sclerosis treated with disease modifying therapies in a large representative real-world cohort in the United States. Pharmacoepidemiology and Drug Safety 2019;28(Suppl 2):130.
Castellazzi 2019 {published data only}
- Castellazzi G, Martinelli D, Collorone S, Alhamadi A, Debernard L, Melzer TR, et al. A clinical decision system based on resting state fMRI-derived features to predict the conversion of CIS to RRMS. Multiple Sclerosis Journal 2019;25(Suppl 2):686-7.
- Castellazzi G, Martinelli D, Collorone S, Alhamadi A, Debernard L, Melzer TR, et al. A clinical decision system based on resting state fMRI-derived features to predict the conversion of CIS to RRMS. In: 35th European Committee for Treatment and Research in Multiple Sclerosis (ECTRIMS); 2019 September 11-13; Stockholm (Sweden). ECTRIMS, 2019. Available at onlinelibrary.ectrims-congress.eu/ectrims/2019/stockholm/278467.
Chaar 2019 {published data only}
- Chaar D, Kakara M, Razmjou S, Bernitsas E. Predicting EDSS in MS through imaging biomarkers using artificial neural networks. Neurology 2019;92(Suppl 15):P5.2-010.
Dalla Costa 2014 {published data only}
- Dalla Costa G, Moiola L, Leocani L, Furlan R, Filippi M, Comi G, et al. Artificial intelligence techniques in the diagnosis of clinically definite multiple sclerosis. Multiple Sclerosis Journal 2014;20(Suppl 1):170.
Ghosh 2009 {published data only}
- Ghosh P, Neuhaus A, Daumer M, Basu S. Joint modelling of multivariate longitudinal data for mixed responses and survival in multiple sclerosis. Multiple Sclerosis 2009;15:S157-8.
Kister 2015 {published data only}
- Kister I, Bacon T, Levinas M, Green R, Cutter G, Chamot E. Stability and prognostic utility of patient-derived MS severity score (P-MSSS) among MS clinic patients. In: 31st European Committee for Treatment and Research in Multiple Sclerosis (ECTRIMS); 2015 October 7-10; Barcelona (Spain). ECTRIMS, 2015. Available at onlinelibrary.ectrims-congress.eu/ectrims/2015/31st/115800.
- Kister I, Bacon T, Levinas M, Green R, Cutter G, Chamot E. Stability and prognostic utility of patient-derived MS severity score (P-MSSS) among MS clinic patients. Multiple Sclerosis Journal 2015;21(Suppl 11):410-1.
- Kister I, Cutter G, Salter A, Herbert J, Chamot E. Novel, easy-to-use prediction tool accurately estimates probability of “aggressive MS” at 2-year follow up. Neurology 2015;84(Suppl 14):P3.214.
Mallucci 2019 {published data only}
- Mallucci G, Trivelli L, Colombo E, Trojano M, Amato MP, Zaffaroni M, et al. The RECIS (risk estimate in CIS) study: a novel model to early predict clinically isolated syndrome evolution. Multiple Sclerosis Journal 2019;25(Suppl 2):405-6.
Medin 2016 {published data only}
- Medin J, Joyeux A, Braune S, Bergmann A, Rigg J, Wang L. Predicting disease activity for patients with relapsing remitting multiple sclerosis using electronic medical records. In: American Academy of Neurology Annual Meeting; 2016 April 15-21; Vancouver (Canada). 2016. Available at neurotransdata.com/images/publikationen/2016-predicting-disease-activity-aan.pdf.
- Medin J, Joyeux A, Braune S, Bergmann A, Rigg J, Wang L. Predicting disease activity for patients with relapsing remitting multiple sclerosis using electronic medical records. Neurology 2016;86(Suppl 16):P1.395.
Pareto 2017 {published data only}
- Pareto D, Garcia A, Huerga E, Auger C, Sastre-Garriga J, Tintore M, et al. Pattern recognition for neuroimaging toolbox PRoNTo: a pilot study in predicting clinically isolated syndrome conversion. Multiple Sclerosis Journal 2017;23(Suppl 3):231-2.
Sharmin 2020 {published data only}
- Sharmin S, Bovis F, Malpas C, Horakova D, Havrdova E, Ayuso GI, et al. Predicting long-term sustained disability progression in multiple sclerosis. Neurology 2020;94(Suppl 15):2002.
- Sharmin S, Bovis F, Sormani MP, Butzkueven H, Kalincik T. Predicting long-term sustained disability progression in multiple sclerosis: application in the CLARITY trial. Multiple Sclerosis Journal 2020;26(Suppl 3):181.
- Sharmin S, Malpas C, Horakova D, Havrdova EK, Izquierdo G, Eichau S, et al. Predicting long-term sustained disability progression in multiple sclerosis. In: 35th European Committee for Treatment and Research in Multiple Sclerosis (ECTRIMS); 2019 September 11-13; Stockholm (Sweden). ECTRIMS, 2019. Available at onlinelibrary.ectrims-congress.eu/ectrims/2019/stockholm/279563.
- Sharmin S, Malpas C, Horakova D, Havrdova EK, Izquierdo G, Eichau S, et al. Predicting long-term sustained disability progression in multiple sclerosis. Multiple Sclerosis Journal 2019;25(Suppl 2):119-21.
- Sharmin S. Follow-up for Cochrane review - prognostic prediction models in multiple sclerosis [pers comm]. Email to: On BI 12 April 2021.
Silva 2017 {published data only}
- Silva D, Meier DP, Ritter S, Davorka T, Medin J, Lange M, et al. Multiple sclerosis care optimization tool (MS-COT): a clinical application prototype to predict future disease activity. Neurology 2017;88(16):P1.368.
- Silva D, Meier DP, Ritter S, Tomic D, Medin J, Lange M, et al. Multiple sclerosis care optimization tool (MSCOT): a clinical application prototype to predict future disease activity. In: 69th Congress of the American Academy of Neurology; 2017 April 22-28; Boston (MA). Novartis Pharma AG, 2017. Available at novartis.medicalcongressposters.com/Default.aspx?doc=ac1bf.
Tayyab 2020 {published data only}
- Tam R. Follow-up for Cochrane review - prognostic prediction models in multiple sclerosis (Tayyab 2020) [pers comm]. Email to: K Reeve 20 July 2021.
- Tayyab M, Metz L, Dvorak A, Kolind S, Au S, Carruthers R, et al. Machine learning of deep grey matter volumes on MRI for predicting new disease activity after a first clinical demyelinating event. Multiple Sclerosis Journal 2020;26(Suppl 3):116-7.
Thiele 2009 {published data only}
- Thiele A, Lederer C, Neuhaus A, Strobl R, Fahrmeir L, Koch-Henriksen N, et al. Comparison of model-based and matching-based prediction of the annualised relapse-rate of MS-patients. Multiple Sclerosis 2009;15(9):S163.
Tintoré 2015 {published data only}
- Tintoré M, Río J, Otero-Romero S, Arrambide G, Tur C, Comabella M, et al. Dynamic model for predicting prognosis in CIS patients. In: 31st European Committee for Treatment and Research in Multiple Sclerosis (ECTRIMS); 2015 October 7-10; Barcelona (Spain). ECTRIMS, 2015. Available at onlinelibrary.ectrims-congress.eu/ectrims/2015/31st/116690.
- Tintoré M, Río J, Otero-Romero S, Arrambide G, Tur C, Comabella M, et al. Dynamic model for predicting prognosis in CIS patients. Multiple Sclerosis Journal 2015;21(Suppl 11):33.
Tommasin 2019 {published data only}
- Tommasin S, Taloni A, Farrelly FA, Petsas N, Ruggieri S, Gianni C, et al. Evaluation of 5-year disease progression in multiple sclerosis via magnetic-resonance-based deep learning techniques. In: 35th European Committee for Treatment and Research in Multiple Sclerosis (ECTRIMS); 2019 September 11-13; Stockholm (Sweden). ECTRIMS, 2019. Available at onlinelibrary.ectrims-congress.eu/ectrims/2019/stockholm/279263.
- Tommasin S, Taloni A, Farrelly FA, Petsas N, Ruggieri S, Gianni C, et al. Evaluation of 5-year disease progression in multiple sclerosis via magnetic-resonance-based deep learning techniques. Multiple Sclerosis Journal 2019;25(Suppl 2):468.
Wahid 2018 {published data only}
- Wahid K, Colen R, Kotrotsou A, Lincoln J, Narayana PA, Cofield SS, et al. Radiomic prediction of clinical outcome in multiple sclerosis patients from the CombiRx cohort. Multiple Sclerosis Journal 2018;24(Suppl 1):71-2.
Additional references
Adelman 2013
- Adelman G, Rane SG, Villa KF. The cost burden of multiple sclerosis in the United States: a systematic review of the literature. Journal of Medical Economics 2013;16(5):639-47. [DOI: 10.3111/13696998.2013.778268]
Altman 2000
- Altman DG, Royston P. What do we mean by validating a prognostic model? Statistics in Medicine 2000;19(4):453-73.
Altman 2014
- Altman DG. The time has come to register diagnostic and prognostic research. Clinical Chemistry 2014;60(4):580-2. [DOI: 10.1373/clinchem.2013.220335]
Attfield 2022
- Attfield KE, Jensen LT, Kaufmann M, Friese MA, Fugger L. The immunology of multiple sclerosis. Nature Reviews Immunology 2022;22(12):734-50. [DOI: 10.1038/s41577-022-00718-z]
Bakshi 2005
- Bakshi R, Dandamudi VS, Neema M, De C, Bermel RA. Measurement of brain and spinal cord atrophy by magnetic resonance imaging as a tool to monitor multiple sclerosis. Journal of Neuroimaging 2005;15(4 Suppl):30s-45s.
Belbasis 2015
- Belbasis L, Bellou V, Evangelou E, Ioannidis JPA, Tzoulaki I. Environmental risk factors and multiple sclerosis: an umbrella review of systematic reviews and meta-analyses. Lancet Neurology 2015;14(3):263-73. [DOI: 10.1016/S1474-4422(14)70267-4]
Bjornevik 2023
- Bjornevik K, Münz C, Cohen JI, Ascherio A. Epstein–Barr virus as a leading cause of multiple sclerosis: mechanisms and implications. Nature Reviews Neurology 2023;19(3):160-71. [DOI: 10.1038/s41582-023-00775-5]
Bluemke 2020
- Bluemke DA, Moy L, Bredella MA, Ertl-Wagner BB, Fowler KJ, Goh VJ, et al. Assessing radiology research on artificial intelligence: a brief guide for authors, reviewers, and readers—from the Radiology editorial board. Radiology 2020;294(3):487-89. [DOI: 10.1148/radiol.2019192515]
Bossuyt 2015
- Bossuyt PM, Reitsma JB, Bruns DE, Gatsonis CA, Glasziou PP, Irwig L, et al. STARD 2015: an updated list of essential items for reporting diagnostic accuracy studies. BMJ 2015;351:h5527. [DOI: 10.1136/bmj.h5527]
Boulesteix 2019
- Boulesteix A, Janitza S, Hornung R, Probst P, Busen H, Hapfelmeier A. Making complex prediction rules applicable for readers: current practice in random forest literature and recommendations. Biometrical Journal 2019;61(5):1314-28. [DOI: 10.1002/bimj.201700243]
Bouwmeester 2012
- Bouwmeester W, Zuithoff NPA, Mallett S, Geerlings MI, Vergouwe Y, Steyerberg EW, et al. Reporting and methods in clinical prediction research: a systematic review. PLOS Medicine 2012;9(5):e1001221. [DOI: 10.1371/journal.pmed.1001221]
Bovis 2019
- Bovis F, Carmisciano L, Signori A, Pardini M, Steinerman JR, Li T, et al. Defining responders to therapies by a statistical modeling approach applied to randomized clinical trial data. BMC Medicine 2019;17:113. [DOI: 10.1186/s12916-019-1345-2]
Briggs 2019
- Briggs FB, Thompson NR, Conway DS. Prognostic factors of disability in relapsing remitting multiple sclerosis. Multiple Sclerosis and Related Disorders 2019;30:9-16. [DOI: 10.1016/j.msard.2019.01.045]
Briscoe 2020
- Briscoe S, Bethel A, Rogers M. Conduct and reporting of citation searching in Cochrane systematic reviews: a cross-sectional study. Research Synthesis Methods 2020;11(2):169-80. [DOI: 10.1002/jrsm.1355]
Brown 2020
- Brown FS, Glasmacher SA, Kearns PKA, MacDougall N, Hunt D, Connick P, et al. Systematic review of prediction models in relapsing remitting multiple sclerosis. PLOS One 2020;15(5):e0233575. [DOI: 10.1371/journal.pone.0233575]
Chatfield 1995
- Chatfield C. Model uncertainty, data mining and statistical inference. Journal of the Royal Statistical Society. Series A (Statistics in Society) 1995;158(3):419-66.
Chen 2017
- Chen JH, Asch SM. Machine learning and prediction in medicine — beyond the peak of inflated expectations. New England Journal of Medicine 2017;376(26):2507-9. [DOI: 10.1056/NEJMp1702071]
Cochrane 2021
- Cochrane Multiple Sclerosis and Rare Disease of the CNS. Our reviews. msrdcns.cochrane.org/our-review (accessed 30 October 2021).
Cohen 1988
- Cohen J. Statistical Power Analysis for the Behavioral Sciences. Hillsdale (NJ): L. Erlbaum Associates, 1988.
Collins 2015
- Collins GS, Reitsma JB, Altman DG, Moons KG. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the TRIPOD statement. Journal of Clinical Epidemiology 2015;68(2):112-21. [DOI: 10.1016/j.jclinepi.2014.11.010]
Concato 1993
- Concato J, Feinstein AR, Holford TR. The risk of determining risk with multivariable models. Annals of Internal Medicine 1993;118(3):201-10. [DOI: 10.7326/0003-4819-118-3-199302010-00009]
Correale 2012
- Correale J, Ysrraelit MC, Fiol MP. Benign multiple sclerosis: does it exist? Current Neurology and Neuroscience Reports 2012;12(5):601-9. [DOI: 10.1007/s11910-012-0292-5]
Cree 2016
- Cree BAC, Gourraud P-A, Oksenberg JR, Bevan C, Crabtree-Hartman E, Gelfand JM, et al. Long-term evolution of multiple sclerosis disability in the treatment era. Annals of Neurology 2016;80(4):499-510. [DOI: 10.1002/ana.24747]
Cree 2019
- Cree BA, Hollenbach JA, Bove R, Kirkish G, Sacco S, Caverzasi E, et al. Silent progression in disease activity-free relapsing multiple sclerosis. Annals of Neurology 2019;85(5):653-66. [DOI: 10.1002/ana.25463]
Day 2018
- Day GS, Rae-Grant A, Armstrong MJ, Pringsheim T, Cofield SS, Marrie RA. Identifying priority outcomes that influence selection of disease-modifying therapies in MS. Neurology Clinical Practice 2018;8(3):179-85. [DOI: 10.1212/CPJ.0000000000000449]
Debray 2017
- Debray TP, Damen JA, Snell KI, Ensor J, Hooft L, Reitsma JB, et al. A guide to systematic review and meta-analysis of prediction model performance. BMJ 2017;356:i6460. [DOI: 10.1136/bmj.i6460]
Debray 2019
- Debray TP, Damen JA, Riley RR, Snell K, Reitsma JB, Hooft L, et al. A framework for meta-analysis of prediction model studies with binary and time-to-event outcomes. Statistical Methods in Medical Research 2019;28(9):2768-86. [DOI: 10.1177/0962280218785504]
Derfuss 2012
- Derfuss T. Personalized medicine in multiple sclerosis: hope or reality? BMC Medicine 2012;10:116. [DOI: 10.1186/1741-7015-10-116]
Dhiman 2021
- Dhiman P, Ma J, Navarro CA, Speich B, Bullock G, Damen JAA, et al. Reporting of prognostic clinical prediction models based on machine learning methods in oncology needs to be improved. Journal of Clinical Epidemiology 2021;138:60-72. [DOI: 10.1016/j.jclinepi.2021.06.024]
Diamond 1989
- Diamond GA. Future imperfect: the limitations of clinical prediction models and the limits of clinical prediction. Journal of the American College of Cardiology 1989;14(3):A12-22. [DOI: 10.1016/0735-1097(89)90157-5]
Diaz 2019
- Diaz C, Zarco LA, Rivera DM. Highly active multiple sclerosis: an update. Multiple Sclerosis and Related Disorders 2019;30:215-24. [DOI: 10.1016/j.msard.2019.01.039]
Ferrazzano 2020
- Ferrazzano G, Crisafulli SG, Baione V, Tartaglia M, Cortese A, Frontoni M, et al. Early diagnosis of secondary progressive multiple sclerosis: focus on fluid and neurophysiological biomarkers. Journal of Neurology 2021;268(10):3626-45. [DOI: 10.1007/s00415-020-09964-4]
Foroutan 2020
- Foroutan F, Guyatt G, Zuk V, Vandvik PO, Alba AC, Mustafa R, et al. GRADE Guidelines 28: Use of GRADE for the assessment of evidence about prognostic factors: rating certainty in identification of groups of patients with different absolute risks. Journal of Clinical Epidemiology 2020;121:62-70. [DOI: 10.1016/j.jclinepi.2019.12.023]
Freedman 2016
- Freedman MS, Rush CA. Severe, highly active, or aggressive multiple sclerosis. Continuum 2016;22(3):761-84. [DOI: 10.1212/CON.0000000000000331]
Gafson 2017
- Gafson A, Craner MJ, Matthews PM. Personalised medicine for multiple sclerosis care. Multiple Sclerosis Journal 2017;23(3):362-9. [DOI: 10.1177/1352458516672017]
Gauthier 2007
- Gauthier SA, Mandel M, Guttmann CRG, Glanz BI, Khoury SJ, Betensky RA. Predicting short-term disability in multiple sclerosis. Neurology 2007;68(24):2059-65. [DOI: 10.1212/01.wnl.0000264890.97479.b1]
Ge 2006
- Ge Y. Multiple sclerosis: the role of MR imaging. American Journal of Neuroradiology 2006;27(6):1165-76.
Geersing 2012
- Geersing G-J, Bouwmeester W, Zuithoff P, Spijker R, Leeflang M, Moons K. Search filters for finding prognostic and diagnostic prediction studies in MEDLINE to enhance systematic reviews. PLOS One 2012;7(2):e32844. [DOI: 10.1371/journal.pone.0032844]
Hanley 1982
- Hanley JA, McNeil BJ. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 1982;143(1):29-36.
Harrell 1996
- Harrell FE, Lee KL, Mark DB. Multivariable prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Statistics in Medicine 1996;15(4):361-87.
Harrell 2001
- Harrell F. Regression Modeling Strategies: With Applications to Linear Models, Logistic Regression, and Survival Analysis. New York (NY): Springer-Verlag, 2001.
Hastie 2009
- Hastie T, Tibshirani R, Friedman J. The Elements of Statistical Learning. 2nd edition. New York (NY): Springer, 2009.
Havas 2020
- Havas J, Leray E, Rollot F, Casey R, Michel L, Lejeune F, et al. Predictive medicine in multiple sclerosis: a systematic review. Multiple Sclerosis and Related Disorders 2020;40:101928. [DOI: 10.1016/j.msard.2020.101928]
Hemmer 2021
- Hemmer B, et al. Diagnosis and therapy of multiple sclerosis, neuromyelitis optica spectrum diseases and MOG-IgG-associated diseases, S2k guideline [Diagnose und Therapie der Multiplen Sklerose, Neuromyelitis-optica-Spektrum-Erkrankungen und MOG-IgG-assoziierten Erkrankungen, S2k-Leitlinie]. In: Deutsche Gesellschaft für Neurologie, editor(s). Leitlinien für Diagnostik und Therapie in der Neurologie. www.dgn.org/leitlinien (accessed 17 June 2021).
Hempel 2017
- Hempel S, Graham GD, Fu N, Estrada E, Chen AY, Miake-Lye I, et al. A systematic review of modifiable risk factors in the progression of multiple sclerosis. Multiple Sclerosis Journal 2017;23(4):525-33. [DOI: 10.1177/1352458517690270]
Hernández 2004
- Hernández AV, Steyerberg EW, Habbema JDF. Covariate adjustment in randomized controlled trials with dichotomous outcomes increases statistical power and reduces sample size requirements. Journal of Clinical Epidemiology 2004;57(5):454-60. [DOI: 10.1016/j.jclinepi.2003.09.014]
Hohlfeld 2016a
- Hohlfeld R, Dornmair K, Meinl E, Wekerle H. The search for the target antigens of multiple sclerosis, part 1: autoreactive CD4+ T lymphocytes as pathogenic effectors and therapeutic targets. Lancet Neurology 2016;15(2):198-209. [DOI: 10.1016/S1474-4422(15)00334-8]
Hohlfeld 2016b
- Hohlfeld R, Dornmair K, Meinl E, Wekerle H. The search for the target antigens of multiple sclerosis, part 2: CD8+ T cells, B cells, and antibodies in the focus of reverse-translational research. Lancet Neurology 2016;15(3):317-31. [DOI: 10.1016/S1474-4422(15)00313-0]
Iorio 2015
- Iorio A, Spencer FA, Falavigna M, Alba C, Lang E, Burnand B, et al. Use of GRADE for assessment of evidence about prognosis: rating confidence in estimates of event rates in broad categories of patients. BMJ 2015;350:h870. [DOI: 10.1136/bmj.h870]
Jarman 2010
- Jarman B, Pieter D, Veen AA, Kool RB, Aylin P, Bottle A, et al. The hospital standardised mortality ratio: a powerful tool for Dutch hospitals to assess their quality of care? BMJ Quality & Safety 2010;19(1):9-13. [DOI: 10.1136/qshc.2009.032953]
Justice 1999
- Justice AC. Assessing the generalizability of prognostic information. Annals of Internal Medicine 1999;130(6):515-24. [DOI: 10.7326/0003-4819-130-6-199903160-00016]
Kalincik 2017
- Kalincik T, Manouchehrinia A, Sobisek L, Jokubaitis V, Spelman T, Horakova D, et al. Towards personalized therapy for multiple sclerosis: prediction of individual treatment response. Brain 2017;140(9):2426-43. [DOI: 10.1093/brain/awx185]
Kalincik 2018
- Kalincik T. Reply: towards personalized therapy for multiple sclerosis: limitations of observational data. Brain 2018;141(5):e39.
Kaufman 2011
- Kaufman S, Rosset S, Perlich C. Leakage in data mining: formulation, detection, and avoidance. In: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Vol. 6. 2011:556-63. [DOI: 10.1145/2020408.2020496]
Korevaar 2020
- Korevaar DA, Salameh J-P, Vali Y, Cohen JF, McInnes MDF, Spijker R, et al. Searching practices and inclusion of unpublished studies in systematic reviews of diagnostic accuracy. Research Synthesis Methods 2020;11(3):343-53.
Kreuzberger 2020
- Kreuzberger N, Damen JAAG, Trivella M, Estcourt LJ, Aldin A, Umlauff L, et al. Prognostic models for newly-diagnosed chronic lymphocytic leukaemia in adults: a systematic review and meta-analysis. Cochrane Database of Systematic Reviews 2020, Issue 7. Art. No: CD012022. [DOI: 10.1002/14651858.CD012022.pub2]
Kurtzke 1977
- Kurtzke JF, Beebe GW, Nagler B, Kurland LT, Auth TL. Studies on the natural history of multiple sclerosis--8. Early prognostic features of the later course of the illness. Journal of Chronic Diseases 1977;30(12):819-30. [DOI: 10.1016/0021-9681(77)90010-8]
Lorscheider 2016
- Lorscheider J, Buzzard K, Jokubaitis V, Spelman T, Havrdova E, Horakova D, et al. Defining secondary progressive multiple sclerosis. Brain 2016;139(Pt 9):2395-405. [DOI: 10.1093/brain/aww173]
Lublin 1996
- Lublin FD, Reingold SC. Defining the clinical course of multiple sclerosis: results of an international survey. National Multiple Sclerosis Society (USA) Advisory Committee on Clinical Trials of New Agents in Multiple Sclerosis. Neurology 1996;46(4):907-11.
Lublin 2014
- Lublin FD, Reingold SC, Cohen JA, Cutter GR, Sørensen PS, Thompson AJ, et al. Defining the clinical course of multiple sclerosis. Neurology 2014;83(3):278-86. [DOI: 10.1212/WNL.0000000000000560]
Mateen 2020
- Mateen BA, Liley J, Denniston AK, Holmes CC, Vollmer SJ. Improving the quality of machine learning in health applications and clinical research. Nature Machine Intelligence 2020;2:554-6. [DOI: 10.1038/s42256-020-00239-1]
McDonald 2001
- McDonald WI, Compston A, Edan G, Goodkin D, Hartung HP, Lublin FD, et al. Recommended diagnostic criteria for multiple sclerosis: guidelines from the International Panel on the diagnosis of multiple sclerosis. Annals of Neurology 2001;50(1):121-7. [DOI: 10.1002/ana.1032]
Meyer‐Moock 2014
- Meyer-Moock S, Feng Y-S, Maeurer M, Dippel F-W, Kohlmann T. Systematic literature review and validity evaluation of the Expanded Disability Status Scale (EDSS) and the Multiple Sclerosis Functional Composite (MSFC) in patients with multiple sclerosis. BMC Neurology 2014;14:58. [DOI: 10.1186/1471-2377-14-58]
Miller 2008
- Miller A, Avidan N, Tzunz-Henig N, Glass-Marmor L, Lejbkowicz I, Pinter RY, et al. Translation towards personalized medicine in multiple sclerosis. Journal of the Neurological Sciences 2008;274(1):68-75. [DOI: 10.1016/j.jns.2008.07.028]
Moher 2009
- Moher D, Liberati A, Tetzlaff J, Altman DG, The PRISMA Group. Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. PLOS Medicine 2009;6(7):e1000097. [DOI: 10.1371/journal.pmed.1000097]
Montalban 2018
- Montalban X, Gold R, Thompson AJ, Otero-Romero S, Amato MP, Chandraratna D, et al. ECTRIMS/EAN Guideline on the pharmacological treatment of people with multiple sclerosis. Multiple Sclerosis Journal 2018;24(2):96-120. [DOI: 10.1177/1352458517751049]
Montavon 2012
- Montavon G, Orr G, Müller KR. Neural Networks: Tricks of the Trade. Springer, 2012.
Moons 2014
- Moons KG, Groot JA, Bouwmeester W, Vergouwe Y, Mallett S, Altman DG, et al. Critical appraisal and data extraction for systematic reviews of prediction modelling studies: the CHARMS checklist. PLOS Medicine 2014;11(10):e1001744. [DOI: 10.1371/journal.pmed.1001744]
Moons 2019
- Moons KG, Wolff RF, Riley RD, Whiting PF, Westwood M, Collins GS, et al. PROBAST: a tool to assess risk of bias and applicability of prediction model studies: explanation and elaboration. Annals of Internal Medicine 2019;170(1):W1-33. [DOI: 10.7326/M18-1377]
Newcombe 2006
- Newcombe RG. Confidence intervals for an effect size measure based on the Mann–Whitney statistic. Part 2: asymptotic methods and evaluation. Statistics in Medicine 2006;25(4):559-73. [DOI: 10.1002/sim.2324]
Niculescu‐Mizil 2005
- Niculescu-Mizil A, Caruana R. Predicting good probabilities with supervised learning. In: Proceedings of the 22nd International Conference on Machine Learning. 2005:625-32. [DOI: 10.1145/1102351.1102430]
Ontaneda 2019
- Ontaneda D, Tallantyre E, Kalincik T, Planchon SM, Evangelou N. Early highly effective versus escalation treatment approaches in relapsing multiple sclerosis. Lancet Neurology 2019;18(10):973-80. [DOI: 10.1016/S1474-4422(19)30151-6]
Optic Neuritis Study Group 1991
- Optic Neuritis Study Group. The clinical profile of optic neuritis. Experience of the optic neuritis treatment trial. Archives of Ophthalmology 1991;109(12):1673-8. [DOI: 10.1001/archopht.1991.01080120057025]
Ouzzani 2016
- Ouzzani M, Hammady H, Fedorowicz Z, Elmagarmid A. Rayyan - a web and mobile app for systematic reviews. Systematic Reviews 2016;5(1):210.
Page 2021
- Page MJ, McKenzie JE, Bossuyt PM, Boutron I, Hoffmann TC, Mulrow CD, et al. The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. PLOS Medicine 2021;18(3):e1003583. [DOI: 10.1371/journal.pmed.1003583]
Patsopoulos 2019
- Patsopoulos NA, Baranzini SE, Santaniello A, Shoostari P, Cotsapas C, Wong G, et al. Multiple sclerosis genomic map implicates peripheral immune cells and microglia in susceptibility. Science 2019;365(6460):eaav7188. [DOI: 10.1126/science.aav7188]
Peat 2014
- Peat G, Riley RD, Croft P, Morley KI, Kyzas PA, Moons KGM, for the PROGRESS Group. Improving the transparency of prognosis research: the role of reporting, data sharing, registration, and protocols. PLOS Medicine 2014;11(7):e1001671. [DOI: 10.1371/journal.pmed.1001671]
Platt 1999
- Platt J. Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods. Advances in Large Margin Classifiers 1999;10(3):61-74.
Polman 2005
- Polman CH, Reingold SC, Edan G, Filippi M, Hartung H-P, Kappos L, et al. Diagnostic criteria for multiple sclerosis: 2005 revisions to the "McDonald Criteria". Annals of Neurology 2005;58(6):840-6. [DOI: 10.1002/ana.20703]
Polman 2011
- Polman CH, Reingold SC, Banwell B, Clanet M, Cohen JA, Filippi M, et al. Diagnostic criteria for multiple sclerosis: 2010 revisions to the McDonald criteria. Annals of Neurology 2011;69(2):292-302. [DOI: 10.1002/ana.22366]
Poser 1983
- Poser CM, Paty DW, Scheinberg L, McDonald WI, Davis FA, Ebers GC, et al. New diagnostic criteria for multiple sclerosis: guidelines for research protocols. Annals of Neurology 1983;13(3):227-31. [DOI: 10.1002/ana.410130302]
Rae‐Grant 2018
- Rae-Grant A, Day GS, Marrie RA, Rabinstein A, Cree BA, Gronseth GS, et al. Comprehensive systematic review summary: disease-modifying therapies for adults with multiple sclerosis. Neurology 2018;90(17):789-800. [DOI: 10.1212/WNL.0000000000005345]
Reich 2018
- Reich DS, Lucchinetti CF, Calabresi PA. Multiple sclerosis. New England Journal of Medicine 2018;378(2):169-80. [DOI: 10.1056/NEJMra1401483]
Riley 2019
- Riley RD, Snell KIE, Ensor J, Burke DL, Harrell FE Jr, Moons KGM, et al. Minimum sample size for developing a multivariable prediction model: part II - binary and time-to-event outcomes. Statistics in Medicine 2019;38(7):1276-96. [DOI: 10.1002/sim.7992]
Riley 2020
- Riley RD, Ensor J, Snell KIE, Harrell FE Jr, Martin GP, Reitsma JB, et al. Calculating the sample size required for developing a clinical prediction model. BMJ 2020;368:m441. [DOI: 10.1136/bmj.m441]
Roozenbeek 2009
- Roozenbeek B, Maas AIR, Lingsma HF, Butcher I, Lu J, Marmarou A, et al. Baseline characteristics and statistical power in randomized controlled trials: selection, prognostic targeting, or covariate adjustment? Critical Care Medicine 2009;37(10):2683-90. [DOI: 10.1097/ccm.0b013e3181ab85ec]
Rotstein 2019
- Rotstein D, Montalban X. Reaching an evidence-based prognosis for personalized treatment of multiple sclerosis. Nature Reviews Neurology 2019;15(5):287-300. [DOI: 10.1038/s41582-019-0170-8]
Runmarker 1994
- Runmarker B, Andersson C, Odén A, Andersen O. Prediction of outcome in multiple sclerosis based on multivariate models. Journal of Neurology 1994;241(10):597-604. [DOI: 10.1007/BF00920623]
Río 2009
- Río J, Comabella M, Montalban X. Predicting responders to therapies for multiple sclerosis. Nature Reviews Neurology 2009;5(10):553-60. [DOI: 10.1038/nrneurol.2009.139]
Río 2016
- Río J, Ruiz-Peña JL. Short-term suboptimal response criteria for predicting long-term non-response to first-line disease modifying therapies in multiple sclerosis: a systematic review and meta-analysis. Journal of the Neurological Sciences 2016;361:158-67. [DOI: 10.1016/j.jns.2015.12.043]
Sawcer 2011
- Sawcer S. The major cause of multiple sclerosis is environmental: genetics has a minor role--no. Multiple Sclerosis 2011;17(10):1174-5. [DOI: 10.1177/1352458511421106]
Seccia 2021
- Seccia R, Romano S, Salvetti M, Crisanti A, Palagi L, Grassi F. Machine learning use for prognostic purposes in multiple sclerosis. Life 2021;11(2):122. [DOI: 10.3390/life11020122]
Sekula 2016
- Sekula P, Pressler JB, Sauerbrei W, Goebell PJ, Schmitz-Dräger BJ. Assessment of the extent of unpublished studies in prognostic factor research: a systematic review of p53 immunohistochemistry in bladder cancer as an example. BMJ Open 2016;6(8):e009972.
Simera 2008
- Simera I, Altman DG, Moher D, Schulz KF, Hoey J. Guidelines for reporting health research: the EQUATOR Network's survey of guideline authors. PLOS Medicine 2008;5(6):e139. [DOI: 10.1371/journal.pmed.0050139]
Snell 2020
- Snell KIE, Allotey J, Smuk M, Hooper R, Chan C, Ahmed A, et al. External validation of prognostic models predicting pre-eclampsia: individual participant data meta-analysis. BMC Medicine 2020;18(1):302. [DOI: 10.1186/s12916-020-01766-9]
Sormani 2013
- Sormani MP, Rio J, Tintorè M, Signori A, Li D, Cornelisse P, et al. Scoring treatment response in patients with relapsing multiple sclerosis. Multiple Sclerosis Journal 2013;19(5):605-12. [DOI: 10.1177/1352458512460605]
Sormani 2016
- Sormani MP, Gasperini C, Romeo M, Rio J, Calabrese M, Cocco E, et al. Assessing response to interferon-β in a multicenter dataset of patients with MS. Neurology 2016;87(2):134-40. [DOI: 10.1212/WNL.0000000000002830]
Sormani 2017
- Sormani MP. Prognostic factors versus markers of response to treatment versus surrogate endpoints: three different concepts. Multiple Sclerosis Journal 2017;23(3):378-81.
Steyerberg 2013
- Steyerberg EW, Moons KG, Windt DA, Hayden JA, Perel P, Schroter S, et al. Prognosis research strategy (PROGRESS) 3: prognostic model research. PLOS Medicine 2013;10(2):e1001381. [DOI: 10.1371/journal.pmed.1001381]
Steyerberg 2018
- Steyerberg EW, Claggett B. Towards personalized therapy for multiple sclerosis: limitations of observational data. Brain 2018;141(5):e38.
Steyerberg 2019
- Steyerberg E. Clinical Prediction Models: A Practical Approach to Development, Validation, and Updating. 2nd edition. New York (NY): Springer-Verlag, 2019.
Thompson 2000
- Thompson AJ, Montalban X, Barkhof F, Brochet B, Filippi M, Miller DH, et al. Diagnostic criteria for primary progressive multiple sclerosis: a position paper. Annals of Neurology 2000;47(6):831-5.
Thompson 2018a
- Thompson AJ, Baranzini SE, Geurts J, Hemmer B, Ciccarelli O. Multiple sclerosis. Lancet 2018;391(10130):1622-36. [DOI: 10.1016/S0140-6736(18)30481-1]
Thompson 2018b
- Thompson AJ, Banwell BL, Barkhof F, Carroll WM, Coetzee T, Comi G, et al. Diagnosis of multiple sclerosis: 2017 revisions of the McDonald criteria. Lancet Neurology 2018;17(2):162-73. [DOI: 10.1016/S1474-4422(17)30470-2]
van der Ploeg 2014
- van der Ploeg T, Austin PC, Steyerberg EW. Modern modelling techniques are data hungry: a simulation study for predicting dichotomous endpoints. BMC Medical Research Methodology 2014;14:137. [DOI: 10.1186/1471-2288-14-137]
van Munster 2017
- van Munster CEP, Uitdehaag BMJ. Outcome measures in clinical trials for multiple sclerosis. CNS Drugs 2017;31(3):217-36. [DOI: 10.1007/s40263-017-0412-5]
van Smeden 2018
- van Smeden M. Should a risk prediction model be developed? 3 August 2018. https://twitter.com/maartenvsmeden/status/1025315100796899328 (accessed 26 November 2021).
von Elm 2007
- von Elm E, Altman DG, Egger M, Pocock SJ, Gøtzsche PC, Vandenbroucke JP. The Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) statement: guidelines for reporting observational studies. Annals of Internal Medicine 2007;147(8):573-7. [DOI: 10.7326/0003-4819-147-8-200710160-00010]
Völler 2017
- Völler S, Flint RB, Stolk LM, Degraeuwe PLJ, Simons SHP, Pokorna P, et al. Model-based clinical dose optimization for phenobarbital in neonates: an illustration of the importance of data sharing and external validation. European Journal of Pharmaceutical Sciences 2017;109:S90-7. [DOI: 10.1016/j.ejps.2017.05.026]
Walton 2020
- Walton C, King R, Rechtman L, Kaye W, Leray E, Marrie RA, et al. Rising prevalence of multiple sclerosis worldwide: insights from the Atlas of MS. Multiple Sclerosis 2020;26(14):1816-21. [DOI: 10.1177/1352458520970841]
Warnke 2019
- Warnke C, Havla J, Kitzrow M, Biesalski A-S, Knauss S. Entzündliche Erkrankungen [Inflammatory diseases]. In: Sturm D, Biesalski A-S, Höffken O, editor(s). Neurologische Pathophysiologie: Ursachen und Mechanismen neurologischer Erkrankungen. Berlin, Heidelberg: Springer, 2019:51-98. [DOI: 10.1007/978-3-662-56784-5_2]
Weinshenker 1989a
- Weinshenker BG, Bass B, Rice GP, Noseworthy J, Carriere W, Baskerville J, et al. The natural history of multiple sclerosis: a geographically based study. I. Clinical course and disability. Brain 1989;112(1):133-46. [DOI: 10.1093/brain/112.1.133]
Wiendl 2021
- Wiendl H, Gold R, Berger T, Derfuss T, Linker R, Mäurer M, et al. Multiple Sclerosis Therapy Consensus Group (MSTCG): position statement on disease-modifying therapies for multiple sclerosis (white paper). Therapeutic Advances in Neurological Disorders 2021;14:17562864211039648. [DOI: 10.1177/17562864211039648]
Wingerchuk 2016
- Wingerchuk DM, Weinshenker BG. Disease modifying therapies for relapsing multiple sclerosis. BMJ 2016;354:i3518. [DOI: 10.1136/bmj.i3518]
Wolff 2019
- Wolff RF, Moons KG, Riley RD, Whiting PF, Westwood M, Collins GS, et al. PROBAST: a tool to assess the risk of bias and applicability of prediction model studies. Annals of Internal Medicine 2019;170(1):51-8. [DOI: 10.7326/M18-1376]
Wynants 2017
- Wynants L, Collins GS, Van Calster B. Key steps and common pitfalls in developing and validating risk models. BJOG: An International Journal of Obstetrics and Gynaecology 2017;124(3):423-32. [DOI: 10.1111/1471-0528.14170]
Wynants 2020
- Wynants L, Van Calster B, Collins GS, Riley RD, Heinze G, Schuit E, et al. Prediction models for diagnosis and prognosis of covid-19: systematic review and critical appraisal. BMJ 2020;369:m1328. [DOI: 10.1136/bmj.m1328]
Zadrozny 2001
- Zadrozny B, Elkan C. Obtaining calibrated probability estimates from decision trees and naive Bayesian classifiers. In: Proceedings of the Eighteenth International Conference on Machine Learning. 2001:609-16.
References to other published versions of this review
On Seker 2020
- On Seker BI, Reeve K, Havla J, Burns J, Gosteli MA, Lutterotti A, et al. Prognostic models for predicting clinical disease progression, worsening and activity in people with multiple sclerosis. Cochrane Database of Systematic Reviews 2020, Issue 5. Art. No: CD013606. [DOI: 10.1002/14651858.CD013606]
Associated Data
This section collects any data citations, data availability statements, or supplementary materials included in this article.
Data Availability Statement
The dataset summarised in this review is available as tables in the Appendices and in Characteristics of included studies. The R code used for the statistical description is available upon request from the authors.