From the Authors:
We appreciate the interest of Adler and colleagues in our recent manuscript describing a prognostic model for early respiratory insufficiency in amyotrophic lateral sclerosis (ALS).[1] After applying our prediction model in a cohort of 50 ALS patients at their center in Geneva, they obtained a c-statistic, sensitivity, and specificity all virtually identical to those from our external validation in the Pooled Resource Open-Access Clinical Trials (PRO-ACT) ALS database. In addition, the Geneva cohort calibration curve resembles Figure E5 from PRO-ACT in our online supplement.[1]
In this setting, perhaps the primary conclusion is that the single-center results of Adler et al. are very similar to those from our large multicenter validation cohort, despite differences in the study samples. First, 24% of patients in the Geneva group were using non-invasive ventilation (NIV) at baseline and were excluded, compared to 1% in the Penn cohort and 4% in PRO-ACT. The European Federation of Neurological Societies (EFNS) guidelines propose an NIV initiation threshold of forced vital capacity (FVC) <80%,[2] which is much higher than that of the American Academy of Neurology guidelines (FVC<50%) and may explain these differences.[3] Only 20% of the 50 patients included in the Geneva cohort developed respiratory insufficiency or died within 6 months of observation, compared to 39% and 35% in Penn and PRO-ACT, respectively. These differences are likely due to the exclusion from the study sample of sicker patients already using NIV (selection bias).
Due to the differences in the underlying risk of respiratory failure in the Geneva sample, we calculated a lower positive predictive value (PPV) (36%, 95% confidence interval [CI], 13 – 65) and a higher negative predictive value (NPV) (86%, 95% CI, 71 – 95) than in PRO-ACT (62% and 76%, respectively). Of course, the small sample size of the Geneva cohort produced extremely wide 95% confidence intervals for the discrimination estimates and likely for the calibration curve (although not shown). Despite these differences and wide confidence intervals, the findings of Adler et al. closely resemble our external validation findings, testifying to the robustness and generalizability of our model.
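The dependence of predictive values on the underlying risk follows from Bayes' theorem, and a minimal sketch may make it concrete. The sensitivity and specificity used below are hypothetical placeholders chosen only for illustration, not the estimates from either cohort; only the two outcome incidences (20% in Geneva, 35% in PRO-ACT) are taken from the text above.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test with the given sensitivity and
    specificity, applied in a population with the given outcome prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Hypothetical operating characteristics (assumed, NOT from the study).
SENSITIVITY = 0.60
SPECIFICITY = 0.75

# Outcome incidences reported in the letter for the two cohorts.
for cohort, prevalence in [("Geneva", 0.20), ("PRO-ACT", 0.35)]:
    ppv, npv = predictive_values(SENSITIVITY, SPECIFICITY, prevalence)
    print(f"{cohort}: prevalence={prevalence:.2f} PPV={ppv:.2f} NPV={npv:.2f}")
```

With sensitivity and specificity held fixed, a lower outcome incidence yields a lower PPV and a higher NPV, which is the direction of the difference described above.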
The properties of discrimination and calibration of prediction rules support different uses. Both are key measurements for assessing the validity of prediction models. Calibration refers to the agreement between predicted and observed outcomes in a population. Discrimination refers to the model’s ability to distinguish patients with versus without an outcome. A model that predicts all individuals to have a risk equal to the actual incidence of an outcome would be a model with excellent calibration but poor discrimination. Highlighting that an average predicted risk of 24% is higher than the actual incidence of 20% does not fully characterize the model’s discrimination or calibration abilities. However, the Geneva cohort had a similar c-statistic, sensitivity, and specificity to PRO-ACT (thus similar discrimination) and a similar calibration curve which provided reasonably accurate estimates, realizing that identical and consistent calibration at all levels of risk of the outcome may not be a realistic goal.[4, 5]
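The distinction between calibration and discrimination can be sketched with the constant-risk model mentioned above. The example uses a synthetic cohort that borrows only its size (50 patients) and incidence (20%) from the Geneva sample; the function names are illustrative, and mean calibration (calibration-in-the-large) is used here as the simplest calibration summary rather than a full calibration curve.

```python
def c_statistic(risks, outcomes):
    """Proportion of (event, non-event) pairs in which the patient with the
    event received the higher predicted risk; ties count as one half."""
    event_risks = [r for r, y in zip(risks, outcomes) if y == 1]
    nonevent_risks = [r for r, y in zip(risks, outcomes) if y == 0]
    concordant = 0.0
    for e in event_risks:
        for n in nonevent_risks:
            if e > n:
                concordant += 1.0
            elif e == n:
                concordant += 0.5
    return concordant / (len(event_risks) * len(nonevent_risks))


def mean_calibration_error(risks, outcomes):
    """Mean predicted risk minus observed incidence (calibration-in-the-large)."""
    return sum(risks) / len(risks) - sum(outcomes) / len(outcomes)


# Synthetic cohort (assumed data): 50 patients, 10 with the outcome.
outcomes = [1] * 10 + [0] * 40
incidence = sum(outcomes) / len(outcomes)

# A model that assigns every patient the observed incidence.
constant_risks = [incidence] * len(outcomes)

print("c-statistic:", c_statistic(constant_risks, outcomes))
print("mean calibration error:", mean_calibration_error(constant_risks, outcomes))
```

Because every pair is tied, the c-statistic equals 0.5 (no better than chance), while the mean predicted risk matches the observed incidence up to floating-point rounding. This is why a single comparison of average predicted risk with observed incidence cannot by itself characterize a model.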
We agree with the need for a useful, discriminating, and calibrated prediction model for respiratory events in ALS to expedite timeliness of care, shape patient expectations, and enrich clinical trial design. We also agree that further research is necessary before widespread clinical use of the prediction rule. For example, next steps may include applying the prediction rule to identify high-risk patients for inclusion in randomized clinical trials; stratifying randomized patients by the predicted risk of respiratory failure; or assessing how randomizing patients or clinicians to receive prediction results affects quality of life and respiratory outcomes. We agree with Adler and colleagues that more work needs to be done in the early identification and treatment of ALS patients at high risk of respiratory failure.
References
- 1. Ackrivo J, Hansen-Flaschen J, Wileyto EP, Schwab RJ, Elman L, Kawut SM. Development of a prognostic model of respiratory insufficiency or death in amyotrophic lateral sclerosis. Eur. Respir. J. 2019; 53: [Epub ahead of print].
- 2. Andersen PM, Abrahams S, Borasio GD, de Carvalho M, Chiò A, Van Damme P, Hardiman O, Kollewe K, Morrison KE, Petri S, Pradat P-F, Silani V, Tomik B, Wasner M, Weber M. EFNS guidelines on the clinical management of amyotrophic lateral sclerosis (MALS)--revised report of an EFNS task force. Eur. J. Neurol. 2012; 19: 360–375.
- 3. Miller RG, Jackson CE, Kasarskis EJ, England JD, Forshew D, Johnston W, Kalra S, Katz JS, Mitsumoto H, Rosenfeld J, Shoesmith C, Strong MJ, Woolley SC, Quality Standards Subcommittee of the American Academy of Neurology. Practice parameter update: the care of the patient with amyotrophic lateral sclerosis: drug, nutritional, and respiratory therapies (an evidence-based review): report of the Quality Standards Subcommittee of the American Academy of Neurology. Neurology 2009; 73: 1218–1226.
- 4. Van Calster B, Nieboer D, Vergouwe Y, De Cock B, Pencina MJ, Steyerberg EW. A calibration hierarchy for risk models was defined: from utopia to empirical data. J. Clin. Epidemiol. 2016; 74: 167–176.
- 5. Alba AC, Agoritsas T, Walsh M, Hanna S, Iorio A, Devereaux PJ, McGinn T, Guyatt G. Discrimination and Calibration of Clinical Prediction Models: Users’ Guides to the Medical Literature. JAMA 2017; 318: 1377–1384.