Letter. Sports Medicine. 2025 Apr 7;55(8):2039–2040. doi: 10.1007/s40279-025-02211-8

Comment on: “A Machine Learning Approach to Concussion Risk Estimation Among Players Exhibiting Visible Signs in Professional Hockey”

Maximilian Klemp 1, Robert Rein 1
PMCID: PMC12460376  PMID: 40192934

To the Editor,

We are writing in response to the article titled “A Machine Learning Approach to Concussion Risk Estimation Among Players Exhibiting Visible Signs in Professional Hockey” by Bruce et al., recently published in Sports Medicine [1]. We commend the authors for their valuable contribution to the field of concussion risk assessment using machine learning. The authors developed three predictive models (a conditional inference tree, a random forest, and a logistic regression model) using data from 1563 unique events, 183 of which were later diagnosed as concussive. The reported models show strong discriminative ability, as reflected in AUROC values of 80.2, 81.8, and 82.2 for the conditional inference tree, random forest, and logistic regression models, respectively. The authors convincingly show that including personal concussion history improves model discrimination. However, we would like to highlight several concerns regarding the evaluation of these models' performance, especially in the context of clinical decision-making.

The terms risk model and likelihood of concussion used by Bruce et al. [1] suggest that a probabilistic prediction of the target variable is intended. This aligns with a clinical approach, where estimating the probability of concussion is paramount. While the model choice and training support probabilistic predictions, the evaluation process seemingly overlooks this aspect. In clinical risk prediction models, performance is typically evaluated through two key characteristics: discrimination and calibration [2]. Discrimination refers to the ability of the model to distinguish between positive and negative cases, i.e., whether the model assigns higher risk estimates to cases that experience the event than to cases that do not. As is often the case, discrimination is assessed by Bruce et al. using the area under the receiver operating characteristic curve (AUROC or AUC). One important assumption of this procedure is that sensitivity and specificity are of equal importance; whether this holds with respect to concussions is debatable. Calibration, in turn, assesses the agreement between the estimated risk and the observed relative occurrence of the event [3]. A well-calibrated model assigns low probabilities to groups with few positive cases and high probabilities to groups with many. Although the TRIPOD (Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis) guidelines recommend reporting calibration in prediction modeling studies [4], systematic reviews have found that calibration is assessed far less commonly than discrimination [5–7]. This oversight is problematic, as poor calibration can lead to misleading predictions [8].
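The distinction between the two characteristics can be made concrete with a minimal numerical sketch. The Python snippet below uses toy predictions and outcomes invented for illustration (not the study's data): it computes the AUROC as the probability that a randomly chosen positive case outranks a randomly chosen negative one, and then compares mean predicted risk with the observed event rate within quantile bins, which is the essence of a calibration check.

```python
import numpy as np

# Toy predicted risks and observed outcomes (hypothetical, for illustration only).
y_true = np.array([0, 0, 0, 1, 0, 1, 0, 1, 1, 1])
y_prob = np.array([0.1, 0.2, 0.25, 0.3, 0.35, 0.6, 0.4, 0.7, 0.8, 0.9])

def auroc(y, p):
    """AUROC = probability that a random positive outranks a random negative."""
    pos, neg = p[y == 1], p[y == 0]
    gt = (pos[:, None] > neg[None, :]).sum()   # concordant pairs
    eq = (pos[:, None] == neg[None, :]).sum()  # tied pairs count half
    return (gt + 0.5 * eq) / (len(pos) * len(neg))

def binned_calibration(y, p, bins=2):
    """Per quantile bin: (mean predicted risk, observed event rate)."""
    edges = np.quantile(p, np.linspace(0, 1, bins + 1))
    idx = np.clip(np.searchsorted(edges, p, side="right") - 1, 0, bins - 1)
    return [(p[idx == b].mean(), y[idx == b].mean()) for b in range(bins)]

cal = binned_calibration(y_true, y_prob)
print(auroc(y_true, y_prob))  # discrimination on a 0-1 scale
print(cal)                    # predicted vs observed per risk group
```

In a well-calibrated model the two numbers within each bin agree closely; a high AUROC alone says nothing about that agreement.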

One of the key issues regarding model performance stems from the often-overlooked fact that models can exhibit poor calibration despite good discrimination. For instance, even if a model correctly classifies patients, its risk estimates may be too extreme (too high for positive cases and too low for negative ones) or too conservative (too close to the overall base rate). Such errors make risk estimates unreliable, which can lead to incorrect, potentially harmful decisions [8]. In Bruce et al.'s case, poor calibration of their risk models could either expose players to unnecessary risk or undermine confidence in the model. If predicted probabilities are too conservative, actual cases of concussion could be missed due to underestimation. Conversely, if the predictions are too extreme, concussion risk might be overestimated, leading to excessive, unnecessary off-field examinations. This could ultimately erode trust in the model from coaches and medical staff. Therefore, ensuring proper calibration is crucial when risk models are intended for clinical decision-making.
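That good discrimination and poor calibration can coexist is easy to demonstrate with simulated data (the sketch below is illustrative and unrelated to the study's data). Starting from perfectly calibrated risks, a monotone transform that cubes the odds pushes every prediction towards 0 or 1: the ranking of cases, and hence the AUROC, is unchanged, while a calibration-sensitive score such as the Brier score deteriorates.

```python
import numpy as np

rng = np.random.default_rng(0)
p = rng.uniform(0.05, 0.95, 5000)  # predicted risks, perfectly calibrated here
y = rng.binomial(1, p)             # outcomes drawn with exactly those risks

# Monotone transform: cube the odds, making every prediction more extreme.
odds = p / (1 - p)
p_extreme = odds**3 / (1 + odds**3)

def auroc(y, p):
    """Probability that a random positive case outranks a random negative one."""
    pos, neg = p[y == 1], p[y == 0]
    gt = (pos[:, None] > neg[None, :]).sum()
    eq = (pos[:, None] == neg[None, :]).sum()
    return (gt + 0.5 * eq) / (len(pos) * len(neg))

def brier(y, p):
    """Mean squared error of probabilistic predictions (lower is better)."""
    return np.mean((p - y) ** 2)

print(auroc(y, p), auroc(y, p_extreme))  # identical: ranking is unchanged
print(brier(y, p), brier(y, p_extreme))  # the extreme version scores worse
```

An AUROC-only evaluation would rate both sets of predictions as equally good, even though only the first set is trustworthy as a probability.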

The problem is exacerbated when using highly flexible machine learning models, which are at even higher risk of miscalibration [9, 10]. Studies have reported that certain algorithms tend to produce probabilistic predictions disproportionately skewed towards 0 and 1 [11]. This bias is particularly concerning when the binary target variable shows a considerable class imbalance, as in the current study, where concussions occurred in only 12% of events (183 concussions out of 1563 events). The issue of miscalibration in machine learning models has even prompted the development of dedicated calibration techniques to adjust model outputs [12–14]. Given this, calibration should not be optional when machine learning models provide probabilistic predictions; it is a necessary step in performance evaluation, especially for clinical risk models used in decision-making.
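As one concrete example of such a recalibration technique, the sketch below implements the core of Platt scaling [12]: a logistic regression of the outcome on the model's scores, fitted here with Newton's method (Platt's label-smoothing refinement is omitted for brevity). The data are simulated from a hypothetical overconfident model whose log-odds are three times the true log-odds, so the fitted slope should land near 1/3 and damp the predictions back towards their true risks.

```python
import numpy as np

def platt_scale(scores, y, iters=30):
    """Fit p = sigmoid(a*score + b) by Newton's method on the log loss."""
    a, b = 0.0, 0.0
    for _ in range(iters):
        z = a * scores + b
        p = 1 / (1 + np.exp(-z))
        g = np.array([np.sum((p - y) * scores), np.sum(p - y)])      # gradient
        w = p * (1 - p)
        H = np.array([[np.sum(w * scores**2), np.sum(w * scores)],
                      [np.sum(w * scores),    np.sum(w)]])           # Hessian
        a, b = np.array([a, b]) - np.linalg.solve(H, g)
    return a, b

# Hypothetical overconfident model: its log-odds are 3x the true log-odds.
rng = np.random.default_rng(1)
p_true = rng.uniform(0.05, 0.95, 4000)
y = rng.binomial(1, p_true)
scores = 3 * np.log(p_true / (1 - p_true))
p_raw = 1 / (1 + np.exp(-scores))

a, b = platt_scale(scores, y)
p_cal = 1 / (1 + np.exp(-(a * scores + b)))

brier = lambda y, p: np.mean((p - y) ** 2)
print(a, b)                               # slope near 1/3 damps the predictions
print(brier(y, p_raw), brier(y, p_cal))   # recalibration improves the Brier score
```

Note that recalibration of this kind leaves the ranking of cases, and hence the AUROC, untouched; it repairs only the probability scale.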

Several approaches exist for assessing calibration [3], with the most common being the evaluation of the so-called calibration curve [15]. Calibration curves can be assessed both graphically and statistically through the intercept and slope [16]. Therefore, it is good practice to present the calibration curve when documenting a risk model, as it allows for a more comprehensive evaluation of the model’s reliability beyond its discrimination performance. We strongly encourage Bruce et al. to include the calibration curve for their concussion risk model, as this would enhance its utility for practitioners.
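Both assessments can be sketched in a few lines of numpy, again on simulated data from a hypothetical overly extreme model (not the study's data): deciles of predicted risk are compared with observed event rates, which are the points of a calibration curve, and the calibration intercept and slope are estimated by a logistic regression of the outcome on the logit of the predicted risk.

```python
import numpy as np

rng = np.random.default_rng(2)
p_true = rng.uniform(0.05, 0.95, 5000)       # hypothetical true risks
y = rng.binomial(1, p_true)
# An "overly extreme" model: predicted log-odds are 3x the true log-odds.
lo = 3 * np.log(p_true / (1 - p_true))       # logit of the predicted risk
p_hat = 1 / (1 + np.exp(-lo))

# Calibration curve: deciles of predicted risk vs observed event rate.
deciles = np.quantile(p_hat, np.linspace(0, 1, 11))
idx = np.clip(np.searchsorted(deciles, p_hat, side="right") - 1, 0, 9)
curve = [(p_hat[idx == d].mean(), y[idx == d].mean()) for d in range(10)]

# Calibration intercept a and slope b: logistic regression of y on logit(p_hat),
# fitted with a few Newton steps. Perfect calibration gives a = 0, b = 1.
a, b = 0.0, 0.0
for _ in range(30):
    z = a + b * lo
    q = 1 / (1 + np.exp(-z))
    g = np.array([np.sum(q - y), np.sum((q - y) * lo)])
    w = q * (1 - q)
    H = np.array([[np.sum(w), np.sum(w * lo)],
                  [np.sum(w * lo), np.sum(w * lo**2)]])
    a, b = np.array([a, b]) - np.linalg.solve(H, g)

print(curve)  # extreme deciles over- and under-shoot the observed rates
print(a, b)   # slope well below 1 flags overly extreme predictions
```

Plotting the curve points against the diagonal, together with the fitted intercept and slope, is exactly the kind of evidence that would let practitioners judge whether the reported risk estimates can be taken at face value.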

Funding

Open Access funding enabled and organized by Projekt DEAL.

Declarations

Conflict of interest

The authors declare no conflict of interest.

Author Contributions

MK and RR contributed equally to the conception of the letter. MK drafted the letter. RR provided critical reviews and revised the letter. Both authors read and approved the final version.

Funding

The authors received no funding related to this letter.

Data availability

No data were used for this letter.

References

1. Bruce JM, Riegler KE, Meeuwisse W, Comper P, Hutchison MG, Delaney JS, et al. A machine learning approach to concussion risk estimation among players exhibiting visible signs in professional hockey. Sports Med. 2024. 10.1007/s40279-024-02112-2.
2. Van Calster B, McLernon DJ, van Smeden M, Wynants L, Steyerberg EW. Calibration: the Achilles heel of predictive analytics. BMC Med. 2019. 10.1186/s12916-019-1466-7.
3. Van Calster B, Nieboer D, Vergouwe Y, De Cock B, Pencina MJ, Steyerberg EW. A calibration hierarchy for risk models was defined: from utopia to empirical data. J Clin Epidemiol. 2016;74:167–76. 10.1016/j.jclinepi.2015.12.005.
4. Moons KGM, Altman DG, Reitsma JB, Ioannidis JPA, Macaskill P, Steyerberg EW, et al. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): explanation and elaboration. Ann Intern Med. 2015;162:W1-73. 10.7326/m14-0698.
5. Wessler BS, Paulus J, Lundquist CM, Ajlan M, Natto Z, Janes WA, et al. Tufts PACE Clinical Predictive Model Registry: update 1990 through 2015. Diagn Progn Res. 2017;1. 10.1186/s41512-017-0021-2.
6. Bouwmeester W, Zuithoff NPA, Mallett S, Geerlings MI, Vergouwe Y, Steyerberg EW, et al. Reporting and methods in clinical prediction research: a systematic review. PLoS Med. 2012;9:e1001221. 10.1371/journal.pmed.1001221.
7. Collins GS, de Groot JA, Dutton S, Omar O, Shanyinde M, Tajar A, et al. External validation of multivariable prediction models: a systematic review of methodological conduct and reporting. BMC Med Res Methodol. 2014. 10.1186/1471-2288-14-40.
8. Van Calster B, Vickers AJ. Calibration of risk prediction models: impact on decision-analytic performance. Med Decis Mak. 2014;35:162–9. 10.1177/0272989x14547233.
9. Van Hoorde K, Van Huffel S, Timmerman D, Bourne T, Van Calster B. A spline-based tool to assess and visualize the calibration of multiclass risk predictions. J Biomed Inform. 2015;54:283–93. 10.1016/j.jbi.2014.12.016.
10. van der Ploeg T, Nieboer D, Steyerberg EW. Modern modeling techniques had limited external validity in predicting mortality from traumatic brain injury. J Clin Epidemiol. 2016;78:83–9. 10.1016/j.jclinepi.2016.03.002.
11. Niculescu-Mizil A, Caruana R. Predicting good probabilities with supervised learning. In: Proceedings of the 22nd International Conference on Machine Learning (ICML '05). 2005. 10.1145/1102351.1102430.
12. Platt J. Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods. Adv Large Margin Classif. 1999;10:61–74.
13. Boström H. Calibrating random forests. In: Seventh International Conference on Machine Learning and Applications (ICMLA 2008). 2008. 10.1109/icmla.2008.107.
14. Lin H-T, Lin C-J, Weng RC. A note on Platt's probabilistic outputs for support vector machines. Mach Learn. 2007;68:267–76. 10.1007/s10994-007-5018-6.
15. Austin PC, Putter H, Giardiello D, van Klaveren D. Graphical calibration curves and the integrated calibration index (ICI) for competing risk models. Diagn Progn Res. 2022. 10.1186/s41512-021-00114-6.
16. Austin PC, Steyerberg EW. Graphical assessment of internal and external calibration of logistic regression models by using loess smoothers. Stat Med. 2013;33:517–35. 10.1002/sim.5941.
