Annals of Burns and Fire Disasters. 2018 Jun 30;31(2):89–93.

A comparison of injury scoring systems in predicting burn mortality

B Halgas 1, C Bay 2, K Foster 3
PMCID: PMC6199008  PMID: 30374258

Summary

The models most widely used to predict burn patient mortality are the revised Baux score, the Ryan, Smith and McGwin models, the Abbreviated Burn Severity Index (ABSI), the Belgian Outcome in Burn Injury (BOBI) score, and the Fatality by Longevity, APACHE II score, Measured Extent of burn, and Sex (FLAMES) score. Improvements in critical care have reduced mortality resulting from severe burns, which may affect the predictive strength of older models. We conducted a cross-validation study on all burn patients (n = 114) with burns covering 20% or more of total body surface area (TBSA) admitted to the Arizona Burn Center between 2014 and 2016. The study compared the accuracy of seven previously validated burn-specific models and one new model derived from our cohort. Data were collected on age, ethnicity, gender, TBSA burned, inhalational injury, associated trauma, and injury severity (ISS, APACHE II). The accuracy of each model was tested using logistic regression, preserving the published regression coefficients. Predictive performance of the models was assessed by receiver operating characteristic (ROC) curve analyses and Hosmer-Lemeshow (H-L) goodness of fit tests. Age, TBSA and APACHE II score were found to be significant, independent risk factors for patient mortality. The FLAMES model performed best (AUC 0.96) and was comparable to our native model (AUC 0.96). The revised Baux score was both accurate and easy to calculate, making it clinically useful. The older models demonstrated adequate predictive performance compared with the newer models. Even without key burn parameters, the APACHE II score performed well in critically ill patients with moderate to severe burn injuries.

Keywords: mortality, prediction, model, injury severity

Introduction

Mortality continues to be the single most important outcome measure in both burn injury research and clinical practice. According to the American Burn Repository (ABR), overall mortality, regardless of burn size, is 3.1%.1 It is well established that burn size is strongly associated with in-hospital mortality.2-9 The same ABR data show that burns covering at least 20% total body surface area (TBSA) are associated with an overall mortality of 8.6%. A model that accurately predicts burn mortality can be useful in determining clinical course, discussing treatment options with patients and families, and evaluating new or innovative interventions.

The first attempt to quantify the relationship between burn size, age and mortality was developed as a thesis in 1961 by Professor Serge Baux.2 The patient’s age added to the percent TBSA approximated their percent probability of death.3 The model dates from a time when a 25-year-old patient with 50% TBSA burns would more than likely die in the hospital. By 1981, the clinical significance of inhalational trauma was well accepted.10-11 The Abbreviated Burn Severity Index (ABSI) produced a relatively easy scoring system to identify and triage high-risk patients. This model used age, TBSA, inhalational injury, gender and the presence of full thickness burns to generate a score and associated probability of survival.4 The models proposed by Smith5 and Ryan6 utilized age, TBSA and inhalational injury. McGwin et al.7 expanded these models to account for the presence of pneumonia and trauma at the time of injury. More recently, the Belgian Outcome in Burn Injury (BOBI) score was the product of six national burn centers in Belgium from 1999-2004, using data from 5246 patients.8 The FLAMES study published in 2008 by Gomez et al. proposed a hybrid scoring model utilizing both burn-specific risk factors and initial APACHE II scores.9,12 The original Baux score was revised and updated in 2010 to include inhalational injury.3
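As a minimal sketch of the two simplest scores described above, the original Baux score is the sum of age and percent TBSA, and the revised score adds a fixed penalty when inhalation injury is present. The 17-point penalty is recalled from the revised Baux publication (reference 3) and is not stated in this article, so it should be checked against that source; the function names are illustrative.

```python
def baux_score(age_years, tbsa_percent):
    """Original Baux score: age in years plus percent TBSA burned."""
    return age_years + tbsa_percent


def revised_baux_score(age_years, tbsa_percent, inhalation_injury):
    """Revised Baux score: original score plus a penalty for inhalation injury.

    The 17-point penalty is an assumption taken from the revised Baux
    publication, not from this article.
    """
    penalty = 17 if inhalation_injury else 0
    return baux_score(age_years, tbsa_percent) + penalty


# The 25-year-old patient with 50% TBSA burns mentioned in the text
print(baux_score(25, 50))                # 75
print(revised_baux_score(25, 50, True))  # 92
```

The score itself is a simple sum; converting the revised score into a probability of death requires the published regression equation or nomogram.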

With few exceptions, well-designed, head-to-head comparisons of these models are lacking in the burn literature. Decades of advancements in critical care, including the adoption of early excision and grafting, goal-directed fluid resuscitation and topical burn treatments, have reduced mortality from severe burns. In theory, these improvements in care should diminish the predictive strength of older models. We conducted a cross-validation study on all burn patients with burns of 20% TBSA or greater admitted to the Arizona Burn Center between 2014 and 2016. The study compared the accuracy of seven previously validated burn-specific models and one exploratory, institutional model derived from our cohort.

Methods

We performed a retrospective chart review of all patients admitted to the Arizona Burn Center from 2014 to 2016 with burn injuries greater than or equal to 20% TBSA. Patients were excluded if they were admitted more than 24 hours after time of injury, discharged to hospice, placed on comfort care only, or refused surgery or use of blood products. All patients were treated with standard protocols for fluid resuscitation, nutrition, excision and grafting, infection control and treatment, and physical or occupational therapy. Data on patient demographics and injury severity were collected. The following clinical data were abstracted from the medical record: TBSA, percent full thickness burns, percent partial thickness burns, inhalational injury, need for mechanical ventilation within 24 hours, pneumonia, co-existent trauma and APACHE II score. Inhalational injury was defined by bronchoscopy findings, injury from closed space fire, facial descriptions (soot, singed hair, hoarseness) or need for mechanical ventilation. The primary outcome of this study was in-hospital mortality.

Data were exported to a Microsoft Excel spreadsheet, and then imported into SPSS for analysis (IBM Corp., Armonk NY). Descriptive statistics, including means, standard deviations, counts and percentages, were calculated for demographic and clinical characteristics. Patients who died in-hospital were compared to patients who survived using chi-square or Fisher’s Exact Tests and t-tests or Mann-Whitney tests, as appropriate. A p-value of <0.05 was considered statistically significant.

Each published burn scoring equation was used to calculate our patients’ probability of death (Table I). Predicted mortality was compared to observed mortality in each model. This resulted in a separate probability of death for each scoring system. Receiver operating characteristic (ROC) curves were computed to determine the overall accuracy of each system in predicting mortality, as evidenced by the area under the curve (AUC). Validation of the individual models was determined by calibration and discrimination of events (deaths) from non-events. Calibration was evaluated using the Hosmer-Lemeshow goodness of fit test and discrimination was evaluated using the ROC curves. An exploratory, forward stepwise logistic regression, based on maximum partial likelihood estimates, was used to develop a predictive model specifically for patients in our database. Candidate variables for this “native” model included all variables used in the seven final, published equations.
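The discrimination measure described above can be sketched in a few lines. The AUC equals the probability that a randomly chosen non-survivor is assigned a higher predicted risk than a randomly chosen survivor. This is an illustration only: the function name and the data are hypothetical, and the study itself computed ROC curves in SPSS.

```python
def roc_auc(predicted_risk, died):
    """Area under the ROC curve by pairwise comparison.

    Counts the fraction of (non-survivor, survivor) pairs in which the
    non-survivor has the higher predicted risk; ties count as half.
    """
    deaths = [p for p, d in zip(predicted_risk, died) if d]
    survivors = [p for p, d in zip(predicted_risk, died) if not d]
    if not deaths or not survivors:
        raise ValueError("need at least one death and one survivor")
    wins = 0.0
    for risk_death in deaths:
        for risk_survivor in survivors:
            if risk_death > risk_survivor:
                wins += 1.0
            elif risk_death == risk_survivor:
                wins += 0.5
    return wins / (len(deaths) * len(survivors))


# Hypothetical predicted risks and outcomes (1 = died), for illustration only
risk = [0.05, 0.10, 0.20, 0.40, 0.70, 0.90]
died = [0, 0, 1, 0, 1, 1]
print(roc_auc(risk, died))
```

An AUC of 0.5 corresponds to chance and 1.0 to perfect separation of deaths from survivors.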

Table I. Summary of the mortality prediction models (TBSA = total body surface area burned).

Results

Data were collected on 122 patients. Of those, 114 met inclusion criteria. The average (± SD) age was 38.7 ± 22.4 years and patients who died were older (54.4 ± 19.8 vs. 32.9 ± 20.6, p=0.001). Most patients (82%) were male and there was no difference in mortality by sex. The average burn size was 39.2% ± 20.1 TBSA and the overall mortality in our cohort was 27.2%. As expected, patients who died presented with more severe injuries, as evidenced by higher injury severity score (ISS), greater total body surface area burned (57.1 ± 24.3 vs. 32.5 ± 13.1, p<0.001), greater percent full thickness burn (41.0 ± 33.0 vs. 10.8 ± 15.4, p<0.001), and higher APACHE II score (23.6 ± 8.3 vs. 10.3 ± 6.8, p<0.001) (Table II). The equation that optimized prediction of mortality for our Arizona Burn Center model was: AzBC = -11.90 (intercept) + (0.106 × age) + (0.143 × APACHE II score) - (0.050 × % partial thickness). No interactions were significant.
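To show how such an equation is turned into a predicted risk, the sketch below applies the reported AzBC coefficients and converts the result with the inverse logit. It assumes that the AzBC expression is the linear predictor (log-odds of death) of the logistic regression described in the Methods; the function names and the example patient are hypothetical.

```python
import math


def azbc_logit(age_years, apache_ii, partial_thickness_percent):
    """Linear predictor built from the coefficients reported for the AzBC model."""
    return (-11.90
            + 0.106 * age_years
            + 0.143 * apache_ii
            - 0.050 * partial_thickness_percent)


def logit_to_probability(logit):
    """Inverse logit: convert log-odds into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical patient, for illustration only
logit = azbc_logit(age_years=40, apache_ii=20, partial_thickness_percent=20)
print(round(logit, 2), logit_to_probability(logit))
```

The same two-step pattern (linear predictor, then inverse logit) applies to any of the published logistic models once their coefficients are substituted.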

Table II. Patient demographics and injury severity, survivors versus non-survivors (TBSA = total body surface area burned, PTB = partial thickness burned, FTB = full thickness burned).

The predictive performance of each model is shown in Table III. Intercepts for the ABSI and BOBI were not published, so they were estimated to optimize fit to our data. The intercept for the APACHE II score was derived from our data. The FLAMES score performed best with an area under the ROC curve (AUC) of 0.96±0.02 and an H-L goodness-of-fit χ2 of 3.5. In order of decreasing performance, the AUC (± standard error) values for the remaining burn-specific models were: revised Baux (0.93±0.02), McGwin (0.93±0.02), Smith (0.92±0.03), ABSI (0.90±0.03), BOBI (0.87±0.04), and Ryan (0.83±0.04). The APACHE II score alone demonstrated adequate discrimination and calibration with an AUC of 0.89±0.03 and H-L goodness-of-fit χ2 of 5.4. A comparison of the individual receiver operating characteristic curves is shown in Fig. 1.
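The H-L statistic reported alongside each AUC can be illustrated as follows. This sketch uses the textbook form of the test: patients are sorted by predicted risk, split into groups of nearly equal size, and observed deaths and survivors are compared with the expected counts in each group. The grouping rule, function name and data are assumptions for illustration; the exact grouping used by SPSS may differ.

```python
def hosmer_lemeshow_chi2(predicted_risk, died, groups=10):
    """Hosmer-Lemeshow chi-square statistic for a set of predicted risks.

    Smaller values indicate closer agreement between predicted and observed
    mortality; the statistic is conventionally referred to a chi-square
    distribution with (groups - 2) degrees of freedom.
    """
    pairs = sorted(zip(predicted_risk, died))
    n = len(pairs)
    chi2 = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        size = len(chunk)
        expected_deaths = sum(p for p, _ in chunk)
        observed_deaths = sum(d for _, d in chunk)
        expected_survivors = size - expected_deaths
        observed_survivors = size - observed_deaths
        if expected_deaths > 0:
            chi2 += (observed_deaths - expected_deaths) ** 2 / expected_deaths
        if expected_survivors > 0:
            chi2 += (observed_survivors - expected_survivors) ** 2 / expected_survivors
    return chi2


# Hypothetical data split into two risk groups, for illustration only
risk = [0.1, 0.1, 0.9, 0.9]
died = [0, 0, 1, 1]
print(hosmer_lemeshow_chi2(risk, died, groups=2))
```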

Table III. Comparison of sensitivity, specificity, accuracy, Hosmer-Lemeshow goodness of fit (H-L), and area under the receiver operating characteristic curve (AUC).

Fig. 1. Receiver operating characteristic (ROC) curves for the seven burn-specific models with APACHE II and our native model.

Discussion

We applied seven of the most frequently cited scoring systems to an independent cohort and evaluated their accuracy in predicting in-hospital mortality. Our principal findings were that the FLAMES and revised Baux scores demonstrated superior accuracy. The APACHE II score, normally reserved for general intensive care, outperformed several of the burn-specific models.

A weakness of this study is that the data were collected retrospectively from a small sample at a single institution. A single-institution design may, however, also be advantageous, since management is largely controlled and results are not confounded by differences in protocol-based burn management between centers.13 Our sample size is too small to perform stand-alone external validation of each individual model. We specifically chose models based on rigorous methodology, prior external validation and frequency of appearance within the burn literature. It is not required that our sample match the derivation population, since models that can be broadly applied to various case-mixes are more clinically useful. It is generally agreed that a minimum of 100 outcome events is required to externally validate a prognostic model.14-16 As such, our study should not be considered a validation of the individual models, but a modern cross-comparison of their performance at a prototypical burn center. The strength of this study is the selection of both older and newer scoring systems for comparison. The few studies that have performed similar comparisons were limited to three or four models.
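The sample-size argument above can be checked with simple arithmetic. The number of deaths is not stated directly in the text, so the sketch below infers it from the reported cohort size and overall mortality; the variable names are illustrative.

```python
n_patients = 114       # patients meeting inclusion criteria
mortality = 0.272      # overall mortality reported for the cohort
minimum_events = 100   # threshold for external validation cited in the text

# Implied number of outcome events (deaths) in the cohort
deaths = round(n_patients * mortality)
print(deaths, deaths >= minimum_events)  # 31 False
```

With roughly 31 deaths, the cohort falls well short of the cited threshold, which is consistent with framing the study as a cross-comparison and not as a formal validation.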

The exploratory AzBC model, not surprisingly, best predicted mortality in our sample because it was derived from the Burn Center’s database. It “overfit” the data in the sense that it was tailored to fit random noise in a specific sample rather than reflecting data from the overall population of burn patients. The FLAMES score performed better in our cohort than in the original external validation of the model (AUC 0.96 vs. 0.93) despite the validation cohort having a mean TBSA of only 15.6% and overall mortality of 9.9%.9 The ability of the model to adapt to our higher-severity sample is likely due to the physiological parameters captured by the APACHE II score. The excellent predictive performance of the revised Baux score is comparable to results elsewhere.17-20

The score developed by Ryan et al. was highly specific but neither sensitive nor accurate in predicting mortality. Lower sensitivity means a higher false negative rate, which in this case means that the model underestimated the number of deaths. Other studies have produced similar findings.17,21-22 A reason for this could be that the overall mortality of the original study cohort was only 4%, with an emphasis on low-risk burn injuries.23 Our sample is more characteristic of a regional burn center receiving high-risk injuries from multiple states.

Despite being in use for over 30 years, the ABSI model still accurately predicts mortality in modern day cohorts. Recent attempts to further optimize the model were unsuccessful.24 Our AUC of 0.90 compares well to prior studies (AUC 0.89 in Pantet et al.,17 0.89 in Woods et al.,19 0.86 in Brusselaers et al.21).

The McGwin model7 analyzed 54,000 patients from the National Trauma Data Bank and the National Burn Repository and is one of the few multicenter models. The model includes other risk factors such as concomitant trauma and the presence of pneumonia on admission. It was validated in a sample of 14,442 patients and performed adequately with an AUC of 0.87 with an H-L statistic of 10.13 Our study is the first independent evaluation of the model and showed excellent discrimination and calibration (AUC = 0.93, H-L = 5.7). However, its clinical relevance is questionable since pneumonia is rarely diagnosed at the time of admission.

The Belgian Outcome in Burn Injury (BOBI) score was designed to improve the parameters originally presented by Ryan et al. Our study demonstrated an AUC of only 0.87, but this is comparable to Woods et al.19 (AUC 0.87) and Brusselaers et al.21 (AUC 0.86). It was previously validated in a large sample (n=2326) of Hungarian burn patients with an AUC of 0.94. However, the mean TBSA of that cohort was 10.7% with an overall mortality of 1.4%.25 These numbers are more in line with the original derivation cohort (11% TBSA and 4.3% mortality) that also produced an AUC of 0.94. One possibility is that the model suffers from the same bias as Ryan, since only 270 of the 5246 patients (5%) had a BOBI score of 5 or above (out of a total score of 10).23 The model may not be as accurately reproduced in a higher severity cohort. Lastly, the Smith model’s performance in our cohort compares well with other studies.9
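The relationship between sensitivity, specificity and the false negative rate discussed above can be made concrete with a small sketch. The 0.5 probability cut-off, the function name and the data are assumptions for illustration; the article does not state the cut-off used to classify predicted deaths.

```python
def classification_metrics(predicted_risk, died, threshold=0.5):
    """Sensitivity, specificity, accuracy and false negative rate at a cut-off."""
    tp = fp = tn = fn = 0
    for p, d in zip(predicted_risk, died):
        predicted_death = p >= threshold
        if predicted_death and d:
            tp += 1          # death correctly predicted
        elif predicted_death and not d:
            fp += 1          # survivor predicted to die
        elif not predicted_death and d:
            fn += 1          # death the model failed to predict
        else:
            tn += 1          # survivor correctly predicted
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one death and one survivor")
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "false_negative_rate": fn / (tp + fn),
    }


# Hypothetical model that underestimates risk, for illustration only
risk = [0.05, 0.10, 0.20, 0.30, 0.40, 0.80]
died = [0, 0, 0, 1, 1, 1]
print(classification_metrics(risk, died))
```

In this hypothetical case the model is perfectly specific but misses two of three deaths, the same pattern of high specificity with low sensitivity described for the Ryan score.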

The APACHE II score alone demonstrated better discrimination than both Ryan and BOBI with an AUC of 0.89. The FLAMES study reported an AUC of 0.91 using the APACHE II score alone.9 The score had previously been shown to be an independent predictor of mortality in a burn population,26 likely because it relies on routine physiologic and laboratory values that are often abnormal in large surface area burns.

Depending on the complexity of the model, scoring systems are useful in either clinical or research applications. The easier-to-calculate models are more useful for bedside risk assessment and initial triage. A common disclaimer is that these scoring systems are in no way intended to replace sound clinical judgment or even to guide withdrawal of care. The logistic regression models are useful to evaluate burn center performance over time and monitor progress and improvement before and after the adoption of innovative and new therapies. The more sophisticated electronic medical records (EMRs) can often populate these models with very little additional input required. Even though FLAMES performed well in our cohort, it is not very practical since it differentiates between partial and full thickness burns. This determination can be difficult for even experienced burn surgeons.27 At best, it is an imprecise science. It is also possible that the APACHE II score would not perform as well in low-severity burn injuries. After suffering small, less severe burns, patients are often hemodynamically stable with laboratory values within normal limits. Values like PaO2 or A-a gradient would not be routinely obtained in these patients. Whether the APACHE II score would still perform well in lower severity burn patients is unknown.

There is not a single best model for predicting mortality in burn patients. The fact that existing models are differently weighted versions of nearly the same variables speaks more to the heterogeneity of burn patients. The real test is when a model can be applied to various case mixes without compromising accuracy.

Conclusion

It would be impossible to prove the superiority of a single prognostic scoring system; however, our data showed that the most frequently cited models have good-to-excellent predictive performance. Age of the model does not appear to be an influencing factor despite advancements and improved survival from severe burns. Models that were derived from relatively low severity cohorts did not perform as well in our sample. The FLAMES and revised Baux score demonstrated superior performance and can be broadly applied across research and clinical settings. The revised Baux score is particularly reproducible and easy to calculate with the use of a nomogram. The FLAMES model relies on less easily obtained information and may be better suited for retrospective research. The APACHE II score performed well in our study, illustrating the weight of aberrant physiology in burn management.

Acknowledgments

Funding. This research did not receive any specific grant from funding agencies in the public, commercial or not-for-profit sectors.

References

  • 1. American Burn Repository. American Burn Association, 2006-2015.
  • 2. Baux S. Contribution à l’étude du traitement local des brûlures thermiques étendues. Paris, Thèse. 1961.
  • 3. Osler T, Glance LG, Hosmer DW. Simplified estimates of the probability of death after burn injuries: extending and updating the Baux score. J Trauma Acute Care Surg. 2010;68(3):690–697. doi: 10.1097/TA.0b013e3181c453b3.
  • 4. Tobiasen J, Hiebert JM, Edlich RF. The abbreviated burn severity index. Ann Emerg Med. 1982;11(5):260–262. doi: 10.1016/s0196-0644(82)80096-6.
  • 5. Smith DL, Cairns BA, Ramadan F, Dalston JS. Effect of inhalation injury, burn size, and age on mortality: a study of 1447 consecutive burn patients. J Trauma Acute Care Surg. 1994;37(4):655–659. doi: 10.1097/00005373-199410000-00021.
  • 6. Ryan CM, Schoenfeld DA, Thorpe WP, Sheridan RL. Objective estimates of the probability of death from burn injuries. N Engl J Med. 1998;338(6):362–366. doi: 10.1056/NEJM199802053380604.
  • 7. McGwin G, George RL, Cross JM, Rue LW. Improving the ability to predict mortality among burn patients. Burns. 2008;34(3):320–327. doi: 10.1016/j.burns.2007.06.003.
  • 8. Blot S. Development and validation of a model for prediction of mortality in patients with acute burn injury: The Belgian Outcome in Burn Injury Study Group. Br J Surg. 2008;96(1):111–117. doi: 10.1002/bjs.6329.
  • 9. Gomez M, Wong DT, Stewart TE, Redelmeier DA, Fish JS. The FLAMES score accurately predicts mortality risk in burn patients. J Trauma Acute Care Surg. 2008;65(3):636–645. doi: 10.1097/TA.0b013e3181840c6d.
  • 10. Clark CJ, Reid WH, Gilmour WH, Campbell D. Mortality probability in victims of fire trauma: revised equation to include inhalation injury. Br Med J. 1986;292(6531):1303–1305. doi: 10.1136/bmj.292.6531.1303.
  • 11. Smith DL, Cairns BA, Ramadan F, Dalston JS. Effect of inhalation injury, burn size, and age on mortality: a study of 1447 consecutive burn patients. J Trauma Acute Care Surg. 1994;37(4):655–659. doi: 10.1097/00005373-199410000-00021.
  • 12. Knaus WA, Draper EA, Wagner DP, Zimmerman JE. APACHE II: a severity of disease classification system. Crit Care Med. 1985;13(10):818–829.
  • 13. Hussain A, Choukairi F, Dunn K. Predicting survival in thermal injury: a systematic review of methodology of composite prediction models. Burns. 2013;39(5):835–850. doi: 10.1016/j.burns.2012.12.010.
  • 14. Vergouwe Y, Steyerberg EW, Eijkemans MJ, Habbema JDF. Substantial effective sample sizes were required for external validation studies of predictive logistic regression models. J Clin Epidemiol. 2005;58(5):475–483. doi: 10.1016/j.jclinepi.2004.06.017.
  • 15. Collins GS, Ogundimu EO, Altman DG. Sample size considerations for the external validation of a multivariable prognostic model: a resampling study. Stat Med. 2016;35(2):214–226. doi: 10.1002/sim.6787.
  • 16. Riley RD, Ensor J, Snell KI, Debray TP. External validation of clinical prediction models using big datasets from e-health records or IPD meta-analysis: opportunities and challenges. BMJ. 2016;353:i3140. doi: 10.1136/bmj.i3140.
  • 17. Pantet O, Faouzi M, Brusselaers N, Vernay A, Berger MM. Comparison of mortality prediction models and validation of SAPS II in critically ill burns patients. Ann Burns Fire Disasters. 2016;29(2):123.
  • 18. Wibbenmeyer LA, Amelon MJ, Morgan LJ, Robinson BK. Predicting survival in an elderly burn patient population. Burns. 2001;27(6):583–590. doi: 10.1016/s0305-4179(01)00009-2.
  • 19. Woods JF, Quinlan CS, Shelley OP. Predicting mortality in severe burns - what is the score? Evaluation and comparison of 4 mortality prediction scores in an Irish population. Plast Reconstr Surg Glob Open. 2016;4(1):e606. doi: 10.1097/GOX.0000000000000584.
  • 20. Dokter J, Meijs J, Oen IM, van Baar ME. External validation of the revised Baux score for the prediction of mortality in patients with acute burn injury. J Trauma Acute Care Surg. 2014;76(3):840–845. doi: 10.1097/TA.0000000000000124.
  • 21. Brusselaers N, Agbenorku P, Hoyte-Williams PE. Assessment of mortality prediction models in a Ghanaian burn population. Burns. 2013;39(5):997–1003. doi: 10.1016/j.burns.2012.10.023.
  • 22. Douglas HE, Ratcliffe A, Sandhu R, Anwar U. Comparison of mortality prediction models in burns ICU patients in Pinderfields Hospital over 3 years. Burns. 2015;41(1):49–52. doi: 10.1016/j.burns.2014.05.009.
  • 23. Sheppard NN, Hemington-Gorse S, Shelley OP, Philp B, Dziewulski P. Prognostic scoring systems in burns: a review. Burns. 2011;37(8):1288–1295. doi: 10.1016/j.burns.2011.07.017.
  • 24. Forster NA, Zingg M, Haile SR, Künzi W. 30 years later - does the ABSI need revision? Burns. 2011;37(6):958–963. doi: 10.1016/j.burns.2011.03.009.
  • 25. Brusselaers N, Juhász I, Erdei I, Monstrey S, Blot S. Evaluation of mortality following severe burns injury in Hungary: external validation of a prediction model developed on Belgian burn data. Burns. 2009;35(7):1009–1014. doi: 10.1016/j.burns.2008.12.017.
  • 26. Martynoga R, Fried M. APACHE II score may predict mortality in burns patients. Critical Care. 2009;13(1):504.
  • 27. Monstrey S, Hoeksema H, Verbelen J, Pirayesh A, Blondeel P. Assessment of burn depth and burn wound healing potential. Burns. 2008;34(6):761–769. doi: 10.1016/j.burns.2008.01.009.

Articles from Annals of Burns and Fire Disasters are provided here courtesy of Euro-Mediterranean Council for Burns and Fire Disasters (MBC)
