Summary
The aim of the study was to investigate whether supervised statistical models can be used to assess burn injury patterns, outcomes and their interrelationship. Using burn study data, a preliminary principal component analysis was carried out, and two separate clusters were observed. Observations were split into two classes and analysed by partial least squares discriminant analysis (PLS-DA) to assess possible predictors of each class. To assess predictors of total body surface area burned (TBSA), an orthogonal projections to latent structures (OPLS) model was used after PLS regression. The identified classes were later designated as high-risk and low-risk burn victims, and female gender fell into the high-risk class. When the natural logarithm of TBSA was modelled by OPLS, many possible predictors were found to be associated with the extent of burn injury. The fitted model explained 76% of the variation in Y and excluded up to 9% of orthogonal variation, captured in two orthogonal components. This appears to be the first application of the OPLS model in public health epidemiology. The results are promising regarding the use of supervised models in injury pattern analysis.
Keywords: supervised models, injury epidemiology, burns, injury patterns, OPLS
Abstract
The authors of this study set out to examine whether supervised statistical models could be used to assess the various patterns of injury, their outcomes and their interrelationships. They used data from a previous burn study to carry out a preliminary principal component analysis, in which two distinct groups were observed. The observations were divided into two classes and analysed by partial least squares (PLS) to assess the possible predictors of each class. To assess the predictors of total body surface area burned, orthogonal projections to latent structures (OPLS) were used after PLS regression. This appears to be the first application of the OPLS model in public health epidemiology. The results of this study are promising regarding the use of supervised models in the analysis of injury patterns.
Introduction
Burns are a major public health issue, causing considerable morbidity and mortality, especially in low- and middle-income countries.1 A vital part of designing any injury management plan is to build an epidemiological map of the factors related to injury occurrence or injury outcome. These factors may be related to patterns of injury occurrence, or they may be indicators of safety status in the fields of environmental safety, product safety, and people's safety-related behaviours. Some of these factors may also be associated with outcomes and circumstances after the injury has happened. Discriminating target groups on the basis of these factors, for future passive or active interventions, may help in designing pattern-specific interventions and in setting priorities among target groups. Another important issue in injury epidemiology is to explore predictors of burn severity, which can in turn help in prioritizing prevention strategies. Because injury epidemiology often involves large numbers of variables, appropriate statistical methods need to be sought and tested. Methods based on forming latent variables, such as principal component analysis (PCA), have traditionally been used to study large numbers of variables. However, supervised modelling techniques, such as partial least squares regression (PLS) and orthogonal projections to latent structures (OPLS), may be suitable alternatives or complements to classical modelling techniques, because they can handle large numbers of variables with smaller sample sizes while being less vulnerable to multicollinearity and missing values.2-4 One question asked in this study was whether these models can be used to classify burn victims based on injury patterns and outcomes.
The second question was whether supervised models could be used to predict total body surface area (TBSA) burned, as an outcome measure, from other variables of interest.
Methods
Data
The data for this paper were collected in a study conducted in 2007-2008 at the Fatemi Burn Center, the provincial referral burn centre in Ardabil province, north-west Iran.5 Of 237 burn victims hospitalized in the centre, the 224 for whom complete information on injury patterns, outcomes and safety measures was available were enrolled. Data were collected using a simple questionnaire designed in a previous study and modified for use in a hospital-based setting.5-7 The main data collected through the questionnaire were: 1. injury event characteristics and patterns of injury occurrence, collected through multiple-choice questions; 2. injury outcome; 3. some self-reported home safety measures; 4. injury event characteristics and patterns of injury occurrence, collected through an open-ended question and later coded by quantitative content analysis. A unit coding scheme and single-concept categorization were used.8 The study was approved by the Research Committee of Ardabil University of Medical Sciences as part of a joint research project between the University and Karolinska Institutet, Sweden.
Modelling process
The first analysis method used was principal component analysis. Three components were extracted, with cumulative R² = 0.14 and Q² = 0.06. Score plots of the first vs. second components (Fig. 1) and of the first vs. third components showed two separate clusters. Observations were therefore split into two classes, referred to in this paper as Class A and Class B, to investigate possible predictors of each class and to study associated variables.
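The exploratory step above can be sketched in a few lines of Python with NumPy. This is a minimal illustration under stated assumptions, not the SIMCA procedure: the data are synthetic (224 hypothetical observations with a built-in two-cluster structure), the helper names `autoscale` and `pca` are our own, and only the cumulative R²X is computed, because reproducing SIMCA's cross-validated Q² would require its specific cross-validation scheme.

```python
import numpy as np

def autoscale(X):
    """Centre each column and scale it to unit variance."""
    X = np.asarray(X, dtype=float)
    sd = X.std(axis=0, ddof=1)
    sd[sd == 0] = 1.0  # leave constant columns unscaled instead of dividing by zero
    return (X - X.mean(axis=0)) / sd

def pca(X, n_components):
    """PCA by singular value decomposition of the autoscaled data.

    Returns the score matrix T (n x A), the loading matrix P (p x A) and the
    cumulative fraction of X variation explained (R2X) after each component.
    """
    Xs = autoscale(X)
    U, s, Vt = np.linalg.svd(Xs, full_matrices=False)
    T = U[:, :n_components] * s[:n_components]
    P = Vt[:n_components].T
    r2x_cum = np.cumsum(s[:n_components] ** 2) / np.sum(s ** 2)
    return T, P, r2x_cum

# Hypothetical data: 224 observations, 30 variables; the first 106 observations
# are shifted on the first five variables so that two clusters exist.
rng = np.random.default_rng(0)
truth = np.r_[np.ones(106), np.zeros(118)]
X = rng.normal(size=(224, 30))
X[:, :5] += 3.0 * truth[:, None]

T, P, r2x_cum = pca(X, n_components=3)

# Crude stand-in for inspecting the t1-vs-t2 score plot: split on the sign of t1.
labels = (T[:, 0] > 0).astype(int)
agreement = np.mean(labels == truth)
cluster_recovery = max(agreement, 1 - agreement)  # the sign of a PCA component is arbitrary
```

Splitting on the sign of the first score is only a crude substitute for visual inspection; in the study the clusters were identified from the score plots themselves.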
Partial least squares regression discriminant analysis (PLS-DA) was used to discriminate the two classes and to assess possible predictors of each class. All variables were centred and scaled to unit variance before being introduced into the model. The PLS-DA model extracted only one component. Residuals were plotted and showed no substantial deviation from a normal distribution. Scores were plotted to assess the discrimination of observations and to detect possible outliers; no strong outlier was found (Fig. 2). An observed-versus-predicted plot and a misclassification table were produced to assess predictive ability; the model classified 99.1% of observations correctly. Hotelling's T², which combines all the scores in the component, was plotted for all observations to measure each observation's distance from the centre of the PLS model hyperplane. CV-ANOVA confirmed that the model was highly statistically significant (F-test). A validation plot was produced to assess the risk that the PLS-DA model was spurious: the goodness of fit (R² and Q²) of the original model was compared with that of several models fitted to data in which the order of the Y observations was randomly permuted while the X matrix was kept intact. This permutation test, run with 20 permutations, gave promising results for both R² and Q². Modelling was carried out using the SIMCA P12 statistical software package (Umetrics, Umeå, Sweden).
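For readers without SIMCA, the PLS-DA workflow described above (autoscaling, a one-component model, a misclassification table, Hotelling's T², and a permutation check of R² and Q²) can be sketched with NumPy. Everything here is an assumption made for illustration: the data are synthetic, `pls1_fit`, `pls1_predict`, `r2_q2` and `hotelling_t2` are hypothetical helpers, the classes are coded 1/0 and assigned with a 0.5 cut-off, and Q² uses 7-fold interleaved cross-validation, which may differ in detail from SIMCA's defaults.

```python
import numpy as np
from typing import NamedTuple

class PLS1(NamedTuple):
    x_mean: np.ndarray
    y_mean: float
    coef: np.ndarray    # regression coefficients for centred X, shape (p,)
    scores: np.ndarray  # training scores T, shape (n, A)

def pls1_fit(X, y, n_components=1):
    """Single-response PLS regression fitted with the NIPALS algorithm."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean
    W, P, q, T = [], [], [], []
    for _ in range(n_components):
        w = Xr.T @ yr
        w /= np.linalg.norm(w)    # weight vector: direction of maximal covariance with y
        t = Xr @ w                # score
        tt = t @ t
        p = Xr.T @ t / tt         # X loading
        c = yr @ t / tt           # y loading
        Xr = Xr - np.outer(t, p)  # deflate X and y before the next component
        yr = yr - c * t
        W.append(w); P.append(p); q.append(c); T.append(t)
    W, P, T = np.column_stack(W), np.column_stack(P), np.column_stack(T)
    coef = W @ np.linalg.solve(P.T @ W, np.array(q))
    return PLS1(x_mean, y_mean, coef, T)

def pls1_predict(model, X):
    return (X - model.x_mean) @ model.coef + model.y_mean

def r2_q2(X, y, n_components=1, n_folds=7):
    """R2Y on the training data and Q2Y from interleaved k-fold cross-validation."""
    n = len(y)
    ss = np.sum((y - y.mean()) ** 2)
    r2 = 1 - np.sum((y - pls1_predict(pls1_fit(X, y, n_components), X)) ** 2) / ss
    press = 0.0
    for k in range(n_folds):
        test = np.arange(k, n, n_folds)
        train = np.setdiff1d(np.arange(n), test)
        model = pls1_fit(X[train], y[train], n_components)
        press += np.sum((y[test] - pls1_predict(model, X[test])) ** 2)
    return r2, 1 - press / ss

def hotelling_t2(T):
    """Hotelling's T2 for each observation from the score matrix T (n, A)."""
    return np.sum(T ** 2 / T.var(axis=0, ddof=1), axis=1)

# Hypothetical data: 1 = Class A (106 patients), 0 = Class B (118 patients).
rng = np.random.default_rng(1)
y = np.r_[np.ones(106), np.zeros(118)]
X = rng.normal(size=(224, 40))
X[:, :6] += 1.5 * y[:, None]                        # six class-related variables
X = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)    # centre and scale to unit variance

model = pls1_fit(X, y, n_components=1)
pred = (pls1_predict(model, X) > 0.5).astype(int)   # assign the class whose code is nearer
confusion = np.array([[np.sum((y == a) & (pred == b)) for b in (1, 0)] for a in (1, 0)])
accuracy = np.trace(confusion) / len(y)
t2 = hotelling_t2(model.scores)

r2, q2 = r2_q2(X, y)
perm = np.array([r2_q2(X, rng.permutation(y)) for _ in range(20)])  # 20 permutations of Y
```

A model is unlikely to be spurious when its original R² and Q² clearly exceed those of the permuted-Y models, which is the comparison a validation plot makes visually.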
Ethical issues
The study was conducted in accordance with the ethical standards of the Ethics Committee of Ardabil University of Medical Sciences.
Results
Discriminant analysis results
Model characteristics
As the coefficients plot in Fig. 3 shows, several variables significantly discriminated between the two classes of burn patients. Variables whose coefficient confidence intervals did not cross zero were statistically significant predictors of class membership. More than a quarter of the modelled variables and dummy variables were statistically significant in class prediction. Variable importance in the projection (VIP) is plotted in Fig. 4. The sum of squares of all VIPs equals the number of terms in the model, so the average squared VIP equals one. VIP values larger than one indicate "important" variables, and values lower than 0.5 indicate "unimportant" variables. The interval between 0.5 and one is a grey zone, in which the importance level depends on the size of the data set.
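The VIP properties quoted above follow from one common definition for a single-response PLS model with K terms: VIP_j = sqrt(K · Σ_a SS_a (w_ja/‖w_a‖)² / Σ_a SS_a), where SS_a is the Y sum of squares explained by component a and w_a is its weight vector. Because each normalized weight vector has unit length, the squared VIPs sum to K. The sketch below computes VIP with NumPy on synthetic data; the helper names are hypothetical, and the code is not necessarily identical to SIMCA's implementation.

```python
import numpy as np

def pls1_components(X, y, n_components):
    """NIPALS PLS1 on centred data; returns weights W (p, A), scores T (n, A), y-loadings q (A,)."""
    Xr, yr = X - X.mean(axis=0), y - y.mean()
    W, T, q = [], [], []
    for _ in range(n_components):
        w = Xr.T @ yr
        w /= np.linalg.norm(w)
        t = Xr @ w
        tt = t @ t
        c = yr @ t / tt
        Xr = Xr - np.outer(t, Xr.T @ t / tt)   # deflate X by its loading on t
        yr = yr - c * t                        # deflate y
        W.append(w); T.append(t); q.append(c)
    return np.column_stack(W), np.column_stack(T), np.array(q)

def vip_scores(W, T, q):
    """Variable importance in the projection (VIP) for a single-response PLS model."""
    n_terms = W.shape[0]
    ss = q ** 2 * np.sum(T ** 2, axis=0)         # Y sum of squares explained per component
    w_sq = (W / np.linalg.norm(W, axis=0)) ** 2  # squared normalized weights
    return np.sqrt(n_terms * (w_sq @ ss) / ss.sum())

def vip_category(vip):
    """Rule of thumb from the text: >1 important, <0.5 unimportant, otherwise grey zone."""
    return np.where(vip > 1, "important", np.where(vip < 0.5, "unimportant", "grey zone"))

# Hypothetical data: only the first three of ten variables drive y.
rng = np.random.default_rng(4)
X = rng.normal(size=(224, 10))
y = X[:, 0] + X[:, 1] + X[:, 2] + rng.normal(scale=0.5, size=224)
W, T, q = pls1_components(X, y, n_components=2)
vip = vip_scores(W, T, q)
labels = vip_category(vip)
```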
Class A (higher-risk class)
This class consisted of 106 burn patients, two of whom were misclassified by the model. Membership of Class A was significantly predicted by several variables, listed in Table I in order of statistical significance. Three variables were of borderline statistical significance: Body involved in burn injury, Flame burn injury, and Number of people living in the household (Household size).
Class B (lower-risk class)
This class consisted of 118 burn patients, none of whom were misclassified by the model. Membership of Class B was significantly predicted by several variables, listed in Table I in order of statistical significance. Two variables in this class were of borderline statistical significance: Body (trunk) not involved in burn injury, and Academic education.
On the basis of the model, Class A can be characterized as a higher-risk group and Class B as a lower-risk group. This classification rests both on burn outcome variables and on pre-injury patterns and factors. Older age, although not statistically significant, descriptively tended to fall into Class A.
Predictors of TBSA
Several predictors were found to be associated with burn extent when the natural logarithm of TBSA was modelled as a continuous outcome in the OPLS model. The model had 147 variables, 85 of which came from content analysis of the injury descriptions and 37 from the home safety assessment questions. After expansion of categorical variables into dummy variables, a total of 290 variables was included in the model. The fitted model explained 76% of the variation in Y and excluded up to 9% of orthogonal variation, captured in two orthogonal components. Table II lists the variables that were statistically significant predictors of TBSA, together with the data source of each, ranked in decreasing order of coefficient magnitude. Age and gender were not found to be associated with TBSA in this model.
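The split between predictive and orthogonal variation can be illustrated with a minimal NumPy sketch of single-response OPLS, following the algorithm outlined by Trygg and Wold.4 The data, the nuisance factor `z` and the function `opls_fit` are hypothetical; the sketch reports R²Y and the fraction of X variation removed as orthogonal variation, the two kinds of quantity quoted above, but it is not the SIMCA implementation and omits cross-validation and coefficient confidence intervals.

```python
import numpy as np

def opls_fit(X, y, n_orth):
    """OPLS with one predictive and n_orth orthogonal components for a single y.

    X and y are centred here; scaling is left to the caller.
    Returns fitted y, R2Y, the fraction of X variation removed as orthogonal
    variation, and the orthogonal score matrix (n x n_orth).
    """
    X = X - X.mean(axis=0)
    yc = y - y.mean()
    ss_x = np.sum(X ** 2)
    w = X.T @ yc
    w /= np.linalg.norm(w)                      # predictive weight (unit length)
    removed, T_orth = 0.0, []
    for _ in range(n_orth):
        t = X @ w
        p = X.T @ t / (t @ t)
        w_o = p - (w @ p) * w                   # part of the loading orthogonal to w
        w_o /= np.linalg.norm(w_o)              # assumes some orthogonal variation exists
        t_o = X @ w_o                           # orthogonal score, uncorrelated with y
        p_o = X.T @ t_o / (t_o @ t_o)
        E = np.outer(t_o, p_o)
        removed += np.sum(E ** 2)
        X = X - E                               # filter the orthogonal variation out of X
        T_orth.append(t_o)
    t = X @ w                                   # predictive score on the filtered X
    c = yc @ t / (t @ t)
    y_hat = y.mean() + c * t
    r2y = 1 - np.sum((y - y_hat) ** 2) / np.sum(yc ** 2)
    T_orth = np.column_stack(T_orth) if T_orth else np.empty((len(y), 0))
    return y_hat, r2y, removed / ss_x, T_orth

# Hypothetical data: a nuisance factor z drives ten X columns but not the outcome.
rng = np.random.default_rng(2)
n = 224
X = rng.normal(size=(n, 30))
z = rng.normal(size=n)
X[:, 5:15] += 2.0 * z[:, None]
tbsa = np.exp(0.8 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=n))
y = np.log(tbsa)                                # model the natural logarithm of TBSA
X = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

y_hat, r2y, r2x_orth, T_orth = opls_fit(X, y, n_orth=2)
```

Each orthogonal score is uncorrelated with y by construction, so removing it from X changes how the predictive component is estimated without discarding variation that is linearly related to y.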
Discussion
Statistical methodology
Supervised modelling techniques have only recently begun to be used in injury research.9-12 These models have advantages such as the ability to handle large numbers of correlated variables, higher study power, and a moderate ability to handle missing values.2,4
We successfully applied PLS to discriminate burn victims using almost 150 variables measuring their injuries, patterns of injury occurrence, home safety status, and demographic characteristics. PLS was first presented by Wold in 1975 for modelling complicated data sets in terms of chains of matrices, and was later modified by other researchers.13 Like principal component analysis (PCA), PLS examines the internal relationships in the matrix of variables and cases, combining the characteristics of single variables into new factors or components; unlike PCA, however, a main objective of PLS is to predict outcome variables from possible predictors. This is done by linking the X and Y matrices. This characteristic, shared by OPLS, makes both methods more suitable than PCA for prediction, and is the reason why they are called supervised methods. In other words, supervised methods project variables into new coordinate systems, as PCA does, but aim to maximize the covariance between outcome and predictor variables rather than to explain as much variance within the predictor matrix as possible. Given one matrix of possible predictor variables (X) and one matrix of outcome variables (Y), PLS models X and Y and at the same time predicts Y from X. We used partial least squares discriminant analysis (PLS-DA) to separate the clusters initially observed in the PCA model. PLS-DA is a PLS regression in which Y is a set of binary variables coding the categories of a categorical variable (here a dichotomous one, Class A vs. Class B), and these categories are predicted by one or more latent variables derived from the original candidate predictors.14 Interestingly, the statistical associations observed in discriminating the two classes in the present study appear to be theoretically plausible.
For example, death and higher TBSA clustered in the same class. The relationship between TBSA and risk of death is well documented in the literature and is regarded as established in burn research.
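The contrast between variance-maximizing PCA and covariance-maximizing PLS can be shown with two synthetic variables: a high-variance variable unrelated to the outcome and a low-variance variable that drives it. In this hypothetical example the variables are deliberately left unscaled so that their variances differ (the study itself autoscaled all variables). The first PCA loading follows the high-variance variable, whereas the first PLS weight, X^T y normalized to unit length, follows the outcome-related one.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 224
noisy = rng.normal(scale=3.0, size=n)    # high-variance variable unrelated to the outcome
signal = rng.normal(scale=1.0, size=n)   # low-variance variable that drives the outcome
X = np.column_stack([noisy, signal])
X -= X.mean(axis=0)
y = 2.0 * signal + rng.normal(scale=0.5, size=n)
y -= y.mean()

# PCA: the first loading maximizes the variance of X @ w over unit vectors w.
w_pca = np.linalg.svd(X, full_matrices=False)[2][0]
# PLS: the first weight X^T y / ||X^T y|| maximizes |cov(X @ w, y)| over unit vectors w.
w_pls = X.T @ y / np.linalg.norm(X.T @ y)

def score_variance(w):
    return np.var(X @ w, ddof=1)

def score_covariance(w):
    return abs((X @ w) @ y) / (n - 1)
```

Each direction wins on its own criterion: the PCA loading gives the larger score variance, while the PLS weight gives the larger covariance with the outcome.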
The PLS-DA model was so parsimonious in its number of components that we did not consider using the OPLS modelling technique at this stage. For the second research question, however, our final model of choice was OPLS. We successfully applied the OPLS model to investigate variables predicting TBSA. As expected, the model produced plausible results with respect to well-known outcome factors associated with TBSA.15 As with PLS, an advantage of this model was that we could model large numbers of variables, some of them inherently highly correlated with each other. This allowed us to study up to 85 variables derived from content analysis of injury descriptions, 37 home safety variables, and other conventional variables, and helped us to explore possible predictors that previous research had not discussed, or had discussed only descriptively without modelling. Both PLS and OPLS are regarded as supervised models, but compared with regular PLS regression, OPLS provides a simpler model with the additional advantage that the orthogonal variation can be analysed separately.4
With an increasing number of components and amount of orthogonal variation, OPLS provides more interpretable and less biased results than PLS.2,4 The amount of orthogonal variation in burn injury data may not be as high as in chemometrics and omics research, but even a modest improvement in interpretability and bias may be sufficient reason to prefer OPLS to PLS.
Burn results
Our study showed that burn victims hospitalized in the Ardabil provincial burn centre could be classified into two separate groups. We called the first group the high-risk group, not only because of its poorer outcome measures but also because environmental, behavioural, and appliance-related risks coincided in this group. Previous research has identified some of these factors as risk factors for burn injury occurrence in case-control studies or has described them as possible risks.16-21 Our research showed that these factors clustered with each other and, at the same time, with poor burn outcome measures. Female gender was classified in the high-risk group. It is hard to conclude, based only on the results of this model, that female gender is associated with poorer outcomes. It must also be taken into account that many cooking-related appliances are mainly used by women, so women could be expected to fall into the same class as these appliance-related risks.
However, regardless of how strongly female gender may be associated with poorer outcomes, with respect to pre-injury factors female gender clearly belonged to a high-risk class that should be considered a target group for burn prevention. In the OPLS model of TBSA predictors, female gender was not a statistically significant predictor, although women descriptively had higher TBSA. Whatever the explanation, we believe that clarifying this question requires further research focused on the effect of gender in both design and analysis. Previous studies have discussed, sometimes with conflicting conclusions, an association between gender and burn severity outcomes, but few have investigated it while controlling for injury patterns and many other variables.18,19,22,23 Given the advantages of supervised models noted above, the multivariate OPLS analysis could also include variables created by content analysis of an open-ended question. We found that getting burned while filling a kerosene-burning appliance with fuel may be an important predictor of TBSA, a pattern discussed in certain case series.17,24,25 Before using the OPLS model we had found this pattern to be important, but only in a bivariate analysis with arbitrary dichotomization.5 Most other patterns had also not been studied through modelling before. In the present OPLS model, age was not found to be a significant TBSA predictor, a finding that must be interpreted with caution given the highly probable non-linear association between age and TBSA. Owing to methodological limitations, we could not carry out extended subgroup analyses to further clarify the roles of gender and age; we therefore strongly recommend larger studies using supervised modelling techniques.
Home safety assessment variables may nevertheless be useful in classification analysis, yielding information for possible prevention programmes, even if in some cases they are difficult to interpret when modelled as predictors of burn outcome. In this study, most home safety variables were not found to be TBSA predictors. One puzzling finding concerned the association between "knowing the police number" and TBSA. Some explanations can be proposed, e.g. that it acted as an indicator of alertness, but given our doubts about the plausibility of the finding, we prefer to regard it as the result of a type I (chance) error. Health problems such as epilepsy, dizziness, and skeletal problems were found to predict TBSA. A possible explanation is that people with these conditions may be less able than healthy people to mitigate the event. Problems or disorders affecting bodily functions or control of body movements are known risk factors for burn injuries, but their role as predictors of burn severity is not well defined in the literature, possibly owing to insufficient statistical power and the limitations of statistical methods.26-30
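The reasoning about a chance (type I) finding can be made concrete with a back-of-the-envelope calculation. Assume, for illustration only, that each of the m = 290 expanded terms in the TBSA model was judged at a significance level α = 0.05 (the text does not state the level used) and that none had a true association. Linearity of expectation then gives the expected number V of falsely flagged terms, whatever the correlation between terms:

```latex
\mathbb{E}[V] \;=\; \sum_{j=1}^{m} \Pr\left(\text{term } j \text{ flagged} \mid H_{0,j}\right) \;=\; m\,\alpha \;=\; 290 \times 0.05 \;=\; 14.5
```

Under these illustrative assumptions, an isolated and implausible association, such as the one with "knowing the police number", would not be surprising.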
The results of this study are promising, but because few studies have applied supervised models in injury epidemiology, they need to be interpreted cautiously. The statistical models used in this study yielded mostly plausible results. However, the applicability of these models needs to be investigated in injury epidemiology studies in different settings before firmer conclusions can be drawn about their usefulness in injury epidemiology or in other public health studies. Moreover, despite their many advantages over classical regression models, these models cannot fully replace classical multivariate methods. Their greater complexity, the limited experience in their application, and possibly their limited role in explaining independent causal relationships may be disadvantages of the PLS and OPLS methods. Nevertheless, if the consistency of their applicability in this field is confirmed in future studies, combining such methods with traditional models, or using them as alternatives in appropriate situations, can be recommended.
References
1. Heimbach D. Burn patients, then and now. Burns. 1999;25:1–2. doi: 10.1016/s0305-4179(98)00154-5.
2. Eriksson L, Johansson E, Wold N, et al. Multi- and Megavariate Data Analysis: Advanced Applications and Method Extensions. 1st ed. Umeå: Umetrics AB; 2006.
3. Sadeghi-Bazargani H, Mohammadi S, Banani. Using SIMCA statistical software package to apply orthogonal projections to latent structures modeling. World Automation Congress; Kobe, Japan; 2010.
4. Trygg J, Wold S. Orthogonal projections to latent structures. J Chemometrics. 2002;16:119–128.
5. Sadeghi-Bazargani H, Arshi S, Ekman R, et al. Prevention oriented epidemiology of burn injuries in Ardabil provincial burn center, Iran. Burns. 2012;38:319–329. doi: 10.1016/j.burns.2010.09.013.
6. Sadeghi-Bazargani H, Mohammadi R, Svanstrom L, et al. Epidemiology of minor and moderate burns in rural Ardabil, Iran. Burns. 2010;36:933–937. doi: 10.1016/j.burns.2009.10.022.
7. Sadeghi-Bazargani H, Mohammadi R, Arshi S. Burn specific home safety assessment and report of design effects in an Iranian population. Unpublished work; 2010.
8. Weber RP. Basic content analysis. Sage University paper series on quantitative applications in the social sciences. 2nd ed. CA: Sage; 1990.
9. Cadieux J, Roy M, Desmarais L. A preliminary validation of a new measure of occupational health and safety. J Safety Res. 2006;37:413–419. doi: 10.1016/j.jsr.2006.04.008.
10. Eriksson S, Lundquist A, Gustafson Y, et al. Comparison of three statistical methods for analysis of fall predictors in people with dementia: Negative binomial regression (NBR), regression tree (RT), and partial least squares regression (PLSR). Arch Gerontol Geriatr. 2009;9:383–389. doi: 10.1016/j.archger.2008.12.004.
11. Sadeghi-Bazargani H, Shrikant B, Mohammadi S, et al. Application of the new OPLS-DA statistical modeling technique to manage large number of variables in a burn injury case control study. Third Meeting EURO Working Group on Stochastic Modelling; Nafplio; 2010.
12. Sowa MG, Leonardi L, Payette JR, et al. Classification of burn injuries using near-infrared spectroscopy. J Biomed Opt. 2006;11:054002. doi: 10.1117/1.2362722.
13. Wold S, Sjöström M, Eriksson L. PLS-regression: a basic tool of chemometrics. Chemometrics and Intelligent Laboratory Systems. 2001;58:109–130.
14. Eriksson L, Johansson E, Wold S, et al. PLS. In: Multi- and megavariate data analysis: Advanced applications and method extensions. Umeå: Umetrics AB; 2006. pp. 63–101.
15. Gomez M, Wong DT, Stewart TE, et al. The FLAMES score accurately predicts mortality risk in burn patients. J Trauma. 2008;65(3):636–645. doi: 10.1097/TA.0b013e3181840c6d.
16. Petridou E, Trichopoulos D, Mera E, Papadatos Y, Papazoglou K, Marantos A, et al. Risk factors for childhood burn injuries: a case-control study from Greece. Burns. 1998;24(2):109–130. doi: 10.1016/s0305-4179(97)00095-8.
17. Peck MD, Kruger GE, van der Merwe AE, et al. Burns and fires from non-electric domestic appliances in low- and middle-income countries Part I. The scope of the problem. Burns. 2008;34:303–311. doi: 10.1016/j.burns.2007.08.014.
18. Othman N, Kendrick D. Epidemiology of burn injuries in the East Mediterranean Region: A systematic review. BMC Public Health. 2010;10:83. doi: 10.1186/1471-2458-10-83.
19. Forjuoh SN. Burns in low- and middle-income countries: A review of available literature on descriptive epidemiology, risk factors, treatment, and prevention. Burns. 2006;32:529–537. doi: 10.1016/j.burns.2006.04.002.
20. Dissanaike S, Boshart K, Coleman E, et al. Cooking-related pediatric burns: Risk factors and the role of differential cooling rates among commonly implicated substances. J Burn Care Res. 2009;30:593–598. doi: 10.1097/BCR.0b013e3181ac02c8.
21. Mashreky SR, Rahman A, Khan TF. Determinants of childhood burns in rural Bangladesh: A nested case-control study. Health Policy. 2010;2. doi: 10.1016/j.healthpol.2010.02.004.
22. Vico P, Papillon J. Factors involved in burn mortality: A multivariate statistical approach based on discriminant analysis. Burns. 1992;18:212–215. doi: 10.1016/0305-4179(92)90071-2.
23. Sharma PN, Bang RL, Al-Fadhli AN, et al. Paediatric burns in Kuwait: Incidence, causes and mortality. Burns. 2006;32:104–111. doi: 10.1016/j.burns.2005.08.006.
24. Grange AO, Akinsulie AO, Sowemimo GO. Flame burns disasters from kerosene appliance explosions in Lagos, Nigeria. Burns Incl Therm Inj. 1988;14(2):147–150. doi: 10.1016/0305-4179(88)90223-9.
25. Gupta M, Bansal M, Gupta A, et al. The kerosene tragedy of 1994, an unusual epidemic of burns: epidemiological aspects and management of patients. Burns. 1996;22:3–9. doi: 10.1016/0305-4179(95)00082-8.
26. Al-Qattan MM. Burns in epileptics in Saudi Arabia. Burns. 2000;26:561–563. doi: 10.1016/s0305-4179(00)00011-5.
27. Allorto NL, Oosthuizen GV, Clarke DL. The spectrum and outcome of burns at a regional hospital in South Africa. Burns. 2009;35:1004–1008. doi: 10.1016/j.burns.2009.01.004.
28. Ansari Z, Brown K, Carson N. Association of epilepsy and burns: A case control study. Aust Fam Physician. 2008;37:584–589.
29. Berrocal M. Burns and epilepsy. Acta Chir Plast. 1997;39:22–27.
30. Al-Qattan SMM. Accidental contact burns of the upper limb in children with obstetric brachial plexus injury. Burns. 1999;25:669–672. doi: 10.1016/s0305-4179(99)00043-1.