ABSTRACT
The Institute of Medicine proposed six aims for healthcare quality improvement. Nevertheless, trauma care quality research still focuses on one aim at a time. This research investigates how to incorporate all aims into trauma care quality assessments using data from the Michigan Trauma Quality Improvement Program. Through a literature review, we identified quantifiable metrics for most aims, except for equity and patient-centeredness. We proposed two approaches to build composite scores accounting for equity via an adjustment procedure based on observed disparities. The single- and multi-aim approaches were compared through correlation, concordance of trauma centre categorisations, and hypothetical incentives. The differences in the approaches stemmed mainly from the weights allocated to the different aims. Results indicated the potential value of multi-aim quality assessment and provided insights about implementation challenges and opportunities. The methods are applicable to the preferred metrics; nevertheless, further research is needed in measuring patient-centeredness.
KEYWORDS: Quality, trauma care, composite
1. Introduction
Since the quality of US healthcare was reported lower than expected (Cantor et al., 2007; Institute of Medicine, 2001), the efforts to improve quality and reduce costs across domains have increased considerably. This paper focuses on the quality of trauma care. Trauma centres (TC) are hospitals capable of delivering care to patients with serious or critical injury within a community (Bailey et al., 2012). Trauma care is provided by a team of physicians and nursing staff, and may involve assessment, resuscitation, stabilisation, surgery, and intensive care (Soto et al., 2018). US trauma care quality has also been reported below expectations over the years, as evidenced by high costs related to medical treatment and lost productivity (Corso, 2006), as well as preventable medical errors with their related deaths (Gruen et al., 2006; Stelfox et al., 2010). Moreover, in 2013 it was reported that more than 3 million people with injuries were admitted to a hospital and more than 192,000 of them died (Centers for Disease Control and Prevention, National Center for Injury Prevention and Control, 2018). Therefore, there is an imperative need to improve the quality of trauma care.
Incentive programmes and external benchmarking are some of the most common efforts to improve the quality of health care in the US (Berwick et al., 2008; Cantor et al., 2007). In the US, the Trauma Quality Improvement Program (TQIP) is an initiative of the American College of Surgeons (ACS) to improve the quality of trauma care (Shafi et al., 2009) through external benchmarking and self-evaluation (Hemmila et al., 2010). The Premier Hospital Quality Incentive Demonstration (HQID) project, for example, is a pay-for-performance programme from the Centers for Medicare & Medicaid Services (CMS) that offers financial incentives to improve the quality of care in hospitals (Premier Inc., 2006).
The Institute of Medicine (IOM) proposed the following six aims to guide improvement efforts in healthcare: effectiveness, efficiency, safety, timeliness, equity, and patient-centeredness (Institute of Medicine, 2001). Nevertheless, progress towards using all six aims to evaluate and improve the quality of patient care has been insufficient (Berwick et al., 2008). For trauma care in particular, most published quality assessment and benchmarking studies focus mainly on one quality aim at a time. Most recent studies use mortality or risk-adjusted mortality rates (Hashmi et al., 2014; Hemmila et al., 2010; Nathens et al., 2012; Santana & Stelfox, 2012, 2014; Shafi et al., 2009; Sharma et al., 2013), which are measures of effectiveness (Santana & Stelfox, 2014). Some other studies have evaluated length of stay (LOS) (Moore et al., 2014; Nathens et al., 2012; Santana & Stelfox, 2012; Shafi et al., 2010; Willis et al., 2007), which has been associated with efficiency (Bradley et al., 2016). While thousands of quality indicators (QIs) have been either proposed or used to assess trauma care (Santana & Stelfox, 2012; Stelfox et al., 2010), most of those QIs lack supporting evidence or have not been proved to be reliable or valid (Santana & Stelfox, 2012; Stelfox et al., 2010). Santana and Stelfox (2014) developed a set of evidence-informed quality indicators of adult injury care with documented content validity. Nineteen of these measures were hospital QIs representing most IOM aims, except for equity and patient-centeredness. In fact, the aims of equity and patient-centeredness are often underrepresented in the trauma care performance literature when compared to the other aims. This is aggravated by registry data quality issues limiting the usability of such data in assessing performance (Porgo et al., 2016).
While patient-centeredness is difficult to measure at the systems level given the lack of documented information on patients’ preferences, equity can be measured by quantifying disparities in other baseline metrics. Most studies on health disparities are found in the public health domain, where a disparity is generally measured as a function of the difference between the values of a selected QI for disadvantaged and advantaged groups (Braveman, 2006). Although there is still a lack of consensus about the factors that should be considered to assess equity in trauma care (Udyavar et al., 2018), socioeconomic characteristics and racial/ethnic disparities have been frequently associated with differences in treatment, outcomes, and access to trauma care (Hsia et al., 2010; Rosen et al., 2009; Shafi et al., 2007).
In healthcare domains other than trauma, a common approach to perform a multidimensional analysis of performance is to use a composite score that combines more than one QI (e.g., the CMS HQID method) (Premier Inc., 2006; Jha et al., 2007; O’Brien et al., 2007; Reeves et al., 2007; Shahian et al., 2007; Shwartz et al., 2015). In trauma care settings, Willis et al. (2010) concluded that two denominator-based weight composite approaches and an approach based on principal components analysis demonstrated construct validity; these approaches can therefore be useful in evaluating quality at an institutional level with more than one QI. A different approach based on decision theory included up to four aims, but it could not be used if the aim-metrics had high variability (Aragon et al., 2018).
The goal of this research was to investigate how to incorporate the IOM aims into the evaluation of trauma care quality. We identified evidence-based measures per each quality aim that could be calculated using data from the Michigan TQIP (MTQIP) and explored potential metrics for the missing aims. We used two methods to integrate the aim-measures into a single measure: the Individual Hospital Weighted Average method (Shwartz et al., 2008; Willis et al., 2010) and the HQID method (Premier Inc., 2006; O’Brien et al., 2007). Both methods were selected due to their popularity in healthcare applications, making it easier for potential users, including practitioners and payers, to understand, improve upon, and adopt them in the future. In particular, the first method has been validated for trauma care (Shwartz et al., 2008; Willis et al., 2010) and provides a three-way categorisation of performance levels similar to that of the traditional mortality analysis, even though it requires some transformation of the original metrics, particularly the O/E mortality ratio. The second method was selected because it allows the inclusion of the O/E mortality ratio in its original form. We then compared these methods in terms of their data pre-processing needs, outputs, and implications for TCs and payers.
2. Materials and methods
2.1. Data
This study used data from the MTQIP database which, as of 2015, included patients admitted to 27 Level 1 TCs. The analysis focused on patients (age ≥ 18) admitted from 2012 to 2014 with an injury mechanism classified as either blunt or penetrating, Injury Severity Score (ISS) ≥ 5, and Hospital LOS ≥ 1 day. Patients who had no signs of life or were dead on arrival were excluded. Also, incomplete records, those with empty values in the fields used in the QI calculations, were excluded. The data was de-identified before analysis by randomly replacing the TCs’ identification number with letters.
2.2. Quality indicators per aim
Out of the 19 QIs proposed by Santana and Stelfox (2014), the dataset allowed for the calculation of six (Table 1). These included one indicator within effectiveness (observed-to-expected (O/E) mortality ratio), four indicators associated with timeliness (time to CT scan (TCT), time to acute subdural haematoma evacuation (TAS), time to ischaemic limb treatment (TIL), and deep vein thrombosis prophylaxis), and one indicator representing safety (tracheal intubation (TIN)). Deep vein thrombosis prophylaxis was excluded from the analysis because it had very few cases per year and TC. Metrics for the remaining aims were identified from the literature and selected based on the availability of associated data in the MTQIP database. Since LOS is one of the most common QIs currently used to evaluate trauma care and has been associated with efficiency, it was selected to represent this aim (Bradley et al., 2016). We found no articles measuring equity in trauma care; thus, we built on the absolute difference (AD) (Cheng et al., 2008; Ontario Agency for Health Protection and Promotion (Public Health Ontario), 2013), a QI commonly used to assess health inequalities in various domains (Braveman, 2006; Ontario Agency for Health Protection and Promotion (Public Health Ontario), 2013). No patient-centeredness QIs that could be calculated using MTQIP data were identified. Thus, patient-centeredness was not included in this study.
Table 1.
Quality indicators per aim
Quality Aim |
||||||||
---|---|---|---|---|---|---|---|---|
Quality Indicator with documented content validity | Safety | Timeliness | Efficiency | Effectiveness | Patient-Centeredness | Equity | Available in trauma registries | |
Hospital Indicators | Direct Admission to Emergency Department Shock Room | x | ||||||
Trauma Team Activation | x | |||||||
Tracheal Intubation | x | x | ||||||
Time to CT Scan | x | x | ||||||
Antibiotics for Open Fracture | x | |||||||
Massive transfusion protocol | x | |||||||
Massive Transfusion Protocol Activation | x | x | ||||||
Definitive Bleeding Control | x | |||||||
Time to Acute Subdural Haematoma Evacuation | x | x | ||||||
Time to Ischaemic Limb Treatment | x | x | ||||||
Treatment of Joint Dislocation | ||||||||
Non-Trauma/Surgical Service Admissions | ||||||||
Deep Vein Thrombosis Prophylaxis | x | x | x | |||||
Tertiary Survey | x | x | ||||||
Spine Evaluation | x | x | ||||||
Unplanned Intensive Care Unit Admission | x | x | ||||||
Adverse Event Rate | ||||||||
Mortality Rate | x | x | ||||||
Protocol for Peer Review and Reporting of Quality of Injury Care | x |
2.3. Single-aim analysis
We analysed TC performance along each aim using caterpillar graphs, which are traditionally used by trauma care research studies (Hashmi et al., 2014; Sharma et al., 2013). In a caterpillar graph, the centerline is based on an estimator of the expected value of the measure being evaluated. TCs are categorised from the caterpillar graph as follows: a low-performing TC has an upper confidence bound lower than the centerline, an average-performing TC has a confidence interval overlapping with the centerline, and a high-performing TC has a lower confidence bound greater than the centerline (Hashmi et al., 2014).
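The categorisation rule can be illustrated with a minimal sketch. The function name, the interval values, and the centerline of 1.0 below are hypothetical and are not taken from the MTQIP data; the rule is implemented exactly as worded above, which presumes that higher values of the measure indicate better performance.

```python
def categorise(lower, upper, centerline):
    """Return the performance category of one trauma centre.

    Implements the rule as stated in the text: the whole interval below the
    centerline is LOW, the whole interval above it is HIGH, and an interval
    that overlaps the centerline is AVERAGE.
    """
    if upper < centerline:
        return "LOW"
    if lower > centerline:
        return "HIGH"
    return "AVERAGE"


# Hypothetical 95% confidence intervals (lower, upper) for three centres.
intervals = {"A": (0.70, 0.95), "B": (0.90, 1.10), "C": (1.05, 1.30)}
categories = {tc: categorise(lo, hi, 1.0) for tc, (lo, hi) in intervals.items()}
print(categories)
```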
For the aim of effectiveness, we used the O/E mortality ratio and calculated its 95% CI using the method proposed by Ury and Wiggins (1985). Expected (E) mortality was calculated based on a multivariate logistic regression analysis (Hashmi et al., 2014). Age, sex, and ISS, among other patient level covariates, were used to risk adjust. To study the performance along the aim of efficiency, we used an O/E LOS ratio. The expected LOS was calculated based on a multiple linear regression analysis using a natural-log transformation of LOS (Moore et al., 2017, 2014). Age, sex, mechanism of injury, Injury Severity Score, maximum AIS score (MAIS) in each body region injured, GCS, transfer status, initial systolic BP in ED, initial pulse rate in ED, race and ethnicity, and payment type, among other patient level covariates, were used to risk adjust. The central line and interval bounds were estimated in a similar manner as those for the O/E mortality ratio.
The analysis of the aim of timeliness involved more than one QI. Therefore, to avoid redundancy, we first analysed the available QIs for correlation using the Pearson correlation coefficient (r). Then, a metric was constructed to assess timeliness by combining its available independent QIs through the Individual Hospital Weighted Average method (Shwartz et al., 2008; Willis et al., 2010). The method requires QIs in a ratio form in which the denominator represents the patients who are eligible for the intervention and the numerator represents the patients who actually received the intervention (Shwartz et al., 2008). However, TAS and TIL are defined in units of time from an initial event until the procedure was performed. Thus, these two QIs were transformed into a ratio using the 4-hour threshold recommended by the ACS Committee on Trauma (American College of Surgeons Committee on Trauma, 2006) and an expert panel (Santana & Stelfox, 2014) for TAS and TIL, respectively. The centerline was calculated using the overall timeliness metric of all TCs combined. The confidence bounds were calculated using a 95% CI based on the normal approximation of the binomial distribution.
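A minimal sketch of the time-to-ratio transformation and the normal-approximation interval follows. The times are hypothetical, the function names are illustrative, and treating a time exactly equal to the threshold as compliant is an assumption, since the text does not state how ties are handled.

```python
import math


def threshold_ratio(times_hours, threshold=4.0):
    """Count eligible encounters and those treated within the threshold.

    Assumption: a time equal to the threshold counts as meeting it.
    """
    eligible = len(times_hours)
    met = sum(1 for t in times_hours if t <= threshold)
    return met, eligible


def binomial_ci(successes, n, z=1.96):
    """95% CI for a proportion using the normal approximation to the binomial."""
    p = successes / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    # Clip to [0, 1] because the approximation can overshoot for small n.
    return max(0.0, p - half_width), min(1.0, p + half_width)


# Hypothetical times (hours) from the initial event to the procedure.
times = [1.5, 3.0, 4.5, 2.0, 6.0, 3.5, 3.9, 5.2, 2.8, 1.0]
met, eligible = threshold_ratio(times)
low, high = binomial_ci(met, eligible)
print(met, eligible, round(low, 3), round(high, 3))
```

With only ten encounters the interval is wide, which is consistent with the exclusion of QIs that had very few cases per year and TC.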
For the aim of safety, we used TIN. Since TIN was also a ratio indicator, its confidence bounds and centerlines were calculated similarly to those of the timeliness metric.
To study TC performance along the aim of equity, the 95% CI of AD was calculated based on the method specified by Chen et al. (2008) and the centerline was calculated as the average of AD. AD requires a baseline metric to compare differences among groups of interest. We illustrate the use of the AD metric in the single-aim analysis using the O/E LOS ratio of two groups. The O/E LOS ratio was selected because it allowed the inclusion of the greatest number of TCs in the analysis, but any preferred metric could be used as a baseline. Note that the choice of any single metric as the baseline for AD may bias the conclusions. The groups to compare were determined using race/ethnicity (Haider et al., 2013; Torain et al., 2016; Udyavar et al., 2018). In particular, groups of white and black patients were compared in the analysis because they had the highest relative participation in the dataset (86% and 9%, respectively). Patients of all other racial/ethnic groups (Asian, Native Hawaiian or other Pacific Islander, American Indian, Hispanic, other race) each represented less than 2% of the data.
2.4. Multi-aim analysis
We identified and evaluated two composite methods that can allow for the inclusion of more than one aim into the analysis of trauma care performance. These methods were based on the Individual Hospital Weighted Average method (Shwartz et al., 2008; Willis et al., 2010) and the HQID method (Premier Inc., 2006; O’Brien et al., 2007), respectively.
In the equity-adjusted Individual Hospital Weighted Average (E-WA) O/E method, all aim-metrics were calculated or transformed so that higher values could denote good performance (Evans, 2017). Particularly for effectiveness, we used survival rate instead of mortality rate, which represents the ratio of patients who survived (1- observed deaths) (O’Brien et al., 2007) divided by the total number of patients. In addition, all the aim-metrics, except equity, were included in the composite as ratios. LOS, originally defined in days, was transformed into a ratio using a set of thresholds defined by the corresponding expected LOS for each particular patient encounter. The equity metric could not be transformed to a ratio-type indicator. Therefore, we incorporated equity into the analysis by adjusting the composite obtained from all the other available aims.
The E-WA O/E composite for each TC was calculated as shown in Equation (1), where $\text{WA O/E Comp}_{hs}$ is the individual hospital weighted average composite calculated for the hth trauma centre (h ϵ H) and for the sth group of interest (s ϵ S/all) required for the equity adjustment. In this study, H = {A, B, C, …, P} and S = {black, white, all}. The group “all” represents the full set of patients.
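$$\text{E-WA O/E Comp}_{h} = \begin{cases} \min_{s \in S \setminus \{\text{all}\}} \text{WA O/E Comp}_{hs}, & \text{if the group composites differ significantly} \\ \text{WA O/E Comp}_{h,\text{all}}, & \text{otherwise} \end{cases} \tag{1}$$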
To adjust for equity, Equation (1) calculates the composite as follows. For each TC, composites along with their CIs are calculated separately for two or more groups of interest related to the equity aim (e.g., black and white patients). The CIs are used to determine if the difference between groups’ performance is significant or not. If one of the two groups has a significantly lower composite value, then that lower value is used to represent the performance of the corresponding TC. If there is no significant difference in the composite values between the two groups considered, then the composite value for all patients regardless of their group (e.g., all races) is used as the performance measure for the TC.
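The selection rule described above can be sketched as follows. The text states that the CIs are used to judge significance but does not give the exact test, so treating non-overlapping intervals as a significant difference is an assumption of this sketch; the function names and all values are hypothetical.

```python
def intervals_overlap(a, b):
    """True when closed intervals a=(low, high) and b=(low, high) share a point."""
    return a[0] <= b[1] and b[0] <= a[1]


def equity_adjusted(group_results, overall_value):
    """Pick the composite that represents one trauma centre.

    group_results maps two group names to {"value": ..., "ci": (low, high)}.
    Assumption: the groups differ significantly when their CIs do not overlap.
    """
    first, second = group_results.values()
    if intervals_overlap(first["ci"], second["ci"]):
        return overall_value  # no significant disparity: use all patients
    return min(first["value"], second["value"])  # use the lower-performing group


# Hypothetical composites for two centres.
tc_disparity = {
    "black": {"value": 0.85, "ci": (0.80, 0.90)},
    "white": {"value": 1.02, "ci": (0.98, 1.06)},
}
tc_no_disparity = {
    "black": {"value": 0.97, "ci": (0.90, 1.04)},
    "white": {"value": 1.01, "ci": (0.97, 1.05)},
}
adjusted_a = equity_adjusted(tc_disparity, overall_value=1.00)
adjusted_b = equity_adjusted(tc_no_disparity, overall_value=1.00)
print(adjusted_a, adjusted_b)
```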
The composite $\text{WA O/E Comp}_{hs}$ was calculated as shown in Equation (2), where $\bar{O}_{hs}$ is the proportion of encounters in which the corresponding quality interventions associated with the aims included in the analysis are observed and $\bar{E}_{hs}$ is the proportion of encounters in which the corresponding quality interventions associated with the same aims are expected. These observed and expected values were calculated as shown in Equations (3) and (4), respectively, where $n_{qhs}$ represents the number of encounters that are eligible for the intervention(s) associated with the qth quality aim (q ϵ Q) in the hth trauma centre (h ϵ H) for the sth group (s ϵ S). $O_{qhs}$ represents the number of encounters in which the qth aim intervention is received within the hth trauma centre for the sth group (s ϵ S). In this study, Q = {effectiveness, efficiency, timeliness, safety}. $E_{qhs}$ is the expected number of encounters in which the qth aim intervention should have been received within the hth trauma centre for the sth group. For all aims except the effectiveness and equity aims, $E_{qhs}$ was calculated by multiplying the overall ratio by $n_{qhs}$ as shown in Equation (5). The expected value for the survival rate ($E_{\text{Effectiveness},h}$) was calculated as the expected survival rate (1 – Expected deaths), where expected deaths was calculated as the expected mortality (E) multiplied by $n_{\text{Effectiveness},h}$.
$$\text{WA O/E Comp}_{hs} = \frac{\bar{O}_{hs}}{\bar{E}_{hs}} \tag{2}$$
$$\bar{O}_{hs} = \frac{\sum_{q \in Q} O_{qhs}}{\sum_{q \in Q} n_{qhs}} \tag{3}$$
$$\bar{E}_{hs} = \frac{\sum_{q \in Q} E_{qhs}}{\sum_{q \in Q} n_{qhs}} \tag{4}$$
$$E_{qhs} = \frac{\sum_{h' \in H} O_{qh's}}{\sum_{h' \in H} n_{qh's}} \, n_{qhs} \tag{5}$$
The 95% CI for the E-WA O/E method was calculated using the method proposed by Ury and Wiggins (1985).
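As a minimal sketch of the composite described above, the following computes the observed proportion divided by the expected proportion for one TC and one patient group. The function name and all counts are hypothetical, and the expected counts are supplied directly rather than derived from a risk-adjustment model or from the overall ratio.

```python
def wa_oe_composite(observed, expected, eligible):
    """Observed-to-expected composite across aims for one TC and one group.

    Each argument maps an aim name to a count: encounters in which the
    intervention was received, encounters in which it was expected, and
    encounters eligible for it.
    """
    total_eligible = sum(eligible.values())
    observed_proportion = sum(observed.values()) / total_eligible
    expected_proportion = sum(expected.values()) / total_eligible
    return observed_proportion / expected_proportion


# Hypothetical counts for a single trauma centre (all patients).
observed = {"effectiveness": 950, "efficiency": 600, "timeliness": 80, "safety": 45}
expected = {"effectiveness": 940, "efficiency": 580, "timeliness": 90, "safety": 50}
eligible = {"effectiveness": 1000, "efficiency": 1000, "timeliness": 120, "safety": 60}

composite = wa_oe_composite(observed, expected, eligible)
print(round(composite, 4))
```

Because the total number of eligible encounters cancels, aims with many eligible patients (here effectiveness and efficiency) contribute most of both sums and therefore dominate the composite.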
The equity-adjusted HQID (E-HQID) composite was calculated using a similar approach to the one used in the E-WA O/E method. We first calculated the HQID composite for all groups in addition to the composite for each specific group of interest. Then we selected the measure based on the presence of significant differences as shown in Equation (6).
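$$\text{E-HQID Comp}_{h} = \begin{cases} \min_{s \in S \setminus \{\text{all}\}} \text{HQID Comp}_{hs}, & \text{if the group composites differ significantly} \\ \text{HQID Comp}_{h,\text{all}}, & \text{otherwise} \end{cases} \tag{6}$$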
The HQID composite has two components as shown in Equation (7). The first component corresponds to the effectiveness aim (survival rate) and the second component combines the metrics for the aims of safety, timeliness, and efficiency. Each component is weighted by the number of aims in each component divided by the total number of aims considered (Shwartz et al., 2008; Willis et al., 2010).
(7) |
Since the distributional properties of the E-HQID composite were unknown, the CIs were calculated using bootstrap resampling based on percentiles. The composite for each TC was computed from 1,000 bootstrap samples per aim-metric (Carpenter & Bithell, 2000; Moore et al., 2013). The centerline is considered the overall E-HQID composite for all TCs.
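A percentile bootstrap interval can be sketched as below. This simplified example resamples the encounters of a single hypothetical binary metric and uses its mean as the statistic, whereas the study resampled each aim-metric and recomputed the full composite; the function name, the fixed seed, and the data are assumptions made for illustration.

```python
import random


def bootstrap_percentile_ci(values, statistic, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for a statistic of a list of observations."""
    rng = random.Random(seed)  # fixed seed only so the sketch is repeatable
    n = len(values)
    replicates = sorted(
        statistic([values[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_boot)
    )
    # Take the empirical alpha/2 and 1 - alpha/2 percentiles of the replicates.
    lower_index = int(n_boot * alpha / 2)
    upper_index = n_boot - lower_index - 1
    return replicates[lower_index], replicates[upper_index]


def mean(xs):
    return sum(xs) / len(xs)


# Hypothetical encounters: 1 = intervention received, 0 = not received.
encounters = [1] * 180 + [0] * 20
lower, upper = bootstrap_percentile_ci(encounters, mean)
print(lower, upper)
```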
By definition, both methods are equivalent to a weighted average of the original aim-metrics included. In each composite, those weights depend not only on the form of the metric, but also on the data composition. In the E-WA O/E method, the weight of the effectiveness aim is given by $E_{\text{Effectiveness},h}$ divided by the sum of $E_{qhs}$ over all quality aims, including effectiveness. In the same method, the weight of each of the remaining original aim-metrics is given by the corresponding $n_{qhs}$ divided by the sum of $E_{qhs}$ over all quality aims, including effectiveness. In the E-HQID method, the weight of the effectiveness aim is fixed at ¼ when including the same number of aims as included in this study. In general, the weight for effectiveness in this approach is given by 1/(||Q||-1). In the same method, the weight of each remaining original aim-metric is given by ¾ of the corresponding $n_{qhs}$ divided by the sum of $n_{qhs}$ over all the quality aims except effectiveness.
The relationship between the original aim-metrics and the composites themselves was analysed using the Pearson correlation coefficient (r). In addition, the degree of agreement between the resulting TC categorisations for the different methods studied was analysed using the Rand Index (RI) (Rand, 1971; Warrens, 2008). Agreement based on RI was interpreted as follows: < .2 poor, < .4 fair, < .6 moderate, < .8 good, and ≥ .8 to 1 very good, similar to the guidelines used for a common inter-agreement coefficient (Landis & Koch, 1977). Also, the multi-aim methods were compared using an incentive programme based on the CMS HQID project (Premier Inc., 2006) and the relationships of the weights allocated to each aim.
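For illustration, the Rand Index between two categorisations is the share of TC pairs that both categorisations treat the same way, either placing the pair in the same category or in different categories. The sketch below uses four hypothetical TCs rather than the study data, and the function names are illustrative.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of item pairs on which two categorisations agree."""
    pairs = list(combinations(range(len(labels_a)), 2))
    agreements = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agreements / len(pairs)


def agreement_level(ri):
    """Map an RI value to the agreement labels listed in the text."""
    if ri < 0.2:
        return "poor"
    if ri < 0.4:
        return "fair"
    if ri < 0.6:
        return "moderate"
    if ri < 0.8:
        return "good"
    return "very good"


# Hypothetical categorisations of four trauma centres by two methods.
method_1 = ["LOW", "LOW", "AVERAGE", "HIGH"]
method_2 = ["LOW", "AVERAGE", "AVERAGE", "HIGH"]
ri = rand_index(method_1, method_2)
print(round(ri, 2), agreement_level(ri))
```

In this example four of the six pairs are treated consistently, so the RI is about .67, which falls in the "good" band.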
3. Results
The dataset used to perform the comparisons included 16 TCs and 24,794 patients who met the inclusion criteria. Approximately half of the patients in the dataset were younger than 65 years (53%), and approximately half were male patients (53%). Most of the patients in the dataset presented blunt injury (99%) and a small percentage of patients in the dataset had an ISS ≥ 25 (6%).
3.1. Single-aim analysis
Correlation coefficients were low or moderate for all pairs of aim-metrics in all years (|r| < .50), and none of the linear correlations was significant (significance level of 0.01). Thus, we concluded that combining all aims could add value to the analysis. Figures 1 and 2 illustrate the single-aim analyses using data from 2014. Effectiveness resulted in a categorisation almost identical to those of efficiency and equity, except for one TC having a different categorisation (RI = .76). Effectiveness’ categorisations were most similar to safety (RI = .66, good agreement), followed by timeliness (RI = .29, fair agreement), and efficiency (RI = .28, fair agreement). In 2012 and 2013, effectiveness resulted in moderately different categorisations from those of all the other aims (RI = 0.45–0.56, moderate agreement). In addition, there were no perfect or good agreements found between the categorisations of any of the other aims. If we had used mortality instead of LOS as the baseline for AD, then the results would have been approximately similar to those using AD based on the O/E LOS ratio in 2012, 2013, and 2014. Correlations between the other aims were different for both approaches in all years but remained non-significant (significance level of 0.01). However, AD based on mortality would have only allowed for the inclusion of 6, 5, and 7 trauma centres in 2012, 2013, and 2014, respectively, changing the overall context of analysis.
Figure 1.
(a), Caterpillar graph of Mortality O/E ratios with 95% CIs. (b), Caterpillar graph of LOS O/E ratio with 95% CIs.
Figure 2.
(a), Caterpillar graph of TIN with 95% CIs. (b), Caterpillar graph of Timeliness composite with 95% CIs. (c), Caterpillar graph of AD with 95% CIs.
3.2. Multi-aim analysis
Using the E-WA O/E method, TCs were distributed over the three categories in each year of analysis. For example, out of the 16 TCs analysed in 2014, two were categorised as low performers, nine as average performers, and five as high performers (Figure 3 (a)). The E-WA O/E method categorisation had the highest agreement with timeliness (2013 and 2014: RI = .63) and safety (2012: RI = .61), and had decreasing agreement with the effectiveness, efficiency, and equity aims in all years (RI < .48).
Figure 3.
(a), Caterpillar graph of the E-WA O/E composite method with 95% CIs. (b), Caterpillar graph of E-HQID composite method with 95% CIs.
The E-HQID method resulted in three categories in each year of analysis. For example, in 2014, five TCs were categorised as low performers, five as average performers, and six as high performers (Figure 3 (b)). This categorisation was most consistent with timeliness (2014: RI = .71; 2013: RI = .72; 2012: RI = .67) and least consistent with the effectiveness, efficiency, and equity aims in all years (RI < .48).
The E-WA O/E and the E-HQID composite scores demonstrated moderate positive correlation (r = .51, p = .04) as well as good agreement between the corresponding categorisations in all years of analysis (2014: RI = .71; 2013: RI = .80; 2012: RI = .79). A summary of the categorisations resulting from single and composite measures for 2014, sorted by TC, can be found in Table 2.
Table 2.
Performance results, single-aim and multiple-aim approaches (2014)
TC† | Effectiveness | Efficiency | Safety | Timeliness | Equity | E-WA O/E composite | E-HQID composite |
---|---|---|---|---|---|---|---|
A | AVERAGE | AVERAGE | LOW | LOW | AVERAGE | LOW | LOW |
B | AVERAGE | HIGH | AVERAGE | LOW | AVERAGE | LOW | LOW |
C | AVERAGE | HIGH | AVERAGE | HIGH | AVERAGE | AVERAGE | AVERAGE |
D | AVERAGE | AVERAGE | AVERAGE | AVERAGE | HIGH | AVERAGE | LOW |
E | AVERAGE | LOW | AVERAGE | AVERAGE | AVERAGE | AVERAGE | AVERAGE |
F | AVERAGE | AVERAGE | AVERAGE | LOW | AVERAGE | AVERAGE | LOW |
G | AVERAGE | AVERAGE | AVERAGE | LOW | AVERAGE | LOW | LOW |
H | AVERAGE | LOW | AVERAGE | LOW | AVERAGE | AVERAGE | AVERAGE |
I | AVERAGE | LOW | AVERAGE | HIGH | AVERAGE | HIGH | HIGH |
J | AVERAGE | LOW | AVERAGE | HIGH | AVERAGE | HIGH | HIGH |
K | AVERAGE | LOW | AVERAGE | HIGH | AVERAGE | HIGH | HIGH |
L | AVERAGE | LOW | AVERAGE | AVERAGE | AVERAGE | HIGH | HIGH |
M | AVERAGE | LOW | LOW | HIGH | AVERAGE | HIGH | HIGH |
N | AVERAGE | HIGH | AVERAGE | AVERAGE | AVERAGE | AVERAGE | LOW |
O | AVERAGE | LOW | AVERAGE | LOW | AVERAGE | AVERAGE | |
P | AVERAGE | HIGH | AVERAGE | AVERAGE | AVERAGE | AVERAGE | AVERAGE |
†TC: trauma centre
The comparison in relation to an incentive programme showed that the E-WA O/E and E-HQID methods could have different monetary impact for some TCs over a three-year analysis, assuming a fixed annual payment of 100 USD per TC per year (Table 3). Three (20.0%) TCs received incentives regardless of the multi-aim method, but only one (6.6%) received the same amount. One (6.6%) TC received penalties regardless of the multi-aim method. The use of the effectiveness metric for the incentive programme resulted in more and mostly different TCs penalised in relation to any of the multi-aim methods. One (6.6%) TC received the same amount in penalties as the E-HQID method. The payer had different overall expenditures for the different multi-aim methods considered. However, using effectiveness only to guide performance evaluation resulted in savings to the payer.
Table 3.
Incentive (penalty) for trauma centres and payer for a period of three years of analysis (values in $100)
Method | A | B | C | D | E | F | G | H | I | J | K | L | M | N | P | Payer |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Effectiveness only | 2 | (1) | (2) | (2) | 0 | 2 | 0 | (2) | 2 | (2) | (2) | (1) | 1 | 2 | (1) | 4 |
E-WA O/E Composite | (1) | (2) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 6 | 1 | 5 | 0 | 0 | (12) |
E-HQID Composite | 0 | (1) | 0 | (1) | 0 | 0 | 3 | 0 | 0 | 0 | 6 | 2 | 3 | (1) | 0 | (11) |
Columns A to P are trauma centres.
The comparison in relation to the weights allocated to each aim indicated that in the E-WA O/E method, the effectiveness and efficiency aims had similar weights in all years of analysis (the efficiency weight was 101.5% of the effectiveness weight with a standard deviation (SD) of 6.1%), but those weights were consistently higher than the weights given to the timeliness and safety aims. The safety aim had similar weight in all years (17.6% of the effectiveness weight with a SD of 1.3%). The timeliness weight was similar in 2013 and 2014 (72.0% of the effectiveness weight with a SD of 0.7%) but smaller in 2012 (6.6% of the effectiveness weight). The E-HQID method, on the other hand, tended to give the highest weight to efficiency (158.3% of the effectiveness weight with a SD of 5.6%) and timeliness (113.9% of the effectiveness weight with a SD of 5.8%) in 2013 and 2014. The safety aim had smaller but similar weight in all years (30.9% of the effectiveness weight with a SD of 4.4%). The weights for efficiency and timeliness were different in 2012 (247.6% and 15.1% of the effectiveness weight, respectively) than other years due to variations in the data composition.
4. Discussion
According to the IOM, integrating all quality aims into performance evaluation inherently provides a better representation of the multidimensional nature of healthcare quality, regardless of the method used. The main contribution of this paper was to provide a starting point in bridging the gap between the IOM recommendations and practice, particularly focusing on trauma care quality and the data available in the MTQIP. This paper explored composite methods based on current research and practice and identified opportunities and challenges in operationalising such recommendations.
The multi-aim methods studied produced different TC categorisations than those obtained using the corresponding single-aim analyses. These differences illustrated the potential gain of information when incorporating multiple dimensions into the analysis and confirmed previous findings indicating that only one outcome may not be sufficient to assess the performance of TCs (Hashmi et al., 2014; Shahian et al., 2012; Stelfox et al., 2010). Our study showed that the methods used are equivalent to a weighted average of the baseline metrics in which some natural or unintended weights allocated to each aim-metric depend on both the method itself and the dataset composition. In the registry data available for this analysis, the effectiveness and efficiency aims had significantly higher numbers of eligible patients than the safety and timeliness aim-metrics. Thus, effectiveness and efficiency tended to dominate the assessments. This finding can inform the selection of either of the methods presented here as well as justify the development of new methods that integrate aim-metrics considering more traditional, artificial weighting approaches that reflect other decision-makers’ preferences. While there is no official ranking of the aims given by the IOM to guide the selection of weights, there may be situations in which alternative weightings are desired. For example, patients seeking to make informed decisions about their care may wish to assess different healthcare provider options with a metric that reflects their individual definitions of quality expressed as different weights per aim. Similarly, TCs may prefer a different weighting scheme to favour advertising efforts.
The results of this study provided evidence of the effects of each method when evaluating trauma care quality from different perspectives and the anticipated sources of resistance to change given the choice of metric. The composite methods considered were compared in terms of external benchmarking and in terms of a pay-for-performance programme. TCs seeking benchmarking might prefer the E-WA O/E method because of its similarity to current practices. If the assessment were associated with an incentive programme, we would also observe discrepancies in preferences. Payers may prefer one method to optimise budgets while trauma centres may prefer the other to maximise their incentives or minimise their penalties. The three-year analysis indicated that none of the methods was consistently better for all TCs, but the E-HQID method presented a small economic advantage for the payer. Some TCs might prefer the E-WA O/E method while others might prefer the E-HQID method given the associated incentives at the end of the third year. The incentive programme analysis also suggested that the payer might prefer the single-aim method using effectiveness because it would result in the payer owing no financial incentives to any TC. Establishing an incentive programme thus requires agreement on how to incorporate the different aims into the analysis and how to use the results of the analysis to determine penalties or incentives that positively influence performance, while considering that the competing interests of different stakeholders cannot be captured within one single approach.
Our study found that the first major challenge in operationalising the IOM recommendations is the selection of metrics to represent each aim. Although the equity and patient-centeredness aims are as important as the other four aims according to the IOM, the practical assessment of both aims remains a challenge, as evidenced by the lack of published studies systematically measuring and analysing them. The registry data did not allow for the quantification of the pre-hospital QIs identified from the literature to assess those aims. As a result, only five aims could be included in the analysis, which confirms what the literature has indicated: patient-centeredness is still a difficult aim to measure. Nevertheless, this study proposed an approach to incorporate the equity aim into the analysis. This approach has the potential to improve how traditional incentives based on the average influence performance. In this approach, underperforming subpopulations (and potential sources of disparities) become bottlenecks that organisations must elevate in order to meet incentives. While the baseline QI and the disparity source considered limit the generalisability of the results, the proposed methodology can be adapted or expanded to include alternative QIs once new data sources become available.
Another challenge was related to data pre-processing and analysis. Both multi-aim approaches considered in this study required metrics in ratio form. The transformation of some of the QIs, such as TAS, TIL, and LOS, from a continuous metric to a ratio-type metric can be considered a limitation of the analysis. These transformations required the definition of a threshold, which likely affected the resulting categorisation of TCs. For example, the expected values of LOS were used as LOS thresholds per encounter, while constants defined in the literature were used as thresholds for TAS and TIL. While this study was constrained by the availability of data in the MTQIP registry, information on the characteristics of the QIs that can be used in such analyses can guide the (re)design of trauma registries to ensure a valid representation of each aim.
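A minimal sketch of this kind of transformation is given below. It assumes that an encounter counts as an event when its continuous value exceeds its threshold, and that the ratio is the share of eligible encounters with an event. The function names, the cut-off values, and the data are hypothetical and are not taken from the registry or from the thresholds used in this study.

```python
# Illustrative sketch only: hypothetical values and cut-offs.
# Converts a continuous QI into a ratio-type metric by counting the
# encounters whose value exceeds a threshold.

def exceedance_ratio(values, thresholds):
    """Share of encounters whose value is above its own threshold."""
    if len(values) != len(thresholds):
        raise ValueError("values and thresholds must have the same length")
    if not values:
        raise ValueError("at least one eligible encounter is required")
    events = sum(1 for v, t in zip(values, thresholds) if v > t)
    return events / len(values)

def constant_thresholds(n, cutoff):
    """Same cut-off for every encounter (e.g. a literature-based constant)."""
    return [cutoff] * n

# Per-encounter thresholds: each encounter is compared with its own
# expected length of stay (hypothetical days).
los_days = [3.0, 7.5, 2.0, 10.0]
expected_los_days = [4.0, 6.0, 2.5, 8.0]
los_ratio = exceedance_ratio(los_days, expected_los_days)

# Constant thresholds: a time-based QI compared with one fixed cut-off
# (hypothetical minutes). Two cut-offs show how the choice shifts the ratio.
minutes = [20.0, 45.0, 70.0, 15.0]
ratio_60 = exceedance_ratio(minutes, constant_thresholds(len(minutes), 60.0))
ratio_30 = exceedance_ratio(minutes, constant_thresholds(len(minutes), 30.0))

print(los_ratio, ratio_60, ratio_30)
```

With these hypothetical values, moving the constant cut-off from 60 to 30 minutes doubles the resulting ratio, which illustrates why the choice of threshold can change how a TC is categorised.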
The insights obtained through this research are intended to support stakeholders in identifying the most appropriate method to evaluate TC quality and then in making efforts towards ensuring the availability of the necessary data. The approaches described here would likely require resources beyond the usual data analyst, performance improvement, and TQIP resources that most trauma centres already employ. This research could therefore motivate trauma centres to re-evaluate the analytical needs of their quality improvement programmes. Alternatively, this study can motivate researchers to further explore analysis techniques that can adapt to the limitations of available data sources, particularly missing data, and to ensure that these can be applied systematically to any participating trauma system with limited resources.
Acknowledgments
The authors would like to thank Bronson Trauma Surgery Services, particularly Scott Davidson, MD, for supporting this study and providing access to the necessary information and resources. Funding support was provided through the Bronson Research Fund.
Funding Statement
This work was supported by the Bronson Research Fund [BMH 2015-0797].
Disclosure of interest
Laila Cure was the PI in the grant (BMH 2015-0797) to support this research. Lucy G. Aragon was a graduate assistant funded partly from this grant. For Karen Schieman, no conflicts were declared.
Disclosure statement
No potential conflict of interest was reported by the authors.
References
- American College of Surgeons Committee on Trauma. (2006). Resources for optimal care of the injured patient 2006. American College of Surgeons.
- Aragon, L., Cure, L., & Schieman, K. (2018). Comparing analytic approaches to evaluating quality performance in trauma care. In K. Barker, D. Berry, & C. Rainwater (Eds.), Proceedings of the 2018 Institute of Industrial and Systems Engineers Annual Conference and Expo; May 19-22, 2018. Orlando, FL: The Institute of Industrial and Systems Engineers.
- Bailey, J., Trexler, S., Murdock, A., & Hoyt, D. (2012). Verification and regionalization of trauma systems. The impact of these efforts on trauma care in the United States. Surgical Clinics of North America, 92(4), 1009–1024. 10.1016/j.suc.2012.04.008
- Berwick, D. M., Nolan, T. W., & Whittington, J. (2008). The triple aim: Care, health, and cost. Health Affairs, 27(3), 759–769. 10.1377/hlthaff.27.3.759
- Bradley, N. L., Au, S., & Widder, S. (2016). Quality improvement and trauma quality indicator. In L. M. Gillman, S. Widder, M. Blaivas, & D. Karakitsos (Eds.), Trauma team dynamics (pp. 67–72). Springer International Publishing.
- Braveman, P. (2006). Health disparities and health equity: Concepts and measurement. Annual Review of Public Health, 27(1), 167–194. 10.1146/annurev.publhealth.27.021405.102103
- Cantor, J. C., Schoen, C., Belloff, D., How, S. K. H., & McCarthy, D. (2007). Aiming higher: Results from a state scorecard on health system performance. Commonwealth Fund.
- Carpenter, J., & Bithell, J. (2000). Bootstrap confidence intervals: When, which, what? A practical guide for medical statisticians. Statistics in Medicine, 19(9), 1141–1164.
- Centers for Disease Control and Prevention, National Center for Injury Prevention and Control. (2018, November 14). Web-based injury statistics query and reporting system (WISQARS). https://www.cdc.gov/injury/wisqars/cost/index.html
- Cheng, N. F., Han, P. Z., & Gansky, S. A. (2008). Methods and software for estimating health disparities: The case of children’s oral health. American Journal of Epidemiology, 168(8), 906–914. 10.1093/aje/kwn207
- Corso, P. (2006). Incidence and lifetime costs of injuries in the United States. Injury Prevention, 12(4), 212–218. 10.1136/ip.2005.010983
- Evans, G. W. (2017). Multiple criteria decision analysis for industrial engineering: Methodology and applications. CRC Press.
- Gruen, R. L., Jurkovich, G. J., McIntyre, L. K., Foy, H. M., & Maier, R. V. (2006). Patterns of errors contributing to trauma mortality: Lessons learned from 2594 deaths. Annals of Surgery, 244(3), 371–378. 10.1097/01.sla.0000234655.83517.56
- Haider, A. H., Weygandt, P. L., Bentley, J. M., Monn, M. F., Rehman, K. A., Zarzaur, B. L., … Cooper, L. A. (2013). Disparities in trauma care and outcomes in the United States: A systematic review and meta-analysis. Journal of Trauma and Acute Care Surgery, 74(5), 1195–1205. 10.1097/TA.0b013e31828c331d
- Hashmi, Z. G., Schneider, E. B., Castillo, R., Haut, E. R., Zafar, S. N., Cornwell, E. E., … Haider, A. H. (2014). Benchmarking trauma centers on mortality alone does not reflect quality of care: Implications for pay-for-performance. Journal of Trauma and Acute Care Surgery, 76(5), 1184–1191. 10.1097/TA.0000000000000215
- Hemmila, M. R., Nathens, A. B., Shafi, S., Calland, J. F., Clark, D. E., Cryer, H. G., … Fildes, J. J. (2010). The trauma quality improvement program: Pilot study and initial demonstration of feasibility. The Journal of Trauma: Injury, Infection, and Critical Care, 68(2), 253–262. 10.1097/TA.0b013e3181cfc8e6
- Hsia, R. Y., Wang, E., Torres, H., Saynina, O., & Wise, P. H. (2010). Disparities in trauma center access despite increasing utilization: Data from California, 1999 to 2006. The Journal of Trauma, 68(1), 217–224. 10.1097/TA.0b013e3181a0e66d
- Institute of Medicine. (2001). Crossing the quality chasm: A new health system for the 21st century. National Academy Press.
- Jha, A. K., Orav, E. J., Li, Z., & Epstein, A. M. (2007). The inverse relationship between mortality rates and performance in the hospital quality alliance measures. Health Affairs, 26(4), 1104–1110. 10.1377/hlthaff.26.4.1104
- Landis, J. R., & Koch, G. G. (1977). The measurement of observer agreement for categorical data. Biometrics, 33(1), 159–174. 10.2307/2529310
- Moore, L., Evans, D., Yanchar, N. L., Thakore, Y., Stelfox, J., Hameed, H. T., … Turgeon, A. (2017). Canadian benchmarks for acute injury care. Canadian Journal of Surgery, 60(6), 380–387.
- Moore, L., Lavoie, A., Sirois, M. J., Belcaid, A., Bourgeois, G., Lapointe, J., … Émond, M. (2013). A comparison of methods to obtain a composite performance indicator for evaluating clinical processes in trauma care. Journal of Trauma and Acute Care Surgery, 74(5), 1344–1350. 10.1097/TA.0b013e31828c32f2
- Moore, L., Stelfox, H. T., Turgeon, A. F., Nathens, A. B., Lavoie, A., Émond, M., … Neveu, X. (2014). Derivation and validation of a quality indicator of acute care length of stay to evaluate trauma care. Annals of Surgery, 260(6), 1121–1127. 10.1097/SLA.0000000000000648
- Nathens, A. B., Cryer, H. G., & Fildes, J. (2012). The American College of Surgeons Trauma Quality Improvement Program. Surgical Clinics of North America, 92(2), 441–454. 10.1016/j.suc.2012.01.003
- O’Brien, S. M., Delong, E. R., Dokholyan, R. S., Edwards, F. H., & Peterson, E. D. (2007). Exploring the behavior of hospital composite performance measures: An example from coronary artery bypass surgery. Circulation, 116(25), 2969–2975. 10.1161/CIRCULATIONAHA.107.703553
- Ontario Agency for Health Protection and Promotion (Public Health Ontario). (2013). Summary measures of socioeconomic inequalities in health. Queen’s Printer for Ontario.
- Premier Inc. (2006). Centers for Medicare and Medicaid Services (CMS)/Premier Hospital Quality Incentive Demonstration project: Findings from year one. Charlotte, NC: Author. http://www.premierinc.com/quality-safety/tools-services/p4p/hqi/hqi-whitepaper041306.pdf
- Porgo, T. V., Moore, L., & Tardif, P. A. (2016). Evidence of data quality in trauma registries: A systematic review. Journal of Trauma and Acute Care Surgery, 80(4), 648–658. 10.1097/TA.0000000000000970
- Rand, W. M. (1971). Objective criteria for the evaluation of clustering methods. Journal of the American Statistical Association, 66(336), 846–850. 10.1080/01621459.1971.10482356
- Reeves, D., Campbell, S. M., Adams, J., Shekelle, P. G., Kontopantelis, E., & Roland, M. O. (2007). Combining multiple indicators of clinical quality: An evaluation of different analytic approaches. Medical Care, 45(6), 489–496. 10.1097/MLR.0b013e31803bb479
- Rosen, H., Saleh, F., Lipsitz, S., Rogers, S. O., & Gawande, A. A. (2009). Downwardly mobile: The accidental cost of being uninsured. Archives of Surgery, 144(11), 1006. 10.1001/archsurg.2009.195
- Santana, M. J., & Stelfox, H. T. (2012). Quality indicators used by trauma centers for performance measurement. The Journal of Trauma and Acute Care Surgery, 72(5), 1298–1303. 10.1097/TA.0b013e318246584c
- Santana, M. J., & Stelfox, H. T. (2014). Development and evaluation of evidence-informed quality indicators for adult injury care. Annals of Surgery, 259(1), 186–192. 10.1097/SLA.0b013e31828df98e
- Shafi, S., Barnes, S., Nicewander, D., Ballard, D., Nathens, A. B., Ingraham, A. M., … Gentilello, L. M. (2010). Health care reform at trauma centers-mortality, complications, and length of stay. Journal of Trauma - Injury, Infection and Critical Care, 69(6), 1367–1371. 10.1097/TA.0b013e3181fb785d
- Shafi, S., De La Plata, C. M., Diaz-Arrastia, R., Bransky, A., Frankel, H., Elliott, A. C., … Gentilello, L. M. (2007). Ethnic disparities exist in trauma care. Journal of Trauma - Injury, Infection and Critical Care, 63(5), 1138–1142. 10.1097/TA.0b013e3181568cd4
- Shafi, S., Nathens, A. B., Cryer, H. G., Hemmila, M. R., Pasquale, M. D., Clark, D. E., … Fildes, J. J. (2009). The Trauma Quality Improvement Program of the American College of Surgeons Committee on Trauma. Journal of the American College of Surgeons, 209(4), 521–530. 10.1016/j.jamcollsurg.2009.07.001
- Shahian, D. M., Edwards, F. H., Ferraris, V. A., Haan, C. K., Rich, J. B., Normand, S. T., … Peterson, E. D. (2007). Quality measurement in adult cardiac surgery: Part 1—conceptual framework and measure selection. The Annals of Thoracic Surgery, 83(4), S3–S12. 10.1016/j.athoracsur.2007.01.053
- Shahian, D. M., Iezzoni, L. I., Meyer, G. S., Kirle, L., & Normand, S. L. T. (2012). Hospital-wide mortality as a quality metric: Conceptual and methodological challenges. American Journal of Medical Quality, 27(2), 112–123. 10.1177/1062860611412358
- Sharma, S., De Mestral, C., Hsiao, M., Gomez, D., Haas, B., Rutka, J., & Nathens, A. B. (2013). Benchmarking trauma center performance in traumatic brain injury: The limitations of mortality outcomes. Journal of Trauma and Acute Care Surgery, 74(3), 890–894. 10.1097/TA.0b013e3182827253
- Shwartz, M., Ren, J., Peköz, E. A., Wang, X., Cohen, A. B., & Restuccia, J. D. (2008). Estimating a composite measure of hospital quality from the Hospital Compare database: Differences when using a Bayesian hierarchical latent variable model versus denominator-based weights. Medical Care, 46(8), 778–785. 10.1097/MLR.0b013e31817893dc
- Shwartz, M., Restuccia, J. D., & Rosen, A. K. (2015). Composite measures of health care provider performance: A description of approaches. The Milbank Quarterly, 93(4), 788–825. 10.1111/1468-0009.12165
- Soto, J. M., Zhang, Y., Huang, J. H., & Feng, D. X. (2018). An overview of the American trauma system. Chinese Journal of Traumatology - English Edition, 21(2), 77–79. 10.1016/j.cjtee.2018.01.003
- Stelfox, H. T., Bobranska-Artiuch, B., Nathens, A., & Straus, S. E. (2010). Quality indicators for evaluating trauma care. Archives of Surgery, 145(3), 286–295. 10.1001/archsurg.2009.289
- Torain, M. J., Maragh-Bass, A. C., Dankwa-Mullen, I., Hisam, B., Kodadek, L. M., Lilley, E. J., … Haider, A. H. (2016). Surgical disparities: A comprehensive review and new conceptual framework. Journal of the American College of Surgeons, 223(2), 408–418. 10.1016/j.jamcollsurg.2016.04.047
- Udyavar, R., Perez, S., & Haider, A. (2018). Equal access is quality: An update on the state of disparities research in trauma. Current Trauma Reports, 4(1), 25–38. 10.1007/s40719-018-0114-6
- Ury, H. K., & Wiggins, A. D. (1985). Another shortcut method for calculating the confidence interval of a Poisson variable (or of a standardized mortality ratio). American Journal of Epidemiology, 122(1), 197–198. 10.1093/oxfordjournals.aje.a114083
- Warrens, M. J. (2008). On the equivalence of Cohen’s kappa and the Hubert-Arabie adjusted Rand index. Journal of Classification, 25(2), 177–183. 10.1007/s00357-008-9023-7
- Willis, C. D., Gabbe, B. J., & Cameron, P. A. (2007). Measuring quality in trauma care. Injury, 38(5), 527–537. 10.1016/j.injury.2006.06.018
- Willis, C. D., Stoelwinder, J. U., Lecky, F. E., Woodford, M., Jenks, T., Bouamra, O., & Cameron, P. A. (2010). Applying composite performance measures to trauma care. The Journal of Trauma: Injury, Infection, and Critical Care, 69(2), 256–262. 10.1097/TA.0b013e3181e5e2a3