Diagnosis (Berlin, Germany). 2024 Apr 22;11(3):303–311. doi: 10.1515/dx-2023-0184

Development of a disease-based hospital-level diagnostic intensity index

Michael I Ellenbogen 1, Leonard S Feldman 2, Laura Prichett 3, Junyi Zhou 3, Daniel J Brotman 1
PMCID: PMC11306196  PMID: 38643385

Abstract

Objectives

Low-value care is associated with increased healthcare costs and direct harm to patients. We sought to develop and validate a simple diagnostic intensity index (DII) to quantify hospital-level diagnostic intensity, defined by the prevalence of advanced imaging among patients with selected clinical diagnoses that may not require imaging, and to describe hospital characteristics associated with high diagnostic intensity.

Methods

We utilized State Inpatient Database data for inpatient hospitalizations with one or more pre-defined discharge diagnoses at acute care hospitals. We measured receipt of advanced imaging for an associated diagnosis. Candidate metrics were defined by the proportion of inpatients at a hospital with a given diagnosis who underwent associated imaging. Candidate metrics exhibiting temporal stability and internal consistency were included in the final DII. Hospitals were stratified according to the DII, and the relationship between hospital characteristics and DII score was described. Multilevel regression was used to externally validate the index using pre-specified Medicare county-level cost measures, a Dartmouth Atlas measure, and a previously developed hospital-level utilization index.

Results

This novel DII, composed of eight metrics, correlated in a dose-dependent fashion with four of these five measures. The strongest relationship was with imaging costs: the odds ratio of being in a higher DII tertile was 3.41 (95 % CI 2.02–5.75) when comparing tertile three with tertile one of imaging costs.

Conclusions

A small set of medical conditions and related imaging can be used to draw meaningful inferences more broadly on hospital diagnostic intensity. This could be used to better understand hospital characteristics associated with low-value care.

Keywords: low-value care, overuse, diagnostic testing, hospital variation

Introduction

Low-value care is a widespread problem in the US healthcare system and has been estimated to cost the system between $76 and $102 billion per year [1]. In addition to contributing to high societal healthcare costs, it has pernicious effects on individual patients including pain, anxiety, radiation exposure, and cascades of care (which can lead to unnecessary procedures or delay necessary procedures) [2], [3], [4]. To date, much of the research quantifying low-value care and understanding its drivers has focused on regional measurement [5], [6], [7], [8], [9].

Developing methods to quantify low-value care at the hospital level is important for a number of reasons. The largest proportion of total national healthcare expenditures (31.1 %) is attributable to hospitals [10]. Providers are increasingly directly employed by hospitals or health systems or working at practices that are partially owned by hospitals or health systems [11, 12]. The financial incentives and culture that are drivers of low-value care are often products of the environment at a given institution [13]. As a result, the hospital is often the level at which quality improvement initiatives and realignment of incentives can optimally impact overuse. However, there is a paucity of well-constructed methods to quantify hospital-level low-value care. We sought to develop a simple index that does not require longitudinal data to measure hospital-level diagnostic intensity using relatively common diseases that often can be diagnosed and managed without expensive diagnostics.

Materials and methods

Conceptual framework for choosing index components

In order to construct the diagnostic intensity index (DII), three hospitalist physicians chose candidate metrics representing common diseases leading to hospitalization for which expensive imaging is frequently not needed (particularly for lower acuity cases) but is often still performed, even when unlikely to change management (Supplementary Table 1). We selected conditions: 1) which can often be diagnosed by signs and symptoms and/or laboratory tests alone, yet for which advanced imaging is frequently ordered for diagnostic purposes (for example, pancreatitis/computed tomography (CT) scan and pyelonephritis/CT scan); 2) for which advanced imaging may be used more often than is clinically useful to determine an etiology, and for which the condition often resolves with simple interventions not dictated by the imaging (for example, acute kidney injury/renal ultrasound, encephalopathy/brain imaging, and post-partum hemorrhage/pelvic imaging); 3) for which advanced imaging may be used more frequently than necessary to rule out secondary diagnoses that are uncommon in the setting of a known primary diagnosis (for example, a lower extremity ultrasound to rule out venous thromboembolism in a patient with leg swelling secondary to cellulitis or heart failure exacerbation, or a CT scan to rule out pulmonary embolism as the cause of a chronic obstructive pulmonary disease exacerbation); and 4) for which advanced imaging may be used for prognostic purposes more often than necessary (for example, an echocardiogram to look for right heart strain in pulmonary embolism when it is clinically obvious that right heart strain is absent or could be ruled out by laboratory tests alone). Importantly, none of these imaging tests is required to make the diagnosis it is paired with; we acknowledge, however, that these diagnostic tests may be helpful to guide management in some instances, and that the appropriate rate is not zero.

We used these categories to brainstorm potential metrics, and then used our combined clinical experience coupled with American College of Radiology Appropriateness Criteria [14] to select final candidate metrics by consensus. Our supposition was not that these conditions never or rarely require expensive diagnostics, but instead that we can use the hospital-level frequency of expensive imaging tests associated with a subset of diagnoses to extrapolate and quantify diagnostic intensity more broadly at a hospital. This hospital-level DII value might serve as a proxy for overuse or low-value care allowing for comparison across institutions.

Data sources

We used hospital-level data from the State Inpatient Databases (SID), which are part of the Agency for Healthcare Research and Quality’s Healthcare Cost and Utilization Project. We obtained data from 2016 to 2018 from Kentucky, Maryland, New Jersey, and North Carolina. These states were chosen based on cost and availability of data (not all states are included in the SID, and not all states in the SID have the same data elements). This dataset includes patient characteristics and diagnosis and procedure codes (International Classification of Diseases, Tenth Revision (ICD-10) discharge diagnosis codes, ICD-10 procedure codes, Current Procedural Terminology (CPT) codes, and revenue codes) for each inpatient hospitalization.

We used the American Hospital Association (AHA) annual survey [15] to provide hospital characteristics – bed size, teaching status, critical access status, ownership, and location. In keeping with the AHA definition, we considered major teaching hospitals to be those with the Council of Teaching Hospitals designation from the Association of American Medical Colleges and minor teaching hospitals to be those with 1) at least one Accreditation Council for Graduate Medical Education accredited program, 2) an affiliation with a medical school, or 3) an internship or residency approved by the American Osteopathic Association. We also used the AHA annual survey to link each hospital to a hospital service area (HSA) and county. We used the Dartmouth Atlas 2017 Hospital Tracking File [16] to determine whether a hospital was part of a health system.

Medicare county-level cost measures, a Dartmouth Atlas HSA-level measure, and a previously developed and validated hospital-level symptom-based DII [17] were used to externally validate this new disease-based DII. Specifically, we used 2017 county-level costs of imaging, procedures, and tests per user, adjusted for geographic differences in payment rates, which were available through a Medicare public use file [18]. The data were organized into these categories by the Centers for Medicare and Medicaid Services using the Berenson-Eggers Type of Service classification scheme [19]. Imaging includes X-ray, CT, magnetic resonance (MR), and ultrasound modalities. Procedures include major surgeries, minor surgeries, non-invasive procedures including endoscopy, and radiation. Tests include standard blood and urine tests, microbiology, electrocardiograms, and stress tests.

We used data from the Dartmouth Atlas for inpatient spending per decedent during the last six months of life per HSA [20]. For all county-level and HSA-level measures, we applied the value of the measure to all hospitals in that geographic area. We also measured diagnostic intensity using a previously developed hospital-level symptom-based DII [17].

Candidate metrics

Each candidate metric paired a disease with an imaging test or tests. Diseases were defined using a single ICD-10 discharge diagnosis or, more commonly, a group of ICD-10 discharge diagnoses. We excluded ICD-10 diagnosis codes that might suggest more complicated or severe disease states. For example, with pancreatitis, we excluded ICD-10 codes for pancreatitis with necrosis as they suggest more severe disease for which CT imaging is generally appropriate. We identified imaging tests using ICD-10 procedure codes, CPT codes (which we expected to be infrequently used given that these are typically used for outpatients rather than inpatients), and revenue codes (Supplementary Table 1).

For a given hospital, each candidate metric was a fraction. The denominator was defined as the number of inpatient hospitalizations with one of the target diagnoses, and the numerator was the number of those hospitalizations with an associated imaging test. In other words, each metric represented the proportion of patients with a given diagnosis who also underwent advanced imaging. We characterized each candidate metric by the number of observations, mean, standard deviation, interquartile range, and median.
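To make this computation concrete, below is a minimal Stata sketch of deriving one candidate metric from hospitalization-level records; the dataset and variable names (hospitalizations.dta, hosp_id, has_dx, has_img) are hypothetical and not taken from the article.

    * Hedged sketch (hypothetical dataset and variable names): one record per
    * inpatient hospitalization, has_dx = 1 if a target discharge diagnosis is
    * present, has_img = 1 if the paired imaging test was coded (ICD-10
    * procedure, CPT, or revenue code).
    use hospitalizations, clear
    keep if has_dx == 1
    collapse (mean) metric = has_img (count) n_obs = has_img, by(hosp_id)
    list hosp_id metric n_obs in 1/5

The resulting metric variable is the hospital-level proportion of hospitalizations with the target diagnosis that received the associated imaging.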

We tested three alternative versions of each metric: 1) using a length of stay (LOS) cutoff to exclude any inpatient hospitalizations with a LOS greater than or equal to the median LOS across all hospitals for that medical condition, 2) using only inpatient hospitalizations with one of the discharge diagnosis codes of interest in the primary position for a given metric, and 3) using both the LOS and primary position-only restrictions.
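As an illustration under the same hypothetical variable names, the three alternative versions amount to simple restrictions applied to the hospitalization-level file before collapsing to hospital-level proportions (los, median_los_condition, and dx_position are assumed names):

    * Version 1: keep only stays below the condition-specific median LOS.
    gen byte in_v1 = los < median_los_condition
    * Version 2: keep only hospitalizations with the diagnosis in the primary position.
    gen byte in_v2 = dx_position == 1
    * Version 3: apply both restrictions.
    gen byte in_v3 = in_v1 & in_v2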

Derivation-validation process and index construction

We tested each candidate metric for temporal stability using intraclass correlation coefficients (ICCs). We calculated the average ICC for each metric with 95 % confidence intervals (CI). We considered ICCs between 0.75 and 0.90 to suggest good temporal stability and greater than 0.90 to suggest excellent temporal stability [21]. We calculated ICCs for the derivation set (Maryland and North Carolina), the validation set (Kentucky and New Jersey), and the entire dataset with the plan to exclude metrics with poor temporal stability in either the derivation or validation set.
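As a rough sketch of this step, Stata’s icc command estimates intraclass correlation coefficients with 95 % confidence intervals when the data are arranged as one row per hospital-year; hosp_id, year, and metric are hypothetical names, and the authors’ exact model specification is not stated beyond reporting the average ICC.

    * One row per hospital-year for a given candidate metric; hospitals are
    * the targets and years the repeated measurements. Individual and average
    * ICCs with 95 % CIs are reported.
    icc metric hosp_id year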

We then tested the remaining non-excluded metrics for internal consistency using Cronbach’s alpha. We planned to exclude metrics with an item-rest correlation below 0.30 in both the derivation and validation sets, utilizing a high bar for exclusion as we wanted to ensure that the DII was broadly representative of medical pathology, even if measuring different aspects of diagnostic testing. We calculated the overall Cronbach’s alpha for the combined metrics with the expectation that an overall group Cronbach’s alpha above 0.60 would be satisfactory.
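A minimal sketch of the internal consistency step, assuming a wide hospital-level file with one variable per candidate metric (names hypothetical); Stata’s alpha command with the item option reports item-test and item-rest correlations alongside the overall scale alpha.

    * Item-test and item-rest correlations and overall Cronbach's alpha across hospitals.
    alpha m_aki_us m_cell_us m_hf_us m_pe_echo m_pyelo_ct m_panc_ct m_copd_ct m_enceph_img, item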

Those metrics that were not discarded comprised the final disease-based DII. We defined each hospital’s overall DII score as the mean value of the eight included metrics. Although the prevalence of included conditions varied widely (Supplementary Table 2), we elected not to incorporate prevalence into the composite score. This allowed for simplicity, broader representation of conditions, and generalizability, recognizing that disease prevalence varies by institution. The remaining analyses presented are based on this composite hospital-level DII score.
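For concreteness, the composite score described above is an unweighted mean of the eight retained metrics, which in Stata could be computed as follows (hypothetical variable names):

    * Each hospital's DII is the row mean of its eight metric values.
    egen dii = rowmean(m_aki_us m_cell_us m_hf_us m_pe_echo m_pyelo_ct m_panc_ct m_copd_ct m_enceph_img)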

Examination of the distribution of the DII and external validation measures

We elected to use a categorical (quantile) approach to group hospitals and compare them to external validation measures as we did not expect the DII to have the ability to discriminate among hospitals with marginally different DII values. We created mosaic plots of DII quartiles by each of the five external validation measures quartiled (Supplementary Figure 1). Based on the appearance, we chose to analyze the DII and external validation measures by tertile.
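A minimal sketch of the grouping step, assuming a hospital-level file containing the composite DII and one external validation measure (img_cost is a hypothetical name):

    * Tertiles of the DII and of a county-level imaging cost measure.
    xtile dii_tert = dii, nq(3)
    xtile img_cost_tert = img_cost, nq(3)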

Characterization and external validation of index

Rehabilitation, specialty, psychiatric, long-term acute care, and pediatric hospitals were excluded. We divided all acute care hospitals in the four-state sample into tertiles based on their index value, calculating the mean and range of index values of each tertile with a higher tertile representing higher diagnostic intensity and likely more overuse. We described each tertile by hospital characteristics – bed size (continuous variable), teaching status (major, minor, non-teaching), critical access status (yes/no), ownership (government, non-federal; non-government, not-for-profit; for profit), and location (urban/rural). We also compared the mean Elixhauser Comorbidity Index score [22, 23] across tertiles using all inpatients, using a Stata program (code: elixhauser I10_DX1-I10_DX25, index(10) smelix cmorb). We used analysis of variance to compare continuous variables (hospital bed size and Elixhauser Comorbidity Index) and chi-squared test to compare categorical variables (teaching status, critical access status, ownership, and location) across tertiles.
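A hedged sketch of these comparisons in Stata, with hypothetical variable names; the article does not specify the exact commands used beyond the Elixhauser program quoted above.

    * ANOVA for continuous characteristics across DII tertiles.
    oneway beds dii_tert, tabulate
    oneway elix_mean dii_tert
    * Chi-squared tests for categorical characteristics across tertiles.
    tabulate teach_status dii_tert, chi2
    tabulate urban dii_tert, chi2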

Multilevel mixed-effects ordered logistic (ordinal tertile) regression with a random intercept for state was used to externally validate the index. We chose not to include hospital characteristics as covariates in this model because we were simply examining the ability of our DII to quantify hospital-level diagnostic intensity, which is likely related to hospital characteristics; controlling for them would have adjusted away part of the relationship between the DII and the external validation measures that we were trying to measure. We also tabulated the number of hospitals per county and per HSA. Analyses were conducted using Stata 18 (StataCorp, College Station, TX).
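A minimal sketch of the external validation model under the same hypothetical variable names: a multilevel ordered logistic regression of DII tertile on tertile of an external validation measure, with a random intercept for state and odds ratios reported.

    * DII tertile regressed on imaging-cost tertile; random intercept for state.
    meologit dii_tert i.img_cost_tert || state:, or

Under this sketch, the coefficient on the top tertile indicator corresponds to the odds ratio comparing tertile three with tertile one of the validation measure.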

Results

Descriptive statistics for candidate metrics across hospitals

The entire cohort included 319 hospitals with data from 8.1 million hospitalizations. Of these, 3.2 million hospitalizations (39.0 %) included one diagnosis of interest, and 1.0 million hospitalizations (12.2 %) included two or more diagnoses of interest. All candidate metrics had at least one observation at each hospital (at least one inpatient hospitalization with one of the diagnosis codes of interest for a given metric) except encephalopathy (two hospitals without any observations) and post-partum hemorrhage (93 hospitals without any observations) (Table 1). The candidate metric with the highest mean (rate of testing) value was encephalopathy paired with CT or MRI of the brain, with a mean of 0.55 (Figure 1). The pancreatitis and CT abdomen metric had the highest variability across hospitals (standard deviation 0.33).

Table 1:

Descriptive statistics of candidate metrics.

Observations Mean Standard deviation 25th percentile Median 75th percentile
Acute kidney injury – ultrasound 319 0.28 0.12 0.19 0.29 0.37
Cellulitis – ultrasound 319 0.19 0.12 0.11 0.18 0.26
Heart failure – ultrasound 319 0.17 0.09 0.11 0.18 0.23
Pulmonary embolism – echocardiogram 319 0.38 0.28 0.00 0.49 0.62
Pyelonephritis – CT 319 0.42 0.28 0.10 0.52 0.65
Pancreatitis – CT 319 0.49 0.33 0.13 0.64 0.77
COPD/asthma – CT 319 0.26 0.16 0.14 0.28 0.39
Post-partum hemorrhage – CT 226 0.04 0.12 0.00 0.01 0.04
Post-partum hemorrhage – MRI 226 0.00 0.00 0.00 0.00 0.00
Post-partum hemorrhage – CT or MRI 226 0.04 0.12 0.00 0.01 0.04
Post-partum hemorrhage – CT and MRI 226 0.00 0.00 0.00 0.00 0.00
Encephalopathy – CT 317 0.53 0.25 0.47 0.62 0.70
Encephalopathy – MRI 317 0.13 0.10 0.02 0.14 0.19
Encephalopathy – CT or MRI 317 0.55 0.25 0.48 0.65 0.71
Encephalopathy – CT and MRI 317 0.10 0.09 0.00 0.11 0.16

COPD, chronic obstructive pulmonary disease; CT, computed tomography; MRI, magnetic resonance imaging.

Figure 1:

Box plots of candidate metric values across hospitals, showing interquartile range and median. Four metrics for post-partum hemorrhage were excluded because most values were zero. AKI, acute kidney injury; COPD, chronic obstructive pulmonary disease; CT, computed tomography; enceph, encephalopathy; HF, heart failure; MRI, magnetic resonance imaging; PE, pulmonary embolism; pyelo, pyelonephritis; US, ultrasound.

Tests for temporal stability and internal consistency

For all candidate metrics except the four post-partum hemorrhage ones, there was excellent temporal stability in the derivation cohort, the validation cohort, and the combined cohort (Table 2). Temporal stability for the post-partum hemorrhage candidate metrics was poor, so they were excluded from further consideration.

Table 2:

Candidate metric temporal stability by intraclass correlation coefficients.

Derivation cohort – Maryland and North Carolina; Validation cohort – Kentucky and New Jersey; Entire cohort
Each cell shows ICC (95 % CI).
Acute kidney injury – ultrasound 0.96 (0.94, 0.97) 0.97 (0.96, 0.98) 0.96 (0.96, 0.97)
Cellulitis – ultrasound 0.95 (0.94, 0.96) 0.96 (0.95, 0.97) 0.96 (0.95, 0.97)
Heart failure – ultrasound 0.97 (0.96, 0.98) 0.98 (0.97, 0.98) 0.98 (0.97, 0.98)
Pulmonary embolism – echocardiogram 0.94 (0.92, 0.96) 0.92 (0.89, 0.94) 0.93 (0.91, 0.94)
Pyelonephritis – CT 0.97 (0.96, 0.98) 0.94 (0.92, 0.95) 0.95 (0.94, 0.96)
Pancreatitis – CT 0.97 (0.97, 0.98) 0.95 (0.94, 0.96) 0.96 (0.95, 0.97)
COPD/Asthma – CT 0.99 (0.98, 0.99) 0.97 (0.97, 0.98) 0.98 (0.98, 0.98)
Post-partum hemorrhage – CT 0.40 (0.17, 0.57) 0.41 (0.17, 0.59) 0.41 (0.25, 0.53)
Post-partum hemorrhage – MR 0.15 (−0.17, 0.39) −0.07 (−0.50, 0.26) −0.01 (−0.28, 0.21)
Post-partum hemorrhage – CT or MR 0.41 (0.19, 0.58) 0.43 (0.20, 0.60) 0.42 (0.27, 0.55)
Post-partum hemorrhage – CT and MR −0.01 (−0.38, 0.28) 0.00 (−0.41, 0.30) −0.01 (−0.27, 0.21)
Encephalopathy – CT 0.96 (0.94, 0.97) 0.94 (0.93, 0.96) 0.95 (0.94, 0.96)
Encephalopathy – MR 0.96 (0.94, 0.97) 0.95 (0.94, 0.96) 0.95 (0.94, 0.96)
Encephalopathy – CT or MR 0.95 (0.94, 0.97) 0.94 (0.92, 0.96) 0.95 (0.94, 0.96)
Encephalopathy – CT and MR 0.96 (0.94, 0.97) 0.97 (0.96, 0.98) 0.96 (0.96, 0.97)

There is good temporal stability in the derivation cohort, validation cohort, and entire cohort for all candidate metrics except the post-partum hemorrhage ones. COPD, chronic obstructive pulmonary disease; CT, computed tomography; ICC, intraclass correlation coefficient; MRI, magnetic resonance imaging.

We first calculated Cronbach’s alpha for the set of remaining candidate metrics to determine which of the encephalopathy candidate metrics best optimized it. The encephalopathy – CT or MR brain candidate metric had the highest item-rest correlation, so the other encephalopathy metrics were discarded. The set of eight remaining candidate metrics was then tested for internal consistency. Among these, cellulitis – ultrasound and pulmonary embolism – echocardiogram had item-rest correlations below 0.30 in the derivation cohort but not in the validation cohort, so we did not exclude them. The overall Cronbach’s alpha for the final group of eight metrics was 0.74 in the derivation cohort, 0.78 in the validation cohort, and 0.76 in the entire group, suggesting satisfactory internal consistency (Table 3).

Table 3:

Cronbach’s alpha measurement of internal consistency.

Derivation cohort – Maryland and North Carolina; Validation cohort – Kentucky and New Jersey; Entire cohort
Each cell shows item-test correlation/item-rest correlation/Cronbach’s alpha.
Acute kidney injury – ultrasound 0.53/0.45/0.73 0.60/0.52/0.76 0.57/0.49/0.75
Cellulitis – ultrasound 0.34/0.23/0.75 0.52/0.43/0.77 0.44/0.35/0.76
Heart failure – ultrasound 0.43/0.36/0.74 0.57/0.51/0.77 0.51/0.45/0.76
Pulmonary embolism – echocardiogram 0.46/0.21/0.78 0.54/0.33/0.79 0.51/0.27/0.78
Pyelonephritis – CT 0.84/0.71/0.65 0.82/0.71/0.71 0.83/0.71/0.68
Pancreatitis – CT 0.81/0.64/0.68 0.75/0.57/0.75 0.77/0.59/0.72
COPD/Asthma – CT 0.84/0.78/0.67 0.85/0.80/0.73 0.84/0.79/0.70
Encephalopathy – CT or MR 0.63/0.49/0.71 0.65/0.47/0.76 0.63/0.46/0.74
Total (overall Cronbach’s alpha) 0.74 0.78 0.76

Based on the item-rest correlations of these eight candidate metrics, we kept all of them. The Cronbach’s alphas for the derivation cohort, validation cohort, and entire cohort are above 0.60, suggesting satisfactory internal consistency. COPD, chronic obstructive pulmonary disease; CT, computed tomography; MRI, magnetic resonance imaging.

Diagnostic intensity tertiles

Hospitals in the lowest diagnostic intensity tertile tended to be smaller (p<0.0001) and were more likely to be critical access hospitals (p<0.0001) (Table 4). They were also more likely to be part of a health system (p=0.002). Elixhauser Comorbidity Index scores did not differ significantly across tertiles.

Table 4:

Characteristics of hospitals stratified by diagnostic intensity index tertile.

Tertile 1 Tertile 2 Tertile 3 p-Values
n=107 hospitals, mean=0.18 n=106 hospitals, mean=0.37 n=106 hospitals, mean=0.48
Range of values=0.01–0.28 Range of values=0.29–0.43 Range of values=0.44–0.60
Total hospital beds, median [IQR] 71 [25–173] 148 [90–317] 258 [146–383] p<0.0001
Elixhauser score a , median [IQR] 3.32 [2.78–3.73] 3.40 [3.11–3.65] 3.37 [3.10–3.67] p=0.12
Teaching status, n (%) p<0.0001
Non-teaching hospital 68 (63) 57 (54) 30 (28)
Minor teaching hospital 37 (35) 36 (34) 67 (63)
Major teaching hospital 2 (2) 13 (12) 9 (8)
Critical access, n (%) 43 (40) 4 (4) 0 (0) p<0.0001
Ownership, n (%) p=0.15
Government, non-federal 18 (17) 13 (12) 13 (12)
Non-government, not-for-profit 85 (79) 80 (75) 80 (75)
For profit 4 (4) 13 (12) 13 (12)
Part of a health system 59 (55) 34 (32) 41 (39) p=0.002
Urban setting 44 (41) 72 (68) 88 (83) p<0.0001

Hospitals in the lowest diagnostic intensity tertile tended to be smaller and critical access hospitals and were also more likely to be part of a health system. Median Elixhauser Comorbidity Index scores were not significantly different across tertiles. aAcross all hospitalized patients, not just those with diagnoses included in the index. IQR, interquartile range. Lower tertile represents lower diagnostic intensity.

External validation

The DII correlated in a dose-dependent fashion with four of the five pre-defined external validation measures, all except county-level costs of tests (Figure 2). These included county-level, HSA-level, and hospital-level measures. As presented at the top of Figure 2, the odds ratio of being in a higher tertile of the DII when comparing tertiles three and one of the county-level cost of imaging was 3.41 (95 % CI: 2.02–5.75). Most counties and HSAs included only one hospital (Supplementary Table 3). Greater than 90 % of counties and HSAs contained three or fewer hospitals.

Figure 2:

Odds ratio of being classified in a higher tertile of the disease-based diagnostic intensity index calculated as a function of being in a given tertile of the external validation measure, with a random intercept for state effects. Imaging includes X-ray, computed tomography, magnetic resonance, and ultrasound. Procedures include major surgeries, minor surgeries, non-invasive procedures including endoscopy, and radiation. Tests include standard blood tests and urine tests, microbiology, electrocardiograms, and stress tests.

Discussion

We developed a DII to quantify hospital-level diagnostic intensity for inpatients with a set of relatively common medical conditions for which expensive imaging is not required for diagnosis and often does not change management. We excluded diagnosis codes suggesting more complex or higher acuity presentations. The individual components of the index showed good temporal stability and internal consistency. A challenge of developing this index was that there is no gold standard with which to validate it, though this also demonstrates the importance of developing such tools.

In external validation, this index correlated well with three of the four pre-specified regional utilization measures. The finding that the index correlates well with county-level procedural costs suggests that the factors driving hospital-level diagnostic intensity may also drive procedural intensity. This new disease-based DII also correlated well with a previously designed hospital-level symptom-based DII [17]. The symptom-based DII used non-specific diagnosis codes paired with diagnostic tests, employing rates of uninformative testing as a proxy for diagnostic intensity. We suspect that these two DIIs – disease-based and symptom-based – measure different but related types of diagnostic intensity and potentially low-value care. In either case, the optimal amount of imaging is not zero. However, high rates of diagnostic testing likely represent at least some low-value care.

Other studies to develop institutional-level overuse indices have used different approaches. A hospital-level overuse index developed using Medicare claims was heavily weighted on procedures (rather than diagnostic tests), and three of the four diagnostic tests were for syncope hospitalizations [24]. Another study modified a previously developed regional overuse index [5, 6] for health systems to evaluate characteristics associated with low-value care [25]. It included screening tests, diagnostic tests, and therapeutic procedures with metrics relevant for outpatient and hospital-based care. A third index used a set of 41 low-value services based on the Milliman MedInsight Health Waste Calculator [26] (lab testing, imaging, cardiopulmonary and neurologic testing, procedures, and drugs), heavily focused on outpatient care, to evaluate health system characteristics associated with high levels of low-value care [27]. Our novel disease-based DII is focused exclusively on hospital care.

These other indices all require longitudinal data to operationalize claims-based metrics (such as lookback periods ranging from weeks to years to ensure absence of a specific diagnosis code required to categorize the test or procedure as low-value). For example, operationalizing a metric for head imaging for syncope required a lookback period of two years to ensure no previous syncope diagnosis during that time [24]. This novel disease-based DII has the advantage of not requiring longitudinal patient-level data. Relatedly, this index measures diagnostic intensity whereas the others use validated overuse or low-value care measures. High levels of diagnostic intensity likely reflect low-value care and overuse. Moreover, the generality of our metrics may be an advantage as they are less likely to need modification as guidelines change and practice patterns shift. Another strength of our analysis is that unlike other studies involving index development, ours included validation with external utilization measures.

We found that smaller bed size, non-teaching status, critical access status, non-urban location, and being a part of a health system were all significantly associated with lower diagnostic intensity. These associations are similar to those in the symptom-based DII though we did not evaluate urban/rural status or being a part of a health system for that index [17]. Similar to the disease-based DII, these other hospital [24] and health system [25] overuse indices also demonstrated that smaller size was associated with less overuse. However, their results for the relationship between teaching status and overuse were different from ours – major teaching hospitals (or health systems with major teaching hospitals) had less overuse [24, 25, 27]. This difference may relate to the inclusion of procedures in their indices and the fact that two of the three studies looked at health systems rather than hospitals. Given the cross-subsidization at major teaching hospitals, proceduralists may be less incentivized to perform low-value procedures. In contrast, teaching culture might promote the use of advanced radiographic testing if learners lack the confidence to rule in or exclude diagnoses on clinical grounds. Our finding that being part of a health system is associated with lower diagnostic intensity is consistent with a recent study looking at differences in quality and utilization between standalone hospitals and those that are part of a health system [28].

Our study is not without limitations. It relies on administrative claims data and is thus subject to biases from incomplete coding, upcoding, and, more generally, interhospital differences in coding intensity. Additionally, revenue codes, rather than ICD procedure codes or CPT codes, were the most frequent way imaging was coded, and revenue codes are less granular than ICD procedure and CPT codes; for example, there is only one revenue code for CT scans of the body. This index may also be less reliable for hospitals with small numbers of admissions due to low numbers of patients with relevant diseases. Finally, it is derived entirely from inpatient administrative data. Utilization patterns for patients who have emergency department (ED) visits or observation stays that do not lead to inpatient admission may be different, which could lead to bias when comparing hospitals with markedly different rates of hospitalization of ED patients and observation status use.

Future research could evaluate the extent to which hospitals with high index values suggesting high levels of diagnostic intensity also deliver high levels of low-value care. This could take the form of a chart review with a protocol to determine the approximate percentage of the time the imaging in these metrics changes management for the associated clinical scenarios. If the percentage is small, which we suspect, then this would help confirm that the DII not only measures diagnostic intensity but also approximates low-value care.

This DII may be of interest to researchers, health system leaders, and policymakers. Researchers could use it to better understand what care processes and employment and payment models are associated with higher levels of diagnostic intensity, perhaps leveraging situations like hospital mergers to draw causal inferences. Using the symptom-based DII and disease-based DII together might also help to determine causal factors for high levels of diagnostic intensity. Health system leaders could use it to understand variation in diagnostic intensity (and presumably low-value care) among hospitals within their system. Finally, payers could utilize the DII to identify outlier hospitals with high levels of diagnostic intensity which might be providing poor value.

In summary, we developed a simple hospital-level diagnostic intensity index which has face validity and correlates with regional utilization measures. Our findings suggest the feasibility of using cross-sectional hospital-level data, without requiring longitudinal patient-level claims, to develop a reasonable estimate of hospitals’ diagnostic intensity based on a small subset of diagnosis and imaging data. This index could be used to quantify diagnostic intensity and identify outlier hospitals and to better understand hospital characteristics and processes of care associated with low-value care.

Supplementary Material


This article contains supplementary material (https://doi.org/10.1515/dx-2023-0184).

Footnotes

Research ethics: Research involving human subjects complied with all relevant national regulations, institutional policies and is in accordance with the tenets of the Helsinki Declaration (as revised in 2013), and has been approved by Authors’ Institutional Review Board (nr 07/2019).

Informed consent: Informed consent was not required because we did not perform additional biochemical analysis. Confidentiality was guaranteed and no interventions were performed beyond ordinary good and standard clinical practices.

Author contributions: All authors have accepted responsibility for the entire content of this manuscript and approved its submission.

Competing interests: The authors state no conflict of interest.

Research funding: Dr. Ellenbogen is supported by AHRQ 1K08HS028673-01A1 and a Johns Hopkins Hospitalist Innovation Grant.

Data availability: The raw data cannot be obtained on request from the corresponding author because of a data use agreement.

References
