Health Services Research. 2021 Nov 3;57(1):182–191. doi: 10.1111/1475-6773.13882

Weak correlations in health services research: Weak relationships or common error?

Alistair James O'Malley 1, Bruce E Landon 2,3, Lawrence A Zaborski 2, Eric T Roberts 4, Hazar H Khidir 5, Peter B Smulowitz 6,7, John Michael McWilliams 2,8

Abstract

Objective

To examine whether the correlation between a provider's effect on one population of patients and the same provider's effect on another population is underestimated if the effects for each population are estimated separately as opposed to being jointly modeled as random effects, and to characterize how the impact of the estimation procedure varies with sample size.

Data sources

Medicare claims and enrollment data on emergency department (ED) visits, including patient characteristics, the patient's hospitalization status, and identification of the doctor responsible for the decision to hospitalize the patient.

Study design

We used a three‐pronged investigation consisting of analytical derivation, simulation experiments, and analysis of administrative data to demonstrate the fallibility of stratified estimation. Under each investigation method, results from the joint modeling approach are compared with those based on stratified analyses.

Data collection/extraction methods

We used data on ED visits from administrative claims from traditional (fee‐for‐service) Medicare from January 2012 through September 2015.

Principal findings

The simulation analysis demonstrates that the joint modeling approach is generally close to unbiased, whereas the stratified approach can be severely biased in small samples, a consequence of joint modeling benefitting from bivariate shrinkage and the stratified approach being compromised by measurement error. In the administrative data analyses, the correlation of doctor admission tendencies between female and male patients was estimated to be 0.98 under the joint model but only 0.38 using stratified estimation. The analogous correlations were 0.99 and 0.28 for White and non‐White patients and 0.99 and 0.31 for Medicaid dual‐eligible and non‐dual‐eligible patients, respectively. These results are consistent with the analytical derivations.

Conclusions

Joint modeling targets the parameter of primary interest. In the case of population correlations, it yields estimates that are substantially less biased and higher in magnitude than naive estimators that post‐process the estimates obtained from stratified models.

Keywords: attenuation, bivariate random effects, bivariate shrinkage, hierarchical model, measurement error


What is already known on this topic

  • Researchers often evaluate how provider effects (or related quantities) are correlated across different populations by estimating provider effects from population‐stratified models and then computing correlations among these stratified provider effects.

  • Because this approach does not account for uncertainty in the stratum‐specific estimates, it risks substantially underestimating the true correlations.

  • The solution to this problem—estimating a joint model—has been used by statisticians but is not yet widely used in health services research.

What this study adds

  • We provide a multi‐pronged (analytical, graphical, computational) demonstration of the statistical drawbacks of stratified modeling.

  • We demonstrate the benefit of using a joint model to estimate the relationship of provider effects, and we apply this method to compare provider treatment patterns (propensity to admit patients from the emergency department) across different patient populations.

  • Stata, SAS, and R code for implementing the joint model are provided in Appendix S1.

1. INTRODUCTION

An enduring goal of health services research has been to understand the reasons for variation in nearly all aspects of health care delivery. One potential source of variation is that individual doctors might consciously or unconsciously make different clinical decisions for different types of patients with similar conditions. In health services research, one might ask whether doctors practice similarly for different populations of patients. For example, do doctors admit patients of different racial groups to the hospital at comparable or different rates? In another application, one may want to examine whether the quality of care provided by doctors or hospitals is similar for patients with different health conditions. For example, do hospitals that have lower mortality rates for patients with heart failure also have lower mortality rates for patients with pneumonia?

To assess these correlations, health services researchers often estimate separate models for each stratum (e.g., patient population) with provider effects modeled as either random effects or fixed effects—“stratified estimation”—as opposed to joint modeling. Researchers then examine the correlation or concordance (e.g., by quantile) of these stratified provider effects. However, these correlations can be severely underestimated because the stratified approach does not account for measurement error in the stratified provider effects. The attenuation bias is greater when the number of patients per unit (e.g., doctor or hospital) is smaller. There are several examples of such “stratified” or two‐step computations in health services research. 1 , 2 , 3 , 4 , 5 , 6 , 7 , 8 Because stratified approaches can underestimate the consistency of care patterns within providers, they may lead to incorrect conclusions about the sources of variation in care. Therefore, greater attention should be placed on this problem.

The question of consistency—whether providers (doctors or hospitals) vary in practice patterns for different patient populations—also bears on the sources of health care disparities. Disparities can arise between providers (patients distributed non‐randomly to providers with different treatment patterns) or from systemic factors affecting specific patient populations across all providers. Examining whether provider effects are correlated across different populations can help to illuminate the sources of health care disparities. High correlations imply high provider consistency across patient populations. Observing high correlations and disparities in care suggests that systemic factors present across all providers (e.g., structural racism), rather than discriminatory treatment by some providers, may underlie disparities in care. Conversely, low correlations indicate that disparities between patient populations vary across providers, suggesting the potential for provider‐level differences in care to contribute to disparities.

This work is motivated by the question of whether the hospitalization tendencies of emergency department (ED) doctors correlate across Medicare patients of different sex, race and ethnicity, or income (proxied by dual eligibility for Medicaid for patients over age 65). Specifically, we examine whether a doctor's propensity to hospitalize patients from one stratum of the population is correlated with his or her propensity to hospitalize patients from other population strata. A correlation of 1 implies that doctors admit patients with perfect consistency across strata. Correlations across population strata that are close to 1 can occur even if mean admission rates differ between patient populations, provided these differences are consistent across doctors.

We develop a model that allows provider effects for different patient population strata to be correlated by specifying multivariate random effects. The primary target of inference is the correlation matrix whose elements describe the extent to which providers are inherently consistent across patients in different strata. However, this framework can also be applied to examine the correlation of repeated observations on the same unit (e.g., longitudinal data). 9 , 10 Therefore, we will illustrate the breadth of potential applications by also describing a longitudinal example where joint modeling could be used.

Through comparison with the results of the joint model, we seek to determine whether the approach of calculating correlations of provider effects estimated from stratified models yields estimates that are biased toward 0 because of measurement error. 11 , 12 , 13 We take three different approaches to explore the problem. First, we develop an analytical derivation that characterizes the general problem and serves as theoretical background for what follows, and we complement it with simulation experiments that illuminate the potential for bias when sample sizes are of a magnitude commonly encountered in research studies. Second, we conduct administrative data analyses to illustrate the implications of stratified versus joint estimation using real‐world data; we examine the similarity of doctor‐ and hospital‐level propensities to admit patients of different sex, race, and income (insurance status). Third, to further emphasize the breadth of applications in which the general problem applies, we examine the consistency of hospital readmission rates across time using publicly available data on the Hospital Readmissions Reduction Program (HRRP). Appendix S1 includes Stata, SAS, and R code for estimating joint models.

2. METHODS

Our analysis is motivated by an empirical example of doctor‐level propensities to admit patients with different demographic characteristics. Thus, we use a generalized linear mixed model for a binary outcome (admission). We show results for other model specifications in Appendix S1.

2.1. Statistical model

Consider the model $Y_{ijk} = S_{ijk}\lambda_i + S_{ijk}\theta_{ij} + X_{ijk}^{s}\beta + \varepsilon_{ijk}$ with $\varepsilon_{ijk} \sim \mathrm{Normal}(0, \sigma^2)$, in which we observe the binary‐valued outcome $B_{ijk} = I(Y_{ijk} > 0)$, such as admission to hospital, for the $k$th of $r_{ij}$ patients seen by the $j$th of $n_i$ doctors in the $i$th of $n$ hospitals. In this model, $Y_{ijk}$ may be thought of as a latent variable, for which positive values reflect a tendency for the binary outcome $B_{ijk}$ (admission) to occur. The model for $Y_{ijk}$ is a three‐level mixed model because patients are nested within doctors who are nested within hospitals. Because $\varepsilon_{ijk}$ is normally distributed, the implied model for $B_{ijk}$ is a probit regression mixed model. However, our setup also generalizes to logistic regression mixed models (implied when $\varepsilon_{ijk}$ has a logistic distribution). The covariates $S_{ijk}$ and $X_{ijk}^{s}$ consist of indicator variables of patient population strata (e.g., sex, race, dual Medicaid eligibility) and other covariates, respectively. Interactions between other covariates and $S_{ijk}$ are included in $X_{ijk}^{s}$; hence, the superscript $s$. We focus on modeling the correlation of the doctor and hospital effects in the first two right‐hand‐side terms in the earlier equation for $Y_{ijk}$ across population strata defined by $S_{ijk}$. For example, $S_{ijk}$ could be the vector containing the binary indicators for White and non‐White race, with the distribution of doctor effects for these two population strata described by the bivariate probability distribution

$$\theta_{ij} = \begin{pmatrix} \theta_{ij1} \\ \theta_{ij2} \end{pmatrix} \sim \mathrm{Normal}\!\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix},\; \begin{pmatrix} \tau_1^2 & \rho_{\mathrm{doc}}\tau_1\tau_2 \\ \rho_{\mathrm{doc}}\tau_1\tau_2 & \tau_2^2 \end{pmatrix} \right), \qquad (1)$$

where $\theta_{ij}$ is the vector of patient stratum‐specific effects of doctor $j$ within hospital $i$ (these are doctor‐specific deviations from the average admission tendencies of their hospital). In the case of two population strata as depicted in Equation (1), the focal point of the analysis is estimation of $\rho_{\mathrm{doc}} = \mathrm{cor}(\theta_{ij1}, \theta_{ij2})$, the correlation of population strata effects across doctors. When the stratification variable contains $q$ categories, the correlation matrix will have $q^2$ elements and $q(q-1)/2$ correlations (e.g., when $q = 4$, as for a finer representation of race, there are six correlations). The number of patients seen by doctor $ij$ in stratum $s$ is denoted $r_{ijs}$ with $r_{ij} = r_{ij1} + \cdots + r_{ijq}$.

If data were only available from a single hospital or one wanted to estimate separate models for each hospital, the model simplifies to $Y_{jk} = S_{jk}\theta_j + X_{jk}^{s}\beta + \varepsilon_{jk}$ with $\theta_j$ specified as for $\theta_{ij}$ in Equation (1). For a sample of hospitals, a hospital's effects for two population strata can be described by the bivariate probability distribution

$$\lambda_i = \begin{pmatrix} \lambda_{i1} \\ \lambda_{i2} \end{pmatrix} \sim \mathrm{Normal}\!\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix},\; \begin{pmatrix} \omega_1^2 & \rho_{\mathrm{hosp}}\omega_1\omega_2 \\ \rho_{\mathrm{hosp}}\omega_1\omega_2 & \omega_2^2 \end{pmatrix} \right), \qquad (2)$$

where $\lambda_i$ is the vector of patient stratum‐specific effects of hospital $i$. In a study of hospital‐level variation, the correlation $\rho_{\mathrm{hosp}} = \mathrm{cor}(\lambda_{i1}, \lambda_{i2})$ of hospital‐specific variation between patient strata might be the primary focus.

We focus on the three‐level mixed‐model specification defined by Equations (1) and (2) because it allows us to partition the covariation between different levels and yields inferences with population interpretations for doctors and hospitals. However, a researcher might alternatively specify the model with doctor random effects but hospital fixed effects to block variation between hospitals if, for example, one is focused on the variation between doctors within only those hospitals represented in the study sample.

2.2. Estimation approaches

Joint modeling uses maximum‐likelihood or related methods to simultaneously estimate the model parameters. When the doctor (“doc”) random effects are described by Equation (1), the fitted joint model yields the estimated values of

$$\rho_{\mathrm{doc}} = \mathrm{cor}(\theta_{ij1}, \theta_{ij2}) = \frac{\mathrm{cov}(\theta_{ij1}, \theta_{ij2})}{\sqrt{\mathrm{var}(\theta_{ij1})\,\mathrm{var}(\theta_{ij2})}} \qquad (3)$$

and $\rho_{\mathrm{hosp}}$, assuming the hospital random effects are described by Equation (2). The specification of the random effects in a computer software package must comply with Equations (1) and (2) in order to obtain a correlation matrix whose elements reflect the correlations of provider decisions across populations (see Appendix S1 for Stata, SAS, and R model specifications).
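The article's own Stata, SAS, and R code is in Appendix S1. Purely as an illustration, the following minimal R sketch (using the lme4 package, with hypothetical variable names such as admit, female, male, doc_id, hosp_id, age_cat, and hcc, which are our assumptions and not the authors' data layout) shows one way such a bivariate random‐effects specification can be written so that the software estimates a 2 × 2 covariance matrix, and hence a correlation, at both the doctor and hospital levels.

```r
library(lme4)

# Minimal sketch of a joint (bivariate random-effects) model, assuming a
# data frame `ed` with hypothetical columns: admit (0/1), female and male
# (0/1 stratum indicators), age_cat and hcc (example covariates), and
# doc_id nested within hosp_id. Appendix S1 contains the authors' code.
joint_fit <- glmer(
  admit ~ female + age_cat + hcc +         # fixed effects
    (0 + female + male | hosp_id) +        # bivariate hospital effects
    (0 + female + male | hosp_id:doc_id),  # bivariate doctor-within-hospital effects
  data = ed,
  family = binomial(link = "probit")
)

# The estimated correlations rho_hosp and rho_doc are read off the
# random-effect covariance matrices.
VarCorr(joint_fit)
```

In practice, nonlinear joint models of this kind can be slow to converge (a point the authors raise in the Results); the same random‐effects structure can be fitted as a linear probability model by swapping glmer for lmer.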

The stratified (two‐step) approach separately estimates the $q$ univariate mixed‐effect probit models for the binary outcomes $B_{ijk}$ (or linear mixed‐effect models if $Y_{ijk}$ is observed) on the data sets corresponding to stratum $s = 1, 2, \ldots, q$. Because each model is conditioned on a specific stratum, $X_{ijk}$ excludes the superscript $s$, and $\theta_{ij}$ and $\lambda_i$ are univariate, corresponding to the $s$th components of Equations (1) and (2). These models yield separate estimates of each doctor's and hospital's independent effects on the outcome. Researchers then compute Pearson correlation coefficients between the stratified estimates for the same doctors and hospitals.
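For contrast, here is a sketch of the stratified (two‐step) approach under the same hypothetical variable names: fit a separate mixed model for each stratum, extract the empirical Bayes predictions of each doctor's random effect, and correlate the two sets of estimates. This is the procedure whose attenuation is analyzed in the next subsection.

```r
library(lme4)

# Stratified (two-step) sketch, same hypothetical data frame `ed` as above.
fit_f <- glmer(admit ~ age_cat + hcc + (1 | hosp_id) + (1 | hosp_id:doc_id),
               data = subset(ed, female == 1), family = binomial(link = "probit"))
fit_m <- glmer(admit ~ age_cat + hcc + (1 | hosp_id) + (1 | hosp_id:doc_id),
               data = subset(ed, male == 1), family = binomial(link = "probit"))

# Empirical Bayes (BLUP-like) doctor effects from each stratum-specific fit
eb_f <- ranef(fit_f)$`hosp_id:doc_id`
eb_m <- ranef(fit_m)$`hosp_id:doc_id`

# Merge on doctor identifier and compute the naive Pearson correlation;
# Section 2.3 shows this estimator is attenuated toward 0.
both <- merge(data.frame(doc = rownames(eb_f), f = eb_f[, 1]),
              data.frame(doc = rownames(eb_m), m = eb_m[, 1]),
              by = "doc")
cor(both$f, both$m)
```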

2.3. Theory: Fallacy of stratified estimation

In Appendix S1, we show that the stratified estimator of $\rho_{\mathrm{doc}}$, the correlation of the vectors of doctor random effects between patient‐population strata 1 and 2, denoted $\hat{\rho}_{\mathrm{doc}}^{\,\mathrm{strat}} = \mathrm{cor}(\hat{\theta}_1, \hat{\theta}_2)$, is biased toward underestimating the magnitude of $\rho_{\mathrm{doc}}$. The stratified approach can severely underestimate between‐strata correlations because in finite samples, each provider‐specific effect $\theta_{ijs}$ for a population stratum $s$ will be estimated with error. Greater variance in these errors, such as due to small sample sizes of patients per stratum per doctor, will dilute estimates of between‐stratum correlations.

The derivation presented in Appendix S1 is based on assuming that $Y_{ijk}$ is observed so that a linear mixed‐effect model applies and the best linear unbiased prediction or empirical Bayes estimator 14 of $\theta_{ijs}$ has a closed form. Because $\bar{Y}_{ijs} - \lambda_{is} - \bar{X}_{ijs}\beta = \theta_{ijs} + \bar{\varepsilon}_{ijs}$, the correlation being targeted by $\hat{\rho}_{\mathrm{doc}}^{\,\mathrm{strat}}$ is approximately

$$\rho_{\mathrm{doc}}^{\,\mathrm{strat}} = \mathrm{cor}(\theta_{ij1} + \bar{\varepsilon}_{ij1},\, \theta_{ij2} + \bar{\varepsilon}_{ij2}) = \frac{\mathrm{cov}(\theta_{ij1} + \bar{\varepsilon}_{ij1},\, \theta_{ij2} + \bar{\varepsilon}_{ij2})}{\sqrt{\mathrm{var}(\theta_{ij1} + \bar{\varepsilon}_{ij1})\,\mathrm{var}(\theta_{ij2} + \bar{\varepsilon}_{ij2})}}. \qquad (4)$$

By virtue of being evaluated on mutually exclusive sets of patients, $\bar{\varepsilon}_{ij1}$ and $\bar{\varepsilon}_{ij2}$ are statistically independent. If the number of patients in each stratum is the same for all doctors, so that $r_{ijs} = r_s$ for all $ij$, then

$$\rho_{\mathrm{doc}}^{\,\mathrm{strat}} = \frac{\mathrm{cov}(\theta_{ij1}, \theta_{ij2})}{\sqrt{\mathrm{var}(\theta_{ij1} + \bar{\varepsilon}_{ij1})\,\mathrm{var}(\theta_{ij2} + \bar{\varepsilon}_{ij2})}} = \frac{\rho_{\mathrm{doc}}\,\tau_1\tau_2}{\sqrt{\left(\tau_1^2 + \sigma^2/r_1\right)\left(\tau_2^2 + \sigma^2/r_2\right)}}, \qquad (5)$$

where $\sigma^2$ is the variance parameter governing the amount of heterogeneity between patients within a stratum and doctor. Because

$$\sqrt{\left(\tau_1^2 + \sigma^2/r_1\right)\left(\tau_2^2 + \sigma^2/r_2\right)} > \sqrt{\mathrm{var}(\theta_{ij1})\,\mathrm{var}(\theta_{ij2})} = \sqrt{\tau_1^2\,\tau_2^2},$$

the denominator in Equation (5) exceeds that of Equation (3), implying that $\rho_{\mathrm{doc}}^{\,\mathrm{strat}}$ is smaller in magnitude than $\rho_{\mathrm{doc}}$. Measurement error occurs because $\hat{\theta}_{ij} = (\hat{\theta}_{ij1}, \hat{\theta}_{ij2})^{T}$ depends on the error terms $\varepsilon_{ijk}, k = 1, \ldots, r_{ij}$, as well as on $\theta_{ij} = (\theta_{ij1}, \theta_{ij2})^{T}$, whereas $\rho_{\mathrm{doc}}$ depends only on the latter. This result mirrors the attenuation of regression coefficients when a predictor is measured with error. 15 , 16 , 17 Because the joint model allows for uncertainty in the doctor effects in accordance with the number of patients per stratum per doctor, this uncertainty is accounted for when determining how correlated the doctor effects must be in the population in order for the observed data to result. Accordingly, estimates of $\rho_{\mathrm{doc}}$ under the joint model are not compromised by attenuation bias.
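To make the attenuation described by Equation (5) concrete, the following short R sketch plugs illustrative values (our choice, not taken from the article) into the formula: with $\tau_1 = \tau_2 = 1$, $\sigma^2 = 1$, and 10 patients per stratum per doctor, a true correlation of 0.95 is attenuated to roughly 0.86, and with only two patients per stratum to roughly 0.63.

```r
# Attenuation implied by Equation (5) under illustrative (assumed) values.
attenuated_rho <- function(rho, tau1 = 1, tau2 = 1, sigma2 = 1, r1, r2) {
  rho * tau1 * tau2 / sqrt((tau1^2 + sigma2 / r1) * (tau2^2 + sigma2 / r2))
}

attenuated_rho(0.95, r1 = 10, r2 = 10)   # ~0.86
attenuated_rho(0.95, r1 = 2,  r2 = 2)    # ~0.63
attenuated_rho(0.95, r1 = 100, r2 = 5)   # ~0.86: the smaller stratum dominates
```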

As also demonstrated in Appendix S1, the fact that $\rho_{\mathrm{doc}}$ is estimated directly under the joint model allows it to learn from the data that the estimates of the random effects for each provider should be shrunk toward a common value (“bivariate shrinkage”), whereas under the stratified approach, the random effects for each stratum are separately shrunk toward their population mean of 0 (“univariate shrinkage”). The lack of bivariate shrinkage exacerbates the impact of measurement error on the stratified estimator by yielding noisier estimates, because information cannot be borrowed from observations on a different subpopulation of patients for the same providers. Therefore, both shrinkage and measurement error contribute to the bias of the stratified approach, with the relative contribution of each depending on the value of $\rho_{\mathrm{doc}}$.

2.4. Simulation experiments

To complement the asymptotic results of the preceding section and Appendix S1, we compared estimates based on the correlations obtained from joint and stratified models to the true values of $\rho_{\mathrm{doc}}$ and $\rho_{\mathrm{hosp}}$ by simulating data under a range of effect sizes and sample sizes. Although related results have been determined previously, they were specific to longitudinal modeling. 9 The joint model corresponds to the probit mixed‐effect regression model with bivariate distributions of the random effects given by Equations (1) and (2). Under the stratified approach, we first estimated probit mixed‐effect models with univariate random‐effect distributions and then calculated the Pearson correlation of the estimated doctor and hospital random effects. To quantify the performance of the estimation approaches at recovering the true values $\rho_{\mathrm{doc}}$ and $\rho_{\mathrm{hosp}}$, we calculated:

  • Bias. The expected difference between the estimator and the true value of the model parameter. If the difference is 0, the estimator is unbiased.

  • Mean‐squared‐error (MSE). The expected value of the squared difference between the estimator and the true value of the model parameter. Smaller MSE implies better performance (see the short sketch after this list).
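As a simple illustration of these two criteria, the helper functions below (ours, not the authors' code) are applied to a vector of correlation estimates from repeated simulated data sets and the true generating value.

```r
# Simple helpers (ours) for the two evaluation criteria.
bias <- function(estimates, truth) mean(estimates - truth)
mse  <- function(estimates, truth) mean((estimates - truth)^2)

# Example with made-up numbers:
est <- c(0.62, 0.70, 0.66, 0.59)
bias(est, truth = 0.95)   # negative value: attenuation toward 0
mse(est, truth = 0.95)
```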

We consider four distinct simulation experiments, described below and tabulated in Table 2. The first two were designed to be sensitive to the estimation of $\rho_{\mathrm{doc}}$, with the amount of information at the hospital level substantial in comparison to the doctor level. The second two were designed to be more sensitive to estimation of $\rho_{\mathrm{hosp}}$ by substantially reducing the number of doctors per hospital and setting the number of patients per stratum per doctor to small numbers.

TABLE 2. Settings for simulations

| Simulation experiment | ρ | Number of hospitals | Doctors per hospital | Patients per doctor |
|---|---|---|---|---|
| Doctor focus | | | | |
|   By ρ (Simulation 1) | [−0.99, 0.99] | 30 | 30 | 20 |
|   By number of patients (Simulation 2) | 0.95 | 30 | 30 | [4, 100] |
| Hospital focus | | | | |
|   By ρ (Simulation 3) | [−0.99, 0.99] | 225 | 4 | 4 |
|   By number of doctors (Simulation 4) | 0.95 | 225 | [4, 100] | 4 |
| Actual data | | | | |
|   Mean | n/a | 720 | 23.1 | 69.9 |
|   Standard deviation | n/a | 720 | 15.0 | 80.2 |
|   Minimum | n/a | 720 | 5 | 5 |
|   5th percentile | n/a | 720 | 7 | 6 |
|   10th percentile | n/a | 720 | 9 | 7 |
|   25th percentile | n/a | 720 | 13 | 13 |
|   Median | n/a | 720 | 19 | 38 |
|   75th percentile | n/a | 720 | 29 | 97 |
|   90th percentile | n/a | 720 | 43 | 184 |
|   95th percentile | n/a | 720 | 52 | 241 |
|   Maximum | n/a | 720 | 124 | 682 |

Abbreviation: n/a = not available

In the first simulation experiment, each simulated data set included 30 hospitals, 30 doctors per hospital, and $r_{ijs} = r_s = 10$ patients per stratum per doctor. Population strata were defined by the stratum vector ($S_{ijk}$) emulating each of the three pairs of populations in the motivating hospital admissions application (sex, race, and dual status). All regression and variance parameters were fixed (variances equal to 1); only the correlation parameters in the doctor‐ and hospital‐level covariance matrices varied between simulations. The correlations took 21 values: $\rho_{\mathrm{doc}} = \rho_{\mathrm{hosp}} = \rho \in \{-0.99, -0.9, -0.8, \ldots, -0.1, 0, 0.1, \ldots, 0.9, 0.99\}$. We first generated the doctor random effects using each $\rho_{\mathrm{doc}}$, the hospital random effects using each $\rho_{\mathrm{hosp}}$, and the univariate error terms. We then added the covariate multiplied by its regression coefficient to the linear predictor to generate continuous outcomes ($Y_{ijk}$). Binary outcomes ($B_{ijk}$) were generated by thresholding $Y_{ijk}$ about 0, yielding 1 if $Y_{ijk} > 0$ and 0 otherwise. Estimates of bias and MSE were evaluated using the estimates obtained over the simulated data sets generated using the given values of $\rho_{\mathrm{doc}}$ and $\rho_{\mathrm{hosp}}$.
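As an illustration of this data‐generating step, the following condensed R sketch draws the bivariate doctor and hospital random effects with MASS::mvrnorm and thresholds the latent outcome at 0. The function and the value of the regression coefficient are our own assumptions, not the authors' simulation code.

```r
library(MASS)  # for mvrnorm

# Sketch of one simulated data set for Simulation 1 (assumed helper, beta = 0.5).
simulate_dataset <- function(rho, n_hosp = 30, n_doc = 30, r_s = 10,
                             tau = c(1, 1), omega = c(1, 1), beta = 0.5) {
  Sigma_doc  <- diag(tau)   %*% matrix(c(1, rho, rho, 1), 2) %*% diag(tau)
  Sigma_hosp <- diag(omega) %*% matrix(c(1, rho, rho, 1), 2) %*% diag(omega)
  lambda <- mvrnorm(n_hosp, mu = c(0, 0), Sigma = Sigma_hosp)         # hospital effects
  theta  <- mvrnorm(n_hosp * n_doc, mu = c(0, 0), Sigma = Sigma_doc)  # doctor effects

  grid <- expand.grid(stratum = 1:2, patient = 1:r_s,
                      doc = 1:n_doc, hosp = 1:n_hosp)
  grid$x <- rnorm(nrow(grid))                       # a patient covariate
  doc_index <- (grid$hosp - 1) * n_doc + grid$doc
  linpred <- lambda[cbind(grid$hosp, grid$stratum)] +
             theta[cbind(doc_index, grid$stratum)] +
             beta * grid$x
  grid$y <- linpred + rnorm(nrow(grid))             # latent continuous outcome
  grid$b <- as.integer(grid$y > 0)                  # binary outcome (probit link)
  grid
}

sim <- simulate_dataset(rho = 0.95)
```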

In the second simulation, the number of patients per stratum per doctor ($r_{ijs} = r_s$) ranged from 2 to 50 (total number of patients per doctor ranged from 4 to 100), the number of doctors per hospital was fixed at $n_i = n = 30$, and $\rho_{\mathrm{doc}} = \rho_{\mathrm{hosp}} = 0.95$ was assumed throughout. This simulation allowed the comparative performance of the joint model and stratified approaches at estimating $\rho_{\mathrm{doc}}$ to be determined as a function of $r_s$.

The third simulation emulated the first in that $\rho_{\mathrm{doc}}$ and $\rho_{\mathrm{hosp}}$ were systematically varied but focused on performance at estimating $\rho_{\mathrm{hosp}}$ by reducing the amount of information per hospital to make results more sensitive to the estimation method. The numbers of doctors per hospital and patients per doctor were set to 4 ($r_s = 2$ for $s = 1, 2$).

The fourth simulation emulated the second in that $\rho_{\mathrm{doc}} = \rho_{\mathrm{hosp}} = 0.95$ throughout. The number of doctors per hospital ($n_i = n$) was varied from 4 to 100 while the number of patients per stratum per doctor ($r_{ijs} = r_s$) was 2. This simulation allows the differences in performance between joint modeling and stratified estimation at estimating $\rho_{\mathrm{hosp}}$ to be evident for varying numbers of doctors.

2.5. Motivating applications

To evaluate the extent to which stratified estimation may be biased relative to the results from a joint model in practice, we compared the joint model and the stratified estimators in an analysis of hospital admission of Medicare patients visiting the ED. We examined whether doctors with a higher propensity to admit patients in one population stratum have a higher propensity to admit patients in another stratum. The strata are the two‐level variables: sex, race (White vs. non‐White), and Medicaid dual‐eligible (see Table 1 for layout of the data). The statistical models used the bivariate random‐effect specifications in Equations (1) and (2) for doctor and hospital random effects, respectively. In addition to sex, race, and Medicaid dual‐eligible status, the predictors consisted of the following: age (categorized into 5‐year increments); original reason for Medicare enrollment (i.e., disability); day of the week, month, and year of ED visit; diagnosis indicators; chronic disease indicators from the Chronic Conditions Data Warehouse (CCW); and hierarchical condition category (HCC) scores. 18

TABLE 1. Layout of data in emergency department‐hospitalization application in which correlations are estimated across patient strata at the doctor and hospital levels

| Hospital ID | Doctor in hospital | Patient of doctor | Patient sex | Patient race | Medicaid status | Other covariates | Outcome: hospitalized |
|---|---|---|---|---|---|---|---|
| 1 | 1 | 1 | Male | Black | Non‐dual | | 1 |
| 1 | 1 | 2 | Male | Non‐Black | Dual | | 0 |
| 1 | 2 | 1 | Female | Black | Dual | | 0 |
| 1 | 2 | 2 | Female | Non‐Black | Non‐dual | | 1 |
| 2 | 1 | 1 | Female | Non‐Black | Dual | | 0 |
| 2 | 2 | 1 | Female | Black | Dual | | 1 |
| 2 | 2 | 2 | Male | Non‐Black | Non‐dual | | 1 |
| 2 | 2 | 3 | Male | Black | Dual | | 1 |
| 2 | 3 | 1 | Male | Non‐Black | Non‐dual | | 0 |
| 2 | 3 | 2 | Female | Black | Dual | | 1 |

Note: An analysis that evaluates the correlation of doctor and hospital outcomes for patients with different medical conditions would have an analogous structure to that presented above.

We analyze Medicare claims for a 20% random sample of beneficiaries from January 1, 2012, through September 30, 2015. We excluded Medicare Advantage enrollees and patients with end‐stage renal disease. ED visits, admissions, and the doctors caring for patients during these visits were identified using Carrier claims, following prior work 19 (Appendix S1). We imposed minimum requirements of five patients per doctor and five doctors per hospital to restrict the analysis to providers encountering a reasonable number of patients; Table 2 contains the distribution of numbers of patient visits per doctor and of doctors per hospital.

To further illustrate the scope of situations to which the findings of this paper apply, we also examine risk‐standardized hospital readmission rates over 2016–2019 using publicly available CMS data for three (AMI, heart failure, and pneumonia) targeted conditions in the HRRP 10   (CMS reports risk‐standardized readmission rates as 3‐year moving averages for hospitals, such that 2016 and 2019 represent distinct measurement periods for the same hospitals). It is of interest to evaluate the consistency in hospital readmission rates across years. Because we use publicly available data on readmission rates aggregated to the hospital level, we cannot emulate our joint model approach in this analysis. However, to illustrate the existence of the same general problem, we examined the correlation of readmission rates for hospitals meeting different thresholds for the number of index admissions over a 3‐year period.
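A calculation of this kind is straightforward on hospital‐level data; the sketch below assumes a hypothetical file and column names (hospital_id, n_index, rate_2016, rate_2019) rather than the actual CMS file layout.

```r
# Hypothetical illustration: correlation of risk-standardized readmission
# rates between the 2016 and 2019 reporting years among hospitals whose
# number of index admissions meets a volume threshold. File and column
# names are assumed, not the actual CMS layout.
hrrp <- read.csv("hrrp_heart_failure.csv")

thresholds <- c(25, 100, 200, 500, 800, 1000)
results <- t(sapply(thresholds, function(k) {
  keep <- hrrp[hrrp$n_index >= k &
               complete.cases(hrrp[, c("rate_2016", "rate_2019")]), ]
  c(threshold   = k,
    n_hospitals = nrow(keep),
    correlation = cor(keep$rate_2016, keep$rate_2019))
}))
results
```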

3. RESULTS

3.1. Doctor‐focused simulations

The bias and MSE under the joint model and stratified estimation of $\rho_{\mathrm{doc}}$ in the two doctor‐focused simulations are shown in Figures 1 and 2. Recall that a bias of 0 is ideal, while smaller values of the MSE indicate less estimation error.

FIGURE 1. Performance of estimators of $\rho_{\mathrm{doc}}$ as a function of $\rho$ (Simulation 1), where the sample size per stratum per doctor is 10 (half the total per‐doctor sample size).

FIGURE 2. Performance of estimators of $\rho_{\mathrm{doc}}$ by the total number of patients per doctor (Simulation 2), where the sample size per stratum per doctor is half the total per‐doctor sample size and $\rho = 0.95$.

In the first simulation, the doctor‐level correlation coefficient $\rho_{\mathrm{doc}}$ was varied (along with $\rho_{\mathrm{hosp}}$), while the number of patients per stratum per doctor was fixed at $r_{ijs} = r_s = 10$. The joint model estimator of $\rho_{\mathrm{doc}}$ is close to unbiased. In contrast, stratified estimation is attenuated by more than 0.2 at the extremes (substantial on the −1 to 1 correlation scale). For these and the three subsequent sets of simulation results, we refer readers to Appendix S1 for additional results involving a third estimator that demonstrate that both shrinkage and measurement error are important contributors to the difference in performance of the joint model and the stratified estimator (Figures S1 and S2).

When $\rho_{\mathrm{doc}} = 0.95$ and $r_{ijs} = r_s$ is varied (Simulation 2), the bias and MSE of joint modeling are largely invariant to $r_s$, with only a small bias through $r_s = 30$ (Figure 2). In contrast, stratified estimation performed far worse, especially when $r_s$ is small. Although performance improves with larger samples, the stratified approach is still inferior to the joint model even with $r_s = 50$.

If we model a continuous outcome (or even a binary‐valued one) using a linear mixed‐effect model, the conclusions from the comparative estimation of $\rho_{\mathrm{doc}}$ and $\rho_{\mathrm{hosp}}$ are similar (results in Appendix S1, Figures S6 and S7).

3.2. Hospital‐focused simulations

When $\rho_{\mathrm{hosp}}$ is varied with the number of doctors per hospital fixed at $n_i = n = 4$ and the number of patients per stratum per doctor fixed at $r_{ijs} = r_s = 2$ (Simulation 3), the bias under joint modeling is small and nearly invariant to $\rho_{\mathrm{hosp}}$, while the bias and MSE under stratified estimation are much greater than for joint modeling when $\rho_{\mathrm{hosp}}$ is close to ±1 and of a similar magnitude when $\rho_{\mathrm{hosp}}$ is close to 0 (Figure S3). For $\rho_{\mathrm{hosp}}$ between −0.25 and 0.25, the MSE of joint modeling and stratified estimation is similar, with stratified estimation a little smaller when $\rho_{\mathrm{hosp}}$ is close to 0. The slightly lower MSE under stratified estimation stems from no information in the data being expended to estimate strata correlations.

When instead $n_i = n$ varies from 4 to 100 with $\rho_{\mathrm{hosp}} = 0.95$ fixed (Simulation 4), the results emulate those at the doctor level in Simulation 2 (Figure S4). Joint modeling is largely unbiased, whereas stratified estimation is substantially biased, especially when the number of doctors per hospital, $n$, is small. Even when $n = 40$ (160 patients per hospital), stratified estimation performs noticeably worse than joint modeling.

The above results reveal that using the joint model may be just as important for accounting for uncertainty in the estimated hospital effects across populations as it is for estimating doctor effects across populations. As illustrated by Equation (5), amplifying the issue is that the sample size for the smallest population stratum constrains the precision with which a provider's performance in that stratum can be compared with their performance in other strata. A scenario in which there are relatively few patients per hospital could arise when a small patient sample is accrued (such as at small hospitals or for condition‐specific samples at large hospitals) and a two‐level model is used for estimation. Another is a three‐level analysis in which there are few patients per doctor and few doctors per hospital. These findings are particularly alarming in situations for which $\rho_{\mathrm{hosp}}$ is expected to be large in magnitude, as this is where stratified estimation performs at its worst.

3.3. Applications

In the simulation study, the joint model performed well at both the doctor and the hospital levels across a range of different stratum‐specific sample sizes. In contrast, the performance of the stratified approach only neared that of the joint model when the number of patients per stratum per doctor was large (for estimation of $\rho_{\mathrm{doc}}$) or when the number of patients per hospital was large (for estimation of $\rho_{\mathrm{hosp}}$). With these findings in mind, we compare the two approaches in our analysis of Medicare claims data and illustrate the potential for analogous problems to occur when making comparisons of hospital performance across time using HRRP data.

Comparing first the estimated female–male, White–non‐White, and dual–non‐dual correlations between joint modeling and stratified estimation using all of the data, we find large differences in the estimates at the doctor level (Table 3, all doctors row). The joint model estimates are greater in magnitude, consistent with attenuation bias having lowered the magnitude of the stratified estimates. The relatively smaller differences at the hospital level (Table S1) are understandable given the greater average amount of information per hospital than per doctor. Furthermore, the correlations for strata that are less balanced in numbers of patients (e.g., dual–non‐dual, White–non‐White) are more attenuated under stratified estimation, again illustrating that attenuation is primarily driven by the stratum with the smallest sample size.

TABLE 3. Estimated doctor‐level correlations of tendency for hospital admission across population strata

| Study sample based on visits per doctor | Female–male, joint | Female–male, stratified | White–non‐White, joint | White–non‐White, stratified | Dual–non‐dual, joint | Dual–non‐dual, stratified |
|---|---|---|---|---|---|---|
| All doctors (N = 16,489) | 0.9809 | 0.3843 | 0.9897 | 0.2789 | 0.9999 | 0.3082 |
| Top 50% (N = 8262) | 0.9800 | 0.4238 | 0.9986 | 0.303 | 0.9997 | 0.3368 |
| Top 10% (N = 1663) | 0.9858 | 0.5124 | 0.9659 | 0.3418 | 0.9639 | 0.3756 |
| Top 5% (N = 832) | 0.9699 | 0.5437 | 0.9997 | 0.3926 | 0.9999 | 0.4427 |

Note: Estimates are evaluated using the linear mixed‐effect model for the binary outcome of admission to hospital; because the outcome is binary, the model is often referred to as the linear probability mixed‐effect model. Convergence difficulties due to the nonlinearity of the probit mixed‐effect model and stratum‐specific sample sizes of 0 or 1 for some doctors (although each doctor had at least five patients in total, no restriction was placed on the number of patients per stratum) meant that estimates for this model often failed to be determined on a given data set. By estimating the analogous probit model on smaller subsamples of data, we showed that any differences between the probit and the linear specifications in the results for estimation of $\rho_{\mathrm{doc}}$ and $\rho_{\mathrm{hosp}}$ were small (less than 3%–5% in magnitude). The similarity of the simulation results across the probit mixed‐effect model, the linear probability model, and the linear mixed‐effect model of the underlying continuous outcome further supports the generalizability of the presented results.

When the sample is restricted to the top percentiles (e.g., top 5%) of doctors based on their number of patients, a subgroup for whom measurement error should have the least impact on stratified estimators, the estimates of the doctor‐level correlations under stratified estimation were larger, as expected (Table 3). For example, the female–male estimate increased from 0.38 to 0.54, and the White–non‐White estimate increased from 0.28 to 0.39, whereas the correlations changed minimally under joint modeling. Thus, measurement error under stratified estimation is greater for smaller within‐cluster stratum sample sizes, but over this range of sample sizes the stratified estimates remain well below the magnitude of the joint modeling estimates. The results for $\rho_{\mathrm{doc}}$ differ minimally when the hospital effects, $\lambda_i$, are modeled as fixed effects; for details, see Table S2 and accompanying discussion.

The correlations of the hospital readmission rates for the same condition in 2016 and 2019 for the HRRP data are shown in Table 4. The correlation monotonically increases as the analysis is performed on smaller subsets of progressively higher‐volume hospitals, including meaningful increases in the correlation estimates when increasing the admission threshold for hospital inclusion from 500 to 1000 admissions. An even greater correlation would be anticipated if we analyzed these data in a joint model.

TABLE 4. Within‐hospital correlations of risk‐standardized readmission rates for Hospital Readmission Reduction Program, 2016 and 2019 reporting years

| Lower bound for number of index admissions | Acute myocardial infarction: hospitals meeting lower bound | Acute myocardial infarction: correlation | Heart failure: hospitals meeting lower bound | Heart failure: correlation | Pneumonia: hospitals meeting lower bound | Pneumonia: correlation |
|---|---|---|---|---|---|---|
| 25 | 1978 | 0.4 | 2799 | 0.49 | 2857 | 0.37 |
| 100 | 1298 | 0.43 | 2220 | 0.51 | 2423 | 0.38 |
| 200 | 800 | 0.47 | 1646 | 0.54 | 1691 | 0.4 |
| 500 | 177 | 0.55 | 674 | 0.58 | 446 | 0.49 |
| 800 | 44 | 0.59 | 260 | 0.6 | 128 | 0.52 |
| 1000 | 21 | 0.68 | 147 | 0.6 | 50 | 0.56 |

Note: Hospital‐level risk‐standardized readmission rates among fee‐for‐service Medicare beneficiaries are publicly reported by the Centers for Medicare and Medicaid Services (CMS) for hospitals subject to the Hospital Readmission Reduction Program (HRRP). These reflect 30‐day all‐cause readmissions following index admissions for acute myocardial infarction (AMI), heart failure, and pneumonia over a 3‐year performance period for each hospital. Readmission rates in the 2016 reporting year are based on performance from July 1, 2011, to June 30, 2014, and readmission rates reported in 2019 are based on performance from July 1, 2014, to June 30, 2017. CMS risk‐standardized readmission rates are based on the case mix of index admissions (age, sex, and comorbidities reported on recent inpatient claims) using a hierarchical model to account for within‐hospital sampling error.

4. DISCUSSION

We demonstrated that estimators of the correlation of provider (doctor and hospital) effects on patient outcomes across population strata are biased toward 0 if correlations are estimated from stratum‐specific models. Smaller within‐unit sample sizes increase this attenuation bias. Conversely, the use of a joint model substantially limits attenuation bias, particularly in analyses where patient strata are sparsely sampled within providers. The source of the attenuation bias was demonstrated to be twofold: measurement error due to ignoring that the fitted provider effect estimates are noisy, and the lack of “bivariate shrinkage” of the provider random effects toward a common value. The second problem exacerbates the first by yielding noisier estimates when information cannot be borrowed across patient strata, as under the stratified approach. The findings demonstrate the potential for commonly employed stratified estimation to lead to the erroneous conclusion that there is substantial inconsistency across providers in outcomes of patients in different population strata when in actuality there may be substantial consistency (e.g., $\rho_{\mathrm{doc}}$ close to 1). 2 , 3

In the administrative data analyses, the impact of using joint modeling versus stratified estimation was substantial. In the case of female and male patients, joint modeling yields an estimated correlation of 0.98, whereas stratified estimation yields 0.38. The differences between the approaches for the estimated White–non‐White and dual–non‐dual correlations are even more substantial. Stratified approaches are vulnerable to drawing erroneous conclusions, whereas joint modeling facilitates correct interpretation. Although stratified approaches may produce acceptable approximations when samples are large, a sufficient sample size depends on multiple factors. Therefore, we cannot provide a sample‐size threshold above which estimated correlations of provider care patterns from a stratified analysis are guaranteed to have sufficiently small bias. We caution against assuming that sampling error has been minimized without due consideration of this important source of bias and all factors affecting it. For instance, in our real‐world analyses, the stratified estimates continued to increase with sample size even when samples were ostensibly large.

Our methods have broad applicability, including analyses of hospitals, health systems, and geographic areas, where multiple prior studies have estimated unit‐level correlations from stratified models rather than joint models. For example, in a study of the use of multiple services across geographic areas, the correlation of multiple categories of low‐value care across these areas was computed using the stratified approach. 5 The joint model proposed herein could enhance such analyses by estimating the population correlations free of measurement error and with the benefit of bivariate shrinkage.

There are several limitations of this paper. First, in the simulation study, the data‐generating model corresponded to the joint model, suggesting that this may give joint modeling an advantage. However, under the stratified approach, equivalent assumptions (e.g., normally distributed random effects) are made, and so arguably, the simulations do not favor either approach. Nonetheless, comparing the approaches when model assumptions are violated is important. In Appendix S1, we show that joint modeling also outperforms the stratified approach when the error terms have a skewed distribution as opposed to being normally distributed (Figures S8–S10), but we leave the assessment of the approaches under violations of other modeling assumptions to future work. Second, the use of multivariate random effects adds computational complexity, especially when nonlinear regression is used, leading us to substitute the linear probability specification for the probit form of the models in our motivating application. However, the range of scenarios in which the stratified approach with random effects can be implemented while joint modeling cannot is relatively small. Furthermore, ever‐advancing computing power enables models that at one time could not be estimated to become estimable later. Finally, we used dual enrollment in Medicare and Medicaid as a proxy for socioeconomic status. This measure has recognized limitations, one of which is that state‐level differences in Medicaid eligibility rules contribute to socioeconomic heterogeneity within the dual population. However, substantial differences between dual and non‐dual populations support our use of this variable as a broad marker of socioeconomic status.

Joint modeling of the effects of providers should be encouraged whenever evaluating similarities or differences across patient strata. The importance of joint modeling becomes more pronounced when cluster sizes are small. Mistaken or naïve use of stratified estimation may have led in the past to misleading findings being published, particularly for studies of variations in health care utilization, quality, cost, and outcomes. Correctly modeling these correlations can also inform risk‐adjustment methods, 10 an increasingly important component of provider and health system payment models, as the provider stratum correlations represent an index of the appropriateness of the assumption that the case‐mix coefficients are homogeneous across providers (the closer the correlations are to 1 the more justified the assumption). We hope this paper increases awareness of concerns with stratified estimation of quantities whose estimated effects are subsequently analyzed and that this practice is avoided in the future.

Supporting information

Appendix S1. Supporting Information.

ACKNOWLEDGMENT

This work was supported by the Agency for Health Care Research and Quality (AHRQ) under grant 1R01HS025408‐01.

O'Malley AJ, Landon BE, Zaborski LA, et al. Weak correlations in health services research: Weak relationships or common error? Health Serv Res. 2022;57(1):182‐191. doi: 10.1111/1475-6773.13882

Funding information Agency for Health Care Research and Quality (AHRQ), Grant/Award Number: 1R01HS025408‐01

REFERENCES

  • 1. Jha AK, Li Z, Orav EJ, Epstein AM. Care in U.S. hospitals–the Hospital Quality Alliance program. N Engl J Med. 2005;353(3):265‐274.
  • 2. Hu J, Jordan J, Rubinfeld I, Schreiber M, Waterman B, Nerenz D. Correlations among hospital quality measures: what “hospital compare” data tell us. Am J Med Qual. 2017;32(6):605‐610.
  • 3. McHugh M, Neimeyer J, Powell E, Khare RK, Adams JG. Is emergency department quality related to other hospital quality domains? Acad Emerg Med. 2014;21(5):551‐557.
  • 4. Radomski TR, Feldman R, Huang Y, et al. Evaluation of low‐value diagnostic testing for 4 common conditions in the veterans health administration. JAMA Netw Open. 2020;3(9):e2016445.
  • 5. Schwartz AL, Landon BE, Elshaug AG, Chernew ME, McWilliams JM. Measuring low‐value care in Medicare. JAMA Intern Med. 2014;174:1067‐1076.
  • 6. Krumholz HM, Lin Z, Keenan PS, et al. Relationship between hospital readmission and mortality rates for patients hospitalized with acute myocardial infarction, heart failure, or pneumonia. JAMA. 2013;309(6):587‐593.
  • 7. Weech‐Maldonado R, Elliott MN, Adams JL, et al. Do racial/ethnic disparities in quality and patient experience within Medicare plans generalize across measures and racial/ethnic groups? Health Serv Res. 2015;50(6):1829‐1849.
  • 8. Panagiotou OA, Voorhies KR, Keohane LM, et al. Association of inclusion of Medicare advantage patients in hospitals' risk‐standardized readmission rates, performance, and penalty status. JAMA Netw Open. 2021;4(2):e2037320.
  • 9. Mikulich‐Gilbertson SK, Wagner BD, Grunwald GK, Riggs PD, Zerbe GO. Using empirical Bayes predictors for generalized linear mixed models to test and visualize associations among longitudinal outcomes. Stat Methods Med Res. 2019;28(5):1399‐1411.
  • 10. Roberts ET, Zaslavsky AM, Barnett ML, Landon BE, Ding L, McWilliams JM. Assessment of the effect of adjustment for patient characteristics on hospital readmission rates: implications for pay for performance. JAMA Intern Med. 2018;178(11):1498‐1507.
  • 11. Matzke D, Ly A, Selker R, et al. Bayesian inference for correlations in the presence of measurement error and estimation uncertainty. Collabra: Psychology. 2017;3(1):25.
  • 12. Saccenti E, Hendricks HWB, Smilde AK. Corruption of the Pearson correlation coefficient by measurement error and its estimation, bias, and correction under different error models. Sci Rep. 2020;10:438.
  • 13. Neupane B, Beyene J. Multivariate meta‐analysis of genetic association studies: a simulation study. PLoS One. 2015;10(7):e0133243.
  • 14. Skrondal A, Rabe‐Hesketh S. Prediction in multilevel generalized linear models. J R Stat Soc Ser A. 2009;172(3):659‐687.
  • 15. Carroll RJ, Eltinge JL, Ruppert D. Robust linear regression in replicated measurement error models. Stat Probab Lett. 1993;16:169‐175.
  • 16. Carroll RJ, Roeder K, Wasserman L. Flexible parametric measurement error models. Biometrics. 1999;55:44‐54.
  • 17. Carroll RJ, Ruppert D, Stefanski LA. Measurement Error in Nonlinear Models. Chapman and Hall; 1995.
  • 18. Bernard S. Risk Adjustment Documentation & Coding. Chicago; 2018.
  • 19. Khidir H, McWilliams JM, O'Malley AJ, Zaborski L, Landon BE, Smulowitz PB. Analysis of consistency in emergency department physician variation in propensity for admission across patient sociodemographic groups. JAMA Netw Open. 2021;4(9):e2125193.
