ABSTRACT
In health services research, researchers often use clustered data to estimate the independent association between individual outcomes and cluster‐level covariates after adjusting for individual‐level characteristics. Marginal generalized linear models estimated using generalized estimating equation (GEE) methods or hierarchical (or multilevel) regression models can be used when there is a single source of clustering (e.g., patients nested within hospitals). Hierarchical regression models can also be used when there are multiple sources of clustering (e.g., patients nested within surgeons who in turn are nested within hospitals). Methods for estimating marginal regression models are less well‐developed when there are multiple sources of non‐nested clustering (e.g., patients are clustered both within hospitals and within neighborhoods, but neither neighborhoods nor hospitals are nested in the other). Miglioretti and Heagerty developed a GEE‐type variance estimator for use when fitting marginal generalized linear models to non‐nested multilevel data. We propose a variance estimator for a marginal Cox regression model fit to non‐nested multilevel data that combines their approach with Lin and Wei's robust variance estimator for the Cox model. We evaluated the performance of the proposed variance estimator using an extensive set of Monte Carlo simulations. We illustrated the use of the variance estimator in a case study consisting of patients hospitalized with an acute myocardial infarction who were clustered within hospitals and who were also clustered in neighborhoods. In summary, a variance estimator motivated by that proposed by Miglioretti and Heagerty can be used with marginal Cox regression models fit to non‐nested multilevel data.
Keywords: clustered data, Cox proportional hazards model, marginal model, Monte Carlo simulations, multilevel data, variance estimation
1. Introduction
Clustered data are common in many areas of applied research, including health services research, epidemiological research, and education research. Common examples include individuals clustered within neighborhoods or counties, patients clustered within hospitals or within family medicine practices, and students clustered within schools. Clustering can induce a lack of independence due to a within‐cluster homogeneity in outcomes. Ignoring clustering when fitting regression models can result in estimated standard errors that are likely to be biased downwards, resulting in estimated 95% confidence intervals that are artificially narrow and in significance levels that are artificially low [1].
When analyzing clustered data, analysts often choose between two types of regression models: generalized linear models estimated using generalized estimating equation (GEE) methods and hierarchical regression models (also known as multilevel models, random effects models, or mixed effects models) [2, 3, 4, 5, 6]. The former are marginal regression models that treat the between‐cluster variation in outcomes as a nuisance and provide a population‐average interpretation [3]. The latter are conditional regression models that incorporate cluster‐specific random effects to account for within‐cluster homogeneity in outcomes and provide a cluster‐specific interpretation.
Generalized linear models estimated using GEE methods can only account for one source of clustering. However, in many research settings, analysts want to account for two or more sources of non‐nested clustering. For instance, the data may consist of patients who are clustered within family medicine practices and who are also clustered within neighborhoods. However, neither family medicine practices nor neighborhoods are nested within the other (i.e., some patients attend family medicine practices that are not in their neighborhood). Similarly, patients hospitalized with an acute stroke may be clustered within the acute care hospital to which they were admitted and in the neighborhood in which they reside. However, neither hospital nor neighborhood is nested within the other (e.g., some patients are hospitalized while at work or while traveling and not at a hospital near their home). Miglioretti and Heagerty proposed a variance estimator for use with marginal generalized linear models estimated using GEE methods that accounts for multiple sources of non‐nested clustering [7]. Miglioretti and Heagerty noted that if there are multiple sources of clustering (e.g., patients clustered within physicians and patients clustered within hospitals) such that there is a perfect hierarchical structure to the clustering (e.g., patients clustered within physicians who in turn are clustered within hospitals, such that each patient is treated by only one physician and at only one hospital and each physician practices within only one hospital), then a conventional GEE analysis that only accounts for clustering within the top level of the hierarchical structure (i.e., hospitals in this example) would account for the multilevel structure.
Time‐to‐event or survival outcomes (e.g., time to death) are common in clinical, epidemiological, and health services research. The conventional model‐based estimate of the standard error assumes that subjects are independent of one another—an assumption that is likely to be violated when data are subject to clustering. Lin and Wei proposed a robust variance estimator for use with a marginal Cox proportional hazards model [8]. This variance estimator is frequently used when fitting a Cox model to clustered data. However, like GEE methods, the Lin–Wei variance estimator can only account for one source of clustering. Currently, there are no variance estimators for use with a marginal Cox regression model when the data have a non‐nested multilevel structure.
The objective of the current study is to modify the Miglioretti–Heagerty variance estimator by combining it with the Lin–Wei robust variance estimator to allow for variance estimation when fitting a marginal Cox regression model to non‐nested multilevel data. The paper is structured as follows: In Section 2, we describe two proposed variance estimators. In Section 3, we describe the design of a series of Monte Carlo simulations to evaluate the performance of five variance estimators for use with a marginal Cox regression model fit to non‐nested multilevel data. In Section 4, we report the results of these Monte Carlo simulations. In Section 5, we describe a secondary set of simulations to evaluate the effect of the number of subject‐level covariates on the performance of these variance estimators. In Section 6, we describe a second set of secondary simulations to evaluate the effect on the different variance estimators of allowing the number of subjects within each unique combination of the two types of clustering to vary. In Section 7, we provide a case study consisting of patients hospitalized for acute myocardial infarction (AMI or heart attack) who are clustered both in the hospital to which they were admitted and in the neighborhoods in which they resided. Finally, in Section 8, we summarize our findings and place them in the context of the existing literature.
2. Proposed Variance Estimator
We describe the proposed variance estimator, which is based on a variance estimator proposed by Miglioretti and Heagerty for use with generalized linear models estimated using GEE methods [7]. In the context of estimating a generalized linear model using GEE methods, Miglioretti and Heagerty proposed a variance estimator for use in settings with two sources of non‐nested clustering (e.g., patients clustered within family medicine practices who are also clustered within neighborhoods, but with neither family medicine practices nor neighborhoods nested within the other). Their method involved fitting three regression models, each accounting for a different source of clustering, with each regression model containing the same explanatory variables. In each of the three regression models an independent working correlation matrix was assumed (thus the estimated regression coefficients would be the same across all three models and would equal the estimated coefficients from a marginal generalized linear model that ignored clustering). Let $V_1$ denote the estimated variance–covariance matrix for the estimated regression coefficients when accounting for the first source of clustering (e.g., when accounting for clustering within family medicine practices). Let $V_2$ denote the estimated variance–covariance matrix for the estimated regression coefficients when accounting for the second source of clustering (e.g., when accounting for clustering within neighborhoods). Let $V_3$ denote the estimated variance–covariance matrix for the estimated regression coefficients when accounting for clustering within all unique combinations of the two sources of clustering (e.g., when accounting for clustering within all unique combinations of family medicine practices and neighborhoods). Miglioretti and Heagerty's proposed variance estimator is $V_{MH} = V_1 + V_2 - V_3$.
In the context of GEE estimation in settings with a single source of clustering, Mancl and DeRouen proposed several corrections to the GEE variance estimator to improve its small sample performance [9]. One of the small sample corrections was to multiply the estimated variance by a scaling factor defined as $K/(K - p)$, where $K$ denotes the number of clusters and $p$ denotes the number of regression parameters. Miglioretti and Heagerty suggested that Mancl and DeRouen's scaling factor could be combined with their proposed estimator: $V_{MH\text{-}MD} = \frac{K_1}{K_1 - p} V_1 + \frac{K_2}{K_2 - p} V_2 - \frac{K_3}{K_3 - p} V_3$, where $K_1$ denotes the number of the first type of clusters (e.g., the number of family medicine practices), $K_2$ denotes the number of the second type of clusters (e.g., the number of neighborhoods), $K_3$ denotes the number of unique combinations of the two types of clusters (e.g., the number of unique combinations of family medicine practices and neighborhoods), and $p$ denotes the number of regression parameters.
Lin and Wei proposed a robust variance estimator for a marginal Cox proportional hazards model [8]. This robust variance estimator is often used when fitting a Cox model to clustered data. However, as with GEE methods, it can only account for one source of clustering. The Lin and Wei robust variance estimator is a sandwich‐type variance estimator. In the context of clustered data, the estimator (using the terminology of Wang and colleagues [10]) is $\hat{V}_{LW} = \hat{V}_{m} \hat{B} \hat{V}_{m}$, where $\hat{V}_{m}$ denotes the model‐based variance estimator, $\hat{B} = \sum_{i=1}^{N} \hat{U}_{i} \hat{U}_{i}^{\top}$, $N$ denotes the number of clusters, and $\hat{U}_{i}$ denotes the estimated mean‐zero martingale score for the ith cluster.
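A cluster‐level sandwich estimator of this kind can be illustrated with a short sketch. The Python code below is not the authors' implementation (the paper's analyses used R); the function name `cluster_sandwich` and its inputs are hypothetical. It assumes that the model‐based variance matrix and the per‐subject score contributions have already been extracted from a fitted Cox model, sums the scores within each cluster, and forms the "bread times meat times bread" product.

```python
def cluster_sandwich(v_model, scores, cluster_ids):
    """Sandwich variance: v_model * (sum over clusters of U U^T) * v_model.

    v_model     : p x p model-based variance matrix (list of lists)
    scores      : one length-p score vector per subject (assumed precomputed)
    cluster_ids : one cluster label per subject
    """
    p = len(v_model)

    # Sum the subject-level score vectors within each cluster.
    totals = {}
    for u, cid in zip(scores, cluster_ids):
        acc = totals.setdefault(cid, [0.0] * p)
        for r in range(p):
            acc[r] += u[r]

    # "Meat" of the sandwich: sum of outer products of the cluster totals.
    meat = [[sum(t[r] * t[c] for t in totals.values()) for c in range(p)]
            for r in range(p)]

    def matmul(a, b):
        return [[sum(a[r][m] * b[m][c] for m in range(p)) for c in range(p)]
                for r in range(p)]

    return matmul(matmul(v_model, meat), v_model)


# Toy one-parameter example with three subjects in two clusters.
toy = cluster_sandwich([[2.0]], [[1.0], [1.0], [-1.0]], ["a", "a", "b"])
```

Changing `cluster_ids` (for example, to labels for a different source of clustering) changes only the grouping of the score contributions, which is why the regression coefficients are unaffected by the choice of clustering variable.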
We propose that the Miglioretti and Heagerty variance estimator be modified for use with Cox regression models. Rather than fitting three generalized linear models using an independent working correlation matrix but accounting for different sources of clustering, one fits three Cox regression models, each accounting for a different type of clustering using the Lin–Wei robust variance estimator. The three types of clustering would be the same as in the Miglioretti and Heagerty method when used with generalized linear models.
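The combination step can be sketched as follows. This is an illustrative Python sketch rather than the authors' R code; the names `md_factor` and `combine_variances` are hypothetical. The three input matrices are assumed to have already been obtained from three marginal Cox fits that use the Lin–Wei robust variance estimator with, respectively, the first clustering variable, the second clustering variable, and their unique combinations. Following the Miglioretti and Heagerty form, the first two matrices are added and the third is subtracted, optionally after applying the Mancl–DeRouen scaling factor to each.

```python
def md_factor(n_clusters, n_params):
    """Mancl-DeRouen style scaling factor K / (K - p)."""
    if n_clusters <= n_params:
        raise ValueError("need more clusters than regression parameters")
    return n_clusters / (n_clusters - n_params)


def combine_variances(v1, v2, v3, cluster_counts=None, n_params=None):
    """Combine three robust variance matrices as c1*V1 + c2*V2 - c3*V3.

    v1, v2, v3     : p x p matrices (lists of lists) from the three fits
    cluster_counts : optional (K1, K2, K3); if given, each matrix is scaled
                     by K / (K - p) before combining
    n_params       : number of regression parameters p (needed for scaling)
    """
    if cluster_counts is None:
        c1 = c2 = c3 = 1.0
    else:
        k1, k2, k3 = cluster_counts
        c1 = md_factor(k1, n_params)
        c2 = md_factor(k2, n_params)
        c3 = md_factor(k3, n_params)
    p = len(v1)
    return [[c1 * v1[r][c] + c2 * v2[r][c] - c3 * v3[r][c] for c in range(p)]
            for r in range(p)]


# Toy one-parameter example (made-up numbers, not results from the paper).
unscaled = combine_variances([[4.0]], [[3.0]], [[2.0]])
scaled = combine_variances([[4.0]], [[3.0]], [[2.0]],
                           cluster_counts=(11, 21, 101), n_params=1)
```

Standard errors are then the square roots of the diagonal elements of the combined matrix.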
3. Monte Carlo Simulations—Methods
We conducted a set of Monte Carlo simulations to evaluate the performance of the proposed variance estimator. The design of our simulations was based on a non‐nested multilevel structure in which there were two types of non‐nested clustering, with each subject being nested within two types of clusters, and with neither type of clustering being nested in the other type of clustering. We simulated clustered data with time‐to‐event outcomes that were associated with both a subject‐level covariate and two cluster‐level covariates (one covariate for each of the two types of clusters) using a Cox regression model that incorporated cluster‐specific random effects. In each simulated dataset, we fit a marginal Cox proportional hazards model and used five different variance estimators. We assessed the performance of the variance estimators by determining the degree to which the estimated standard errors for the subject‐level covariate and the two cluster‐level covariates approximated the standard deviation of the sampling distribution of the estimated regression coefficients and by whether the empirical coverage rates of 95% confidence intervals for the subject‐level covariate and for the two cluster‐level covariates were equal to the advertised rate.
3.1. Factors in the Monte Carlo Simulations
We allowed three factors to vary in the Monte Carlo simulations: (i) the number of each of the two types of clusters (N cluster ); (ii) the number of subjects within each unique combination of the two types of clusters (N subjects ); (iii) the within‐cluster correlation in the baseline hazard function (σ)—see below for a description of this factor. The first factor took on values from 20 to 50 in increments of 10, for a total of four levels. The second factor took on values from 10 to 50 in increments of 10, for a total of five levels (thus the total number of subjects across all combinations of clusters ranged from 4000 to 125 000). The third factor took on values from 0.05 to 0.25 in increments of 0.05, for a total of five levels. We used a full factorial design and thus examined 4 × 5 × 5 = 100 different scenarios. In each scenario, we created 1000 simulated datasets.
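The full factorial design described above can be enumerated in a few lines. The sketch below is illustrative Python (the variable names are hypothetical) and simply reproduces the arithmetic stated in the text: 100 scenarios, with total sample sizes between 4000 and 125 000.

```python
from itertools import product

n_cluster_levels = [20, 30, 40, 50]               # number of each type of cluster
n_subject_levels = [10, 20, 30, 40, 50]           # subjects per A-by-B combination
sigma_levels = [0.05, 0.10, 0.15, 0.20, 0.25]     # SD of the random effects

scenarios = [
    {
        "n_cluster": nc,
        "n_subjects": ns,
        "sigma": s,
        # Every type A cluster is crossed with every type B cluster.
        "n_total": nc * nc * ns,
    }
    for nc, ns, s in product(n_cluster_levels, n_subject_levels, sigma_levels)
]

print(len(scenarios))                              # 100
print(min(sc["n_total"] for sc in scenarios))      # 4000
print(max(sc["n_total"] for sc in scenarios))      # 125000
```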
3.2. Data‐Generating Process
We simulated data with two types of clusters: type A clusters (e.g., neighborhoods) and type B clusters (e.g., family practices). We simulated data for N cluster type A clusters and for N cluster type B clusters. For each of the N cluster type A clusters we simulated a binary cluster‐level covariate from a Bernoulli distribution with parameter 0.5: $z^{A}_{j} \sim \text{Bernoulli}(0.5)$, where j denotes the jth type A cluster. For each of the N cluster type B clusters we simulated a continuous cluster‐level covariate from a standard normal distribution: $z^{B}_{k} \sim N(0, 1)$, where k denotes the kth type B cluster.
We then simulated subject‐level data. We assumed a cross‐classified multilevel structure such that each of the N cluster × N cluster combinations of type A clusters and type B clusters contained N subjects subjects. For each of the N cluster × N cluster × N subjects subjects we simulated a subject‐level covariate from a standard normal distribution: $x_{ijk} \sim N(0, 1)$ for the ith subject in the jth type A cluster and kth type B cluster. We thus had three covariates: a continuous subject‐level covariate, a binary cluster‐level covariate for the type A clusters, and a continuous cluster‐level covariate for the type B clusters.
We simulated time‐to‐event outcomes using a modification of an approach described by Bender and colleagues [11]. Our modification involved including cluster‐specific random effects to induce a within‐cluster homogeneity in the baseline hazard function. For each type A cluster, we simulated a random effect from a normal distribution with mean zero and standard deviation σ: $a_{j} \sim N(0, \sigma^{2})$, where j denotes the jth type A cluster. Similarly, for each type B cluster, we simulated a random effect from a normal distribution with mean zero and standard deviation σ: $b_{k} \sim N(0, \sigma^{2})$, where k denotes the kth type B cluster.
For the ith subject in the jth type A cluster and in the kth type B cluster we defined the linear predictor as: $LP_{ijk} = \beta_{1} x_{ijk} + \beta_{2} z^{A}_{j} + \beta_{3} z^{B}_{k} + a_{j} + b_{k}$, where $\beta_{1}$, $\beta_{2}$, and $\beta_{3}$ denote the conditional log‐hazard ratios for the three covariates. We then simulated a time‐to‐event outcome using a Cox–Weibull model: $T_{ijk} = \left( \frac{-\log(u)}{\lambda \exp(LP_{ijk})} \right)^{1/\nu}$, where u is drawn from a standard uniform distribution and λ and ν denote the Weibull scale and shape parameters [11]. The Weibull parameters were set as was done previously [12]. The inclusion of cluster‐specific random effects allowed the baseline hazard function to vary across clusters, thus inducing a within‐cluster homogeneity in outcomes.
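The data‐generating process can be sketched in code. The following Python sketch is an illustration under stated assumptions, not the authors' R code: the function name `simulate_dataset` is hypothetical, the regression coefficients and the Weibull scale and shape values passed in the example call are arbitrary placeholders (they are not taken from the paper), and the event time uses the inverse‐transform draw $T = (-\log(u) / (\lambda \exp(LP)))^{1/\nu}$ described by Bender and colleagues, with the two cluster‐specific random effects added to the linear predictor.

```python
import math
import random


def simulate_dataset(n_cluster, n_subjects, sigma, betas, lam, nu, seed=None):
    """Simulate cross-classified survival data (no censoring in this sketch)."""
    rng = random.Random(seed)
    b1, b2, b3 = betas

    # Cluster-level covariates: binary for type A, continuous for type B.
    z_a = [1.0 if rng.random() < 0.5 else 0.0 for _ in range(n_cluster)]
    z_b = [rng.gauss(0.0, 1.0) for _ in range(n_cluster)]

    # Cluster-specific random effects with standard deviation sigma.
    a = [rng.gauss(0.0, sigma) for _ in range(n_cluster)]
    b = [rng.gauss(0.0, sigma) for _ in range(n_cluster)]

    rows = []
    for j in range(n_cluster):            # type A cluster index
        for k in range(n_cluster):        # type B cluster index
            for _ in range(n_subjects):
                x = rng.gauss(0.0, 1.0)   # subject-level covariate
                lp = b1 * x + b2 * z_a[j] + b3 * z_b[k] + a[j] + b[k]
                u = 1.0 - rng.random()    # uniform on (0, 1], avoids log(0)
                t = (-math.log(u) / (lam * math.exp(lp))) ** (1.0 / nu)
                rows.append({"a_id": j, "b_id": k, "x": x,
                             "z_a": z_a[j], "z_b": z_b[k], "time": t})
    return rows


# Placeholder parameter values chosen only to make the example run.
data = simulate_dataset(n_cluster=3, n_subjects=2, sigma=0.1,
                        betas=(0.4, 0.4, 0.4), lam=0.01, nu=2.0, seed=1)
```

Each row carries both cluster identifiers, so the same dataset can be analyzed with clustering on type A clusters, on type B clusters, or on their unique combinations.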
This data‐generating process employs a conditional model (i.e., the interpretation of the regression coefficients is conditional on the cluster‐specific random effects). Our focus is on estimating marginal models. To determine the true regression parameters for the underlying marginal model, we simulated data for 100 type A clusters, 100 type B clusters, and 100 subjects per unique combination of type A and type B clusters, for a total of 1 000 000 subjects. We fit a marginal Cox model to these simulated data. The estimated regression coefficients will serve as the true values of the coefficients for the marginal model. These values will be used when assessing the empirical coverage rates of estimated 95% confidence intervals (see below).
3.3. Statistical Analyses in the Simulated Data
For each of the 100 combinations of the three factors (N cluster , N subjects , and σ) described in Section 3.1, we created 1000 simulated datasets. In each simulated dataset, we fit a marginal Cox regression model in which the hazard of the outcome was regressed on the three covariates (the continuous subject‐level covariate, the binary cluster‐level covariate describing type A clusters, and the continuous cluster‐level covariate describing type B clusters). We extracted the estimated regression coefficients for these three covariates.
We used five estimators of the standard error of the estimated regression coefficients: (i) the adapted Miglioretti–Heagerty estimator (henceforth referred to as the Miglioretti estimator); (ii) the adapted Miglioretti–Heagerty estimator using the Mancl‐DeRouen scaling factors (henceforth referred to as the Miglioretti‐MD estimator); (iii) the conventional Lin–Wei variance estimator accounting for clustering within type A clusters (henceforth referred to as the Lin–Wei‐A estimator); (iv) the conventional Lin–Wei variance estimator accounting for clustering within type B clusters (henceforth referred to as the Lin–Wei‐B estimator); (v) the conventional Lin–Wei variance estimator accounting for clustering within unique combinations of type A and type B clusters (henceforth referred to as the Lin–Wei‐AxB estimator).
Using the estimated regression coefficients and the estimated standard errors, we constructed 95% confidence intervals for each of the three regression coefficients using standard normal theory methods.
3.4. Performance Measures
We used two performance measures to assess the performance of the different methods for estimating the standard error of the estimated regression coefficients and for constructing 95% confidence intervals: (i) relative percent error in the estimated standard error of the estimated regression coefficients; (ii) empirical coverage rates of the estimated 95% confidence intervals.
Relative percent error in the estimated standard error of the estimated regression coefficients was computed as $100 \times \left( \frac{\sqrt{\frac{1}{1000} \sum_{i=1}^{1000} \widehat{se}_{i}^{2}}}{sd(\hat{\beta})} - 1 \right)$, where $\widehat{se}_{i}$ denotes the estimated standard error in the ith simulation replicate and $sd(\hat{\beta})$ denotes the standard deviation of the estimated regression coefficients across the 1000 simulation replicates [13]. If the relative error is equal to zero, then the estimated standard error is correctly estimating the standard deviation of the sampling distribution of the estimated regression coefficients. If the relative error is less than zero, then the estimated standard errors are underestimating the standard deviation of the sampling distribution of the estimated regression coefficients. If the relative error is greater than zero, then the estimated standard errors are overestimating the standard deviation of the sampling distribution of the estimated regression coefficients.
Empirical coverage rates of estimated 95% confidence intervals were computed as the proportion of estimated confidence intervals that contained the true value of the regression coefficient that was determined in the data‐generating process.
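The two performance measures can be sketched as follows. This is an illustrative Python sketch with hypothetical function names; in the paper these quantities were obtained with the simsum function in R. The sketch assumes that the average estimated standard error is taken as the root mean of the squared standard errors, which is one common convention for this measure, and that confidence intervals are built with the standard normal quantile.

```python
import math
import statistics


def relative_error_se(estimates, std_errors):
    """Relative percent error of the estimated SE versus the empirical SE."""
    emp_se = statistics.stdev(estimates)   # SD of estimates across replicates
    mod_se = math.sqrt(sum(s * s for s in std_errors) / len(std_errors))
    return 100.0 * (mod_se / emp_se - 1.0)


def empirical_coverage(estimates, std_errors, true_value, level=0.95):
    """Proportion of normal-theory intervals that contain the true value."""
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2.0)
    hits = sum(1 for est, se in zip(estimates, std_errors)
               if est - z * se <= true_value <= est + z * se)
    return hits / len(estimates)


# Toy inputs (made-up numbers, not simulation results from the paper).
toy_estimates = [0.0, 2.0]
toy_ses = [1.0, 1.0]
rel_err = relative_error_se(toy_estimates, toy_ses)
cover = empirical_coverage(toy_estimates, toy_ses, true_value=1.0)
```

A negative value of `rel_err` indicates that the estimated standard errors are, on average, smaller than the empirical standard deviation of the estimates.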
3.5. Software
The simulations were conducted using the R statistical programming language (version 3.6.3). Marginal Cox models were fit using the coxph function from the survival package (version 3.2‐11). The simulation results were summarized using the simsum function from the rsimsum package (version 0.13.0).
4. Monte Carlo Simulations—Results
We first present results for the estimation of standard errors, followed by the results for empirical coverage rates of estimated confidence intervals.
4.1. Relative Error in Estimated Standard Errors
The relative error in the estimated standard errors for the three covariates is reported in Figure 1, which has three panels, one for each of the three covariates. Each panel uses nested loop plots to report the relative error in the estimated standard error across the 100 simulation scenarios (see next paragraph on how to interpret nested loop plots) [14]. Each panel contains five colored lines, one line for each of the five variance estimators. On each panel, we have superimposed a horizontal line denoting a relative error of zero, indicating unbiased estimation of the standard error.
FIGURE 1. Relative error in the estimated standard error of the estimated log‐hazard ratio.
We provide a brief interpretation of nested loop plots, referring the reader elsewhere for greater detail [14]. Our nested loop plots have three loops, one loop for each of the three factors in the Monte Carlo simulations. The outer loop represents N clusters , the number of each type of clusters, the middle loop represents N subjects , the number of subjects within each unique combination of the two types of clusters, while the inner loop represents σ, the standard deviation of the cluster‐specific random effects. At the top of each panel is a four‐level step function representing the four levels of the outer loop (N clusters ) (upper black step function). Below this four‐level step function is a five‐level step function that repeats four times, with one repetition for each of the steps of the outer step function. Below this repeating step function is a five‐level step function that repeats 20 times, once within each of the unique combinations of the previous two step functions. Below these three step functions are five colored lines describing the relative error in the estimated standard error, with one line for each of the variance estimation methods. For a given simulation scenario (one of the 100 scenarios described by the three factors in the design of the simulations), one identifies the location of the three step functions describing the three loops that correspond to the values of the three factors for the given simulation scenario. One then determines the value of the relative error that is described by the function for a given variance estimation method. For example, the leftmost point of each of the five colored lines represents the relative error when N clusters = 20, N subjects = 10, and σ = 0.05.
None of the five variance estimators had uniformly superior performance when estimating the standard error of the subject‐level covariate. The Miglioretti‐MD estimator tended to overestimate the standard error when the number of each type of clusters was less than or equal to 30. The other four methods tended to consistently underestimate the standard error across all simulation scenarios. In general, either the Miglioretti or the Miglioretti‐MD method tended to result in more accurate estimates of the standard error compared to the other three methods. The Lin–Wei‐AxB variance estimator tended to result in the largest underestimation of the standard error.
When estimating the standard error of the regression coefficient for the binary covariate for type A clusters, the Miglioretti estimator, the Miglioretti‐MD estimator, and the Lin–Wei‐A estimator tended to result in the most accurate estimates of standard error. While differences between these three methods were often modest, the Miglioretti‐MD estimator tended to result in estimated standard errors with the lowest bias. The Lin–Wei‐B and Lin–Wei‐AxB estimators tended to result in estimated standard errors that were substantially smaller than the empirical standard error.
When estimating the standard error of the regression coefficient for the continuous covariate for type B clusters, the Miglioretti estimator, the Miglioretti‐MD estimator, and the Lin–Wei‐B estimator tended to result in the most accurate estimates of standard error. While differences between these three estimators were often modest, the Miglioretti‐MD estimator tended to result in estimated standard errors with the lowest bias. The Lin–Wei‐A and Lin–Wei‐AxB estimators tended to result in estimated standard errors that were substantially smaller than the empirical standard error.
In each of the 100 simulation scenarios and for each variance estimation method, we computed the Monte Carlo standard errors of the estimated relative error in the estimated standard error [13]. The Monte Carlo standard errors are reported graphically in the top row of panels in Figure 2. Each panel displays side‐by‐side boxplots describing the distribution of Monte Carlo standard errors across the 100 simulation scenarios for each of the five variance estimation methods. The largest Monte Carlo standard error was 2.66. The size of the Monte Carlo standard errors suggests that, when interpreting the results in Figure 1, when considering the relative error in the estimated standard error for the binary cluster‐level covariate (top right panel), the relative errors of the Lin–Wei‐B and Lin–Wei‐AxB estimators tended to be different from the relative errors of the other three estimators. Similarly, when considering the relative error in the estimated standard error for the continuous cluster‐level covariate (bottom left panel), the relative errors of the Lin–Wei‐A and Lin–Wei‐AxB estimators tended to differ from the relative errors of the other three estimators.
FIGURE 2. Distribution of MCSE of performance metrics for estimation of standard errors of log‐hazard ratio.
4.2. Empirical Coverage Rates of Estimated 95% Confidence Intervals
Empirical coverage rates of estimated 95% confidence intervals are reported in Figure 3, which has a similar structure to Figure 1. Due to our use of 1000 simulation replicates for each scenario, empirical coverage rates lower than 0.9365 or greater than 0.9635 are statistically significantly different from the advertised rate of 0.95 based on a standard normal‐theory test. We superimposed horizontal lines on the figure denoting the advertised rate of 0.95 along with the thresholds of 0.9365 and 0.9635. All five variance estimation methods tended to result in 95% confidence intervals whose empirical coverage rates were lower than the advertised rate. However, the Miglioretti and Miglioretti‐MD estimators tended to have better performance across all scenarios and across the three covariates compared to the other three estimators. When constructing 95% confidence intervals for the regression coefficient for the binary covariate for type A clusters, the Lin–Wei‐A estimator had performance comparable to the Miglioretti and Miglioretti‐MD estimators. When constructing 95% confidence intervals for the regression coefficient for the continuous covariate for type B clusters, the Lin–Wei‐B estimator had performance comparable to the Miglioretti and Miglioretti‐MD estimators.
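The thresholds of 0.9365 and 0.9635 follow from the binomial standard error of a proportion of 0.95 estimated from 1000 replicates. A brief Python check (the function name `coverage_thresholds` is hypothetical) is:

```python
import math
from statistics import NormalDist


def coverage_thresholds(nominal=0.95, n_reps=1000, alpha=0.05):
    """Normal-theory limits beyond which coverage differs from nominal."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)       # approximately 1.96
    se = math.sqrt(nominal * (1.0 - nominal) / n_reps)
    return nominal - z * se, nominal + z * se


lower, upper = coverage_thresholds()
print(round(lower, 4), round(upper, 4))   # 0.9365 0.9635
```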
FIGURE 3. Empirical coverage rates of 95% confidence intervals for estimated log‐hazard ratio.
We computed the Monte Carlo standard errors of the empirical coverage rates of 95% confidence intervals. The Monte Carlo standard errors are reported in the bottom row of panels in Figure 2. Each panel displays side‐by‐side boxplots describing the distribution of Monte Carlo standard errors across the 100 simulation scenarios for each of the five variance estimation methods. The largest Monte Carlo standard error was 0.016. The size of the Monte Carlo standard errors suggests that, when interpreting the results in Figure 3, when considering the empirical coverage rates of the estimated 95% confidence intervals for the regression coefficient for the binary cluster‐level covariate (top right panel), the empirical coverage rates of the Lin–Wei‐B and Lin–Wei‐AxB estimators tended to be different from the empirical coverage rates of the other three estimators. Similarly, when considering the empirical coverage rates of the estimated 95% confidence intervals for the regression coefficient for the continuous cluster‐level covariate (bottom left panel), the empirical coverage rates of the Lin–Wei‐A and Lin–Wei‐AxB estimators tended to differ from the empirical coverage rates of the other three estimators.
5. Secondary Simulations: Relationship Between the Number of Subject‐Level Covariates and the Performance of the Variance Estimators
A previous study compared different GEE variance estimators when estimating logistic regression models when the number of clusters was low [15]. While the use of the Mancl–DeRouen scaling factor with the conventional Liang–Zeger GEE variance estimator worked well when the number of subject‐level covariates was low, it had suboptimal performance when the number of subject‐level covariates was large. Motivated by this observation, we conducted a limited set of simulations to examine the relationship between the number of subject‐level covariates and the performance of the different variance estimators.
5.1. Methods
These simulations were identical to those described above, with two exceptions. First, we restricted scenarios to those with N clusters = 20 and 40 (number of each of the two types of clusters), N subjects = 30 (number of subjects in each unique combination of the two types of clusters), and σ = 0.15 (the standard deviation of the cluster‐specific random effects). Second, rather than simulate a single subject‐level covariate, we simulated N vars subject‐level covariates from independent standard normal distributions. When generating time‐to‐event outcomes, the log‐hazard ratio for each of the N vars subject‐level covariates and the two cluster‐level covariates was equal to log(1.5). We allowed N vars to range from 1 to 12 in increments of 1. Thus, we examined 2 × 12 = 24 scenarios. In each scenario, we simulated 1000 datasets. Apart from these modifications, the simulations were identical to those described in Section 3.
5.2. Results
Results are reported in Figure 4 (relative error in estimation of the standard error) and Figure 5 (empirical coverage rates of estimated 95% confidence intervals). Unlike the primary set of simulations in which there were three factors that varied, in these simulations only two factors varied (N clusters and N vars ). We used line plots to describe the relationship between the relative error in the estimated standard error and empirical coverage rates and the number of subject‐level covariates, with a separate color for each of the five variance estimation methods and a separate line type for each of the two values of N clusters . Each figure has three panels: the top‐left panel reports results for the binary covariate describing type A clusters, the top‐right panel reports the results for the continuous covariate describing type B clusters, while the bottom‐left panel reports results for the first subject‐level covariate.
FIGURE 4. Relationship between number of subject‐level covariates and the relative error in estimated standard error of estimated log‐hazard ratio.
FIGURE 5. Relationship between number of subject‐level covariates and the empirical coverage rates of 95% confidence intervals.
The number of subject‐level covariates had no meaningful effect on the relative error in the estimated standard error for four of the five variance estimators (Figure 4). The exception was the Miglioretti‐MD estimator. When N vars was low, the relative bias was close to zero, with a minor negative bias (under‐estimation of the empirical standard error) for the two cluster‐level variables and a minor positive bias for the first subject‐level covariate. However, as N vars increased, the relative bias increased, so that the estimated standard error was larger than the empirical standard error. When the number of subject‐level covariates was equal to 12, the estimated standard error exceeded the empirical standard error by more than 15%. The relative bias tended to be amplified when N clusters = 20 compared to when N clusters = 40.
Four of the five variance estimators tended to result in estimated 95% confidence intervals whose empirical coverage rates were significantly lower than the advertised rate (Figure 5). For some values of N vars , the Miglioretti‐MD estimator resulted in estimated 95% confidence intervals whose empirical coverage rates were not significantly different from the advertised rate. This latter observation is likely due to the upward bias in the estimated standard error that we observed in the preceding paragraph.
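The two performance measures reported in these results can be sketched as follows. This is a minimal illustration, not the code used in the simulations. It assumes that relative error is the percent difference between the mean estimated standard error and the empirical standard deviation of the estimates across replicates, and that an empirical coverage rate is judged against Monte Carlo bounds based on the binomial standard error. The function names and the 1000 replicates in the usage line are hypothetical.

```python
import math
import statistics

Z_95 = 1.959964  # standard normal quantile for two-sided 95% intervals


def relative_error_pct(estimated_ses, estimates):
    """Percent relative error of the mean estimated SE versus the empirical SE.

    Assumed definition: 100 * (mean estimated SE - empirical SD) / empirical SD.
    Positive values indicate over-estimation of the empirical standard error.
    """
    empirical_se = statistics.stdev(estimates)      # SD of estimates across replicates
    mean_est_se = statistics.fmean(estimated_ses)   # average model-based SE
    return 100.0 * (mean_est_se - empirical_se) / empirical_se


def empirical_coverage(estimates, estimated_ses, true_value, z=Z_95):
    """Proportion of Wald-type 95% intervals that contain the true value."""
    hits = sum(
        1
        for b, se in zip(estimates, estimated_ses)
        if b - z * se <= true_value <= b + z * se
    )
    return hits / len(estimates)


def coverage_mc_bounds(n_replicates, nominal=0.95, z=Z_95):
    """Range of coverage rates compatible with the nominal rate (assumed criterion)."""
    half_width = z * math.sqrt(nominal * (1.0 - nominal) / n_replicates)
    return nominal - half_width, nominal + half_width


# Hypothetical usage: bounds for 1000 simulation replicates
lower_bound, upper_bound = coverage_mc_bounds(1000)
```

Under these assumptions, an estimator whose coverage falls below `lower_bound` would be described as having coverage significantly lower than the advertised rate.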
6. Secondary Simulations: Impact of Allowing the Number of Subjects Within Each Unique Combination of the Two Types of Clusters to Vary
In the simulations described above, within a given scenario, the number of subjects within each unique combination of the two types of clusters was fixed. In a limited set of simulations, we examined the impact of allowing the number of subjects within each unique combination of the two types of clusters to vary.
6.1. Methods
The simulations were identical in design to those described in Section 3 with two exceptions. First, we examined a restricted set of scenarios, namely those with N clusters = 40 (the number of each of the two types of clusters). Second, we allowed the number of subjects within each unique combination of the two types of clusters to follow a Poisson distribution. The mean of the Poisson distribution for the number of subjects in each unique combination of the two types of clusters (N subjects‐mean) took on five values: 10–50 in increments of 10. We allowed σ (the standard deviation of the cluster‐specific random effects) to take on the four values used in the primary simulations described in Section 3: 0.05, 0.10, 0.15, and 0.20. We thus examined 20 different scenarios (i.e., the Poisson analogues of all 20 scenarios from the primary set of simulations with N clusters = 40).
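The scenario grid and the Poisson cell sizes described above can be sketched as follows. This is an illustrative sketch only: the variable and function names are hypothetical, the seed is arbitrary, and the Poisson sampler (Knuth's multiplication method) is a stand-in chosen to keep the example free of external libraries. It is not necessarily the generator used in the simulations.

```python
import itertools
import math
import random

N_CLUSTERS = 40                          # number of each of the two types of clusters
MEAN_SUBJECTS = [10, 20, 30, 40, 50]     # Poisson means for subjects per combination
SIGMAS = [0.05, 0.10, 0.15, 0.20]        # SD of the cluster-specific random effects

# Full factorial design: 5 means x 4 values of sigma = 20 scenarios
scenarios = list(itertools.product(MEAN_SUBJECTS, SIGMAS))


def draw_poisson(mean, rng):
    """Draw one Poisson variate by Knuth's method (adequate for means up to 50)."""
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def draw_cell_sizes(mean, rng, n_clusters=N_CLUSTERS):
    """Subjects in each (type A cluster, type B cluster) combination."""
    return [
        [draw_poisson(mean, rng) for _ in range(n_clusters)]
        for _ in range(n_clusters)
    ]


rng = random.Random(2024)                # arbitrary seed for reproducibility
cells = draw_cell_sizes(30, rng)
total_subjects = sum(map(sum, cells))
mean_cell_size = total_subjects / (N_CLUSTERS * N_CLUSTERS)
```

With 40 clusters of each type there are 1600 unique combinations, so the realized mean cell size should lie close to the specified Poisson mean.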
6.2. Results
Results are reported in Figure 6 (relative error in estimation of the standard error) and Figure 7 (empirical coverage rates of estimated 95% confidence intervals). The results for the relative error in estimation of the standard error were qualitatively similar to those observed in the primary set of simulations (Figure 1) when the number of clusters was equal to 40 and when the fixed number of subjects in each unique combination of the two types of clusters was equal to the mean number of subjects across the unique combinations. Similarly, the results for the empirical coverage rates of estimated 95% confidence intervals tended to be qualitatively similar to those observed in the primary set of simulations (Figure 3) when the number of clusters was equal to 40.
FIGURE 6. Relative error in estimated standard error of estimated log‐hazard ratio.
FIGURE 7. Empirical coverage rates of 95% confidence intervals for estimated log‐hazard ratio.
7. Case Study
We provide a brief case study to compare the different variance estimators in an empirical analysis. Our case study consisted of patients hospitalized with AMI. Patients were clustered within the neighborhood in which they resided as well as being clustered in the hospital to which they were admitted. However, hospitals and neighborhoods did not form a hierarchy, with neither being nested in the other.
7.1. Data
We used data on 2595 residents of the city of Toronto in the province of Ontario in Canada who were hospitalized at a hospital in Ontario with an AMI in 2016. These data were obtained from the Ontario Myocardial Infarction Database (OMID) [16]. For this case study, the outcome was time to death, with subjects censored after 365 days if they were still alive. Overall, 414 (16.0%) individuals died within 365 days of hospital admission. These 2595 patients resided in 101 neighborhoods in the city of Toronto (defined by the forward sortation area of the residential postal code) and were admitted to 45 different hospitals (including hospitals in Ontario that were outside of Toronto).
We considered 11 patient variables assessed at the time of hospital admission: age, sex, congestive heart failure, cardiogenic shock, arrhythmia, pulmonary edema, diabetes mellitus with complications, stroke, acute renal disease, chronic renal disease, and malignancy [17]. Age was continuous while the other 10 variables were binary. These 11 variables comprise the Ontario AMI mortality prediction model, which was derived and internally validated in Ontario and then externally validated in the province of Manitoba and the state of California [17]. We divided age by 10 so that the regression coefficient for age would denote the change in the log‐hazard of death associated with a 10‐year increase in age.
We considered three hospital‐level variables: the academic status of the hospital (academic vs. non‐academic), whether the hospital had the capacity for cardiac revascularization, and the number of AMI patients treated at the hospital during that year. The first two were binary variables, while the latter was a continuous variable. We divided the latter variable by 100 so that the regression coefficient would denote the change in the log‐hazard of death associated with an increase of 100 AMI patients.
We considered one variable describing the neighborhood in which the individual resided: the quintiles of median neighborhood income. This variable had five levels, ranging from Q1, the least affluent quintile, to Q5, the most affluent quintile. In the subsequent regression analyses, the most affluent quintile (Q5) will be used as the reference category for this variable.
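The covariate coding described in this subsection can be illustrated with a short sketch. All numeric values below are hypothetical and are not taken from the case study. The function names are introduced only for illustration.

```python
import math

INCOME_LEVELS = ["Q1", "Q2", "Q3", "Q4", "Q5"]
REFERENCE_LEVEL = "Q5"   # most affluent quintile, used as the reference category


def income_indicators(quintile):
    """Indicator variables for income quintile, omitting the reference level."""
    if quintile not in INCOME_LEVELS:
        raise ValueError(f"unknown income quintile: {quintile!r}")
    return {
        level: int(quintile == level)
        for level in INCOME_LEVELS
        if level != REFERENCE_LEVEL
    }


def rescale(value, divisor):
    """Express a covariate per `divisor` units (e.g., age per 10 years)."""
    return value / divisor


# Hypothetical patient and hospital values
age_per_decade = rescale(67, 10)      # age in decades
volume_per_100 = rescale(250, 100)    # AMI volume in hundreds of patients

# Hypothetical coefficient for age expressed per decade
beta_age_per_decade = 0.5
hr_per_decade = math.exp(beta_age_per_decade)       # hazard ratio per 10-year increase
hr_per_year = math.exp(beta_age_per_decade / 10)    # implied hazard ratio per 1-year increase
```

Rescaling a covariate changes only the units in which its hazard ratio is expressed: the hazard ratio per decade equals the hazard ratio per year raised to the tenth power.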
7.2. Statistical Methods
We fit a marginal Cox proportional hazards model in which the hazard of death over 1 year of follow‐up was regressed on the 11 individual‐level covariates described above, the three hospital‐level covariates, and the one neighborhood‐level covariate (which is a categorical variable with five levels). We constructed 95% confidence intervals for each variable using the five variance estimators described above.
These analyses were conducted using R (version 3.6.3). The Cox models were fit using the coxph function from the survival package (version 3.2‐11). The analyses took approximately 2 seconds when run using slurm jobs limited to 1 CPU and 4 GB of memory on a grid of compute servers (8 vCPUs—Intel Xeon CPU E5‐2643 v3@3.40 GHz, 128 GB per node), running RedHat 7. R code for these analyses is provided in the Supporting Information.
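To make the construction of the confidence intervals concrete, the following Python sketch shows how a single log‐hazard ratio and its competing variance estimates could be converted into hazard ratios with 95% confidence intervals. It is an illustration and not the R code in the Supporting Information. It assumes the two‐way combination takes the form V_A + V_B − V_AB, where V_A and V_B account for clustering in each type of cluster alone and V_AB accounts for the unique combinations of the two. All numeric values are hypothetical.

```python
import math

Z_95 = 1.959964  # standard normal quantile for two-sided 95% intervals


def miglioretti_variance(var_a, var_b, var_ab):
    """Assumed two-way combination: V_A + V_B - V_AB."""
    return var_a + var_b - var_ab


def hazard_ratio_ci(beta, variance, z=Z_95):
    """Hazard ratio and Wald-type 95% CI from a log-hazard ratio and its variance."""
    if variance < 0:
        # A combined estimate formed by subtraction is not guaranteed to be positive.
        raise ValueError("variance estimate must be non-negative")
    se = math.sqrt(variance)
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def excludes_null(lower, upper):
    """True if the interval excludes a hazard ratio of one."""
    return lower > 1.0 or upper < 1.0


# Hypothetical log-hazard ratio and variance estimates for one covariate
beta = 0.30
var_hospital = 0.020        # robust variance, clustering on hospital only
var_neighborhood = 0.026    # robust variance, clustering on neighborhood only
var_combination = 0.024     # robust variance, clustering on hospital-neighborhood pairs

var_combined = miglioretti_variance(var_hospital, var_neighborhood, var_combination)

results = {
    name: hazard_ratio_ci(beta, v)
    for name, v in [
        ("hospital", var_hospital),
        ("neighborhood", var_neighborhood),
        ("combined", var_combined),
    ]
}
```

The point estimate is the same under every variance estimator. Only the interval width changes, which is why conclusions about whether an interval excludes the null can differ between estimators.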
7.3. Results
Estimated hazard ratios and the associated estimated 95% confidence intervals are reported using forest plots in Figure 8 (first six individual‐level variables), Figure 9 (last five individual‐level variables), Figure 10 (hospital‐level variables) and Figure 11 (neighborhood‐level variable). Note that, for a given variable, the hazard ratios are identical across the five methods as we fit marginal Cox models. Figures 8 and 9 consist of six and five panels, respectively, one for each of the 11 individual‐level variables. Figure 10 contains three panels, one for each of the three hospital‐level variables, while Figure 11 contains four panels, one for each of the non‐reference levels of quintile of neighborhood income. On each forest plot, we have superimposed a vertical line denoting a hazard ratio of one.
FIGURE 8. Hazard ratios and 95% confidence intervals for individual‐level variables (case study).
FIGURE 9. Hazard ratios and 95% confidence intervals for individual‐level variables (case study).
FIGURE 10. Hazard ratios and 95% confidence intervals for hospital‐level variables (case study).
FIGURE 11. Hazard ratios and 95% confidence intervals for neighborhood‐level income quintile (case study).
In examining Figures 8 and 9 (the individual‐level variables), a few observations warrant highlighting. First, for two of the individual‐level variables (stroke and pulmonary edema) qualitatively different conclusions would be drawn depending on the variance estimator that was used. When the Miglioretti variance estimator and the conventional Lin–Wei variance estimator that accounted for clustering within hospitals were used, the resultant 95% confidence intervals excluded the null value, whereas 95% confidence intervals constructed using the conventional Lin–Wei estimator that accounted for clustering within neighborhoods and the Miglioretti‐MD estimator included the null value. Second, for eight of the 11 individual‐level variables, the Miglioretti‐MD variance estimator resulted in 95% confidence intervals that were wider than those produced using the other four variance estimators. This observation is concordant with the results of our first set of secondary simulations, in which we found that, when there were 11 individual‐level covariates, the Miglioretti‐MD variance estimator tended to overestimate the standard deviation of the sampling distribution (Figure 4, lower‐left panel). This would result in estimated 95% confidence intervals that were artificially wide.
In examining Figure 10 (hospital‐level variables), several observations warrant highlighting. First, for each variable, qualitatively similar conclusions would be drawn from all five variance estimation methods. Specifically, for each of the five variance estimators, the resultant 95% confidence intervals included the null hazard ratio. Second, the 95% confidence intervals constructed using the Miglioretti‐MD method were modestly wider than those constructed using the four other variance estimators. Third, the confidence intervals constructed using the Miglioretti estimator were very similar to those constructed using the standard Lin–Wei estimator that accounted for clustering within hospitals. The modestly wider confidence intervals obtained with the Miglioretti‐MD estimator likely reflect our findings in the first set of secondary simulations reported in Section 5: the regression models included 11 subject‐level covariates, and with this number of subject‐level covariates we observed an upward bias in the estimated standard error when using the Miglioretti‐MD estimator.
In examining Figure 11 (quintiles of neighborhood income), the primary observation is that for two of the four variables, the differences between the 95% confidence intervals constructed using the Miglioretti estimator and the standard Lin–Wei estimator that accounted for clustering within neighborhoods were greater than those observed for the three hospital‐level variables. In particular, for Q2 versus Q5 and Q4 versus Q5, the confidence intervals constructed using the Miglioretti estimator were modestly narrower than those constructed using the standard Lin–Wei estimator that accounted for clustering within neighborhoods. A secondary observation is that the confidence intervals constructed using the Miglioretti‐MD estimator tended to be modestly wider than those constructed using the other estimators.
8. Discussion
We compared the performance of five variance estimators when fitting a marginal Cox regression model to non‐nested multilevel survival data. Our primary objective was to modify the variance estimator that was proposed by Miglioretti and Heagerty for use with non‐nested multilevel data when fitting marginal generalized linear models using GEE methods. We found that our modified variance estimator, which combined the framework of Miglioretti and Heagerty with the conventional Lin–Wei robust variance estimator for a marginal Cox regression model, tended to have good performance. When the number of subject‐level covariates is low, the Miglioretti‐MD estimator may be preferable. However, as the number of subject‐level covariates increases, the relative bias in the Miglioretti‐MD estimator increases.
If the focus of inference in a particular study is only on cluster‐level covariates for a given type of cluster, our results suggest that one can use the conventional Lin–Wei robust variance estimator to account for clustering within that type of cluster alone. In that case, estimated standard errors and confidence intervals for the subject‐level covariates and for the covariates of the other type of cluster would likely be incorrect. However, if the investigators were only reporting hazard ratios for covariates of the given type of cluster, our findings suggest that such an approach would be reasonable.
A limitation of the current study relates to our use of Monte Carlo simulations. Due to the computational burden of the simulations, we were limited in the number of scenarios that we could consider. We considered a range of scenarios defined by three factors (the within‐cluster correlation in outcomes, the number of clusters, and the number of subjects per cluster), but we were unable to consider additional factors beyond these three. A second limitation of the current study was that our data‐generating process was based on a conditional model that incorporated cluster‐specific random effects rather than on a marginal model. We estimated the regression coefficients of the underlying marginal model by simulating a very large sample consisting of 100 of each of the two types of clusters and 100 subjects per unique combination of the two types of clusters, for a total of 1 000 000 subjects. However, these estimated marginal regression coefficients may not coincide with the true parameter values. It is possible that the lower than expected coverage rates of estimated 95% confidence intervals are at least partially due to this discrepancy.
We restricted the current study to settings with two sources of clustering. Miglioretti and Heagerty also described how their variance estimator can be computed in settings with three sources of non‐nested clustering (appendix B in the cited work) [7]. If there were three sources of clustering (e.g., pediatric patients clustered within schools, hospitals, and primary care practices; with none of the sources of clustering nested within another source of clustering), they stated that the variance of an estimated regression coefficient of a generalized linear model can be estimated as $\hat{V} = \hat{V}_1 + \hat{V}_2 + \hat{V}_3 - \hat{V}_{12} - \hat{V}_{13} - \hat{V}_{23} + \hat{V}_{123}$, where $\hat{V}_i$ is the variance estimate when only accounting for the ith source of clustering (i = 1, 2, 3), $\hat{V}_{ij}$ is the variance estimate when accounting for the unique combinations of the ith and jth sources of clustering, and $\hat{V}_{123}$ is the variance estimate when accounting for the unique combinations of the three sources of clustering. We suggest that the methods we have described can be modified using an identical procedure when fitting a Cox proportional hazards model to data with three sources of non‐nested clustering. The method is easily generalizable to settings with more than three sources of clustering. However, we would suggest that, in most applied settings, it would be rare to see more than three sources of clustering.
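The generalization to three or more sources of clustering can be sketched as follows. The sketch assumes that the combined estimator follows an inclusion–exclusion pattern: variance estimates for single sources are added, those for pairs of sources are subtracted, those for triples are added, and so on. The function name, the source labels, and the numeric values are hypothetical. Each input is assumed to be a robust variance estimate obtained by clustering on the unique combinations of the named sources.

```python
from itertools import combinations


def combine_variances(variances):
    """Combine cluster-robust variance estimates by inclusion-exclusion.

    `variances` maps a frozenset of clustering sources to the variance estimate
    obtained when clustering on the unique combinations of those sources.
    Every non-empty subset of the sources must be present.
    """
    sources = sorted(set().union(*variances))
    total = 0.0
    for size in range(1, len(sources) + 1):
        sign = 1.0 if size % 2 == 1 else -1.0   # add odd-sized subsets, subtract even-sized
        for subset in combinations(sources, size):
            total += sign * variances[frozenset(subset)]
    return total


def key(*names):
    """Convenience helper for building dictionary keys."""
    return frozenset(names)


# Hypothetical variance estimates for one regression coefficient
three_way = {
    key("school"): 0.020,
    key("hospital"): 0.026,
    key("practice"): 0.018,
    key("school", "hospital"): 0.024,
    key("school", "practice"): 0.019,
    key("hospital", "practice"): 0.022,
    key("school", "hospital", "practice"): 0.021,
}
combined_three_way = combine_variances(three_way)

# With two sources the same function reduces to V_A + V_B - V_AB
two_way = {key("A"): 0.020, key("B"): 0.026, key("A", "B"): 0.024}
combined_two_way = combine_variances(two_way)
```

With k sources of clustering, 2^k − 1 separate variance estimates are required, which is one practical reason the approach becomes cumbersome beyond three sources.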
As noted in the Introduction, an alternative to marginal Cox regression models is frailty models, which are Cox regression models that incorporate cluster‐specific random effects [18, 19]. The inclusion of cluster‐specific random effects accounts for between‐cluster variation in the outcome. While some statistical software packages only allow for fitting frailty models with one source of clustering, others allow for multiple sources of clustering. Frailty models are conditional models, whose regression coefficients are interpreted conditional on the cluster‐specific random effects. Furthermore, as discussed by Gail and colleagues, due to the non‐collapsibility of the hazard ratio, marginal and conditional hazard ratios do not coincide, except when both are the null hazard ratio [20]. Because the regression coefficients from a frailty model have a conditional or cluster‐specific interpretation, marginal regression models may be of greater interest when the focus is on estimating the association between cluster‐level variables and the hazard of the outcome.
To the best of our knowledge, the current study is the first to propose and evaluate a variance estimator for use with a marginal Cox regression model fit to multilevel non‐nested data. Given the frequency with which time‐to‐event outcomes occur in clinical, epidemiological, and health services research, as well as the frequency with which data with a complex clustering structure occur, the methods proposed in the current study will be of interest to applied researchers in a wide range of disciplines.
In summary, a variance estimator motivated by that proposed by Miglioretti and Heagerty for use with generalized linear models fit to non‐nested multilevel data can be used with marginal Cox regression models fit to non‐nested multilevel data.
Conflicts of Interest
The author declares no conflicts of interest.
Supporting information
Data S1.
Acknowledgments
This study was supported by ICES, which is funded by an annual grant from the Ontario Ministry of Health (MOH) and the Ministry of Long‐Term Care (MLTC). Parts of this material are based on data and/or information compiled and provided by the Ontario Ministry of Health. This document used data adapted from the Statistics Canada Postal CodeOM Conversion File, which is based on data licensed from Canada Post Corporation, and/or data adapted from the Ontario Ministry of Health Postal Code Conversion File, which contains data copied under license from Canada Post Corporation and Statistics Canada. Parts of this material are based on data and/or information compiled and provided by CIHI. However, the analyses, conclusions, opinions and statements expressed in the material are those of the author(s), and not necessarily those of CIHI. The analyses, conclusions, opinions and statements expressed herein are solely those of the authors and do not reflect those of the funding or data sources; no endorsement is intended or should be inferred. As a prescribed entity under Ontario's privacy legislation, ICES is authorized to collect and use health care data for the purposes of health system analysis, evaluation and decision support. Secure access to these data is governed by policies and procedures that are approved by the Information and Privacy Commissioner of Ontario. The opinions, results and conclusions reported in this paper are those of the authors and are independent from the funding sources. The dataset from this study is held securely in coded form at ICES. While legal data sharing agreements between ICES and data providers (e.g., healthcare organizations and government) prohibit ICES from making the dataset publicly available, access may be granted to those who meet pre‐specified criteria for confidential access, available at www.ices.on.ca/DAS (email: das@ices.on.ca). 
The use of data in this project was authorized under section 45 of Ontario's Personal Health Information Protection Act, which does not require review by a research ethics board. This research was supported in part by an operating grant from the Canadian Institutes of Health Research (CIHR) (PJT 166161).
Funding: This work was supported by Canadian Institutes of Health Research (PJT 166161).
Data Availability Statement
Research data are not shared.
References
- 1. Williams R. L., “A Note on Robust Variance Estimation for Cluster‐Correlated Data,” Biometrics 56, no. 2 (2000): 645–646.
- 2. Liang K. Y. and Zeger S. L., “Longitudinal Data Analysis Using Generalized Linear Models,” Biometrika 73 (1986): 13–22.
- 3. Zeger S. L., Liang K. Y., and Albert P. S., “Models for Longitudinal Data: A Generalized Estimating Equation Approach,” Biometrics 44, no. 4 (1988): 1049–1060.
- 4. Snijders T. and Bosker R., Multilevel Analysis: An Introduction to Basic and Advanced Multilevel Modeling, Second ed. (Sage Publications, 2012).
- 5. Goldstein H., Multilevel Statistical Models (John Wiley & Sons Ltd, 2011).
- 6. Austin P. C. and Merlo J., “Intermediate and Advanced Topics in Multilevel Logistic Regression Analysis,” Statistics in Medicine 36, no. 20 (2017): 3257–3277.
- 7. Miglioretti D. L. and Heagerty P. J., “Marginal Modeling of Nonnested Multilevel Data Using Standard Software,” American Journal of Epidemiology 165, no. 4 (2007): 453–463.
- 8. Lin D. Y. and Wei L. J., “The Robust Inference for the Proportional Hazards Model,” Journal of the American Statistical Association 84, no. 408 (1989): 1074–1078.
- 9. Mancl L. A. and DeRouen T. A., “A Covariance Estimator for GEE With Improved Small‐Sample Properties,” Biometrics 57, no. 1 (2001): 126–134.
- 10. Wang X., Turner E. L., and Li F., “Improving Sandwich Variance Estimation for Marginal Cox Analysis of Cluster Randomized Trials,” Biometrical Journal 65, no. 3 (2023): e2200113.
- 11. Bender R., Augustin T., and Blettner M., “Generating Survival Times to Simulate Cox Proportional Hazards Models,” Statistics in Medicine 24, no. 11 (2005): 1713–1723.
- 12. Austin P. C., Grootendorst P., Normand S. L., and Anderson G. M., “Conditioning on the Propensity Score Can Result in Biased Estimation of Common Measures of Treatment Effect: A Monte Carlo Study,” Statistics in Medicine 26, no. 4 (2007): 754–768.
- 13. Morris T. P., White I. R., and Crowther M. J., “Using Simulation Studies to Evaluate Statistical Methods,” Statistics in Medicine 38, no. 11 (2019): 2074–2102.
- 14. Rucker G. and Schwarzer G., “Presenting Simulation Results in a Nested Loop Plot,” BMC Medical Research Methodology 14 (2014): 129.
- 15. Austin P. C., “A Comparison of Variance Estimators for Logistic Regression Models Estimated Using Generalized Estimating Equations (GEE) in the Context of Observational Health Services Research,” Statistics in Medicine 43 (2024): 5548–5561.
- 16. Tu J. V., Austin P. C., Naylor C. D., Iron K., and Zhang H., Cardiovascular Health and Services in Ontario: An ICES Atlas, ed. Naylor C. D. and Slaughter P. M. (Institute for Clinical Evaluative Sciences, 1999), 83–110.
- 17. Tu J. V., Austin P. C., Walld R., Roos L., Agras J., and McDonald K. M., “Development and Validation of the Ontario Acute Myocardial Infarction Mortality Prediction Rules,” Journal of the American College of Cardiology 37, no. 4 (2001): 992–997.
- 18. Duchateau L. and Janssen P., The Frailty Model (Springer, 2008).
- 19. Hougaard P., Handbook of Survival Analysis, ed. Klein J. P., van Houwelingen H. C., Ibrahim J. G., and Scheike T. H. (CRC Press, 2014), 457–473.
- 20. Gail M. H., Wieand S., and Piantadosi S., “Biased Estimates of Treatment Effect in Randomized Experiments With Nonlinear Regressions and Omitted Covariates,” Biometrika 71, no. 3 (1984): 431–444.