One-Step Generalized Estimating Equations with Large Cluster Sizes

Stuart Lipsitz; Garrett Fitzmaurice; Debajyoti Sinha; Nathanael Hevelone; Jim Hu; Louis L Nguyen

doi:10.1080/10618600.2017.1321552

Author manuscript; available in PMC: 2018 Feb 6.

Published in final edited form as: J Comput Graph Stat. 2017 Jul 27;26(3):734–737. doi: 10.1080/10618600.2017.1321552

PMCID: PMC5800532 NIHMSID: NIHMS935229 PMID: 29422762

Abstract

Medical studies increasingly involve a large sample of independent clusters, where the cluster sizes are also large. Our motivating example from the 2010 Nationwide Inpatient Sample (NIS) has 8,001,068 patients and 1049 clusters, with average cluster size of 7627. Consistent parameter estimates can be obtained by naively assuming independence, but these estimates are inefficient when the intra-cluster correlation (ICC) is high. Efficient generalized estimating equations (GEE) incorporate the ICC and sum all pairs of observations within a cluster when estimating the ICC. For the 2010 NIS, there are 92.6 billion pairs of observations, making summation of pairs computationally prohibitive. We propose a one-step GEE estimator that 1) matches the asymptotic efficiency of the fully-iterated GEE; 2) uses a simpler formula to estimate the ICC that avoids summing over all pairs; and 3) completely avoids matrix multiplications and inversions. These three features make the proposed estimator much less computationally intensive, especially with large cluster sizes. A unique contribution of this paper is that it expresses the GEE estimating equations incorporating the ICC as a simple sum of vectors and scalars.

Keywords: clustered data, efficient estimation, exchangeable correlation, fully-iterated, intra-cluster correlation

1 Introduction

Healthcare studies increasingly involve a large sample of independent clusters, where the cluster sizes are also large. The clusters are often households or patients from the same hospital. When estimating the regression parameters of a generalized linear model for clustered data with large cluster sizes, the most popular approach, for reasons of computational feasibility, is to naively assume the observations within a cluster are independent to obtain consistent estimates (Liang and Zeger, 1986); a consistent estimate of the covariance matrix of these regression parameter estimates can be obtained using a ‘sandwich estimator’. These estimates, obtained under naive independence, can be inefficient when the intra-cluster correlation (ICC) is relatively large; the loss of efficiency could be important for regression models with many covariates or interaction terms.

Gains in efficiency can be achieved via the use of generalized estimating equations (GEE) (Liang and Zeger, 1986), incorporating the ICC under an exchangeable correlation structure. However, for large cluster sizes, GEE typically sums over all pairs of outcomes within a cluster to estimate the ICC. Further, it requires inversion of matrices of the same dimensions as the cluster size, which is computationally demanding for large cluster sizes. In this paper, we propose an efficient and computationally feasible estimator of the regression parameters for GEE with an exchangeable correlation when there are a large number of clusters and large cluster sizes. This is achieved by constructing a one-step GEE estimator that 1) matches the asymptotic efficiency of the fully-iterated GEE while reducing the computational burden; 2) uses a simpler formula to estimate the ICC that avoids summing over all pairs; and 3) completely avoids matrix multiplications and inversions (both of which are computationally intensive). The key contribution is that we express the GEE as a simple sum of vectors and scalars. A SAS macro that implements the proposed one-step GEE estimator can be obtained from the authors. Our motivating example, from the U.S. 2010 Nationwide Inpatient Sample (NIS), encompasses over 8 million acute hospital stays from 1049 hospitals, with average cluster size of 7627 patients.

2 Generalized estimating equations

Suppose there are N clusters, indexed by i = 1, …, N, and n_i subjects within cluster i, indexed by j = 1, …, n_i, with outcome variable Y_ij and a vector x_ij = (1, x_ij₁, …, x_ijK)′ of K covariates (including a constant for the intercept). Let μ_ij = E(Y_ij|x_ij, β) denote the expectation of the outcome Y_ij given the covariates and regression coefficients β. Here,

E (Y_{i j} ∣ x_{i j}, β) = μ_{i j} = μ_{i j} (β) = g (x_{i j}^{'} β),

(2.1)

where g(·) is a known function (the inverse of the link function). The variance of Y_ij has general form

v_{i j} = Var (Y_{i j} ∣ x_{i j}, β) = ϕ v (μ_{i j}),

(2.2)

where v(μ_ij) can be any function of μ_ij that is always positive and ϕ is a scale parameter.

We let Y_i = [Y_i₁, …, Y_{in_i}]′ be an n_i × 1 vector containing the outcomes for the n_i subjects in cluster i; X_i = [x_i₁, …, x_{in_i}]′ represents the n_i × K covariate matrix and μ_i = [μ_i₁, …, μ_{in_i}]′ is an n_i × 1 mean vector. We assume the correlation between any two observations in the same cluster is exchangeable, i.e., ρ = Corr(Y_ij, Y_ik|X_i); ρ is often referred to as the ICC. The exchangeable correlation matrix of Y_i is

R_{i} = Corr (Y_{i}) = (1 - ρ) I_{i} + ρ J_{i} J_{i}^{'}

where I_i is an n_i × n_i identity matrix and J_i is an n_i × 1 vector of 1’s. In this case, the covariance matrix of Y_i is $V_{i} = A_{i}^{1 / 2} R_{i} A_{i}^{1 / 2}$, where A_i is a diagonal matrix with diagonal elements v(μ_ij)ϕ, and v(μ_ij) is specified entirely by the marginal distributions, i.e., by β.

To estimate β, consider GEE (Liang and Zeger, 1986) of the form

u (\hat{β}) = \sum_{i = 1}^{N} {\hat{D}}_{i}^{'} {\hat{V}}_{i}^{- 1} [Y_{i} - μ_{i} (\hat{β})] = 0,

(2.3)

where $D_{i} = \frac{d [μ_{i} (β)]}{d β^{'}}$ is the matrix of derivatives of the mean vector with respect to β (one row per subject), and V_i (described above) is a function of ρ which must be estimated; however, the scale parameter ϕ can be ignored when solving for β̂.

We first simplify (2.3) by simplifying $V_{i}^{- 1}$. Note that $V_{i}^{- 1} = A_{i}^{- 1 / 2} R_{i}^{- 1} A_{i}^{- 1 / 2}$. To avoid matrix inversions in GEE, Qu et al. (2000) proposed using

R_{i}^{- 1} = \frac{1}{(1 - ρ)} I_{i} - \frac{ρ}{(1 - ρ) [(1 - ρ) + n_{i} ρ]} J_{i} J_{i}^{'},

and thus

V_{i}^{- 1} = \frac{1}{(1 - ρ)} A_{i}^{- 1} - \frac{ρ}{(1 - ρ) [(1 - ρ) + n_{i} ρ]} (A_{i}^{- 1 / 2} J_{i}) {(A_{i}^{- 1 / 2} J_{i})}^{'} .
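The closed-form inverse above can be checked numerically on a small cluster. The following is a minimal sketch in plain Python, assuming an exchangeable correlation matrix with ones on the diagonal and ρ everywhere else; the function names and the values n = 5 and ρ = 0.3 are illustrative choices, not part of the original analysis.

```python
def exchangeable(n, rho):
    """Exchangeable correlation matrix: 1 on the diagonal, rho elsewhere."""
    return [[1.0 if r == c else rho for c in range(n)] for r in range(n)]


def exchangeable_inverse(n, rho):
    """Closed form a * I - b * J J' with the coefficients from the text."""
    a = 1.0 / (1.0 - rho)
    b = rho / ((1.0 - rho) * ((1.0 - rho) + n * rho))
    return [[(a if r == c else 0.0) - b for c in range(n)] for r in range(n)]


def matmul(A, B):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*B)] for row in A]


n, rho = 5, 0.3  # illustrative cluster size and ICC
product = matmul(exchangeable(n, rho), exchangeable_inverse(n, rho))

# Largest deviation of R * R^{-1} from the identity matrix.
max_error = max(abs(product[r][c] - (1.0 if r == c else 0.0))
                for r in range(n) for c in range(n))
print(max_error)
```

The explicit n × n matrices are built here only for the check; the point of the closed form is that they never need to be formed in practice.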

However, in addition to avoiding inversion of large matrices, we show that by multiplying out the terms analytically, multiplication of large matrices can be completely by-passed using this expression for $V_{i}^{- 1}$ . Thus, as shown below, a unique contribution of this paper is that it shows how (2.3) can be expressed as a simple sum of vectors and scalars. That is, the estimating function in (2.3) becomes

u (β) = \frac{1}{(1 - ρ)} \sum_{i = 1}^{N} D_{i}^{'} A_{i}^{- 1} [Y_{i} - μ_{i}] - \sum_{i = 1}^{N} \frac{ρ}{(1 - ρ) [(1 - ρ) + n_{i} ρ]} D_{i}^{'} (A_{i}^{- 1 / 2} J_{i}) {(A_{i}^{- 1 / 2} J_{i})}^{'} [Y_{i} - μ_{i}] .

Further, without loss of generality since we are setting this equal to 0 and solving for β̂, we can multiply the estimating function by (1 − ρ) to obtain

u (β) = \sum_{i = 1}^{N} D_{i}^{'} A_{i}^{- 1} [Y_{i} - μ_{i}] - \sum_{i = 1}^{N} \frac{ρ}{[(1 - ρ) + n_{i} ρ]} D_{i}^{'} (A_{i}^{- 1 / 2} J_{i}) {(A_{i}^{- 1 / 2} J_{i})}^{'} [Y_{i} - μ_{i}] .

(2.4)

The second sum in the estimating function can be simplified further by noting that

{(A_{i}^{- 1 / 2} J_{i})}^{'} [Y_{i} - μ_{i}] = \sum_{j = 1}^{n_{i}} (Y_{i j} - μ_{i j}) / \sqrt{v_{i j}} and D_{i}^{'} (A_{i}^{- 1 / 2} J_{i}) = \sum_{j = 1}^{n_{i}} d_{i j} / \sqrt{v_{i j}}

where $d_{i j} = \frac{d [μ_{i j} (β)]}{d β}$ is the derivative vector for subject j, i.e., the jth column of $D_{i}^{'}$. Then, (2.4) becomes

u (β) = \sum_{i = 1}^{N} \sum_{j = 1}^{n_{i}} d_{i j} (Y_{i j} - μ_{i j}) / v_{i j} - \sum_{i = 1}^{N} \frac{ρ}{[(1 - ρ) + n_{i} ρ]} [\sum_{j = 1}^{n_{i}} d_{i j} / \sqrt{v_{i j}}] \sum_{j = 1}^{n_{i}} (Y_{i j} - μ_{i j}) / \sqrt{v_{i j}} .

(2.5)
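To illustrate that (2.5) needs only running sums, the sketch below computes one cluster's contribution two ways: by accumulating the three sums in (2.5), and by forming the n_i × n_i matrix V_i explicitly and solving a linear system. It assumes a logistic model with ϕ = 1 (so that d_ij = v_ij x_ij); the helper names, the tiny data set, and the values of β and ρ are hypothetical.

```python
import math


def expit(t):
    return 1.0 / (1.0 + math.exp(-t))


def score_by_sums(x, y, beta, rho):
    """Cluster contribution to (2.5), accumulated subject by subject."""
    k, n = len(beta), len(y)
    s_ind = [0.0] * k  # sum_j d_ij (Y_ij - mu_ij) / v_ij
    s_d = [0.0] * k    # sum_j d_ij / sqrt(v_ij)
    s_e = 0.0          # sum_j (Y_ij - mu_ij) / sqrt(v_ij)
    for xj, yj in zip(x, y):
        mu = expit(sum(b * v for b, v in zip(beta, xj)))
        var = mu * (1.0 - mu)
        for a in range(k):
            d = var * xj[a]  # logistic model: d_ij = v_ij * x_ij
            s_ind[a] += d * (yj - mu) / var
            s_d[a] += d / math.sqrt(var)
        s_e += (yj - mu) / math.sqrt(var)
    c = rho / ((1.0 - rho) + n * rho)
    return [s_ind[a] - c * s_d[a] * s_e for a in range(k)]


def solve(M, rhs):
    """Solve M z = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    A = [row[:] + [b] for row, b in zip(M, rhs)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(c + 1, n):
            f = A[r][c] / A[c][c]
            for q in range(c, n + 1):
                A[r][q] -= f * A[c][q]
    z = [0.0] * n
    for r in reversed(range(n)):
        z[r] = (A[r][n] - sum(A[r][q] * z[q] for q in range(r + 1, n))) / A[r][r]
    return z


def score_by_matrix(x, y, beta, rho):
    """(1 - rho) * D_i' V_i^{-1} (Y_i - mu_i) with an explicit n_i x n_i V_i."""
    n = len(y)
    mu = [expit(sum(b * v for b, v in zip(beta, xj))) for xj in x]
    var = [m * (1.0 - m) for m in mu]
    V = [[math.sqrt(var[r] * var[c]) * (1.0 if r == c else rho) for c in range(n)]
         for r in range(n)]
    z = solve(V, [yj - m for yj, m in zip(y, mu)])
    return [(1.0 - rho) * sum(var[j] * x[j][a] * z[j] for j in range(n))
            for a in range(len(beta))]


# Hypothetical cluster of four subjects: intercept plus one covariate.
x = [[1.0, 0.5], [1.0, -1.2], [1.0, 2.0], [1.0, 0.3]]
y = [1, 0, 1, 0]
beta, rho = [-0.4, 0.7], 0.25
by_sums = score_by_sums(x, y, beta, rho)
by_matrix = score_by_matrix(x, y, beta, rho)
print(by_sums)
print(by_matrix)
```

The sum-based version touches each subject once and stores only a few scalars and K-vectors per cluster, whereas the matrix version needs storage and work that grow with the square (or worse) of the cluster size.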

Thus (2.3) has now been expressed as a simple sum of vectors and scalars. The first sum,

u_{I} (β) = \sum_{i = 1}^{N} \sum_{j = 1}^{n_{i}} d_{i j} (Y_{i j} - μ_{i j}) / v_{i j}

(2.6)

is the GEE under naive independence, and yields consistent, but possibly inefficient estimators of β. Thus, the second sum in (2.5) is where efficiency is gained when incorporating the ICC. However, ρ must be estimated and plugged into (2.5) to realize potential efficiency gains. Any consistent estimator of ρ will yield the same asymptotic efficiency of the resulting estimator of β. In the following section, we consider a very simple estimator that is computationally feasible with large clusters.
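As a worked special case, consider the logistic regression model used in the application, taking ϕ = 1 (an assumption made here for illustration). The derivative vector is then proportional to the covariate vector, and (2.6) and (2.5) reduce to the following forms.

```latex
% Logistic regression: \mu_{ij} = \exp(x_{ij}'\beta) / \{1 + \exp(x_{ij}'\beta)\},
% v(\mu_{ij}) = \mu_{ij}(1 - \mu_{ij}), \phi = 1 (assumed).
d_{ij} = \frac{d\mu_{ij}}{d\beta} = \mu_{ij}(1 - \mu_{ij})\, x_{ij} = v_{ij}\, x_{ij}

u_{I}(\beta) = \sum_{i=1}^{N} \sum_{j=1}^{n_i} x_{ij}\,(Y_{ij} - \mu_{ij})

u(\beta) = u_{I}(\beta) - \sum_{i=1}^{N} \frac{\rho}{(1-\rho) + n_i\rho}
  \Big[\sum_{j=1}^{n_i} \sqrt{v_{ij}}\, x_{ij}\Big]
  \sum_{j=1}^{n_i} \frac{Y_{ij} - \mu_{ij}}{\sqrt{v_{ij}}}
```

The first line of u(β) is the usual logistic score equation, so the correction term is the only additional quantity that must be accumulated per cluster.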

Using Taylor series expansions similar to Liang and Zeger (1986) and Prentice (1988), assuming that the regression for μ_ij is correctly specified, the solution β̂ for u(β̂) = 0 is consistent for β; in addition, N^{1/2}(β̂ − β) has an asymptotic distribution which is multivariate normal with mean vector 0. The asymptotic covariance matrix of β̂ can be consistently estimated by the “sandwich estimator”

{[\sum_{i = 1}^{N} {\hat{W}}_{i}]}^{- 1} [\sum_{i = 1}^{N} u_{i} (\hat{β}) u_{i} {(\hat{β})}^{'}] {[\sum_{i = 1}^{N} {\hat{W}}_{i}]}^{- 1},

(2.7)

where $u_{i} (β) = \sum_{j = 1}^{n_{i}} d_{i j} (Y_{i j} - μ_{i j}) / v_{i j} - \frac{ρ}{[(1 - ρ) + n_{i} ρ]} [\sum_{j = 1}^{n_{i}} d_{i j} / \sqrt{v_{i j}}] \sum_{j = 1}^{n_{i}} (Y_{i j} - μ_{i j}) / \sqrt{v_{i j}}$ is the sum of the score vectors from the subjects in cluster i and

W_{i} = W_{i} (β, ρ) = - E [\frac{d {[u_{i} (β)]}^{'}}{d β}] = \sum_{j = 1}^{n_{i}} d_{i j} d_{i j}^{'} / v_{i j} - \frac{ρ}{[(1 - ρ) + n_{i} ρ]} [\sum_{j = 1}^{n_{i}} d_{i j} / \sqrt{v_{i j}}] {[\sum_{j = 1}^{n_{i}} d_{i j} / \sqrt{v_{i j}}]}^{'} .

(2.8)

Also, W_i and u_i(β) are evaluated at β̂ and a consistent estimate of ρ.
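The sandwich formula (2.7) only involves matrices whose dimension equals the number of regression parameters, never the cluster size. The sketch below assembles it from per-cluster W_i and u_i; the function names and the numerical values of `W_list` and `u_list` are hypothetical, chosen so that the cluster scores sum to zero as they would at a solution of the estimating equations.

```python
def solve(M, rhs):
    """Solve M z = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    A = [row[:] + [b] for row, b in zip(M, rhs)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(c + 1, n):
            f = A[r][c] / A[c][c]
            for q in range(c, n + 1):
                A[r][q] -= f * A[c][q]
    z = [0.0] * n
    for r in reversed(range(n)):
        z[r] = (A[r][n] - sum(A[r][q] * z[q] for q in range(r + 1, n))) / A[r][r]
    return z


def inverse(M):
    """Invert a small K x K matrix by solving for each unit vector."""
    k = len(M)
    cols = [solve(M, [1.0 if r == c else 0.0 for r in range(k)]) for c in range(k)]
    return [[cols[c][r] for c in range(k)] for r in range(k)]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def sandwich(W_list, u_list):
    """(sum W_i)^{-1} (sum u_i u_i') (sum W_i)^{-1}, as in (2.7)."""
    k = len(u_list[0])
    bread = [[sum(W[p][q] for W in W_list) for q in range(k)] for p in range(k)]
    meat = [[sum(u[p] * u[q] for u in u_list) for q in range(k)] for p in range(k)]
    bread_inv = inverse(bread)
    return matmul(matmul(bread_inv, meat), bread_inv)


# Hypothetical per-cluster quantities for two parameters and three clusters.
W_list = [[[4.0, 1.0], [1.0, 3.0]], [[5.0, 0.5], [0.5, 2.0]], [[3.0, 0.0], [0.0, 4.0]]]
u_list = [[0.6, -0.2], [-0.9, 0.4], [0.3, -0.2]]
cov = sandwich(W_list, u_list)
print(cov)
```

Standard errors would be the square roots of the diagonal entries of `cov`.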

3 Estimating the ICC

Denote the true residual for the jth subject from the ith cluster by $e_{i j} = (Y_{i j} - μ_{i j}) / \sqrt{v_{i j}}$. Then, by definition, E(e_ij e_ij′) = ρ for j ≠ j′. This suggests that a consistent method of moments estimator of ρ (Liang and Zeger, 1986) can be obtained via

\hat{ρ} = {[\sum_{i = 1}^{N} n_{i} (n_{i} - 1) / 2]}^{- 1} \sum_{i = 1}^{N} \sum_{j < j^{'}} {\hat{e}}_{i j} {\hat{e}}_{i j^{'}}

(3.9)

where ${\hat{e}}_{i j} = (Y_{i j} - {\hat{μ}}_{i j}) / \sqrt{{\hat{v}}_{i j}}$. The estimator given by (3.9) requires summing over $\sum_{i = 1}^{N} n_{i} (n_{i} - 1) / 2$ pairs. However, the square of a summation can be written as (Parzen, 1960),

{(\sum_{j = 1}^{n_{i}} {\hat{e}}_{i j})}^{2} = \sum_{j = 1}^{n_{i}} {\hat{e}}_{i j}^{2} + 2 \sum_{j < j^{'}} {\hat{e}}_{i j} {\hat{e}}_{i j^{'}} .

Thus,

2 \sum_{j < j^{'}} {\hat{e}}_{i j} {\hat{e}}_{i j^{'}} = {(\sum_{j = 1}^{n_{i}} {\hat{e}}_{i j})}^{2} - \sum_{j = 1}^{n_{i}} {\hat{e}}_{i j}^{2}

which requires the sum of 2n_i terms instead of n_i(n_i − 1)/2 terms. Thus, we suggest using the following formulation to obtain an identical estimate to that given in (3.9),

\hat{ρ} = {[\sum_{i = 1}^{N} n_{i} (n_{i} - 1) / 2]}^{- 1} \sum_{i = 1}^{N} \frac{1}{2} [{(\sum_{j = 1}^{n_{i}} {\hat{e}}_{i j})}^{2} - \sum_{j = 1}^{n_{i}} {\hat{e}}_{i j}^{2}];

(3.10)

this expression has $2 \sum_{i = 1}^{N} n_{i}$ (2 times the total sample size) terms instead of $\sum_{i = 1}^{N} n_{i} (n_{i} - 1) / 2$ terms. For the NIS data, this translates into approximately 16 million terms using our proposed approach instead of 92.6 billion terms using all pairs.
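The equivalence of the pairwise and sum-based estimators can be sketched as follows. The code uses the identity that the within-cluster pair sum equals half of the squared sum minus the sum of squares; the function names and the residual values are made up for illustration.

```python
from itertools import combinations


def icc_all_pairs(residuals_by_cluster):
    """Moment estimator (3.9): average product over all within-cluster pairs."""
    total, n_pairs = 0.0, 0
    for e in residuals_by_cluster:
        total += sum(a * b for a, b in combinations(e, 2))
        n_pairs += len(e) * (len(e) - 1) // 2
    return total / n_pairs


def icc_from_sums(residuals_by_cluster):
    """Same estimate from two running sums per cluster (no pair enumeration)."""
    total, n_pairs = 0.0, 0
    for e in residuals_by_cluster:
        s = sum(e)                  # sum of residuals in the cluster
        ss = sum(v * v for v in e)  # sum of squared residuals in the cluster
        total += 0.5 * (s * s - ss)
        n_pairs += len(e) * (len(e) - 1) // 2
    return total / n_pairs


# Made-up standardized residuals for three small clusters.
residuals = [[0.8, -0.3, 1.1, 0.4], [-1.2, -0.7, 0.1], [0.5, 0.9, -0.2, 0.3, 1.4]]
rho_pairs = icc_all_pairs(residuals)
rho_sums = icc_from_sums(residuals)
print(rho_pairs, rho_sums)
```

The first function does work proportional to the number of pairs in each cluster, while the second is linear in the cluster size and can be computed in a single pass over the data.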

4 One-step estimator

Typically, one iterates between solving u(β̂) = 0 in (2.5) for β̂ (given the current estimate of ρ) and estimating ρ with (3.10) (given the current estimate of β) until convergence. If the estimate of β under naive independence, say β̂_I, is initially used to estimate ρ, and the resulting estimate of ρ is plugged back into (2.5) and u(β̂) = 0 is solved for β̂, this yields an estimator that is asymptotically equivalent to the fully-iterated GEE estimator. The proof is similar to Lehmann (1983) for creating a one-step asymptotically efficient estimator from a consistent estimator (in this case β̂_I). A one-step GEE estimator is much less computationally intensive than a fully-iterated GEE for large cluster sizes.

In particular, a one-step estimator of β is formed by using one iteration of a Fisher scoring algorithm for obtaining a solution to u(β̂) = 0, with β̂_I as the starting value, e.g.,

{\hat{β}}_{1 GEE} = {\hat{β}}_{I} + {[W ({\hat{β}}_{I}, \hat{ρ})]}^{- 1} u ({\hat{β}}_{I}),

(4.11)

where β̂_1GEE is the one-step GEE estimator, ρ̂ is calculated by using β̂_I to estimate μ̂_ij and v̂_ij in ê_ij in (3.10), and $W (β, ρ) = \sum_{i = 1}^{N} W_{i} (β, ρ)$. Since u_I(β̂_I) = 0, u(β̂_I) in (4.11) reduces to

u ({\hat{β}}_{I}) = u_{I} ({\hat{β}}_{I}) - \sum_{i = 1}^{N} \frac{\hat{ρ}}{[(1 - \hat{ρ}) + n_{i} \hat{ρ}]} [\sum_{j = 1}^{n_{i}} {\hat{d}}_{i j} / \sqrt{{\hat{v}}_{i j}}] \sum_{j = 1}^{n_{i}} (Y_{i j} - {\hat{μ}}_{i j}) / \sqrt{{\hat{v}}_{i j}} = - \sum_{i = 1}^{N} \frac{\hat{ρ}}{[(1 - \hat{ρ}) + n_{i} \hat{ρ}]} [\sum_{j = 1}^{n_{i}} {\hat{d}}_{i j} / \sqrt{{\hat{v}}_{i j}}] \sum_{j = 1}^{n_{i}} (Y_{i j} - {\hat{μ}}_{i j}) / \sqrt{{\hat{v}}_{i j}},

so that (4.11) simplifies to

{\hat{β}}_{1 GEE} = {\hat{β}}_{I} - {[W ({\hat{β}}_{I}, \hat{ρ})]}^{- 1} \sum_{i = 1}^{N} \frac{\hat{ρ}}{[(1 - \hat{ρ}) + n_{i} \hat{ρ}]} [\sum_{j = 1}^{n_{i}} {\hat{d}}_{i j} / \sqrt{{\hat{v}}_{i j}}] \sum_{j = 1}^{n_{i}} (Y_{i j} - {\hat{μ}}_{i j}) / \sqrt{{\hat{v}}_{i j}} .

(4.12)

The asymptotic covariance of β̂_1GEE is consistently estimated by (2.7) evaluated at β̂_1GEE and ρ̂.
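The one-step update (4.12) can be sketched end to end for a logistic model. This is not the authors' SAS macro: it is a minimal plain-Python illustration that assumes ϕ = 1, fits the naive-independence model by Newton-Raphson, estimates ρ from per-cluster sums, and applies a single correction step. All function names and the synthetic data are hypothetical.

```python
import math


def expit(t):
    return 1.0 / (1.0 + math.exp(-t))


def solve(M, rhs):
    """Solve M z = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    A = [row[:] + [b] for row, b in zip(M, rhs)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(c + 1, n):
            f = A[r][c] / A[c][c]
            for q in range(c, n + 1):
                A[r][q] -= f * A[c][q]
    z = [0.0] * n
    for r in reversed(range(n)):
        z[r] = (A[r][n] - sum(A[r][q] * z[q] for q in range(r + 1, n))) / A[r][r]
    return z


def fit_independence(clusters, k, iters=25):
    """Newton-Raphson for ordinary logistic regression, i.e. u_I(beta) = 0."""
    beta = [0.0] * k
    for _ in range(iters):
        score = [0.0] * k
        info = [[0.0] * k for _ in range(k)]
        for x, y in clusters:
            for xj, yj in zip(x, y):
                mu = expit(sum(b * v for b, v in zip(beta, xj)))
                for p in range(k):
                    score[p] += xj[p] * (yj - mu)
                    for q in range(k):
                        info[p][q] += mu * (1.0 - mu) * xj[p] * xj[q]
        beta = [b + s for b, s in zip(beta, solve(info, score))]
    return beta


def one_step_gee(clusters, k):
    beta_i = fit_independence(clusters, k)
    summaries, pair_sum, n_pairs = [], 0.0, 0
    W = [[0.0] * k for _ in range(k)]  # starts as sum_ij d_ij d_ij' / v_ij
    for x, y in clusters:
        s_e = ss_e = 0.0
        s_d = [0.0] * k
        for xj, yj in zip(x, y):
            mu = expit(sum(b * v for b, v in zip(beta_i, xj)))
            var = mu * (1.0 - mu)
            e = (yj - mu) / math.sqrt(var)
            s_e += e
            ss_e += e * e
            for p in range(k):
                s_d[p] += math.sqrt(var) * xj[p]  # d_ij / sqrt(v_ij), logistic case
                for q in range(k):
                    W[p][q] += var * xj[p] * xj[q]
        n = len(y)
        pair_sum += 0.5 * (s_e * s_e - ss_e)  # within-cluster pair sum from two sums
        n_pairs += n * (n - 1) // 2
        summaries.append((n, s_e, s_d))
    rho = pair_sum / n_pairs
    u = [0.0] * k
    for n, s_e, s_d in summaries:  # second pass uses cluster summaries only
        c = rho / ((1.0 - rho) + n * rho)
        for p in range(k):
            u[p] -= c * s_d[p] * s_e
            for q in range(k):
                W[p][q] -= c * s_d[p] * s_d[q]
    step = solve(W, u)
    return beta_i, rho, [b + s for b, s in zip(beta_i, step)]


# Synthetic data: 12 clusters of 20 subjects whose event rates differ by cluster.
clusters = []
for i in range(12):
    threshold = 2 + 2 * (i % 4)  # cluster event rate is threshold / 10
    x = [[1.0, float(j % 5 - 2)] for j in range(20)]
    y = [1 if (7 * i + 3 * j) % 10 < threshold else 0 for j in range(20)]
    clusters.append((x, y))

beta_i, rho, beta_1 = one_step_gee(clusters, 2)
print(beta_i, rho, beta_1)
```

Only one pass over the subject-level data is needed after the independence fit; the correction step itself works from per-cluster summaries and a K × K linear system.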

5 Application to 2010 Nationwide Inpatient Sample

The binary outcome of interest is a patient complication within the first 48 hours post surgery (1 if complication, 0 if none). We fit a logistic regression model, where the main covariate was U.S. payer type, with 5 categories: Medicare, Medicaid, private insurance, self-pay, and uninsured/other. The other covariates were: race (1 if white, 0 otherwise), age (in years), advanced cancer stage (1 if yes, 0 if no), AIDS (1 if yes, 0 if no), renal failure (1 if yes, 0 if no) and number of other comorbidities.

Table 1 gives the GEE estimates of β under naive independence and under exchangeable correlation (the proposed one-step and fully iterated). The estimated standard errors are from the sandwich variance estimator. An estimate of the asymptotic relative efficiency (ARE) can be obtained by comparing the variance estimates of β̂. From Table 1, we see that the estimated standard errors of β̂ under exchangeable correlation are smaller than those under independence for most effects, often substantially so (the cancer stage and renal failure effects are exceptions). Even with a relatively small estimated ICC, ρ̂ = 0.0078, the gains in efficiency appear appreciable. For example, for the payer effects, the estimated AREs range from 50% to 60%. The proposed one-step and fully iterated GEE give very similar results.
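The ARE figures quoted for the payer effects can be approximated directly from the standard errors in Table 1. The text does not spell out the formula, so the sketch below assumes the ARE is the ratio of estimated variances (squared standard errors) under exchangeable correlation to those under independence; the variable names are illustrative.

```python
# (effect, SE under independence, SE one-step, SE fully iterated), from Table 1
payer_se = [
    ("Medicare", 0.0236, 0.0183, 0.0175),
    ("Medicaid", 0.0254, 0.0193, 0.0185),
    ("Private", 0.0217, 0.0158, 0.0152),
    ("Self-pay", 0.0277, 0.0219, 0.0211),
]

are = {}
for effect, se_ind, se_one, se_full in payer_se:
    # Assumed definition: variance under exchangeable / variance under independence.
    are[effect] = ((se_one / se_ind) ** 2, (se_full / se_ind) ** 2)
    print(f"{effect:9s} one-step {are[effect][0]:.2f}  fully iterated {are[effect][1]:.2f}")
```

Because the tabulated standard errors are rounded to four decimals, ratios computed this way are only approximate.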

Table 1.

Comparison of logistic regression parameter estimates for the post-operative surgical complications data

Effect	Approach	Estimate	SE	Z-statistic	P-value
Intercept	IND-Robust	−2.8382	0.0426	−66.56	<.0001
	Proposed 1-step	−2.5203	0.0290	−86.94	<.0001
	Fully Iterated GEE	−2.4005	0.0356	−67.35	<.0001
Medicare	IND-Robust	−0.0513	0.0236	−2.18	0.0296
	Proposed 1-step	−0.0493	0.0183	−2.69	0.0071
	Fully Iterated GEE	−0.0428	0.0175	−2.45	0.0143
Medicaid	IND-Robust	−0.0307	0.0254	−1.21	0.2257
	Proposed 1-step	−0.0271	0.0193	−1.41	0.1593
	Fully Iterated GEE	−0.0274	0.0185	−1.48	0.1384
Private	IND-Robust	−0.0936	0.0217	−4.32	<.0001
	Proposed 1-step	−0.0741	0.0158	−4.68	<.0001
	Fully Iterated GEE	−0.0736	0.0152	−4.84	<.0001
Self-pay	IND-Robust	0.1848	0.0277	6.67	<.0001
	Proposed 1-step	0.1571	0.0219	7.17	<.0001
	Fully Iterated GEE	0.1496	0.0211	7.08	<.0001
White	IND-Robust	−0.0106	0.0136	−0.77	0.4389
	Proposed 1-step	−0.0233	0.0074	−3.16	0.0016
	Fully Iterated GEE	−0.0244	0.0072	−3.39	0.0007
# Comorbidities	IND-Robust	0.2897	0.0030	96.80	<.0001
	Proposed 1-step	0.2800	0.0027	105.04	<.0001
	Fully Iterated GEE	0.2767	0.0027	102.64	<.0001
Cancer	IND-Robust	0.5291	0.0093	56.98	<.0001
	Proposed 1-step	0.4972	0.0098	50.63	<.0001
	Fully Iterated GEE	0.4935	0.0098	50.42	<.0001
AIDS	IND-Robust	0.1696	0.0396	4.28	<.0001
	Proposed 1-step	0.1790	0.0307	5.83	<.0001
	Fully Iterated GEE	0.1687	0.0298	5.67	<.0001
Renal Fail	IND-Robust	0.8258	0.0097	85.20	<.0001
	Proposed 1-step	0.8333	0.0100	83.49	<.0001
	Fully Iterated GEE	0.8383	0.0103	81.03	<.0001
Age	IND-Robust	0.0248	0.0006	40.19	<.0001
	Proposed 1-step	0.0239	0.0004	63.39	<.0001
	Fully Iterated GEE	0.0232	0.0003	66.38	<.0001

Next we compare estimators in terms of computation times (real not CPU). Standard logistic regression maximum likelihood estimation under naive independence without a robust variance in SAS PROC GENMOD (SAS Institute, 2015) is the fastest (1.1 minutes); logistic regression under naive independence but with a sandwich variance estimate (PROC GENMOD) takes 2.5 minutes. Thus, calculation of the sandwich variance estimate for standard logistic regression takes an additional 1.4 minutes. Our SAS macro for the proposed one-step approach takes 2.1 minutes, as opposed to the fully-iterated estimation (in PROC GENMOD) with exchangeable correlation, which takes 8.1 hours. We note that use of only a single iteration of PROC GENMOD with exchangeable correlation should be comparable to our one-step approach because PROC GENMOD uses the logistic regression estimates under naive independence as starting values. Thus, direct comparison of our one-step approach (2.1 minutes) to one-step of PROC GENMOD (1.6 hours) emphasizes the potential advantage of the proposed method. We attribute the advantage in computation time to the following: 1) we have expressed the GEE in a form that does not require inversion or multiplication of any matrices; 2) we use a simple formula to estimate the ICC that does not require summing over all pairs of outcomes within a cluster; and 3) we only use a one-step GEE estimator instead of fully-iterating. A unique contribution of this paper is that it expresses the GEE estimating equations with an exchangeable correlation given by (2.3) as a simple sum of vectors and scalars.

We note that we fit the models on a 64-bit PC workstation with an Intel Xeon CPU E5-2630 0 @ 2.30GHz processor with 16.0 GB of RAM and an SSD hard drive; this PC workstation is faster than a typical desktop PC. Finally, although the results presented here were obtained using SAS, we also attempted to obtain the fully-iterated GEE estimates using both gee in R (Carey, 2002) and the xtgee command in Stata (StataCorp, 2015); both programs were unable to produce estimates of the model parameters due to insufficient memory.

Finally, we note that clustering can also arise in longitudinal studies with repeated measures on the same subjects. For such studies an exchangeable correlation structure is often not appropriate; for example, Toeplitz, autoregressive, m-dependent, or even unstructured correlation patterns may be preferred. However, typical cluster sizes for most longitudinal studies tend to be relatively small, say less than 15–20, so that computationally feasible methods such as those discussed here are not required.

Acknowledgments

We are grateful for the support provided by grant CA60679 from the U.S. National Institutes of Health.

Contributor Information

Stuart Lipsitz, Brigham & Women’s Hospital, Boston, MA.

Garrett Fitzmaurice, Harvard Medical School, Boston, MA.

Debajyoti Sinha, Florida State University, Tallahassee, FL.

Nathanael Hevelone, Brigham & Women’s Hospital, Boston, MA.

Jim Hu, Cornell Medical College, New York, NY.

Louis L. Nguyen, Brigham & Women’s Hospital, Boston, MA.

References

Carey V. gee: Generalized Estimation Equation Solver. R Package Version 4.13-10; Ported from S-PLUS to R by Thomas Lumley (versions 3.13 and 4.4) and Brian Ripley (version 4.13). 2002.
Lehmann E. Theory of Point Estimation. John Wiley & Sons; 1983.
Liang KY, Zeger SL. Longitudinal data analysis using generalized linear models. Biometrika. 1986;73:13–22.
Parzen E. Modern probability theory and its applications. John Wiley & Sons; 1960.
Prentice RL. Correlated binary regression with covariates specific to each binary observation. Biometrics. 1988;44:1033–1048.
Qu A, Lindsay B, Li B. Improving generalised estimating equations using quadratic inference functions. Biometrika. 2000;87:823–836.
SAS Institute. SAS/STAT Software, Version 9.4. Cary, NC: 2015.
StataCorp. Stata Statistical Software: Release 13. College Station, TX: StataCorp LP; 2015.
