Author manuscript; available in PMC: 2018 Mar 1.
Published in final edited form as: Biom J. 2016 Nov 23;59(2):315–330. doi: 10.1002/bimj.201600107

Optimal design of longitudinal data analysis using generalized estimating equation models

Jingxia Liu 1, Graham A Colditz 2
PMCID: PMC5575779  NIHMSID: NIHMS875839  PMID: 27878852

Abstract

Longitudinal studies are often used in biomedical research and clinical trials to evaluate treatment effects. The association pattern within subjects must be considered in both the sample size calculation and the analysis. One of the most important approaches to analyzing such studies is the generalized estimating equation (GEE) method proposed by Liang and Zeger, in which a “working correlation structure” is introduced and the within-subject association pattern depends on a vector of association parameters denoted by ρ. Based on the GEE method, Liu and Liang derived explicit sample size formulas for two-group comparisons in linear and logistic regression models. For cluster randomized trials (CRTs), researchers have proposed optimal sample sizes at both the cluster and individual levels as functions of sampling costs and the intracluster correlation coefficient (ICC). In these approaches, the optimal sample sizes depend strongly on the ICC, which is usually unknown in cluster randomized and multicenter trials. To overcome this shortcoming, Van Breukelen et al consider a range of plausible ICC values identified from literature reviews and present Maximin designs (MMDs) based on relative efficiency (RE) and efficiency under budget and cost constraints. In this paper, we propose the optimal sample size and number of repeated measurements for GEE models with an exchangeable working correlation matrix under a fixed budget, where “optimal” refers to maximum power for a given sampling budget. We derive the sample size and number of repeated measurements for a known value of ρ and develop a straightforward algorithm for unknown ρ. Applications in practice are discussed. We also discuss the existence of an optimal design when an AR(1) working correlation matrix is assumed. The proposed method can be extended to scenarios in which the true and working correlation matrices differ.

Keywords: Generalized estimating equation, longitudinal studies, optimal design, power, working correlation matrix

1 Introduction

Longitudinal studies are often used in biomedical research and clinical trials. Methods for analyzing such correlated data have been extensively developed in recent years. One of the most important approaches is the generalized estimating equation (GEE) method proposed by Liang and Zeger (Liang et al., 1986), which fits a marginal model. To describe the association pattern within the subject, a “working correlation structure” is introduced, and the pattern depends on a vector of association parameters denoted by ρ. The advantage of this approach is that it provides consistent estimates when the marginal model is correctly specified, even if the working correlation matrix is misspecified. GEE has therefore been widely applied (Dahmen et al., 2004; Sanda et al., 2008; Lin et al., 2015; Park et al., 2015; Toriola et al., 2015; Jeffe et al., 2016).

Sample size calculation, or power estimation, is an important part of study design. For two-group comparisons with continuous or categorical outcomes, sample size formulas have been extensively developed (Schlesselman, 1982; Self et al., 1988; Self et al., 1992; Shuster, 1993; Dahmen et al., 2004; Dahmen et al., 2004). For longitudinal studies, Liu and Liang (Liu et al., 1997) propose a statistic based on the GEE method, derive its asymptotic distribution, and obtain a sample size formula for correlated observations based on the score statistic. For special cases, e.g., two-group comparisons in linear and logistic regression models, explicit formulas are obtained. Under the exchangeable working correlation R(ρ), i.e., 1 on the diagonal and ρ elsewhere, their sample size formula for continuous outcomes is identical to the one developed by Diggle et al (Diggle et al., 1994) (p.31), and their formula for a binary outcome is similar to that of Lee and Dubin (Lee et al., 1994).

Cluster randomized trials (CRTs) are performed when randomization at the individual level is infeasible or may lead to severely biased estimates of the treatment effect. In recent years, there has been growing interest in conducting cluster randomized trials (Campbell et al., 2000; Gulliford et al., 2014; Gravenstein et al., 2016; Kalfon et al., 2016; Mehring et al., 2016; Nagayama et al., 2016; Yamagata et al., 2016). Individuals within a cluster may share similar characteristics or be exposed to common factors. The degree of such similarity must be considered in the analysis and is commonly quantified by the intracluster correlation coefficient (ICC). Sample size equations for cluster randomized trials have been published as functions of the ICC or of the coefficient of variation (CV) of cluster sizes (Donner et al., 1981; Rosner et al., 2003; Eldridge et al., 2006; Austin, 2007; Rosner et al., 2011; Amatya et al., 2013). Cost is also an important factor, because each additional unit of randomization is a whole cluster rather than a single subject as in an individually randomized trial; sample size estimation is therefore particularly important given the large cost of such designs. Researchers have proposed optimal sample sizes at both the cluster and individual levels as functions of sampling costs and the ICC (Raudenbush, 1997; Raudenbush et al., 2000; Moerbeek et al., 2001; Moerbeek et al., 2001; Connelly, 2003; Liu, 2003; Headrick et al., 2005). Here, optimal means maximum power and precision for a given sampling budget, or minimum sampling costs for a given power and precision. In these approaches, the optimal sample sizes depend strongly on the ICC. However, the ICC is usually unknown for cluster randomized and multicenter trials.
To overcome this shortcoming, Van Breukelen et al (Van Breukelen et al., 2015) consider a range of possible ICC values identified from literature reviews and present Maximin designs (MMDs) based on relative efficiency (RE) and efficiency under budget and cost constraints. Using the linear regression model and the variance of the maximum likelihood estimator, they give a practical expression for the optimal design (number of clusters, number of persons per cluster). They also show that the MMD has a high minimum RE for the realistic ICC ranges and cost ratios and can thus be recommended for practical use.

For longitudinal studies, selecting the optimal number of repeated measures is also a major concern, because additional measurements reduce the impact of intra-patient variability and thus increase statistical power (Vickers, 2003). Guidance on how researchers should select the number of repeated measures therefore deserves more attention. In addition, the cost of such a study includes the enrollment cost and the cost of the repeated measures per subject, and the total cost may increase dramatically when either the total sample size or the number of repeated measures per subject rises. Finding the design that maximizes power under a given total budget is thus a practical issue for longitudinal studies. Moreover, the within-subject association parameter ρ is unknown in most cases when the study is designed.

In this paper, we discuss how many subjects need to be enrolled and how many repeated measures are sufficient under a budget constraint, for both known and unknown values of the association parameter ρ in the within-subject working correlation structure. Because the GEE method provides consistent estimates regardless of the working correlation matrix, we propose the optimal sample size based on GEE under a fixed budget. The exchangeable correlation matrix is used in the proposed method; other correlation structures, including misspecification of the working correlation matrix, are also discussed.

The outline of this article is as follows. In section 2, we briefly summarize the GEE method developed by Liang and Zeger (Liang and Zeger, 1986) and the sample size formulas for two-group comparisons in linear and logistic regression models proposed by Liu and Liang (Liu and Liang, 1997). Section 3 presents the locally optimal design (LOD) for a known value of ρ in the working correlation matrix. Section 4 presents the optimal designs (ODs) for an unknown value of ρ. In section 5, we discuss the existence of an optimal design when a first-order autoregressive (AR(1)) working correlation matrix is assumed, and then consider several scenarios in which the true and working correlation structures differ. Section 6 discusses how the proposed method applies in practice, followed by a discussion of limitations and directions for future research.

2 Generalized estimating equation model and sample size calculations

Let yi = (yi1, yi2, …, yini)′ be the vector of responses from the ith subject, i = 1, …, m. The responses are assumed to be independent across subjects but correlated within each subject. Liu and Liang (Liu and Liang, 1997) consider the marginal model

$$g(\mu_{ij}) = X_{ij}^{\top}\varphi + Z_{ij}^{\top}\lambda, \quad i = 1, \dots, m;\ j = 1, \dots, n_i, \tag{1}$$

where μij = E(yij) and g(·) is a link function. The p × 1 vector φ contains the fixed effects of interest, whereas the q × 1 vector λ contains nuisance parameters. The mean of yi is denoted by μi = E(yi), and the variance-covariance matrix of yi by $V_i = \theta A_i^{1/2} R_i(\rho) A_i^{1/2}$, where $A_i = \operatorname{diag}\{\gamma(\mu_{i1}), \dots, \gamma(\mu_{in_i})\}$ and the ni × ni working correlation structure Ri(ρ), which depends on a vector of association parameters ρ, describes the association pattern within the subject. Both the variance function γ(·) and the scale parameter θ depend on the distribution of the responses. For instance, if yij is binary, γ(μij) = μij(1 − μij) and θ = 1; if yij is continuous, γ(μij) = 1 and θ is the random error variance.

The hypotheses are H0: φ = φ0 versus H1: φ = φ1. The quasi-score statistic based on the GEE is

$$T = S_\varphi(\varphi_0, \hat\lambda_0, \rho)^{\top}\, \Sigma_0^{-1}\, S_\varphi(\varphi_0, \hat\lambda_0, \rho),$$

where

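$$S_\varphi(\varphi_0, \hat\lambda_0, \rho) = \sum_{i=1}^{m} \left(\frac{\partial \mu_i}{\partial \varphi}\right)^{\top} V_i^{-1} (y_i - \mu_i), \qquad \Sigma_0 = \operatorname{cov}_{H_0}\left[S_\varphi(\varphi_0, \hat\lambda_0, \rho)\right],$$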

and λ̂0 is the estimator of λ under H0, obtained by solving

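$$S_\lambda(\varphi_0, \lambda, \rho) = \sum_{i=1}^{m} \left(\frac{\partial \mu_i}{\partial \lambda}\right)^{\top} V_i^{-1} (y_i - \mu_i) = 0.$$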

Under H0 and H1, $\mu_{ij} = g^{-1}(X_{ij}^{\top}\varphi_0 + Z_{ij}^{\top}\lambda_0)$ and $\mu_{ij} = g^{-1}(X_{ij}^{\top}\varphi_1 + Z_{ij}^{\top}\lambda_1)$, respectively. Liu and Liang show that, under φ = φ1 and λ = λ1, T asymptotically follows a non-central chi-square distribution with non-centrality parameter $\varepsilon^{\top}\Sigma_1^{-1}\varepsilon$, where ε and Σ1 are the expectation and covariance of Sφ(φ0, λ̂0, ρ) under H1. Both can be approximated, so the power, and equivalently the required sample size, can be approximated from the non-central chi-square distribution. Note that this approach assumes that ρ is known.

For simplicity, the number of repeated measures is assumed to be identical across subjects, i.e., ni = n for all i. A special case is considered in which p = q = 1, Zij ≡ 1, and Xij ≡ Xi, with Xi = 1 for the treatment group and 0 for the control group. Let π0 be the proportion of subjects in the control group and π1 = 1 − π0. Liu and Liang (Liu and Liang, 1997) obtain explicit sample size formulas for a two-group study with binary and continuous responses.

For a binary outcome, with p0 and p1 the response probabilities in the control and treatment groups, the null hypothesis is p0 = p1. The total sample size m is calculated by

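$$m = \frac{(z_{1-\alpha/2} + z_{1-\beta})^2 \left[\pi_1 p_0(1-p_0) + \pi_0 p_1(1-p_1)\right]}{\pi_0 \pi_1 (p_1 - p_0)^2 \left(\mathbf{1}^{\top} R(\rho)^{-1} \mathbf{1}\right)},$$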

where α is the pre-specified significance level, zα is the 100α-th percentile of the standard normal distribution, 1 − β is the nominal power, and 1 is an n × 1 vector of 1’s. With the assumption of an exchangeable correlation matrix,

$$\mathbf{1}^{\top} R(\rho)^{-1} \mathbf{1} = \frac{n}{1 + (n-1)\rho}, \tag{2}$$

the total sample size m simplifies to

$$m = \frac{(z_{1-\alpha/2} + z_{1-\beta})^2 \left[\pi_1 p_0(1-p_0) + \pi_0 p_1(1-p_1)\right]\left(1 + (n-1)\rho\right)}{n\, \pi_0 \pi_1 (p_1 - p_0)^2}.$$
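As a quick numerical check of equation (2), the short Python sketch below builds an exchangeable R(ρ), solves R(ρ)x = 1 by plain Gauss-Jordan elimination (standard library only; the helper names are illustrative and not taken from the authors' supporting code), and compares 1′x = 1′R(ρ)⁻¹1 with n/(1 + (n − 1)ρ).

```python
def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    size = len(b)
    aug = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(size):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                for k in range(col, size + 1):
                    aug[r][k] -= factor * aug[col][k]
    return [aug[i][size] / aug[i][i] for i in range(size)]


def exchangeable(n, rho):
    """n x n exchangeable correlation matrix: 1 on the diagonal, rho elsewhere."""
    return [[1.0 if i == j else rho for j in range(n)] for i in range(n)]


def ones_rinv_ones(r):
    """1' R^{-1} 1, computed as the sum of the solution x of R x = 1."""
    return sum(solve(r, [1.0] * len(r)))


for n in (2, 4, 6):
    for rho in (0.1, 0.5, 0.9):
        numeric = ones_rinv_ones(exchangeable(n, rho))
        closed_form = n / (1 + (n - 1) * rho)
        print(n, rho, round(numeric, 6), round(closed_form, 6))
```

Each printed pair should agree up to rounding error; a general-purpose linear algebra library could replace the hand-written solver.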

The factor 1 + (n − 1)ρ is known as the inflation factor or design effect (Neuhaus et al., 1993). Define

$$\Delta_1 = \frac{\pi_1 p_0(1-p_0) + \pi_0 p_1(1-p_1)}{\pi_0 \pi_1 (p_1 - p_0)^2}, \tag{3}$$

the total sample size can be re-written as

$$m = (z_{1-\alpha/2} + z_{1-\beta})^2\, \frac{1 + (n-1)\rho}{n}\, \Delta_1. \tag{4}$$

Alternatively,

$$z_{1-\beta} = \sqrt{\frac{n\,m}{\left(1 + (n-1)\rho\right)\Delta_1}} - z_{1-\alpha/2}. \tag{5}$$

Consequently, the power is 1 − β = Φ(z1−β), where Φ is the cumulative distribution function of the standard normal distribution.
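Equations (3) to (5) translate directly into code. The sketch below (Python standard library; the function names are illustrative, not the authors' supporting code) computes Δ1 for a binary outcome, the unrounded total sample size from equation (4), and the power Φ(z1−β) from equation (5), evaluated at the Table 1 design ρ = 0.1, n = 4, m = 50.

```python
import math
from statistics import NormalDist

N01 = NormalDist()  # standard normal distribution (mean 0, sd 1)


def delta1_binary(p0, p1, pi0=0.5):
    """Delta_1 of equation (3); p0, p1 = response probabilities, pi0 = control share."""
    pi1 = 1 - pi0
    numerator = pi1 * p0 * (1 - p0) + pi0 * p1 * (1 - p1)
    return numerator / (pi0 * pi1 * (p1 - p0) ** 2)


def total_sample_size(n, rho, delta1, alpha=0.05, power=0.8):
    """Unrounded total m from equation (4), exchangeable working correlation."""
    z_sum = N01.inv_cdf(1 - alpha / 2) + N01.inv_cdf(power)
    return z_sum ** 2 * (1 + (n - 1) * rho) / n * delta1


def power_from_design(n, m, rho, delta1, alpha=0.05):
    """1 - beta = Phi(z_{1-beta}), with z_{1-beta} from equation (5)."""
    z_beta = math.sqrt(n * m / ((1 + (n - 1) * rho) * delta1)) - N01.inv_cdf(1 - alpha / 2)
    return N01.cdf(z_beta)


d1 = delta1_binary(0.1, 0.3)  # p0 = 0.1, p1 = 0.3, equal allocation
print(round(d1, 6))
print(round(power_from_design(4, 50, 0.1, d1), 3))  # Table 1 lists 0.893 for this design
print(round(total_sample_size(4, 0.1, d1), 1))
```

Because equation (5) is an algebraic rearrangement of equation (4), plugging the unrounded sample size back into the power function recovers the nominal power.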

For a two-group study with continuous outcomes, the linear regression model for repeated measurements is considered, where the error term follows a multivariate normal distribution with mean 0 and covariance matrix σ2R(ρ). The hypothesis is H0: φ = 0 versus H1: φ = φ1. The total sample size formula is

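$$m = \frac{(z_{1-\alpha/2} + z_{1-\beta})^2\, \sigma^2}{\pi_0 \pi_1 \varphi_1^2 \left(\mathbf{1}^{\top} R(\rho)^{-1} \mathbf{1}\right)}.$$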

Again, assuming an exchangeable correlation matrix, the total sample size formula reduces to

$$m = \frac{(z_{1-\alpha/2} + z_{1-\beta})^2\, \sigma^2 \left(1 + (n-1)\rho\right)}{n\, \pi_0 \pi_1 \varphi_1^2},$$

and the factor 1 + (n − 1)ρ is known as the inflation factor or design effect (Scott et al., 1982). As noted, it is identical to the one developed by Diggle et al (Diggle, Liang, et al., 1994) (p.31). Define

$$\Delta_1 = \frac{\sigma^2}{\pi_0 \pi_1 \varphi_1^2}, \tag{6}$$

the sample size and power formulas for a two-group comparison with a continuous outcome take the same form as equations (4) and (5). We therefore use these two formulas throughout the paper, whether the outcome is binary or continuous.
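For the continuous case only Δ1 changes. The minimal sketch below uses hypothetical inputs σ = 1 and φ1 = 0.5 (not values from the paper) to show how the same equation (4) is reused once Δ1 comes from equation (6).

```python
from statistics import NormalDist

N01 = NormalDist()


def delta1_continuous(sigma, phi1, pi0=0.5):
    """Delta_1 of equation (6): sigma^2 / (pi0 * pi1 * phi1^2)."""
    return sigma ** 2 / (pi0 * (1 - pi0) * phi1 ** 2)


def total_sample_size(n, rho, delta1, alpha=0.05, power=0.8):
    """Unrounded total m from equation (4); same form for binary and continuous outcomes."""
    z_sum = N01.inv_cdf(1 - alpha / 2) + N01.inv_cdf(power)
    return z_sum ** 2 * (1 + (n - 1) * rho) / n * delta1


# Hypothetical inputs: error SD sigma = 1 and treatment effect phi1 = 0.5.
d1 = delta1_continuous(sigma=1.0, phi1=0.5)
for n in (1, 3, 5):
    print(n, round(total_sample_size(n, rho=0.3, delta1=d1), 1))
```

With n = 1 the design effect equals 1 for any ρ, so the result reduces to the usual single-measurement two-sample formula.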

Van Breukelen et al (Van Breukelen and Candel, 2015) define the total budget B in CRTs; we apply the same idea to longitudinal studies. Assume that the enrollment cost per subject is c currency units (e.g., USD) and that each repeated measurement costs s currency units. The total budget B for m subjects, each measured n times, is

$$B = m(c + sn). \tag{7}$$

In this paper, “optimal” refers to maximum power for a given sampling budget. The research question is to find the design (m, n) that maximizes the power (1 − β) in equation (5) subject to the budget constraint (7).
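Before the closed-form solution of section 3, the optimization problem can be stated as a brute-force search: for each integer n, spend the rest of the budget on the largest affordable integer m allowed by constraint (7), and keep the design with the highest power from equation (5). The sketch below is an illustrative baseline only; the cap n_max is an arbitrary assumption. For the Table 1 setting with ρ = 0.2, B = 15,000$, c = 100, s = 50 and Δ1 = 15 it agrees with the design (n, m) = (3, 60) reported there.

```python
import math
from statistics import NormalDist

N01 = NormalDist()


def power(n, m, rho, delta1, alpha=0.05):
    """Power Phi(z_{1-beta}) with z_{1-beta} from equation (5)."""
    k = n * m / (1 + (n - 1) * rho)
    return N01.cdf(math.sqrt(k / delta1) - N01.inv_cdf(1 - alpha / 2))


def exhaustive_design(rho, budget, c, s, delta1, n_max=50):
    """Try n = 1..n_max; m is the largest integer with m (c + s n) <= B."""
    best = None
    for n in range(1, n_max + 1):
        m = int(budget / (c + s * n))
        if m < 1:
            break  # the budget cannot cover even one subject
        candidate = (power(n, m, rho, delta1), n, m)
        if best is None or candidate > best:
            best = candidate
    return best


pw, n, m = exhaustive_design(rho=0.2, budget=15000, c=100, s=50, delta1=15.0)
print(n, m, round(pw, 3))
```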

3 Known parameter value ρ in working correlation matrix

In this section, ρ is assumed to be known. Because π0, Δ1 and α in equation (5) are fixed by the null and alternative hypotheses and the significance level, maximizing the power is equivalent to maximizing

$$K = \frac{n\,m}{1 + (n-1)\rho}.$$

Substituting m = B/(c + sn) gives

$$K = \frac{Bn}{c(1-\rho) + \left[c\rho + s(1-\rho)\right] n + s\rho n^2}.$$

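Taking the partial derivative with respect to n gives, up to the positive factor 1/(squared denominator),

$$\frac{\partial K}{\partial n} \propto Bc(1-\rho) - Bs\rho n^2.$$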

This derivative equals 0 at

$$n = \sqrt{\frac{c(1-\rho)}{s\rho}}. \tag{8}$$

For $n > \sqrt{c(1-\rho)/(s\rho)}$, ∂K/∂n is negative, and for $n < \sqrt{c(1-\rho)/(s\rho)}$ it is positive; therefore the value in equation (8) maximizes K. This maximizer for a known value of ρ defines the locally optimal design (LOD), and n in equation (8) is denoted by nLOD. Letting θ = (1 − ρ)/ρ (not the scale parameter θ of section 2), the LOD is given by

$$n_{LOD} = \sqrt{\frac{\theta c}{s}}, \qquad m_{LOD} = \frac{B}{\sqrt{\theta s c} + c}. \tag{9}$$
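Equations (8) and (9) can be checked numerically. The sketch below (illustrative helper names) evaluates the closed-form LOD and confirms, on a fine grid of non-integer n, that K = nm/(1 + (n − 1)ρ) with m = B/(c + sn) peaks at nLOD, using the Table 1 inputs ρ = 0.1, B = 15,000$, c = 100 and s = 50.

```python
import math


def locally_optimal_design(rho, budget, c, s):
    """Non-integer LOD of equations (8)-(9), with theta = (1 - rho) / rho."""
    theta = (1 - rho) / rho
    n_lod = math.sqrt(theta * c / s)
    m_lod = budget / (math.sqrt(theta * s * c) + c)
    return n_lod, m_lod


def k_value(n, rho, budget, c, s):
    """K = n m / (1 + (n - 1) rho) after substituting m = B / (c + s n)."""
    m = budget / (c + s * n)
    return n * m / (1 + (n - 1) * rho)


rho, budget, c, s = 0.1, 15000, 100, 50     # Table 1 inputs
n_lod, m_lod = locally_optimal_design(rho, budget, c, s)
print(round(n_lod, 1), round(m_lod, 1))     # Table 1 lists 4.2 and 48.1

# Brute-force check: K over a fine grid of n should peak next to n_lod.
grid = [0.5 + i / 1000 for i in range(10000)]
best_n = max(grid, key=lambda n: k_value(n, rho, budget, c, s))
print(round(best_n, 3), round(n_lod, 3))
```

The check also confirms that mLOD in equation (9) is just B/(c + s·nLOD) rewritten.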

This design is identical to the optimal designs of Raudenbush (1997), Moerbeek et al. (2000) and Van Breukelen and Candel (2015), which minimize the variance of the treatment effect estimator in the linear regression model under the cost constraint.

Table 1 illustrates how to determine the locally optimal design for a binary outcome in a two-group study with p0 = 0.1 and p1 = 0.3 under two fixed budgets (15,000$ and 20,000$), assuming c = 100 and s = 50. Equal allocation to the treatment and control groups is assumed (π0 = 0.5). For each ρ from 0.1 to 0.9, the table shows the locally optimal design from equation (9) and the corresponding maximized power.

Table 1.

Locally optimal design (LOD) at known parameter value ρ in working correlation matrix for two-sample study with c = 100, and s = 50

ρ nLOD mLOD (1 − β)LOD nup mup (1 − β)up Bup ndown mdown (1 − β)down Bdown
B = 15,000$; p0 = 0.1, p1 = 0.3
0.1 4.2 48.1 0.893 5 42 0.885 14700 4 50 0.893 15000
0.2 2.8 62.1 0.834 3 60 0.833 15000 2 75 0.823 15000
0.3 2.2 72.1 0.793 3 60 0.782 15000 2 75 0.792 15000
0.4 1.7 80.4 0.764 2 75 0.762 15000 1 100 0.733 15000
0.5 1.4 87.9 0.745 2 75 0.733 15000 1 100 0.733 15000
0.6 1.2 95.1 0.735 2 75 0.705 15000 1 100 0.733 15000
0.7 0.9 102.5 0.734 1 100 0.733 15000 NA
0.8 0.7 110.8 0.743 1 100 0.733 15000 NA
0.9 0.5 121.4 0.770 1 100 0.733 15000 NA
B = 20,000$; p0 = 0.1, p1 = 0.3
0.1 4.2 64.1 0.959 5 57 0.958 19950 4 66 0.957 19800
0.2 2.8 82.8 0.923 3 80 0.922 20000 2 100 0.915 20000
0.3 2.2 96.1 0.893 3 80 0.885 20000 2 100 0.893 20000
0.4 1.7 107.2 0.872 2 100 0.870 20000 1 133 0.846 19950
0.5 1.4 117.2 0.857 2 100 0.846 20000 1 133 0.846 19950
0.6 1.2 126.8 0.848 2 100 0.823 20000 1 133 0.846 19950
0.7 0.9 136.7 0.847 1 133 0.846 19950 NA
0.8 0.7 147.8 0.855 1 133 0.846 19950 NA
0.9 0.5 161.9 0.876 1 133 0.846 19950 NA

nup = int(nLOD) + 1, mup = int(B/(c + s·nup)); ndown = int(nLOD), mdown = int(B/(c + s·ndown)); int = integer part of a number; (1 − β)up and Bup, (1 − β)down and Bdown are the actual power and required budget for (nup, mup) and (ndown, mdown), respectively; Bold refers to the proposed optimal sample size and number of repeated measures with the larger power between (nup, mup) and (ndown, mdown); NA = Not Applicable.

In general, nLOD is not an integer. In practice the number of repeated measures must be an integer, either nup = int(nLOD) + 1 or ndown = int(nLOD), where int denotes the integer part of a number. For each candidate, we calculate mup and mdown from m = B/(c + sn) and then the corresponding power from equation (5); the proposed number of repeated measures is the candidate with the larger power. Because mup and mdown are usually not integers either, their integer parts are taken as the sample sizes so that the design stays within the budget. Table 1 also shows the candidate designs (nup, mup) and (ndown, mdown), together with their actual power and required budget. The proposed sample size and number of repeated measures (with n ≥ 2) are shown in bold.
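The rounding rule just described can be sketched as follows. This is an illustrative implementation, assuming Δ1 = 15 (p0 = 0.1, p1 = 0.3, π0 = 0.5); it simply drops n = 0, which corresponds to the NA entries in the tables, and it does not enforce the n ≥ 2 requirement used for the bold entries.

```python
import math
from statistics import NormalDist

N01 = NormalDist()


def power(n, m, rho, delta1, alpha=0.05):
    """Power Phi(z_{1-beta}) with z_{1-beta} from equation (5)."""
    k = n * m / (1 + (n - 1) * rho)
    return N01.cdf(math.sqrt(k / delta1) - N01.inv_cdf(1 - alpha / 2))


def integer_lod(rho, budget, c, s, delta1, alpha=0.05):
    """Compare (n_up, m_up) and (n_down, m_down) and return the higher-power design.

    Returns (power, n, m, budget actually spent).
    """
    n_lod = math.sqrt((1 - rho) / rho * c / s)
    candidates = []
    for n in (int(n_lod) + 1, int(n_lod)):      # n_up, n_down
        if n < 1:
            continue                            # n_down = 0: the "NA" entries
        m = int(budget / (c + s * n))           # integer part keeps cost within budget
        candidates.append((power(n, m, rho, delta1, alpha), n, m, m * (c + s * n)))
    return max(candidates)


for rho in (0.2, 0.6, 0.8):                     # Table 1 setting: B = 15,000$, c = 100, s = 50
    pw, n, m, spent = integer_lod(rho, 15000, 100, 50, delta1=15.0)
    print(rho, n, m, round(pw, 3), spent)
```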

Equation (9) shows that the optimal number of repeated measures nLOD also depends on the ratio c/s of the cost per subject to the cost per repeated measurement. Taking ρ = 0.5, c = 100 and s = 50 in table 1 as an example, θc/s = 2 and nLOD = √2 ≈ 1.4. Consequently, nup = 2 and ndown = 1, and ndown = 1 gives the larger power for the budget of 15,000$. However, a longitudinal study should have n > 1. If we instead use c = 100 and s = 10, with the other parameters as in table 1, we obtain the locally optimal designs shown in table 2. For every ρ ≤ 0.7, both nup and ndown are larger than 1; for ρ > 0.7, θc/s < 4. To design a feasible longitudinal study, we therefore need θc/s ≥ 4, which gives nLOD ≥ 2.
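The requirement θc/s ≥ 4 can be restated as a bound on ρ: since θ = (1 − ρ)/ρ, θc/s ≥ 4 holds exactly when ρ ≤ c/(c + 4s). The sketch below encodes this rearrangement (our own restatement, not a formula stated in the paper), generalized to a target minimum n_min.

```python
import math


def rho_threshold(c, s, n_min=2):
    """Largest rho for which n_LOD = sqrt(theta c / s) >= n_min.

    theta * c / s >= n_min**2 with theta = (1 - rho) / rho
    <=> rho <= c / (c + n_min**2 * s).
    """
    return c / (c + n_min ** 2 * s)


def n_lod(rho, c, s):
    """n_LOD from equation (9)."""
    return math.sqrt((1 - rho) / rho * c / s)


print(round(rho_threshold(100, 50), 3))   # cost setting of table 1
print(round(rho_threshold(100, 10), 3))   # cost setting of table 2
```

The two bounds are 1/3 and 5/7 (about 0.714), consistent with tables 1 and 2, where nLOD ≥ 2 holds up to ρ = 0.3 and ρ = 0.7, respectively.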

Table 2.

Locally optimal design (LOD) at known parameter value ρ in working correlation matrix for two-sample study with c = 100, and s = 10

ρ nLOD mLOD (1 − β)LOD nup mup (1 − β)up Bup ndown mdown (1 − β)down Bdown
B = 15,000$; p0 = 0.1, p1 = 0.3
0.1 9.5 77.0 0.999 10 75 0.999 15000 9 78 0.999 14820
0.2 6.3 91.9 0.991 7 88 0.991 14960 6 93 0.991 14880
0.3 4.8 101.1 0.973 5 100 0.973 15000 4 107 0.972 14980
0.4 3.9 108.1 0.950 4 107 0.950 14980 3 115 0.947 14950
0.5 3.2 114.0 0.925 4 107 0.922 14980 3 115 0.924 14950
0.6 2.6 119.2 0.901 3 115 0.899 14950 2 125 0.898 15000
0.7 2.1 124.3 0.879 3 115 0.872 14950 2 125 0.879 15000
0.8 1.6 129.5 0.863 2 125 0.861 15000 1 136 0.853 14960
0.9 1.1 135.7 0.854 2 125 0.842 15000 1 136 0.853 14960
B = 20,000$; p0 = 0.1, p1 = 0.3
0.1 9.5 102.6 1 10 100 1 20000 9 105 1 19950
0.2 6.3 122.5 0.999 7 117 0.999 19890 6 125 0.999 20000
0.3 4.8 134.9 0.994 5 133 0.994 19950 4 142 0.994 19880
0.4 3.9 144.2 0.986 4 142 0.986 19880 3 153 0.985 19890
0.5 3.2 151.9 0.975 4 142 0.973 19880 3 153 0.975 19890
0.6 2.6 159.0 0.963 3 153 0.962 19890 2 166 0.961 19920
0.7 2.1 165.7 0.951 3 153 0.946 19890 2 166 0.950 19920
0.8 1.6 172.7 0.941 2 166 0.939 19920 1 181 0.935 19910
0.9 1.1 180.9 0.936 2 166 0.927 19920 1 181 0.935 19910

nup = int(nLOD) + 1, mup = int(B/(c + s·nup)); ndown = int(nLOD), mdown = int(B/(c + s·ndown)); int = integer part of a number; (1 − β)up and Bup, (1 − β)down and Bdown are the actual power and required budget for (nup, mup) and (ndown, mdown), respectively; Bold refers to the proposed optimal sample size and number of repeated measures with the larger power between (nup, mup) and (ndown, mdown).

For fixed ρ, c and s, nLOD does not depend on the budget, and the total sample size mLOD is proportional to B. For instance, with c = 100, s = 50 and ρ = 0.2, nLOD = 2.8 for both B = 15,000$ and B = 20,000$, whereas mLOD = 62.1 and mLOD = 82.8, respectively. The same pattern holds for all cases in tables 1 and 2.

Equation (9) also shows that as ρ increases, that is, as θ decreases, the number of repeated measurements nLOD decreases while the total sample size mLOD increases. Moreover,

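$$n_{LOD} \times m_{LOD} = \sqrt{\frac{\theta c}{s}} \times \frac{B}{\sqrt{\theta s c} + c} = \frac{B}{s + \sqrt{cs/\theta}} = \frac{B/s}{1 + \sqrt{c/(s\theta)}},$$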

and

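$$1 + (n_{LOD} - 1)\rho = 1 + \frac{\sqrt{\theta c/s} - 1}{1 + \theta}, \qquad \text{since } \rho = \frac{1}{1 + \theta}.$$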

Therefore,

$$K_{LOD}(\theta) = \frac{B/s}{1 + \sqrt{c/(s\theta)}} \bigg/ \left(1 + \frac{\sqrt{\theta c/s} - 1}{1 + \theta}\right).$$

Let ρ1 > ρ2, so that θ1 < θ2. Then

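$$\frac{K_{LOD}(\rho_1)}{K_{LOD}(\rho_2)} = \frac{K_{LOD}(\theta_1)}{K_{LOD}(\theta_2)} = \frac{1 + \sqrt{c/(s\theta_2)}}{1 + \sqrt{c/(s\theta_1)}} \times \frac{1 + \dfrac{\sqrt{\theta_2 c/s} - 1}{1 + \theta_2}}{1 + \dfrac{\sqrt{\theta_1 c/s} - 1}{1 + \theta_1}}. \tag{10}$$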

Let $f(\theta) = \left(1 + \sqrt{c/(s\theta)}\right)\left(1 + \dfrac{\sqrt{\theta c/s} - 1}{1 + \theta}\right)$, so that $K_{LOD}(\theta) = (B/s)/f(\theta)$. Direct computation shows that f(θ) is a decreasing function of θ when θc/s > 1, that is, when nLOD > 1. Hence $K_{LOD}(\theta_1)/K_{LOD}(\theta_2) = f(\theta_2)/f(\theta_1) < 1$: as ρ increases, the maximized power (1 − β)LOD decreases, provided nLOD > 1. Tables 1 and 2 both illustrate these points. These results give the locally optimal design for a known value of ρ and given costs.
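The monotonicity claim can be spot-checked numerically for a given cost ratio. The sketch below is a check on a grid, not a proof: with c = 100 and s = 10 (as in table 2) it verifies the identity KLOD(θ) = (B/s)/f(θ) and that KLOD decreases over a grid of ρ values for which θc/s > 1.

```python
import math


def f(theta, c, s):
    """f(theta) = (1 + sqrt(c/(s theta))) * (1 + (sqrt(theta c/s) - 1)/(1 + theta))."""
    return (1 + math.sqrt(c / (s * theta))) * (1 + (math.sqrt(theta * c / s) - 1) / (1 + theta))


def k_lod(rho, budget, c, s):
    """K evaluated at the (non-integer) locally optimal design of equation (9)."""
    theta = (1 - rho) / rho
    n = math.sqrt(theta * c / s)
    m = budget / (math.sqrt(theta * s * c) + c)
    return n * m / (1 + (n - 1) * rho)


budget, c, s = 15000, 100, 10
grid = [0.05 * k for k in range(1, 16)]                  # rho = 0.05, ..., 0.75
grid = [r for r in grid if (1 - r) / r * c / s > 1]      # keep only n_LOD > 1
values = [k_lod(r, budget, c, s) for r in grid]
print(all(a > b for a, b in zip(values, values[1:])))   # K_LOD decreasing in rho
```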

4 Unknown parameter value ρ in working correlation matrix

The value nLOD in equation (9) requires prior knowledge of ρ. As tables 1 and 2 show, different inputs for ρ lead to different optimal designs (mLOD, nLOD) that maximize the power under the budget constraint. When ρ is unknown, we assume that a plausible range of ρ, (ρmin, ρmax), called the parameter space (Atkinson et al., 2007; Berger et al., 2009), is available from previous studies. We also define a range of feasible sample sizes, (mmin, mmax), called the design space. The main research question is thus to identify the OD within the parameter and design spaces. One solution is the MMD approach (Mario et al., 2002; Winkens et al., 2007; Berger and Wong, 2009; Maus et al., 2010), which has three steps: step 1 defines the parameter and design spaces; step 2 computes the relative efficiency (RE) of each design in the design space over the parameter space and records its smallest RE; step 3 selects the design that maximizes this minimum RE among all designs in the design space.

Our algorithm for finding the OD when ρ in the working correlation matrix is unknown is similar to the MMD approach. It also has three steps, with the optimal design chosen in step 3. The difference is that power, rather than RE, is computed in step 2, and the design that minimizes the maximum power is selected. These steps are intended to give a robust power estimate even if the value of ρ is wrongly assumed.

  • Step 1, Define the parameter and design space, (ρmin, ρmax) and (mmin, mmax), respectively.

  • Step 2, For each ρ in the parameter space, calculate mLOD using equation (9).

    1. If mLOD is within the range (mmin, mmax), take (mLOD, nLOD), rounded to integers as in section 3, as the design for this ρ, denoted by (m*LOD, n*LOD).

    2. If mLOD is outside (mmin, mmax), then for each possible sample size m ∈ (mmin, mmax), calculate the number of repeats per subject n as the integer part of (B/m − c)/s. Choose the design (m, n) with the largest power within the design space, denoted by (m*LOD, n*LOD); this is the locally optimal design within the design space.

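  • Step 3, Among the designs obtained in step 2, select the one whose maximized power is smallest, that is, the design that minimizes the maximum power over the parameter space.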
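The three steps can be sketched in Python as below. The sketch assumes a grid ρ = 0.05, 0.10, …, 0.35 for the parameter space, an inclusive integer design space, and the rounding rule of section 3 in step 2.1; these are illustrative choices and not necessarily those of the authors' supporting code. Under the settings of tables 3 and 4 (B = 15,000$, c = 100, s = 20, Δ1 = 15) it should return the designs (93, 3) and (50, 10) reported there.

```python
import math
from statistics import NormalDist

N01 = NormalDist()


def power(n, m, rho, delta1, alpha=0.05):
    """Power Phi(z_{1-beta}) with z_{1-beta} from equation (5)."""
    k = n * m / (1 + (n - 1) * rho)
    return N01.cdf(math.sqrt(k / delta1) - N01.inv_cdf(1 - alpha / 2))


def rounded_lod(rho, budget, c, s, delta1):
    """Step 2.1: round n_LOD up and down (section 3) and keep the higher power."""
    n_lod = math.sqrt((1 - rho) / rho * c / s)
    options = []
    for n in (int(n_lod) + 1, int(n_lod)):
        if n >= 1:
            m = int(budget / (c + s * n))
            options.append((power(n, m, rho, delta1), m, n))
    return max(options)


def best_in_design_space(rho, budget, c, s, delta1, m_min, m_max):
    """Step 2.2: scan integer m in the design space with n = int((B/m - c)/s)."""
    options = []
    for m in range(m_min, m_max + 1):
        n = int((budget / m - c) / s)
        if n >= 1:
            options.append((power(n, m, rho, delta1), m, n))
    return max(options)


def optimal_design(rhos, budget, c, s, delta1, m_min, m_max):
    """Return (power, m, n) of the design chosen in step 3."""
    per_rho = []
    for rho in rhos:  # step 2, for every rho in the parameter space
        m_lod = budget / (math.sqrt((1 - rho) / rho * s * c) + c)
        if m_min <= m_lod <= m_max:
            per_rho.append(rounded_lod(rho, budget, c, s, delta1))
        else:
            per_rho.append(best_in_design_space(rho, budget, c, s, delta1, m_min, m_max))
    return min(per_rho)  # step 3: smallest of the maximized powers


rhos = [0.05 * k for k in range(1, 8)]  # parameter space (0.05, 0.35)
print(optimal_design(rhos, 15000, 100, 20, 15.0, 5, 100))  # design space of table 3
print(optimal_design(rhos, 15000, 100, 20, 15.0, 5, 50))   # design space of table 4
```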

Table 3 shows an example of determining the optimal design for a binary outcome in a two-group study with p0 = 0.1 and p1 = 0.3 under a fixed budget of 15,000$, c = 100 and s = 20. We again assume equal allocation to the treatment and control groups, with parameter space (0.05, 0.35) and design space (5, 100). The locally optimal designs from equation (9) fall within the design space, so an LOD is reached for each ρ in the parameter space. Figure 1 shows the power for each sample size and demonstrates that, for each ρ, the power is maximized within the design space. Because nLOD and mLOD must be integers in practice, the rounding approach of section 3 is applied, and table 3 presents the resulting proposed integer design for each ρ. We therefore choose a total sample size of 93 and 3 repeated measurements, (mOD, nOD) = (93, 3), as the optimal design, which gives a power of 91.1%.

Table 3.

Optimal design (OD) at unknown parameter value ρ in working correlation matrix for two-sample study with B = 15,000$, c = 100, and s = 20

ρ nLOD mLOD (1 − β)LOD nup mup (1 − β)up ndown mdown (1 − β)down n*LOD m*LOD (1 − β)*LOD
0.05 9.7 50.9 0.998 10 50 0.998 9 53 0.997 10 50 0.998
0.1 6.7 64.1 0.990 7 62 0.989 6 68 0.989 6 68 0.989
0.15 5.3 72.7 0.977 6 68 0.976 5 75 0.977 5 75 0.977
0.2 4.5 79.2 0.962 5 75 0.961 4 83 0.961 5 75 0.961
0.25 3.9 84.5 0.946 4 83 0.945 3 93 0.941 4 83 0.945
0.3 3.4 89.1 0.929 4 83 0.927 3 93 0.926 4 83 0.927
0.35 3.0 93.2 0.913 4 83 0.908 3 93 0.911 3 93 0.911

(ρmin, ρmax) = (0.05, 0.35) and (mmin, mmax) = (5, 100); n*LOD, m*LOD and (1 − β)*LOD denote the proposed integer design for each ρ and its power.

Figure 1. Power of the treatment effect as a function of sample size over the design space (5, 100) for several values of ρ.

If mLOD is within the range (mmin, mmax) for all values of ρ in the parameter space, as in table 3, then the maximized powers are attained at the LODs. Section 3 shows that the maximized power under the LOD is a decreasing function of ρ. Hence the algorithm selects the optimal design at ρ = ρmax.

Table 4 shows the same study design as table 3 but with feasible sample sizes of (5, 50). Here the locally optimal designs lie outside the design space for every ρ in the parameter space. Figure 2 shows the power for the different sample sizes within the design space under different values of ρ; the power increases with sample size. Table 4 presents the LODs together with the number of repeated measurements and actual power at the sample-size boundaries mmin and mmax for each ρ. The optimal design is (mOD, nOD) = (50, 10), with a power of 80.9%.

Table 4.

Optimal design (OD) at unknown parameter value ρ in working correlation matrix for two-sample study with B = 15,000$, c = 100, and s = 20

ρ nLOD mLOD (1 − β)LOD nmin mmin (1 − β)min nmax mmax (1 − β)max n*LOD m*LOD (1 − β)*LOD
0.05 9.7 50.9 0.998 145 5 0.680 10 50 0.998 10 50 0.998
0.1 6.7 64.1 0.990 145 5 0.426 10 50 0.987 10 50 0.987
0.15 5.3 72.7 0.977 145 5 0.309 10 50 0.965 10 50 0.965
0.2 4.5 79.2 0.962 145 5 0.246 10 50 0.932 10 50 0.932
0.25 3.9 84.5 0.946 145 5 0.207 10 50 0.893 10 50 0.893
0.3 3.4 89.1 0.929 145 5 0.180 10 50 0.851 10 50 0.851
0.35 3.0 93.2 0.913 145 5 0.161 10 50 0.809 10 50 0.809

(ρmin, ρmax) = (0.05, 0.35) and (mmin, mmax) = (5, 50); nmin and nmax are the numbers of repeated measurements at mmin and mmax; n*LOD, m*LOD and (1 − β)*LOD denote the proposed integer design for each ρ and its power.

Figure 2. Power of the treatment effect as a function of sample size over the design space (5, 50) for several values of ρ.

Suppose that mLOD is outside the range (mmin, mmax) for all values of ρ in the parameter space, with mmax < mLOD, as in table 4. Section 3 shows that K is maximized at (mLOD, nLOD). If m < mLOD, then n > nLOD, by m = B/(c + sn). Since K is a decreasing function of n for n > nLOD, the power is maximized at m = mmax. Let nmax denote the corresponding number of repeated measurements at mmax. For fixed m = mmax and n = nmax > 1, K = nm/(1 + (n − 1)ρ) is a decreasing function of ρ. Therefore, the algorithm again finds the optimal design at ρ = ρmax.

Table 5 shows the same study design as table 3 but with feasible sample sizes of (5, 80). The locally optimal designs are reached for smaller values of ρ but not for larger values of ρ within the parameter space. For each ρ, table 5 presents the LODs, the number of repeated measurements and actual power at the sample-size boundaries mmin and mmax (or, when the LOD lies in the design space, at the rounded candidates), and the proposed integer design. The optimal design is (mOD, nOD) = (80, 4), with a power of 89.7%.

Table 5.

Optimal design (OD) at unknown parameter value ρ in working correlation matrix for two-sample study with B = 15,000$, c = 100, and s = 20

ρ nLOD mLOD (1 − β)LOD n*min m*min (1 − β)*min n*max m*max (1 − β)*max n*LOD m*LOD (1 − β)*LOD
0.05 9.7 50.9 0.998 10 50 0.998 9 53 0.997 10 50 0.998
0.1 6.7 64.1 0.990 7 62 0.989 6 68 0.989 6 68 0.989
0.15 5.3 72.7 0.977 6 68 0.976 5 75 0.977 5 75 0.977
0.2 4.5 79.2 0.962 5 75 0.961 4 83 0.961 5 75 0.961
0.25 3.9 84.5 0.946 145 5 0.207 4 80 0.937 4 80 0.937
0.3 3.4 89.1 0.929 145 5 0.180 4 80 0.918 4 80 0.918
0.35 3.0 93.2 0.913 145 5 0.161 4 80 0.897 4 80 0.897

(ρmin, ρmax) = (0.05, 0.35) and (mmin, mmax) = (5, 80).

For mLOD ∈ (mmin, mmax): n*min = nup, m*min = mup, n*max = ndown and m*max = mdown; otherwise n*min = nmin, m*min = mmin, n*max = nmax and m*max = mmax. n*LOD, m*LOD and (1 − β)*LOD denote the proposed integer design for each ρ and its power.

If mLOD is within the range (mmin, mmax) for smaller values of ρ and outside it for larger values of ρ in the parameter space, as in table 5, then the maximized powers are attained at the LODs for smaller values of ρ and at m = mmax for larger values of ρ. Combining the arguments given for tables 3 and 4, the algorithm again finds the optimal design at ρ = ρmax.

In conclusion, the maximized power in the design space is a decreasing function of ρ in the parameter space. Therefore, we can simplify the above algorithm as follows.

  • Step 1, Define the parameter and design space, (ρmin, ρmax) and (mmin, mmax), respectively.

  • Step 2, Consider ρ = ρmax, calculate mLOD using equation (9).

    1. If it is within the range (mmin, mmax), then set mOD = mLOD.

    2. If it is outside (mmin, mmax) (the case mLOD > mmax analyzed above), then set mOD = mmax, and take the number of repeats per subject nOD as the integer part of (B/mOD − c)/s.
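A sketch of the simplified algorithm follows. The text analyzes only the case mLOD > mmax; moving to mmin when mLOD < mmin is an assumption of this sketch, flagged in a comment. With B = 15,000$, c = 100, s = 20, ρmax = 0.35 and Δ1 = 15, it should reproduce the optimal designs of tables 3, 4 and 5.

```python
import math
from statistics import NormalDist

N01 = NormalDist()


def power(n, m, rho, delta1, alpha=0.05):
    """Power Phi(z_{1-beta}) with z_{1-beta} from equation (5)."""
    k = n * m / (1 + (n - 1) * rho)
    return N01.cdf(math.sqrt(k / delta1) - N01.inv_cdf(1 - alpha / 2))


def simplified_od(rho_max, budget, c, s, delta1, m_min, m_max):
    """Return (power, m_OD, n_OD), evaluated only at rho = rho_max."""
    theta = (1 - rho_max) / rho_max
    m_lod = budget / (math.sqrt(theta * s * c) + c)
    if m_min <= m_lod <= m_max:
        # Step 2.1: round n_LOD up and down as in section 3, keep the higher power.
        n_lod = math.sqrt(theta * c / s)
        options = []
        for n in (int(n_lod) + 1, int(n_lod)):
            if n >= 1:
                m = int(budget / (c + s * n))
                options.append((power(n, m, rho_max, delta1), m, n))
        return max(options)
    # Step 2.2: the text covers m_LOD > m_max; moving to m_min when m_LOD < m_min
    # is an assumption of this sketch.
    m = m_max if m_lod > m_max else m_min
    n = int((budget / m - c) / s)
    return power(n, m, rho_max, delta1), m, n


for m_max in (100, 50, 80):  # design spaces of tables 3, 4 and 5
    pw, m, n = simplified_od(0.35, 15000, 100, 20, 15.0, 5, m_max)
    print(m_max, (m, n), round(pw, 3))
```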

Compared with the algorithm for Maximin designs (MMDs) based on relative efficiency (RE) and efficiency (Van Breukelen and Candel, 2015), our final algorithm is simpler because of a property of power: the maximized power decreases as ρ increases, as shown by equation (10). Source code to reproduce the results is available as Supporting Information on the journal’s web page (http://onlinelibrary.wiley.com/doi/xxx/suppinfo).

5 Working correlation matrix misspecification

Equations (2), (4) and (5) and sections 3 and 4 assume an exchangeable working correlation matrix. If instead a first-order autoregressive (AR(1)) working correlation matrix is assumed, under which two measurements adjacent in time are more correlated than two measurements further apart, then

$$\mathbf{1}^{\top} R(\rho)^{-1} \mathbf{1} = \frac{2\rho + n(1-\rho)}{1 + \rho}. \tag{11}$$

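Thus the total sample size m for a binary outcome is

$$m = \frac{(z_{1-\alpha/2} + z_{1-\beta})^2 \left[\pi_1 p_0(1-p_0) + \pi_0 p_1(1-p_1)\right](1 + \rho)}{\pi_0 \pi_1 (p_1 - p_0)^2 \left[2\rho + n(1-\rho)\right]},$$

and for a continuous outcome is

$$m = \frac{(z_{1-\alpha/2} + z_{1-\beta})^2\, \sigma^2 (1 + \rho)}{\pi_0 \pi_1 \varphi_1^2 \left[2\rho + n(1-\rho)\right]}.$$

Therefore

$$z_{1-\beta} = \sqrt{\frac{\left[2\rho + n(1-\rho)\right] m}{(1 + \rho)\Delta_1}} - z_{1-\alpha/2},$$

where Δ1 is defined in equations (3) and (6), respectively. Maximizing the power means maximizing $K = [2\rho + n(1-\rho)]\, m/(1 + \rho)$. Substituting m = B/(c + sn) and taking the partial derivative with respect to n gives $\partial K/\partial n \propto (1-\rho)c - 2\rho s$, whose sign does not depend on n. If ρ < c/(c + 2s), K is an increasing function of n, so the power increases as n increases; if ρ > c/(c + 2s), K is a decreasing function of n. Therefore no locally optimal design (an interior maximum in n) exists for GEE models with an AR(1) working correlation matrix.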
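Equation (11) and the sign argument for ∂K/∂n under AR(1) can be checked numerically. The sketch below uses illustrative helpers (Gauss-Jordan elimination stands in for any linear solver): it compares 1′R(ρ)⁻¹1 with equation (11) and shows that K is monotone in n on either side of ρ = c/(c + 2s), which equals 0.5 for c = 100 and s = 50.

```python
def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    size = len(b)
    aug = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(size):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                for k in range(col, size + 1):
                    aug[r][k] -= factor * aug[col][k]
    return [aug[i][size] / aug[i][i] for i in range(size)]


def ar1(n, rho):
    """n x n AR(1) correlation matrix with entries rho ** |i - j|."""
    return [[rho ** abs(i - j) for j in range(n)] for i in range(n)]


def ones_rinv_ones(r):
    """1' R^{-1} 1 via the solution of R x = 1."""
    return sum(solve(r, [1.0] * len(r)))


def k_ar1(n, rho, budget, c, s):
    """K = [2 rho + n (1 - rho)] m / (1 + rho) with m = B / (c + s n)."""
    m = budget / (c + s * n)
    return (2 * rho + n * (1 - rho)) * m / (1 + rho)


for n in (2, 5, 8):
    print(n, round(ones_rinv_ones(ar1(n, 0.4)), 6), round((2 * 0.4 + n * 0.6) / 1.4, 6))

budget, c, s = 15000, 100, 50               # threshold c / (c + 2 s) = 0.5
for rho in (0.3, 0.7):
    print(rho, [round(k_ar1(n, rho, budget, c, s), 2) for n in range(1, 7)])
```

For ρ = 0.3 the values of K rise with n, and for ρ = 0.7 they fall, so any constrained optimum sits at a boundary of the feasible range of n.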

In sample size calculation or power estimation, we need to specify a correlation structure, which can then be used as the working correlation. The advantage of GEE is that the estimator φ̂ remains consistent even when the working correlation matrix is incorrectly specified (Liang and Zeger, 1986). In this section, we also discuss situations in which the working correlation matrix is misspecified, even if the true correlation structure may be unknown. Let Rt and Rw denote the true and working correlation structures, respectively. We still assume that the number of repeated measures is identical across subjects, i.e., ni = n for all i.

For binary outcomes, Pan (2001) shows that the robust variance of $\sqrt{m}(\hat\varphi-\varphi)$ is

$$c\left[\frac{1}{\pi_1 p_0(1-p_0)} + \frac{1}{\pi_0 p_1(1-p_1)}\right],$$

where

$$c = \frac{\mathbf{1}'R_w^{-1}R_tR_w^{-1}\mathbf{1}}{(\mathbf{1}'R_w^{-1}\mathbf{1})^2}.$$

If $R_t = R_w$ are both exchangeable, or $R_t$ is exchangeable and $R_w$ is the independence working model, then

$$c = \frac{1 + (n-1)\rho}{n}.$$

This matches the exchangeable case in equation (2), so the algorithm in sections 3 and 4 applies in these situations. If $R_t = R_w$ are both AR(1), then c is the reciprocal of the quantity in equation (11), $c = (1+\rho)/[2\rho+n(1-\rho)]$, and, as shown above, no optimal design exists.
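Pan's expression for c can be evaluated numerically to check the special cases above. The following pure-Python sketch (the hand-written Gaussian elimination and all function names are illustrative choices, not the authors' code) computes c from $R_t$ and $R_w$ through $w = R_w^{-1}\mathbf{1}$, using the symmetry of $R_w$. The values n = 5 and ρ = 0.4 are arbitrary.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (pure Python)."""
    n = len(b)
    aug = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented copy of [a | b]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= f * aug[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        acc = sum(aug[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (aug[i][n] - acc) / aug[i][i]
    return x


def exchangeable(n, rho):
    """Compound-symmetry correlation matrix."""
    return [[1.0 if j == k else rho for k in range(n)] for j in range(n)]


def ar1(n, rho):
    """AR(1) correlation matrix: corr(y_j, y_k) = rho ** |j - k|."""
    return [[rho ** abs(j - k) for k in range(n)] for j in range(n)]


def independence(n):
    """Identity matrix (independence working model)."""
    return exchangeable(n, 0.0)


def design_effect(rt, rw):
    """c = 1'Rw^{-1} Rt Rw^{-1} 1 / (1'Rw^{-1} 1)^2, with w = Rw^{-1} 1 (Rw symmetric)."""
    n = len(rw)
    w = solve(rw, [1.0] * n)
    num = sum(w[j] * rt[j][k] * w[k] for j in range(n) for k in range(n))
    return num / sum(w) ** 2


n, rho = 5, 0.4
print(design_effect(exchangeable(n, rho), exchangeable(n, rho)))  # true = working = exchangeable
print(design_effect(exchangeable(n, rho), independence(n)))       # exchangeable true, independence working
print((1 + (n - 1) * rho) / n)                                    # closed form (1 + (n - 1) rho) / n
print(design_effect(ar1(n, rho), ar1(n, rho)), (1 + rho) / (2 * rho + n * (1 - rho)))
```

The first three printed values coincide, and the last line shows that for AR(1) true and working matrices c equals the reciprocal of equation (11).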

For continuous outcomes, Wang and Carey (2003) investigated the asymptotic relative efficiency of GEE when the working and true correlation structures differ. If $R_t$ is exchangeable and $R_w$ is the independence model, the asymptotic relative efficiency is 1; that is, in this scenario the discrepancy between the working and true correlation matrices has no impact on the efficiency of the regression estimator. In conclusion, when $R_t = R_w$ are both exchangeable, or when $R_t$ is exchangeable and $R_w$ is the independence model, the algorithm in sections 3 and 4 applies to both binary and continuous outcomes.

6 Application

The optimal design $(m_{OD}, n_{OD})$ depends on c, s, and B as well as on the parameter space and the design space. The parameter space may come from a literature review, and the design space from what is feasible in a real-world setting. We assume that the costs c and s are fixed and known in advance. It is then straightforward to calculate the sample size and number of repeated measurements in section 3 for a known value of ρ, and to apply the algorithm in section 4 for unknown ρ, to obtain an optimal design for the longitudinal study.
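The idea of designing for unknown ρ over a parameter space can be sketched as a generic maximin grid search: for each candidate n, find the worst-case power over the ρ grid and keep the n whose worst case is largest. This is a hypothetical illustration, not a reproduction of the section 4 algorithm. It assumes the exchangeable design effect $(1+(n-1)\rho)/n$ from section 5, the cost model $m = \lfloor B/(c+sn) \rfloor$, and $\Delta_1$ as the variance-to-squared-effect ratio; the grids and numeric inputs are made up.

```python
import math
from statistics import NormalDist

_Z = NormalDist()


def power_exch(n, rho, budget, c, s, delta1, alpha=0.05):
    """Power under exchangeable correlation (assumed form).

    m = floor(budget / (c + s*n)) subjects and
    z_{1-beta} = sqrt(m * n / ((1 + (n - 1) * rho) * delta1)) - z_{1-alpha/2}.
    """
    m = math.floor(budget / (c + s * n))
    z_beta = math.sqrt(m * n / ((1 + (n - 1) * rho) * delta1)) - _Z.inv_cdf(1 - alpha / 2)
    return _Z.cdf(z_beta)


def maximin_design(n_values, rho_values, budget, c, s, delta1, alpha=0.05):
    """Generic maximin grid search: maximize over n the minimum power over rho."""
    best = None
    for n in n_values:
        worst = min(power_exch(n, r, budget, c, s, delta1, alpha) for r in rho_values)
        if best is None or worst > best[2]:
            best = (math.floor(budget / (c + s * n)), n, worst)
    return best  # (m, n, minimum power over the parameter space)


# Hypothetical design space n = 1..10 and parameter space rho in [0.2, 0.6].
rhos = [0.2 + 0.05 * i for i in range(9)]
print(maximin_design(range(1, 11), rhos, 15000, 100, 50, 16))
```

Because the power for a fixed n decreases as ρ increases, the worst case for every n in this sketch is at the upper end of the ρ grid, which is why the parameter space's upper bound drives the design.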

A key practical issue for researchers is setting the budget at the study design stage. The total budget determines the feasible total sample size and number of repeated measurements; if the budget is too limited, the power remains low even under the optimal design. The following budget estimate may be useful in practice. First, calculate the total sample size $m'$ assuming only one measurement per subject; then estimate the budget as $m' \times c$. The budget proposed for the longitudinal study should be larger than this estimate. Using the example in table 1 again, the total sample size is 124 under a type I error of 0.05 and power of 0.8 when only one measurement per subject is assumed. With a per-subject cost of c = 100 dollars, the estimated budget is 12,400 dollars. We therefore use 15,000 and 20,000 dollars as the given budgets. The minimum optimal power is more than 70% for c = 100 and s = 50 (table 1) and at least 80% for c = 100 and s = 10 (table 2).
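The budget heuristic can be sketched as follows. With one measurement per subject the design effect is 1, so $m' = \lceil (z_{1-\alpha/2}+z_{1-\beta})^2\,\Delta_1 \rceil$. Here the continuous-outcome term $\Delta_1 = \sigma^2/(\pi_0\pi_1\varphi_1^2)$ is read off the sample size formula in section 5 (an assumption about how equation (6) defines it), and the σ, $\varphi_1$, and $\pi_0$ values are hypothetical; they are not the inputs behind table 1. The function names are illustrative.

```python
import math
from statistics import NormalDist


def delta1_continuous(sigma, phi1, pi0):
    """sigma^2 / (pi0 * pi1 * phi1^2): variance-to-squared-effect term (assumed definition)."""
    pi1 = 1 - pi0
    return sigma ** 2 / (pi0 * pi1 * phi1 ** 2)


def single_measurement_m(delta1, alpha=0.05, power=0.8):
    """m' = ceil((z_{1-alpha/2} + z_{power})^2 * delta1), one measurement per subject."""
    z = NormalDist()
    return math.ceil((z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) ** 2 * delta1)


def budget_lower_bound(m_prime, c):
    """Estimated budget m' x c; the longitudinal budget should exceed this value."""
    return m_prime * c


print(budget_lower_bound(124, 100))  # example in the text: 12400
m_prime = single_measurement_m(delta1_continuous(sigma=1.0, phi1=0.5, pi0=0.5))
print(m_prime, budget_lower_bound(m_prime, 100))
```

With the hypothetical inputs σ = 1, $\varphi_1 = 0.5$, and $\pi_0 = 0.5$, the sketch gives $m' = 126$ and an estimated budget of 12,600.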

The other practical issue is the sample size for each group. When finding an appropriate total sample size m with the largest power, both equation (9) and the algorithm must guarantee that the group sizes $m\pi_0$ and $m\pi_1$ are integers in a two-group study.
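A minimal sketch of the integer constraint, assuming the cost model $m = B/(c+sn)$ from section 5 and an allocation fraction $\pi_0$ given exactly as a fraction: the budget-limited m is rounded down to the nearest multiple of the denominator of $\pi_0$, so that both $m\pi_0$ and $m\pi_1$ are integers. The function name and inputs are hypothetical, and the sketch does not reproduce equation (9).

```python
import math
from fractions import Fraction


def integer_group_sizes(budget, c, s, n, pi0):
    """Largest m <= budget / (c + s*n) such that m*pi0 and m*(1 - pi0) are integers.

    pi0 should be exact, e.g. Fraction(1, 2) or Fraction("1/3").
    """
    pi0 = Fraction(pi0)
    m = math.floor(Fraction(budget) / (c + s * n))
    m -= m % pi0.denominator  # m*pi0 is an integer iff pi0's (reduced) denominator divides m
    m0 = int(m * pi0)
    return m, m0, m - m0


print(integer_group_sizes(15000, 100, 50, 2, Fraction(1, 2)))  # (74, 37, 37)
print(integer_group_sizes(15000, 100, 50, 2, Fraction(1, 3)))  # (75, 25, 50)
```

Using `Fraction` avoids floating-point rounding when checking whether $m\pi_0$ is an integer, for example with $\pi_0 = 1/3$.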

7 Discussion

We proposed equations and an algorithm for sample size calculation (the total sample size and the number of repeated measurements) in longitudinal studies with a given budget. For a known value of ρ in the working correlation structure, the proposed formulas for $(m_{LOD}, n_{LOD})$ give the maximum power under the constraints. For unknown ρ, the design space and parameter space must be defined, and the proposed algorithm finds the optimal design $(m_{OD}, n_{OD})$, which provides the minimum of the maximized power over all possible values of ρ within the parameter space.

Van Breukelen and Candel (2015) proposed efficient designs for cluster randomized and multicentre trials with unknown intraclass correlation. However, their approach has no closed form for binary outcomes, and the variance for a binary outcome must be transformed to that of a continuous outcome. Our proposed method is based instead on the GEE sample size formula for continuous or binary responses (Liu and Liang, 1997). We derived a closed form for the optimal design and note that it has a similar form for both types of outcome. Van Breukelen and Candel (2015) also noted that the assumption of equal cluster sizes is not realistic. For a longitudinal study, however, measuring each subject the same number of times is feasible and acceptable.

This approach has several limitations. First, covariates were not taken into account. If covariates are included in the GEE model, the sample size formula is no longer as simple as equation (4), and its performance is sensitive to the distribution of the covariates (Liu and Liang, 1997). In general, however, it is reasonable to calculate the sample size without considering covariates at the design stage. Second, if ρ, or the upper bound of the parameter space, is large (e.g., > 0.7), the power of the optimal design may be low. In such a case, the budget needs to be increased to a reasonable level; section 6 illustrates how to set the budget at the design stage. Third, the proposed method is based only on an exchangeable working correlation matrix. The key assumption in sections 3 and 4 is that the true and working correlation matrices are the same and have compound symmetry (CS), which may not hold in real-world applications. When the working structure is misspecified, the GEE moment estimator of the correlation matrix and the “sandwich” estimator of the asymptotic covariance matrix of the GEE regression estimator may fail to be consistent (Crowder, 1995; Sutradhar and Das, 1999). However, as shown in section 5, the proposed method still applies in specific scenarios in which the true and working correlation matrices differ, namely an exchangeable true structure with an independence working structure. Finally, and as a direction for future research, only continuous and binary outcomes are considered; we will consider optimal designs for count outcomes using GEE models in future work.

In conclusion, the method proposed in this paper makes it easy for researchers to design longitudinal studies. We believe it is particularly useful for designing early-stage studies in which ρ is unknown and the budget is fixed.

Supplementary Material

Supplemental Material

Footnotes

Conflict of Interest

The authors have declared no conflict of interest.

References

  1. Amatya A, Bhaumik D, Gibbons R. Sample size determination for clustered count data. Statistics in Medicine. 2013;32:4162–4179. doi: 10.1002/sim.5819.
  2. Atkinson A, Donev A, Tobias R. Optimum experimental designs, with SAS. Oxford, UK: Oxford University Press; 2007.
  3. Austin P. A comparison of the statistical power of different methods for the analysis of cluster randomization trials with binary outcomes. Statistics in Medicine. 2007;26:3550–3565. doi: 10.1002/sim.2813.
  4. Berger M, Wong W. An introduction to optimal designs for social and biomedical research. Chichester, UK: Wiley; 2009.
  5. Campbell MK, Mollison J, Steen N, Grimshaw JM, Eccles M. Analysis of cluster randomized trials in primary care: a practical approach. Family practice. 2000;17(2):192–196. doi: 10.1093/fampra/17.2.192.
  6. Connelly L. Balancing the number and size of sites: an economic approach to the optimal design of cluster samples. Control Clin Trials. 2003;24:544–559. doi: 10.1016/s0197-2456(03)00093-x.
  7. Crowder M. On the use of a working correlation matrix in using generalised linear models for repeated measures. Biometrika. 1995;82(2):407–410.
  8. Dahmen G, Rochon J, Konig IR, Ziegler A. Sample size calculations for controlled clinical trials using generalized estimating equations (GEE). Methods of information in medicine. 2004;43(5):451–456.
  9. Dahmen G, Ziegler A. Generalized Estimating Equations in Controlled Clinical Trials: Hypotheses Testing. Biometrical Journal. 2004;46(2):214–232.
  10. Dahmen G, Ziegler A. S17.1: Sample size calculations in controlled clinical trials with clustered data - a SAS-Program. Biometrical Journal. 2004;46(S1):36–36.
  11. Diggle P, Liang K-Y, Zeger SL. Analysis of Longitudinal Data. New York: Oxford University Press; 1994.
  12. Donner A, Birkett N, Buck C. Randomization by cluster-sample size requirements and analysis. Am J Epidemiol. 1981;114:906–914. doi: 10.1093/oxfordjournals.aje.a113261.
  13. Eldridge S, Ashby D, Kerry S. Sample size for cluster randomized trials: effect of coefficient of variation of cluster size and analysis method. Int J Epidemiol. 2006;35:1292–1300. doi: 10.1093/ije/dyl129.
  14. Gravenstein S, Dahal R, Gozalo PL, Davidson HE, Han LF, Taljaard M, Mor V. A cluster randomized controlled trial comparing relative effectiveness of two licensed influenza vaccines in US nursing homes: Design and rationale. Clinical trials (London, England). 2016. doi: 10.1177/1740774515625976.
  15. Gulliford MC, van Staa TP, McDermott L, McCann G, Charlton J, Dregan A. Cluster randomized trials utilizing primary care electronic health records: methodological issues in design, conduct, and analysis (eCRT Study). Trials. 2014;15:220. doi: 10.1186/1745-6215-15-220.
  16. Headrick T, Zumbo B. On optimizing multi-level designs: power under budget constraints. Austr N Z J Stat. 2005;47(2):219–229.
  17. Jeffe DB, Perez M, Cole EF, Liu Y, Schootman M. The Effects of Surgery Type and Chemotherapy on Early-Stage Breast Cancer Patients’ Quality of Life Over 2-Year Follow-up. Annals of surgical oncology. 2016;23(3):735–743. doi: 10.1245/s10434-015-4926-0.
  18. Kalfon P, Mimoz O, Loundou A, Geantot MA, Revel N, Villard I, Amour J, Azoulay E, Garrouste-Orgeas M, Martin C, Sharshar T, Baumstarck K, Auquier P. Reduction of self-perceived discomforts in critically ill patients in French intensive care units: study protocol for a cluster-randomized controlled trial. Trials. 2016;17(1):87. doi: 10.1186/s13063-016-1211-x.
  19. Lee EW, Dubin N. Estimation and sample size considerations for clustered binary responses. Statistics in Medicine. 1994;13(12):1241–1252. doi: 10.1002/sim.4780131206.
  20. Liang KY, Zeger SL. Longitudinal Data Analysis Using Generalized Linear Models. Biometrika. 1986;73(1):13–22.
  21. Lin CC, Bruinooge SS, Kirkwood MK, Olsen C, Jemal A, Bajorin D, Giordano SH, Goldstein M, Guadagnolo BA, Kosty M, Hopkins S, Yu JB, Arnone A, Hanley A, Stevens S, Hershman DL. Association Between Geographic Access to Cancer Care, Insurance, and Receipt of Chemotherapy: Geographic Distribution of Oncologists and Travel Distance. Journal of clinical oncology : official journal of the American Society of Clinical Oncology. 2015;33(28):3177–3185. doi: 10.1200/JCO.2015.61.1558.
  22. Liu G, Liang KY. Sample Size Calculations for Studies with Correlated Observations. Biometrics. 1997;53(3):937–947.
  23. Liu X. Statistical power and optimum sample allocation ratio for treatment and control having unequal costs per unit of randomization. J Educ Behav Stat. 2003;28(3):231–248.
  24. Ouwens MJNM, Tan FES, Berger MPF. Maximin D-Optimal Designs for Longitudinal Mixed Effects Models. Biometrics. 2002;58(4):735–741. doi: 10.1111/j.0006-341x.2002.00735.x.
  25. Maus B, van Breukelen GJ, Goebel R, Berger MP. Robustness of optimal design of fMRI experiments with application of a genetic algorithm. NeuroImage. 2010;49(3):2433–2443. doi: 10.1016/j.neuroimage.2009.10.004.
  26. Mehring M, Haag M, Linde K, Wagenpfeil S, Schneider A. Effects of a Web-Based Intervention for Stress Reduction in Primary Care: A Cluster Randomized Controlled Trial. Journal of medical Internet research. 2016;18(2):e27. doi: 10.2196/jmir.4246.
  27. Moerbeek M, Van Breukelen G, Berger M. Design Issues for Experiments in Multilevel Populations. Journal of Educational and Behavioral Statistics. 2000;25(3):271–284.
  28. Moerbeek M, Van Breukelen G, Berger M. Optimal experimental design for multilevel logistic models. The Statistician. 2001;50(1):17–30.
  29. Moerbeek M, Van Breukelen G, Berger M. Optimal experimental designs for multilevel models with covariates. Commun Stat Theory Methods. 2001;30(12):2683–2697.
  30. Nagayama H, Tomori K, Ohno K, Takahashi K, Ogahara K, Sawada T, Uezu S, Nagatani R, Yamauchi K. Effectiveness and Cost-Effectiveness of Occupation-Based Occupational Therapy Using the Aid for Decision Making in Occupation Choice (ADOC) for Older Residents: Pilot Cluster Randomized Controlled Trial. PloS one. 2016;11(3):e0150374. doi: 10.1371/journal.pone.0150374.
  31. Neuhaus JM, Segal MR. Design effects for binary regression models fitted to dependent data. Stat Med. 1993;12(13):1259–1268. doi: 10.1002/sim.4780121307.
  32. Pan W. Sample size and power calculations with correlated binary data. Control Clin Trials. 2001;22(3):211–227. doi: 10.1016/s0197-2456(01)00131-3.
  33. Park YH, Jung KH, Im SA, Sohn JH, Ro J, Ahn JH, Kim SB, Nam BH, Oh do Y, Han SW, Lee S, Park IH, Lee KS, Kim JH, Kang SY, Lee MH, Park HS, Woo SY, Jung SH, Ahn JS, Im YH. Quality of life (QoL) in metastatic breast cancer patients with maintenance paclitaxel plus gemcitabine (PG) chemotherapy: results from phase III, multicenter, randomized trial of maintenance chemotherapy versus observation (KCSG-BR07-02). Breast cancer research and treatment. 2015;152(1):77–85. doi: 10.1007/s10549-015-3450-z.
  34. Raudenbush S. Statistical analysis and optimal design for cluster randomized trials. Psychol Methods. 1997;2:173–185.
  35. Raudenbush S, Liu X. Statistical power and optimal design for multisite trials. Psychol Methods. 2000;5(2):199–213. doi: 10.1037/1082-989x.5.2.199.
  36. Rosner B, Glynn R. Power and Sample size estimation for the clustered Wilcoxon test. Biometrics. 2011;67:646–653. doi: 10.1111/j.1541-0420.2010.01488.x.
  37. Rosner B, Glynn R, Lee M. Incorporation of clustering effects for the Wilcoxon rank sum test: a large-sample approach. Biometrics. 2003;59:1089–1098. doi: 10.1111/j.0006-341x.2003.00125.x.
  38. Sanda MG, Dunn RL, Michalski J, Sandler HM, Northouse L, Hembroff L, Lin X, Greenfield TK, Litwin MS, Saigal CS, Mahadevan A, Klein E, Kibel A, Pisters LL, Kuban D, Kaplan I, Wood D, Ciezki J, Shah N, Wei JT. Quality of life and satisfaction with outcome among prostate-cancer survivors. The New England journal of medicine. 2008;358(12):1250–1261. doi: 10.1056/NEJMoa074311.
  39. Schlesselman JJ. Case-Control Studies: Design, Conduct, Analysis. New York: Oxford University Press; 1982.
  40. Scott AJ, Holt D. The Effect of Two-Stage Sampling on Ordinary Least Squares Methods. Journal of the American Statistical Association. 1982;77(380):848–854.
  41. Self SG, Mauritsen RH. Power/Sample Size Calculations for Generalized Linear Models. Biometrics. 1988;44(1):79–86.
  42. Self SG, Mauritsen RH, Ohara J. Power Calculations for Likelihood Ratio Tests in Generalized Linear Models. Biometrics. 1992;48(1):31–39.
  43. Shuster JJ. Practical Handbook of Sample Size Guidelines for Clinical Trials. CRC Press; 1993.
  44. Sutradhar B, Das K. Miscellanea. On the efficiency of regression estimators in generalised linear models for longitudinal data. Biometrika. 1999;86(2):459–465.
  45. Toriola AT, Liu J, Ganz PA, Colditz GA, Yang L, Izadi S, Naughton MJ, Schwartz AL, Wolin KY. Effect of weight loss on bone health in overweight/obese postmenopausal breast cancer survivors. Breast cancer research and treatment. 2015;152(3):637–643. doi: 10.1007/s10549-015-3496-y.
  46. Van Breukelen G, Candel M. Efficient design of cluster randomized and multicentre trials with unknown intraclass correlation. Stat Methods Med Res. 2015;24(5):540–556. doi: 10.1177/0962280211421344.
  47. Vickers AJ. How many repeated measures in repeated measures designs? Statistical issues for comparative trials. BMC medical research methodology. 2003;3:22. doi: 10.1186/1471-2288-3-22.
  48. Wang YG, Carey V. Working correlation structure misspecification, estimation and covariate design: Implications for generalised estimating equations performance. Biometrika. 2003;90(1):29–41.
  49. Winkens B, Schouten HJ, van Breukelen GJ, Berger MP. Optimal designs for clinical trials with second-order polynomial treatment effects. Stat Methods Med Res. 2007;16(6):523–537. doi: 10.1177/0962280206071847.
  50. Yamagata K, Makino H, Iseki K, Ito S, Kimura K, Kusano E, Shibata T, Tomita K, Narita I, Nishino T, Fujigaki Y, Mitarai T, Watanabe T, Wada T, Nakamura T, Matsuo S. Effect of Behavior Modification on Outcome in Early- to Moderate-Stage Chronic Kidney Disease: A Cluster-Randomized Trial. PloS one. 2016;11(3):e0151422. doi: 10.1371/journal.pone.0151422.
