Sample size requirements to detect an intervention by time interaction in longitudinal cluster randomized clinical trials

Moonseong Heo; Andrew C Leon

doi:10.1002/sim.3527

Author manuscript; available in PMC: 2009 Oct 7.

Published in final edited form as: Stat Med. 2009 Mar 15;28(6):1017–1027. doi: 10.1002/sim.3527


PMCID: PMC2758777 NIHMSID: NIHMS132761 PMID: 19153969

Abstract

In a longitudinal cluster randomized clinical trial (cluster-RCT), interventions are randomly assigned to clusters such as clinics. Subjects within the same clinic receive the identical intervention, and each subject is assessed repeatedly over the course of the study. A mixed-effects linear regression model can be applied to the resulting three-level data to test the hypothesis that the intervention groups differ in the course of the outcome over time. Using a test statistic based on maximum likelihood estimates, we derived closed-form formulae for the statistical power to detect the intervention by time interaction and for the sample size requirements at each level. Importantly, the sample size does not depend on correlations among second level data units, and the power function depends on the numbers of second and third level data units only through their product. A simulation study confirmed that theoretical power estimates based on the derived formulae are nearly identical to empirical estimates.

Keywords: longitudinal cluster RCT, three level data, power, sample size, intervention by time interaction, effect size

1. Introduction

A longitudinal cluster randomized trial (cluster-RCT) has a three-level data structure: time-specific outcome assessments are nested within subjects, who, in turn, are nested within the randomized clusters. For instance, consider a study designed to test the effect of an experimental physician-training intervention on the reduction of the severity of patients' depressive symptoms over time. In this design, primary care clinics are randomly assigned to either the experimental or the control intervention, and each physician within an experimental clinic is trained to detect and treat depression. Each physician treats multiple subjects, whose severity of depressive symptoms is, in turn, measured repeatedly over time.

The primary hypothesis in such a study would concern the difference in the decline of symptom severity over time between subjects treated by physicians with and without the experimental intervention. With the three-level data from a longitudinal cluster-RCT, this hypothesis can be tested as an intervention by time interaction using a mixed-effects linear regression model [1-3].

Sample size determination and power calculations are essential in designing a cluster-RCT: the number of clusters required for a target statistical power must be estimated at the design stage. To this end, we build on sample size formulae for two-level data structures [4-6] to derive explicit closed-form power functions and sample size formulae for detecting a hypothesized interaction effect. The derivations are based on the distribution of a test statistic that uses the maximum likelihood estimate of the interaction effect. A simulation study then verifies the statistical power achieved with the estimated sample sizes.

2. Statistical Model

A three level mixed-effects linear model for outcome Y can be written as follows:

Y_{ijk} = β_{0} + ξ X_{ijk} + τ T_{ijk} + δ X_{ijk} T_{ijk} + u_{i} + u_{j (i)} + e_{ijk},

(1)

where i = 1, 2, …, 2N₃ is the index for the level three unit (e.g., clinic); j = 1, …, N₂ is the index for the level two unit (e.g., subject) nested within each i; and k = 1, 2, …, N₁ is the index for the level one unit (e.g., repeated outcome observations) within each j. The intervention assignment indicator X_ijk = 0 if the i-th level three unit is assigned to the control intervention and X_ijk = 1 if it is assigned to the experimental intervention; therefore X_ijk = X_i for all j and k. A balanced design is assumed, so that Σ_iX_i = N₃. The time variable is denoted by T_ijk. It is assumed that T_ijk = T_k for all i and j, and that time increases by 1, at equal intervals, from 0 (the baseline) to T_end = N₁ - 1 (the last time point). Therefore, the parameter ξ represents the intervention effect at baseline, and the parameter τ represents the slope of the time effect, that is, the decline in symptom severity over time. Finally, the intervention by time effect δ, which is of primary interest, represents the slope difference in outcome Y between the intervention groups, i.e., the additional decline in the experimental group. The overall fixed intercept is denoted by β₀.

It is assumed that the error term e_ijk is distributed as $N (0, σ_{e}^{2})$, the level two random intercept as $u_{j (i)} \sim N (0, σ_{2}^{2})$, and the level three random intercept as $u_{i} \sim N (0, σ_{3}^{2})$. These three random components are further assumed to be mutually independent, i.e., u_i ⊥ u_j(i) ⊥ e_ijk. In addition, the u_j(i) are independent conditional on u_i, the e_ijk are independent conditional on both u_i and u_j(i), and the u_i are unconditionally independent. In summary, β₀, ξ, τ and δ are fixed effect parameters, and the last three terms in model (1) are random effects.

As the parameter δ is of the primary interest, the null hypothesis to be tested is:

H_{0} : δ = 0

(2)

Under model (1), with its accompanying assumptions such as conditional independence among random components, it can be shown that the elements of the mean vector are

E (Y_{ijk}) = β_{0} + ξ X_{i} + τ T_{k} + δ X_{i} T_{k}

(3)

and that the elements of the covariance matrix are:

Cov (Y_{ijk}, Y_{i' j' k'}) = 1 (i = i' & j = j' & k = k') σ_{e}^{2} + 1 (i = i' & j = j') σ_{2}^{2} + 1 (i = i') σ_{3}^{2},

(4)

where 1(.) is an indicator function. This yields in particular,

σ^{2} \equiv Var (Y_{ijk}) = Cov (Y_{ijk}, Y_{ijk}) = σ_{e}^{2} + σ_{2}^{2} + σ_{3}^{2} .

Therefore, the correlation among level two data can be written for j ≠ j' as follows.

ρ_{2} = Corr (Y_{ijk}, Y_{ij' k'}) = \frac{σ_{3}^{2}}{σ_{e}^{2} + σ_{2}^{2} + σ_{3}^{2}} = \frac{σ_{3}^{2}}{σ^{2}} .

(5)

Similarly, the correlation among level one data (repeated measures on the same subject) can be written for k ≠ k' as follows.

ρ_{1} = Corr (Y_{ijk}, Y_{ijk'}) = \frac{σ_{2}^{2} + σ_{3}^{2}}{σ_{e}^{2} + σ_{2}^{2} + σ_{3}^{2}} = \frac{σ_{2}^{2} + σ_{3}^{2}}{σ^{2}} .

(6)

It can be easily seen that ρ₁ ≥ ρ₂ with equality when $σ_{2}^{2} = 0$ .
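Equations (5) and (6) can be inverted to obtain the variance components from σ², ρ₁ and ρ₂; the simulation in Section 5 uses exactly these relations (steps 3 and 4). The following Python sketch is illustrative only; the function name `variance_components` is our own.

```python
def variance_components(sigma_sq, rho1, rho2):
    """Invert equations (5) and (6) for the random-intercept model (1).

    Returns (sigma3_sq, sigma2_sq, sigmae_sq): the clinic-level, subject-level
    and residual variances. Requires 0 <= rho2 <= rho1 < 1, because
    rho1 >= rho2 always holds under model (1).
    """
    if not 0 <= rho2 <= rho1 < 1:
        raise ValueError("need 0 <= rho2 <= rho1 < 1")
    sigma3_sq = rho2 * sigma_sq           # from (5): rho2 = sigma3^2 / sigma^2
    sigma2_sq = (rho1 - rho2) * sigma_sq  # from (6): rho1 = (sigma2^2 + sigma3^2) / sigma^2
    sigmae_sq = (1 - rho1) * sigma_sq     # remainder: sigma^2 - sigma2^2 - sigma3^2
    return sigma3_sq, sigma2_sq, sigmae_sq


# Settings of the simulation study with rho1 = 0.5: sigma = 1, rho2 = 0.05.
print(variance_components(1.0, 0.5, 0.05))
```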

3. Maximum Likelihood Estimate and its Variance

The maximum likelihood estimate (MLE) $\hat{δ}$ of the interaction effect is indeed the slope difference between the two groups: that is,

\hat{δ} = {\hat{η}}_{1} - {\hat{η}}_{0},

(7)

where ${\hat{η}}_{g} (g = 0, 1)$ is the MLE of the slope for the outcome Y in the g-th group, in which X_i = g. Specifically, for i in the g-th group,

\begin{aligned} {\hat{η}}_{g} &= \frac{\sum_{i = 1}^{N_{3}} \sum_{j = 1}^{N_{2}} \sum_{k = 1}^{N_{1}} (T_{k} - \bar{T}) (Y_{ijk} - \bar{Y}_{g})}{\sum_{i = 1}^{N_{3}} \sum_{j = 1}^{N_{2}} \sum_{k = 1}^{N_{1}} (T_{k} - \bar{T})^{2}} \\ &= \frac{\sum_{i = 1}^{N_{3}} \sum_{j = 1}^{N_{2}} \sum_{k = 1}^{N_{1}} (T_{k} - \bar{T}) (Y_{ijk} - \bar{Y}_{g})}{N_{3} N_{2} N_{1} {Var}_{p} (T)}, \end{aligned}

(8)

where: 1) $\bar{Y}_{g}$ (g = 0, 1) is the overall mean of the outcome Y in the g-th group; 2) $\bar{T} = Σ_{k = 1}^{N_{1}} T_{k} / N_{1}$ is the “mean” time point; and 3) ${Var}_{p} (T) = Σ_{k = 1}^{N_{1}} (T_{k} - \bar{T})^{2} / N_{1}$ is the “population variance” of the time variable T. In fact, the slope estimate (8), though not its variance, is the same as that of an ordinary linear regression with u_i = u_j(i) = 0 in model (1). Heuristically, this is because the weight T_k - T̄ that each data point Y_ijk receives in (8) depends only on its time point, and these weights sum to zero within every subject; random intercepts at either level, being constant over time, therefore drop out of the slope estimate. Indeed, the ordinary least squares estimate (8) is the MLE under the perfectly balanced design considered in this paper [2].
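Equation (8) is the ordinary least squares slope computed from one group's pooled data. A minimal Python sketch (the names `var_p_time` and `group_slope` are ours, and data are assumed to be stored as nested lists indexed [i][j][k] with T_k = k, i.e., times 0, 1, …, N₁ - 1) makes the heuristic argument concrete: clinic- and subject-level intercepts added to an exact linear trend leave the estimated slope unchanged.

```python
import random


def var_p_time(n1):
    """Population variance Var_p(T) of the equally spaced times T = 0, 1, ..., n1 - 1."""
    t_bar = (n1 - 1) / 2
    return sum((t - t_bar) ** 2 for t in range(n1)) / n1


def group_slope(y):
    """Slope estimate (8) for one intervention group.

    y[i][j][k] is the outcome of subject j in clinic i at time T_k = k.
    """
    n3, n2, n1 = len(y), len(y[0]), len(y[0][0])
    t_bar = (n1 - 1) / 2
    y_bar = sum(v for clinic in y for subj in clinic for v in subj) / (n3 * n2 * n1)
    numerator = sum((k - t_bar) * (v - y_bar)
                    for clinic in y for subj in clinic for k, v in enumerate(subj))
    return numerator / (n3 * n2 * n1 * var_p_time(n1))


# An exact linear trend (slope -0.5) plus clinic- and subject-level random
# intercepts and no residual error: the intercepts do not change the slope.
rng = random.Random(2009)
y = []
for i in range(4):                       # 4 clinics
    u_i = rng.gauss(0, 1)                # clinic random intercept
    clinic = []
    for j in range(3):                   # 3 subjects per clinic
        u_ji = rng.gauss(0, 1)           # subject random intercept
        clinic.append([u_i + u_ji - 0.5 * k for k in range(5)])
    y.append(clinic)
print(round(group_slope(y), 10))         # -0.5
```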

Based on equations (3) and (8), it can easily be shown that the MLE $\hat{δ}$ is unbiased, i.e., $E (\hat{δ}) = E ({\hat{η}}_{1} - {\hat{η}}_{0}) = (τ + δ) - τ = δ$. The variance of the slope MLE ${\hat{η}}_{g}$ can be obtained from equation (4) as follows (see the Appendix for a proof):

Var ({\hat{η}}_{g}) = \frac{σ_{e}^{2}}{N_{3} N_{2} N_{1} {Var}_{p} (T)} = \frac{(1 - ρ_{1}) σ^{2}}{N_{3} N_{2} N_{1} {Var}_{p} (T)} .

(9)

Therefore, the variance of $\hat{δ}$ is

Var (\hat{δ}) = Var ({\hat{η}}_{1} - {\hat{η}}_{0}) = Var ({\hat{η}}_{1}) + Var ({\hat{η}}_{0}) = \frac{2 (1 - ρ_{1}) σ^{2}}{N_{3} N_{2} N_{1} {Var}_{p} (T)} .

(10)

Note that ${\hat{η}}_{1}$ and ${\hat{η}}_{0}$ are independent of each other, since they are computed from disjoint sets of level three units, which are independent under model (1). It is notable, however, that the variance of $\hat{δ}$ depends only on the residual variance $σ_{e}^{2}$, and not on $σ_{3}^{2}$, $σ_{2}^{2}$, or ρ₂. Therefore, for a given total variance σ², it decreases as $σ_{e}^{2}$ decreases or, equivalently, as ρ₁, the correlation among level one data, increases.

4. Power and sample size

The following test statistic D, based on (7) and (10), can be used to test the null hypothesis (2):

D = \frac{\hat{δ}}{se (\hat{δ})} = \frac{\hat{δ}}{\sqrt{Var ({\hat{η}}_{1}) + Var ({\hat{η}}_{0})}} = \frac{\sqrt{N_{3} N_{2} N_{1} {Var}_{p} (T)} ({\hat{η}}_{1} - {\hat{η}}_{0})}{σ \sqrt{2 (1 - ρ_{1})}} .

(11)

If the three variance components $σ_{2}^{2}$, $σ_{3}^{2}$ and $σ_{e}^{2}$ are known, then the test statistic D is normally distributed with mean $δ ∕ se (\hat{δ})$ and variance 1. When these variance components are unknown and replaced by their MLEs, D becomes a Wald test statistic whose asymptotic distribution is normal by large-sample theory [7]. Thus, under the null hypothesis (2), $D \sim N(0, 1)$, and under an alternative hypothesis $δ \neq 0$, $D \sim N (δ ∕ se (\hat{δ}), 1)$.

The power of the test statistic D, denoted by φ, can therefore be written as follows:

φ = 1 - β = Φ [\frac{δ}{σ} \sqrt{\frac{N_{3} N_{2} N_{1} {Var}_{p} (T)}{2 (1 - ρ_{1})}} - Φ^{- 1} (1 - α ∕ 2)],

(12)

where α is the two-sided significance level; β is the probability of a type II error; Φ is the cumulative distribution function (CDF) of the standard normal distribution; and Φ^-1 is its inverse. From now on, it is understood that: 1) δ = |δ| > 0; and 2) under the alternative hypothesis, the probability that D falls below the lower critical value Φ^-1(α/2) is negligible and is therefore taken to be 0. When the slope difference is expressed in units of the pooled within-group standard deviation (SD), i.e., in terms of the standardized effect size

Δ_{δ} = δ ∕ σ,

the power function can be expressed as follows:

φ = Φ [Δ_{δ} \sqrt{\frac{N_{3} N_{2} N_{1} {Var}_{p} (T)}{2 (1 - ρ_{1})}} - Φ^{- 1} (1 - α ∕ 2)] .

(13)
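Equations (12) and (13) translate directly into code. Below is a minimal Python sketch that uses the standard library's `statistics.NormalDist` for Φ and Φ^-1 (the function names are ours). The usage lines reproduce the theoretical power of 0.801 reported in Table 1 for three (N₂, N₃) pairs that share the same product.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # cdf() is Phi and inv_cdf() is Phi^{-1}


def var_p_time(n1):
    """Population variance Var_p(T) of T = 0, 1, ..., n1 - 1."""
    t_bar = (n1 - 1) / 2
    return sum((t - t_bar) ** 2 for t in range(n1)) / n1


def power(effect, n3, n2, n1, rho1, alpha=0.05):
    """Power (13) of the two-sided test based on D.

    effect is the standardized slope difference Delta_delta = delta / sigma.
    As in the text, the rejection probability in the opposite tail is ignored.
    """
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    mean_d = effect * math.sqrt(n3 * n2 * n1 * var_p_time(n1) / (2 * (1 - rho1)))
    return STD_NORMAL.cdf(mean_d - z_alpha)


# N1 = 3, rho1 = 0.4, Delta_delta * T_end = 0.3 (so Delta_delta = 0.15):
# the power depends on N2 and N3 only through their product (210 here).
for n2, n3 in [(5, 42), (10, 21), (30, 7)]:
    print(n2, n3, round(power(0.15, n3, n2, 3, 0.4), 3))  # each line ends in 0.801
```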

It follows that when the hypothesis testing is based on D with a two-sided significance level of α, the third level unit sample size N₃ per group for a desired statistical power φ = 1 - β can be calculated from equation (12) as:

N_{3} = \frac{2 {(Φ^{- 1} (1 - α ∕ 2) + Φ^{- 1} (1 - β))}^{2} (1 - ρ_{1}) σ^{2}}{N_{2} N_{1} {Var}_{p} (T) δ^{2}},

(14)

or equivalently in terms of the standardized effect size Δ_δ from equation (13)

N_{3} = \frac{2 {(Φ^{- 1} (1 - α ∕ 2) + Φ^{- 1} (1 - β))}^{2} (1 - ρ_{1})}{N_{2} N_{1} {Var}_{p} (T) Δ_{δ}^{2}} .

(15)

More precisely, N₃ is the smallest integer greater than the right-hand side of equation (14) or (15). The level three sample size is a decreasing function of both ρ₁ and Var_p(T). Stated differently, longer follow-up with more consistent (as opposed to erratic) observations within subjects over time increases the power (13) and, for the same target power, reduces the required N₃ or N₂.
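Equation (15), together with the rounding-up rule, can be sketched in Python as follows (the names are ours). `math.ceil` returns the smallest integer not less than the right-hand side, which matches the rule above except when the right-hand side is exactly an integer. The usage lines reproduce N₃ values from Table 1, including the design used in the application of Section 7.

```python
import math
from statistics import NormalDist


def var_p_time(n1):
    """Population variance Var_p(T) of T = 0, 1, ..., n1 - 1."""
    t_bar = (n1 - 1) / 2
    return sum((t - t_bar) ** 2 for t in range(n1)) / n1


def n3_per_group(effect, n2, n1, rho1, alpha=0.05, power=0.8):
    """Number of level three units per group from equation (15), rounded up.

    effect is the standardized slope difference Delta_delta = delta / sigma.
    """
    z = NormalDist()
    z_sum = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    n3_exact = 2 * z_sum ** 2 * (1 - rho1) / (n2 * n1 * var_p_time(n1) * effect ** 2)
    return math.ceil(n3_exact)


# N2 = 5, N1 = 6, Delta_delta = 0.3 / 5 = 0.06: N3 for rho1 = 0.4, 0.5, 0.6.
print([n3_per_group(0.06, 5, 6, r) for r in (0.4, 0.5, 0.6)])  # [30, 25, 20]
# N2 = 20, N1 = 6, rho1 = 0.5, Delta_delta = 0.4 / 5 = 0.08.
print(n3_per_group(0.08, 20, 6, 0.5))  # 4
```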

The sample sizes N₂ and N₃ have a reciprocal relationship: the power depends on them only through their product N₂N₃, and neither enters any other term of the power function. Therefore, the level two sample size N₂ can immediately be determined from equation (15) as follows:

N_{2} = \frac{2 {(Φ^{- 1} (1 - α ∕ 2) + Φ^{- 1} (1 - β))}^{2} (1 - ρ_{1})}{N_{3} N_{1} {Var}_{p} (T) Δ_{δ}^{2}} .

(16)

The sample size N₁ for the level one data should, however, be determined in an iterative manner because Var_p(T) is a function of N₁. Specifically, an iterative solution for N₁ must satisfy the following equation:

N_{1} = \frac{2 {(Φ^{- 1} (1 - α ∕ 2) + Φ^{- 1} (1 - β))}^{2} (1 - ρ_{1})}{N_{3} N_{2} {Var}_{p} (T) Δ_{δ}^{2}} .

(17)
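One simple way to solve equation (17) is an upward search over integer N₁. The Python sketch below (the names are ours) assumes that the standardized slope difference Δ_δ per unit time is held fixed as N₁ changes, unlike the simulation study in Section 5, where Δ_δ was derived from Δ_δT_end. It uses the closed form N₁Var_p(T) = N₁(N₁² - 1)/12 for equally spaced times 0, …, N₁ - 1; because this quantity increases with N₁, the first N₁ that meets the target is the smallest solution.

```python
from statistics import NormalDist


def n1_required(effect, n3, n2, rho1, alpha=0.05, power=0.8, n1_max=1000):
    """Smallest N1 satisfying equation (17) with Delta_delta held fixed per unit time.

    For equally spaced times 0, ..., N1 - 1, N1 * Var_p(T) = N1 (N1^2 - 1) / 12,
    which increases with N1, so an upward search finds the smallest solution.
    """
    z = NormalDist()
    z_sum = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    target = 2 * z_sum ** 2 * (1 - rho1) / (n3 * n2 * effect ** 2)  # required N1 * Var_p(T)
    for n1 in range(2, n1_max + 1):
        if n1 * (n1 ** 2 - 1) / 12 >= target:
            return n1
    raise ValueError("no N1 up to n1_max attains the requested power")


# N3 = 30 clinics per group, N2 = 5, rho1 = 0.4, Delta_delta = 0.06 per time unit.
print(n1_required(0.06, 30, 5, 0.4))  # 6
```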

5. Simulation study specification

We conducted simulation studies to verify the sample size N₃ (15) and the power function (13) using SAS PROC MIXED, which can fit the three-level mixed-effects linear model (1). For a two-sided significance level α = 0.05 and a desired power φ = 0.8, the following combinations of simulation parameters were prespecified: Δ_δT_end = Δ_δ(N₁ - 1) = 0.3, 0.4, 0.5; N₂ = 5, 10, 20, 30; N₁ = 3, 6, 12; ρ₁ = 0.4, 0.5, 0.6, while σ = 1, ρ₂ = 0.05, β₀ = ξ = 0, and τ = -1 in model (1) were held fixed without loss of generality. This 3×4×3×3 factorial design yielded a total of 108 parameter combinations. In particular, the interaction effect size, i.e., the between-group slope difference Δ_δ, was specified so that it yields a standardized between-group mean difference Δ_δT_end at the end of the trial, i.e., at T = T_end = N₁ - 1.

To generate simulated data, we first estimated N₃ from equation (15) for each combination (step 2 below). Specifically, each combination was simulated with the following steps:

1. Calculate the variance of time, Var_p(T), for the given N₁;
2. Calculate N₃ (15) with the computed Var_p(T) and the given α, φ, N₁, N₂, and Δ_δ;
3. Calculate the variance components $σ_{2}^{2}$ and $σ_{3}^{2}$ from equations (5) and (6) for the given ρ₁, ρ₂ and σ²; specifically, $σ_{2}^{2} = (ρ_{1} - ρ_{2}) σ^{2}$ and $σ_{3}^{2} = ρ_{2} σ^{2}$;
4. Calculate $σ_{e}^{2} = σ^{2} - (σ_{3}^{2} + σ_{2}^{2})$;
5. Calculate δ = σΔ_δ for the given σ² and Δ_δ;
6. Generate the random intervention assignment indicator X_i = 0 or 1 for each i = 1, 2, …, 2N₃ in a balanced manner so that Σ_iX_i = N₃;
7. Generate u_i from $N (0, σ_{3}^{2})$ independently for each i = 1, 2, …, 2N₃ (unconditional independence assumption);
8. For each u_i, generate u_j(i) from $N (0, σ_{2}^{2})$ independently for j = 1, 2, …, N₂ (conditional independence assumption);
9. For each combination of u_i and u_j(i), generate e_ijk from N(0, σ_e²) independently for k = 1, 2, …, N₁ (conditional independence assumption);
10. Generate the outcome data Y_ijk = β₀ + ξX_i + τT_k + δX_iT_k + u_i + u_j(i) + e_ijk according to model (1);
11. Fit the three-level linear mixed-effects model (1) to the data set;
12. Retain the p-value, denoted by p_s(δ) for the s-th simulated data set, from testing the null hypothesis (2);
13. Repeat steps 6-12 1000 times (i.e., s = 1, 2, …, 1000) for each combination of the simulation parameters.
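The recipe above can be mimicked without a mixed-model fit if the variance components are treated as known. The following Python sketch is a simplified stand-in for steps 3 to 12, not the SAS PROC MIXED analysis used in the study: it generates data from model (1) with σ = 1 and β₀ = ξ = 0, estimates δ by the closed-form slope difference (7) and (8), and rejects H₀ when the known-variance statistic D of (11) exceeds the two-sided critical value, which for this z-test is equivalent to p_s(δ) < α. Clusters are assigned to groups deterministically rather than randomly as in step 6, which does not matter because the clusters are exchangeable. The function name and defaults are ours.

```python
import math
import random
from statistics import NormalDist


def simulate_power(effect, n3, n2, n1, rho1, rho2, n_sims=1000, alpha=0.05,
                   tau=-1.0, seed=0):
    """Monte Carlo power of the known-variance z-test D in (11).

    Data follow model (1) with sigma = 1 and beta0 = xi = 0 (as in Section 5);
    effect is Delta_delta, so delta = effect when sigma = 1.
    """
    rng = random.Random(seed)
    sd3 = math.sqrt(rho2)              # SD of clinic intercepts u_i (step 3)
    sd2 = math.sqrt(rho1 - rho2)       # SD of subject intercepts u_j(i) (step 3)
    sde = math.sqrt(1 - rho1)          # residual SD (step 4)
    t_bar = (n1 - 1) / 2
    ss_t = n3 * n2 * sum((k - t_bar) ** 2 for k in range(n1))  # N3 N2 N1 Var_p(T)
    se_delta = math.sqrt(2 * sde ** 2 / ss_t)                  # equation (10)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(n_sims):
        slopes = []
        for x in (0, 1):               # control clinics, then experimental clinics
            numerator = 0.0
            for _ in range(n3):
                u_i = rng.gauss(0, sd3)                     # step 7
                for _ in range(n2):
                    u_ji = rng.gauss(0, sd2)                # step 8
                    for k in range(n1):
                        y = (tau * k + effect * x * k + u_i + u_ji
                             + rng.gauss(0, sde))           # steps 9 and 10
                        numerator += (k - t_bar) * y
            # Equation (8); Ybar_g drops out because the weights k - t_bar sum to 0.
            slopes.append(numerator / ss_t)
        d = (slopes[1] - slopes[0]) / se_delta              # statistic D, equation (11)
        rejections += abs(d) > z_crit                       # same as p-value < alpha
    return rejections / n_sims
```

For example, `simulate_power(0.08, 4, 20, 6, 0.5, 0.05)` corresponds to N₃ = 4, N₂ = 20, N₁ = 6, ρ₁ = 0.5 and Δ_δT_end = 0.4; its result should lie within Monte Carlo error (roughly ±0.025) of the theoretical power 0.849 in Table 1, although no specific value is claimed here.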

The empirical power $\tilde{φ}$ obtained from the 1000 simulations is

\tilde{φ} = \frac{1}{1000} \sum_{s = 1}^{1000} 1 \{p_{s} (δ) < α\} .

(18)

This empirical power is compared with the theoretical power φ computed from (13) using the N₃ obtained in step 2 above, rather than with the prespecified power of 0.8. Note that this theoretical power is never less than the prespecified 0.8, because N₃ is the smallest integer greater than the right-hand side of equation (15).

6. Simulation study results

Table 1 summarizes the specified (N₂ and N₁) and estimated (N₃) sample sizes, the empirical power $\tilde{φ}$ (18), and the theoretical power φ (13) based on the estimated N₃. Although the empirical power is slightly lower on average, as reflected in the means in the last row of Table 1, it is virtually identical to the theoretical power. For instance, among the 108 combinations in Table 1, the maximum absolute difference $∣ φ - \tilde{φ} ∣$ was 0.027, which is tolerable given that the half-width of the 95% confidence interval for a simulated power near 0.8 is $1.96 \sqrt{0.8 \times 0.2 ∕ 1000} \approx 0.025$. Thus, the derived formulae for sample size and power are very accurate under the conditions examined. In each case the theoretical power is at least 0.8, since the power calculations were based on integer values of N₃.

Table 1.

Sample size N₃, theoretical power φ, and empirical power $\tilde{φ}$ for testing the intervention group by time interaction effect in a three-level mixed-effects linear regression analysis, based on 1000 simulations.

			Δ_δT_end = 0.3			Δ_δT_end = 0.4			Δ_δT_end = 0.5
N₂	N₁	ρ₁	N₃	φ	$\tilde{φ}$	N₃	φ	$\tilde{φ}$	N₃	φ	$\tilde{φ}$
5	3	0.4	42	0.801	0.798	24	0.807	0.787	16	0.823	0.838
		0.5	35	0.801	0.804	20	0.807	0.810	13	0.813	0.806
		0.6	28	0.801	0.818	16	0.807	0.810	11	0.834	0.839
	6	0.4	30	0.801	0.803	17	0.804	0.807	11	0.808	0.836
		0.5	25	0.801	0.775	15	0.826	0.835	9	0.801	0.798
		0.6	20	0.801	0.810	12	0.826	0.802	8	0.841	0.847
	12	0.4	18	0.806	0.831	10	0.801	0.800	7	0.835	0.832
		0.5	15	0.806	0.821	9	0.831	0.829	6	0.845	0.837
		0.6	12	0.806	0.794	7	0.820	0.817	5	0.860	0.855
10	3	0.4	21	0.801	0.801	12	0.807	0.788	8	0.823	0.829
		0.5	18	0.812	0.811	10	0.807	0.804	7	0.841	0.852
		0.6	14	0.801	0.791	8	0.807	0.802	6	0.865	0.865
	6	0.4	15	0.801	0.809	9	0.826	0.845	6	0.841	0.843
		0.5	13	0.816	0.809	8	0.849	0.855	5	0.841	0.846
		0.6	10	0.801	0.795	6	0.826	0.822	4	0.841	0.829
	12	0.4	9	0.806	0.788	5	0.801	0.814	4	0.881	0.875
		0.5	8	0.831	0.822	5	0.868	0.869	3	0.845	0.834
		0.6	6	0.806	0.784	4	0.868	0.878	3	0.914	0.912
20	3	0.4	11	0.819	0.825	6	0.807	0.793	4	0.823	0.826
		0.5	9	0.812	0.809	5	0.807	0.793	4	0.885	0.878
		0.6	7	0.801	0.784	4	0.807	0.805	3	0.865	0.863
	6	0.4	8	0.826	0.816	5	0.863	0.862	3	0.841	0.853
		0.5	7	0.844	0.822	4	0.849	0.826	3	0.900	0.903
		0.6	5	0.801	0.800	3	0.826	0.838	2	0.841	0.839
	12	0.4	5	0.845	0.850	3	0.868	0.857	2	0.881	0.866
		0.5	4	0.831	0.842	3	0.920	0.919	2	0.930	0.927
		0.6	3	0.806	0.800	2	0.868	0.871	2	0.970	0.964
30	3	0.4	7	0.801	0.823	4	0.807	0.800	3	0.865	0.863
		0.5	6	0.812	0.806	4	0.873	0.867	3	0.918	0.901
		0.6	5	0.828	0.824	3	0.851	0.846	2	0.865	0.851
	6	0.4	5	0.801	0.801	3	0.826	0.829	2	0.841	0.839
		0.5	5	0.867	0.867	3	0.888	0.887	2	0.900	0.900
		0.6	4	0.867	0.868	2	0.826	0.828	2	0.952	0.956
	12	0.4	3	0.806	0.819	2	0.868	0.857	2	0.970	0.964
		0.5	3	0.872	0.871	2	0.920	0.929	1	0.845	0.847
		0.6	2	0.806	0.801	2	0.965	0.964	1	0.914	0.916

Mean				0.815	0.814		0.840	0.837		0.866	0.865


N₁ = the number of level one units (repeated measures) per subject; N₂ = the number of level two units (subjects) per clinic; N₃ = the number of level three units (clinics) per group, i.e., the sample size obtained from equation (15); T_end = N₁ - 1; ρ₁ = correlation among level one data (6); φ = theoretical power based on formula (13); $\tilde{φ}$ = empirical power based on equation (18); Δ_δ = standardized effect size of the slope difference that yields an intervention effect Δ_δT_end at the end of the study.

As expected, the sample size N₃ required for a given power decreases with increasing correlation ρ₁ when the other design parameters are held fixed. For example, when N₂ = 5, N₁ = 6, and Δ_δT_end = 0.3 (i.e., Δ_δ = 0.3/5 = 0.06), the level three sample sizes (N₃) required for 80% power were 30, 25, and 20 for ρ₁ = 0.4, 0.5, and 0.6, respectively. Furthermore, the theoretical power is identical for all combinations of N₂ and N₃ with the same product, other design parameters being held constant. For instance, as shown in Table 1, each of the following pairs of N₂ and N₃, with a product of 210, yielded the same power of 0.801 when N₁ = 3, ρ₁ = 0.4, and Δ_δT_end = 0.3 (i.e., Δ_δ = 0.3/2 = 0.15): N₂ = 5 and N₃ = 42; N₂ = 10 and N₃ = 21; N₂ = 30 and N₃ = 7.

7. Application

The results in Table 1 can be applied to the design of a longitudinal cluster-RCT. Consider, for instance, a longitudinal cluster-RCT that compares an innovative intervention at the primary care level with usual primary care practice on the depression outcomes of subjects, as in the PROSPECT [8,9] and RESPECT [10] trials. To test whether the course of depressive symptoms over time depends on the care that subjects receive, suppose it is anticipated that each primary care clinic can accommodate 20 subjects (N₂) for research purposes and that each subject would be assessed 6 times (N₁). Table 1 can then be used to estimate the number of primary care clinics, i.e., level three units (N₃), needed for 80% power. If ρ₁ = 0.5, then four clinics (N₃) for each of the two intervention groups, or a total of 160 subjects, would be needed to detect an effect size Δ_δT_end = 5Δ_δ = 0.4 (i.e., Δ_δ = 0.4/5 = 0.08) with at least 80% statistical power (Table 1). Sample size requirements for other design parameters can likewise be read from Table 1; for design specifications not presented in Table 1, the sample size formula (15) can be applied.

8. Discussion

The derived power function (13) and level three sample size formula (15) for detecting an intervention by time interaction were shown to be accurate relative to empirical estimates from a simulation study. Therefore, the sample size formulae (16, 17) for the numbers of level two and level one units are also accurate, because they are rearrangements of equation (15). Importantly, the sample size does not depend on correlations among second level data units, and the power function depends on the numbers of second and third level units only through their product. Furthermore, when either N₃ or N₂ equals one, the three-level data structure reduces to a two-level structure with N₂ or N₃, respectively, as the number of second level units. In either case, the variance $σ_{3}^{2}$ of the level three random intercept can be taken to be 0, and thus ρ₂ = 0. The sample size formula (14) then reduces to equation (2.4.1) on page 29 of Diggle et al [6], as it should. Diggle et al's formula likewise shows that the power function is increasing in ρ₁.

Collectively, therefore, as far as testing the intervention by time interaction is concerned, the design can be adapted flexibly to feasibility without changing the statistical power. For example, if N₃N₂ = 200 subjects per group are needed for 80% power, then N₃ and N₂ can be chosen according to the availability of level two and level three units, regardless of the anticipated ρ₂. If recruiting 10 subjects (N₂) per clinic is feasible, the investigators could enlist 20 clinics (N₃) per intervention group. If, on the other hand, only 5 clinics (N₃) are available per intervention group, then 40 subjects (N₂) per clinic would be required. In the extreme case where only one clinic per intervention group (N₃ = 1) is available, one could recruit 200 subjects (N₂) from that single clinic.
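Because only the product N₃N₂ matters, translating a required number of subjects per group into candidate designs is simple arithmetic. A small Python sketch (the helper name `cluster_designs` is ours): when the target is not divisible by N₂, N₃ is rounded up, so the achieved product, and hence the power, is slightly larger, as with N₂ = 30 and N₃ = 7 (product 210).

```python
import math


def cluster_designs(subjects_per_group, n2_options):
    """For each feasible N2 (subjects per clinic), the smallest N3 with N3 * N2 >= target."""
    return [(math.ceil(subjects_per_group / n2), n2) for n2 in n2_options]


# 200 subjects per group, as in the example in the text; pairs are (N3, N2).
print(cluster_designs(200, [10, 30, 40, 200]))  # [(20, 10), (7, 30), (5, 40), (1, 200)]
```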

Although the empirical power was obtained with estimated (i.e., unknown) variance components, it was virtually identical to the theoretical power derived with known variance components in the test statistic D (11). Therefore, deriving a power function for unknown variances may not be necessary even for small N₃, although it might be possible by replacing the standard normal CDF Φ and its inverse Φ^-1 in equation (14) or (15) with the CDFs of central and non-central t distributions [11].

It should be noted that the sample size formula is for detecting a slope difference per se, not an expected between-group difference at T_end, the end of the study. In other words, the sample size formula (15) derived herein is not appropriate for detecting an intervention effect at a prespecified time point such as the end of a trial. This is because the variance of the estimated end-of-study effect is not equal to $T_{end}^{2} Var ({\hat{η}}_{1} - {\hat{η}}_{0})$, even when its expected value equals δT_end. Here, the end-of-study intervention effect Δ_δT_end served only as the basis for specifying the hypothesized slope difference Δ_δ.

Other sample size formulae are available. For instance, Liu et al [12] derived sample size formulae for the slope difference using generalized estimating equations. Murray et al [13] presented detectable effect sizes based on expected mean square errors, using random coefficients analysis for the nested cohort design. Roy et al [14] derived general sample size determinations using a mixed-effects linear model that take into account potential attrition rates and more general correlation structures. Heo and Leon [15] derived an algorithm for the sample size required to detect a main effect of group using a linear mixed-effects model for three-level data. Although comparing sample sizes across modeling approaches would provide further insight into designing a cluster-RCT, the sample size equations presented above (15, 16, 17) are more readily implemented.

The sample size determinations derived here have limitations. First, the formulae were derived assuming fixed numbers of units at all levels, although the number of subjects per clinic will likely vary (i.e., j = 1, 2, …, n_i, depending on the i-th clinic). Furthermore, the number of assessments per subject will also vary (i.e., k = 1, 2, …, n_ij, depending on both clinic and subject), because attrition during a trial is the norm rather than the exception [16,17]. Nevertheless, our derivation based on non-varying cluster sizes provides a useful approximation and can serve as a basis for deriving a sample size algorithm for varying cluster sizes. For instance, if the variation in cluster sizes is completely at random in the sense of the missing data framework [18], replacing the varying cluster sizes with the average cluster size has been shown to be effective for sample size and power calculations with two-level binary outcome data [19]. Second, for pragmatic reasons, the covariance structure (4) considered here was based on the conditional independence assumption. Therefore, the robustness of the derived formulae under alternative covariance structures, such as autocorrelation or an unstructured covariance matrix, is unknown.

In conclusion, the derived formulae for sample sizes (15, 16, 17) and power functions (12, 13) can be useful in designing community-based longitudinal cluster-randomized clinical trials that compare slopes of outcomes over time between two intervention groups in a three-level data structure.

Acknowledgement

We are grateful to Donald Hedeker Ph.D., two anonymous referees and an Associate Editor for their valuable suggestions. This study was supported in part by NIMH grants, P30MH068638 and R01MH060447.

Appendix

Proof of equation (9), $Var ({\hat{η}}_{g}) = \frac{σ_{e}^{2}}{N_{3} N_{2} N_{1} {Var}_{p} (T)} = \frac{(1 - ρ_{1}) σ^{2}}{N_{3} N_{2} N_{1} {Var}_{p} (T)}$. Let $W_{k} = T_{k} - \bar{T}$; then $Σ_{k = 1}^{N_{1}} W_{k}^{2} = N_{1} {Var}_{p} (T)$, $Σ_{k = 1}^{N_{1}} W_{k} = 0$, $Σ_{k' \neq k}^{N_{1}} W_{k'} = - W_{k}$, and ${\hat{η}}_{g} = Σ_{i = 1}^{N_{3}} Σ_{j = 1}^{N_{2}} Σ_{k = 1}^{N_{1}} W_{k} (Y_{ijk} - \bar{Y}_{g}) / N_{3} N_{2} N_{1} {Var}_{p} (T) = Σ_{i = 1}^{N_{3}} Σ_{j = 1}^{N_{2}} Σ_{k = 1}^{N_{1}} W_{k} Y_{ijk} / N_{3} N_{2} N_{1} {Var}_{p} (T)$, where the second equality holds because $Σ_{k = 1}^{N_{1}} W_{k} = 0$. Since Y is independent over i, we decompose the variance of the numerator of ${\hat{η}}_{g}$ as follows:

\begin{aligned} Var \left(\sum_{i = 1}^{N_{3}} \sum_{j = 1}^{N_{2}} \sum_{k = 1}^{N_{1}} W_{k} Y_{ijk}\right) &= \underbrace{\sum_{i = 1}^{N_{3}} \sum_{j = 1}^{N_{2}} \sum_{k = 1}^{N_{1}} W_{k}^{2} \, Cov (Y_{ijk}, Y_{ijk})}_{A} + \underbrace{\sum_{i = 1}^{N_{3}} \sum_{j = 1}^{N_{2}} \sum_{k = 1}^{N_{1}} \sum_{k' \neq k}^{N_{1}} W_{k} W_{k'} \, Cov (Y_{ijk}, Y_{ijk'})}_{B} \\ &\quad + \underbrace{\sum_{i = 1}^{N_{3}} \sum_{j = 1}^{N_{2}} \sum_{j' \neq j}^{N_{2}} \sum_{k = 1}^{N_{1}} \sum_{k' = 1}^{N_{1}} W_{k} W_{k'} \, Cov (Y_{ijk}, Y_{ij' k'})}_{C} . \end{aligned}

Now, recall equation (4), that is,

Cov (Y_{ijk}, Y_{i' j' k'}) = 1 (i = i' & j = j' & k = k') σ_{e}^{2} + 1 (i = i' & j = j') σ_{2}^{2} + 1 (i = i') σ_{3}^{2} .

It follows that A = σ²N₃N₂N₁Var_p(T), since $Var (Y_{ijk}) = σ^{2} = σ_{e}^{2} + σ_{2}^{2} + σ_{3}^{2}$. Further, for k' ≠ k, $Cov (Y_{ijk}, Y_{ijk'}) = σ_{2}^{2} + σ_{3}^{2}$, so $Σ_{k = 1}^{N_{1}} Σ_{k' \neq k}^{N_{1}} W_{k} W_{k'} Cov (Y_{ijk}, Y_{ijk'}) = - (σ_{2}^{2} + σ_{3}^{2}) Σ_{k = 1}^{N_{1}} W_{k}^{2}$ because $Σ_{k' \neq k}^{N_{1}} W_{k'} = - W_{k}$. Therefore, $B = - (σ_{2}^{2} + σ_{3}^{2}) N_{3} N_{2} N_{1} {Var}_{p} (T)$. For j' ≠ j, $Cov (Y_{ijk}, Y_{ij' k'}) = σ_{3}^{2}$, so C = 0 since $Σ_{k = 1}^{N_{1}} W_{k} = 0$. Hence $Var (Σ_{i = 1}^{N_{3}} Σ_{j = 1}^{N_{2}} Σ_{k = 1}^{N_{1}} W_{k} Y_{ijk}) = A + B = σ_{e}^{2} N_{3} N_{2} N_{1} {Var}_{p} (T)$. Dividing by the squared denominator $(N_{3} N_{2} N_{1} {Var}_{p} (T))^{2}$ gives the first equality in equation (9), and the second follows from $σ_{e}^{2} = (1 - ρ_{1}) σ^{2}$ by equation (6).
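Equation (9) can also be checked numerically. The Python sketch below (our own construction, standard library only) simulates one group's data from model (1) with the fixed effects omitted, since they shift every slope estimate by the same constant and do not affect its variance, and compares the Monte Carlo variance of η̂_g with σ_e²/(N₃N₂N₁Var_p(T)). Large clinic- and subject-level variances are chosen deliberately to show that they do not enter.

```python
import random
import statistics


def slope_variance_check(n3, n2, n1, sd3, sd2, sde, reps=2000, seed=1):
    """Return (Monte Carlo variance of the slope estimate (8), value from (9)).

    sd3, sd2 and sde are the standard deviations of u_i, u_j(i) and e_ijk.
    """
    rng = random.Random(seed)
    t_bar = (n1 - 1) / 2
    w = [k - t_bar for k in range(n1)]          # W_k = T_k - T_bar
    denom = n3 * n2 * sum(wk * wk for wk in w)  # N3 N2 N1 Var_p(T)
    estimates = []
    for _ in range(reps):
        numerator = 0.0
        for _ in range(n3):
            u_i = rng.gauss(0, sd3)             # clinic random intercept
            for _ in range(n2):
                u_ji = rng.gauss(0, sd2)        # subject random intercept
                numerator += sum(wk * (u_i + u_ji + rng.gauss(0, sde)) for wk in w)
        estimates.append(numerator / denom)
    return statistics.variance(estimates), sde ** 2 / denom


mc, theory = slope_variance_check(3, 4, 4, sd3=2.0, sd2=3.0, sde=1.0)
print(round(mc, 4), round(theory, 4))  # theory is 1/60, printed as 0.0167; mc should be close
```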

Reference

1.Goldstein H. Multilevel Statistical Models. 2nd ed. Wiley; New York: 1996. [Google Scholar]
2.Raudenbush SW, Bryk AS. Hierarchical Linear Models: Applications and Data Analysis Methods. 2nd ed. SAGE; Thousand Oaks: 2002. [Google Scholar]
3.Hedeker D, Gibbons RD. Longitudinal Data Analysis. Wiley; Hoboken, NJ: 2006. [Google Scholar]
4.Donner A, Birkett N, Buck C. Randomization by clusters; Sample size requirements and analysis. American Journal of Epidemiology. 1981;114:906–914. doi: 10.1093/oxfordjournals.aje.a113261. [DOI] [PubMed] [Google Scholar]
5.Donner A, Klar N. Statistical Consideration in the design and analysis of community intervention trials. Journal of Clinical Epidemiology. 1996;49:435–439. doi: 10.1016/0895-4356(95)00511-0. [DOI] [PubMed] [Google Scholar]
6.Diggle PJ, Heagerty P, Liang K-Y, Zeger SL. Analysis of Longitudinal Data. 2nd ed. Oxford University Press; New York: 2002. [Google Scholar]
7.Serfling RJ. Approximation Theorems of Mathematical Statistics. Wiley; New York: 1980. [Google Scholar]
8.Alexopoulos GS, Katz IR, Bruce ML, Heo M, Ten Have T, Raue PJ, Bogner HR, Schulberg HC, Mulsant BH, Reynolds CF, III, the PROSPECT Group Remission in depressed geriatric primary care patients: a report from the PROSPECT study. American Journal of Psychiatry. 2005;62:718–724. doi: 10.1176/appi.ajp.162.4.718. [DOI] [PMC free article] [PubMed] [Google Scholar]
9.Bruce ML, Ten Have TR, Reynolds CF, III, Katz I, Schulberg HC, Mulsant BH, Brown GK, McAvay GJ, Pearson JL, Alexopoulos GS. Reducing suicidal ideation and depressive symptoms in depressed older primary care patients: a randomized controlled trial. JAMA. 2004;291:1081–1091. doi: 10.1001/jama.291.9.1081.
10.Dietrich AJ, Oxman TE, Williams JW, Jr., Schulberg HC, Bruce ML, Lee PW, Barry S, Raue PJ, Lefever JJ, Heo M, Rost K, Kroenke K, Gerrity M, Nutting PA. Re-Engineering Systems for the Primary Care Treatment of Depression: A Randomized Controlled Trial. British Medical Journal. 2004;329:602–605. doi: 10.1136/bmj.38219.481250.55.
11.Johnson NL, Kotz S. Distributions in Statistics: Continuous Univariate Distributions-2. Houghton Mifflin; New York: 1970.
12.Liu A, Shih WJ, Gehan E. Sample size and power determination for clustered repeated measurements. Statistics in Medicine. 2002;21:1787–1801. doi: 10.1002/sim.1154.
13.Murray DM, Blitstein JL, Hannan PJ, Baker WL, Lytle LA. Sizing a trial to alter the trajectory of health behaviors: Methods, parameter estimates, and their application. Statistics in Medicine. 2007;26:2297–2316. doi: 10.1002/sim.2714.
14.Roy A, Bhaumik DK, Aryal S, Gibbons RD. Sample size determination for hierarchical longitudinal designs with differential attrition rates. Biometrics. 2007;63:699–707. doi: 10.1111/j.1541-0420.2007.00769.x.
15.Heo M, Leon AC. Statistical power and sample size requirements for three level hierarchical cluster randomized trials. Biometrics. In press. doi: 10.1111/j.1541-0420.2008.00993.x.
16.Leon AC, Mallinckrodt CH, Chuang-Stein C, Archibald DG, Archer GE, Chartier K. Attrition in randomized controlled clinical trials: methodological issues in psychopharmacology. Biological Psychiatry. 2006;59:1001–1005. doi: 10.1016/j.biopsych.2005.10.020.
17.Heo M, Leon AC, Meyers BS, Alexopoulos GS. Problems in statistical analysis of attrition in randomized controlled clinical trials of antidepressants for geriatric depression. Current Psychiatry Reviews. 2007;3:178–185.
18.Rubin DB. Inference and missing data. Biometrika. 1976;63:581–592.
19.Heo M, Leon AC. Performance of a mixed effects logistic regression model with unequal cluster size. Journal of Biopharmaceutical Statistics. 2005;15:513–526. doi: 10.1081/BIP-200056554.
