Abstract
Subject attrition is a ubiquitous problem in any type of clinical trial and thus needs to be taken into consideration at the design stage, particularly to secure adequate statistical power. Here, we focus on longitudinal cluster randomized clinical trials (cluster-RCT) that aim to test the hypothesis that an intervention has an effect on the rate of change in the outcome over time. In this setting, the cluster-RCT assumes a three level hierarchical data structure in which subjects are nested within a higher level unit such as clinics and are evaluated for outcome repeatedly over the study period. Furthermore, the subject-specific slopes can be modeled in terms of fixed or random coefficients in a mixed-effects linear model. Closed form sample size formulas for testing the hypothesis above have been developed under the assumption of no attrition. In this paper, we propose closed form approximate sample size determinations with anticipated attrition rates by modifying those existing sample size formulas. With extensive simulations, we examine the performance of the modified formulas under three attrition mechanisms: attrition completely at random, attrition at random and attrition not at random. In conclusion, the proposed modification is very effective under fixed slope models but yields biased, although not substantially biased, statistical power under random slope models.
Keywords: longitudinal cluster RCT, three level data, power, sample size, attrition, effect size
1. Introduction
Subject attrition during a clinical trial is a norm rather than an exception. For example, a review of attrition problems in geriatric psychiatry clinical trials reveals that the average attrition rate over 68 studies is about 27.3%, ranging from 3.1% to 54.1% (Heo et al., 2009). Such attrition problems may also apply to longitudinal cluster randomized trials (cluster-RCT), which we are considering in this paper. A cluster-RCT typically assumes a three level data structure in that interventions are randomly assigned to clusters such as clinics (level 3) which follow up subjects (level 2) for repeated assessments (level 1) during the study period.
At the design stage, sample size determination under anticipated attrition rates is as important as planning analytic strategies for handling attrition, in part because subject attrition compromises the statistical power of a trial. The most intuitive strategy would be to multiply the number of subjects determined under the assumption of no attrition, i.e., under a hypothetically ideal situation, by some factor. For example, if an anticipated attrition rate is ξ (0<ξ<1), then the multiplication factor would be 1+ξ/(1 − ξ). Although this strategy could be very effective in a typical parallel group design with only a one level data structure, we suspect that it would result in an over-powered cluster-RCT, particularly because the number of observations is not taken into account in such sample size determinations.
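As a quick check of this intuitive factor, it simplifies algebraically, and the two attrition rates examined in the simulation study give:

$$1 + \frac{\xi}{1-\xi} = \frac{1}{1-\xi}, \qquad \xi = 0.2 \;\Rightarrow\; 1.25, \qquad \xi = 0.3 \;\Rightarrow\; \frac{1}{0.7} \approx 1.43.$$

In other words, this strategy recruits enough subjects that the expected number of completers equals the sample size obtained under no attrition.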
In this paper, we consider a cluster-RCT in which the primary goal is to compare the longitudinal courses in a continuous outcome between two groups, e.g., control and experimental. For example, this hypothesis has been tested in a cluster randomized trial to evaluate the effect of an intervention for depression in the primary care setting on change in depression symptoms using the Hamilton rating scale for depression (Alexopoulos et al., 2005; Dietrich et al., 2004). The individual longitudinal courses can be modeled as random or fixed slopes of the outcome over time at the subject level for the purpose of the comparison. The difference in mean slopes over subjects between groups can then be assessed by including in a linear mixed effects model an interaction term between the treatment and time effects (Laird and Ware, 1982; Longford, 1993). Sample size determination formulas for testing this interaction are available under both a fixed slope model (e.g., Heo and Leon, 2009) and a random slope model (Murray et al., 2007; Roy et al., 2007).
The two parameters in existing formulas that would be affected by subject attrition are the number and the variance of the assessment time points per subject. Therefore, our sample size determination strategy (with respect to the number of subjects) is to replace those two parameters by their corresponding expected number and variance under anticipated attrition rates. While Murray et al. (2007) considered broader and more general models for sample size determinations, they did not examine the performance of their approach under anticipated attrition. Although Roy et al. (2007) also considered general models and examined attrition effects on sample size determinations based on an implicit approach using critical regions determined by χ² distributions of feasible generalized least squares estimates, they appeared to consider only the attrition completely at random mechanism. In contrast, we consider a specific model specified below, and our sample size determination strategy mentioned above is simple and straightforward. Furthermore, our approach results in closed form power functions and sample size formulae, and we hypothesize that the resulting multiplication factor would be smaller than 1+ξ/(1 − ξ).
We examine the performance of our approach with extensive simulations considering the following factors among others: fixed and random slope models; different attrition rates; different distributions of attrition time points; and three different attrition mechanisms.
2. Statistical Model
A three level mixed-effects linear model for outcome Y with subject-specific random slopes can be expressed as follows (Hedeker and Gibbons, 2006):
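$$Y_{ijk} = \beta_0 + \zeta X_{ijk} + \tau T_{ijk} + \delta X_{ijk} T_{ijk} + u_i + u_{j(i)} + \nu_{j(i)} T_{ijk} + e_{ijk} \qquad (1)$$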
where i =1,2,…,2N3 is the index for the level three unit (e.g., clinic); j = 1,…, N2, is the index for the level two unit (e.g., subject) nested within each i; and k = 1, 2, …, N1, is the index for the level one unit (e.g., repeated outcomes) within each j. The intervention assignment indicator Xijk = 0 and 1 if the i-th level three unit is assigned to a control intervention and an experimental intervention, respectively. Here we consider a design with Xijk = Xi for all j and k and also a balanced design so that Σi Xi = N3. In addition, it is assumed that Tijk = Tk for all i and j, and that the time from T1 = 0 (the baseline) to Tend = N1 − 1 (the last time point) increases by equal unit time intervals.
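With respect to the random effects, it is assumed that: 1) $u_i \sim N(0, \sigma_3^2)$, $u_{j(i)} \sim N(0, \sigma_2^2)$, $\nu_{j(i)} \sim N(0, \sigma_\tau^2)$ and $e_{ijk} \sim N(0, \sigma_e^2)$; 2) these four random components are mutually independent, i.e., $u_i \perp u_{j(i)} \perp e_{ijk} \perp \nu_{j(i)}$; and 3) $u_{j(i)}$, $\nu_{j(i)}$ and $e_{ijk}$ are conditionally independent whereas the $u_i$ are unconditionally independent; that is, both $u_{j(i)}$ and $\nu_{j(i)}$ are independent conditional on $u_i$, and the $e_{ijk}$ are independent conditional on $u_i$, $\nu_{j(i)}$ and $u_{j(i)}$. When $\sigma_\tau^2 = 0$, model (1) reduces to the fixed slope model.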
For the fixed effects, the parameter ζ represents the intervention effect at baseline, and the parameter τ represents the slope associated with the time effect, that is, the magnitude of the change in outcome over time, in the control group. Finally, the intervention-by-time effect δ, the parameter of primary interest, represents the difference in mean slopes of the outcome Y between the intervention groups. The overall intercept (fixed) is denoted by β0.
Given that the parameter δ is of primary interest, the relevant null hypothesis can be expressed as:
$$H_0: \delta = 0 \qquad (2)$$
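Under model (1), it can be shown that the elements of the mean vector for the outcome are equal to $E(Y_{ijk}) = \beta_0 + \zeta X_i + \tau T_k + \delta X_i T_k$ and the elements of the covariance matrix are:
$$\mathrm{Cov}(Y_{ijk}, Y_{i'j'k'}) = 1(i = i')\,\sigma_3^2 + 1(i = i', j = j')\,(\sigma_2^2 + \sigma_\tau^2 T_k T_{k'}) + 1(i = i', j = j', k = k')\,\sigma_e^2$$
where 1(.) is an indicator function and $\sigma_3^2$, $\sigma_2^2$, $\sigma_\tau^2$ and $\sigma_e^2$ are the variances of $u_i$, $u_{j(i)}$, $\nu_{j(i)}$ and $e_{ijk}$, respectively. It follows that:
$$\mathrm{Var}(Y_{ijk}) = \sigma^2 + \sigma_\tau^2 T_k^2$$
where $\sigma^2 = \sigma_e^2 + \sigma_2^2 + \sigma_3^2$, the variance of Y under the fixed slope model with $\sigma_\tau^2 = 0$. Therefore, the correlations among the level two data, i.e., among outcomes from different second level clusters (subjects) but the same third level cluster (clinic), can be expressed for j ≠ j′ as follows:
$$\mathrm{Corr}(Y_{ijk}, Y_{ij'k'}) = \frac{\sigma_3^2}{\sqrt{(\sigma^2 + \sigma_\tau^2 T_k^2)(\sigma^2 + \sigma_\tau^2 T_{k'}^2)}}$$
The correlations among the level one data, i.e., among outcomes measured at different time points on the same subject nested within clinics, can be expressed for k ≠ k′ as:
$$\mathrm{Corr}(Y_{ijk}, Y_{ijk'}) = \frac{\sigma_2^2 + \sigma_3^2 + \sigma_\tau^2 T_k T_{k'}}{\sqrt{(\sigma^2 + \sigma_\tau^2 T_k^2)(\sigma^2 + \sigma_\tau^2 T_{k'}^2)}}$$
Under the fixed slope model, i.e., when $\sigma_\tau^2 = 0$, the correlations reduce to the following, respectively:
$$\rho_2 = \frac{\sigma_3^2}{\sigma^2} \qquad (3)$$
and
$$\rho_1 = \frac{\sigma_2^2 + \sigma_3^2}{\sigma^2} \qquad (4)$$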
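The two fixed-slope correlations can be inverted to recover the variance components from a total variance and the pair (ρ1, ρ2). The following is a minimal sketch, assuming ρ2 = σ3²/σ² and ρ1 = (σ2² + σ3²)/σ² with σ² = σe² + σ2² + σ3²; the function name `variance_components` is hypothetical and not part of the original work.

```python
def variance_components(sigma_sq, rho1, rho2):
    """Split the total fixed-slope variance into its three components.

    Assumes rho2 = sigma3^2 / sigma^2 and rho1 = (sigma2^2 + sigma3^2) / sigma^2,
    where sigma^2 = sigma_e^2 + sigma2^2 + sigma3^2.
    """
    if not 0.0 <= rho2 <= rho1 <= 1.0:
        raise ValueError("expected 0 <= rho2 <= rho1 <= 1")
    sigma3_sq = rho2 * sigma_sq            # clinic-level (level 3) intercept variance
    sigma2_sq = (rho1 - rho2) * sigma_sq   # subject-level (level 2) intercept variance
    sigma_e_sq = (1.0 - rho1) * sigma_sq   # residual (level 1) variance
    return sigma3_sq, sigma2_sq, sigma_e_sq


# Example with sigma^2 = 1, rho1 = 0.4 and rho2 = 0.1.
s3, s2, se = variance_components(1.0, 0.4, 0.1)
```

Note that the residual share 1 − ρ1 is the only component that survives in the variance of a within-subject slope estimate, since both random intercepts cancel.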
3. Statistical Power and Sample Size with No Subject Attrition
It has been shown under assumption of no subject attrition that the power function to test the null hypothesis (2) based on the ordinary least squares estimate of δ can be written as follows (e.g., Murray et al., 2007; Heo et al., in press):
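$$\varphi = \Phi\left\{ \Delta \sqrt{\frac{N_3 N_2 N_1 \mathrm{Var}_p(T)}{2\left[(1 - \rho_1) + r_\tau N_1 \mathrm{Var}_p(T)\right]}} - \Phi^{-1}(1 - \alpha/2) \right\} \qquad (5)$$
where α is a two-sided significance level; Φ is the cumulative distribution function (CDF) of a standard normal distribution and Φ−1 is its inverse; $\Delta = \delta/\sigma$ is a standardized effect size, with $\sigma^2$ the variance of Y under the fixed slope model; $r_\tau = \sigma_\tau^2/\sigma^2$ is the ratio of the random slope variance to the sum of the other variances; and $\mathrm{Var}_p(T) = \sum_{k=1}^{N_1} (T_k - \bar{T})^2 / N_1$ is the “population variance” of the time variable T where $\bar{T} = \sum_{k=1}^{N_1} T_k / N_1$ is the “mean” time point. We assume that δ = |δ| > 0 and also that the probability below the critical value Φ−1(α/2) on the other side under the alternative hypothesis is negligible and thus considered as 0. When $\sigma_\tau^2 = 0$ or rτ = 0, the effect size Δ is identical to the standardized effect size for the slope difference δ and power function (5) is the same as that derived under a fixed slope model (Heo and Leon, 2009).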
It follows that the required sample size per group for the number of subjects N2 per level three unit, for a desired statistical power φ with a two-sided significance level α, can be calculated from equation (5) as:
$$N_2 = \frac{2\left[(1 - \rho_1) + r_\tau N_1 \mathrm{Var}_p(T)\right]\left\{\Phi^{-1}(1 - \alpha/2) + \Phi^{-1}(\varphi)\right\}^2}{N_3 N_1 \mathrm{Var}_p(T)\, \Delta^2} \qquad (6)$$
More precisely, N2 is the smallest integer greater than the right hand side of equation (6). The validity of this sample size formula has been supported by extensive simulations (Heo et al., 2012).
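The no-attrition calculation can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' software: it assumes the power function has the form Φ{Δ·sqrt(N3·N2·N1·Var_p(T) / (2[(1 − ρ1) + rτ·N1·Var_p(T)])) − Φ⁻¹(1 − α/2)} with equally spaced times T = 0, …, N1 − 1, and the function names are hypothetical.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: cdf() is Phi, inv_cdf() is Phi^{-1}


def time_variance(n1):
    """'Population' variance of the time points T = 0, 1, ..., n1 - 1."""
    times = range(n1)
    mean_t = sum(times) / n1
    return sum((t - mean_t) ** 2 for t in times) / n1


def power_no_attrition(delta_std, n3, n2, n1, rho1, r_tau=0.0, alpha=0.05):
    """Approximate power for testing delta = 0 with complete data (assumed form)."""
    info = n1 * time_variance(n1)  # N1 * Var_p(T)
    drift = delta_std * math.sqrt(n3 * n2 * info / (2.0 * ((1.0 - rho1) + r_tau * info)))
    return _STD_NORMAL.cdf(drift - _STD_NORMAL.inv_cdf(1.0 - alpha / 2.0))


def n2_no_attrition(delta_std, n3, n1, rho1, r_tau=0.0, alpha=0.05, power=0.8):
    """Subjects per cluster needed for the desired power, rounded up."""
    info = n1 * time_variance(n1)
    z = _STD_NORMAL.inv_cdf(1.0 - alpha / 2.0) + _STD_NORMAL.inv_cdf(power)
    raw = 2.0 * ((1.0 - rho1) + r_tau * info) * z ** 2 / (n3 * info * delta_std ** 2)
    return math.ceil(raw)


# Standardized slope difference when the end-of-trial effect size is 0.4 and N1 = 5.
delta_std = 0.4 / (5 - 1)
n2_fixed = n2_no_attrition(delta_std, n3=10, n1=5, rho1=0.4)              # fixed slopes
n2_random = n2_no_attrition(delta_std, n3=10, n1=5, rho1=0.4, r_tau=0.1)  # random slopes
```

Under these assumptions the sketch returns the N2 values reported in Tables 1 and 2 for the same design parameters, which is a useful sanity check on the assumed form of the formula.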
4. Statistical Power and Sample Size with Anticipated Attritions
A heuristic strategy for derivation of approximate sample sizes would be to replace both N1 and Varp(T) in equation (6) with the corresponding expected values under an anticipated subject attrition rate. To this end, we assume no attrition at baseline, i.e., when T = 0, and consider only a monotone attrition pattern in that subject outcome Y is observed at every time point T before attrition but no additional outcomes are observed after attrition. Therefore, the overall attrition rate, ξ = P(A ≤ Tend), where A is attrition time, is equivalent to one minus the proportion of participants who appeared at the last visit. Here, we further assume that the distribution of attrition time A is uniform over or linearly increasing with T = 0, 1, 2, …, N1 − 1 = Tend. For the “uniform” distribution, probability of attrition at time t can be defined as
$$P^{(u)}(A = t) = \frac{\xi}{N_1 - 1}, \qquad t = 1, 2, \ldots, N_1 - 1,$$
which yields $\sum_{t=1}^{N_1 - 1} P^{(u)}(A = t) = \xi$. For the “linear” distribution in which attrition rates increase over time,
$$P^{(l)}(A = t) = \frac{2\xi t}{N_1 (N_1 - 1)}, \qquad t = 1, 2, \ldots, N_1 - 1,$$
which also yields $\sum_{t=1}^{N_1 - 1} P^{(l)}(A = t) = \xi$. However, we do not make any distributional assumption about A when A > Tend except that P(A > Tend) = 1 − ξ. From now on, the superscripts “(u)” and “(l)” will represent “uniform” and “linear” distributions of attrition time, respectively.
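The expected number of observations per subject can then be obtained as follows:
$$E^{(u)}(N_1) = \sum_{t=1}^{N_1 - 1} t\, P^{(u)}(A = t) + N_1 (1 - \xi) = N_1 \left(1 - \frac{\xi}{2}\right)$$
and
$$E^{(l)}(N_1) = \sum_{t=1}^{N_1 - 1} t\, P^{(l)}(A = t) + N_1 (1 - \xi) = N_1 - \frac{\xi (N_1 + 1)}{3}$$
Neither of these has to be an integer. It can be seen that, for N1 > 2,
$$E^{(u)}(N_1) < E^{(l)}(N_1) \qquad (7)$$
This inequality implies that the total number of observations will be larger under the linear distribution of attrition time.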
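On the other hand, the probability distribution of T, P(T = t), at which the observations are made is no longer uniform under the assumed monotone attrition pattern, regardless of the type of distribution of the attrition time. Under the uniform and linear attrition time distributions, they can be obtained for t = 0, 1, …, N1 − 1 respectively as follows:
$$P^{(u)}(T = t) = \frac{P^{(u)}(A > t)}{E^{(u)}(N_1)} = \frac{1 - \xi t/(N_1 - 1)}{E^{(u)}(N_1)}$$
and
$$P^{(l)}(T = t) = \frac{P^{(l)}(A > t)}{E^{(l)}(N_1)} = \frac{1 - \xi t (t + 1)/\{N_1 (N_1 - 1)\}}{E^{(l)}(N_1)}$$
The first and the second moment of T under these probability distributions can then be obtained as follows:
$$E^{(u)}(T) = \sum_{t=0}^{N_1 - 1} t\, P^{(u)}(T = t) = \frac{3(N_1 - 1) - \xi (2N_1 - 1)}{3(2 - \xi)}, \qquad E^{(u)}(T^2) = \sum_{t=0}^{N_1 - 1} t^2 P^{(u)}(T = t) = \frac{(N_1 - 1)\{2(2N_1 - 1) - 3\xi N_1\}}{6(2 - \xi)}$$
and
$$E^{(l)}(T) = \sum_{t=0}^{N_1 - 1} t\, P^{(l)}(T = t) = \frac{6N_1(N_1 - 1) - \xi (N_1 + 1)(3N_1 - 2)}{4\{3N_1 - \xi (N_1 + 1)\}}, \qquad E^{(l)}(T^2) = \sum_{t=0}^{N_1 - 1} t^2 P^{(l)}(T = t) = \frac{10N_1(N_1 - 1)(2N_1 - 1) - \xi (N_1 + 1)(12N_1^2 - 15N_1 + 2)}{20\{3N_1 - \xi (N_1 + 1)\}}$$
Consequently, the variances can be obtained as:
$$\mathrm{Var}^{(u)}(T) = E^{(u)}(T^2) - \{E^{(u)}(T)\}^2$$
and
$$\mathrm{Var}^{(l)}(T) = E^{(l)}(T^2) - \{E^{(l)}(T)\}^2$$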
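The expected number of observations and the variance of the observed time points can also be computed numerically from any monotone attrition distribution, which avoids relying on closed-form algebra. The sketch below assumes that a subject with attrition time A = t contributes observations at T = 0, …, t − 1, that P(A = t) = ξ/(N1 − 1) for the uniform case and 2ξt/{N1(N1 − 1)} for the linear case, and that the distribution of observed time points is P(T = t) = P(A > t)/E(N1). All function names are hypothetical.

```python
def attrition_pmf(n1, xi, kind="linear"):
    """P(A = t) for t = 1, ..., n1 - 1 (list index 0 corresponds to t = 1)."""
    t_end = n1 - 1
    if kind == "uniform":
        return [xi / t_end for _ in range(1, n1)]
    if kind == "linear":
        return [2.0 * xi * t / (n1 * t_end) for t in range(1, n1)]
    raise ValueError("kind must be 'uniform' or 'linear'")


def observed_time_moments(pmf):
    """Return (E(N1), Var(T)) under monotone attrition with no baseline dropout."""
    retained = [1.0]  # retained[t] = P(A > t); everyone is observed at t = 0
    for p in pmf:
        retained.append(retained[-1] - p)
    expected_n1 = sum(retained)  # expected number of observations per subject
    mean_t = sum(t * r for t, r in enumerate(retained)) / expected_n1
    mean_t_sq = sum(t * t * r for t, r in enumerate(retained)) / expected_n1
    return expected_n1, mean_t_sq - mean_t ** 2


# N1 = 5 assessments with an overall attrition rate of 20%.
e_uniform, var_uniform = observed_time_moments(attrition_pmf(5, 0.2, "uniform"))
e_linear, var_linear = observed_time_moments(attrition_pmf(5, 0.2, "linear"))
```

Because late dropouts still contribute most of their assessments, the linear pattern retains more observations than the uniform pattern at the same overall attrition rate.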
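Therefore, the approximate statistical power with the anticipated overall attrition rate ξ can be expressed under the uniform and linear distribution of attrition times respectively as follows:
$$\varphi^{(u)} = \Phi\left\{ \Delta \sqrt{\frac{N_3 N_2 E^{(u)}(N_1) \mathrm{Var}^{(u)}(T)}{2\left[(1 - \rho_1) + r_\tau E^{(u)}(N_1) \mathrm{Var}^{(u)}(T)\right]}} - \Phi^{-1}(1 - \alpha/2) \right\} \qquad (8)$$
and
$$\varphi^{(l)} = \Phi\left\{ \Delta \sqrt{\frac{N_3 N_2 E^{(l)}(N_1) \mathrm{Var}^{(l)}(T)}{2\left[(1 - \rho_1) + r_\tau E^{(l)}(N_1) \mathrm{Var}^{(l)}(T)\right]}} - \Phi^{-1}(1 - \alpha/2) \right\} \qquad (9)$$
It follows that the approximate sample size determinations with the anticipated overall attrition rate ξ are:
$$N_2^{(u)} = \frac{2\left[(1 - \rho_1) + r_\tau E^{(u)}(N_1) \mathrm{Var}^{(u)}(T)\right]\left\{\Phi^{-1}(1 - \alpha/2) + \Phi^{-1}(\varphi)\right\}^2}{N_3 E^{(u)}(N_1) \mathrm{Var}^{(u)}(T)\, \Delta^2} \qquad (10)$$
and
$$N_2^{(l)} = \frac{2\left[(1 - \rho_1) + r_\tau E^{(l)}(N_1) \mathrm{Var}^{(l)}(T)\right]\left\{\Phi^{-1}(1 - \alpha/2) + \Phi^{-1}(\varphi)\right\}^2}{N_3 E^{(l)}(N_1) \mathrm{Var}^{(l)}(T)\, \Delta^2} \qquad (11)$$
Here again, both $N_2^{(u)}$ and $N_2^{(l)}$ are the smallest integers greater than the corresponding right hand sides of equations (10) and (11).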
Let us denote the ratio, or the multiplication factor, in the number of subjects due to the anticipated attritions by
$$R^{(u)} = \frac{N_2^{(u)}}{N_2} \qquad (12)$$
and
$$R^{(l)} = \frac{N_2^{(l)}}{N_2} \qquad (13)$$
which we expect would be less than 1 + ξ/(1 − ξ).
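The replacement strategy and the resulting multiplication factor can be illustrated end to end for the linear attrition pattern. The sketch below is an assumption-laden illustration rather than the authors' code: it takes P(A = t) = 2ξt/{N1(N1 − 1)}, treats a subject with A = t as observed at T = 0, …, t − 1, and substitutes the resulting expected number of observations and time variance for N1 and Var_p(T) in the assumed power and sample size expressions. Function names are hypothetical.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def linear_attrition_moments(n1, xi):
    """(E(N1), Var(T)) when P(A = t) is proportional to t and sums to xi."""
    retained = [1.0]  # retained[t] = P(A > t)
    for t in range(1, n1):
        retained.append(retained[-1] - 2.0 * xi * t / (n1 * (n1 - 1)))
    expected_n1 = sum(retained)
    mean_t = sum(t * r for t, r in enumerate(retained)) / expected_n1
    mean_t_sq = sum(t * t * r for t, r in enumerate(retained)) / expected_n1
    return expected_n1, mean_t_sq - mean_t ** 2


def n2_required(delta_std, n3, rho1, n_obs, var_t, r_tau=0.0, alpha=0.05, power=0.8):
    """Subjects per cluster, given the (expected) number and variance of time points."""
    info = n_obs * var_t
    z = _STD_NORMAL.inv_cdf(1.0 - alpha / 2.0) + _STD_NORMAL.inv_cdf(power)
    return math.ceil(2.0 * ((1.0 - rho1) + r_tau * info) * z ** 2
                     / (n3 * info * delta_std ** 2))


def approx_power(delta_std, n3, n2, rho1, n_obs, var_t, r_tau=0.0, alpha=0.05):
    """Approximate power at a given number of subjects per cluster."""
    info = n_obs * var_t
    drift = delta_std * math.sqrt(n3 * n2 * info / (2.0 * ((1.0 - rho1) + r_tau * info)))
    return _STD_NORMAL.cdf(drift - _STD_NORMAL.inv_cdf(1.0 - alpha / 2.0))


n1, xi, delta_std = 5, 0.2, 0.4 / 4
n2_complete = n2_required(delta_std, 10, 0.4, n1, (n1 ** 2 - 1) / 12.0)  # no attrition
e_n1, var_t = linear_attrition_moments(n1, xi)
n2_linear = n2_required(delta_std, 10, 0.4, e_n1, var_t)                 # with attrition
ratio_linear = n2_linear / n2_complete   # multiplication factor R(l)
naive_factor = 1.0 / (1.0 - xi)          # 1 + xi / (1 - xi)
```

For this configuration the ratio comes out below the naive factor of 1.25, in line with the expectation stated above.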
5. Attrition Mechanisms
To explore the validity of the approximate sample size formulae N2(u) (10) and N2(l) (11) with anticipated attrition rates we consider the following three conventional attrition mechanisms (Little and Rubin, 2002): 1) Attrition completely at random (ACAR), that is, occurrence of a subject’s attrition does not depend on any observed or unobserved outcomes; 2) Attrition at random (AAR), that is, occurrence of a subject’s attrition depends on observed outcomes; and 3) Attrition not at random (ANAR), that is, occurrence of a subject’s attrition depends on unobserved outcomes. To generate missing data based on each mechanism, we first group the subjects who are retained at each time point t into quartiles: Q1(t), Q2(t), Q3(t), and Q4(t). This quartile grouping is based on values of outcome variable Y among the retainers discounting the dropouts.
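Based on the quartile grouping, the conditional distributions of attritions can be formulated in the following way. First, with the superscripts indicating “uniform” or “linear” suppressed, the conditional probability of attrition at time t among retainers up to time t can be written as:
$$z_t = P(A = t \mid A \ge t) = \frac{P(A = t)}{1 - \sum_{s=1}^{t-1} P(A = s)}$$
When the quartiles are based on Yt−1, the conditional distributions of attrition times for the g-th quartile can be written as
$$\omega_g(t) = P\left(A = t \mid A \ge t,\; Y_{t-1} \in Q_g(t-1)\right), \qquad g = 1, 2, 3, 4,$$
which yields $z_t = \sum_{g=1}^{4} \omega_g(t)/4$ and thus $\sum_{g=1}^{4} \omega_g(t) = 4 z_t$. Similarly, when the quartiles are based on Yt, the conditional distributions of attrition times for the g-th quartile can be written as
$$\varpi_g(t) = P\left(A = t \mid A \ge t,\; Y_t \in Q_g(t)\right), \qquad g = 1, 2, 3, 4,$$
which yields $z_t = \sum_{g=1}^{4} \varpi_g(t)/4$ and thus $\sum_{g=1}^{4} \varpi_g(t) = 4 z_t$.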
For the ACAR mechanism, we consider ωg (t) = zt for all g and t > 0, i.e., subject attrition at time t does not depend on any previous observations. For the AAR mechanism, we consider that the percentages of dropouts at time t whose observed outcomes at time t − 1 belong to Q1(t−1), Q2(t−1), Q3(t−1), and Q4(t−1) are 10%, 20%, 30% and 40%, respectively. Specifically, ω1(t) = .4zt, ω2(t) = .8zt, ω3(t) = 1.2zt and ω4(t) = 1.6zt. Under the AAR mechanism, the subject attrition depends on the observed outcomes at time t − 1, that is, the time immediately prior to attrition. For the ANAR mechanism, we similarly consider the percentages of dropouts at time t whose observed outcomes at time t belong to Q1(t), Q2(t), Q3(t), and Q4(t) are 10%, 20%, 30% and 40%, respectively. Specifically, ϖ1 (t) = .4zt, ϖ2(t) = .8zt, ϖ3(t) = 1.2zt and ϖ4(t) = 1.6zt. Under the ANAR mechanism, the subject attrition depends on the unobserved outcomes at time t, that is, the time of attrition.
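The quartile weights can be checked with a short calculation. The sketch below assumes that z_t is the hazard P(A = t | A ≥ t) computed from the marginal attrition probabilities, and that each quartile holds one quarter of the subjects retained at time t; the function names are hypothetical.

```python
QUARTILE_WEIGHTS = (0.4, 0.8, 1.2, 1.6)  # multipliers of z_t for Q1, ..., Q4


def conditional_hazards(pmf):
    """z_t = P(A = t | A >= t) for t = 1, 2, ..., from the marginal P(A = t)."""
    hazards = []
    still_retained = 1.0  # P(A >= t); no attrition at baseline
    for p in pmf:
        hazards.append(p / still_retained)
        still_retained -= p
    return hazards


def quartile_hazards(z_t, weights=QUARTILE_WEIGHTS):
    """Quartile-specific dropout probabilities at one time point (AAR or ANAR)."""
    return [w * z_t for w in weights]


def dropout_shares(z_t, weights=QUARTILE_WEIGHTS):
    """Share of the dropouts at one time point that comes from each quartile."""
    per_quartile = quartile_hazards(z_t, weights)
    total = sum(per_quartile)
    return [h / total for h in per_quartile]


# Uniform attrition with N1 = 5 and an overall rate of 20%: P(A = t) = 0.05.
z = conditional_hazards([0.05, 0.05, 0.05, 0.05])
shares = dropout_shares(z[0])
```

Averaging the four quartile-specific probabilities returns z_t, so the marginal attrition distribution is unchanged while 10%, 20%, 30% and 40% of the dropouts come from the first through fourth quartiles. Under ACAR all four weights would simply be 1.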
6. Simulation study
We conduct simulation studies to examine the performance of the sample size formulas $N_2^{(u)}$ (10) and $N_2^{(l)}$ (11) with a two-sided significance level α = 0.05 and a desired power φ = 0.8 under the following combinations: ΔTend = Δ(N1 − 1) = 0.4, 0.5; N3 = 10, 20; N1 = 5, 9; ρ1 = 0.4, 0.6; rτ = 0.0, 0.1, 0.2, while, without loss of generality, σ = 1, ρ2 = 0.1, β0 = ζ = 0, and τ = −1 in model (1) remained fixed. Of note, when rτ = 0.1 or 0.2 under the random slope models with missing data, N1 = 9 was excluded due to enormous computing times for simulations (Ahn et al., 2000; Overall et al., 1999). Values for $\sigma_3^2$ and $\sigma_2^2$ were determined through ρ2 (3) and ρ1 (4). The effect size of the interaction Δ is specified as a standardized between-group mean difference ΔTend = Δ(N1 − 1) at the end of trial under a fixed slope model. Effect sizes in the range of 0.4–0.6 have generally been referred to as medium (Cohen, 1988). We further considered two attrition rates, ξ = 20% and 30%, and two types of distributions, uniform and linear, of the attrition time points as detailed above.
For each combination, we first computed $E^{(u)}(N_1)$, $\mathrm{Var}^{(u)}(T)$, $E^{(l)}(N_1)$, and $\mathrm{Var}^{(l)}(T)$, and subsequently $N_2^{(u)}$ and $N_2^{(l)}$. We then generated 1000 simulated data sets for each combination with each estimated $N_2^{(u)}$ or $N_2^{(l)}$ in accordance with model (1): $Y_{ijk} = (\beta_0 + u_i + u_{j(i)}) + \zeta X_{ijk} + (\tau + \nu_{j(i)}) T_{ijk} + \delta X_{ijk} T_{ijk} + e_{ijk}$. Finally, according to the three ACAR, AAR and ANAR mechanisms specified in the section above, we deleted outcomes from each “complete” data set generated with N1 observations per subject, resulting in three data sets with deleted observations.
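Generating one “complete” data set from model (1) could look like the following sketch. It assumes normally distributed random effects, a balanced design with N3 clusters per arm, and time points T = 0, …, N1 − 1; the function name, argument names, and row layout are hypothetical, and the original simulations were not necessarily organized this way.

```python
import math
import random


def simulate_complete_data(n3, n2, n1, delta, sigma3_sq, sigma2_sq, sigma_e_sq,
                           sigma_tau_sq=0.0, beta0=0.0, zeta=0.0, tau=-1.0, seed=None):
    """Return rows (cluster, subject, time, arm, outcome) for 2 * n3 clusters."""
    rng = random.Random(seed)
    rows = []
    for i in range(2 * n3):
        x = 1 if i >= n3 else 0  # first n3 clusters: control; last n3: experimental
        u_i = rng.gauss(0.0, math.sqrt(sigma3_sq))            # cluster random intercept
        for j in range(n2):
            u_ji = rng.gauss(0.0, math.sqrt(sigma2_sq))       # subject random intercept
            nu_ji = rng.gauss(0.0, math.sqrt(sigma_tau_sq))   # subject random slope
            for t in range(n1):
                e_ijk = rng.gauss(0.0, math.sqrt(sigma_e_sq))  # residual error
                y = ((beta0 + u_i + u_ji) + zeta * x
                     + (tau + nu_ji) * t + delta * x * t + e_ijk)
                rows.append((i, j, t, x, y))
    return rows


# One complete data set: N3 = 10 clusters per arm, N2 = 11 subjects, N1 = 5 visits.
data = simulate_complete_data(n3=10, n2=11, n1=5, delta=0.1,
                              sigma3_sq=0.1, sigma2_sq=0.3, sigma_e_sq=0.6, seed=2024)
```

Setting `sigma_tau_sq` to zero gives the fixed slope model. Outcomes would then be deleted according to the chosen attrition mechanism before model fitting.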
Although the sample size determinations were derived based on OLS estimates with known variance components, in order to reflect real data analysis with unknown variance components, we fit each deleted data set using SAS PROC MIXED with the maximum likelihood estimation option and retained the resulting p-values for testing the null hypothesis (2). We denoted the p-value by ps(δ) for the s-th simulated data set (s = 1, 2,.., 1000) and computed the empirical power φ̃, φ̃(u) or φ̃(l), as follows:
$$\tilde{\varphi} = \frac{1}{1000} \sum_{s=1}^{1000} 1\{p_s(\delta) < \alpha\} \qquad (14)$$
This empirical power is compared with the approximate power φ on which the sample sizes N2(u) and N2(l) are based. We note that φ is never less than the pre-specified power of 0.8 since both N2(u) and N2(l) are the smallest integers greater than the right hand sides of equations (10) and (11), respectively.
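The empirical power is simply the rejection proportion across simulated data sets. The sketch below computes it from a list of p-values and, as an added assumption not taken from the text, uses the normal approximation to the binomial to gauge the Monte Carlo margin of error of a power estimate; both function names are hypothetical.

```python
import math


def empirical_power(p_values, alpha=0.05):
    """Proportion of simulated data sets in which the null hypothesis is rejected."""
    return sum(1 for p in p_values if p < alpha) / len(p_values)


def monte_carlo_margin(power, n_sims, z=1.96):
    """Approximate 95% margin of error of a simulated power estimate (assumed form)."""
    return z * math.sqrt(power * (1.0 - power) / n_sims)


# Toy example with four p-values; a real run would use 1000 per combination.
toy_power = empirical_power([0.01, 0.20, 0.04, 0.50])
margin = monte_carlo_margin(0.8, 1000)
```

With 1000 replicates and a true power near 0.8, the margin is about 0.025, which gives a rough yardstick for judging differences between empirical and approximate power.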
7. Simulation study results
Attrition rates
Over all combinations of the simulation specifications, the empirical attrition rates based on simulated data are virtually identical to the pre-specified attrition rates ξ = 20% and 30% regardless of the three different attrition mechanisms and distributions of attrition time points, uniform and linear.
Under fixed slope model
Table 1 summarizes numerical and simulation results when the slopes are considered fixed, i.e., when rτ = 0.0. The average ratios R(u) (12) and R(l) (13) are both much less than 1 + ξ/(1 − ξ) = 1.25 and 1.43 for ξ = 20% and 30%, respectively. Nevertheless, we observe from the evaluation of max|φ − φ̃| that the empirical power estimates are very close to the approximate power regardless of attrition mechanisms and across all simulation parameter combinations. As is foreseen from equation (7), R(u) is no smaller than R(l) in every simulation combination, since more observations are retained under the linear distribution of attrition time. Furthermore, R(l) is 1.0 in many cases for both ξ = 20% and 30% without loss of statistical power. In general, both R(u) and R(l) are smaller for greater Δ.
Table 1.
ξ | ΔTend | N3 | N1 | ρ1 | N2 | N2(u) | R(u) | φ(u) | φ̃(u) ACAR | φ̃(u) AAR | φ̃(u) ANAR | N2(l) | R(l) | φ(l) | φ̃(l) ACAR | φ̃(l) AAR | φ̃(l) ANAR |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
20% | 0.4 | 10 | 5 | 0.4 | 10 | 12 | 1.20 | 0.822 | 0.832 | 0.844 | 0.832 | 11 | 1.10 | 0.819 | 0.795 | 0.795 | 0.798 |
20% | 0.4 | 10 | 5 | 0.6 | 7 | 8 | 1.14 | 0.822 | 0.804 | 0.805 | 0.798 | 7 | 1.00 | 0.802 | 0.788 | 0.785 | 0.790 |
20% | 0.4 | 10 | 9 | 0.4 | 7 | 8 | 1.14 | 0.833 | 0.820 | 0.817 | 0.817 | 7 | 1.00 | 0.804 | 0.767 | 0.777 | 0.769 |
20% | 0.4 | 10 | 9 | 0.6 | 5 | 5 | 1.00 | 0.809 | 0.822 | 0.816 | 0.824 | 5 | 1.00 | 0.831 | 0.823 | 0.815 | 0.818 |
20% | 0.4 | 20 | 5 | 0.4 | 5 | 6 | 1.20 | 0.822 | 0.835 | 0.827 | 0.830 | 6 | 1.20 | 0.851 | 0.836 | 0.835 | 0.842 |
20% | 0.4 | 20 | 5 | 0.6 | 4 | 4 | 1.00 | 0.822 | 0.811 | 0.816 | 0.810 | 4 | 1.00 | 0.851 | 0.821 | 0.823 | 0.818 |
20% | 0.4 | 20 | 9 | 0.4 | 4 | 4 | 1.00 | 0.833 | 0.840 | 0.829 | 0.839 | 4 | 1.00 | 0.854 | 0.830 | 0.824 | 0.827 |
20% | 0.4 | 20 | 9 | 0.6 | 3 | 3 | 1.00 | 0.874 | 0.873 | 0.879 | 0.864 | 3 | 1.00 | 0.892 | 0.890 | 0.889 | 0.876 |
20% | 0.5 | 10 | 5 | 0.4 | 7 | 8 | 1.14 | 0.837 | 0.828 | 0.838 | 0.837 | 7 | 1.00 | 0.817 | 0.784 | 0.781 | 0.774 |
20% | 0.5 | 10 | 5 | 0.6 | 5 | 5 | 1.00 | 0.813 | 0.783 | 0.792 | 0.794 | 5 | 1.00 | 0.843 | 0.841 | 0.847 | 0.839 |
20% | 0.5 | 10 | 9 | 0.4 | 5 | 5 | 1.00 | 0.825 | 0.832 | 0.833 | 0.839 | 5 | 1.00 | 0.845 | 0.829 | 0.834 | 0.829 |
20% | 0.5 | 10 | 9 | 0.6 | 3 | 4 | 1.33 | 0.887 | 0.886 | 0.902 | 0.893 | 3 | 1.00 | 0.806 | 0.805 | 0.814 | 0.808 |
20% | 0.5 | 20 | 5 | 0.4 | 4 | 4 | 1.00 | 0.837 | 0.866 | 0.847 | 0.851 | 4 | 1.00 | 0.865 | 0.857 | 0.860 | 0.866 |
20% | 0.5 | 20 | 5 | 0.6 | 3 | 3 | 1.00 | 0.877 | 0.893 | 0.878 | 0.888 | 3 | 1.00 | 0.901 | 0.904 | 0.889 | 0.905 |
20% | 0.5 | 20 | 9 | 0.4 | 3 | 3 | 1.00 | 0.887 | 0.886 | 0.889 | 0.887 | 3 | 1.00 | 0.903 | 0.884 | 0.896 | 0.892 |
20% | 0.5 | 20 | 9 | 0.6 | 2 | 2 | 1.00 | 0.887 | 0.876 | 0.874 | 0.872 | 2 | 1.00 | 0.903 | 0.882 | 0.890 | 0.897 |
Mean (ξ = 20%) | | | | | | | 1.07 | 0.843 | 0.843 | 0.843 | 0.842 | | 1.02 | 0.849 | 0.834 | 0.835 | 0.834 |
Min \|φ − φ̃\| (ξ = 20%) | | | | | | | | | 0.001 | 0.001 | 0.000 | | | | 0.001 | 0.003 | 0.001 |
Max \|φ − φ̃\| (ξ = 20%) | | | | | | | | | 0.030 | 0.022 | 0.024 | | | | 0.037 | 0.036 | 0.043 |
30% | 0.4 | 10 | 5 | 0.4 | 10 | 13 | 1.30 | 0.810 | 0.833 | 0.837 | 0.825 | 12 | 1.20 | 0.829 | 0.812 | 0.803 | 0.821 |
30% | 0.4 | 10 | 5 | 0.6 | 7 | 9 | 1.29 | 0.825 | 0.838 | 0.849 | 0.845 | 8 | 1.14 | 0.829 | 0.792 | 0.795 | 0.795 |
30% | 0.4 | 10 | 9 | 0.4 | 7 | 9 | 1.29 | 0.843 | 0.834 | 0.830 | 0.836 | 8 | 1.14 | 0.833 | 0.823 | 0.815 | 0.820 |
30% | 0.4 | 10 | 9 | 0.6 | 5 | 6 | 1.20 | 0.843 | 0.858 | 0.847 | 0.848 | 5 | 1.00 | 0.809 | 0.799 | 0.800 | 0.798 |
30% | 0.4 | 20 | 5 | 0.4 | 5 | 7 | 1.40 | 0.838 | 0.852 | 0.858 | 0.851 | 6 | 1.20 | 0.829 | 0.808 | 0.816 | 0.811 |
30% | 0.4 | 20 | 5 | 0.6 | 4 | 5 | 1.25 | 0.862 | 0.888 | 0.881 | 0.880 | 4 | 1.00 | 0.829 | 0.801 | 0.801 | 0.796 |
30% | 0.4 | 20 | 9 | 0.4 | 4 | 5 | 1.25 | 0.879 | 0.869 | 0.876 | 0.880 | 4 | 1.00 | 0.833 | 0.797 | 0.793 | 0.801 |
30% | 0.4 | 20 | 9 | 0.6 | 3 | 3 | 1.00 | 0.843 | 0.837 | 0.832 | 0.832 | 3 | 1.00 | 0.874 | 0.860 | 0.848 | 0.856 |
30% | 0.5 | 10 | 5 | 0.4 | 7 | 9 | 1.29 | 0.840 | 0.865 | 0.860 | 0.862 | 8 | 1.14 | 0.844 | 0.821 | 0.822 | 0.827 |
30% | 0.5 | 10 | 5 | 0.6 | 5 | 6 | 1.20 | 0.840 | 0.848 | 0.846 | 0.843 | 5 | 1.00 | 0.820 | 0.812 | 0.803 | 0.805 |
30% | 0.5 | 10 | 9 | 0.4 | 5 | 6 | 1.20 | 0.857 | 0.848 | 0.853 | 0.856 | 5 | 1.00 | 0.825 | 0.816 | 0.809 | 0.814 |
30% | 0.5 | 10 | 9 | 0.6 | 3 | 4 | 1.33 | 0.857 | 0.860 | 0.857 | 0.849 | 4 | 1.33 | 0.887 | 0.859 | 0.861 | 0.858 |
30% | 0.5 | 20 | 5 | 0.4 | 4 | 5 | 1.25 | 0.875 | 0.906 | 0.889 | 0.896 | 4 | 1.00 | 0.844 | 0.844 | 0.830 | 0.839 |
30% | 0.5 | 20 | 5 | 0.6 | 3 | 3 | 1.00 | 0.840 | 0.866 | 0.846 | 0.849 | 3 | 1.00 | 0.883 | 0.867 | 0.869 | 0.854 |
30% | 0.5 | 20 | 9 | 0.4 | 3 | 3 | 1.00 | 0.857 | 0.864 | 0.854 | 0.863 | 3 | 1.00 | 0.887 | 0.862 | 0.867 | 0.860 |
30% | 0.5 | 20 | 9 | 0.6 | 2 | 2 | 1.00 | 0.857 | 0.849 | 0.841 | 0.840 | 2 | 1.00 | 0.887 | 0.866 | 0.864 | 0.861 |
Mean (ξ = 30%) | | | | | | | 1.20 | 0.848 | 0.857 | 0.854 | 0.853 | | 1.07 | 0.846 | 0.827 | 0.825 | 0.826 |
Min \|φ − φ̃\| (ξ = 30%) | | | | | | | | | 0.003 | 0.000 | 0.001 | | | | 0.000 | 0.009 | 0.005 |
Max \|φ − φ̃\| (ξ = 30%) | | | | | | | | | 0.031 | 0.027 | 0.022 | | | | 0.037 | 0.040 | 0.034 |
Note: ACAR = Attrition Completely At Random; AAR = Attrition At Random; ANAR = Attrition Not At Random
Under random slope model
Table 2 summarizes numerical and simulation results when the slopes are considered random, i.e., when rτ = 0.1 or 0.2. Again, the average ratios R(u) and R(l) are both much less than 1 + ξ/(1 − ξ) = 1.25 and 1.43 for ξ = 20% and 30%, respectively. In fact, these are even smaller than those under the fixed slope model above, partly because N2 is much greater due to the additional random variation in slopes. However, the simulation-based empirical power estimates fall noticeably below the approximate statistical power. Although the absolute difference |φ − φ̃| is not necessarily associated with attrition mechanism or with distribution of attrition times, it ranges from 0.001 to 0.064 for ξ = 20%, and from 0.019 to 0.078 for ξ = 30%. The underestimation is more severe for the greater attrition rate ξ = 30%. In this case, although the biases are only about 5 percentage points, all of the absolute differences |φ − φ̃| except one are beyond the 95% confidence limit. Again, R(u) is no smaller than R(l) in every simulation combination and R(l) is 1.0 in many cases for both ξ = 20% and 30%, but seemingly at the cost of statistical power. Again, in general, both R(u) and R(l) are smaller for greater Δ.
Table 2.
ξ | rτ | ΔTend | N3 | ρ1 | N2 | N2(u) | R(u) | φ(u) | φ̃(u) ACAR | φ̃(u) AAR | φ̃(u) ANAR | N2(l) | R(l) | φ(l) | φ̃(l) ACAR | φ̃(l) AAR | φ̃(l) ANAR |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
20% | 0.1 | 0.4 | 10 | 0.4 | 26 | 28 | 1.08 | 0.814 | 0.792 | 0.796 | 0.799 | 27 | 1.04 | 0.812 | 0.780 | 0.764 | 0.774 |
20% | 0.1 | 0.4 | 10 | 0.6 | 22 | 24 | 1.09 | 0.812 | 0.789 | 0.785 | 0.782 | 23 | 1.05 | 0.806 | 0.785 | 0.779 | 0.784 |
20% | 0.1 | 0.4 | 20 | 0.4 | 13 | 14 | 1.08 | 0.814 | 0.802 | 0.784 | 0.799 | 14 | 1.08 | 0.826 | 0.800 | 0.808 | 0.803 |
20% | 0.1 | 0.4 | 20 | 0.6 | 11 | 12 | 1.09 | 0.812 | 0.794 | 0.788 | 0.790 | 12 | 1.09 | 0.822 | 0.786 | 0.787 | 0.783 |
20% | 0.1 | 0.5 | 10 | 0.4 | 17 | 18 | 1.06 | 0.815 | 0.798 | 0.799 | 0.802 | 17 | 1.00 | 0.806 | 0.780 | 0.780 | 0.785 |
20% | 0.1 | 0.5 | 10 | 0.6 | 15 | 15 | 1.00 | 0.803 | 0.787 | 0.778 | 0.790 | 15 | 1.00 | 0.813 | 0.785 | 0.785 | 0.786 |
20% | 0.1 | 0.5 | 20 | 0.4 | 9 | 9 | 1.00 | 0.815 | 0.776 | 0.785 | 0.777 | 9 | 1.00 | 0.828 | 0.764 | 0.766 | 0.772 |
20% | 0.1 | 0.5 | 20 | 0.6 | 8 | 8 | 1.00 | 0.828 | 0.799 | 0.815 | 0.806 | 8 | 1.00 | 0.837 | 0.836 | 0.824 | 0.813 |
20% | 0.2 | 0.4 | 10 | 0.4 | 41 | 43 | 1.05 | 0.802 | 0.779 | 0.777 | 0.764 | 42 | 1.02 | 0.801 | 0.781 | 0.759 | 0.773 |
20% | 0.2 | 0.4 | 10 | 0.6 | 38 | 39 | 1.03 | 0.800 | 0.771 | 0.771 | 0.763 | 39 | 1.03 | 0.806 | 0.789 | 0.797 | 0.776 |
20% | 0.2 | 0.4 | 20 | 0.4 | 21 | 22 | 1.05 | 0.811 | 0.790 | 0.785 | 0.787 | 21 | 1.00 | 0.801 | 0.772 | 0.775 | 0.765 |
20% | 0.2 | 0.4 | 20 | 0.6 | 19 | 20 | 1.05 | 0.810 | 0.766 | 0.769 | 0.774 | 20 | 1.05 | 0.816 | 0.788 | 0.794 | 0.793 |
20% | 0.2 | 0.5 | 10 | 0.4 | 27 | 28 | 1.04 | 0.809 | 0.785 | 0.781 | 0.785 | 27 | 1.00 | 0.803 | 0.757 | 0.764 | 0.767 |
20% | 0.2 | 0.5 | 10 | 0.6 | 25 | 25 | 1.00 | 0.801 | 0.754 | 0.751 | 0.752 | 25 | 1.00 | 0.807 | 0.809 | 0.815 | 0.817 |
20% | 0.2 | 0.5 | 20 | 0.4 | 14 | 14 | 1.00 | 0.809 | 0.759 | 0.757 | 0.762 | 14 | 1.00 | 0.817 | 0.783 | 0.788 | 0.776 |
20% | 0.2 | 0.5 | 20 | 0.6 | 13 | 13 | 1.00 | 0.816 | 0.786 | 0.789 | 0.787 | 13 | 1.00 | 0.822 | 0.789 | 0.798 | 0.794 |
Mean (ξ = 20%) | | | | | | | 1.04 | 0.811 | 0.783 | 0.782 | 0.782 | | 1.02 | 0.814 | 0.787 | 0.786 | 0.785 |
Min \|φ − φ̃\| (ξ = 20%) | | | | | | | | | 0.012 | 0.013 | 0.013 | | | | 0.001 | 0.008 | 0.010 |
Max \|φ − φ̃\| (ξ = 20%) | | | | | | | | | 0.050 | 0.052 | 0.049 | | | | 0.064 | 0.062 | 0.056 |
30% | 0.1 | 0.4 | 10 | 0.4 | 26 | 29 | 1.12 | 0.809 | 0.777 | 0.775 | 0.778 | 27 | 1.04 | 0.802 | 0.752 | 0.738 | 0.739 |
30% | 0.1 | 0.4 | 10 | 0.6 | 22 | 25 | 1.14 | 0.814 | 0.745 | 0.745 | 0.750 | 24 | 1.09 | 0.814 | 0.772 | 0.777 | 0.771 |
30% | 0.1 | 0.4 | 20 | 0.4 | 13 | 15 | 1.15 | 0.822 | 0.783 | 0.786 | 0.788 | 14 | 1.08 | 0.817 | 0.766 | 0.768 | 0.754 |
30% | 0.1 | 0.4 | 20 | 0.6 | 11 | 13 | 1.18 | 0.828 | 0.771 | 0.788 | 0.769 | 12 | 1.09 | 0.814 | 0.738 | 0.756 | 0.742 |
30% | 0.1 | 0.5 | 10 | 0.4 | 17 | 19 | 1.12 | 0.818 | 0.778 | 0.781 | 0.799 | 18 | 1.06 | 0.818 | 0.742 | 0.756 | 0.774 |
30% | 0.1 | 0.5 | 10 | 0.6 | 15 | 16 | 1.07 | 0.814 | 0.769 | 0.763 | 0.756 | 15 | 1.00 | 0.805 | 0.741 | 0.744 | 0.730 |
30% | 0.1 | 0.5 | 20 | 0.4 | 9 | 10 | 1.11 | 0.837 | 0.796 | 0.806 | 0.794 | 9 | 1.00 | 0.818 | 0.773 | 0.765 | 0.763 |
30% | 0.1 | 0.5 | 20 | 0.6 | 8 | 8 | 1.00 | 0.814 | 0.782 | 0.776 | 0.770 | 8 | 1.00 | 0.830 | 0.770 | 0.770 | 0.776 |
30% | 0.2 | 0.4 | 10 | 0.4 | 41 | 45 | 1.10 | 0.808 | 0.763 | 0.746 | 0.766 | 43 | 1.05 | 0.804 | 0.759 | 0.762 | 0.764 |
30% | 0.2 | 0.4 | 10 | 0.6 | 38 | 40 | 1.05 | 0.802 | 0.740 | 0.742 | 0.739 | 39 | 1.03 | 0.802 | 0.751 | 0.757 | 0.756 |
30% | 0.2 | 0.4 | 20 | 0.4 | 21 | 23 | 1.10 | 0.817 | 0.774 | 0.766 | 0.759 | 22 | 1.05 | 0.813 | 0.779 | 0.775 | 0.779 |
30% | 0.2 | 0.4 | 20 | 0.6 | 19 | 20 | 1.05 | 0.802 | 0.749 | 0.769 | 0.748 | 20 | 1.05 | 0.812 | 0.760 | 0.760 | 0.765 |
30% | 0.2 | 0.5 | 10 | 0.4 | 27 | 29 | 1.07 | 0.811 | 0.745 | 0.749 | 0.751 | 28 | 1.04 | 0.811 | 0.757 | 0.757 | 0.742 |
30% | 0.2 | 0.5 | 10 | 0.6 | 25 | 26 | 1.04 | 0.808 | 0.746 | 0.730 | 0.741 | 25 | 1.00 | 0.802 | 0.744 | 0.742 | 0.736 |
30% | 0.2 | 0.5 | 20 | 0.4 | 14 | 15 | 1.07 | 0.824 | 0.779 | 0.771 | 0.772 | 14 | 1.00 | 0.811 | 0.762 | 0.767 | 0.759 |
30% | 0.2 | 0.5 | 20 | 0.6 | 13 | 13 | 1.00 | 0.808 | 0.758 | 0.766 | 0.753 | 13 | 1.00 | 0.818 | 0.763 | 0.765 | 0.774 |
Mean (ξ = 30%) | | | | | | | 1.09 | 0.815 | 0.766 | 0.766 | 0.765 | | 1.04 | 0.812 | 0.758 | 0.760 | 0.758 |
Min \|φ − φ̃\| (ξ = 30%) | | | | | | | | | 0.032 | 0.031 | 0.019 | | | | 0.034 | 0.037 | 0.034 |
Max \|φ − φ̃\| (ξ = 30%) | | | | | | | | | 0.069 | 0.078 | 0.067 | | | | 0.076 | 0.064 | 0.075 |
Note: ACAR = Attrition Completely At Random; AAR = Attrition At Random; ANAR = Attrition Not At Random
8. Discussion
The proposed replacement strategies reflected in N2(u) (10) and N2(l) (11) are shown to be very effective, resulting in nearly unbiased statistical power when the subject-specific slopes are assumed to be fixed, despite the fact that the empirical power estimates were obtained from simulations using maximum likelihood estimates with unknown variances. Furthermore, under this fixed slope model, the finding that R(l) (13) in many cases and R(u) (12) in some cases are 1.0 for both ξ = 20% and 30% implies that no additional recruitment of study subjects may be necessary in those cases. This is because in those cases the statistical power φ (5) under the assumption of no attrition might be substantially greater than 0.8 with “integer” values of N2 (6).
On the other hand, when the subject-specific slopes are assumed to be random, the replacement strategy yields underestimated statistical power, that is, the magnitudes of empirical power with sample sizes N2(u) (10) and N2(l) (11) are smaller than those of the approximate power φ(u) (8) and φ(l) (9), respectively. The underestimation was more severe for the greater attrition rate. Both R(u) and R(l) are close to 1.0, and less than 1.1 in almost all cases, regardless of the attrition rates because N2 (6) under no attrition is large (compared to that under the fixed slope model) due to the additional variance of the subject-specific slopes. If the empirical statistical power were closer to the unknown “true” statistical power under attrition than the approximate power (8) and (9), then these approximations might overestimate statistical power under random slope models. Although the empirical statistical power is more likely close to the unknown true statistical power, potential sources of the discrepancy between the empirical and the approximate power are unknown. For example, it could be due to inaccuracy of the approximate power, or due to potential loss of power stemming from the unknown variance assumptions in empirical power estimates, or due to both. Regardless, some additional adjustments to N2(u) and N2(l) seem necessary to yield unbiased statistical power under the random slope model. The adjustment does not appear to be substantial given that the underestimation is about 5 percentage points on average. For example, the sample sizes N2(u) and N2(l) can be used as a lower reference bound for conducting empirical simulations iteratively by slightly increasing the number of subjects per cluster until the empirical power (14) reaches the desired statistical power.
In either the fixed or the random slope model, nevertheless, multiplication of N2 (6) by 1+ξ/(1 − ξ) would result in an over-powered study design. However, it is surprising that the performance of the replacement strategy did not depend on attrition mechanisms in view of the empirical power. In particular, the empirical power under ANAR was virtually identical to those under ACAR and AAR regardless of models, attrition rates, and distribution of attrition times. Although we assume monotone attrition patterns and the considered scenarios of the three attrition mechanisms are somewhat limited, this finding implies that the replacement strategy is robust against unknown attrition mechanisms.
Several limitations should be considered when the derived approximate sample size determinations are applied. First, model (1) requires a minimum number of parameters in a class of models for longitudinal cluster trial data and thus may not necessarily reflect a real situation. For instance, when the subject-level random intercepts uj(i) and random slopes νj(i) are correlated, the approximate sample size determination might be even more biased. Therefore, if pilot data were available, testing significance of the correlation in addition to testing significance of the variance of random slopes would be important to determine whether to apply the derived sample size formulas. Second, the attrition process may not necessarily be monotone and the distribution of attrition times may not be uniform or linear. Third, real attrition mechanisms are usually unknown in practice and may not necessarily be a function of the quartile grouping of the outcome variable. Theoretical derivations, if possible, of sample size determinations that address all of these concerns and minimize bias deserve future study.
In conclusion, even though their application might be limited in real practice, the closed form approximate sample size formulas N2(u) and N2(l) should be useful for designing a cluster randomized trial in which testing slope differences is a primary goal. If the subject-specific slopes are homogeneous, the approximate determinations should be accurate and unbiased. Otherwise, some adjustments, albeit not substantial ones, are needed to secure adequate statistical power.
Acknowledgments
We are grateful to an anonymous referee for valuable comments that improved the contents of our manuscript. The present study was supported in part by the Einstein-Montefiore Center for AIDS research grant P30AI51519.
References
- Ahn C, Tonidandel S, Overall JE. Issues in use of SAS PROC.MIXED to test the significance of treatment effects in controlled clinical trials. Journal of Biopharmaceutical Statistics. 2000;10:265–286. doi: 10.1081/BIP-100101026.
- Alexopoulos GS, Katz IR, Bruce ML, et al. Remission in depressed geriatric primary care patients: A report from the PROSPECT study. American Journal of Psychiatry. 2005;162:718–724. doi: 10.1176/appi.ajp.162.4.718.
- Cohen J. Statistical Power Analysis for the Behavioral Sciences. Hillsdale, NJ: Lawrence Erlbaum Associates; 1988.
- Dietrich AJ, Oxman TE, Williams JW, et al. Re-engineering systems for the treatment of depression in primary care: cluster randomised controlled trial. British Medical Journal. 2004;329:602–605. doi: 10.1136/bmj.38219.481250.55.
- Hedeker D, Gibbons RD. Longitudinal Data Analysis. Hoboken, NJ: Wiley; 2006.
- Heo M, Leon AC. Sample size requirements to detect an intervention by time interaction in longitudinal cluster randomized clinical trials. Statistics in Medicine. 2009;28:1017–1027. doi: 10.1002/sim.3527.
- Heo M, Papademetriou E, Meyers BS. Design characteristics that influence attrition in geriatric antidepressant trials: meta-analysis. International Journal of Geriatric Psychiatry. 2009;24:990–1001. doi: 10.1002/gps.2211.
- Heo M, Xue XN, Kim MY. Sample size requirements to detect an intervention by time interaction in longitudinal cluster randomized clinical trials with random slopes. Computational Statistics and Data Analysis. doi: 10.1016/j.csda.2012.11.016. (in press)
- Laird NM, Ware JH. Random-effects models for longitudinal data. Biometrics. 1982;38:963–974.
- Little RJA, Rubin DB. Statistical Analysis with Missing Data. 2nd ed. Hoboken, NJ: Wiley; 2002.
- Longford NT. Random Coefficient Models. New York: Oxford University Press; 1993.
- Murray DM, Blitstein JL, Hannan PJ, Baker WL, Lytle LA. Sizing a trial to alter the trajectory of health behaviours: Methods, parameter estimates, and their application. Statistics in Medicine. 2007;26:2297–2316. doi: 10.1002/sim.2714.
- Overall JE, Ahn C, Shivakumar C, Kalburgi Y. Problematic formulations of SAS PROC.MIXED models for repeated measurements. Journal of Biopharmaceutical Statistics. 1999;9:189–216. doi: 10.1081/BIP-100101008.
- Roy A, Bhaumik DK, Aryal S, Gibbons RD. Sample size determination for hierarchical longitudinal designs with differential attrition rates. Biometrics. 2007;63:699–707. doi: 10.1111/j.1541-0420.2007.00769.x.