Summary.
In longitudinal studies comparing two treatments over a series of common follow-up measurements, there may be interest in determining whether a treatment difference exists at any follow-up period, particularly when the treatment effect over time may be non-monotone. To evaluate this question, Jeffries and Geller (2015) examined a number of clinical trial designs that allowed adaptive choice of the follow-up time exhibiting the greatest evidence of treatment difference in a group sequential testing setting with Gaussian data. The methods are applicable when a small number of measurements are taken at prespecified follow-up periods. Here, we test the intersection null hypothesis of no difference at any follow-up time versus the alternative that there is a difference for at least one follow-up time. Results of Jeffries and Geller (2015) are extended by considering a broader range of modeled data and the inclusion of covariates using generalized estimating equations. Testing procedures are developed to determine the set of follow-up times exhibiting a treatment difference while accounting for multiplicity in follow-up times and interim analyses.
Keywords: Generalized estimating equations, Generalized linear models, Group sequential design, Longitudinal analysis
1. Introduction
Clinical trials that regularly record measurements over time may be used to determine if there exist differences between treatment arms during the follow-up periods. It may be of interest to know which, if any, period shows the most evidence of a difference and/or which of a number of potential periods show a difference. As an example of the first question, consider a potential therapy for which there is little prior knowledge as to how long after the intervention is given the benefit may be most apparent. The second question may address how long any benefit may last. As a clinical example, we examine quality of life measurements for a standard and an experimental intervention in heart failure. We are particularly interested in settings where an intervention's effect may be non-monotonic and not easily summarized by a single measure such as the slope or area under the curve (AUC). Additionally, there may be interest in stopping the trial early if an interim analysis reveals that important differences exist, and the methodology incorporates this option.
Group sequential analysis for longitudinal studies is not new. Armitage, Stratton, and Worthington (1985) compared the cumulative sum of normally distributed longitudinal measurements taken at a common set of equally spaced follow-up times. Entry was assumed to occur simultaneously, and the authors developed models incorporating autocorrelated within-person error terms and provided approximate values for adjusted significance levels. This work was broadened by Geary (1988), who developed a four-parameter model that included the Armitage, Stratton, and Worthington (1985) model as a submodel but retained the same restrictive assumptions of normality, simultaneous entry, and a common set of follow-up times.
Lee and DeMets (1991) extended this work with a linear mixed model approach that allowed for staggered entry and a different number of follow-up times among individuals. The change over time was assumed to follow a simple parametric pattern, for example, linear or quadratic growth. Wu and Lan (1992) and Kittelson, Sharples, and Emerson (2005) used generalizations of area under the response curve formed by an individual’s responses as a summary measure instead of the sum or fitted slope parameter. Gange and DeMets (1996) developed group sequential testing for the generalized estimating equations setting and this work was extended to general covariate settings by Jennison and Turnbull (1997).
Jeffries and Geller (2015) presented a number of adaptive/flexible designs for a group sequential setting for normally distributed data in a two-arm randomized trial without using summary measures or prescribing a specific parametric form for the responses over follow-up times. That work is expanded here by considering a generalized estimating equations (GEE) approach that allows for more general response models and the inclusion of covariates. We compare a GEE-based method using the distribution of a max statistic to more conventional approaches for detecting a difference between longitudinal profiles with group sequential testing. In addition, we present an approach to determine which follow-up times show differences at the interim and/or final planned analysis that protects familywise error in the strong sense.
2. Model Description
Consider a trial that randomizes up to a predetermined number of participants, N, to either a control or experimental arm, with accrual occurring over a broad time period. Further, the study collects follow-up measures of an outcome of interest on K occasions, for example, every 6 months for 3 years so that K = 6. In addition, suppose M analyses are conducted (1 final analysis and M − 1 interim analyses) in which treatment differences between the two arms will be assessed at the K different follow-up periods. Let δk parameterize the true difference between the experimental and control arms at the kth follow-up period. Depending upon the data structure, δk could represent the difference in mean responses, the log of an odds ratio, a function of regression model coefficients, or other measures of difference. Let Zk(Nmk) denote an asymptotically normally distributed test statistic used to test the null hypothesis H0k : δk ≤ 0. Here, Nmk ≤ N is the total number of individuals providing data for the test of treatment difference at the kth follow-up time for the mth analysis. Initially, we consider one-sided tests where a higher response is desirable. Zk(Nmk) could arise from a number of testing approaches, for example, a t-test, a comparison of proportions, or a contrast from a regression model. We assume that if δk = 0 then Zk(Nmk) has a standard normal distribution; otherwise, if δk > 0 (< 0), then Zk(Nmk) is still normally distributed but E{Zk(Nmk)} > 0 (< 0).
The intersection null hypothesis of interest is H0 = ⋂k H0k : δk ≤ 0. Jeffries and Geller (2015) presented a number of approaches for testing the intersection null in this situation with one interim and a final analysis. Here, we focus on an extension of a method presented there.
Let s(t) denote a spending function for allocating Type I error where 0 ≤ t ≤ 1 denotes the study's information time. It is anticipated that as many as N participants may provide data for K follow-up measurements. For the first interim analysis define α(1) = s(t1) where t1 = the total number of follow-up measurements observed at the interim analysis divided by the total number of follow-up measurements expected if the trial is not stopped early. Let N1k, k = 1, …, K denote the number of observed measurements for the kth follow-up time at the first interim analysis. The interim analysis test statistics Z1(N11), …, ZK(N1K) are available; let Σ(1) denote the true K × K correlation matrix for Z1(N11), …, ZK(N1K). Then, given α(1), we can find a threshold c(1) such that
α(1) = 1 − ∫_{−∞}^{c(1)} ⋯ ∫_{−∞}^{c(1)} fK{z, Σ(1)} dz,    (1)
where fK{·, Σ(1)} denotes a multivariate normal distribution with mean vector 0 and known correlation matrix Σ(1). Here, H00 corresponds to the intersection hypothesis with δk = 0 for all k = 1, …, K. In practice Σ(1) is estimated, say by Σ̂(1), at the time of the interim analysis and the multivariate normal integration (Genz et al., 2012) may be embedded within a root-finding function to find c(1) given α(1) and the estimated correlation. Then, we reject the intersection null hypothesis H0 = ⋂k H0k : δk ≤ 0 at the first interim analysis if maxk Zk(N1k) ≥ c(1).
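As a concrete illustration of this computation, the following R sketch finds c(1) with the mvtnorm package (Genz et al., 2012) cited above. The correlation matrix Sigma1 is a hypothetical stand-in for the estimated Σ̂(1), and the spending function s(t) = 0.05t³ with t1 = 0.62 is borrowed from the simulation settings of Section 3.

```r
library(mvtnorm)

## Hypothetical estimated correlation of Z_1, Z_2, Z_3 at the interim look
Sigma1 <- matrix(c(1.0, 0.6, 0.4,
                   0.6, 1.0, 0.6,
                   0.4, 0.6, 1.0), nrow = 3)
alpha1 <- 0.05 * 0.62^3          # s(t) = 0.05 t^3 evaluated at t1 = 0.62

## c(1) solves P{max_k Z_k < c(1) | H00} = 1 - alpha(1); this is the
## equicoordinate quantile of a mean-zero multivariate normal, which
## qmvnorm obtains by embedding pmvnorm in a root finder.
c1 <- qmvnorm(1 - alpha1, tail = "lower.tail", corr = Sigma1)$quantile
```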
If we do not reject the intersection null hypothesis at the first interim analysis then a second analysis is conducted where we evaluate test statistics Zk(N2k), k = 1, …, K, where N2k indicates the number of observations available for the kth follow-up time at the second analysis. As before, calculate t2 as the total number of observations available at the second analysis divided by the total number of expected observations if the trial continues to its planned conclusion, and α(2) = s(t2), the cumulative amount of Type I error spent through the second analysis. Now reject the same intersection null hypothesis if maxk Zk(N2k) ≥ c(2), where c(1) is as computed at the first analysis and c(2) satisfies
Pr{ maxk Zk(N1k) < c(1), maxk Zk(N2k) ≥ c(2) | H00 } = α(2) − α(1),    (2)

which may be written as

∫_{−∞}^{c(1)} ⋯ ∫_{−∞}^{c(1)} ∫_{−∞}^{∞} ⋯ ∫_{−∞}^{∞} f2K{z, Σ(2)} dz − ∫_{−∞}^{c(1)} ⋯ ∫_{−∞}^{c(1)} ∫_{−∞}^{c(2)} ⋯ ∫_{−∞}^{c(2)} f2K{z, Σ(2)} dz = α(2) − α(1),    (3)

where the first K integrals in each term correspond to the interim statistics and the last K to the final statistics,
and Σ(2) is the true 2K × 2K correlation matrix of the test statistics from the first and second analyses and f2K{·, Σ(2)} denotes a multivariate normal distribution with mean vector 0 and correlation matrix Σ(2). Given α(2), an estimate of Σ(2), and c(1) computed at the first analysis, one can again use a root-finding function and multivariate integration to determine c(2). The methodology could be extended to multiple interim analyses; in general, given α(1), …, α(m), c(1), …, c(m−1), and an estimate of the correlation matrix Σ(m), one can compute a threshold c(m) for maxk Zk(Nmk) using integration like that in equation (2) which satisfies

Pr{ maxk Zk(N1k) < c(1), …, maxk Zk(Nm−1,k) < c(m−1), maxk Zk(Nmk) ≥ c(m) | H00 } = α(m) − α(m−1).
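Continuing the sketch above, the second-stage threshold can be found by root-finding over the 2K-dimensional integral in (2) and (3). Here Sigma2 denotes a hypothetical 2K × 2K estimated correlation matrix and c1 the first-stage threshold from the earlier sketch.

```r
## Solve alpha(2) - alpha(1) = P{interim max < c1}
##   - P{interim max < c1, final max < c2} for c2, under H00.
find_c2 <- function(alpha2, alpha1, c1, Sigma2, K) {
  interim_below <- pmvnorm(lower = rep(-Inf, K), upper = rep(c1, K),
                           corr = Sigma2[1:K, 1:K])[1]
  joint_below <- function(c2)
    pmvnorm(lower = rep(-Inf, 2 * K),
            upper = c(rep(c1, K), rep(c2, K)), corr = Sigma2)[1]
  uniroot(function(c2) interim_below - joint_below(c2) - (alpha2 - alpha1),
          interval = c(0, 10))$root
}
```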
In Figure 1, we illustrate the ideas and notation in a setting with 6, 9, and 12 month follow-up periods and a maximum sample size of N = 400. Accrual occurs uniformly and an interim analysis is planned after data from the first 17 months of study time are available. At study month 17, 6 month follow-up data are available for the subjects enrolled during the first 11 months, so N11 = 183 and Z1(N11) = Z1(183). The test statistic based on the 9 month follow-up data is denoted Z2(133) as about 133 individuals are expected to provide 9 month data at the interim analysis. Similarly, the test statistic at the interim analysis for 12 month data is Z3(83). The statistic maxk Zk(N1k) is calculated and, if sufficiently large, one concludes there is significant evidence against the null hypothesis of no positive treatment difference at any follow-up period. In Section 4, we consider which follow-up periods show sufficient evidence of a treatment difference, taking into account multiple comparisons.
Figure 1.
Temporal availability of test statistics.
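The counts in Figure 1 can be reproduced under uniform accrual; the 24 month accrual period below is an assumption chosen to be consistent with the stated values (the text states only that accrual is uniform).

```r
## Expected numbers of k-th follow-up measurements at study month 17,
## assuming N = 400 enrolled uniformly over 24 months (an assumed
## accrual length consistent with the counts quoted in the text).
N <- 400; accrual_months <- 24; look_month <- 17
followup <- c(6, 9, 12)
round(N * pmax(0, pmin(1, (look_month - followup) / accrual_months)))
## [1] 183 133  83
```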
If maxk Zk(N1k) does not exceed the threshold c(1) then accrual continues and the second statistic maxk Zk(N2k) is computed and compared to c(2). Web Appendix B describes a straightforward extension of these ideas for testing two-sided hypotheses, that is, H0 = ⋂k H0k : δk = 0.
2.1. GEE Context
Here, we present the testing procedures within the context of generalized estimating equations (GEE) with one interim and a final analysis, although the results may be generalized to multiple interim analyses. The notation is similar to the development in Liang and Zeger (1986). Let Yik, k = 1, …, K, i = 1, …, N denote the kth follow-up measurement of interest from the ith individual randomized. We assume that marginally Yik has a generalized linear model structure with density/mass function

f(yik; θik, ϕ) = exp[{yik θik − a(θik)}/ϕ + b(yik, ϕ)],
where θik = h(ηik), ηik = xik^T β, and xik^T denotes the transpose of the p × 1 vector xik. Standard assumptions yield E(Yik) = a′(θik) and Var(Yik) = ϕ a″(θik), where ′ and ″ denote first and second derivatives with respect to θik and ϕ is a scale parameter. Here h(·) connects the linear predictor ηik = xik^T β to θik. It is to be understood that the expectations and variances in this GEE context are conditional upon the covariates xik.
The GEE approach is well suited to modeling correlated data from an individual. Let ki ≤ K denote the number of observed follow-up measures for the ith individual when an analysis is conducted. Define Yi = (Yi1, …, Yiki)^T, Xi as the ki × p covariate matrix with rows xik^T, and θi = (θi1, …, θiki)^T. The within person variability is modeled by

Vi = ϕ Ai^{1/2} Ri(γ) Ai^{1/2},
where Ai is a ki × ki diagonal matrix with a″(θik) on the diagonal and Ri(γ) is a modeled correlation matrix parameterized by γ. Vi is the "working" covariance matrix. With a working assumption of independence, Ri is the identity matrix. Alternatively, with a small set of follow-up times, for example K = 3, it is reasonable to use an unstructured correlation matrix R with elements that can be estimated. In this case, γ corresponds to the K(K − 1)/2 off-diagonal correlations ρjk, 1 ≤ j < k ≤ K.
Given working covariance parameters estimated by γ̂ and ϕ̂, the GEE estimates β̂ are defined as solutions to the estimating equations

∑_{i=1}^N Di^T Vi^{−1} {Yi − a′(θi)} = 0,

where Di = ∂a′(θi)/∂β is the ki × p matrix of derivatives of the mean vector with respect to β.
Although different working covariance structures yield different estimates, it is shown in Liang and Zeger (1986) that, under general conditions and assuming the mean structure is correctly specified, the different β̂ are consistent and
N^{1/2}(β̂ − β) ≈ { N^{−1} ∑_{i=1}^N Di^T Vi^{−1} Di }^{−1} { N^{−1/2} ∑_{i=1}^N Di^T Vi^{−1} (Yi − a′(θi)) }.    (4)
The RHS of (4) has an asymptotic multivariate Gaussian distribution with mean vector 0 and variance–covariance matrix
lim_{N→∞} { N^{−1} ∑_{i=1}^N Di^T Vi^{−1} Di }^{−1} { N^{−1} ∑_{i=1}^N Di^T Vi^{−1} Cov(Yi) Vi^{−1} Di } { N^{−1} ∑_{i=1}^N Di^T Vi^{−1} Di }^{−1}.    (5)
This robust sandwich variance–covariance matrix depends on the assumed form of Vi and can be estimated given β̂, γ̂, and ϕ̂. The square root of the diagonal of this matrix (divided by N), using the estimated parameters, yields the estimated robust standard errors in GEE output.
In the context of a two arm randomized clinical trial with an experimental and control arm and data recorded over K follow-up periods, we specify the following model
ηik = ∑_{j=1}^K βCj I{j = k} + Ei ∑_{j=1}^K βDj I{j = k} + β0^T Wi0.    (6)
Here the I{·} are indicator variables designating the follow-up period, Ei = 1 if the ith person randomized receives the experimental treatment as opposed to the control treatment (Ei = 0 otherwise), βCk, k = 1, …, K, denotes the effect for the control arm at the kth measurement, and βDk, k = 1, …, K, denotes the treatment–control difference at the kth follow-up time. Wi0 denotes a vector of baseline covariates measured prior to randomization and β0 is the vector of associated coefficients. Our intersection null hypothesis is H0 = ⋂k H0k : βDk ≤ 0; here βDk corresponds to δk in the previous section. This formulation does assume the follow-up measurements are taken at approximately the same common set of K time points, although no parametric assumptions are made regarding the shape of the response profile over time, for example, linear or quadratic response.
We allow for an interim analysis in which N11 ≥ N12 ≥ ⋯ ≥ N1K individuals have provided data for each of the follow-up periods and they are partitioned over the treatment and control arms so that each βCk and βDk can be estimated.
In Jeffries and Geller (2015) a covariance function was derived to estimate the K × K covariance/correlation matrix associated with the βDk estimates for normally distributed data without covariates. Here, we can use the output from GEE to extract the relevant K × K portion of the estimated covariance matrix and convert that into the required estimated correlation matrix, Σ̂(1). The information time for this interim analysis can be taken as the fraction of the eventual total expected number of responses that are observed at the time of the interim analysis.
We define Zk(N1k) = β̂Dk/se(β̂Dk), where β̂Dk and se(β̂Dk) are the estimated coefficients and estimated standard errors available from the GEE output. To estimate Σ(1), first let V̂(1) denote the p × p estimated variance–covariance matrix for the coefficients. Then the estimated correlations among the Zk(N1k) can be obtained from the K × K submatrix of

Ψ(V̂(1)) V̂(1) Ψ(V̂(1))

corresponding to the β̂Dk terms, where Ψ(V̂(1)) denotes a diagonal matrix with ψ(x) = x^{−1/2} applied to the diagonal elements of V̂(1). Using this Σ̂(1), c(1) is computed from equation (1) and the intersection null hypothesis H0 = ⋂k H0k : βDk ≤ 0 is rejected if maxk Zk(N1k) ≥ c(1).
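The following sketch shows one way these quantities can be extracted from standard software. It assumes the geepack package (where fit$geese$vbeta holds the robust coefficient covariance), a long-format data frame dat, and hypothetical variable names (y, period, trt, y0, id) in the Gaussian setting of Section 3.

```r
library(geepack)

## Fit model (6): period-specific control effects, period-specific
## treatment differences, and the baseline outcome as a covariate.
fit <- geeglm(y ~ 0 + factor(period) + factor(period):trt + y0,
              id = id, waves = period, data = dat,
              family = gaussian, corstr = "unstructured")

Vbeta <- fit$geese$vbeta                  # robust (sandwich) covariance
idx   <- grep(":trt", names(coef(fit)))   # positions of the beta_Dk terms
z     <- coef(fit)[idx] / sqrt(diag(Vbeta)[idx])   # Z_k(N_1k)
Sigma1_hat <- cov2cor(Vbeta[idx, idx])    # K x K estimated correlation
```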
For the case of M = 2, that is, only one interim and a final analysis, if we fail to reject at the interim analysis, then we continue to full accrual of data and conduct a similar analysis at the end of the study using

Zk(N2k) = β̂Dk^{(2)}/se(β̂Dk^{(2)}), k = 1, …, K,
where the (2) superscripts and “2” subscripts in the N2k terms indicate these quantities are based on all the data available at the end of the study. In the absence of missing data N2k = N for all k.
To find an appropriate cutoff at the end of the study, we need the correlation matrix, Σ(2), for all 2K variables {Z1(N11), …, ZK(N1K), Z1(N21), …, ZK(N2K)}. Note that although Σ(1) has dimension K × K, Σ(2) has dimension 2K × 2K. The upper left K × K submatrix of Σ(2) is estimated by Σ̂(1) obtained at the interim analysis. The lower right K × K submatrix of Σ(2) is estimated by GEE output obtained from the final analysis in the same way Σ(1) was estimated, that is, by the relevant K × K submatrix of

Ψ(V̂(2)) V̂(2) Ψ(V̂(2)),
where V̂(2) is the estimated variance–covariance matrix from the second analysis.
Now consider the entries of Σ(2) in rows 1 through K and columns K + 1 through 2K. A more general expression for determining these cross-analysis correlations can be obtained by first writing the approximation in equation (4) for the interim and final analyses:

N(m)^{1/2}(β̂^{(m)} − β) ≈ { N(m)^{−1} ∑_{i=1}^{N(m)} Di^{(m)T} Vi^{(m)−1} Di^{(m)} }^{−1} { N(m)^{−1/2} ∑_{i=1}^{N(m)} Di^{(m)T} Vi^{(m)−1} (Yi^{(m)} − a′(θi^{(m)})) },
where the superscript (m) = (1) or (2) indicates quantities determined at the interim or final analysis, respectively. N(m) corresponds to the maximum number of observations at any of the follow-up periods for the mth analysis; it will typically equal Nm1, the number of observations at the first follow-up period in the mth analysis. The asymptotic methods and assumptions that show (5) is the appropriate variance–covariance matrix for (4) can be used to show that, asymptotically,
Cov{ N(1)^{1/2}(β̂^{(1)} − β), N(2)^{1/2}(β̂^{(2)} − β) } → B(1)^{−1} Λ B(2)^{−1},    (7)

where B(m) = lim N(m)^{−1} ∑_{i=1}^{N(m)} Di^{(m)T} Vi^{(m)−1} Di^{(m)}, Λ = lim (N(1) N(2))^{−1/2} ∑_{i=1}^{N(1)} Di^{(1)T} Vi^{(1)−1} Cov(Yi^{(1)}, Yi^{(2)}) Vi^{(2)−1} Di^{(2)}, and the sum in Λ runs over the N(1) individuals contributing to both analyses,
where we consider N(1)/N(2) a fixed fraction < 1 as both quantities go to ∞. An estimate of this matrix can be computed with a moderate amount of programming and the output from the interim and final analyses. The resulting estimated covariance matrix can be converted into an estimated correlation matrix by appropriate division by the corresponding diagonal elements. Details of these computations are presented in Web Appendix C.
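As one plausible implementation of a plug-in estimate for (7) as reconstructed here (the paper's exact computation is in Web Appendix C), the sketch below assumes per-subject lists of derivative matrices D, working covariances V, and residual vectors S from each analysis, with subjects 1, …, N1 common to both analyses.

```r
## Plug-in estimate of the cross-analysis covariance in (7).
cross_cov <- function(D1, V1, S1, D2, V2, S2) {
  N1 <- length(D1); N2 <- length(D2)
  B1 <- Reduce(`+`, lapply(seq_len(N1), function(i)
          t(D1[[i]]) %*% solve(V1[[i]]) %*% D1[[i]])) / N1
  B2 <- Reduce(`+`, lapply(seq_len(N2), function(i)
          t(D2[[i]]) %*% solve(V2[[i]]) %*% D2[[i]])) / N2
  ## Only the N1 subjects contributing to both analyses enter the middle term
  Lam <- Reduce(`+`, lapply(seq_len(N1), function(i)
          t(D1[[i]]) %*% solve(V1[[i]]) %*% S1[[i]] %*% t(S2[[i]]) %*%
          solve(V2[[i]]) %*% D2[[i]])) / sqrt(N1 * N2)
  solve(B1) %*% Lam %*% solve(B2)
}
```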
Hence, one can construct Σ̂(2), a 2K × 2K estimated correlation matrix for Z1(N11), …, ZK(N1K), Z1(N21), …, ZK(N2K). Using the appropriate values, c(2) can be calculated using (2) and the intersection null hypothesis is rejected at the final analysis if maxk Zk(N2k) ≥ c(2). Although the development in this section was written with M = 2 analyses, the generalization to more than one interim analysis is straightforward.
As noted in Liang and Zeger (1986), this procedure works when data are missing completely at random as is the case for an interim analysis where missing data arise solely because not enough follow-up time has elapsed for some individuals. These authors also note other instances in which a weaker missing at random assumption may be sufficient, for example, if the assumed form of the working correlation matrix R is correct (as would be the case with an unstructured correlation matrix) with Gaussian or binary outcomes.
3. Simulations
Simulations for type I error and power are based on the following data generation model:
Yik = βAge Agei + βCk + βDk Ei + ϵik, k = 1, 2, 3, with Yi0 = βAge Agei + ϵi0,    (8)
where Agei and Yi0 correspond to baseline age and the baseline value of Y for the ith person. Age is transformed to have a standard normal distribution. Within person correlation was driven by correlation in the ϵik terms, which are Gaussian with mean 0 and pairwise covariance Cov(ϵij, ϵik) = exp(−|Tj − Tk|/15), where T0 = 0, T1 = 6, T2 = 9, and T3 = 12 months. Accrual occurred randomly in both arms at a uniform rate over an 18 month period. The outcome measure was assessed at baseline and at 6 months, 9 months, and 1 year after baseline. In all simulations, βCk = 0 while the βDk varied according to the simulation scenario. One interim analysis was conducted when 50% of the randomized individuals provided a measurement for the 3rd follow-up period, that is, N1K = 0.5N. A final analysis was also conducted if results for the interim analysis were not significant.
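A minimal sketch of one simulated trial under model (8) as reconstructed above is given below; the simple Bernoulli 1:1 treatment assignment and the helper name simulate_trial are illustrative choices, and mvtnorm supplies the correlated errors.

```r
library(mvtnorm)

simulate_trial <- function(N, betaAge, betaD = c(0, 0, 0.3)) {
  Tk  <- c(0, 6, 9, 12)                         # months, T0 = baseline
  Sig <- exp(-abs(outer(Tk, Tk, "-")) / 15)     # Cov(eps_ij, eps_ik)
  eps <- rmvnorm(N, sigma = Sig)
  age <- rnorm(N)                               # standardized age
  trt <- rbinom(N, 1, 0.5)                      # illustrative 1:1 assignment
  Y   <- betaAge * age + eps +
         outer(trt, c(0, betaD))                # beta_Dk * E_i, beta_Ck = 0
  data.frame(id = seq_len(N), entry = runif(N, 0, 18), age = age, trt = trt,
             y0 = Y[, 1], y6 = Y[, 2], y9 = Y[, 3], y12 = Y[, 4])
}
```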
A number of testing procedures were evaluated, each designed for a 5% error rate:
(a) A Bonferroni adjustment approach in which the change from baseline to the kth follow-up period is the outcome measure and tested via a two group t-test. The p-value threshold for each of the 3 tests at the interim analysis is s(0.62)/3, where s(t) = 0.05t^3 is the spending function and approximately 62% of the expected responses are observed given the stated accrual patterns and follow-up times. This spending function was chosen for simplicity but is otherwise arbitrary. The p-value threshold at the final analysis is given by {0.05 − s(0.62)}/3. A short sketch of these threshold computations follows this list.
(b) A max T test approach as described in Jeffries and Geller (2015). This approach uses the change from baseline to the kth follow-up period as the outcome measure and employs a two group t-test. The thresholds for significance for an interim and final analysis are based on the same ideas used to determine c(1) and c(2) here, but a different correlation matrix is required. No use is made of covariate information.
(c) A GEE approach based on model (6) with Yi0 as the only covariate.
(d) A GEE approach based on model (6) with baseline age and Yi0 as covariates.
(e) A GEE approach based on the model in (d) in which the individual statistics Zk(Nmk) are sequentially monitored, each at a Bonferroni-corrected alpha level 0.05/K. This approach includes baseline age and Yi0 as covariates but does not use the distribution of maxk Zk(Nmk).
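The Bonferroni thresholds in approach (a) follow directly from the spending function; a short sketch using the values stated above:

```r
## Per-test p-value thresholds for the Bonferroni approach (a), using
## s(t) = 0.05 t^3, interim information time t1 = 0.62, and K = 3 tests.
s <- function(t) 0.05 * t^3
K <- 3
p_interim <- s(0.62) / K                # interim, per test
p_final   <- (0.05 - s(0.62)) / K       # final, per test
c(z_interim = qnorm(1 - p_interim),     # one-sided z-score boundaries
  z_final   = qnorm(1 - p_final))
```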
Table 1 shows there is some Type I error inflation for smaller sample sizes in the non-Bonferroni approaches. The inflation can be reduced, in some cases substantially, by using a t distribution instead of a multivariate normal distribution when finding thresholds as in equations (1) and (2). As the sample size increases the error inflation dissipates. Otherwise, each approach shows appropriate Type I error control across the range of scenarios although the Bonferroni based methods are conservative, as expected. The strong agreement in numerical values across rows reflects that the same random number seeds were used to generate the data although some slight residual variation across rows still arises from the use of Monte Carlo simulation in the multivariate integration process (Genz et al., 2012) and occasional convergence difficulties. (Convergence problems occur for less than 0.01% of the simulations; in these tables a failed convergence for one method suppresses that simulation’s results for all 5 methods.) Aside from variation from convergence problems, results for the t-test methods (approaches (a) and (b)) should only vary by sample size and whether a normal or t distribution was used. Methods using the GEE will vary in addition by the working correlation structure assumption and whether an Age coefficient is included or not.
Table 1.
Type I error simulations
Max enroll per arm | Working corr. | βAge | (a) Bonferroni | (b) t-test | (c) GEE w/o age covariate | (d) GEE w/age covariate | (e) GEE w/age and Bonferroni correction |
---|---|---|---|---|---|---|---|
100 | Indep | 0 | 0.03223 | 0.05264 | 0.05385 | 0.05438 | 0.04277 |
100(t dist) | Indep | 0 | 0.02920 | 0.04916 | 0.05041 | 0.05052 | 0.04086 |
100 | Unstruc | 0 | 0.03222 | 0.05263 | 0.05370 | 0.05396 | 0.04145 |
100(t dist) | Unstruc | 0 | 0.02919 | 0.04915 | 0.05026 | 0.05047 | 0.03963 |
100 | Indep | 0.30 | 0.03223 | 0.05265 | 0.05380 | 0.05437 | 0.04277 |
100(t dist) | Indep | 0.30 | 0.02920 | 0.04916 | 0.05041 | 0.05052 | 0.04086 |
100 | Unstruc | 0.30 | 0.03222 | 0.05263 | 0.05370 | 0.05396 | 0.04145 |
100(t dist) | Unstruc | 0.30 | 0.02919 | 0.04915 | 0.05026 | 0.05047 | 0.03963 |
200 | Indep | 0 | 0.03122 | 0.05088 | 0.05211 | 0.05219 | 0.04045 |
200(t dist) | Indep | 0 | 0.02989 | 0.04896 | 0.05029 | 0.05031 | 0.03944 |
200 | Unstruc | 0 | 0.03122 | 0.05088 | 0.05216 | 0.05190 | 0.03915 |
200(t dist) | Unstruc | 0 | 0.02989 | 0.04896 | 0.04988 | 0.05015 | 0.03817 |
200 | Indep | 0.30 | 0.03122 | 0.05088 | 0.05271 | 0.05218 | 0.04045 |
200(t dist) | Indep | 0.30 | 0.02989 | 0.04896 | 0.05029 | 0.05031 | 0.03944 |
200 | Unstruc | 0.30 | 0.03122 | 0.05088 | 0.05216 | 0.05190 | 0.03915 |
200(t dist) | Unstruc | 0.30 | 0.02989 | 0.04896 | 0.04988 | 0.05015 | 0.03817 |
400 | Indep | 0 | 0.03131 | 0.05120 | 0.05221 | 0.05243 | 0.04013 |
400(t dist) | Indep | 0 | 0.03057 | 0.05025 | 0.05116 | 0.05132 | 0.03964 |
400 | Unstruc | 0 | 0.03131 | 0.05120 | 0.05226 | 0.05255 | 0.03915 |
400(t dist) | Unstruc | 0 | 0.03057 | 0.05025 | 0.05137 | 0.05159 | 0.03862 |
400 | Indep | 0.30 | 0.03131 | 0.05120 | 0.05198 | 0.05241 | 0.04013 |
400(t dist) | Indep | 0.30 | 0.03057 | 0.05030 | 0.05087 | 0.05132 | 0.03964 |
400 | Unstruc | 0.30 | 0.03131 | 0.05120 | 0.05195 | 0.05252 | 0.03915 |
400(t dist) | Unstruc | 0.30 | 0.03057 | 0.05033 | 0.05114 | 0.05160 | 0.03862 |
Note: Each scenario/row was based on 100,000 simulations. The standard error for estimated Type I error is approximately 0.0007. Scenarios differ by the number enrolled, whether a multivariate normal or t distribution was used to determine the thresholds, whether the working correlation used independence assumption or an unstructured framework, and the value of the βAge coefficient. βDk = 0 for k = 1, 2, 3. When a multivariate t distribution was used, the common degrees of freedom was based on the number of observations available for the interim analysis.
Table 2 shows power for different scenarios, and we see important differences here. The approaches based on t-tests rather than GEE models suffer a loss of power. The t-tests are based on differences of follow-up values from baseline, that is, Yi0 is subtracted from follow-up values, whereas the modeling approaches use Yi0 as a RHS covariate. These power differences are what would be expected when comparing change score approaches to analysis of covariance. Predictably, the t-test approaches (a) and (b) show no changes in power across the βAge values. The GEE approach without Age in the model shows deteriorating performance as the magnitude of the Age effect increases, reflecting increasing misspecification. When the Age effect is 0, models (c) and (d) are essentially the same. The power of the GEE models with Age as a covariate (methods (d) and (e)) does not change as the effect of Age increases; examination of the data generation model shows that increasing βAge will not change the estimates or estimated standard errors of the βDk values. All approaches show increasing power with increasing values of βD3. Method (d) generally shows superior power, demonstrating the benefit of including relevant covariate information.
Table 2.
Power for 5 approaches, 10,000 simulations
Working corr. structure | βAge | βD3 | (a) Bonferroni | (b) t-test | (c) GEE w/o age covariate | (d) GEE w/age covariate | (e) GEE w/age and Bonferroni correction |
---|---|---|---|---|---|---|---|
Unstr | 0 | 0.20 | 0.3794 | 0.4702 | 0.5863 | 0.5867 | 0.5326 |
Unstr | 0.50 | 0.20 | 0.3794 | 0.4702 | 0.5690 | 0.5870 | 0.5326 |
Unstr | 1.00 | 0.20 | 0.3794 | 0.4702 | 0.5230 | 0.5868 | 0.5326 |
Unstr | 1.50 | 0.20 | 0.3794 | 0.4702 | 0.5031 | 0.5868 | 0.5327 |
Unstr | 0 | 0.25 | 0.5628 | 0.6547 | 0.7773 | 0.7769 | 0.7364 |
Unstr | 0.50 | 0.25 | 0.5628 | 0.6547 | 0.7518 | 0.7769 | 0.7364 |
Unstr | 1.00 | 0.25 | 0.5628 | 0.6547 | 0.7125 | 0.7769 | 0.7364 |
Unstr | 1.50 | 0.25 | 0.5629 | 0.6547 | 0.6922 | 0.7768 | 0.7365 |
Unstr | 0 | 0.30 | 0.7344 | 0.8047 | 0.9061 | 0.9062 | 0.8806 |
Unstr | 0.50 | 0.30 | 0.7345 | 0.8048 | 0.8868 | 0.9062 | 0.8806 |
Unstr | 1.00 | 0.30 | 0.7344 | 0.8048 | 0.8571 | 0.9062 | 0.8806 |
Unstr | 1.50 | 0.30 | 0.7344 | 0.8048 | 0.8406 | 0.9063 | 0.8806 |
Indep | 0 | 0.20 | 0.3794 | 0.4702 | 0.5846 | 0.5848 | 0.5358 |
Indep | 0.50 | 0.20 | 0.3794 | 0.4702 | 0.5549 | 0.5847 | 0.5358 |
Indep | 1.00 | 0.20 | 0.3794 | 0.4702 | 0.5203 | 0.5847 | 0.5358 |
Indep | 1.50 | 0.20 | 0.3794 | 0.4702 | 0.5012 | 0.5847 | 0.5358 |
Indep | 0 | 0.25 | 0.5628 | 0.6547 | 0.7746 | 0.7740 | 0.7355 |
Indep | 0.50 | 0.25 | 0.5628 | 0.6547 | 0.7489 | 0.7743 | 0.7355 |
Indep | 1.00 | 0.25 | 0.5628 | 0.6547 | 0.7081 | 0.7743 | 0.7355 |
Indep | 1.50 | 0.25 | 0.5628 | 0.6547 | 0.6871 | 0.7742 | 0.7355 |
Indep | 0 | 0.30 | 0.7344 | 0.8047 | 0.9040 | 0.9040 | 0.8808 |
Indep | 0.50 | 0.30 | 0.7344 | 0.8048 | 0.8853 | 0.9040 | 0.8808 |
Indep | 1.00 | 0.30 | 0.7344 | 0.8048 | 0.8539 | 0.9040 | 0.8808 |
Indep | 1.50 | 0.30 | 0.7344 | 0.8048 | 0.8363 | 0.9040 | 0.8808 |
Note: Each scenario/row was based on 10,000 simulations. The standard error for estimated power is bounded by 0.005. A normal distribution (rather than a t distribution) was used to calculate threshold values.
The differences arising from an independence versus an unstructured working correlation are minor, except at the interim analysis. As mentioned in Liang and Zeger (1986), differences arising from working correlation assumptions may be smaller for balanced data; unbalanced data will lead to larger differences. Unbalanced data arise at the interim analysis because individuals have 1, 2, or 3 follow-up observations. When data are balanced (e.g., the final analysis when everyone has 3 follow-up observations) there is no appreciable difference between the independence and unstructured approaches. See Web Appendix A for results showing these interim analysis results and further simulations with smaller sample sizes, M = 3 analyses, K = 5 follow-up periods, and compound symmetry dictating the true correlation between ϵij and ϵik, j, k ≥ 0 in equation (8). The results suggest some care must be taken to evaluate the robustness of estimates to differences in working correlation assumptions and that the use of the t distribution may be overly conservative when sample sizes are small at the interim analysis.
The results show that utilization of a model with baseline outcomes can substantially increase power over an unmodeled approach and suggest the benefit is likely greater still if other important covariates are included.
4. Determining which Follow-Up Periods Show Differences
Thus far the methodology has focused on determining if any follow-up period shows a difference. However, there may be interest in determining which set of follow-up periods show a difference and doing so in a way that accommodates multiplicity concerns—concerns related to the number of follow-up periods as well as the number of interim and final analyses.
In the one-sided testing context of K follow-up times and M sequential analyses with cumulative alpha thresholds α(m), m = 1, …, M, we have boundary thresholds c(1), …, c(M) satisfying

Pr{ maxk Zk(N1k) < c(1), …, maxk Zk(Nm−1,k) < c(m−1), maxk Zk(Nmk) ≥ c(m) | H00 } = α(m) − α(m−1), m = 1, …, M,

with α(0) = 0.
For one-sided testing, consider Test Procedure A defined as follows. Let m′ denote the first analysis at which maxk Zk(Nm′k) ≥ c(m′). Reject H0k : δk ≤ 0 for all k with Zk(Nm′k) ≥ c(m′), that is, reject H0k for all corresponding test statistics exceeding the first crossed boundary.
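A compact sketch of the procedure: Z is a hypothetical M × K matrix of observed statistics (rows indexed by analysis) and cthr the vector of boundary thresholds c(1), …, c(M).

```r
## Test Procedure A: stop at the first boundary crossing and reject every
## H_0k whose statistic at that analysis meets or exceeds the boundary.
procedure_A <- function(Z, cthr) {
  crossed <- which(apply(Z, 1, max) >= cthr)
  if (length(crossed) == 0) return(integer(0))  # nothing rejected
  m_prime <- min(crossed)                       # first crossed boundary
  which(Z[m_prime, ] >= cthr[m_prime])          # rejected follow-up times k
}
```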
Lemma 1. Testing Procedure A controls familywise error in the strong sense.
Proof of this lemma is shown in Appendix A. A similar lemma can be constructed for two-sided testing; it is stated and proved as Lemma 2 in Web Appendix B.
5. Application: SOLVD Trial
The Studies of Left Ventricular Dysfunction (SOLVD) treatment trial was a double-blind, randomized, placebo-controlled trial to assess the effect of enalapril (an ACE inhibitor vasodilator) on mortality in a heart failure population (The SOLVD Investigators, 1991). A quality of life survey was administered at baseline, 6 weeks, 1 year, and 2 years after baseline (Rogers et al., 1994). The survey assessed each participant's overall general health self-perception and was recorded on a 5 point scale (recoded so that higher scores indicate better quality). Table 3 shows the mean and standard deviation of the self-assessment score at the various time points. Attention is restricted to those who completed the survey at all three follow-up time points. The New York Heart Association (NYHA) heart failure score is a four point measure of the degree of heart failure and was obtained as a baseline measurement in the study. Here, we use it as a baseline covariate in modeling the general health score, denoted Yik:

Yik = βCk + βDk Ei + β0 NYHAi + ϵik, k = 1, 2, 3,
where the ϵik follow a Gaussian distribution.
Table 3.
Summary statistics for General Health Self-Assessment in SOLVD. Higher scores indicate better self-reported general health. Only participants with complete data over three follow-up periods are counted here.
Time point | Placebo mean (NPlac = 514) | Placebo std dev. | Enalapril mean (NEnal = 537) | Enalapril std dev. |
---|---|---|---|---|
Baseline | 2.53 | 0.91 | 2.54 | 0.96 |
6 Weeks | 2.63 | 0.94 | 2.73 | 0.94 |
1 Year | 2.75 | 0.92 | 2.76 | 0.96 |
2 Year | 2.70 | 0.94 | 2.76 | 0.94 |
Enrollment in SOLVD occurred over a 34 month period. Here, we present an interim analysis occurring when approximately 25% of the 2 year outcome data are available. This corresponds to about 60% of 1 year, and 91% of 6 week data. This analysis is illustrative, that is, it was not performed as part of the SOLVD study, and the availability of the data at the interim analysis follows from assuming uniform accrual and entry in the order of the study ID number in the publicly available data (see Web Appendix D for the SOLVD data source). The t-statistics for the interim analysis were (t6wk, t1yr, t2yr) = (1.59, 0.80, 1.70). For the Bonferroni approach (a) the corresponding threshold was 2.74 and the threshold for the approach based on the maximum of the t-statistic was 2.73. Consequently neither the (a) nor (b) approach based on t-statistics reached significance (at a 5% level). GEE models were computed with and without the NYHA covariate. The corresponding z–statistics for the three follow-up periods without the NYHA information were (z6wk, z1yr, z2yr) = (1.91, 0.97, 1.04) while those with the covariates were (2.05, 1.00, 1.02). For both models, the threshold for the maximum of the z–statistics was 2.73. The test statistics for method (e) are those used in method (d) and the threshold value is 2.74. Consequently, none of the three GEE methods reached their thresholds for significance at the interim analysis.
A second analysis used all data available at the end of the study. The t-statistics were (t6wk, t1yr, t2yr) = (1.74, −0.07, 0.66). The Bonferroni and max t-statistic thresholds were 2.21 and 2.09, respectively, so neither approach reached significance. The z–statistics for the GEE model without NYHA were (2.06, 0.06, 0.89) and those for the model with the NYHA covariate were (2.15, 0.11, 0.90). The threshold for models (c) and (d) was 2.11 and the threshold for method (e) was 2.16. Consequently, only approach (d), the GEE approach using the NYHA information and incorporating correlation between follow-up time periods, leads to rejection of the intersection null hypothesis at the 5% level of significance and the conclusion that a significant improvement exists for the six week measure of self-perceived general health.
6. Discussion
We presented a flexible approach for analyzing longitudinal data in a group sequential setting that is especially suited for non-monotone treatment differences over time. The use of indicator variables in equation (6) allows the model to capture patterns of treatment differences that are not easily expressed by simple parametric functions or summary measures like AUC.
The approach allows for covariates, which increases power relative to change score models and approaches that do not employ covariates. The method uses existing software and is therefore relatively simple to implement. The approach should be generalizable to other settings with covariates, such as mixed-effects models. In addition, we have shown that a procedure that rejects the null hypothesis of no treatment difference for all follow-up periods with test statistics exceeding the boundary threshold will maintain familywise error in the strong sense.
It is noteworthy that the test statistics available at an interim analysis are based on different amounts of data, and earlier follow-up periods will typically have more observations. If the same magnitude of positive treatment difference exists for each of the K follow-up periods, that is, δk = δ for all k = 1, …, K, the larger observed sample size for the earlier follow-up periods will tend to produce larger test statistics. Consequently, interim analyses under this approach may tend to indicate differences at earlier follow-up periods than would be observed in an analysis at the planned study conclusion with all follow-up data available. This may be undesirable in some instances, for example, if there is interest in knowing how long a treatment difference lasts. This effect could be removed by basing interim test statistics only on a common set of individuals (e.g., those who have been in the study long enough to reach the Kth follow-up period), but this has the disadvantage of not using all available data.
Although the method was presented as if the interim analyses require sufficient accrual so that there are some data for all K follow-up periods, that does not need to be the case. For example, the first interim analysis does not require that some individuals reach the last follow-up period. In this case not all the analyses will involve K test statistics, however the notation and computations are easily altered for this situation.
Among the limitations of the approach is the assumption that the timing of planned follow-up measurements is the same for each individual; such similarity makes it easy to model nonparametric patterns with indicator functions. Also, it may be possible to sharpen the boundaries for the procedure that controls familywise error in the strong sense using ideas from Marcus, Peritz, and Gabriel (1976); further work will explore this possibility. However, this closed testing approach will entail nontrivial computational burdens if K is not small. As is often the case, designs that focus attention on the most extreme test statistics may produce parameter estimates that are subject to selection bias. Future work will explore how to address these restrictions and concerns.
7. Supplementary Materials
Web Appendix A (referenced in Section 3), Web Appendix B (referenced in Sections 2 and 4), Web Appendix C (referenced in Section 2), Web Appendix D (referenced in Section 5), and R code for conducting simulations in Section 3 are available with this article at the Biometrics website on Wiley Online Library.
Acknowledgements
This work utilized the resources of the NIH HPC Biowulf cluster. The authors are employees of the National Heart, Lung, and Blood Institute. The views expressed in this article are the authors’ and do not necessarily represent the views of the National Heart, Lung, and Blood Institute; National Institutes of Health; or the United States Department of Health and Human Services.
Appendix A
Lemma 1. Test Procedure A controls FWE strongly with level ≤ α(M).
Proof. Let τ = {i1, …, iv} denote the subset of {1, …, K} for which the one-sided null hypotheses are true, that is, δiu ≤ 0 for u = 1, …, v. Let Hτ denote the corresponding intersection null hypothesis ⋂u=1,…,v H0iu. Let δ = (δ1, …, δK). Under Hτ the v elements of δ corresponding to τ are non-positive and the remaining elements of δ are positive. Define δ* to have kth component min(δk, 0) so that δ* has only non-positive elements. The only components that differ between δ and δ* correspond to the K − v follow-up times not represented in Hτ.
Recall from Section 2 that H00 corresponds to δk = 0 for all k = 1, …, K. We denote by μk(Nmk) the mean of Zk(Nmk) for k = 1, …, K. μk(Nmk) is a function of δk, Nmk, and possible nuisance parameters, such that the sign of μk(Nmk) matches the sign of δk (and if one is zero, then the other is zero). Further define Zk*(Nmk) = Zk(Nmk) − μk(Nmk), so that the collection {Zk*(Nmk)} is multivariate normal of dimension M × K with zero means and the same correlation structure as the corresponding Zk(Nmk) values.
By definition, Test Procedure A rejects a hypothesis k′ if

Zk′(Nm′k′) ≥ c(m′), where m′ = min{m : maxk Zk(Nmk) ≥ c(m)}.
For each m = 1, …, M define Wm = maxk∈τ Zk(Nmk), that is, the maximal z–statistic among the corresponding true hypotheses at analysis stage m. Then a Type I error occurs if and only if Wm′ ≥ c(m′), where m′ denotes the first analysis in which maxk Zk(Nm′k) ≥ c(m′). We want to show Pr{Wm′ ≥ c(m′)} ≤ α(M). Because no boundary is crossed before m′, and because Zk(Nmk) ≤ Zk*(Nmk) for k ∈ τ (the means μk(Nmk) are non-positive there),

Pr{Type I error} ≤ Pr{Wm ≥ c(m) for some m}
≤ Pr{maxk Zk*(Nmk) ≥ c(m) for some m}
= ∑m=1,…,M Pr{maxk Zk*(N1k) < c(1), …, maxk Zk*(Nm−1,k) < c(m−1), maxk Zk*(Nmk) ≥ c(m)}
= ∑m=1,…,M {α(m) − α(m−1)} = α(M),

where the last two equalities follow because the Zk*(Nmk) have the H00 distribution used to define the boundaries. This demonstrates the lemma. □
References
- Armitage P, Stratton IM, and Worthington HV (1985). Repeated significance tests for clinical trials with a fixed number of patients and variable follow-up. Biometrics 41, 353–.
- Gange SJ and DeMets DL (1996). Sequential monitoring of clinical trials with correlated responses. Biometrika 83, 157–.
- Geary DN (1988). Sequential testing in clinical trials with repeated measurements. Biometrika 75, 311–.
- Genz A, Bretz F, Miwa T, Mi X, Leisch F, Scheipl F, et al. (2012). mvtnorm: Multivariate Normal and t Distributions. R package version 0.9-9992. URL http://CRAN.R-project.org/package=mvtnorm.
- Jeffries N and Geller NL (2015). Longitudinal clinical trials with adaptive choice of follow-up time. Biometrics 71, 469–.
- Jennison C and Turnbull BW (1997). Group-sequential analysis incorporating covariate information. Journal of the American Statistical Association 92, 1330–.
- Kittelson JM, Sharples K, and Emerson SS (2005). Group sequential clinical trials for longitudinal data with analyses using summary statistics. Statistics in Medicine 24, 2457–.
- Lee JW and DeMets DL (1991). Sequential comparison of changes with repeated measures data. Journal of the American Statistical Association 86, 757–.
- Liang KY and Zeger SL (1986). Longitudinal data analysis using generalized linear models. Biometrika 73, 13–22.
- Marcus R, Peritz E, and Gabriel KR (1976). On closed testing procedures with special reference to ordered analysis of variance. Biometrika 63, 655–660.
- Rogers WJ, Johnstone DE, Yusuf S, Weiner DH, Gallagher P, Bittner VA, et al. (1994). Quality of life among 5025 patients with left ventricular dysfunction randomized between placebo and enalapril: The Studies of Left Ventricular Dysfunction. Journal of the American College of Cardiology 23, 393–.
- The SOLVD Investigators (1991). Effect of enalapril on survival in patients with reduced left ventricular ejection fractions and congestive heart failure. The New England Journal of Medicine 325, 293–302.
- Wu MC and Lan KKG (1992). Sequential monitoring for comparison of changes in a response variable in clinical studies. Biometrics 48, 765–.