Abstract
The primary analysis in two‐arm clinical trials usually involves inference on a scalar treatment effect parameter; for example, depending on the outcome, the difference of treatment‐specific means, risk difference, risk ratio, or odds ratio. Most clinical trials are monitored for the possibility of early stopping. Because ordinarily the outcome on any given subject can be ascertained only after some time lag, at the time of an interim analysis, among the subjects already enrolled, the outcome is known for only a subset and is effectively censored for those who have not been enrolled sufficiently long for it to be observed. Typically, the interim analysis is based only on the data from subjects for whom the outcome has been ascertained. A goal of an interim analysis is to stop the trial as soon as the evidence is strong enough to do so, suggesting that the analysis ideally should make the most efficient use of all available data, thus including information on censoring as well as other baseline and time‐dependent covariates in a principled way. A general group sequential framework is proposed for clinical trials with a time‐lagged outcome. Treatment effect estimators that take account of censoring and incorporate covariate information at an interim analysis are derived using semiparametric theory and are demonstrated to lead to stronger evidence for early stopping than standard approaches. The associated test statistics are shown to have the independent increments structure, so that standard software can be used to obtain stopping boundaries.
Keywords: augmented inverse probability weighting, early stopping, influence function, proportion of information
1. INTRODUCTION
In many randomized clinical trials, the primary analysis involves a comparison of two treatments, typically an active or experimental agent vs a control, which is formalized as inference on a scalar treatment effect parameter. When the primary outcome is a continuous measure, this parameter is usually the difference of treatment‐specific means. For a binary outcome, the treatment effect parameter may be the risk difference, risk ratio, or odds ratio; and the odds ratio under the assumption of a proportional odds model is often the treatment effect parameter of interest in trials involving an ordinal categorical outcome. The primary analysis is ordinarily based on a test statistic constructed using an estimator for the parameter of interest, for example, the difference of sample means or maximum likelihood (ML) estimator for the odds ratio in a proportional odds model. The overall sample size is established so that the power to detect a clinically meaningful departure from the hypothesis of no treatment effect at a given level of significance using the test statistic at the final analysis achieves some desired value, for example, 90%.
Most later‐stage clinical trials are monitored for the possibility of early stopping for efficacy or futility by a data and safety monitoring board (DSMB), with interim analyses planned at either fixed, predetermined analysis times or when specified proportions of the total “statistical information” to be gained from the completed trial have accrued. 1 Ordinarily, at the time of an interim analysis, the test statistic to be used for the final analysis is computed based on the available data and compared to a suitable stopping boundary constructed to preserve the overall operating characteristics of the trial. 2 , 3
Because of staggered entry into the trial, the data available at the time of an interim analysis are from subjects who have already enrolled. Moreover, the primary outcome Y is ordinarily not known immediately but is ascertained after some lag time T, say. In some trials, the lag time is the same for all participants, as in the case where Y is a continuous outcome that will be measured at a prespecified follow-up time 𝒯, for example, 1 year, so that T = 𝒯 for all subjects, and the treatment parameter is the difference in treatment means of Y at 1 year. Here, at the time of an interim analysis, Y will be available only for subjects enrolled for at least 1 year, so that the analysis can be based only on the data for these subjects.
In other settings, the time lag may be different for different participants and, moreover, may be correlated with the outcome Y. This issue arises in many clinical trials of COVID-19 therapeutics conducted by the Accelerating COVID-19 Therapeutic Interventions and Vaccines (ACTIV) public-private partnership. In an ongoing clinical trial coordinated through the ACTIV-3b: Therapeutics for Severely Ill Inpatients with COVID-19 (TESICO) master protocol, 4 patients hospitalized with acute respiratory distress syndrome are randomized to receive an active agent or placebo and followed for up to 𝒯 = 90 days. The primary outcome Y is an ordinal categorical variable with six levels. The first five categories reflect a subject's status at 90 days following enrollment: (1) at home and off oxygen for at least 77 days (the most favorable category); (2) at home and off oxygen for at least 49 but no more than 76 days; (3) at home and off oxygen for at least 1 but no more than 48 days; (4) not hospitalized and either at home on oxygen or receiving care elsewhere; and (5) still hospitalized or in hospice care. Category 6 (the worst) corresponds to death within the 90 day follow-up period. While Categories 1-5 cannot be ascertained until a subject has been followed for the full 90 days, that a subject's outcome is Category 6 is known at the time of death. Thus, the time lag before ascertainment is T = 90 days for subjects with Y ≤ 5 and is equal to the (random) time of death for those with Y = 6. In TESICO, the treatment effect parameter is the odds ratio for active agent relative to placebo under an assumed proportional odds model. Similarly, in a clinical trial coordinated through the ACTIV-2: a study for outpatients with COVID-19 master protocol (Study A5401), 5 subjects within seven days of self-reported COVID-19 onset are randomized to receive an active agent or placebo and followed for up to 𝒯 = 28 days for the binary outcome Y, where Y = 1 if the subject dies or is hospitalized within 28 days and Y = 0 otherwise.
For subjects who die or are hospitalized prior to 28 days, Y = 1 is ascertained after a time lag T < 28 equal to the time of hospitalization or death, whereas Y = 0 can be ascertained only after the full 28 days, so that T = 28. Here, the treatment parameter is the relative risk (risk ratio) of hospitalization/death for active agent vs placebo.
At the time of an interim analysis in TESICO and A5401, the available data include the outcomes for all enrolled subjects who have been followed for at least 90 and 28 days, respectively, along with the outcomes for enrolled subjects who do not have 90 or 28 days of follow up but have already been observed to die (Y = 6) in TESICO or to be hospitalized or die (Y = 1) in A5401, so for whom T < 90 or T < 28, respectively. Thus, information on Category 6 in TESICO will accumulate more rapidly than that on the other categories; similarly, information on hospitalization/death in A5401 will accrue more quickly than information on subjects who remain alive and unhospitalized at day 28. Intuitively, basing an interim analysis on all observed outcomes will naively overrepresent Y = 6 and Y = 1, respectively, and lead to potentially biased inference on the treatment effect parameters.
To characterize this issue more precisely, if C is the time from a subject's entry into the study to the time of an interim analysis, then Y is known at the time of an interim analysis if T ≤ C. Otherwise, the time lag for ascertainment of Y is censored at C, and Y is not observed. Basing the analysis on all subjects with T ≤ C, the “complete cases” for whom Y is observed, without taking appropriate account of the fact that Y is not available for those with T > C leads to the bias noted above. This bias arises because subjects with shorter lag times are more likely to be represented among the complete cases than those subjects with longer lag times, and because T and Y may be correlated, as in TESICO and A5401, the distribution of Y among the complete cases thus will not reflect the true distribution of Y. These considerations suggest that a valid interim analysis can be obtained by using only the data from enrolled subjects followed for the full, maximum follow-up period 𝒯, that is, for whom C ≥ 𝒯. In studies like those above involving a continuous outcome or ordinal categorical outcome, as in TESICO, the standard interim analysis is based on the estimator to be used at the final analysis using only the data on subjects with C ≥ 𝒯, as there is no apparent general approach to “adjusting” for the censoring. In the case of a binary outcome as in A5401, the standard interim analysis does use the information on censoring; for example, if the treatment effect is the risk ratio, the estimator is the ratio of the treatment-specific Kaplan-Meier estimators for the probability of death or hospitalization at 𝒯 = 28 days.
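The source of this bias can be sketched numerically. The following simulation uses purely hypothetical values (a 28-day binary outcome as in A5401, a true event probability of 0.3, uniform staggered entry, an interim analysis at day 60), with events ascertained after a random lag and non-events only at day 28; the naive complete-case estimate is inflated, while the estimate restricted to subjects followed for the full 28 days is not:

```python
import random

random.seed(1)

TAU = 28          # maximum follow-up (days), as in A5401
P_EVENT = 0.3     # hypothetical true P(Y = 1)
t_interim = 60    # calendar time of the interim analysis (days)

complete_cases, fully_followed = [], []
for _ in range(200_000):
    entry = random.uniform(0, t_interim)        # staggered entry, uniform
    c = t_interim - entry                       # censoring time C = t - E
    y = 1 if random.random() < P_EVENT else 0
    # lag: events are observed at a random earlier time, non-events only at day 28
    lag = random.uniform(0, TAU) if y == 1 else TAU
    if lag <= c:                                # outcome ascertained
        complete_cases.append(y)
    if c >= TAU:                                # followed the full 28 days
        fully_followed.append(y)

naive = sum(complete_cases) / len(complete_cases)
valid = sum(fully_followed) / len(fully_followed)
print(round(naive, 3), round(valid, 3))  # naive exceeds 0.3; restricted is near 0.3
```

Events, having shorter lags, are overrepresented among the complete cases, so the naive event-rate estimate is biased upward.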
A goal of an interim analysis is to stop the trial as early as possible if there is sufficiently strong evidence to do so. It is thus natural to consider whether or not it is possible to make more efficient use of the available data at the time of an interim analysis to enhance precision and thus the strength of the evidence for stopping. One step toward increasing efficiency of interim analyses would be a general approach to accounting for censoring for any outcome and treatment effect parameter that incorporates data from all of the complete cases, not just those with C ≥ 𝒯. In addition, it may be possible to incorporate partial information; for example, in TESICO, a subject who is at day 45 since study entry and still in the hospital at the time of an analysis, so for whom T > C = 45, can end up only in Categories 3-6, so have Y = 3, 4, 5, or 6 at 90 days. There may be baseline covariates as well as intermediate measures of the outcome or other post-treatment variables that also could be exploited to increase precision at an interim analysis.
In this article, we propose a general group sequential framework for clinical trials with a possibly censored, time‐lagged outcome, which leads to practical strategies for interim monitoring. Treatment effect estimators are proposed via application of semiparametric theory, 6 , 7 which dictates how censoring can be taken into account and baseline and time‐dependent covariate information can be exploited in a principled way to increase precision and thus yield stronger evidence for early stopping. Although ordinarily incorporation of post‐randomization covariates in clinical trial analyses, for example, in models for covariate adjustment, raises concerns over causal interpretations, the use of such information in the proposed approach follows from semiparametric theory and serves only to increase efficiency. Estimation of the risk ratio via treatment‐specific Kaplan‐Meier estimators as described above emerges as a simple special case, which can be improved upon through incorporation of covariates. We show that the test statistics based on these estimators have an independent increments structure, 8 which allows standard software for constructing stopping boundaries 2 , 3 , 9 to be used. Two interim monitoring strategies are discussed: an information‐based monitoring approach under which the trial will continue, with possibly a larger sample size than originally planned, until the full, target statistical information accrues; and a fixed‐sample size approach appropriate in settings where the planned sample size cannot be increased due to resource and other constraints. We focus on the common case of two treatments; extension to more than two treatments is possible 10 and could be adapted to group sequential methods for multi‐arm trials. 11 , 12
In Section 2, we introduce the basic statistical framework and assumptions, and we sketch the estimation approach and state the independent increments property in Section 3. In Section 4, we describe practical implementation of the resulting approach to interim monitoring. We demonstrate the performance of the methods in a series of simulation studies in Section 5, and we present a case study exemplifying the use of the methods for a simulated trial based on TESICO. Technical details and sketches of proofs of results are given in the Supplemental Material.
2. STATISTICAL FRAMEWORK
2.1. General model
As in Section 1, denote the outcome by Y. Let A denote the treatment indicator, where A = 0 (1) corresponds to control (active/experimental treatment), and π = P(A = 1) is the probability of being assigned to active treatment; and let X be a vector of baseline covariates. Treatment effects are often characterized in terms of a model for (features of) the distribution of Y given A or of Y given A and X, which involves parameters θ = (ψᵀ, β)ᵀ, where β is the scalar treatment effect parameter of interest and ψ is a vector of nuisance parameters, and the model is parameterized such that β = 0 corresponds to the null hypothesis of no treatment effect.
In the case of the first example in Section 1 of continuous Y, β = E(Y | A = 1) − E(Y | A = 0); equivalently,

(1) E(Y | A) = ψ₁ + β A.
For an ordinal categorical outcome with k categories, as in TESICO with k = 6, the outcome can be represented as either a scalar random variable Y taking values 1, …, k or a random vector (Y₁, …, Y_{k−1})ᵀ, where Yⱼ = I(Y ≤ j) takes values 0 or 1. Using the first definition, the treatment effect can be defined through an assumed proportional odds model

(2) logit{P(Y ≤ j | A)} = ψⱼ + β A, j = 1, …, k − 1,
where ψ = (ψ₁, …, ψ_{k−1})ᵀ, so that β is the log odds ratio of interest. If the conditional (on X) treatment effect is of interest, replace (2) by logit{P(Y ≤ j | A, X)} = ψⱼ + ψ_Xᵀ X + β A, where now ψ = (ψ₁, …, ψ_{k−1}, ψ_Xᵀ)ᵀ. If Y is binary as in A5401 and the relative risk (risk ratio) is the focus, taking

(3) P(Y = 1 | A) = exp(ψ + β A)

corresponds to log relative risk β.
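As an illustration of how a proportional odds model of the form (2) links treatment to the cell probabilities, the following sketch uses hypothetical cutoffs ψ₁ < ⋯ < ψ₅ and log odds ratio β = 0.4 (all values invented, loosely evoking the six TESICO categories) and verifies that the log odds ratio of P(Y ≤ j) is the same β at every cutpoint:

```python
import math

def plogis(x):  # logistic CDF
    return 1.0 / (1.0 + math.exp(-x))

# hypothetical cutoffs psi_1 < ... < psi_5 and log odds ratio beta
psi = [-1.5, -0.5, 0.0, 0.8, 2.0]
beta = 0.4

def cell_probs(a):
    """P(Y = j | A = a), j = 1..6, under the proportional odds model
    logit P(Y <= j | A) = psi_j + beta * A (a hypothetical parameterization)."""
    cum = [plogis(p + beta * a) for p in psi] + [1.0]
    return [cum[0]] + [cum[j] - cum[j - 1] for j in range(1, 6)]

p0, p1 = cell_probs(0), cell_probs(1)
# the log odds ratio of P(Y <= j) is beta for every cutpoint j
for j in range(5):
    c0 = sum(p0[:j + 1]); c1 = sum(p1[:j + 1])
    lor = math.log(c1 / (1 - c1)) - math.log(c0 / (1 - c0))
    print(j + 1, round(lor, 6))  # each line prints the same value, beta = 0.4
```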
In general, estimators for the parameter of interest β in (1)-(3) and other models based on the data available at the final analysis, at which time (Y, A, X) are known for all participants, are obtained by solving, jointly in β and ψ, appropriate estimating equations. Namely, with independent and identically distributed (iid) data (Yᵢ, Aᵢ, Xᵢ), i = 1, …, n, available, and p equal to the dimension of ψ, β̂ and ψ̂ solve in β and ψ equations of the form

(4) Σᵢ₌₁ⁿ m(Yᵢ, Aᵢ; β, ψ) = 0,

where m(Y, A; β, ψ) is a (p + 1)-dimensional vector of functions such that E{m(Y, A; β₀, ψ₀)} = 0, and β₀ and ψ₀ are the true values of β and ψ under the assumption that the model is correctly specified. For example, under models (1) and (3),

(5) m(Y, A; β, ψ) = (1, A)ᵀ(Y − ψ₁ − β A)  (5a)  and  m(Y, A; β, ψ) = (1, A)ᵀ{Y − exp(ψ + β A)}  (5b),

respectively. Writing Ȳₐ, a = 0, 1, for the treatment-specific sample means, the estimator obtained from (5a) is β̂ = Ȳ₁ − Ȳ₀ and that from (5b) is β̂ = log(Ȳ₁/Ȳ₀), so that exp(β̂) = Ȳ₁/Ȳ₀ is the estimator for the relative risk used in A5401.
(6) |
where is a matrix of functions of ; the ML estimator 13 takes , where is the gradient matrix of the vector in (6) with respect to , and is its conditional covariance matrix given .
In general, given a particular model and estimating equations defined by the corresponding function m(Y, A; β, ψ), let aᵀ(β, ψ) be the last row of the matrix

(7) [E{∂m(Y, A; β, ψ)/∂θᵀ}]⁻¹,

where the matrix inside the expectation is the (p + 1) × (p + 1) matrix of partial derivatives of the components of m(Y, A; β, ψ) with respect to θ = (ψᵀ, β)ᵀ. Then, with a₀ᵀ = aᵀ(β₀, ψ₀) denoting this expression evaluated at (β₀, ψ₀),

(8) φ(Y, A) = −a₀ᵀ m(Y, A; β₀, ψ₀)

is referred to as the influence function of the corresponding estimator β̂ for β and has mean zero. From the theory of M-estimation 14 and semiparametric theory, 7 it can be shown that the estimator obtained by solving in β the estimating equation

(9) Σᵢ₌₁ⁿ aᵀ(β, ψ̂) m(Yᵢ, Aᵢ; β, ψ̂) = 0,

where ψ̂ is any root-n consistent estimator for ψ, has influence function (8). Tsiatis et al 10 show this explicitly in the case of (6). Such estimators are consistent for the true value β₀ and asymptotically normal, where the variance of the limiting normal distribution of n^{1/2}(β̂ − β₀) is equal to E{φ²(Y, A)}, so that approximate (large sample) SEs and test statistics are readily derived.
From semiparametric theory, 7 there is a one-to-one correspondence between influence functions and estimators. Thus, if the form of influence functions in a specific model involving a parameter β can be derived, estimating equations leading to estimators for β can be developed. As we demonstrate in Section 3, influence functions corresponding to estimators for β based on the data available at an interim analysis can be derived from the full data influence function (8), and the resulting estimators exploit baseline and time-dependent covariate information to gain precision.
2.2. Data and assumptions
To characterize the data that would be available at an interim analysis, we first describe more fully the data that would be available at the final analysis if the trial were carried out to completion. Subjects enter the trial in a staggered fashion; thus, if the trial starts at calendar time 0, denote by E the calendar time at which a subject enters the trial. As in Section 1, let T denote the time lag in ascertaining the outcome Y; thus, T is the time since entry at which Y is determined, measured on the scale of subject time. We assume that Y can be determined with certainty by the maximum follow-up period 𝒯 for any subject, so that P(T ≤ 𝒯) = 1. In addition to baseline covariates X, time-dependent covariate information may be collected on each participant up to the time Y is ascertained. Denote by L(u) the vector of such information at time u following entry into the study, and let L̄(u) = {L(s): 0 ≤ s ≤ u} be the history of the time-dependent covariate information through time u. Thus, L̄(T) represents the covariate history for a subject for whom Y is ascertained after time lag T.
With these definitions, for a trial with planned total sample size n, the data available at the final analysis are iid

(10) (Eᵢ, Aᵢ, Xᵢ, Tᵢ, L̄ᵢ(Tᵢ), Yᵢ), i = 1, …, n;

we refer to (10) as the full data. As in Section 2.1, estimation of β at the final analysis is based only on the data on Y, A, and possibly X (in the case of conditional inference), and E, T, and L̄(T) are not used, and we call (8) a full data influence function.
Now consider the data that would be available at an interim analysis at calendar time t following the start of the trial at calendar time 0. It proves convenient for the developments in Section 3 to represent these data in terms of the full data (10) that would be available at the final analysis were the trial to be carried out to completion. At t, data will be observed only for subjects for whom E ≤ t. For such subjects, define C = t − E to be the censoring time, that is, the time from a participant's entry into the study to the time of the interim analysis. If the time lag T a subject would have in ascertaining the outcome is such that T ≤ C, then Y would be available at t; otherwise, Y would not yet be observed. Accordingly, define Δ = I(T ≤ C) and U = min(T, C), so that Y is available at the time of the interim analysis only if Δ = 1. With these definitions, the data available at an interim analysis at calendar time t can be represented as iid

(11) {Eᵢ, Aᵢ, Xᵢ, Uᵢ, Δᵢ, L̄ᵢ(Uᵢ), ΔᵢYᵢ}, i = 1, …, n(t),

where n(t) = Σᵢ₌₁ⁿ I(Eᵢ ≤ t) is the number of subjects of the planned n enrolled in the trial by calendar time t.
As noted in Section 1, an interim analysis that uses all of the available data, including those from subjects for whom Δ = 1 but C < 𝒯, can naively overrepresent some values of the outcome over others. In terms of (11), the data on which this naive analysis would be based involve only subjects who are enrolled and whose outcome is available, that is, for whom Δ = 1. In contrast, a valid analysis that uses only the data from subjects enrolled for at least the full, maximum follow-up period involves subjects for whom C ≥ 𝒯. In the next section, we appeal to semiparametric theory as noted at the end of Section 2.1 to deduce methods yielding valid inference on β based on the available data (11) that can improve substantially on this analysis and thus lead to more efficient interim analyses.
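The representation (11) amounts to a simple masking rule applied to the full data. A minimal sketch, with invented records and writing C for the censoring time, Δ for the ascertainment indicator, and U for their minimum:

```python
from dataclasses import dataclass

@dataclass
class FullDataRecord:
    entry: float      # calendar entry time E
    arm: int          # treatment A
    lag: float        # lag time T (known here only because data are simulated)
    outcome: int      # outcome Y

def observed_at(t, subjects):
    """Data available at an interim analysis at calendar time t: only enrolled
    subjects appear, and the outcome is masked unless T <= C = t - E."""
    rows = []
    for s in subjects:
        if s.entry > t:
            continue                      # not yet enrolled
        c = t - s.entry                   # censoring time C = t - E
        delta = 1 if s.lag <= c else 0    # Delta = I(T <= C)
        u = min(s.lag, c)                 # U = min(T, C)
        y = s.outcome if delta else None  # Delta * Y: masked if censored
        rows.append((s.arm, u, delta, y))
    return rows

subjects = [FullDataRecord(0.0, 1, 28.0, 0), FullDataRecord(50.0, 0, 5.0, 1),
            FullDataRecord(58.0, 1, 28.0, 0), FullDataRecord(70.0, 0, 3.0, 1)]
print(observed_at(60.0, subjects))
# the day-58 entrant is censored (C = 2 < T = 28); the day-70 entrant is absent
```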
3. INFERENCE BASED ON INTERIM DATA
3.1. Treatment effect estimation
We first present general estimating equations using the available data (11) at an interim analysis at time t that yield treatment effect estimators offering gains in precision relative to the estimator based only on subjects for whom C ≥ 𝒯. Letting “⊥” denote statistical independence, assume that A ⊥ X, which is guaranteed by randomization. Also assume that

(12) E ⊥ {A, X, T, L̄(T), Y};

(12) implies that subjects enter according to a completely random process, which is reasonable in many trials. Because C = t − E, (12) also implies that C ⊥ {A, X, T, L̄(T), Y}. We discuss weakening these assumptions in Section 7. We also require that P(C ≥ 𝒯) > 0, so that there is positive probability of seeing subjects for whom the final outcome has been ascertained at an interim analysis at t, and so that the first interim analysis must occur at least 𝒯 time units after the start of the trial.
We first summarize the theoretical underpinnings of the practical, more efficient interim monitoring approach we propose in Section 4. Under the above assumptions, if φ(Y, A) is the influence function of a given estimator for a treatment effect parameter β in a model as in Section 2.1, so based on the full data (10), then semiparametric theory yields that influence functions for estimators for β based on the available data in (11) at an interim analysis at time t are of the form

(13) (Δ/K(T)) φ(Y, A) + (A − π) h(X) + ∫₀^𝒯 {dM_C(u)/K(u)} ℓ(u, X, A, L̄(u)).

In (13), h(X) is an arbitrary function of X; ℓ(u, X, A, L̄(u)) is an arbitrary function of u, X, A, and L̄(u); and

M_C(u) = N_C(u) − ∫₀^u Y_C(s) dΛ_C(s), N_C(u) = I(U ≤ u, Δ = 0), Y_C(u) = I(U ≥ u).

Here, K(u) = P(C ≥ u) is the survival distribution for the censoring variable at the time of the interim analysis, N_C(u) and Y_C(u) are the censoring counting process and at-risk process, and Λ_C(u) is the cumulative hazard function for censoring.
Let K̂(u) be the Kaplan-Meier estimator for K(u) based on the data (Uᵢ, 1 − Δᵢ) for i such that Eᵢ ≤ t, and define Λ̂_C(u) = −log K̂(u), the corresponding estimated subject-specific censoring martingale processes M̂_{C,i}(u), and π̂(t) = {n(t)}⁻¹ Σᵢ₌₁^{n(t)} Aᵢ, the proportion of enrolled subjects at t assigned to active treatment. Then it can be shown that estimating equations corresponding to the influence functions in (13) based on the available data (11) yielding estimators for β are of the form

(14) Σᵢ₌₁^{n(t)} [(Δᵢ/K̂(Uᵢ)) aᵀ(β, ψ̂ₜ) m(Yᵢ, Aᵢ; β, ψ̂ₜ) + (Aᵢ − π̂(t)) h(Xᵢ) + ∫₀^𝒯 {dM̂_{C,i}(u)/K̂(u)} ℓ(u, Xᵢ, Aᵢ, L̄ᵢ(u))] = 0,

where ψ̂ₜ is a consistent estimator for ψ based on the available data at t. For a specific model, corresponding full data influence function (8), and choice of the functions h and ℓ, to be discussed momentarily, an estimator β̂ₜ for β based on the data available at interim analysis time t is the solution to (14).
Taking h ≡ 0 and ℓ ≡ 0 in (14) yields the estimating equation

(15) Σᵢ₌₁^{n(t)} (Δᵢ/K̂(Uᵢ)) aᵀ(β, ψ̂ₜ) m(Yᵢ, Aᵢ; β, ψ̂ₜ) = 0,

whose solution is a so-called inverse probability weighted complete case (IPWCC) estimator, which effectively bases estimation of β on only subjects for whom Y is available at t, that is, the complete cases at t, but with inverse weighting by the censoring distribution “adjusting” appropriately for the lag time in ascertaining the outcome. In particular, intuitively, the inverse weighting of the complete cases by the probability of being represented in the available data accounts for the fact that subjects with shorter lag times are more likely to be represented, so that the weighted sample of completers mimics the distribution of Y if there were no censoring. Judicious nonzero choices of h and ℓ facilitate exploiting baseline and time-dependent covariate information to gain efficiency over the IPWCC estimator solving (15) through the two rightmost “augmentation” terms in the bracketed expression in (14), leading to what is referred to as an augmented inverse probability weighted complete case (AIPWCC) estimator for β; the optimal such choices are discussed below.
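The effect of the inverse weighting can be sketched in the simplest case of estimating a single event probability, rather than the full estimating equation (15). The simulation below uses the same hypothetical A5401-like setup as in Section 1 (all values invented), estimates the censoring distribution by Kaplan-Meier from the censored lag times, and compares the naive complete-case estimate with its inverse-weighted version:

```python
import bisect
import random

random.seed(3)
TAU, P_EVENT, t = 28.0, 0.3, 60.0

# simulate the enrolled subjects at an interim analysis at calendar time t
U, D, Y = [], [], []                    # U = min(T, C), Delta = I(T <= C), Y
for _ in range(50_000):
    c = t - random.uniform(0.0, t)      # censoring time C = t - E
    y = 1 if random.random() < P_EVENT else 0
    lag = random.uniform(0.0, TAU) if y == 1 else TAU   # lag correlated with Y
    U.append(min(lag, c)); D.append(1 if lag <= c else 0); Y.append(y)

def km_censoring(U, D):
    """Kaplan-Meier estimator of K(u) = P(C >= u); a censoring 'event' is
    Delta = 0 at U = C, while Delta = 1 observations only enter risk sets."""
    pairs = sorted(zip(U, D))
    n, s, times, surv = len(pairs), 1.0, [], []
    i = 0
    while i < n:
        u, j, events = pairs[i][0], i, 0
        while j < n and pairs[j][0] == u:
            events += 1 - pairs[j][1]
            j += 1
        if events:
            s *= 1.0 - events / (n - i)   # n - i subjects have U >= u
            times.append(u); surv.append(s)
        i = j
    def K(x):                             # product over censoring times < x
        k = bisect.bisect_left(times, x)
        return surv[k - 1] if k else 1.0
    return K

K = km_censoring(U, D)
ipwcc = sum(d * y / K(u) for u, d, y in zip(U, D, Y)) / len(U)
naive = sum(d * y for d, y in zip(D, Y)) / sum(D)
print(round(naive, 3), round(ipwcc, 3))   # naive biased upward; IPWCC near 0.30
```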
A counterintuitive result from semiparametric theory is that, for any arbitrary h and ℓ, it is possible to improve the precision of the above estimators by replacing the Kaplan-Meier estimator K̂(u) by treatment-specific Kaplan-Meier estimators K̂ₐ(u), say, obtained using the data (Uᵢ, 1 − Δᵢ) for i such that Eᵢ ≤ t and Aᵢ = a, a = 0, 1, even though because of (12) the distribution of C is not treatment dependent. This substitution leads to influence functions for estimators for β based on the available data of the form

(16) (Δ/K_A(T)) φ(Y, A) + (A − π) h(X) + ∫₀^𝒯 {dM_{C,A}(u)/K_A(u)} ℓ(u, X, A, L̄(u)),

where now K_a(u) = P(C ≥ u | A = a), a = 0, 1, and M_{C,A}(u) is defined as M_C(u) in (13) with the cumulative hazard Λ_C(u) replaced by its treatment-specific counterpart Λ_{C,A}(u). Estimating equations corresponding to (16) are then

(17) Σᵢ₌₁^{n(t)} [(Δᵢ/K̂_{Aᵢ}(Uᵢ)) aᵀ(β, ψ̂ₜ) m(Yᵢ, Aᵢ; β, ψ̂ₜ) + (Aᵢ − π̂(t)) h(Xᵢ) + ∫₀^𝒯 {dM̂_{C,Aᵢ,i}(u)/K̂_{Aᵢ}(u)} ℓ(u, Xᵢ, Aᵢ, L̄ᵢ(u))] = 0,

where now M̂_{C,Aᵢ,i}(u) is the estimated censoring martingale process based on Λ̂_{C,a}(u) = −log K̂ₐ(u), a = 0, 1. The estimating equations (17) with h ≡ 0 and ℓ ≡ 0,

(18) Σᵢ₌₁^{n(t)} (Δᵢ/K̂_{Aᵢ}(Uᵢ)) aᵀ(β, ψ̂ₜ) m(Yᵢ, Aᵢ; β, ψ̂ₜ) = 0,

yield an IPWCC estimator, and, again, nonzero choices of h and ℓ lead to an AIPWCC estimator.
When Y is a binary outcome, as in study A5401, it can be shown that the IPWCC estimator solving (18) is algebraically identical to the logarithm of the ratio of treatment-specific Kaplan-Meier estimators for the probability of death or hospitalization at 𝒯 = 28 days. Thus, as noted in Section 1, the standard estimator for the risk ratio at an interim analysis is a special case of the general formulation here. Moreover, because this estimator is equivalent to an IPWCC estimator, it should be possible to obtain more efficient inference on the risk ratio at an interim analysis via an AIPWCC estimator.
Semiparametric theory provides the optimal choices of h and ℓ yielding the most precise AIPWCC estimator solving either of (14) or (17), given by

(19) h_opt(X) = E{φ(Y, A) | A = 0, X} − E{φ(Y, A) | A = 1, X}, ℓ_opt(u, X, A, L̄(u)) = E{φ(Y, A) | T > u, X, A, L̄(u)}.

The conditional expectations in (19) are not likely to be known in practice. We propose an approach to approximating h_opt and ℓ_opt in Section 4. We recommend estimating β at an interim analysis at time t by solving an estimating equation of the form (17) with the approximations for h_opt and ℓ_opt substituted.
From semiparametric theory, estimators β̂ₜ solving estimating equations of the form (14) or (17) are consistent for β₀ (for n and n(t) large) and asymptotically normal, where, as at the end of Section 2.1, the variance of the large sample distribution of β̂ₜ can be obtained from the variance of the corresponding influence function. Thus, the resulting approximate SEs can be used to form a Wald-type test statistic Z(t) = β̂ₜ/SE(β̂ₜ), appropriate for addressing the null hypothesis of no treatment effect, H₀: β = 0.
We conclude this section by noting an important implication of these results. In the case where the full data (10) are available, as would be the case at the conclusion of the trial if not stopped early, the preceding developments lead to covariate-adjusted estimators for β based on the full data that have the potential to yield increased efficiency over the usual full data analyses outlined in Section 2.1. In particular, considering (17), if t_F is the calendar time at which the trial concludes with the full data accrued and outcomes for all subjects ascertained, then Cᵢ ≥ 𝒯, Δᵢ = 1, and M̂_{C,Aᵢ,i}(u) ≡ 0 for all i, n(t_F) = n, and K̂ₐ(u) ≡ 1, a = 0, 1, and (17) becomes

(20) Σᵢ₌₁ⁿ [aᵀ(β, ψ̂) m(Yᵢ, Aᵢ; β, ψ̂) + (Aᵢ − π̂) h(Xᵢ)] = 0,

where π̂ = n⁻¹ Σᵢ₌₁ⁿ Aᵢ, with corresponding influence function

(21) φ(Y, A) + (A − π) h(X).

As above, the optimal choice of h leading to the most precise estimator solving (20) is that given in (19). The estimating Equation (20) is of the form of those in Zhang et al. 15 Thus, the proposed approach leads naturally to estimators for a final analysis that exploit baseline covariate information to improve efficiency through the “augmentation term” (Aᵢ − π̂) h(Xᵢ).
3.2. Interim analysis
In practice, interim analyses will be carried out at times t₁ < t₂ < ⋯ < t_K, with the possibility of stopping the trial early, for example, for efficacy if evidence of a large treatment effect emerges at an interim analysis. That is, focusing on efficacy, the trial may be stopped at the first interim analysis time t_j at which the relevant test statistic Z(t_j) exceeds some appropriate stopping boundary; that is, if

|Z(t_j)| ≥ b_j, j = 1, …, K,

for a two-sided alternative, or Z(t_j) ≥ b_j or Z(t_j) ≤ −b_j, j = 1, …, K, for a one-sided alternative, where b_j, j = 1, …, K, are the stopping boundaries. As is well-studied in the group sequential testing literature, the stopping boundaries are chosen to take into account multiple comparisons and ensure that the resulting procedure preserves the desired overall type 1 error. 1, 2, 3, 9 Standard methods 2, 3, 9 for deriving stopping boundaries are based on the premise that the sequentially-computed test statistics have the so-called independent increments structure. 8, 16 In the Supplemental Material, we sketch an argument demonstrating that, with the optimal choices of h and ℓ given in (19), the proposed test statistics, properly normalized, have the independent increments structure. Owing to this property, the practical strategies for interim monitoring presented in Section 4 can be implemented using standard software for computation of stopping boundaries; in the simulations in Section 5, we use the R package ldbounds. 17
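Boundary computation itself is best left to standard software such as the R package ldbounds; the sketch below only evaluates the O'Brien-Fleming-type error spending function that such software uses, and the first boundary, which is determined by the first spending increment alone (later boundaries require the joint independent-increments distribution):

```python
import math

def Phi(z):  # standard normal CDF
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def Phi_inv(p, lo=-10.0, hi=10.0):  # bisection is plenty accurate here
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if Phi(mid) < p: lo = mid
        else: hi = mid
    return 0.5 * (lo + hi)

def obf_spend(frac, alpha=0.05):
    """Lan-DeMets O'Brien-Fleming-type spending function (two-sided):
    alpha*(tau) = 2 - 2 * Phi(Phi_inv(1 - alpha/2) / sqrt(tau))."""
    return 2.0 - 2.0 * Phi(Phi_inv(1.0 - alpha / 2.0) / math.sqrt(frac))

fractions = [0.25, 0.5, 0.75, 1.0]        # hypothetical information fractions
spent = [obf_spend(f) for f in fractions]
print([round(s, 5) for s in spent])       # cumulative error spent; ends at 0.05

# the first boundary solves P(|Z(t_1)| >= b_1) = alpha*(tau_1)
b1 = Phi_inv(1.0 - spent[0] / 2.0)
print(round(b1, 3))                       # approximately 3.92 for tau_1 = 0.25
```

Note how little error is spent early: the first boundary is very conservative, reserving most of the type 1 error for later analyses.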
4. PRACTICAL IMPLEMENTATION AND INTERIM MONITORING STRATEGIES
4.1. Treatment effect estimation
Generalizing the approach in Tsiatis et al 10 in the special case of a proportional odds model (2), we propose estimation of β at an interim analysis at time t using an AIPWCC estimator solving (17), which can be obtained via a two-step algorithm.
Assume that the treatment effect of interest is defined within a model for which, given full data at the end of the study, the estimator for β would be obtained jointly with that for ψ by solving an estimating equation of the form in (4) for a particular estimating function m(Y, A; β, ψ). Because the optimal choices h_opt and ℓ_opt in (19) are not known, we approximate them by linear combinations of basis functions. Letting f₁(X), …, f_M(X) be functions of X specified by the analyst, with f₁(X) ≡ 1, approximate h_opt(X) by

(22) h_opt(X) ≈ Σₘ₌₁^M cₘ fₘ(X).

Similarly, specify basis functions gₗ(u, X, A, L̄(u)), l = 1, …, L, of u, X, A, and L̄(u), and approximate ℓ_opt(u, X, A, L̄(u)) by

(23) ℓ_opt(u, X, A, L̄(u)) ≈ Σₗ₌₁^L dₗ gₗ(u, X, A, L̄(u)).
With suitably chosen basis functions, experience in other contexts 6 , 10 , 15 suggests that this approach can lead to AIPWCC estimators that achieve substantial efficiency gains over IPWCC estimators.
The AIPWCC estimator for β obtained by substituting (22) and (23) in (17) has influence function (16) with these same substitutions. Because from semiparametric theory the variance of the estimator depends on the variance of the influence function, as at the end of Section 2.1, we find the coefficients cₘ, m = 1, …, M, and dₗ, l = 1, …, L, that minimize this variance, which, from the form of (16), is a least squares problem, as detailed below.
With these considerations, the two-step algorithm is as follows. At an interim analysis at time t: Step 1. Estimate β and ψ by solving jointly in β and ψ

Σᵢ₌₁^{n(t)} (Δᵢ/K̂_{Aᵢ}(Uᵢ)) m(Yᵢ, Aᵢ; β, ψ) = 0

to obtain β̂ₜ^{IPW} and ψ̂ₜ; β̂ₜ^{IPW} is an IPWCC estimator solving (18). Then obtain an estimator âᵀ for a₀ᵀ. If the expectation in (7) is analytically tractable, âᵀ is the last row of (7) with β̂ₜ^{IPW} and ψ̂ₜ substituted for β and ψ; if not, take the estimator to be the last row of

[{n(t)}⁻¹ Σᵢ₌₁^{n(t)} (Δᵢ/K̂_{Aᵢ}(Uᵢ)) ∂m(Yᵢ, Aᵢ; β̂ₜ^{IPW}, ψ̂ₜ)/∂θᵀ]⁻¹.

For each subject for whom Δᵢ = 1, based on (8), construct

φ̂ᵢ = −âᵀ m(Yᵢ, Aᵢ; β̂ₜ^{IPW}, ψ̂ₜ).
Step 2. Estimate the coefficients cₘ, m = 1, …, M, and dₗ, l = 1, …, L, in the approximations (22) and (23) by “least squares,” as suggested above. Namely, for each subject for whom Eᵢ ≤ t, define the “dependent variable”

Dᵢ = (Δᵢ/K̂_{Aᵢ}(Uᵢ)) φ̂ᵢ,

where Dᵢ = 0 for subjects with Δᵢ = 0. Likewise, for each of these subjects and suitably chosen basis functions as discussed above, define the “covariates”

𝒞ₘᵢ = (Aᵢ − π̂(t)) fₘ(Xᵢ), m = 1, …, M,

𝒢ₗᵢ = ∫₀^𝒯 {dM̂_{C,Aᵢ,i}(u)/K̂_{Aᵢ}(u)} gₗ(u, Xᵢ, Aᵢ, L̄ᵢ(u)), l = 1, …, L,

where, for each i, the integral reduces to a finite sum over the jump points u ≤ Uᵢ of the estimated censoring cumulative hazard. Then obtain estimators ĉₘ, m = 1, …, M, and d̂ₗ, l = 1, …, L, by linear regression of Dᵢ on the above covariates. Based on this regression, obtain “predicted values” for each subject for whom Eᵢ ≤ t as

D̂ᵢ = Σₘ₌₁^M ĉₘ 𝒞ₘᵢ + Σₗ₌₁^L d̂ₗ 𝒢ₗᵢ.
The estimator for β is then obtained as the one-step update

(24) β̂ₜ = β̂ₜ^{IPW} − {n(t)}⁻¹ Σᵢ₌₁^{n(t)} D̂ᵢ,

and an approximate SE for β̂ₜ is given by

(25) SE(β̂ₜ) = {n(t)}⁻¹ {Σᵢ₌₁^{n(t)} (Dᵢ − D̂ᵢ)²}^{1/2}.
By an argument similar to that in the Supplementary Material of Tsiatis et al, 10 the estimator (24) is asymptotically equivalent to an AIPWCC estimator solving (17).
In some settings, scant time-dependent covariate information may be available. Here, a special case of the general AIPWCC formulation that still attempts to gain efficiency from only baseline covariates is to solve an estimating equation of the form in (17) but with h_opt as in (19) and ℓ ≡ 0. Implementation is as above, but with the “dependent variable” Dᵢ in Step 2 regressed only on the “covariates” 𝒞ₘᵢ, m = 1, …, M, to obtain estimators ĉₘ, m = 1, …, M, and by redefining D̂ᵢ = Σₘ₌₁^M ĉₘ 𝒞ₘᵢ in the one-step update (24) and its associated standard error (25). For definiteness, we refer to the resulting estimator as “AIPW1” and that incorporating time-dependent covariates above as “AIPW2.”
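A minimal sketch of the AIPW1 idea follows, with all data-generating values invented and, for brevity, weighting by the true censoring distribution rather than its Kaplan-Meier estimate. Inverse-weighted contributions for a difference-in-means treatment effect are regressed on baseline “covariates” of the form (A − π̂)f(X), and the one-step update with its residual-based SE shows the efficiency gain from a prognostic baseline covariate:

```python
import math
import random

random.seed(4)
TAU, t, n = 28.0, 60.0, 40_000

rows = []
for _ in range(n):
    a = random.randint(0, 1)
    x = random.gauss(0.0, 1.0)                          # prognostic covariate
    p = 1.0 / (1.0 + math.exp(-(x - 0.5 + 0.2 * a)))    # hypothetical model
    y = 1 if random.random() < p else 0
    lag = random.uniform(0.0, TAU) if y == 1 else TAU   # shorter lag when y = 1
    c = t - random.uniform(0.0, t)                      # censoring C = t - E
    rows.append((a, x, y, lag, c))

def K(u):                         # true P(C >= u) under uniform entry on (0, t);
    return 1.0 - u / t            # a sketch shortcut in place of the KM fit

pi_hat = sum(r[0] for r in rows) / n
D, Z1, Z2 = [], [], []
for a, x, y, lag, c in rows:
    w = (1.0 if lag <= c else 0.0) / K(lag)             # Delta_i / K(T_i)
    D.append(w * y * (a / pi_hat - (1 - a) / (1 - pi_hat)))
    Z1.append(a - pi_hat); Z2.append((a - pi_hat) * x)  # (A - pi)*1, (A - pi)*X

beta_ipw = sum(D) / n
# least squares of D on the two augmentation covariates (2x2 normal equations)
s11 = sum(z * z for z in Z1); s22 = sum(z * z for z in Z2)
s12 = sum(p * q for p, q in zip(Z1, Z2))
r1 = sum(z * d for z, d in zip(Z1, D)); r2 = sum(z * d for z, d in zip(Z2, D))
det = s11 * s22 - s12 * s12
c1, c2 = (s22 * r1 - s12 * r2) / det, (s11 * r2 - s12 * r1) / det
Dhat = [c1 * z1 + c2 * z2 for z1, z2 in zip(Z1, Z2)]

beta_aipw = beta_ipw - sum(Dhat) / n                    # one-step update
resid = [d - dh for d, dh in zip(D, Dhat)]

def se(v):
    m = sum(v) / len(v)
    return (sum((x - m) ** 2 for x in v) / len(v)) ** 0.5 / len(v) ** 0.5

se_ipw, se_aipw = se(D), se(resid)
print(round(beta_ipw, 3), round(se_ipw, 4))
print(round(beta_aipw, 3), round(se_aipw, 4))  # augmented SE is smaller
```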
4.2. Interim analysis
There is a vast literature on early stopping of clinical trials using group sequential and other methods; such methods are readily applied if the independent increments property holds. We now discuss information‐based and fixed‐sample size monitoring strategies for using these approaches with the proposed treatment effect estimators, for which, as argued in the Supplemental Material and demonstrated empirically in Section 5, the independent increments property holds exactly or approximately.
In the general information‐based monitoring approach, 1 monitoring and group sequential tests are based on the proportion of the total information to be gained from the completed trial available at interim analysis times , where in the present context information is approximated at time using the large‐sample approximate SE of the relevant estimator , . If a group sequential test is desired with type 1 error for testing and power against a clinically meaningful alternative value , say, then the maximum information required to achieve this objective at the final analysis with a two‐sided test is
where is the quantile of the standard normal distribution, and is an inflation factor to account for the loss of power that results due to repeated testing relative to doing a single final analysis. For example, the inflation factor associated with using O'Brien‐Fleming stopping boundaries 3 is modest, equal to about 1.03; see Tsiatis. 1 Information at an interim analysis at time is approximated as
Thus, the proportion of information at interim analysis time is approximated as
(26)
Given the proportion of information (26) together with, for example, the Lan‐DeMets spending function, 9 standard software can be used to obtain stopping boundaries such that the resulting group sequential testing procedure has the desired operating characteristics.
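To make these calculations concrete, the maximum‐information formula and two common Lan‐DeMets spending functions can be sketched in Python. This is a minimal illustration under our own naming and the standard textbook forms of these quantities; in practice, software such as the R package ldbounds additionally solves numerically for the stopping boundaries implied by the spent error.

```python
from math import sqrt, log, e
from statistics import NormalDist

_z = NormalDist().inv_cdf    # standard normal quantile function
_Phi = NormalDist().cdf      # standard normal CDF

def max_information(alpha, power, beta1, inflation=1.0):
    """Maximum information for a two-sided level-alpha group sequential test
    with the given power against the alternative beta1, scaled by an
    inflation factor for repeated testing (about 1.03 for O'Brien-Fleming
    boundaries)."""
    return ((_z(1 - alpha / 2) + _z(power)) / beta1) ** 2 * inflation

def obf_spending(p, alpha):
    """Lan-DeMets O'Brien-Fleming-type spending function: cumulative
    two-sided type I error spent at information proportion p."""
    return 2.0 * (1.0 - _Phi(_z(1 - alpha / 2) / sqrt(p)))

def pocock_spending(p, alpha):
    """Lan-DeMets Pocock-type spending function."""
    return alpha * log(1.0 + (e - 1.0) * p)
```

As expected, both spending functions return the full level alpha at information proportion 1, and the O'Brien‐Fleming‐type function spends almost no error early, which is what makes its boundaries conservative at the first interim analyses.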
Typically, in determining the overall sample size for a clinical trial to achieve the desired power to detect a meaningful difference at a given level of significance, the analyst must make assumptions about the values of nuisance parameters. If these assumptions are incorrect, the trial could be underpowered. An oft‐cited advantage of information‐based monitoring is that interim analyses continue until the information reaches the maximum information , guaranteeing the desired operating characteristics regardless of the values of nuisance parameters. However, if the assumptions leading to the target sample size are incorrect, the information available once this planned sample size is reached and all participants have the outcome ascertained may be less than . If evidence emerges during the trial that is unlikely to be achieved, the sample size might be reestimated and increased so that the full information threshold is met.
In many trials, however, resource constraints or other factors may make exceeding the originally planned sample size impossible, thus rendering principled information‐based monitoring infeasible. If a fixed, maximum sample size , say, is planned and inalterable, then the proportion of information available at an interim analysis at any time on which stopping boundaries can be determined must instead be based on . In terms of the data (11) available at an interim analysis at , as before, is the number of subjects who have enrolled by time ; of these subjects, is the number who have been enrolled for the maximum follow‐up period and thus have the outcome ascertained with certainty, and it is likely that . In general, a typical interim analysis at for fixed would be based only on the data from these subjects, and, accordingly, the proportion of information available at would be , with at the final analysis. However, the proposed IPWCC and AIPWCC estimators allow censoring due to the time lag in ascertaining the outcome to be taken into account and covariates to be incorporated to increase efficiency, and so make use of additional information in (11) beyond that available on only those subjects for whom the outcome has been ascertained by time . Thus, if monitoring is based on test statistics constructed from these estimators, the proportion of information available at should fall between and .
With these considerations, for fixed‐sample size monitoring, we propose characterizing the proportion of information available at an interim analysis at time in terms of what we refer to as the effective sample size , say, at . Intuitively, we define to be the number of participants, had they been enrolled for the maximum follow‐up period and had their outcome ascertained with certainty, that would be required to lead to an estimator for based only on data from such subjects with the same precision as that achieved by an IPWCC or AIPWCC estimator for based on all of the available data at . The proportion of information at would then be .
To define effective sample size formally, with subjects for whom the outcome has been fully ascertained, indexed by , consider the estimator obtained by solving in the full data estimating Equation (9) based on these subjects,
for some consistent estimator . Then, from semiparametric theory, has SE approximately equal to the square root of . The effective sample size at an interim analysis at time for the IPWCC estimator calculated using the available data (11) at is the value such that . Accordingly, we define the effective sample size when monitoring is based on the IPWCC estimator as
(27)
Because is not known, in practice we must estimate it based on the available data, which can be accomplished via the estimator
(28)
Thus, in practice, we obtain the approximate effective sample size as
(29)
The effective sample size for an AIPWCC estimator (either AIPW1 or AIPW2) calculated using the available data at via the two‐step algorithm is defined similarly, but with the full data influence function in (27) replaced by the influence function as in (21). An estimator for based on the available data is given by
(30)
where now the “predicted values” are obtained by a weighted least squares regression with “dependent variable” , “covariates” , , and “weights” . Thus, for the AIPW1 or AIPW2 estimator, is defined as in (29) with the numerator given by (30).
With the appropriate definition of , we approximate the corresponding proportion of information available at with interim analyses based on an IPWCC or AIPWCC estimator as
(31)
From (28) and (30), at the final analysis. As for information‐based monitoring, given the proportion of information (31), one can use standard software with, for example, the Lan‐DeMets spending function 9 to obtain stopping boundaries.
In the simulation studies in the next section, we study the methods under fixed‐sample monitoring, as in our experience this approach is most common in practice. Moreover, while performance of information‐based monitoring with statistics that possess the independent increments property has been well‐studied, 8 , 18 because our approach to characterizing proportion of information in fixed‐sample monitoring based on the proposed effective sample size measure is new, evaluation of its performance is required.
5. SIMULATION STUDIES
We present results from several simulation studies, each involving 10 000 Monte Carlo replications. For each simulation scenario, we considered a uniform enrollment process during the calendar time interval , with a maximum, fixed sample size and maximum follow‐up time , and fixed‐sample size monitoring with interim analyses planned at calendar times and a final analysis at time , for a total of possible analyses. For simplicity, in each scenario, we took to be the sample size required to achieve roughly 80% or 90% power for a single analysis at and did not include an inflation factor. 1 At each of the analysis times, we estimated the relevant treatment effect parameter four ways:
- (i) using the estimator obtained by carrying out the full data analysis based only on subjects enrolled for at least the maximum follow‐up period , so using subjects for whom ;
- (ii) using the IPWCC estimator based on the available data (11), obtained at Step 1 of the two‐step algorithm;
- (iii) using the AIPWCC estimator based on the available data (11), obtained at Step 2 of the two‐step algorithm using only baseline covariates to gain efficiency, as at the end of Section 4.1;
- (iv) using the AIPWCC estimator based on the available data (11), obtained at Step 2 of the algorithm using both baseline and time‐dependent covariates and .
At the final analysis at time at which the outcome has been ascertained on all subjects, and yield (versions of) the intended full data analysis; and and are identical and yield the covariate‐adjusted analysis exploiting baseline covariate information discussed at the end of Section 3.1. In all scenarios, for the null hypothesis and one‐sided alternative hypotheses and level of significance , we used the R package ldbounds 17 with a Lan‐DeMets spending function to compute both O'Brien‐Fleming 3 and Pocock 2 stopping boundaries at each analysis time , . For , this calculation was based on the proportion of information ; for each of the IPWCC and AIPWCC estimators , , and , the stopping boundaries were obtained using the approximate proportion of information (31) based on the relevant approximate effective sample size given in (29).
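The data configuration these estimators confront at an interim analysis can be sketched by simulating uniform enrollment and counting how many enrolled subjects have completed follow‐up. The function below is ours; the 602‐subject, 240‐day enrollment window and 90‐day follow‐up period are illustrative values chosen to match the shape of the first scenario.

```python
import random

def interim_counts(t, n_max=602, enroll_end=240.0, t_max=90.0, seed=2022):
    """At interim time t (days), with uniform enrollment on [0, enroll_end]:
    n_t subjects have enrolled by t, and m_t of them enrolled at least
    t_max days before t, so their outcome is ascertained with certainty;
    the remaining n_t - m_t are effectively censored."""
    rng = random.Random(seed)
    entry = [rng.uniform(0.0, enroll_end) for _ in range(n_max)]
    n_t = sum(ei <= t for ei in entry)
    m_t = sum(ei <= t - t_max for ei in entry)
    return n_t, m_t
```

At an early interim analysis, m_t is far smaller than n_t, which is precisely the gap the IPWCC and AIPWCC estimators exploit; by the final analysis all outcomes are ascertained.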
For each scenario, we present the following results from two simulation studies, one under , so with data generated with , and one under an alternative of interest :
- (i) for each estimator, Monte Carlo estimates of , , and , ; if the independent increments property holds, , ;
- (ii) for each estimator at each , the Monte Carlo mean and SD of , the Monte Carlo mean of , and the Monte Carlo mean square error (MSE) for divided by that for ;
- (iii) for each estimator and stopping boundary, the Monte Carlo proportion of data sets for which was rejected, the Monte Carlo estimate of expected sample size, and the Monte Carlo estimate of expected stopping time.
The first two simulation scenarios, demonstrating the methods for an ordinal categorical outcome and a binary outcome, respectively, are based on the TESICO study with days, using the generative models adopted by Tsiatis et al, 10 with , days, and interim analyses planned at calendar times days, with the final analysis at days. For each simulated subject, was generated as Bernoulli with , where corresponds to placebo (active agent). To produce data for which the proportional odds model (2) holds, we generated and set , where as in (2) is the log odds ratio, so that the distribution of given satisfies . For an ordinal outcome, we took for as in Table 1 of Tsiatis et al 10 and thus generated according to in which interval fell as determined by the cutpoints . Then if , so or 3, we took the time in hospital to be and the number of days at home and off oxygen as , and . If or , corresponding to or 5, again ; if , corresponding to death, time of death , where if and if . A baseline covariate was generated as , so that is independent of , correlated with , and does not affect the proportional odds model. Two time dependent covariates were generated as , where , so that if the subject was still in the hospital at time ; and , the number of days the subject was expected to be out of the hospital at day , and . For a scenario with binary outcome, we generated the data according to the foregoing scheme, except that we defined , corresponding to death, if , and otherwise.
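The proportional odds construction above can be mimicked with the standard latent‐variable device: a logistic variate shifted by the treatment effect and classified by fixed cutpoints yields an ordinal outcome whose cumulative log odds differ between arms by exactly the log odds ratio. The sketch below is our own equivalent construction, not the paper's exact generative code; the cutpoints in any call and the sign convention are illustrative.

```python
import math
import random

def ordinal_po(a, beta, cutpoints, rng=random):
    """Draw an ordinal outcome from a proportional odds model via a latent
    logistic variate: with latent = L + beta*a for L ~ Logistic(0, 1),
    P(Y <= j | A = a) = expit(c_j - beta*a), so beta is the log odds ratio
    (sign convention illustrative). cutpoints must be increasing;
    categories are 1, ..., len(cutpoints) + 1."""
    u = rng.random()
    latent = math.log(u / (1.0 - u)) + beta * a   # logistic draw, shifted
    return sum(latent > c for c in cutpoints) + 1
```

A positive beta here shifts mass toward higher categories in the treated arm; mapping the resulting categories to hospital days, death times, and so on then proceeds as described in the text.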
For the first scenario with ordinal categorical outcome, we generated data as above under the null hypothesis, so with , and , corresponding to the alternative for which TESICO was powered (80% at the final analysis with ), 10 and . Here, is the ML estimator for in (2) obtained using the R function polr in the MASS package. 19 Following Tsiatis et al, 10 to simplify implementation, we constructed , , and using the estimating function (6) with , where is chosen according to the “working independence” assumption, so that with full data at the final analysis, and are not identical. As shown by Tsiatis et al 10 and borne out in the simulations below, the efficiency loss for relative to is negligible.
Under (a) the null hypothesis and (b) the alternative, the Monte Carlo sample covariance matrices of the 10 000 estimates are
(32)
In (32), it is evident that , , demonstrating that the independent increments property holds approximately for this estimator. Analogous results for the other three estimators are given in the Supplemental Material, showing that the independent increments property holds approximately for all.
Under the null hypothesis and the alternative, Table 1 presents the Monte Carlo mean and SD, the Monte Carlo average of standard errors , and the MSE ratio defined above as the Monte Carlo MSE for divided by that for the indicated estimator. Thus, the MSE ratio reflects the efficiency of the indicated estimator relative to the usual ML estimator using data only on subjects enrolled for the maximum follow‐up period . From the table, all estimators are consistent, with SEs that track the Monte Carlo SDs, under both hypotheses. The efficiency gains over achieved at interim analyses by using any of , , and are substantial. The IPWCC estimator achieves gains solely through accounting for censoring; the AIPWCC estimators improve on these gains by additionally incorporating covariates. Notably, yields a 2‐fold gain at the initial interim analysis. For all three estimators, the efficiency gains are most pronounced at the early interim analyses, where censoring is most substantial, and diminish as censoring decreases over the course of the trial. At the final analysis, and show very similar performance, with exhibiting minimal relative loss of efficiency, as noted above. As expected, and are identical and, owing to adjustment for baseline covariates, result in a 16%‐17% gain in efficiency over the usual final analysis.
TABLE 1.

(a) Null hypothesis

Full data estimator (i) (left four columns) and IPWCC estimator (ii) (right four columns):

| t (days) | MC mean | MC SD | Ave MC SE | MSE ratio | MC mean | MC SD | Ave MC SE | MSE ratio |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 150 | 0.002 | 0.294 | 0.294 | 1.000 | 0.004 | 0.232 | 0.232 | 1.603 |
| 195 | 0.002 | 0.221 | 0.221 | 1.000 | 0.002 | 0.189 | 0.189 | 1.330 |
| 240 | 0.003 | 0.184 | 0.185 | 1.000 | 0.002 | 0.166 | 0.164 | 1.239 |
| 285 | 0.001 | 0.162 | 0.162 | 1.000 | 0.001 | 0.152 | 0.151 | 1.139 |
| 330 | 0.000 | 0.146 | 0.146 | 1.000 | 0.000 | 0.147 | 0.146 | 0.991 |

AIPW1 estimator (iii) (left four columns) and AIPW2 estimator (iv) (right four columns):

| t (days) | MC mean | MC SD | Ave MC SE | MSE ratio | MC mean | MC SD | Ave MC SE | MSE ratio |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 150 | 0.004 | 0.221 | 0.221 | 1.775 | 0.005 | 0.203 | 0.198 | 2.095 |
| 195 | 0.002 | 0.178 | 0.178 | 1.534 | 0.002 | 0.168 | 0.165 | 1.717 |
| 240 | 0.002 | 0.156 | 0.154 | 1.399 | 0.002 | 0.149 | 0.145 | 1.542 |
| 285 | 0.001 | 0.141 | 0.140 | 1.327 | 0.001 | 0.138 | 0.136 | 1.380 |
| 330 | 0.000 | 0.135 | 0.135 | 1.169 | 0.000 | 0.135 | 0.135 | 1.169 |

(b) Alternative hypothesis

Full data estimator (i) (left four columns) and IPWCC estimator (ii) (right four columns):

| t (days) | MC mean | MC SD | Ave MC SE | MSE ratio | MC mean | MC SD | Ave MC SE | MSE ratio |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 150 | 0.408 | 0.294 | 0.294 | 1.000 | 0.406 | 0.235 | 0.235 | 1.566 |
| 195 | 0.406 | 0.220 | 0.221 | 1.000 | 0.406 | 0.191 | 0.191 | 1.336 |
| 240 | 0.404 | 0.185 | 0.185 | 1.000 | 0.405 | 0.167 | 0.165 | 1.221 |
| 285 | 0.406 | 0.163 | 0.162 | 1.000 | 0.406 | 0.153 | 0.152 | 1.131 |
| 330 | 0.406 | 0.147 | 0.146 | 1.000 | 0.406 | 0.148 | 0.147 | 0.985 |

AIPW1 estimator (iii) (left four columns) and AIPW2 estimator (iv) (right four columns):

| t (days) | MC mean | MC SD | Ave MC SE | MSE ratio | MC mean | MC SD | Ave MC SE | MSE ratio |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 150 | 0.405 | 0.224 | 0.224 | 1.733 | 0.406 | 0.204 | 0.200 | 2.078 |
| 195 | 0.406 | 0.180 | 0.180 | 1.508 | 0.408 | 0.169 | 0.167 | 1.702 |
| 240 | 0.405 | 0.158 | 0.155 | 1.378 | 0.408 | 0.150 | 0.147 | 1.523 |
| 285 | 0.406 | 0.142 | 0.141 | 1.314 | 0.407 | 0.139 | 0.137 | 1.373 |
| 330 | 0.406 | 0.137 | 0.136 | 1.159 | 0.406 | 0.137 | 0.136 | 1.159 |

Note: MC mean is the mean of 10 000 Monte Carlo estimates; MC SD is the Monte Carlo SD; Ave MC SE is the mean of Monte Carlo SEs; and MSE ratio is the Monte Carlo mean square error for the full data estimator (i) divided by that for the indicated estimator.
Table 2 presents interim monitoring results using each estimator with both O'Brien‐Fleming and Pocock stopping boundaries under the null hypothesis and under the alternative . Under the null, the nominal level is achieved for all estimators. Under the alternative, power for is slightly shy of the desired 80%, as expected with no inflation factor; by comparison, the AIPWCC estimators yield improved power due to the inclusion of covariate information. Under the alternative and both types of boundaries, basing interim analyses on , , and results in substantial reductions in expected sample size and expected stopping time relative to , with the gains especially impressive for .
TABLE 2.

(a) Null hypothesis — O'Brien‐Fleming boundaries (left three columns) and Pocock boundaries (right three columns):

| Estimator | P(reject) | MC E(SS) | MC E(Stop) | P(reject) | MC E(SS) | MC E(Stop) |
| --- | --- | --- | --- | --- | --- | --- |
| Full data (i) | 0.024 | 601.9 (2.7) | 329.3 (7.5) | 0.023 | 599.7 (21.4) | 327.4 (19.5) |
| IPWCC (ii) | 0.024 | 601.6 (7.8) | 328.4 (12.1) | 0.024 | 598.8 (25.9) | 326.7 (22.4) |
| AIPW1 (iii) | 0.024 | 601.7 (6.7) | 328.5 (11.3) | 0.024 | 598.9 (25.5) | 326.8 (22.0) |
| AIPW2 (iv) | 0.024 | 601.2 (11.4) | 327.9 (14.9) | 0.027 | 598.0 (28.9) | 326.1 (24.7) |

(b) Alternative hypothesis — O'Brien‐Fleming boundaries (left three columns) and Pocock boundaries (right three columns):

| Estimator | P(reject) | MC E(SS) | MC E(Stop) | P(reject) | MC E(SS) | MC E(Stop) |
| --- | --- | --- | --- | --- | --- | --- |
| Full data (i) | 0.784 | 592.7 (31.5) | 284.2 (44.7) | 0.710 | 548.1 (85.3) | 260.5 (68.7) |
| IPWCC (ii) | 0.771 | 564.6 (62.5) | 257.1 (55.9) | 0.701 | 516.3 (100.3) | 239.0 (74.7) |
| AIPW1 (iii) | 0.836 | 562.7 (62.7) | 251.5 (53.4) | 0.774 | 508.3 (100.6) | 230.1 (72.2) |
| AIPW2 (iv) | 0.841 | 531.9 (81.7) | 231.7 (58.0) | 0.783 | 483.4 (103.5) | 215.1 (72.3) |

Note: P(reject) is the proportion of Monte Carlo data sets for which the null hypothesis was rejected; MC E(SS) is the Monte Carlo average of the number of subjects enrolled at the time the stopping boundary was crossed (SD); and MC E(Stop) is the Monte Carlo average stopping time (days) (SD). The standard error for entries for P(reject) in (a) is .
For the second scenario with binary outcome, we generated data as above under the null hypothesis and with the log odds ratio equal to 1.5, which implies a log relative risk (risk ratio) for death () of as in (3) and alternative hypothesis . We took , which corresponds roughly to 90% power to detect this alternative. The Monte Carlo sample covariance matrices of the 10 000 estimates for each of the four estimators under both the null and alternative settings are shown in the Supplemental Material and exhibit patterns analogous to those in (32), demonstrating that all estimators have approximately the independent increments property. Also shown in the Supplemental Material for each estimator at each analysis time are the Monte Carlo mean and SD, the Monte Carlo average of standard errors , and MSE ratio defined above as the Monte Carlo MSE for the estimator divided by that for under the null and alternative hypotheses. All estimators are consistent, and SEs are very close to the Monte Carlo SDs. Under both null and alternative hypotheses, the estimator , which takes censoring at interim analysis times into account and as noted previously is identical to the ratio of treatment‐specific Kaplan‐Meier estimators often used in practice, achieves substantial efficiency gains over , with a 2‐fold increase at the first interim analysis and 24% at the last at days. These estimators are equivalent, as expected, at the final analysis. The AIPWCC estimators and achieve even greater gains. Here, does not offer improved performance over ; this behavior is not surprising, as the time‐dependent covariates reflecting length of hospital stay do not provide information on death. As expected, these estimators are identical at and offer 10%‐12% gains in efficiency over the standard analysis through adjustment for the baseline covariate.
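A log odds ratio translates into a log relative risk only through the control‐arm risk, which is how the scenario's stated log odds ratio of 1.5 implies a particular alternative on the log risk ratio scale. The sketch below shows that conversion in general; the control risk used in any call is a hypothetical value, not the one implied by the paper's generative model.

```python
import math

def log_rr_from_log_or(log_or, p0):
    """Log relative risk implied by applying a log odds ratio to a
    control-arm risk p0: convert p0 to odds, multiply by exp(log_or),
    convert back to a risk, and take the ratio on the log scale."""
    odds1 = p0 / (1.0 - p0) * math.exp(log_or)
    p1 = odds1 / (1.0 + odds1)
    return math.log(p1 / p0)
```

Because risks are bounded, the implied log relative risk is always attenuated relative to the log odds ratio, increasingly so as the control risk grows.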
Table 3 shows interim monitoring results using each estimator with O'Brien‐Fleming and Pocock stopping boundaries under the null hypothesis and under the alternative . Again, overall testing procedures achieve the nominal level. Power gains over under the alternative are achieved using the AIPWCC estimators. As for the first scenario, basing interim analyses on , , and yields substantial reductions in expected sample size and stopping time over under the alternative, especially for the AIPWCC estimators.
TABLE 3.

(a) Null hypothesis — O'Brien‐Fleming boundaries (left three columns) and Pocock boundaries (right three columns):

| Estimator | P(reject) | MC E(SS) | MC E(Stop) | P(reject) | MC E(SS) | MC E(Stop) |
| --- | --- | --- | --- | --- | --- | --- |
| Full data (i) | 0.024 | 900.0 (2.3) | 329.4 (6.5) | 0.022 | 896.3 (33.6) | 327.4 (19.8) |
| IPWCC (ii) | 0.024 | 898.8 (15.8) | 328.0 (14.5) | 0.023 | 894.3 (42.3) | 326.5 (23.7) |
| AIPW1 (iii) | 0.023 | 899.0 (14.0) | 328.1 (13.8) | 0.025 | 894.1 (42.8) | 326.3 (24.3) |
| AIPW2 (iv) | 0.024 | 899.0 (14.9) | 328.0 (14.2) | 0.026 | 894.1 (42.7) | 326.2 (24.4) |

(b) Alternative hypothesis — O'Brien‐Fleming boundaries (left three columns) and Pocock boundaries (right three columns):

| Estimator | P(reject) | MC E(SS) | MC E(Stop) | P(reject) | MC E(SS) | MC E(Stop) |
| --- | --- | --- | --- | --- | --- | --- |
| Full data (i) | 0.770 | 887.5 (44.3) | 285.9 (44.0) | 0.690 | 827.2 (122.4) | 264.4 (67.2) |
| IPWCC (ii) | 0.767 | 808.3 (121.4) | 241.3 (61.3) | 0.700 | 744.9 (157.6) | 228.3 (76.7) |
| AIPW1 (iii) | 0.806 | 801.8 (122.5) | 236.1 (59.6) | 0.746 | 733.2 (157.3) | 221.4 (74.7) |
| AIPW2 (iv) | 0.809 | 799.9 (123.4) | 235.5 (59.7) | 0.748 | 731.6 (157.3) | 220.6 (74.7) |

Note: Entries are as in Table 2. The standard error for entries for P(reject) in (a) is .
The final simulation scenario involves a continuous outcome, with , weeks, and weeks, so that enrollment takes place over 3 years, with interim analyses planned at calendar times 104, 130, 156, 182 weeks and the final analysis at weeks. We generated treatment assignment as Bernoulli with , where corresponds to placebo (active agent); and a categorical baseline covariate was generated with for . With weeks, , a matrix with , and , we generated longitudinal measurements for each subject according to the linear mixed effects model , where independent of . The outcome for subject is then , the longitudinal measure at weeks. As would be likely in practice, we included in only the single baseline covariate , the value of the longitudinal measure at time 0 and did not also include , and we took the single time‐dependent covariate at time to be the most recently observed value of the longitudinal measurements . Under the null hypothesis, ; under the alternative, corresponding to , for which yields roughly 90% power at the final analysis. Results shown in the Supplemental Material demonstrate that the independent increments property holds approximately for all estimators. Here, the estimators and are identical because for all subjects, so that both are based only on subjects followed for at least weeks. SEs are obtained from the routine formula for a difference in sample means assuming common treatment‐specific variance, while follows from the IPWCC influence function; these SEs are asymptotically equivalent but differ slightly for finite samples. Incorporation of the baseline covariate yields 10%‐20% gains in efficiency; further incorporation of the last outcome carried forward as a time‐dependent covariate leads to efficiency gains for of 34% to 47%.
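Longitudinal data of this general form can be generated from a random intercept and slope model with independent measurement error. The sketch below is illustrative only: the variance components, measurement times, and the placement of the treatment effect on the slope are stand‐ins for the scenario's parameter values, which are not reproduced here.

```python
import random

def longitudinal_measures(a, beta, weeks=(0, 13, 26, 39, 52), rng=random):
    """Generate one subject's longitudinal measurements from a random
    intercept/slope linear mixed effects model; the primary outcome is the
    measurement at the final week. All numeric values are illustrative."""
    b0 = rng.gauss(5.0, 1.0)                            # random intercept
    b1 = rng.gauss(0.0, 0.05) + beta * a / weeks[-1]    # random slope + effect
    return [b0 + b1 * t + rng.gauss(0.0, 0.5) for t in weeks]
```

In this setup the baseline covariate is the measurement at week 0, and the natural time‐dependent covariate at an interim analysis is the most recently observed measurement, exactly as described in the text.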
Interim monitoring results are shown in Table 4 and are analogous to those in Tables 2 and 3. Under the null hypothesis, the Monte Carlo rejection probability for with Pocock boundaries slightly exceeds the nominal 0.025 level. Again, under the alternative, the AIPWCC estimators result in smaller expected sample sizes and earlier expected stopping times.
TABLE 4.

(a) Null hypothesis — O'Brien‐Fleming boundaries (left three columns) and Pocock boundaries (right three columns):

| Estimator | P(reject) | MC E(SS) | MC E(Stop) | P(reject) | MC E(SS) | MC E(Stop) |
| --- | --- | --- | --- | --- | --- | --- |
| Full data (i) | 0.026 | 299.9 (3.1) | 207.4 (5.5) | 0.025 | 298.6 (11.3) | 206.2 (12.7) |
| IPWCC (ii) | 0.026 | 299.8 (3.4) | 207.4 (5.8) | 0.025 | 298.6 (11.5) | 206.1 (12.9) |
| AIPW1 (iii) | 0.025 | 299.9 (2.6) | 207.5 (5.0) | 0.026 | 298.6 (11.2) | 206.2 (12.7) |
| AIPW2 (iv) | 0.026 | 299.8 (4.0) | 207.0 (7.4) | 0.029 | 298.1 (13.2) | 205.7 (14.5) |

(b) Alternative hypothesis — O'Brien‐Fleming boundaries (left three columns) and Pocock boundaries (right three columns):

| Estimator | P(reject) | MC E(SS) | MC E(Stop) | P(reject) | MC E(SS) | MC E(Stop) |
| --- | --- | --- | --- | --- | --- | --- |
| Full data (i) | 0.875 | 286.3 (26.0) | 167.8 (30.6) | 0.823 | 259.9 (45.2) | 151.9 (41.9) |
| IPWCC (ii) | 0.876 | 286.0 (26.4) | 167.4 (30.7) | 0.826 | 259.1 (45.4) | 151.1 (41.9) |
| AIPW1 (iii) | 0.930 | 286.5 (25.5) | 165.5 (28.9) | 0.896 | 255.9 (45.2) | 146.0 (39.6) |
| AIPW2 (iv) | 0.930 | 265.9 (37.3) | 146.7 (31.2) | 0.892 | 239.8 (45.3) | 133.5 (37.8) |

Note: Entries are as in Table 2. The standard error for entries for P(reject) in (a) is .
We remark that all scenarios reflect the general result that basing interim analyses on the proposed AIPWCC estimators leads to not only more efficient inferences but also, because of the increased precision, to a greater proportion of the total statistical information being available at each interim analysis time than would be available using the usual methods. This feature implies that O'Brien‐Fleming boundaries will be less conservative for the proposed estimators, leading to potential gains in expected sample size and stopping times.
6. APPLICATION
To demonstrate how use of the methods would proceed in practice as a trial progresses, we consider the setting of TESICO with ordinal categorical outcome, where the treatment effect of interest is the log odds ratio in an assumed proportional odds model as in (2). Because this trial is ongoing, we cannot base this demonstration on data from the trial; accordingly, we present use of the methods for a simulated data set generated according to the first simulation scenario in Section 5 with , which is based on this study. As in Section 5, the planned maximum sample size is , with full enrollment reached by day. Interim analyses are planned at 150, 195, 240, and 285 days, with the final analysis to be conducted at days, at which time all participants will have completed the trial with their outcomes ascertained. For definiteness, we use O'Brien‐Fleming stopping boundaries and focus on the null and alternative hypotheses vs , with overall level of significance .
Table 5 shows how the trial would proceed if the analyses were conducted at each interim analysis time using each of the estimators , , , and . For each estimator, the proportion of information at each of the interim analysis times was calculated as described in Section 4.2 and was used to obtain the stopping boundary. At the first interim analysis at 150 days, the proportion of information for the ML estimator , which uses only those subjects among the enrolled who have been followed for at least the maximum follow‐up time , is 0.257, whereas that for is 0.462, almost twice as large. This striking difference is reflected in the corresponding stopping boundaries: at the first interim analysis at 150 days, the test statistic based on is 2.496, far from the boundary of 4.265, whereas that based on is 2.966, almost reaching the boundary of 3.099. Basing the analyses on results in sufficient evidence to stop the trial at the second interim analysis at 195 days with 487 subjects enrolled, while sufficient evidence to stop using does not emerge until the fourth interim analysis at 285 days, with all subjects enrolled. Basing the analyses on and results in stopping the trial at 240 days, with again all 602 planned subjects enrolled.
TABLE 5.

Full data estimator (left four result columns) and IPWCC estimator (right four result columns):

| t (days) | n(t) | Est (SE) | Test stat | Prop info | Boundary | Est (SE) | Test stat | Prop info | Boundary |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 150 | 368 | 0.730 (0.292) | 2.496 | 0.257 | 4.265 | 0.547 (0.230) | 2.380 | 0.408 | 3.318 |
| 195 | 487 | 0.619 (0.224) | 2.765 | 0.432 | 3.218 | 0.476 (0.193) | 2.473 | 0.581 | 2.733 |
| 240 | 602 | 0.457 (0.187) | 2.445 | 0.611 | 2.657 | **0.423 (0.166)** | **2.551** | **0.785** | **2.313** |
| 285 | 602 | **0.459 (0.162)** | **2.828** | **0.809** | **2.277** | ‐ | ‐ | ‐ | ‐ |
| 330 | 602 | ‐ | ‐ | ‐ | ‐ | ‐ | ‐ | ‐ | ‐ |

AIPW1 estimator (left four result columns) and AIPW2 estimator (right four result columns):

| t (days) | n(t) | Est (SE) | Test stat | Prop info | Boundary | Est (SE) | Test stat | Prop info | Boundary |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 150 | 368 | 0.565 (0.218) | 2.586 | 0.382 | 3.444 | 0.590 (0.199) | 2.966 | 0.462 | 3.099 |
| 195 | 487 | 0.497 (0.182) | 2.739 | 0.564 | 2.777 | **0.532 (0.167)** | **3.185** | **0.670** | **2.521** |
| 240 | 602 | **0.409 (0.156)** | **2.615** | **0.757** | **2.362** | ‐ | ‐ | ‐ | ‐ |
| 285 | 602 | ‐ | ‐ | ‐ | ‐ | ‐ | ‐ | ‐ | ‐ |
| 330 | 602 | ‐ | ‐ | ‐ | ‐ | ‐ | ‐ | ‐ | ‐ |

Note: For each of the estimators (the ML estimator based on data from all subjects followed for at least the maximum follow‐up period at ), the IPWCC estimator , and the AIPWCC estimators and , Est (SE) is the estimate (standard error ) at , Test stat is the associated Wald test statistic, Prop info is the proportion of information at , and Boundary is the O'Brien‐Fleming stopping boundary. Entries are boldfaced at the interim analysis at which the trial would be stopped using the indicated estimator.
7. DISCUSSION
We have proposed a general framework for design and conduct of group sequential trials in the common situation where the outcome is known with certainty only after some time lag. The methods account for censoring at the time of an interim analysis and incorporate baseline and time‐dependent evolving covariate information to improve efficiency over standard analyses, facilitating earlier stopping with potentially smaller numbers of enrolled subjects. We have demonstrated analytically and empirically that the proposed test statistics possess the independent increments structure, so that standard methods and software for specifying stopping boundaries can be used. The methods can be applied under both information‐based monitoring and fixed‐sample monitoring strategies. For the latter, we have proposed the idea of effective sample size to characterize the proportion of information available at an interim analysis. Simulation studies demonstrate that the methods preserve the operating characteristics of a monitored trial and that substantial reductions in expected sample size and stopping time can be achieved.
As noted above, the proposed methodology is relevant in the large class of problems where the outcome would be known with certainty for all subjects at the final analysis. For some trials with possibly censored time‐to‐event outcome, interest may focus on the hazard ratio under the assumption of proportional hazards. Here, there is no prespecified, maximum follow‐up time at which the outcome is known with certainty, so that the proposed framework is not applicable.
We have defined generically as the lag time in ascertaining the outcome . In practice, there may be an administrative delay between the time an event related to the outcome has occurred and when that information is available in the trial database; for example, in TESICO, if a subject dies, there may be a delay before the time of death appears in the database. Our definition of lag time should be taken to include such administrative delays.
The methods as presented are based on the assumption (12) that entry time is independent of all other variables, including baseline covariates , which implies that at any interim analysis time , . This assumption is made tacitly in any clinical trial that focuses on inference on an unconditional treatment effect parameter. If the distribution of changes over the course of a trial, then (12) is violated, and, intuitively, subjects enrolled at the time of an interim analysis may not be representative of the population of interest at the final analysis. This is a general phenomenon and not unique to our methodology. If (12) is violated in this way, then the treatment effect parameter may not be static over time. Under these circumstances, conditional (on ) inference may be more appropriate; for example, as in the case of the conditional proportional odds model for ordinal categorical outcome in Section 2.1. Under the modified assumption (independence conditional on ), the proposed methods can be extended to support such conditional inference through incorporation of relevant influence functions and modeling of the censoring distribution as a function of . In the Supplemental Material, we report on a simulation study based on Scenario 1 of Section 5 in which entry times are taken to be associated with that suggests that the methods may enjoy some robustness to violations of (12).
ACKNOWLEDGEMENTS
The authors thank Drs. Birgit Grund and Michael Hughes for useful discussions and the referees for very helpful comments. The authors are grateful to Dr. Shannon Holloway for preparing the R package tLagInterim, noted in the Data Availability Statement.
Tsiatis AA, Davidian M. Group sequential methods for interim monitoring of randomized clinical trials with time‐lagged outcome. Statistics in Medicine. 2022;41(28):5517–5536. doi: 10.1002/sim.9580
DATA AVAILABILITY STATEMENT
Data sharing is not applicable to this article, as no data sets are generated or analyzed. The methods developed are proposed to enable future analyses of data from clinical trials, including ongoing trials from which the data are proprietary and not available. An R package, tLagInterim, implementing the methodology is available at the Comprehensive R Archive Network (CRAN) at https://cran.r‐project.org/package=tLagInterim.
REFERENCES
- 1. Tsiatis A. Information‐based monitoring of clinical trials. Stat Med. 2006;25:3236‐3244.
- 2. Pocock S. Group sequential methods in the design and analysis of clinical trials. Biometrika. 1977;64:191‐199.
- 3. O'Brien P, Fleming T. A multiple testing procedure for clinical trials. Biometrics. 1979;35:549‐556.
- 4. National Library of Medicine (US). ACTIV‐3b: therapeutics for severely ill inpatients with COVID‐19 (TESICO). ClinicalTrials.gov identifier NCT04843761. Accessed June 27, 2022.
- 5. National Library of Medicine (US). ACTIV‐2: a study for outpatients with COVID‐19. ClinicalTrials.gov identifier NCT04518410. Accessed June 27, 2022.
- 6. Lu X, Tsiatis A. Semiparametric estimation of treatment effect with time‐lagged response in the presence of informative censoring. Lifetime Data Anal. 2011;17:566‐593.
- 7. Tsiatis A. Semiparametric Theory and Missing Data. New York, NY: Springer; 2006.
- 8. Scharfstein D, Tsiatis A, Robins J. Semiparametric efficiency and its implication on the design and analysis of group sequential studies. J Am Stat Assoc. 1997;92:1342‐1350.
- 9. Lan K, DeMets D. Discrete sequential boundaries for clinical trials. Biometrika. 1983;70:659‐663.
- 10. Tsiatis A, Davidian M, Holloway S. Estimation of the odds ratio in a proportional odds model with censored time‐lagged outcome in a randomized clinical trial. Biometrics. 2021;78:1‐13.
- 11. Hellmich M. Monitoring clinical trials with multiple arms. Biometrics. 2001;57:892‐898.
- 12. Wason J. Design of multi‐arm, multi‐stage trials in oncology. In: Halabi S, Michiels S, eds. Textbook of Clinical Trials in Oncology: A Statistical Perspective. New York: Chapman & Hall/CRC Press; 2019:155‐182.
- 13. Agresti A. An Introduction to Categorical Data Analysis. New York, NY: Wiley; 2019.
- 14. Stefanski L, Boos D. The calculus of M‐estimation. Am Stat. 2002;56:29‐38.
- 15. Zhang M, Tsiatis A, Davidian M. Improving efficiency of inferences in randomized clinical trials using auxiliary covariates. Biometrics. 2008;64:707‐715.
- 16. Kim KM, Tsiatis AA. Independent increments in group sequential tests: a review. SORT. 2020;44:223‐264.
- 17. Casper C, Cook T, Perez OA. Package ldbounds: Lan‐DeMets method for group sequential boundaries. Comprehensive R Archive Network. https://cran.r‐project.org/package=ldbounds. Accessed June 27, 2022.
- 18. Mehta CR, Tsiatis AA. Flexible sample size considerations using information‐based interim monitoring. Drug Inf J. 2001;35:1095‐1112.
- 19. Venables W, Ripley B. Modern Applied Statistics with S. 4th ed. New York, NY: Springer; 2002.