Abstract
When designing repeated measures studies, both the amount and the pattern of missing outcome data can affect power. The chance that an observation is missing may vary across measurements, and missingness may be correlated across measurements. For example, in a physiotherapy study of patients with Parkinson’s disease, increasing intermittent dropout over time yielded missing measurements of physical function. In this example, we assume data are missing completely at random, since the chance that a data point was missing appears to be unrelated to either outcomes or covariates. For data missing completely at random, we propose noncentral F power approximations for the Wald test for balanced linear mixed models with Gaussian responses. The power approximations are based on moments of missing data summary statistics. The moments were derived assuming a conditional linear missingness process. The approach provides approximate power for both complete-case analyses, which include independent sampling units where all measurements are present, and observed-case analyses, which include all independent sampling units with at least one measurement. Monte Carlo simulations demonstrate the accuracy of the method in small samples. We illustrate the utility of the method by computing power for proposed replications of the Parkinson’s study.
Keywords: Power, missing data, mixed model, multilevel, longitudinal
1. Introduction
In repeated measures studies, researchers observe multiple responses on a set of independent sampling units. The repeated measurements may be longitudinal (across time), spatial (across location), or multivariate (on different scales). Outcome data are missing when a measurement is not observed at a particular planned occasion. The chance that an observation is missing may vary across measurements, and missingness may be correlated across measurements. Both the amount and the pattern of missing outcome data can affect power in repeated measures studies. Such complex missing data patterns occur frequently in biomedical studies. For example, in a randomized controlled clinical trial of physiotherapy in patients with Parkinson’s disease (ClinicalTrials.gov Identifier: NCT01257945) (Schenkman et al. 2012), increasing dropout over a 16 month follow-up period produced missing measurements of physical function. In this study, the chance that a data point was missing appeared to be unrelated to either outcomes or covariates.
Published power approximations rely on assumptions that do not match complex missing data patterns like the one in the Parkinson’s disease study. Muller et al. (1992, 2007) assumed no missing data. Li and McKeague (2013) use local alternative distributions, which rely on asymptotic assumptions for accuracy. Further compounding the issues with local alternatives, GEE estimates have been shown to produce inflated Type I error rates in small samples, weakening their otherwise general appeal (Stiger et al. 1998). Tu et al. (2007) account for missing data in their power approximation, but they too assume local alternative distributions. Ringham et al. (2016) account for missing data and use a noncentral F-distribution proposed by Muller et al. (1992, 2007), but they assume that every response is independently and equally likely to be missing.
To counteract these challenges, we provide power approximations that accommodate complex missing data processes characterized by a conditional linear model (Qaqish 2003). We modify a noncentral F power approximation for a class of mixed models with no missing data described by Muller et al. (1992, 2007) and Edwards et al. (2008). The modifications incorporate the expected value of one of several missing data summary statistics (Ringham et al. 2016; Barton and Cramer 1989; Catellier and Muller 2000). The choice of which missing data summary statistic to use depends on whether scientists plan complete- or observed-case data analytic approaches. The complete-case approach uses only independent sampling units with all measurements present. The observed-case approach uses any independent sampling unit with at least one measurement. There are two missing data summary statistics in particular that we favor for these objective analyses.
The work applies to a broad and useful subset of missing processes, models and hypothesis tests. The data are assumed to be missing completely at random (MCAR) (Little and Rubin 2002). Under the MCAR assumption, the distribution of the missing responses does not depend on any of the observed outcomes, unobserved outcomes or explanatory variables. The models are assumed to be a class of general linear mixed models (Laird and Ware 1982) referred to as balanced linear mixed models (Muller et al. 2007; Ringham et al. 2016). A general linear mixed model is a balanced linear mixed model if it can be expressed as an equivalent general linear multivariate model when there is no missing data. We restrict attention to the Wald test and the Kenward-Roger reference distribution due to its equivalence with a Hotelling-Lawley trace test (McKeon 1974). We use a covariance estimator that assumes an unstructured correlation pattern between responses which has been shown to be quite favorable in clinical trial settings with random dropout (Gosho et al. 2017). Balanced linear mixed models fulfill the following criteria. For a particular independent sampling unit, the predictors have the same value for all outcomes. In addition, each independent sampling unit has the same number of planned predictors and the same error covariance matrix. In balanced linear mixed models with no missing data, each independent sampling unit has the same number and type of planned outcome measurements. In this manuscript, we relax the last assumption, and allow for missing data.
The paper has six sections. Section 2 contains general notation, definitions, and model assumptions, while Section 3 contains the power approximations. In Section 4, Monte Carlo simulations are used to demonstrate the accuracy of the method in small samples. Section 5 contains example power analyses for a proposed clinical trial. The implications and limitations of the results are discussed in Section 6.
2. Notation, definitions, and assumptions
2.1. General notation
For an (m × o) matrix A and an (o × n) matrix B, AB is the matrix product. Throughout, vec(A) is the matrix function that stacks the columns of an (m × n) matrix A into an (mn × 1) vector. For an (m × n) matrix A and a (p × q) matrix B, the Kronecker product A ⊗ B generates the (mp × nq) matrix C = {aijB}. The trace of a square matrix A is tr(A). A square and full rank matrix A has inverse A−1. Also, for any matrix A, the Moore-Penrose inverse is A+ and Aʹ is the transpose. An (a × a) identity matrix is Ia, an (a × 1) vector with every value set to 1 is 1a, and an (a × 1) vector with every value set to 0 is 0a. The minimum and maximum values of a set S are denoted by min(S) and max(S), respectively.
We indicate the expected value and variance of a random vector y as E(y) and V(y), respectively. Similarly, for scalar y, we indicate the expected value and variance as E(y) and V(y). Scalar random variables yi and yj have covariance cov(yi, yj). For positive integer n, we write y ~ Nn(μ,Ξ) to indicate the vector normal probability distribution (Arnold 1981) with an (n × 1) vector of means μ and a positive definite symmetric (n × n) matrix of covariances Ξ. Similarly, with N and p positive integers, N > p, we write Y ~ NN,p(M,Ω,Ξ) to indicate the matrix normal (Arnold 1981), with an (N × p) matrix of means M, a positive definite symmetric (p × p) matrix of column covariances Ξ, and a positive definite symmetric (N × N) matrix of row covariances Ω. Consequently, Y ~ NN,p(M,IN,Ξ) implies that vec(Y) ~ NNp[vec(M),Ξ ⊗ IN].
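The vec and Kronecker conventions above can be checked numerically. The following is a minimal Python sketch using NumPy; the helper name `vec` and the example matrices are illustrative choices, not part of the original notation.

```python
import numpy as np

def vec(a):
    """Stack the columns of a 2-D array into a single column vector."""
    return np.asarray(a).reshape(-1, 1, order="F")

A = np.array([[1, 2], [3, 4]])            # (m x n) with m = n = 2
B = np.ones((3, 2))                       # (p x q) with p = 3, q = 2
K = np.kron(A, B)                         # (mp x nq) matrix of blocks a_ij * B

# Covariance of vec(Y) when Y is matrix normal with row covariance I_N
# and (p x p) column covariance Xi: the Kronecker product Xi (x) I_N.
N = 3
Xi = np.array([[2.0, 0.5], [0.5, 1.0]])
cov_vec_Y = np.kron(Xi, np.eye(N))        # (Np x Np)
```

Because vec stacks columns, the first N entries of vec(Y) are the N responses at the first measurement, which is why the column covariance appears as the left factor.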
2.2. Balanced general linear mixed model and general linear hypothesis
Under the assumptions in Section 1, we define a balanced linear mixed model (Muller et al. 2007; Ringham et al. 2016). We use the subscript m to indicate mixed model components.
Independent sampling unit i ∈ {1,2,…,N} at measurement j ∈ {1,2,…,p} has response yi,j, with yi = [yi,1 yi,2 … yi,p]ʹ the (p × 1) vector of responses for independent sampling unit i, and y = [y1ʹ y2ʹ … yNʹ]ʹ the (Np × 1) vector of responses for all independent sampling units. The fixed and known (N × q) design matrix for a single outcome is X, and Xm = X ⊗ Ip is the corresponding (Np × qp) design matrix for all outcomes. The (qp × 1) vector β contains the unknown and constant model coefficients. In turn, Σ is a positive definite symmetric (p × p) covariance matrix and is the same for all independent sampling units. We define Ψ = (IN ⊗ Σ). Throughout, e ~ NNp(0,Ψ) is the (Np × 1) vector of errors, which implies statistical independence of ei and eiʹ for i,iʹ ∈ {1,2,…,N}, i ≠ iʹ. The balanced general linear mixed model is
$$y = X_m \beta + e \qquad (1)$$
with y ~ NNp(Xmβ, Ψ) and yi independent of yiʹ for i,iʹ ∈ {1,2,…,N}, i ≠ iʹ.
The (a × q) matrix C contains the between-independent-sampling-unit contrasts and the (p × b) matrix U contains the within-independent-sampling-unit contrasts. The (ab × pq) contrast L =C ⊗ Uʹ defines the secondary parameter θ = Lβ and the corresponding null hypothesis
$$H_0: \theta = \theta_0 \qquad (2)$$
The Wald parameter is defined as
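$$\zeta_W(\theta, \Psi) = (\theta - \theta_0)' \left[ L \left( X_m' \Psi^{-1} X_m \right)^{-1} L' \right]^{-1} (\theta - \theta_0) \qquad (3)$$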
with Xm the design matrix speculated to exist were the study to be observed.
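To make the pieces of this subsection concrete, here is a minimal Python sketch that builds Xm, L, and θ for a small hypothetical design and evaluates a Wald parameter. It assumes the Wald parameter is the usual quadratic form (θ − θ0)ʹ[L(XmʹΨ−1Xm)−1Lʹ]−1(θ − θ0); the function name, the two-group design, and all numeric values are illustrative assumptions rather than inputs from the paper.

```python
import numpy as np

def wald_parameter(X, C, U, beta, theta0, Sigma):
    """Quadratic-form Wald parameter for a balanced linear mixed model (assumed form)."""
    N, q = X.shape
    p = Sigma.shape[0]
    Xm = np.kron(X, np.eye(p))                          # (Np x qp) design matrix
    L = np.kron(C, U.T)                                 # (ab x qp) contrast matrix
    Psi_inv = np.kron(np.eye(N), np.linalg.inv(Sigma))  # inverse of I_N (x) Sigma
    theta = L @ beta
    middle = L @ np.linalg.inv(Xm.T @ Psi_inv @ Xm) @ L.T
    diff = theta - theta0
    return float(diff @ np.linalg.solve(middle, diff))

# Hypothetical design: two groups of five units, three repeated measurements.
X = np.kron(np.eye(2), np.ones((5, 1)))                 # (10 x 2) cell-means coding
C = np.array([[1.0, -1.0]])                             # between-unit contrast, a = 1
U = np.array([[1.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])    # within-unit contrasts, b = 2
beta = np.array([1.0, 0.0, 0.0, 0.0, 0.0, 0.0])         # group 1 mean is 1 at time 1 only
zeta = wald_parameter(X, C, U, beta, np.zeros(2), np.eye(3))
```

Ordering β by group and then by measurement matches the row ordering of Xm = X ⊗ Ip, so L = C ⊗ Uʹ picks out the time-by-group interaction contrasts.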
2.3. Vector of non-missing data indicators
The balanced linear mixed model described in Equation 1 has no missing response data. Often, in biomedical studies, responses at planned measurements are not observed, resulting in missing response data, and a sample size that may be smaller than the planned sample size, N. We define notation for the probability that yi, j is missing or non-missing.
The indicator variable di,j records whether yi,j was observed. If yi,j is missing then di,j = 0. Otherwise, di,j = 1. For di,j ~ Bernoulli(πj) it follows that
| (4) |
With j ≠ jʹ and j,jʹ ∈ {1,2,…,p}, the joint probability mass function is defined by
| (5) |
An implicit requirement is that . It is important to recognize that implicit in Equations 4 and 5 is a homogeneity assumption: the probability that a response is non-missing is the same for all i.
The (p × 1) vector of non-missing data indicators for independent sampling unit i is di = [di,1 di,2 … di,p]ʹ, the (1 × N) matrix of non-missing data indicators for measurement j is Dj = [d1,j d2,j … dN,j], and the (Np × 1) vector of non-missing data indicators for all independent sampling units and measurements is d = [d1ʹ d2ʹ … dNʹ]ʹ.
2.4. Conditional linear missing data process
In the previous section, we defined notation to describe whether a response was present or missing. We turn now to examining the underlying probability processes which dictate the presence or absence of the responses. For the sake of clarity, we use the word “process” to describe missingness. This choice is in contrast to the word “model” used in Section 2 to describe the association between outcomes and predictors.
For data analysis, Qaqish (2003) described binary responses using a class of conditional linear models with constrained covariance parameters. We can adapt his model to characterize a missing data process, referred to hereafter as the conditional linear process (CLP). The conditional linear process has two parameters. For j ∈ {1,2,…,p}, we define the (p × 1) vector of the marginal probabilities that a response at a given repeated measurement j is non-missing as π = [π1 π2 … πp]ʹ. Further, we write πj−1 to indicate the [(j − 1) × 1] vector [π1 π2 … πj−1]ʹ. We use ρj,jʹ to denote the correlation between di,j and di,jʹ for j ≠ jʹ. With ψj = [πj/(1−πj)]1/2, Qaqish (2003) showed that bounding
$$\max\left( -\psi_j \psi_{j'},\; -\frac{1}{\psi_j \psi_{j'}} \right) \le \rho_{j,j'} \le \min\left( \frac{\psi_j}{\psi_{j'}},\; \frac{\psi_{j'}}{\psi_j} \right) \qquad (6)$$
yields a positive semidefinite covariance matrix while strict inequalities produce positive definite covariance matrices.
For the conditional linear process, we write the (p × 1) vector of expected values E(di) = π and the (p × p) covariance V(di) = Φ. Both π and Φ are defined by a conditional model, which gives the result for measurement j, conditioned on the previous (j − 1) measurements.
To describe the missing states for the first j measurements on independent sampling unit i, we define the (j × 1) vector di,j = [di,1, di,2 … di,j]ʹ: As described above, we use to denote the vector of realized values for di, j: Similarly, we define the (j × j) matrix to contain the variance and covariance parameters for the first j measurements on independent sampling unit i. With , k ∈ {1,2,…j−1} and the [(j−1)×1] vector of covariances τj−1= [τ1,j τ2,j ….τj−1,j]ʹ we obtain
| (7) |
Variance elements in Φ fall in [0, 0.25], because the variance of a Bernoulli indicator is πj(1 − πj). Covariance elements are a function of the chance that both elements are non-missing. Using the result in Equation 7, we write
| (8) |
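A conditional linear process can be simulated one measurement at a time. The sketch below is a minimal Python illustration that assumes the conditional mean takes the form given by Qaqish (2003), πj + τj−1ʹ Φj−1−1 (dj−1 − πj−1), where Φj−1 is the leading (j − 1) × (j − 1) block of Φ. The function names, the exchangeable correlation of 0.3, and the choice of π are hypothetical. A pseudo-inverse is used so that measurements with πk = 1 (zero variance) do not break the calculation.

```python
import numpy as np

def covariance_from_correlation(pi, R):
    """Build Phi from marginal probabilities pi and a correlation matrix R."""
    sd = np.sqrt(pi * (1.0 - pi))
    return R * np.outer(sd, sd)

def simulate_clp(pi, Phi, n_units, rng):
    """Draw an (n_units x p) matrix of non-missing indicators (1 = observed)."""
    pi = np.asarray(pi, dtype=float)
    Phi = np.asarray(Phi, dtype=float)
    p = pi.size
    D = np.zeros((n_units, p), dtype=int)
    D[:, 0] = rng.random(n_units) < pi[0]
    for j in range(1, p):
        # Regression weights of indicator j on the earlier indicators.
        weights = np.linalg.pinv(Phi[:j, :j]) @ Phi[:j, j]
        lam = pi[j] + (D[:, :j] - pi[:j]) @ weights
        if lam.min() < -1e-12 or lam.max() > 1.0 + 1e-12:
            raise ValueError("conditional probability outside [0, 1]; "
                             "pi and Phi are not compatible with this process")
        D[:, j] = rng.random(n_units) < np.clip(lam, 0.0, 1.0)
    return D

pi = np.array([0.9, 0.8, 0.7])            # hypothetical marginal probabilities
R = np.full((3, 3), 0.3)                  # hypothetical exchangeable correlation
np.fill_diagonal(R, 1.0)
Phi = covariance_from_correlation(pi, R)
D = simulate_clp(pi, Phi, 20000, np.random.default_rng(1))
```

The explicit range check matters because not every pair (π, Φ) yields conditional probabilities in [0, 1], which is the role of the covariance constraints discussed above.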
2.5. Missing data summary statistics
Barton and Cramer (1989) and Catellier and Muller (2000) suggested using missing data summary statistics to summarize an observed missing data pattern. However, when designing a study and calculating power, the pattern of missingness has not yet been observed. Instead, one must consider the expected value of the missing data summary statistic with respect to the missing data process. The expected value is a weighted average over all possible realizations of the missing data process.
For independent sampling unit i, we define the number of missing responses {0,1,….,p} as
| (9) |
A complete case is indicated by
| (10) |
For j ∈ {1,2,….,p}, the number of independent sampling units who have measurement j present is given by
| (11) |
for Njj ∈ {0,1,…N}
The missing data summary statistics Nk, k ∈{c,m,p} are defined in Table 1. For the formulae presented in Table 1, we define
| (12) |
Table 1.
Missing data summary statistics.
| Summary Statistic | Function | Description | Expected Value | Variance |
|---|---|---|---|---|
| Nc | | number of complete cases | Nϕ | Nϕ(1 − ϕ) |
| Nm | | mean number of non-missing measurements | | |
| Np | N | planned number of measurements | N | 0 |
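For a realized missing data pattern, the summary statistics in Table 1 are simple counts. The following Python sketch computes them from an (N × p) matrix of non-missing indicators. It assumes that Nm is the average over measurements of the per-measurement counts Njj; the function name and the example pattern are hypothetical.

```python
import numpy as np

def missing_data_summaries(D):
    """Summary statistics for an (N x p) 0/1 matrix of non-missing indicators."""
    D = np.asarray(D)
    N, p = D.shape
    n_complete = int(np.sum(D.sum(axis=1) == p))   # units with every measurement present
    n_present = D.sum(axis=0)                      # N_jj for each measurement j
    n_mean = float(n_present.mean())               # assumed definition of N_m
    return {"Nc": n_complete, "Njj": n_present.tolist(), "Nm": n_mean, "Np": N}

# Hypothetical pattern: four units, three planned measurements.
D = np.array([[1, 1, 1],
              [1, 0, 1],
              [1, 1, 0],
              [1, 0, 0]])
summary = missing_data_summaries(D)
```

At the design stage the pattern is not yet observed, so these counts would be averaged over simulated patterns or replaced by their expected values.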
3. Power for balanced linear mixed models with missing data
We provide power approximations for balanced linear mixed models with no missing data, and for the complete- and observed-case analyses. We restrict attention to the Wald test with the Kenward-Roger (Kenward and Roger 1997, 2009) reference distribution.
The new power approximation is based on a sequence of previous published works. Muller et al. (2007) and Edwards et al. (2008) demonstrated that a multivariate model can be converted to a balanced linear mixed model, and that the Hotelling-Lawley trace statistic can be converted to the Wald test statistic. Under the alternative hypothesis, HA :θ ≠ θ0, for balanced linear mixed models with no missing data, a function of the Wald statistic has an approximate noncentral F distribution (Edwards et al. 2008). Ringham et al. (2016) presented a power approximation for balanced linear mixed models using a multivariate noncentrality parameter. In this work, we demonstrate that the multivariate noncentrality parameter (Ringham et al. 2016) can be transformed into the noncentrality parameter for balanced linear mixed models, given below in Equation 13. The transformation is detailed in Section A.1 of the Appendix.
In the remainder of Section 3 we outline the approximation method. With no missing data, the error degrees of freedom are υe = N – rank(X). With missing data, with E(Nk) as given in Table 1, we write the adjusted error degrees of freedom as υk = E(Nk) – rank(X). The noncentrality parameter is given by
| (13) |
As defined previously, a and b are the ranks of the between- and within-independent-sampling-unit contrast matrices, and ζW(θ, Ψ) is the Wald parameter (Equation 3).
Let indicate the quantile function for a noncentral F random variable. Following the approach in Muller et al. (1992, 2007) and Ringham et al. (2016), we use the McKeon (1974) approximation for the Hotelling-Lawley trace to define the degrees of freedom. With υk replacing υe in the forms for no missing data, the numerator and denominator degrees of freedom are υ1 = ab and
| (14) |
respectively. For k ∈ {c, m, p}, we approximate power by
| (15) |
We use the notation v2(vk) and ω(vk) to emphasize the dependence of the parameters on vk.
Equation 15 gives approximate power for two missing data analysis approaches. To calculate the approximate power for the complete-case analysis, one uses Equation 15 evaluated with k = c. In this setting, E(Nc) represents the expected number of complete cases. To calculate the approximate power for the observed-case analysis, one uses Equation 15 evaluated with k = m. We substitute E(Nm) for N in the observed-case analysis because it characterizes the expected number of observations at each repeated measurement.
For no missing data, Equation 15 reduces to the power approximation suggested by Edwards et al. (2008). The results are equivalent because vk = vp = Np – rank(X) = N – rank(X) and thus, Power(vk) = Power(ve). In an even more restrictive case, with no missing data and s = min(a, b) = 1, the noncentral distribution of the test statistic is known and the power results are exact, rather than approximate.
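The structure of the power calculation can be sketched in Python with SciPy. This is a hedged illustration, not the authors' implementation. It assumes the denominator degrees of freedom follow the McKeon (1974) form as it is commonly written for the Hotelling-Lawley trace, and it takes the noncentrality parameter ω as an input rather than computing it. The function names and the example values of rank(X) and ω are hypothetical.

```python
from scipy import stats

def mckeon_df2(a, b, nu):
    """Denominator degrees of freedom (assumed McKeon form) for error df nu."""
    g = (nu**2 - nu * (2 * b + 3) + b * (b + 3)) / (
        nu * (a + b + 1) - (a + 2 * b + b**2 - 1))
    return 4 + (a * b + 2) * g

def approximate_power(a, b, nu_k, omega, alpha=0.05):
    """Upper-tail noncentral F probability beyond the central F critical value."""
    df1 = a * b
    df2 = mckeon_df2(a, b, nu_k)
    f_crit = stats.f.ppf(1 - alpha, df1, df2)     # critical value under the null
    if omega == 0:
        return float(stats.f.sf(f_crit, df1, df2))
    return float(stats.ncf.sf(f_crit, df1, df2, omega))

# Hypothetical inputs: expected number of complete cases 23.12, rank(X) = 4,
# interaction contrasts with a = 3 and b = 2, and an assumed omega of 12.
nu_c = 23.12 - 4
power_c = approximate_power(3, 2, nu_c, omega=12.0)
```

Two checks support the assumed degrees of freedom: with b = 1 the formula returns ν, and with a = 1 it returns ν − b + 1, which are the exact denominator degrees of freedom in those special cases.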
4. Numerical evaluation of the accuracy of the power computations
4.1. Simulation methods
We evaluated the accuracy of the power approximations via simulation. To compute empirical power, we defined α, the Type I error rate, X, C, U, β, θ0 and Ψ. We computed Xm = X ⊗ Ip, L = C ⊗ Uʹ and θ = Lβ. We generated realizations of e and computed realizations of y as in Equation 1.
We randomly generated realizations of di, the vector of non-missing data indicators, using different missing data processes. We then created y* by setting each corresponding value in y to present or missing, as indicated by di. The process was repeated for 10,000 replications.
For each replication, we attempted to fit a general linear mixed model, with y* as the response vector. Mixed models were fit in SAS 9.4 (SAS Institute Inc., 2017) using the GLIMMIX procedure with double dogleg optimization. The modeling approach used an unstructured covariance, the REPEATED statement and the KR2 (Kenward-Roger) option for the denominator degrees of freedom.
We evaluated the accuracy of the approximations separately for the complete- and observed-case approaches. For the complete-case approach, we analyzed only the independent sampling units with all measurements present. For the observed-case approach, we analyzed all independent sampling units with at least one measurement present.
For some replicates, we were unable to fit the mixed model. The parameters β and hence θ are estimable only if there is at least one data point for each within-by-between cross-classification. In some cases, there were sufficient data so that β was technically estimable, but the model failed to converge, usually because the Hessian matrix was not positive definite. We recorded the number of replicates for which β was estimable, and the number of replicates for which the model converged. Convergence rate was calculated as the number of replicates for which the model converged divided by the number of replicates for which β was estimable.
For each model that converged, we calculated the Wald statistic and computed a p-value using the Kenward-Roger (Kenward and Roger 2009) degrees of freedom. Empirical power was computed as the number of p-values less than or equal to α, divided by the number of replicates for which the model converged. The simulation study used 10,000 replications so that the error in the estimation of the empirical power would occur in the second decimal place.
The analytic power approximations for the complete-case and observed-case analyses were computed as in Equation 15. The absolute deviation was calculated as the absolute value of the difference between the approximate power and the empirical power. The maximum absolute deviation was computed as the maximum of the absolute deviations across a specified range of experimental conditions.
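The bookkeeping described in this subsection can be written in a few lines. The Python sketch below uses hypothetical function names. It computes empirical power from a list of p-values, the binomial Monte Carlo standard error of that estimate, and the absolute deviation from an analytic value.

```python
import math

def empirical_power(p_values, alpha=0.05):
    """Fraction of converged replicates with p-value at or below alpha."""
    return sum(p <= alpha for p in p_values) / len(p_values)

def monte_carlo_se(power, n_reps):
    """Binomial standard error of an empirical power estimate."""
    return math.sqrt(power * (1.0 - power) / n_reps)

def absolute_deviation(analytic, empirical):
    """Absolute difference between analytic and empirical power."""
    return abs(analytic - empirical)

# The standard error is largest when power is 0.5; with 10,000 replicates
# it is then 0.005, so simulation error shows up in the second decimal place.
worst_case_se = monte_carlo_se(0.5, 10_000)
```

When some replicates fail to converge, `len(p_values)` is the number of converged replicates, matching the denominator used for empirical power in the text.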
We examined the accuracy of the power approximations for five of the nine scenarios previously considered by Johnson et al. (2009) and then by Kreidler et al. (2013). The subset was chosen to ensure that the Wald statistic and the mixed model were appropriate. Below, we provide results for two of the five scenarios: Scenario 4 and Scenario 5. Scenario 4 involves a test of a time-by-treatment interaction in a design with four exposure groups and three repeated measurements over time. Scenario 5 also involves a test of a time-by-treatment interaction, but for a design with two exposure groups, and five repeated measurements over time. Detailed descriptions of the study designs for Scenarios 4 and 5 appear in Section A.5 in the Appendix. The online supplemental material contains study designs for the remaining scenarios.
Scenarios 4 and 5 were chosen to allow evaluation of the accuracy of the power approximations in two important cases. In Section 3, we defined s = min{a, b} to allow us to distinguish between single and multiple degree of freedom hypothesis tests. For the design in Scenario 4, s > 1, and the power results would be approximate even with complete data. For the design in Scenario 5, s = 1. This means that the analytic power results would be exact with no missing data, but are approximate when some measurements are missing.
For each experimental scenario, we varied the analysis approach (complete- or observed-case), the planned sample size, the missing data process, the parameter inputs for the missing data process (π and Φ) and the scaling parameters for the error variance and mean difference (δσ and δβ, respectively).
4.2. Simulation results
Over all experimental conditions, the minimum number of replications for which β was estimable was 9,431 out of the planned 10,000 replicates. The convergence rates across most experimental conditions were greater than 0.99. The exceptions were conditions for Scenario 5 with a planned sample size of 20, which gave the minimum convergence rate of 0.78.
A subset of the results for Scenario 4 appears in Table 2. A subset of the results for Scenario 5 appears in Table 3. The online supplemental material contains results for the remaining scenarios and experimental conditions. We report E(Nc) for the complete-case approach, E(Nm) for the observed-case approach, the approximate analytic power, the empirical power and the absolute deviation.
Table 2.
Selections from Scenario 4.
| Complete/Observed Case Analysis | δσ | δβ | N | Expected Nk | Analytic Power | Empirical Power | Absolute Deviation |
|---|---|---|---|---|---|---|---|
| Complete | 1 | 1 | 20 | 11.5600 | 0.0213 | 0.0246 | 0.0033 |
| Complete | 1 | 2 | 20 | 11.5600 | 0.0771 | 0.1098 | 0.0326 |
| Complete | 2 | 1 | 20 | 11.5600 | 0.0152 | 0.0147 | 0.0004 |
| Complete | 1 | 1 | 40 | 23.1200 | 0.0777 | 0.0811 | 0.0033 |
| Complete | 1 | 2 | 40 | 23.1200 | 0.5329 | 0.5455 | 0.0126 |
| Complete | 2 | 1 | 40 | 23.1200 | 0.0356 | 0.0433 | 0.0078 |
| Observed | 1 | 1 | 20 | 16.0000 | 0.0407 | 0.0561 | 0.0155 |
| Observed | 1 | 2 | 20 | 16.0000 | 0.2440 | 0.2359 | 0.0082 |
| Observed | 2 | 1 | 20 | 16.0000 | 0.0227 | 0.0336 | 0.0108 |
| Observed | 1 | 1 | 40 | 32.0000 | 0.1542 | 0.1423 | 0.0119 |
| Observed | 1 | 2 | 40 | 32.0000 | 0.8408 | 0.7852 | 0.0556 |
| Observed | 2 | 1 | 40 | 32.0000 | 0.0604 | 0.0615 | 0.0011 |
Table 3.
Selections from Scenario 5.
| Complete/Observed Case Analysis | δσ | δβ | N | Expected Nk | Analytic Power | Empirical Power | Absolute Deviation |
|---|---|---|---|---|---|---|---|
| Complete | 1 | 1 | 20 | 8.3521 | 0.1109 | 0.1564 | 0.0454 |
| Complete | 1 | 1 | 40 | 16.704 | 0.3737 | 0.3855 | 0.0117 |
| Complete | 1 | 1 | 80 | 33.4084 | 0.8072 | 0.8128 | 0.0056 |
| Observed | 1 | 1 | 20 | 16.0000 | 0.3675 | 0.3672 | 0.0002 |
| Observed | 1 | 1 | 40 | 32.0000 | 0.7945 | 0.7618 | 0.0327 |
| Observed | 1 | 1 | 80 | 64.0000 | 0.9919 | 0.9883 | 0.0036 |
Across both experimental scenarios, and all experimental conditions, the maximum absolute deviation between the analytic and empirical power was less than 0.056 (data not shown). Most scientists design studies so that they have power between 0.8 and 0.95. For all scenarios where the analytic power was greater than or equal to 0.8, the maximum absolute deviation was less than 0.052. The accuracy of the approximations was similar across different sample sizes, values of variance, mean differences, types of missing data processes, inputs for the missing data processes and choice of analysis approach. Results for the remaining scenarios shown in the online supplemental material were similar.
5. Randomized controlled clinical trial in Parkinson’s disease
Schenkman et al. (2012) conducted a randomized controlled clinical trial of three different forms of exercise therapy: flexibility/balance/function, standard aerobic exercises and a home-based exercise regimen. The outcome was the Continuous Scale Physical Function test at baseline, 4, 10, and 16 months after randomization.
Investigators propose a similar follow-up study with a different sample from the same population. For the follow-up study, investigators will fit a general linear mixed model with indicator variables for the three treatments as predictors and the four repeated measurements of the physical function test as outcomes. The goal will be to evaluate the time-by-treatment interaction, using the mixed model Wald test with Kenward-Roger degrees of freedom and an α level of 0.05.
We provide a power analysis for the follow-up study. Model and hypothesis parameters are given in Section A.6 in the Appendix. We assumed a conditional linear missing data process. The missing data process parameter estimates are taken from the preliminary data analysis, with the covariance matrix of the missingness given by
| (16) |
We consider three patterns for π. Pattern A corresponds to complete data, with π = [1 1 1 1]ʹ. Pattern B corresponds to the missing data pattern observed in the preliminary data analysis, with π = [1 0.868 0.818 0.793]ʹ. Pattern C corresponds to twice the missing data observed in the preliminary analysis, with π = [1 0.736 0.636 0.587]ʹ.
Power curves for the three patterns are shown in Figure 1. Power is shown on the y axis. The x axis is in units of the largest element in the interaction coefficient matrix observed in the previous study, 4.153. No units are shown because the outcome is unitless.
Figure 1.
Power for a randomized controlled trial.
6. Discussion
Missing data occur throughout biomedical research, from longitudinal epidemiologic observational studies to randomized controlled clinical trials. Missing data can strongly affect the power of biomedical research studies and thus the choice of sample size. If the sample size is larger than needed, too many participants are exposed to the potential risk of research. If the sample size is smaller than needed, resources, investigator and participant time are wasted on a study that has insufficient statistical power to answer the question of interest. Accurate power and sample size calculations must take into account the chance of missing data.
The methods proposed in the current work allow power calculations for some important classes of models and missing data processes. We consider balanced linear mixed models and the Wald test with Kenward-Roger degrees of freedom. We also used a flexible model of the missing data process to allow us to examine how different missing data patterns affect the results of the power analysis.
The work extends previous power approximations for data without missing values (Muller et al. 2007; Edwards et al. 2008) and for data with values missing completely at random with equal probability (Ringham et al. 2016). The new methods accommodate complex missing data processes (Qaqish 2003) to better mirror processes likely to occur in real repeated measures studies. The missing data processes we consider allow a different chance of missingness for each measurement in a repeated measures study. In addition, the missing data processes allow for the chance of missingness for one measurement to be correlated with the chance of missingness of another measurement. We provide different power approximations for the complete- and observed-case data analytic approaches, both of which are commonly used in biomedical research. The approximation has adequate accuracy for the scenarios scientists care about the most, with true power between 0.8 and 0.95. The error in the power approximation is in the second decimal place.
A version of the approximations described here will be included in Version 3 of GLIMMPSE (Kreidler et al. 2013), available at SampleSizeShop.org. The GLIMMPSE power and sample size software for multilevel and longitudinal studies is user-friendly, point-and-click and available to run without cost from a web browser. Users who prefer R code to implement the methods may request a copy from the authors.
To approximate power for a study with no missing data, a scientist needs to specify the Type I error rate, predictor matrix, error variance, between- and within-independent sampling unit contrasts and sample size. To conduct an accurate power calculation with outcome data missing completely at random, a scientist must, in addition, choose an appropriate missing data process. The choice should depend on the scientist’s assumptions about the missing data process. The conditional linear missing data process can accommodate several assumptions, including autoregressive and unstructured correlation structures. Estimates for the chance that the response is missing at each measurement, and the correlation between missingness at each measurement can be obtained from literature reports, from previous unpublished data, or from an understanding of the rationale for missing data in a certain study design, disease state, or treatment regimen.
While the power approximations cover a useful set of models and missing data processes, there are limitations to the work. First, the assumption that the data are missing completely at random (MCAR) may not hold. The MCAR assumption means that the chance that a response is missing does not depend on observed outcomes, unobserved outcomes, or covariates. Second, the manuscript proposes power analyses only for balanced linear mixed models with Gaussian outcomes. This means that the results may not hold for studies with mistimed data, time-dependent covariates or non-Gaussian outcomes.
In future work, we hope to develop a maximum likelihood approach to estimate marginal missingness, and correlation between missingness from completed studies. It would be useful to develop a corresponding likelihood ratio test to aid scientists in characterizing a missing data process. We also plan to relax the assumption that the outcome data are missing completely at random. We hope to consider a larger class of models and outcomes, including non-linear mixed models, and binary and Poisson distributed outcomes. It is important to consider how power calculations are affected by the use of observed (and hence random) percentages of missingness at each measurement drawn from previous experiments. An additional open question is how to compute confidence intervals for power for studies with missing data. For ethical reasons, some investigators may prefer to consider quantiles of power, rather than power calculated for an expected amount of missingness.
Supplementary Material
Funding
KAS and DD were supported by NIH R01DK076645 (Dabelea), NIH UG3OD023248 (Dabelea) and AHA16MCPRP29710005 (Sauder). MS was supported by NIH R01 HD043770 (Schenkman), CCTSI TL1RR025778 (Schenkman), NIH P30 DK048520 (Hill), K23 NS052487 (Hall), and the Parkinson’s Disease Foundation. DHG and KEM were supported by NIH R01GM121081 (Glueck, Dabelea, Muller), NIH R25GM111901 (Glueck, Muller) and NIH G13LM011879 (Glueck, Muller). BMR and KPJ were supported by NIH R01GM121081 (Glueck, Dabelea, Muller).
Appendix A
A.1. Approximate distribution of the Wald statistic
Some multivariate model notation is needed. Here yi, j is the response for independent sampling unit i ∈ {1, 2, :::, N} at measurement j ∈ {1, 2, :::, p}, and Y = {yi, j} is the (N × p) matrix of responses for all independent sampling units. The corresponding (N × q) design matrix is X, while B is the (q × p) primary parameter matrix, and E is the (N × p) error matrix. For Σ a positive definite symmetric (p × p) covariance matrix, Y ~ NN, p(XB, IN, Σ): The general linear multivariate model is written
| Y = XB + E | (17) |
The (a × q) between-independent-sampling-unit contrast matrix C, the (p × b) within-independent-sampling-unit contrast matrix U, and the (a × b) Θ0 are all matrices of known constants. The multivariate null hypothesis is
| H0: CBU = Θ0 | (18) |
Also
| (19) |
The constant matrix
| (20) |
helps define
| (21) |
The Hotelling-Lawley trace parameter then follows
| (22) |
With the Hotelling-Lawley parameter, we derive the Wald parameter as follows:
| (23) |
Under an alternative hypothesis, Ringham et al. (2016) proposed using ω*(vk) as the noncentrality parameter in their F power approximation for a multivariate model with missing data. Following Equation 23, ω*(vk) is equivalent to ω(vk) defined in Equation 13.
To obtain the Hotelling-Lawley trace statistic, use the ordinary least squares estimator to obtain with which we get and . Then and . The Hotelling-Lawley statistic is then defined The analogous Wald statistic is defined where , and . Using Equation 23, it can be shown that .
Assuming the null hypothesis is true, McKeon (1974) showed that
| (24) |
is approximately distributed as F[ab, v2(ve)] using method of moments to approximate and a constant scaling component with v1 given. Thus,
| (25) |
is also approximately distributed as F[ab, v2(ve)]. When missing data are present, we plug in vk for ve.
A.2. Moments for the univariate indicator of the non-missing responses
The expected value of the missing state for a single repeated measurement is given by
| (26) |
The second moment is
| (27) |
yielding . For , since has a Bernoulli distribution. This gives .
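The Bernoulli moments used in this subsection can be checked by direct enumeration. The sketch below is illustrative only: it assumes an indicator that equals 1 when a measurement is observed, with probability `pi`. The function name and the value 4/5 are assumptions made for the example, not quantities taken from the derivation.

```python
from fractions import Fraction


def bernoulli_moments(pi):
    """Return (mean, second moment, variance) of a 0/1 indicator with P(R = 1) = pi."""
    support = {0: 1 - pi, 1: pi}  # value -> probability
    mean = sum(r * prob for r, prob in support.items())
    second = sum(r ** 2 * prob for r, prob in support.items())
    # Because r**2 == r on {0, 1}, the second moment equals the mean.
    return mean, second, second - mean ** 2


mean, second, variance = bernoulli_moments(Fraction(4, 5))
print(mean, second, variance)  # prints: 4/5 4/5 4/25
```

Exact fractions are used so that the identity variance = pi(1 - pi) holds without rounding error.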
A.3. Moments for Nc
The moments can be derived by a process of induction. To begin, we show that . The initial step gives
| (28) |
For the base case, let p = 2. Then
| (29) |
For the induction step, suppose that
| (30) |
Then
| (31) |
The complete-case indicator variable has a Bernoulli distribution such that
| (32) |
Since we assume that for all i, iʹ ∈ {1, 2, …, N}, i ≠ iʹ and j, jʹ ∈ {1, 2, …, p}, , the sum of complete-case indicator variables has a binomial distribution with
| (33) |
and
| (34) |
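As a numerical check on the binomial moments of the complete-case count, the sketch below enumerates the binomial probability mass function and compares the resulting mean and variance with the closed forms N·π and N·π·(1 − π). It assumes that complete-case indicators are independent across sampling units and share one probability, here called `prob_complete`. That name, N = 20, and the value 3/5 are illustrative assumptions rather than values from the paper.

```python
from fractions import Fraction
from math import comb


def binomial_moments(n_units, prob_complete):
    """Mean and variance of the number of complete cases among n_units.

    Computed by summing over the binomial probability mass function,
    not by plugging into the closed-form expressions.
    """
    pmf = [
        comb(n_units, k) * prob_complete ** k * (1 - prob_complete) ** (n_units - k)
        for k in range(n_units + 1)
    ]
    mean = sum(k * w for k, w in enumerate(pmf))
    variance = sum((k - mean) ** 2 * w for k, w in enumerate(pmf))
    return mean, variance


n_units, prob_complete = 20, Fraction(3, 5)
mean, variance = binomial_moments(n_units, prob_complete)
print(mean, variance)  # prints: 12 24/5
```

The enumerated values agree with 20 × 3/5 = 12 and 20 × 3/5 × 2/5 = 24/5.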
A.4. Moments for Nm
We derive the moments for Nm as follows:
| (35) |
and
| (36) |
A.5. Specification of scenarios for numerical evaluation of accuracy
A.5.1. Scenario 4
Here we specify parameters for the complete data. Let α = 0.01, N ∈ {20, 40}, δβ ∈ {0.0, 0.5, 1.0, 1.5, 2.0}, δσ ∈ {1.0, 2.0} and θ0 = 06. Define the following matrices:
| (37) |
| (38) |
| (39) |
| (40) |
| (41) |
We now specify the missing data processes that act on the complete data. For the conditional linear missing data process (Qaqish 2003), we specified π = [0.8 0.8 0.8] and Φ as a compound symmetric covariance pattern with .
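To make the conditional linear construction concrete, the following sketch draws correlated 0/1 observation indicators one measurement at a time, with each conditional probability linear in the earlier indicators. It is a minimal illustration, not the simulation code used for the scenarios. It assumes a compound symmetric correlation, for which the regression weights have a simple closed form, whereas the general construction in Qaqish (2003) works with the full covariance matrix. The function names and the correlation value 0.25 are hypothetical; only π = [0.8 0.8 0.8] comes from the scenario.

```python
import math
import random


def simulate_clf_exchangeable(pi, rho, n_units, seed=0):
    """Draw n_units rows of correlated 0/1 indicators (1 = observed).

    pi  : marginal probabilities of being observed, each strictly in (0, 1)
    rho : common correlation between any two indicators (assumed value)
    """
    rng = random.Random(seed)
    sd = [math.sqrt(p * (1 - p)) for p in pi]
    rows = []
    for _ in range(n_units):
        row, z_sum = [], 0.0  # z_sum: sum of standardized earlier indicators
        for j, p in enumerate(pi):
            # With j earlier indicators, each gets weight rho / (1 + (j - 1) * rho).
            weight = rho / (1 + (j - 1) * rho) if j > 0 else 0.0
            lam = p + sd[j] * weight * z_sum  # conditional P(observed | earlier)
            if not 0.0 <= lam <= 1.0:
                raise ValueError("pi and rho cannot be reproduced by this family")
            r = 1 if rng.random() < lam else 0
            row.append(r)
            z_sum += (r - p) / sd[j]
        rows.append(row)
    return rows


def sample_corr(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


rows = simulate_clf_exchangeable([0.8, 0.8, 0.8], rho=0.25, n_units=100_000)
columns = list(zip(*rows))
print([round(sum(c) / len(c), 3) for c in columns])
print(round(sample_corr(columns[0], columns[2]), 3))
```

With a large number of simulated units, the column means should be close to 0.8 and each pairwise correlation close to the assumed 0.25. The range check guards against combinations of `pi` and `rho` for which a conditional probability would fall outside [0, 1].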
A.5.2. Scenario 5
Here, we specify parameters for the complete data. Let α = 0.05, N ∈ {20, 40, 80}, δβ ∈ {1}, δσ ∈ {1} and θ0 = 06. Define the following matrices:
| (42) |
| (43) |
| (44) |
| (45) |
| (46) |
The missing data processes that act on the complete data are parallel to those described in Scenario 4 with one exception. For Scenario 5, π = [0.8 0.8 0.8 0.8 0.8] and Φ was an AR(1) covariance pattern with γ12 = 0.25.
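For readers less familiar with the AR(1) pattern, the sketch below builds the corresponding 5 × 5 matrix. It assumes that Φ is expressed on the correlation scale and that γ12 is the lag-one correlation, so that the entry for measurements j and k is γ12 raised to the power |j − k|. The function name is hypothetical.

```python
def ar1_correlation(p, rho):
    """Return the p x p AR(1) correlation matrix with lag-one correlation rho."""
    return [[rho ** abs(j - k) for k in range(p)] for j in range(p)]


phi = ar1_correlation(5, 0.25)
for row in phi:
    print(row)
```

Under this pattern the correlation between missingness indicators decays geometrically with the separation between measurements, in contrast to the constant correlation of the compound symmetric pattern in Scenario 4.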
A.6. Specification of scenarios for Parkinson’s randomized controlled trial illustrative example
We give the designs for complete data. Let α = 0.01, N = 121, δβ range from 0 to 10 by 0.01, δσ = 1 and θ0 = 06. Define the following matrices:
| (47) |
| (48) |
| (49) |
| (50) |
| (51) |
| (52) |
Footnotes
Disclaimer
This manuscript was submitted to the Department of Biostatistics and Informatics in the Colorado School of Public Health, University of Colorado Denver, in partial fulfillment of the requirements for the degree of Master of Science in Biostatistics for KPJ. The content of this paper is solely the responsibility of the authors, and does not necessarily represent the official views of the Eunice Kennedy Shriver National Institute of Child Health and Human Development, the National Center for Advancing Translational Sciences, the National Institute of Diabetes and Digestive and Kidney Diseases, the National Institute of General Medical Sciences, the National Institute of Neurological Disorders and Stroke, the National Library of Medicine, the Office of the Director, the National Institutes of Health, the American Heart Association nor the Parkinson’s Disease Foundation. The authors have no conflicts of interest to disclose.
Supplemental data for this article is available online at https://doi.org/10.1080/03610926.2021.1909732.
References
- Arnold SF. 1981. The theory of linear models and multivariate analysis. 1st ed. New York: John Wiley & Sons Inc.
- Barton CN, and Cramer EC. 1989. Hypothesis testing in multivariate linear models with randomly missing data. Communications in Statistics - Simulation and Computation 18 (3):875–95. doi: 10.1080/03610918908812796.
- Catellier DJ, and Muller KE. 2000. Tests for Gaussian repeated measures with missing data in small samples. Statistics in Medicine 19 (8):1101–14.
- Edwards LJ, Muller KE, Wolfinger RD, Qaqish BF, and Schabenberger O. 2008. An R² statistic for fixed effects in the linear mixed model. Statistics in Medicine 27 (29):6137–57. doi: 10.1002/sim.3429.
- Gosho M, Hirakawa A, Noma H, Maruo K, and Sato Y. 2017. Comparison of bias-corrected covariance estimators for MMRM analysis in longitudinal data with dropouts. Statistical Methods in Medical Research 26 (5):2389–406. doi: 10.1177/0962280215597938.
- Johnson JL, Muller KE, Slaughter JC, Gurka MJ, Gribbin MJ, and Simpson SL. 2009. POWERLIB: SAS/IML software for computing power in multivariate linear models. Journal of Statistical Software 30 (5):1–27. doi: 10.18637/jss.v030.i05.
- Kenward MG, and Roger JH. 1997. Small sample inference for fixed effects from restricted maximum likelihood. Biometrics 53 (3):983–97. doi: 10.2307/2533558.
- Kenward MG, and Roger JH. 2009. An improved approximation to the precision of fixed effects from restricted maximum likelihood. Computational Statistics & Data Analysis 53 (7):2583–95. doi: 10.1016/j.csda.2008.12.013.
- Kreidler SM, Muller KE, Grunwald GK, Ringham BM, Coker-Dukowitz ZT, Sakhadeo UR, Barón AE, and Glueck DH. 2013. GLIMMPSE: Online power computation for linear models with and without a baseline covariate. Journal of Statistical Software 54 (10):1–28. doi: 10.18637/jss.v054.i10.
- Laird NM, and Ware JH. 1982. Random-effects models for longitudinal data. Biometrics 38 (4):963–74. doi: 10.2307/2529876.
- Li Z, and McKeague IW. 2013. Power and sample size calculations for generalized estimating equations via local asymptotics. Statistica Sinica 23 (1):231–50. doi: 10.5705/ss.2011.081.
- Little RJA, and Rubin DB. 2002. Statistical analysis with missing data. 2nd ed. Hoboken, NJ: Wiley-Interscience.
- McKeon JJ. 1974. F approximations to the distribution of Hotelling’s T₀². Biometrika 61 (2):381–83.
- Muller KE, Edwards LJ, Simpson SL, and Taylor DJ. 2007. Statistical tests with accurate size and power for balanced linear mixed models. Statistics in Medicine 26 (19):3639–60. doi: 10.1002/sim.2827.
- Muller KE, Lavange LM, Ramey SL, and Ramey CT. 1992. Power calculations for general linear multivariate models including repeated measures applications. Journal of the American Statistical Association 87 (420):1209–26. doi: 10.1080/01621459.1992.10476281.
- Qaqish BF. 2003. A family of multivariate binary distributions for simulating correlated binary variables with specified marginal means and correlations. Biometrika 90 (2):455–63. doi: 10.1093/biomet/90.2.455.
- Ringham BM, Kreidler SM, Muller KE, and Glueck DH. 2016. Multivariate test power approximations for balanced linear mixed models in studies with missing data. Statistics in Medicine 35 (17):2921–37. doi: 10.1002/sim.6811.
- SAS Institute Inc. 2017. SAS 9.4.
- Schenkman M, Hall DA, Baron AE, Schwartz RS, Mettler P, and Kohrt WM. 2012. Exercise for people in early- or mid-stage Parkinson disease: A 16-month randomized controlled trial. Physical Therapy 92 (11):1395–410. doi: 10.2522/ptj.20110472.
- Stiger TR, Kosinski AS, Barnhart HX, and Kleinbaum DG. 1998. Anova for repeated ordinal data with small sample size? A comparison of anova, manova, wls and gee methods by simulation. Communications in Statistics - Simulation and Computation 27 (2):357–75. doi: 10.1080/03610919808813485.
- Tu XM, Zhang J, Kowalski J, Shults J, Feng C, Sun W, and Tang W. 2007. Power analyses for longitudinal study designs with missing data. Statistics in Medicine 26 (15):2958–81. doi: 10.1002/sim.2773.