Summary
Cronbach Coefficient Alpha (CCA) is a classic measure of item internal consistency of an instrument and is used in a wide range of behavioral, biomedical, psychosocial, and health-care related research. Methods are available for making inference about one CCA or multiple CCAs from correlated outcomes. However, none of the existing approaches effectively addresses missing data. As longitudinal study designs become increasingly popular and complex in modern-day clinical studies, missing data have become a serious issue, and the lack of methods to systematically address this problem has hampered the progress of research in the aforementioned fields. In this paper, we develop a novel approach to tackle the complexities involved in addressing missing data (at the instrument level due to subject dropout) within a longitudinal data setting. The approach is illustrated with both clinical and simulated data.
Keywords: Cronbach Coefficient Alpha, Inverse probability weighting, Missing data, Monotone missing data pattern, U-statistics
1 Introduction
Measurement error models play a particularly important role in psychosocial research, as many quantities of interest are latent constructs and not directly observable. Outcomes for measuring such constructs are often derived from instruments (questionnaires) that delve into multifaceted dimensions of an individual’s mental and physical health. For most instruments, items are scored on a Likert scale such as 0, 1, 2, 3, 4 and then totaled either across all items or across a subset of items from a subscale to provide an assessment of the latent construct or a sub-domain of interest. An elegant measure to ensure that the different items in an instrument or a subscale capture a large amount of variability of either a uni-dimensional construct or multi-dimensional constructs sharing a common source is the Cronbach Coefficient Alpha (CCA). CCA ranges between 0 and 1, with larger values indicating higher degrees of consistency across the different items in terms of capturing a common source of variation [1, 2].
Methods are available for making inference about a single CCA and about multiple dependent CCAs, with the latter based on data from the same subjects [3]–[10]. However, as these methods are derived from joint multivariate normal distributions of item scores, they are sensitive to departures from such parametric distributional assumptions. Since item scores from most instruments in psychosocial research are based on Likert-type scales, they are intrinsically discrete, and modeling them using parametric distributions for continuous outcomes such as the multivariate normal is theoretically flawed, especially for subscales of an instrument with a very limited numerical range. Further, the methods for comparing multiple CCAs, whether from several instruments applied to a common group of subjects or from subscales of one instrument, are built on statistics for omnibus tests of overall homogeneity involving all the CCAs, and so they do not provide explicit asymptotic joint distributions of the estimates of individual CCAs. As a result, they cannot be applied to test hypotheses about specific combinations of the CCAs, which are particularly common in longitudinal studies. For example, we may want to know whether the CCA for an instrument has dropped by 30% from baseline to some follow-up visit in a longitudinal study. None of these methods can be applied to address such a hypothesis.
As longitudinal study designs become increasingly popular and complex in modern-day studies, missing data are inevitable due to loss to follow up, making inference more challenging for analysis of longitudinal CCAs. Although a variety of methods are available for addressing missing data, none applies directly to the current setting involving inference for comparing multiple CCAs within a longitudinal data setting.
In this paper, we develop a novel approach to tackle the complexities involved in addressing missing data within a longitudinal and clustered data setting. The approach is illustrated with both clinical and simulated data.
2 Modeling CCA for Longitudinal Data
2.1 Complete Data
Consider an instrument consisting of K items, which is administered to n subjects in a longitudinal study with T assessments. Let yitk denote the item score to the kth item from the ith subject at time t and let
Also, let
where 1 denotes a K × 1 column vector of 1’s. The Cronbach Coefficient Alpha α can be expressed as:
Let
| (1) |
Then, it is readily checked that
| (2) |
is a one-sample, vector-valued U-statistic and E(θ̂) = θ [11, chap. 3]. By applying the theory of multivariate U-statistics [12, chap. 5] and the Delta method, α̂ = f(θ̂) is a consistent and asymptotically normal estimate of α. We summarize the asymptotic properties of α̂ in a theorem below.
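The U-statistic representation can be checked numerically for a single assessment. The sketch below is a minimal illustration under stated assumptions: θ = (tr Σ, 1⊤Σ1)⊤ with Σ the K × K covariance matrix of the item scores, the symmetric kernel h(yi, yj) = ½((yi − yj)⊤(yi − yj), {1⊤(yi − yj)}²)⊤, and α = K/(K − 1) × (1 − θ1/θ2). The parametrization, the kernel, the function names, and the toy scores are all assumptions made for illustration.

```python
from itertools import combinations
from statistics import variance


def alpha_moment(scores):
    """Cronbach's alpha from item variances and the variance of the total score."""
    k = len(scores[0])
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def theta_ustat(scores):
    """U-statistic estimate of (tr(Sigma), 1' Sigma 1) averaged over all subject pairs."""
    t1 = t2 = 0.0
    pairs = 0
    for yi, yj in combinations(scores, 2):
        d = [a - b for a, b in zip(yi, yj)]
        t1 += 0.5 * sum(x * x for x in d)   # assumed kernel component for tr(Sigma)
        t2 += 0.5 * sum(d) ** 2             # assumed kernel component for 1' Sigma 1
        pairs += 1
    return t1 / pairs, t2 / pairs


def alpha_ustat(scores):
    """alpha = f(theta_hat) with theta_hat from the pairwise kernel."""
    k = len(scores[0])
    t1, t2 = theta_ustat(scores)
    return k / (k - 1) * (1 - t1 / t2)


# Toy data: 5 subjects, 4 items (hypothetical values).
scores = [[3, 4, 3, 2], [1, 1, 2, 1], [4, 4, 3, 4], [2, 3, 2, 2], [0, 1, 1, 0]]
print(alpha_moment(scores), alpha_ustat(scores))
```

Both routes give the same value because averaging ½(x_i − x_j)² over all pairs reproduces the unbiased sample variance; the pairwise form is what makes the U-statistic asymptotic theory applicable.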
Theorem 1
Let Σθ = 4 Var(E(hij | yi)). Then, under mild regularity conditions, we have:
-
α̂ is consistent and asymptotically normal:
(3) where →p and →d denote convergence in probability and distribution, respectively.
A consistent estimate of the asymptotic variance is obtained by substituting consistent estimates of the respective components of Σα; if Σ̂θ is a consistent estimate of Σθ, a consistent estimate of Σα is given by: .
The proof of Theorem 1 is not provided since these conclusions follow as a special case from the more general results concerning the missing data case to be discussed in Section 2.2.
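The Delta-method step can be written out explicitly under one natural parametrization. Assuming, for a single assessment, that θ = (θ1, θ2)⊤ = (tr Σ, 1⊤Σ1)⊤ with Σ the K × K covariance matrix of the item scores (an assumption about the form of f made here for illustration), the gradient D and the resulting asymptotic variance are:

```latex
\alpha = f(\theta) = \frac{K}{K-1}\left(1 - \frac{\theta_1}{\theta_2}\right), \qquad
D(\theta) = \frac{\partial f}{\partial \theta^{\top}}
          = \frac{K}{K-1}\left(-\frac{1}{\theta_2},\ \frac{\theta_1}{\theta_2^{2}}\right), \qquad
\Sigma_\alpha = D(\theta)\,\Sigma_\theta\,D(\theta)^{\top}.
```

With T assessments the same expression applies block by block, so that D becomes block-diagonal with one 1 × 2 block per assessment.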
To find a consistent estimate of Σθ, first note that:
| (4) |
Thus, a consistent estimate of Σθ is given by:
| (5) |
where Ê (hij | yi) denotes E (hij | yi) given in (4) with estimated θ̂ and substituting in place of θ and μ.
We can use Theorem 1 to test linear contrasts of the form:
where K is a p × T full-rank matrix of known constants. For inference, we can use the Wald statistic, which has an asymptotic central χ² distribution with p degrees of freedom under the null H0.
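A minimal sketch of such a Wald test follows, using only the Python standard library. The matrix `contrast` plays the role of the p × T contrast matrix in the text, and `cov_alpha` is assumed to be the estimated covariance matrix of α̂ itself (the asymptotic variance already divided by n). The function names and all numerical inputs are hypothetical.

```python
import math


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for j in range(c, n + 1):
                m[r][j] -= f * m[c][j]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][j] * x[j] for j in range(r + 1, n))) / m[r][r]
    return x


def chi2_sf(x, df):
    """Upper-tail probability of a central chi-square with integer df."""
    z = x / 2.0
    if df % 2:
        q, a = math.erfc(math.sqrt(z)), 0.5   # df = 1
    else:
        q, a = math.exp(-z), 1.0              # df = 2
    while 2 * a < df:                         # Q(a + 1, z) = Q(a, z) + z^a e^-z / Gamma(a + 1)
        q += z ** a * math.exp(-z) / math.gamma(a + 1)
        a += 1
    return q


def wald_test(contrast, alpha_hat, cov_alpha, rhs=None):
    """Wald statistic and p-value for H0: contrast @ alpha = rhs (rhs defaults to 0)."""
    p, t = len(contrast), len(alpha_hat)
    rhs = rhs or [0.0] * p
    d = [sum(contrast[r][j] * alpha_hat[j] for j in range(t)) - rhs[r] for r in range(p)]
    ck = [[sum(contrast[r][j] * cov_alpha[j][c] for j in range(t)) for c in range(t)]
          for r in range(p)]
    v = [[sum(ck[r][j] * contrast[c][j] for j in range(t)) for c in range(p)]
         for r in range(p)]
    w = sum(di * xi for di, xi in zip(d, solve(v, d)))
    return w, chi2_sf(w, p)


# Hypothetical estimates at three visits and a hypothetical covariance matrix.
alpha_hat = [0.72, 0.65, 0.75]
cov_alpha = [[0.0018, 0.0006, 0.0004],
             [0.0006, 0.0076, 0.0007],
             [0.0004, 0.0007, 0.0012]]
equal_over_time = [[1.0, -1.0, 0.0], [0.0, 1.0, -1.0]]  # H0: alpha1 = alpha2 = alpha3
drop_30_percent = [[-0.7, 1.0, 0.0]]                     # H0: alpha2 = 0.7 * alpha1
print(wald_test(equal_over_time, alpha_hat, cov_alpha))
print(wald_test(drop_30_percent, alpha_hat, cov_alpha))
```

The second contrast illustrates why an explicit joint distribution is useful: a hypothesis such as a 30% drop from baseline is a single linear constraint that an omnibus homogeneity test cannot express.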
2.2 Missing Data
In the presence of missing data, one approach is to apply the complete-data approach above to the subsample consisting of those subjects with complete data. However, in most longitudinal studies, missing data do not occur completely at random, as they often result from subjects’ deteriorated or improved health and other related conditions in response to the treatment. In such cases, missing data are predicted by observed responses and, as a result, the missing completely at random (MCAR) assumption does not apply [13]. Thus, such a complete-data approach not only reduces power, but also likely yields biased estimates. A better alternative is to include all available data and to address the inherent missing data problem.
To this end, we assume that missing data only occurs at the instrument level and define a vector of binary variables for indicating missing (or rather observed) instrument-level response as follows:
| (6) |
Let
Define a new estimate of α with the same form as before, α̂ = f(θ̂), but with θ̂ redefined by:
| (7) |
Note that although hijt in (7) is not defined if one of the yit and yjt is missing, θ̂t is still well-defined since hijt can be assigned any value in such cases without affecting θ̂t. The estimate θ̂t in (7) may be viewed as a generalization to a U-statistics setting of the classic inverse probability weighted (IPW) estimate for distribution-free inference about standard statistical models such as the weighted generalized estimating equations (WGEE) [14].
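The weighting idea can be sketched for a single visit as follows. This is an illustration under assumptions, not a transcription of (7): each pair (i, j) is assumed to receive weight rit rjt / (Δit Δjt), the kernel is assumed to be the pairwise form h = ½((yi − yj)⊤(yi − yj), {1⊤(yi − yj)}²)⊤, and the function names and toy data are hypothetical.

```python
from itertools import combinations


def ipw_theta(scores, observed, prob):
    """IPW U-statistic estimate of (tr(Sigma_t), 1' Sigma_t 1) at one visit.

    scores[i]   : list of K item scores, or None when the visit is missing
    observed[i] : 1 if subject i was assessed at this visit, else 0
    prob[i]     : probability that subject i is observed at this visit
    """
    n = len(scores)
    t1 = t2 = 0.0
    for i, j in combinations(range(n), 2):
        if not (observed[i] and observed[j]):
            continue  # weight is zero, so the undefined kernel never enters
        w = 1.0 / (prob[i] * prob[j])       # assumed inverse probability weight
        d = [a - b for a, b in zip(scores[i], scores[j])]
        t1 += w * 0.5 * sum(x * x for x in d)
        t2 += w * 0.5 * sum(d) ** 2
    pairs = n * (n - 1) / 2                 # all pairs, observed or not
    return t1 / pairs, t2 / pairs


def ipw_alpha(scores, observed, prob):
    """alpha_hat = f(theta_hat) with the weighted theta_hat."""
    k = len(next(s for s in scores if s is not None))
    t1, t2 = ipw_theta(scores, observed, prob)
    return k / (k - 1) * (1 - t1 / t2)


# Toy example: the fourth subject dropped out before this visit.
scores = [[1, 2], [2, 4], [3, 3], None]
observed = [1, 1, 1, 0]
prob = [0.9, 0.8, 0.7, 0.6]  # hypothetical observation probabilities
print(ipw_alpha(scores, observed, prob))
```

Skipping a pair with a missing member is equivalent to assigning its kernel an arbitrary value, since its weight is zero. When every subject is observed with probability one, the estimate reduces to the complete-data U-statistic.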
First, assume that Δi is known. Then, it can be shown that θ̂ is both consistent and asymptotically normal. Thus, by the Delta method, α̂ = f (θ̂) is a consistent and asymptotically normal estimate of α. We summarize these results in a theorem below, with a proof sketched in Appendix A1.
Theorem 2
Let
| (8) |
Then,
-
α̂ is consistent and asymptotically normal,
(9) The asymptotic variance Σα can be estimated by the Delta method with a consistent estimate Σ̂θ given by:
| (10) |
This case with known Δi may arise if data are missing by design, as in some multi-stage trials in which we may want to over-sample subjects with larger outcomes at a prior stage for participation in the current stage. For example, in a two-stage study with n subjects at stage one, we can model the selection probability Δi2 for stage two using the following logistic regression:
| (11) |
where ri2 denotes the selection indicator (1 for selected) and yi1 the outcome from the ith subject in the first stage. In the above model, β0 and β1 (≥ 0) determine the selection probability for subjects with outcomes yi1 to participate in the second stage.
In most applications, Δi is unknown and must be estimated. Under MCAR, ri is independent of yi. Consequently, Δi is functionally independent of yi and is readily estimated by the respective sample moments: . When Δi becomes dependent on yi, it is necessary to model Δi as a function of yi. However, it is difficult to model such a relationship without imposing some additional assumptions on the relationship between the occurrence of missing data and outcome [13]. As in the literature, we focus on the missing at random (MAR) assumption, in which case the occurrence of missing data only depends on the observed response.
Let denote the observed responses for the ith subject. Then, under MAR,
| (12) |
As the vector of observed responses is the sub-vector of yi containing the non-missing responses, Δit above is essentially a function of the missing data patterns across the subjects. Since there are potentially a total of 2^T different patterns, it is generally not feasible to model and estimate Δit in most real studies unless there is a certain structure in the patterns. Fortunately, in most longitudinal trials, missing data often occur as the result of subject dropout due to deteriorated or improved health and other related conditions, exhibiting the so-called monotone missing data pattern (MMDP). The structured MMDP reduces the number of missing data patterns to T, making it possible to model such a dependence with limited data in most studies.
We assume no missing data at baseline t = 1 so that ri1 = 1. Under MMDP, if yit is observed at t, yis is then observed at all earlier times s < t. Let denote the sub-vector of yi containing responses up to and including time t − 1 (2 ≤ t ≤ T ). Then, under MAR, it follows from (12) that
| (13) |
To estimate Δit, we first model the one-step transition probability of the occurrence of missing data using logistic regression:
| (14) |
In many applications, the limited data may prevent us from reliably estimating β, especially with a large K. In such cases, we may simplify the above model by substituting the mean item score in place of yis, yielding:
| (15) |
Now by invoking MMDP and the assumption of no missing data at t = 1, we obtain:
| (16) |
Note that by setting βt1 = 0 in (14) or (15), we can use (16) to estimate Δit under the MCAR assumption, which is equivalent to estimating Δit using the sample moments, , as discussed earlier.
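The step from one-step transition probabilities to Δit can be sketched as below. The code assumes the simplified Markov form in which the probability of being observed at time t, given observation at t − 1, is expit(βt0 + βt1 ȳi(t−1)), and that Δit is the running product of these one-step probabilities starting from Δi1 = 1. The function names, the dictionary layout for β, and the numbers are hypothetical.

```python
import math


def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))


def observation_probs(mean_scores, beta):
    """Cumulative probabilities Delta_it (t = 1..T) for one subject under MMDP.

    mean_scores[t-1] : mean item score at visit t, or None after dropout
    beta[t]          : (beta_t0, beta_t1) of the one-step model for visit t >= 2
    """
    delta = [1.0]  # no missing data at baseline, so Delta_i1 = 1
    for t in range(2, len(beta) + 2):
        prev = mean_scores[t - 2]          # mean score at visit t - 1
        if prev is None:
            break                          # subject already dropped out
        b0, b1 = beta[t]
        p_t = expit(b0 + b1 * prev)        # assumed one-step transition probability
        delta.append(delta[-1] * p_t)
    return delta


beta = {2: (1.5, 0.3), 3: (1.0, 0.3)}      # hypothetical coefficients
print(observation_probs([2.0, 2.4, 2.2], beta))
# With the slopes set to zero the probabilities no longer depend on the scores (MCAR).
print(observation_probs([2.0, 2.4, 2.2], {2: (1.5, 0.0), 3: (1.0, 0.0)}))
```

Only Δit for visits at which the subject is actually observed enter the weighted estimate, so stopping at the first missing mean score loses nothing.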
When Δi is estimated by the logistic regression model in (14), Δi (β̂) becomes subject to the sampling variability of β̂. Since Theorem 2 treats Δi as a known constant, we must account for this extra variability for correct inference about α.
Suppose that β in the logistic model in (14) is estimated by maximum likelihood. Then, β̂ is the solution to the following score equations:
| (17) |
When using the estimated β̂, the estimate defined in (7) becomes
| (18) |
Thus, we must also include the variability of β̂ in the asymptotic variance of θ̂(β̂). By utilizing the properties of score equations and the projection-based U-statistics asymptotic expansion, we can derive the asymptotic variance to take into account this extra variability. We summarize the results below with a justification given in Appendix A2.
Theorem 3
Let
| (19) |
Then, under the assumptions of Theorem 2,
-
α̂ is consistent and asymptotically normal, with the asymptotic variance,
(20) where Σθ is given in (8).
A consistent estimate of ΣαΦ is , where Σ̂θ is given in (10) and Φ̂ is obtained by substituting moment estimates in place of H, C, and .
It is seen from (20) that when the weight function is estimated, the asymptotic variance of α̂ has an additional component Φ, which reflects the sampling variability in β̂.
Although derived from a longitudinal data setting, the results in the theorems can also be applied to cross-sectional study data. For example, most of the available methods have been developed to compare whether several CCAs either from different instruments or from multiple domains, or subscales of an instrument are identical [3]–[10]. By identifying T as the number of instruments or domains of an instrument, yitk then represents the score from the ith subject in response to the kth item within the tth domain. We can use Theorem 1 for inference about such an omnibus hypothesis if there is no missing data. If missing data do occur at the domain level, we can still model the missing data using the above approaches. Although MCAR presents no additional problem, applications under MAR to the current context are generally more difficult, since it may not be possible to identify a simple structure to model the missing data indicators in (6) such as the MMDP for longitudinal data.
3 Application
We illustrate the approach with both clinical and simulated data. We first present a clinical application and then follow up with investigations of the performance of the approach under small to moderate samples by simulation. In all of the examples, we set the statistical significance at α = 0.05. All analyses are carried out using a package we have developed for implementing the proposed approach using the R software platform (R Development Core Team (2009). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, URL http://www.R-project.org)
3.1 Clinical Study
A study on the benefits of total knee arthroplasty (TKA) was recently conducted in the Hospital for Special Surgery in 2008 (“Early clinical and radiographic results of the standard posterior stabilized and high flex polyethylene inserts in the genesis II total knee replacement system: A pilot study” was approved by the Institutional Review Board of the Hospital for Special Surgery). A modified version of the Knee Society rating System (KSS) [15] was used to report results for patients who underwent TKA at three visits including pre-operation, 6 weeks, and 4 months post-operation. For illustration purposes, we focused on a total of 117 patients who had observed data at the pre-op in the domain of the patient reported portion of the KSS that consists of 5 items measuring knee functions in walking (a 6-level ordinal scale), going up stairs (a 4-level ordinal scale), going down stairs (a 4-level ordinal scale), getting out of chair (a 4-level ordinal scale), and necessity of support for walking (a 3-level ordinal scale), respectively.
In this sample, 20 of the 117 patients were lost to follow-up, yielding a monotone missing data pattern. Let yit = (yit1, yit2,…, yit5)⊤ denote the 5 item scores at time t with t = 1 denoting pre-op, and t = 2, 3 corresponding to the two post-op assessments at 6 weeks, and 4 months, respectively. The missingness at time t was modeled using the logistic regression model in (15), with the predictor, ỹit = ȳi(t−1), the mean score of yi(t−1), and baseline covariates including age, sex, and body mass index (BMI):
| (21) |
Shown in Table 1 are the estimates of parameters from the above model, their standard errors, and corresponding p-values. Missingness at 4 months post-op was dependent on the observed KSS at 6 weeks post-op (p-value=0.043). There was no evidence indicating the dependence of missingness at 6 weeks on the observed KSS at pre-op. The occurrence of missing data was not highly associated with any of the baseline covariates. Given these results, we proceeded with all subsequent analyses under MAR with inference based on Theorem 3.
Table 1.
Estimates of logistic regression model for missingness under MMDP for the Total Knee Arthroplasty Study.
| Assessment time | Predictors | Estimates | Standard errors | P-values |
|---|---|---|---|---|
| 6 weeks post-op (t = 2) | Intercept | −0.725 | 6.337 | 0.909 |
| | ỹi2 | 1.951 | 1.251 | 0.119 |
| | female | 1.411 | 1.073 | 0.189 |
| | age | −0.019 | 0.060 | 0.749 |
| | BMI | 0.004 | 0.094 | 0.967 |
| 4 months post-op (t = 3) | Intercept | −5.974 | 3.363 | 0.076 |
| | ỹi3 | 1.699 | 0.842 | 0.043 |
| | female | −0.136 | 0.729 | 0.853 |
| | age | 0.028 | 0.030 | 0.346 |
| | BMI | 0.034 | 0.046 | 0.464 |
Shown in Table 2 are the estimated αt’s and their standard errors over time, and the Wald test statistic and associated p-value under the H0 of equal CCAs over all assessments, computed under MAR. The estimates seem to show a drop in CCA at the first postoperative visit at 6 weeks, but the Wald test indicates that this drop is not statistically significant (p-value = 0.54). The early postoperative course of patients who underwent a TKA generally differs across patients, with considerable variation in symptoms, range of motion, and function. These clinical differences are likely responsible for the drop of CCA at the first postoperative visit. As most patients are fully recovered 4 months after the surgery, the variability in such outcomes reduces substantially, making the CCA bounce back to the level at the pre-operative visit. Thus, despite the large variation in patients’ clinical outcomes over the course of surgery, the KSS questionnaire seems to remain internally consistent, so that the total score can be used to provide a reliable measure to assist objective evaluation of postoperative total knee arthroplasty.
Table 2.
Estimates of CCA over time, test statistics and p-values for testing the null of equal CCA over time under MAR for the Total Knee Arthroplasty Study.
| | Pre-OP (t = 1) | 6 weeks Post-OP (t = 2) | 4 months Post-OP (t = 3) |
|---|---|---|---|
| Estimates of CCA (standard error) | 0.721 (0.043) | 0.646 (0.087) | 0.748 (0.034) |
| Hypothesis testing H0: α1 = α2 = α3 | Wald statistic: 1.233 | p-value: 0.54 | |
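The reported p-value can be checked by hand. With T = 3 assessments, the null of equal CCAs imposes two linear constraints, and the upper tail of a χ² distribution with 2 degrees of freedom has a closed form:

```latex
p = \Pr\left(\chi^2_2 > 1.233\right) = \exp\left(-\tfrac{1.233}{2}\right) \approx 0.540 .
```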
3.2 Simulation Study
3.2.1 Simulation Method
A limited simulation study was conducted to examine the empirical type I error rate for testing the null of equal CCAs over time based on a 5-item, Likert-scale questionnaire under a longitudinal study design with three assessments. We considered four sample sizes (30, 50, 75, and 100) under complete data and under missing data with MCAR and MAR. For each sample size, we generated observers’ ratings over time yit = (yit1, yit2,…,yit5)⊤ under the null, H0: αt = α, with αt denoting the CCA at time t (1 ≤ t ≤ 3). We tested the null using the Wald statistic and estimated the type I error rate based on the empirical distribution of the test statistic obtained from 1,000 Monte Carlo (MC) replications.
We created the data yit over three assessments in two steps. First, we generated continuous outcomes over time, zit = (zit1, zit2,…,zit5)⊤, by simulating a 15 × 1 vector, , from a 15-variate normal distribution with mean 0 and variance matrix Σ = Σ2 ⊗ Σ1, where ⊗ denotes the Kronecker product and Σk are defined by:
In the above, Σ1 denotes the within-visit, between-item correlation matrix, ARm (1, ρbi) represents a m × m first-order autoregressive correlation structure with auto-correlation ρbi, Σ2 is the within-subject correlation matrix, and Cm (ρbt) denotes a m × m compound symmetry correlation matrix with correlation ρbt. For the simulation study, we set ρbi = 0.6 and ρbt = 0.5, yielding the true CCAs over time, αt = α = 0.742 (1 ≤ t ≤ 3). Next, we transformed each of the continuous observer score zitk into a 5-level categorical response by:
where Φ−1 denotes the inverse of the CDF of the standard normal.
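The two-step data generation can be sketched with the Python standard library. Following the description above, the sketch takes Σ1 = AR5(1, ρbi) for the items within a visit and Σ2 = C3(ρbt) across visits, with ρbi = 0.6 and ρbt = 0.5. The cut-points are an assumption: equal-probability quintiles Φ−1(0.2), …, Φ−1(0.8) are used purely for illustration, and the function names and seed are hypothetical.

```python
import math
import random
from statistics import NormalDist


def ar1(m, rho):
    """m x m first-order autoregressive correlation matrix."""
    return [[rho ** abs(i - j) for j in range(m)] for i in range(m)]


def compound_symmetry(m, rho):
    """m x m compound symmetry correlation matrix."""
    return [[1.0 if i == j else rho for j in range(m)] for i in range(m)]


def kron(a, b):
    """Kronecker product of two square matrices."""
    nb = len(b)
    n = len(a) * nb
    return [[a[i // nb][j // nb] * b[i % nb][j % nb] for j in range(n)] for i in range(n)]


def cholesky(s):
    """Lower-triangular factor L with L L' = s."""
    n = len(s)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            acc = s[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(acc) if i == j else acc / low[j][j]
    return low


def simulate_subject(low, cuts, rng, items=5):
    """Item scores for one subject, returned as a list of visits."""
    e = [rng.gauss(0.0, 1.0) for _ in range(len(low))]
    z = [sum(low[i][k] * e[k] for k in range(i + 1)) for i in range(len(low))]
    y = [sum(v > c for c in cuts) for v in z]  # categories 0..4
    return [y[t:t + items] for t in range(0, len(y), items)]


sigma = kron(compound_symmetry(3, 0.5), ar1(5, 0.6))            # Sigma_2 (x) Sigma_1, 15 x 15
low = cholesky(sigma)
cuts = [NormalDist().inv_cdf(q) for q in (0.2, 0.4, 0.6, 0.8)]  # assumed cut-points
rng = random.Random(2024)
data = [simulate_subject(low, cuts, rng) for _ in range(50)]
print(data[0])
```

Stacking the visits in blocks of five matches the ordering implied by Σ2 ⊗ Σ1: the outer factor indexes visits and the inner factor indexes items.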
For a given sample size n, let G0.95 denote the 95th percentile of the χ² distribution with 2 degrees of freedom. The empirical size is calculated as the proportion of the 1,000 MC replications in which the Wald statistic exceeds G0.95.
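The empirical size is a simple rejection frequency, sketched below. For 2 degrees of freedom the χ² quantile has the closed form −2 log(1 − level), so no statistical library is needed. The function names and the list of Wald statistics are hypothetical.

```python
import math


def chi2_2df_quantile(level):
    """Quantile of the chi-square distribution with 2 degrees of freedom."""
    return -2.0 * math.log(1.0 - level)


def rejection_rate(wald_stats, level=0.95):
    """Share of Monte Carlo replications whose Wald statistic exceeds the critical value."""
    crit = chi2_2df_quantile(level)
    return sum(w > crit for w in wald_stats) / len(wald_stats)


wald_stats = [1.0, 7.0, 6.0, 0.2]  # hypothetical statistics from four replications
print(chi2_2df_quantile(0.95), rejection_rate(wald_stats))
```

Applied to replications generated under the null, the function gives the empirical size; applied to replications generated under an alternative, the same frequency estimates power.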
For the missing data case, we assumed no missing data at baseline t = 1 and simulated the missing response according to a MCAR and MAR model. For MAR, we considered the MMDP model, and simulated the missing data indicators at each time t, rit, from a Bernoulli distribution with the probability of success equal to pit (t = 2, 3). We specified the one-step transition probabilities pit at time t according to the logistic model in (15) under the Markov assumption, which in this case had the following form:
where ȳi(t−1) denotes the mean of yi(t−1). We fixed βt1 = 0.3 (t = 2, 3) and solved the following equations for β20 and β30 to generate missing response rates of approximately 10% and 15% for times 2 and 3, respectively:
| (22) |
To ensure that the missing data indicators rit followed the MMDP, we further imposed the restriction ri3 = ri3 × ri2 (1 ≤ i ≤ n). For MCAR, the same approach was used except for setting βt1 = 0 in (22) (t = 2, 3).
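The missing-data step can be sketched as follows. The calibration rule is an assumption: the intercept is chosen by bisection so that the average one-step observation probability over a set of mean scores equals the target rate, which is one plausible reading of (22). Treating the 15% at time 3 as a one-step rate is likewise an assumption, and the function names, seed, and mean scores are hypothetical.

```python
import math
import random


def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))


def solve_intercept(mean_scores, slope, target, lo=-20.0, hi=20.0):
    """Bisection for b0 such that the average of expit(b0 + slope * ybar) equals target."""
    for _ in range(100):
        mid = (lo + hi) / 2.0
        rate = sum(expit(mid + slope * y) for y in mean_scores) / len(mean_scores)
        lo, hi = (mid, hi) if rate < target else (lo, mid)
    return (lo + hi) / 2.0


def simulate_indicators(ybar1, ybar2, beta2, beta3, rng):
    """Draw (r_i1, r_i2, r_i3) for one subject with a monotone pattern."""
    r2 = int(rng.random() < expit(beta2[0] + beta2[1] * ybar1))
    r3 = int(rng.random() < expit(beta3[0] + beta3[1] * ybar2))
    return 1, r2, r3 * r2  # r_i3 = r_i3 x r_i2 enforces dropout


rng = random.Random(7)
ybar = [rng.uniform(0, 4) for _ in range(200)]  # hypothetical mean item scores
b20 = solve_intercept(ybar, 0.3, 0.90)          # about 10% missing at t = 2
b30 = solve_intercept(ybar, 0.3, 0.85)          # about 15% missing at t = 3 (assumed one-step rate)
patterns = [simulate_indicators(y, y, (b20, 0.3), (b30, 0.3), rng) for y in ybar]
print(b20, b30, sum(r2 for _, r2, _ in patterns) / len(patterns))
```

Setting the slope to zero turns the same code into the MCAR mechanism, in which case the intercept is simply the logit of the target rate.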
In addition to examining type I error rates, power for testing the null of equal CCAs over time was examined in a second simulation study within the same longitudinal data setting under complete data, MCAR, and MAR. The true alpha was set at α1 = 0.7, α2 = 0.803, and α3 = 0.898 for times 1, 2, and 3, respectively. The power was estimated by the proportion of MC replications in which the null was rejected.
3.2.2 Simulation Result
We computed the estimate α̂ of α = (α1, α2, α3)⊤ and estimated its asymptotic variance Σ̂αΦ using Theorem 3. Shown in Table 3 are the averaged α̂ over 1,000 Monte Carlo (MC) replications, along with the empirical variance Var(α̂). The type I error rates δ̂ were obtained as the proportion of replications in which the Wald statistic exceeded G0.95. Under complete data, MCAR, and MAR, the averaged α̂ showed some downward bias, especially for the small samples n = 30, 50 under missing data. The asymptotic variances based on Σ̂αΦ were almost identical to their empirical counterparts, Var(α̂), across sample sizes under both complete and missing data. Although the empirical type I error rates δ̂ were slightly higher than the nominal value 0.05 at n = 30, the upward bias disappeared rapidly as the sample size increased.
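To gauge how far an empirical size can drift from 0.05 through simulation noise alone, the 1,000 replications can be treated as independent Bernoulli trials. This is a standard approximation added here for orientation, not a computation reported in the study:

```latex
\mathrm{SE}(\hat{\delta}) \approx \sqrt{\frac{0.05 \times 0.95}{1000}} \approx 0.0069 .
```

On this scale, the value 0.071 for complete data at n = 30 sits about three Monte Carlo standard errors above the nominal level, while the remaining entries of Table 3 fall within roughly two.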
Table 3.
Averaged α̂ and Σ̂αΦ over 1,000 MC replications along with estimated empirical variance of α̂ and empirical type I error rate δ̂ under complete data, MCAR, and MAR with the true αt = α = 0.742 (t = 1, 2, 3).
| | n = 30 | | | n = 50 | | | n = 75 | | | n = 100 | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Assessment time | 1 | 2 | 3 | 1 | 2 | 3 | 1 | 2 | 3 | 1 | 2 | 3 |
| Complete data | | | | | | | | | | | | |
| α̂ | 0.725 | 0.729 | 0.725 | 0.730 | 0.732 | 0.731 | 0.735 | 0.736 | 0.734 | 0.696 | 0.800 | 0.891 |
| Var(α̂) | 0.008 | 0.007 | 0.008 | 0.004 | 0.004 | 0.004 | 0.003 | 0.002 | 0.003 | 0.003 | 0.001 | 0.001 |
| Σ̂αΦ | 0.007 | 0.006 | 0.007 | 0.004 | 0.004 | 0.004 | 0.002 | 0.002 | 0.002 | 0.002 | 0.001 | 0.001 |
| δ̂ | 0.071 | | | 0.055 | | | 0.052 | | | 0.049 | | |
| MCAR | | | | | | | | | | | | |
| α̂ | 0.725 | 0.726 | 0.722 | 0.730 | 0.731 | 0.729 | 0.735 | 0.736 | 0.733 | 0.738 | 0.736 | 0.736 |
| Var(α̂) | 0.008 | 0.009 | 0.009 | 0.004 | 0.004 | 0.005 | 0.003 | 0.003 | 0.003 | 0.002 | 0.002 | 0.002 |
| Σ̂αΦ | 0.007 | 0.009 | 0.010 | 0.004 | 0.005 | 0.005 | 0.002 | 0.003 | 0.003 | 0.002 | 0.002 | 0.002 |
| δ̂ | 0.059 | | | 0.052 | | | 0.047 | | | 0.051 | | |
| MAR | | | | | | | | | | | | |
| α̂ | 0.725 | 0.725 | 0.721 | 0.730 | 0.731 | 0.729 | 0.735 | 0.736 | 0.732 | 0.738 | 0.736 | 0.735 |
| Var(α̂) | 0.008 | 0.009 | 0.010 | 0.004 | 0.004 | 0.005 | 0.003 | 0.003 | 0.003 | 0.002 | 0.002 | 0.002 |
| Σ̂αΦ | 0.007 | 0.009 | 0.010 | 0.004 | 0.005 | 0.005 | 0.002 | 0.003 | 0.003 | 0.002 | 0.002 | 0.002 |
| δ̂ | 0.061 | | | 0.056 | | | 0.056 | | | 0.052 | | |
Shown in Table 4 are the averaged α̂ over 1,000 Monte Carlo (MC) replications, the empirical variance Var(α̂), the model-based asymptotic variance Σ̂αΦ, and the estimated power Λ̂. Similar patterns in power were found across complete data, MCAR, and MAR. Although the estimated power was low at n = 30, it increased rapidly as the sample size increased.
Table 4.
Averaged α̂ and Σ̂αΦ over 1,000 MC replications along with estimated empirical variance of α̂ and power Λ̂ under complete data, MCAR, and MAR with the true α1 = 0.7, α2 = 0.803, and α3 = 0.898 at t = 1, 2, 3, respectively.
| | n = 30 | | | n = 50 | | | n = 75 | | | n = 100 | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Assessment time | 1 | 2 | 3 | 1 | 2 | 3 | 1 | 2 | 3 | 1 | 2 | 3 |
| Complete data | | | | | | | | | | | | |
| α̂ | 0.681 | 0.791 | 0.897 | 0.686 | 0.797 | 0.885 | 0.691 | 0.799 | 0.891 | 0.696 | 0.800 | 0.891 |
| Var(α̂) | 0.011 | 0.005 | 0.006 | 0.006 | 0.002 | 0.002 | 0.004 | 0.001 | 0.001 | 0.003 | 0.001 | 0.001 |
| Σ̂αΦ | 0.009 | 0.004 | 0.003 | 0.005 | 0.002 | 0.002 | 0.003 | 0.001 | 0.001 | 0.002 | 0.001 | 0.001 |
| Λ̂ | 0.464 | | | 0.662 | | | 0.839 | | | 0.919 | | |
| MCAR | | | | | | | | | | | | |
| α̂ | 0.681 | 0.789 | 0.866 | 0.686 | 0.796 | 0.884 | 0.691 | 0.799 | 0.889 | 0.696 | 0.799 | 0.891 |
| Var(α̂) | 0.011 | 0.006 | 0.008 | 0.006 | 0.002 | 0.003 | 0.004 | 0.002 | 0.002 | 0.003 | 0.001 | 0.001 |
| Σ̂αΦ | 0.009 | 0.005 | 0.006 | 0.005 | 0.003 | 0.002 | 0.003 | 0.002 | 0.002 | 0.002 | 0.001 | 0.001 |
| Λ̂ | 0.420 | | | 0.628 | | | 0.786 | | | 0.908 | | |
| MAR | | | | | | | | | | | | |
| α̂ | 0.681 | 0.789 | 0.868 | 0.686 | 0.796 | 0.884 | 0.691 | 0.799 | 0.889 | 0.696 | 0.799 | 0.891 |
| Var(α̂) | 0.011 | 0.006 | 0.007 | 0.006 | 0.003 | 0.003 | 0.004 | 0.002 | 0.002 | 0.003 | 0.001 | 0.001 |
| Σ̂αΦ | 0.009 | 0.005 | 0.006 | 0.005 | 0.003 | 0.002 | 0.003 | 0.002 | 0.001 | 0.002 | 0.001 | 0.001 |
| Λ̂ | 0.413 | | | 0.624 | | | 0.795 | | | 0.902 | | |
We repeated the simulation for a larger number of items, K = 10. Compared with the results for K = 5 items, the type I error rates were closer to the nominal 0.05 even for the small sample size n = 30. Power also improved markedly, reaching 60% at n = 30 and rising at n = 50 to 85% under complete data and MCAR and to 83% under MAR.
4 Discussion
We developed a U-statistics based approach to concurrently address missing data and stringent distribution assumptions when modeling the Cronbach Coefficient Alpha (CCA) within a longitudinal data setting. Unlike available methods, this approach requires no parametric distribution assumption on the item scores and thus provides robust inference regardless of the data distribution. As longitudinal studies have become increasingly popular in today’s research, the proposed approach fills a critical gap in the extant literature to address this timely data-analytic issue.
To demonstrate the robustness feature, we also conducted a simulation study to compare the U-statistic based approach with a popular method developed by Feldt et al [6] in a cross-sectional setting by considering the hypothesis H0: α = α0 vs. Ha: α ≠ α0 under various sample sizes (n = 30, 50, 100, and 200). Feldt et al [6] showed that the transformed alpha follows an F distribution under the assumed ANOVA model, where the item-by-subject interaction effects and the residual errors are independent and normally distributed with homogeneous variance. To examine its performance under Likert-type item scores, we generated skewed Likert-type item scores with true alpha α0 = 0.693 and K = 5 items. Shown in Table 5 are the estimates of CCA and type I error rates for the proposed U-statistic based approach and their F-distribution based counterparts. Although the estimates of alpha, α̂U (U-statistic based) and α̂F (F-distribution based), are identical and close to the true alpha, the type I error rates are quite different between the two approaches. The type I error rate for the U-statistic based approach had some upward bias when the sample size was small, but rapidly decreased to the nominal level 0.05 as the sample size increased. In contrast, the type I error rate for the F-distribution based method remained upwardly biased despite the increase of the sample size. In fact, the type I error will remain upwardly biased even if the sample size approaches infinity, since the F-distribution is not the asymptotic distribution of the estimate. The observed difference in the behavior of type I error rates reflects the fact that the U-statistic based approach is a distribution-free procedure, yielding not only consistent estimates of CCA but also valid inference regardless of the distribution of the item scores, while the inference of the F-distribution based method relies on the normal distribution and is not robust against departures from this underlying assumption.
Table 5.
Estimates of type I error rate and CCA for the U-statistic based and F-distribution based approaches.
| Sample size | Estimates of CCA | Type I error |
|---|---|---|
| N = 30 | α̂U = 0.664 | δ̂U = 0.105 |
| | α̂F = 0.664 | δ̂F = 0.108 |
| N = 50 | α̂U = 0.675 | δ̂U = 0.090 |
| | α̂F = 0.675 | δ̂F = 0.110 |
| N = 100 | α̂U = 0.687 | δ̂U = 0.054 |
| | α̂F = 0.687 | δ̂F = 0.099 |
| N = 200 | α̂U = 0.689 | δ̂U = 0.052 |
| | α̂F = 0.689 | δ̂F = 0.116 |
Modeling CCA presents a major challenge under the popular mean response based regression paradigm as typified by the weighted generalized estimating equations (WGEE) due to the complexity in the analytic expression of this coefficient involving second-order moments. Although GEE II may be used to model the required second-order moments [12, 16, 17], it leads to more cumbersome algebra. Further, as it requires modeling more parameters, GEE II is generally not as efficient as the proposed U-statistics based approach.
The proposed IPW-based estimate may not be fully efficient, since information from the subjects with missing data is only used to estimate the weight function. In many situations, reliable models may also exist for directly modeling the relationship between the (missing) response and observed covariates or other auxiliary information. In such cases, we may combine the proposed IPW with this extra model to develop a double robust estimate, which not only ensures consistency even if only one of the models (the model for missing data or the one for the auxiliary information) is correct, but also improves efficiency [14, 18].
In addition to CCA, the proposed approach can be similarly applied to modeling other measures of agreement for continuous outcomes, such as the closely related intraclass correlation coefficient (ICC) [19]–[22]. Missing data may also occur at the item level, which can become complicated depending on the nature of the question and on the design of the study. For example, person-to-person interviews often result in more missing data and/or unreliable responses compared to questionnaires, as interviewees may feel uncomfortable and refuse to answer certain types of questions in a face-to-face meeting. As missing responses arising from such sensitive questions likely follow the non-ignorable missing data mechanism rather than the missing at random (MAR) mechanism assumed in longitudinal studies, they are best addressed up front by study design, since the non-ignorable mechanism is more difficult to model than the MAR mechanism. Work is currently underway to generalize the proposed approach to address these limitations as well as to develop double robust estimates within a longitudinal setting.
Acknowledgments
This research is supported in part by NIH grant U54 RR023480. Dr. Ma was partially supported by the following grants: Center for Education and Research in Therapeutics (CERTs) (AHRQ RFA-HS-05-14) and Clinical Translational Science Center (CTSC) (UL1-RR024996). We sincerely thank Ms. Cheryl Bliss-Clark at the University of Rochester, an Editor and two anonymous reviewers for their constructive comments to improve the presentation of the manuscript.
Appendix
A1. Proof of Theorem 2
We first establish a lemma.
Lemma
Let
where vijt are defined in (8). Then, E (Un) = 0 and
| (23) |
Proof of Lemma
It is readily checked by the iterated conditional expectation that [12, chap. 1]
| (24) |
It follows that E (Un) = 0. Since vkjt = vjkt (j ≠ k), we have
Let . It then follows that
Thus, the projection of Ut,n is given by [12, chap. 5]
Since Ũt,n is a sum of independently and identically distributed random variables, it follows from the central limit theorem (CLT) that
| (25) |
By the theory of U-statistics (e.g., [12, chap. 5]), Ũt,n and Ut,n have the same asymptotic distribution and thus
The lemma follows by applying a similar argument to the vector Un.
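As a small numerical aside on the U-statistic structure used in this argument, the sketch below averages a symmetric two-argument kernel over all unordered pairs of observations. It uses the textbook variance kernel h(a, b) = (a − b)²/2 as an assumed example, not the kernel vijt from (8); the function names `u_statistic` and `variance_kernel` are hypothetical.

```python
from itertools import combinations

import numpy as np


def u_statistic(x, kernel):
    """Average a symmetric two-argument kernel over all unordered pairs."""
    x = np.asarray(x, dtype=float)
    pairs = list(combinations(range(len(x)), 2))
    return float(sum(kernel(x[i], x[j]) for i, j in pairs) / len(pairs))


def variance_kernel(a, b):
    """Symmetric kernel whose expectation is the population variance."""
    return 0.5 * (a - b) ** 2


sample = [1.0, 2.0, 4.0, 7.0]
u_value = u_statistic(sample, variance_kernel)

# The degree-2 U-statistic with this kernel coincides with the usual
# unbiased sample variance (divisor n - 1).
sample_variance = float(np.var(sample, ddof=1))
```

With this kernel the two quantities agree exactly, which is one way to see why variance-type quantities can be handled within U-statistics theory.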
Proof of Theorem 2
Let . By an argument similar to (24), we have E (gijt) = θt. It then follows from the theory of U-statistics that
Thus, by Slutsky’s theorem, α̂t = f (θ̂t) is consistent. Further, by applying the Lemma and Slutsky’s theorem, we obtain the asymptotic distribution of α̂t:
Similarly, by considering the vector θ̂, we obtain
Theorem 2/(a) follows by applying the Delta method to ρ̂ = f (θ̂).
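For reference, the Delta method step can be stated in generic notation as follows. The symbols Σ_θ (the asymptotic variance of θ̂) and D (the Jacobian of f) are placeholders introduced for this sketch and need not match the notation of Theorem 2; f is assumed to be differentiable at θ.

```latex
\sqrt{n}\,\bigl(\hat{\theta} - \theta\bigr) \xrightarrow{d} N\bigl(0, \Sigma_{\theta}\bigr)
\quad\Longrightarrow\quad
\sqrt{n}\,\bigl(f(\hat{\theta}) - f(\theta)\bigr) \xrightarrow{d}
N\bigl(0,\; D\,\Sigma_{\theta}\,D^{\top}\bigr),
\qquad
D = \frac{\partial f(\theta)}{\partial \theta^{\top}} .
```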
To show (10), first note that
Further, we have
Thus, Theorem 2/(b) follows by taking the covariance between ṽit and ṽit′ with μt substituted by its consistent estimate in (10).
A2. Proof of Theorem 3
As noted in Section 2.2, β̂ is the solution to the score equations in (17). From the properties of score equations, we have:
| (26) |
where H is given in (19) and op (1) denotes the stochastic o (1) [12, chap. 1]. It follows from (25), (26), and the projection-based U-statistics asymptotic expansion that [12, chap. 5]
| (27) |
where C is given in (19). Theorem 3/(a) follows from (27), while (b) is obtained by an application of Slutsky’s theorem.
References
1. Cronbach LJ, Gleser GC, Nanda H, Rajaratnam N. The Dependability of Behavioral Measurements. Wiley; New York: 1972.
2. Nunnally JC, Bernstein IH. Psychometric Theory. McGraw Hill; New York: 1994.
3. Duhachek A, Iacobucci D. Alpha’s Standard Error (ASE): An accurate and precise confidence interval estimate. Journal of Applied Psychology. 2004;89:792–808. doi: 10.1037/0021-9010.89.5.792.
4. Feldt LS. The approximate sampling distribution of the Kuder-Richardson reliability coefficient twenty. Psychometrika. 1965;34:363–373. doi: 10.1007/BF02289499.
5. Feldt LS. A test of the hypothesis that Cronbach’s alpha reliability coefficient is the same for two tests administered to the same sample. Psychometrika. 1980;45:99–105.
6. Feldt LS, Woodruff DJ, Salih FA. Statistical inference for coefficient alpha. Applied Psychological Measurement. 1987;11:93–103.
7. Feldt LS, Kim S. Testing the difference between two alpha coefficients with small samples of subjects and raters. Educational and Psychological Measurement. 2006;66:589–600.
8. Woodruff DJ, Feldt LS. Tests for equality of several alpha coefficients when their sample estimates are dependent. Psychometrika. 1986;51:393–413.
9. Alsawalmeh YM, Feldt LS. Testing the equality of independent alpha coefficients adjusted for test length. Educational and Psychological Measurement. 1999;59:373–383.
10. van Zyl JM, Neudecker H, Nel DG. On the distribution of the maximum likelihood estimator of Cronbach’s Alpha. Psychometrika. 2000;65:271–280.
11. Randles RH, Wolfe DA. Introduction to the Theory of Nonparametric Statistics. Wiley; New York: 1979.
12. Kowalski J, Tu XM. Modern Applied U Statistics. Wiley; New York: 2007.
13. Little RJA, Rubin DB. Statistical Analysis with Missing Data. Wiley; New York: 1987.
14. Robins JM, Rotnitzky A, Zhao LP. Analysis of semiparametric regression models for repeated outcomes in the presence of missing data. JASA. 1995;90:106–121.
15. Insall JN, Dorr LD, Scott RD, Scott WN. Rationale of the knee society clinical rating system. Clinical Orthopaedics and Related Research. 1989;248:13–14.
16. Prentice RL. Correlated binary regression with covariates specific to each binary observation. Biometrics. 1988;44:321–327.
17. Prentice RL, Zhao LP. Estimating equations for parameters in means and covariances of multivariate discrete and continuous responses. Biometrics. 1991;47:825–839.
18. Tsiatis AA. Semiparametric Theory and Missing Data. Springer; New York: 2006.
19. Shrout PE, Fleiss JL. Intraclass correlations: uses in assessing rater reliability. Psychological Bulletin. 1979;86:420–428. doi: 10.1037//0033-2909.86.2.420.
20. McGraw KO, Wong SP. Forming inferences about some Intraclass Correlation Coefficients. Psychological Methods. 1996;1:30–46.
21. Fleiss JL, Levin B, Paik MC. Statistical Methods for Rates and Proportions. 3rd ed. Wiley; New York: 2003.
22. Landis JR, Koch GG. An application of hierarchical kappa-type statistics in the assessment of majority agreement among multiple observers. Biometrics. 1977;33:363–374.
