Abstract
In a longitudinal study subjects are followed over time. I focus on a case where the number of replications over time is large relative to the number of subjects in the study. I investigate the use of moving block bootstrap methods for analyzing such data. Asymptotic properties of the bootstrap methods in this setting are derived. The effectiveness of these resampling methods is also demonstrated through a simulation study.
Keywords: Longitudinal study, Moving block bootstrap
1. Introduction
In many longitudinal designs, the number of subjects n is large and the number of replications mi is bounded. Liang and Zeger (1986) proved that the generalized estimating equation (GEE) estimator is consistent and asymptotically normal even when the covariance parameters are misspecified; that asymptotic property is established as the number of subjects n goes to infinity with mi bounded. Xie and Yang (2003) proved the almost sure existence and strong consistency of GEE estimators.
In this article, I instead consider a longitudinal design in which the number of subjects n is bounded and the number of replications m, the same for all subjects, is large. The model on which we focus is given by
yi = xiβ + ei, i = 1, …, n, (1)

where xi is an m × p design matrix, β is a p × 1 vector of unknown parameters, yi = (yi1, …, yim)′, and ei = (ei1, …, eim)′.
I focus on the following estimating equation. The estimator β̂ is called a regression estimator if it solves
∑_{i=1}^{n} ∑_{j=1}^{m} xij (yij − xij′β) = 0. (2)
The repeated observations are correlated within each subject. Since bootstrap methods that resample the small number of subjects, or that resample observations independently, may not work well, we investigate the moving block bootstrap method developed for correlated time series data.
The remainder of this paper is organized as follows. In Sec. 2, we review the related literature. In Sec. 3, we explore the use of the moving block bootstrap for analyzing longitudinal data. In Sec. 4, we justify the moving block bootstrap in longitudinal data theoretically and empirically. In Sec. 5, we conclude the paper with a brief discussion.
2. Literature Review
Efron (1979) introduced the bootstrap procedure for estimating sampling distributions of statistics based on independent and identically distributed (i.i.d.) observations. It is well known, in the i.i.d. setup, that the bootstrap often offers more accurate approximations than classical large sample approximations, e.g., Singh (1981) and Babu (1986). However, when the observations are not necessarily independent, the classical bootstrap no longer succeeds, as shown by Singh (1981).
In the time series case, Lahiri (1996) applied the moving block bootstrap method to multiple linear regression models

yj = xj′β + ∊j, j = 1, …, m, (3)

where the xj's are known p × 1 vectors, β is a p × 1 vector of parameters, and ∊1, ∊2, …, ∊m are stationary, strongly mixing random variables. If β̂m is an M-estimator of β corresponding to some score function ϕ, then under some conditions a two-term Edgeworth expansion for the studentized multivariate M-estimator is available. Lahiri (1996) also showed that the block bootstrap is second-order correct for suitable bootstrap analogs of the studentized β̂m.
Hall et al. (1995) showed that the optimal asymptotic rate of the block size for the moving blocks method depends significantly on context, being equal to m^{1/3}, m^{1/4}, and m^{1/5} in the cases of variance or bias estimation, estimation of a one-sided distribution function, and estimation of a two-sided distribution function, respectively. The latter two quantities are needed for the construction of equal-tailed and symmetric confidence intervals, respectively. Hall et al. (1995) present a practical rule for selecting the block size empirically. It is based on the fact that the asymptotic formula is b ∼ Cm^{1/k}, where k = 3, 4, or 5 is known, and C is a constant that depends on the underlying process. The suggested rule provides a way of estimating the optimal block size for a time series of smaller length than the original, from which the block size for the full series follows.
Paparoditis and Politis (2002) presented a new block bootstrap variation, the tapered block bootstrap, which is applicable in the general time series case of approximately linear statistics. The asymptotic validity and the favorable bias properties of the tapered block bootstrap are shown in two important cases: smooth function of means and M-estimators.
3. Moving Block Bootstrap for Longitudinal Data
The major drawback of model-based resampling is that in practice not only the parameters of a model, but also its structure, must be identified from the data. If the chosen structure is incorrect, the resampled series will be generated from the wrong model and will not share the statistical properties of the original data. The model-based approach is inconsistent if the model used for resampling is misspecified.
The moving block bootstrap (MBB) involves resampling possibly overlapping blocks. The MBB does not force one to select a model, and the only tuning parameter required is the block length. If the blocks are long enough, the original dependence is reasonably preserved in the resampled series; the approximation is better when the dependence is weak. Preserving the dependence faithfully thus points toward blocks that are as long as possible. But the distinct values of the statistic must be as numerous as possible to provide a good estimate of its distribution, and this points toward short blocks.
Unless the series is long enough to accommodate many long blocks, preserving the dependence structure may be difficult, especially for complex, long-range dependent structures. In such cases, the block resampling scheme tends to generate resampled series that are less dependent than the original ones. Furthermore, the resampled series often exhibit artifacts caused by joining randomly selected blocks. The asymptotic variance–covariance matrices of the estimators based on the original series and those based on the bootstrap series then differ, and a modification of the original scheme is needed. This suggests a strategy intermediate between model-based and block resampling. The idea is to pre-whiten the original dependent series by fitting a simple model, which is intended to remove much of the dependence between the original observations. An innovation series is generated by block resampling of the residuals from the simple fitted model, and the resampled series is then post-blackened by applying the estimated model to the resampled innovations. The post-blackened version works more consistently in practice (Davison and Hinkley, 1997).
Bühlmann (1997) suggested the sieve bootstrap, which is model based: an autoregressive AR(p) model is used to filter the series, and if the model is not appropriate, the resulting residuals cannot be treated as i.i.d. A hybrid between the model-based method and the moving block bootstrap, named the post-blackened bootstrap, was suggested by Davison and Hinkley (1997). The procedure is similar to the sieve bootstrap, but the residuals from the AR(p) model are resampled not in an i.i.d. manner but by the MBB, so any dependence remaining in the AR residuals is preserved by the blockwise resampling. The linear model pre-whitens the series, removing much of the dependence present in the observations; an innovation series is then generated by block resampling of the residuals from the fitted model, and the series is post-blackened by applying the estimated model to the resampled innovations.
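To make the post-blackened scheme concrete, the following is a minimal Python sketch, assuming an AR(1) pre-whitening filter fitted by least squares; the function name and the initialization rule are illustrative choices, not part of Davison and Hinkley's (1997) prescription.

```python
import numpy as np

def post_blackened_resample(e, b, rng=None):
    """Post-blackened bootstrap of one residual series e (length m).

    1. Pre-whiten: fit AR(1) by least squares and extract innovations.
    2. Resample the innovations with the moving block bootstrap (block length b).
    3. Post-blacken: run the resampled innovations back through the fitted AR(1).
    """
    if rng is None:
        rng = np.random.default_rng()
    m = len(e)
    # Step 1: AR(1) coefficient by regressing e_t on e_{t-1} (no intercept).
    phi = np.dot(e[1:], e[:-1]) / np.dot(e[:-1], e[:-1])
    u = e[1:] - phi * e[:-1]              # innovation series, length m - 1
    # Step 2: MBB on the (approximately whitened) innovations.
    k = int(np.ceil(m / b))
    starts = rng.integers(0, len(u) - b + 1, size=k)
    u_star = np.concatenate([u[s:s + b] for s in starts])[:m]
    # Step 3: rebuild a dependent series via the estimated AR(1) recursion.
    e_star = np.empty(m)
    e_star[0] = e[rng.integers(m)]        # crude initialization from the data
    for t in range(1, m):
        e_star[t] = phi * e_star[t - 1] + u_star[t]
    return e_star
```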
A block bootstrap algorithm in a longitudinal model is given as follows. We continue to assume (1) as our longitudinal model under consideration.
1. Let êij, i = 1, …, n0, j = 1, …, m, be the residuals from the model fit:

  êij = yij − xij′β̂, (4)

  where β̂ is the ordinary least squares estimate.

2. Now assume that m = bk with b and k integers. Let I1, …, Ik denote k uniform draws with replacement from the integers {0, 1, …, m − b}; these represent the starting point of each block of length b. A block bootstrap resample of residuals, e*ij, is defined by

  e*_{i,(h−1)b+l} = ê_{i,Ih+l}, h = 1, …, k, l = 1, …, b. (5)

3. The bootstrapped responses, y*ij, are then generated from the estimated model with the resampled residuals and the original covariates:

  y*ij = xij′β̂ + e*ij. (6)

4. From the resampled responses y*ij and the original covariates, we fit the model and obtain new parameter estimates β̂*.
Repeating steps 2 through 4 a large number, R, of times, one obtains R bootstrap replicates from which features of the distribution of the parameter estimates can be estimated, as illustrated in the sketch below. In particular, the bootstrap variance estimates are simply the variances of the R computed values for each parameter.
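A minimal Python sketch of steps 1 through 4, assuming the linear model (1) fitted by ordinary least squares; the function name and array layout are illustrative:

```python
import numpy as np

def mbb_longitudinal(y, X, b, R, rng=None):
    """Moving block bootstrap for a longitudinal linear model.

    y : (n0, m) array of responses, one row per subject.
    X : (n0, m, p) array of covariates.
    b : block length; m is assumed to be a multiple of b.
    R : number of bootstrap replicates.
    Returns an (R, p) array of bootstrap coefficient estimates.
    """
    if rng is None:
        rng = np.random.default_rng()
    n0, m, p = X.shape
    k = m // b                      # number of blocks per subject
    Xf = X.reshape(n0 * m, p)       # pooled design matrix

    # Step 1: OLS fit and residuals e_hat_ij = y_ij - x_ij' beta_hat.
    beta_hat, *_ = np.linalg.lstsq(Xf, y.ravel(), rcond=None)
    resid = (y.ravel() - Xf @ beta_hat).reshape(n0, m)

    out = np.empty((R, p))
    for r in range(R):
        e_star = np.empty((n0, m))
        for i in range(n0):
            # Step 2: k uniform draws of block starts from {0, ..., m-b}.
            starts = rng.integers(0, m - b + 1, size=k)
            e_star[i] = np.concatenate([resid[i, s:s + b] for s in starts])
        # Step 3: bootstrapped responses from the fitted model.
        y_star = (Xf @ beta_hat).reshape(n0, m) + e_star
        # Step 4: refit to obtain bootstrap parameter estimates.
        out[r], *_ = np.linalg.lstsq(Xf, y_star.ravel(), rcond=None)
    return out
```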
I consider six different block bootstrap methods in a balanced longitudinal design in which the number of subjects is small and the number of replications is large:
Case 1, MBB1 (Within block bootstrap): For each subject i, we construct the m − b + 1 overlapping blocks of size b, B1, …, Bm−b+1. Define k = m/b, assumed to be an integer for simplicity (in general k = [m/b]). We draw k blocks with replacement from B1, …, Bm−b+1 and concatenate them to obtain resampled residuals e*i1, …, e*im with kb = m, built from êi1, …, êim, where êij = yij − β̂0 − β̂1xij. Doing this for each of the n0 individuals and plugging the result into the model yields the pseudo sample y*ij = β̂0 + β̂1xij + e*ij, from which we fit the model and produce new parameter estimates β̂*0 and β̂*1.
Case 2, MBB2 (Mixed block bootstrap): For each subject we again have m − b + 1 blocks of size b, and pooling over the n0 subjects gives the collection B1, …, Bn0(m−b+1). We sample n0k blocks with replacement from this pooled collection, construct resampled residual series with kb = m per subject, and plug these into the model to obtain a pseudo series y*ij. Similarly, from the model we refit and produce new parameter estimates β̂*0 and β̂*1. A sketch contrasting these two block pools appears after the list of cases below.
Case 3, One-line moving block bootstrap: One can concatenate the n0 individual series into one long series and perform the moving block bootstrap as for a single time series, without splitting the data at subject boundaries.
Case 4, Standard bootstrap: This is a special case of b = 1 in MBB2.
Case 5, Resampling subject bootstrap: This is a special case of b = m in MBB2.
Case 6, Stratified standard bootstrap: This is a special case of b = 1 in MBB1.
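Since MBB1 and MBB2 differ only in the pool from which blocks are drawn, the contrast can be expressed compactly; the following sketch, with a hypothetical helper name resample_blocks, is one way to implement the two pools:

```python
import numpy as np

def resample_blocks(resid, b, mode="within", rng=None):
    """Draw one bootstrap residual series per subject.

    mode="within": MBB1, each subject's blocks come from that subject only.
    mode="mixed":  MBB2, blocks are drawn from all n0(m-b+1) pooled blocks.
    """
    if rng is None:
        rng = np.random.default_rng()
    n0, m = resid.shape
    k, q = m // b, m - b + 1
    # All overlapping blocks: subject i contributes blocks B_1, ..., B_{m-b+1}.
    pool = np.array([resid[i, h:h + b] for i in range(n0)
                     for h in range(q)])            # shape (n0*q, b)
    e_star = np.empty((n0, m))
    for i in range(n0):
        if mode == "within":
            idx = rng.integers(i * q, (i + 1) * q, size=k)   # own blocks only
        else:
            idx = rng.integers(0, n0 * q, size=k)            # pooled blocks
        e_star[i] = pool[idx].ravel()
    return e_star
```

Setting b = 1 in the mixed and within pools recovers Cases 4 and 6, respectively, and b = m in the mixed pool recovers Case 5.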
4. Justification of Moving Block Bootstrap in Longitudinal Data
I consider the justification of the moving block bootstrap in longitudinal data, focusing on the within block bootstrap method (MBB1) among the six scenarios of the previous section. I follow Lahiri's (1996) assumptions. Let us consider the relationship between the GEE and M-estimators. The robust approach can be extended to the regression setup to analyze a predictor–outcome relationship. Suppose we have model (1) with n = n0. The estimator β̂ is called a robust regression estimator or an M-estimator if it solves
∑_{i=1}^{n0} ∑_{j=1}^{m} xij ψ(yij − xij′β̂) = 0 (7)
for some choice of function ψ(·).
4.1. Expansion for M-estimator
It is known that

Σn0m^{−1/2} Dn0m^{1/2} (β̂n0m − β) →d N(0, Ip), (8)

where Ip denotes the identity matrix of order p, Dn0m = ∑_{i=1}^{n0} ∑_{j=1}^{m} xij xij′, and Σn0m is the asymptotic covariance matrix given in (10) below.
Let êij = yij − xij′β̂n0m denote the residuals. Define

σi(k) = Eψ(ei1)ψ(ei(1+k)) and σ̂im(k) = (m − k)^{−1} ∑_{j=1}^{m−k} ψ(êij)ψ(êi(j+k)), k ≥ 0.

Since the error process is assumed to have the same dependence structure for every subject, we write σi(k) = σ(k) and σ̂im(k) = σ̂m(k).
Assumption 1.
- (A.1) (i) ψ is twice differentiable, and ψ″ satisfies a Lipschitz condition of order δ1 > 0; (ii) ψ, ψ′, ψ″ are bounded.
- (A.2) (i) For each i, Eψ(ei1) = 0 and τ ≡ Eψ′(ei1) ≠ 0; (ii) ….
- (A.3) There exist ρ > 0 and a sequence of σ-fields {𝒟j} (as in Lahiri, 1996) such that
  - (i) …, for all k ≥ 1;
  - (ii) for all r ≥ 1 and all k ≥ ρ−1, there exists a σ(𝒟r−k, …, 𝒟r+k)-measurable random variable ẽir,k such that E|eir − ẽir,k| ≤ ρ−1 exp(−ρk);
  - (iii) for all r, k, q ≥ ρ−1, …; and
  - (iv) for all r ≥ ρ−1, k ≤ r, and all tr−k, …, tr+k ∈ ℝ with |tr| > ρ, ….
- (A.4) max{‖xij‖ : 1 ≤ j ≤ m} = O(1) and lim inf_{m→∞} m−1λm ≡ λ > 0, where λm denotes the smallest eigenvalue of ∑_{j=1}^{m} xij xij′.
Let Dim = ∑_{j=1}^{m} xij xij′ and dij = Dim^{−1/2} xij, 1 ≤ j ≤ m. When the eij are weakly dependent for each i, the asymptotic covariance matrix of the normalized score n0^{−1/2} ∑_{i=1}^{n0} ∑_{j=1}^{m} dij ψ(eij) is given by

Γn0m = n0^{−1} ∑_{i=1}^{n0} ∑_{k=0}^{m−1} σ(k) Likm, (9)

where Li0m = Ip and Likm = ∑_{j=1}^{m−k} (dij di(j+k)′ + di(j+k) dij′), 1 ≤ k ≤ m − 1.
To define the studentized version of β̂n0m, note that the asymptotic covariance matrix of Dn0m^{1/2}(β̂n0m − β) is given by

Σn0m = τ^{−2} Γn0m. (10)
Therefore, a natural estimator of Σn0m is

Σ̂n0m = τ̂m^{−2} n0^{−1} ∑_{i=1}^{n0} ∑_{k=0}^{l} σ̂m(k) Likm, (11)

where τ̂m = (n0m)^{−1} ∑_{i=1}^{n0} ∑_{j=1}^{m} ψ′(êij) and 1 ≤ l ≡ lm ≤ m − 1 is an integer. If l → ∞ slowly with m, then ‖Σ̂n0m − Σn0m‖ = op(1). Σ̂n0m is nonsingular with high probability for large m, and can be inverted to define the studentized statistic

Tn0m = Σ̂n0m^{−1/2} Dn0m^{1/2} (β̂n0m − β). (12)
Next, I extend Lahiri's (1996) results to the longitudinal case. Assume that (A.1), (A.2), (A.3)(i),(ii), and (A.4) hold. Then there exists a sequence of statistics {β̂n0m} such that

P(β̂n0m solves (7) for all sufficiently large m) = 1 and β̂n0m → β a.s. as m → ∞. (13)

When (7) has a unique solution β̂n0m, one obtains its strong consistency as in Lahiri (1992). The next result gives a first-order Edgeworth expansion for the studentized M-estimator.
Theorem 1. Assume that Assumptions (A.1)–(A.4) hold and that {β̂n0m} is a sequence of measurable solutions of (7). Then there exists a polynomial pn0m(·) on ℝp such that

sup_{B∈ℬ} |P(Tn0m ∈ B) − ∫_B (1 + pn0m(x)) dΦ(x)| = o(m−1/2) (14)

for every class ℬ of Borel subsets of ℝp satisfying

sup_{B∈ℬ} Φ((∂B)∊) = O(∊) as ∊ ↓ 0. (15)

Here ‖pn0mϕ‖∞ = O(m−1/2) with sup norm ‖·‖∞, ϕ and Φ denote the standard normal density and distribution on ℝp (p ≥ 1), and the coefficients of pn0m(·) are continuous functions of cross-product moments of ψ(eij), ψ′(eij), and ψ″(eij). Here ∂B denotes the boundary of a set B ⊆ ℝp and (∂B)∊ = {x : ‖x − y‖ < ∊ for some y ∈ ∂B}.
4.2. Expansion for Bootstrap M-estimator
Define the bootstrap M-estimator β̂*n0m as a solution of the equation in t ∈ ℝp

∑_{i=1}^{n0} ∑_{j=1}^{m} xij ψ(y*ij − xij′t) = 0, (16)

where y*ij is given in (6).
where and is given in (6). The is the conditional covariance matrix of which is given by
(17) |
The natural estimator of Σ*n0m is

Σ̂*n0m = τ̂*m^{−2} n0^{−1} ∑_{i=1}^{n0} ∑_{k=0}^{l} σ̂*m(k) Likm, (18)

where τ̂*m = (n0m)^{−1} ∑_{i=1}^{n0} ∑_{j=1}^{m} ψ′(e*ij) and σ̂*m(k) is computed as σ̂m(k) with the residuals êij replaced by the bootstrap residuals e*ij. The bootstrap version of Tn0m is given by

T*n0m = (Σ̂*n0m)^{−1/2} Dn0m^{1/2} (β̂*n0m − β̂n0m). (19)
By the assumptions, there exists a sequence of statistics {β̂*n0m} such that

P*(β̂*n0m solves (16) for all sufficiently large m) → 1 and β̂*n0m − β̂n0m → 0 in conditional probability, (20)

where P* denotes probability under the MBB resampling scheme.
Theorem 2. Assume that the conditions of Theorem 1 hold. Suppose that T*n0m is defined for some measurable sequence {β̂*n0m} satisfying (20), and suppose also that mδb−1 = O(1) and b = O(m(1−κ)/4) for some δ > 0 and κ > max{p + 3, 5}δ0. Then

sup_{B∈ℬ} |P*(T*n0m ∈ B) − P(Tn0m ∈ B)| = op(m−1/2) (21)

for any class ℬ of Borel subsets of ℝp satisfying (15).
Theorem 2 shows that the MBB indeed provides a more accurate approximation for the studentized multivariate M-estimator of the regression parameter vector β than the normal approximation. Consequently, Theorem 2 is useful for constructing second-order correct multivariate inference procedures for β under the multiple regression model. In particular, the studentized moving block bootstrap statistic attains second-order accuracy in the bounded n = n0 case.
4.3. A Simulation Study
The block bootstrap captures the dependence in the residual series without requiring knowledge of the correlation structure: it is simple and accounts for the correlation in a regression model with correlated errors. To obtain the bootstrap version of β̂n0m, first form the observed blocks of residuals of length b as ξih = (êih, …, êi(h+b−1)), 1 ≤ h ≤ q, where q = m − b + 1 and êij = yij − xij′β̂n0m, 1 ≤ j ≤ m, 1 ≤ i ≤ n0. Next draw k blocks ξ*i1, …, ξ*ik randomly, with replacement, from ξi1, …, ξiq, where m/b = k is assumed to be an integer for simplicity. Each ξ*ih has b components; denoting the lth component of ξ*ih, 1 ≤ l ≤ b, by e*_{i,(h−1)b+l}, we have the bootstrap pseudo-observations

y*ij = xij′β̂n0m + e*ij, 1 ≤ j ≤ m, 1 ≤ i ≤ n0. (22)
Adapting Shorack's approach, we obtain the bootstrapped estimator β̂*n0m as a solution of the equation in t ∈ ℝp,

∑_{i=1}^{n0} ∑_{j=1}^{m} xij [ψ(y*ij − xij′t) − μ̂n0m] = 0, (23)

where μ̂n0m = En0m ψ(e*ij), and En0m denotes the conditional expectation under the MBB resampling scheme, given ê11, …, ên0m. Centering the equation by μ̂n0m makes the estimating equation conditionally unbiased at β = β̂n0m and ensures the validity of the bootstrap analogy. In the least squares case ψ(u) = u, the bootstrap estimator has the closed form

β̂*n0m = (∑_{i=1}^{n0} ∑_{j=1}^{m} xij xij′)^{−1} ∑_{i=1}^{n0} ∑_{j=1}^{m} xij (y*ij − μ̂n0m). (24)
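For the least squares case, a sketch of one centered bootstrap replicate follows. Here the centering is computed position-wise as the conditional MBB expectation of a resampled residual at each within-block offset, which is one plausible reading of μ̂n0m; the function name is illustrative.

```python
import numpy as np

def centered_ls_bootstrap(resid, X, beta_hat, b, rng=None):
    """One centered MBB replicate for psi(u) = u (least squares).

    resid : (n0, m) residual matrix; X : (n0, m, p); beta_hat : (p,).
    """
    if rng is None:
        rng = np.random.default_rng()
    n0, m = resid.shape
    p = X.shape[2]
    k, q = m // b, m - b + 1
    Xf = X.reshape(n0 * m, p)
    e_star = np.empty((n0, m))
    mu_hat = np.empty((n0, m))
    for i in range(n0):
        starts = rng.integers(0, q, size=k)   # block starts in {0,...,m-b}
        e_star[i] = np.concatenate([resid[i, s:s + b] for s in starts])
        # E_* of a resampled residual at offset l averages e_hat_{i,s+l}
        # over all q possible block starts s.
        offs = np.array([resid[i, l:l + q].mean() for l in range(b)])
        mu_hat[i] = np.tile(offs, k)
    y_star = (Xf @ beta_hat).reshape(n0, m) + e_star
    # Solve sum_ij x_ij [(y*_ij - x_ij' t) - mu_hat_ij] = 0 for t.
    rhs = Xf.T @ (y_star.ravel() - mu_hat.ravel())
    return np.linalg.solve(Xf.T @ Xf, rhs)
```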
Consider the following specific model in the simulation work:

yij = β0 + β1xij + eij, (25)

where eij = ϕei(j−1) + uij and the uij are i.i.d. innovations. In particular, let n0 = 5, m = 20, and xi = (1, …, 20)′ for each subject i.
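A minimal sketch of this simulation design, assuming standard normal innovations uij, a stationary initialization, and illustrative parameter values β0 = β1 = 1 and ϕ = 0.5, none of which are specified above; the output matches the (y, X) layout used in the earlier sketches.

```python
import numpy as np

def simulate_panel(n0=5, m=20, beta0=1.0, beta1=1.0, phi=0.5, rng=None):
    """Generate y_ij = beta0 + beta1 * x_ij + e_ij with AR(1) errors.

    x_ij = j, so x_i = (1, ..., m)'; e_ij = phi * e_{i,j-1} + u_ij with
    u_ij iid N(0, 1).  beta0, beta1, and phi are illustrative values.
    """
    if rng is None:
        rng = np.random.default_rng()
    x = np.arange(1, m + 1, dtype=float)          # common covariate (1,...,20)'
    e = np.empty((n0, m))
    for i in range(n0):
        u = rng.standard_normal(m)
        e[i, 0] = u[0] / np.sqrt(1 - phi ** 2)    # stationary start
        for j in range(1, m):
            e[i, j] = phi * e[i, j - 1] + u[j]
    y = beta0 + beta1 * x + e                     # broadcast over subjects
    X = np.stack([np.ones((n0, m)), np.tile(x, (n0, 1))], axis=2)  # (n0, m, 2)
    return y, X
```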
4.4. Bootstrapping the Distribution of Statistics
Let R be the number of bootstrap simulations (r = 1, …, R), and let β̂*,r be the bootstrap estimate of β for the rth sample. The key result is that the distribution of β̂*, estimated by the empirical distribution function of the β̂*,r (r = 1, …, R), approximates the distribution of β̂. The studentized statistic has the following form:
T̂ = Σ̂n0m^{−1/2} Dn0m^{1/2} (β̂ − β), with bootstrap analog T̂* = (Σ̂*n0m)^{−1/2} Dn0m^{1/2} (β̂* − β̂). (26)
The difference between the distribution functions of T̂ and T̂* tends to 0 as the number of observations grows; thus we can use the quantiles of T̂* in place of those of T̂ to construct intervals or tests. Let T̂*,r, r = 1, …, R, be the bootstrap replicates of T̂*, where T̂* is calculated in the same way as T̂, replacing yij with y*ij. Let q̂α be the α-quantile of the T̂*,r. It can be shown that P(T̂ ≤ q̂α) tends to α as m tends to infinity. This gives a bootstrap confidence interval for β,
ÎR = [β̂ − q̂1−α/2 ŝe(β̂), β̂ − q̂α/2 ŝe(β̂)], (27)

where ŝe(β̂) denotes the estimated standard error of β̂.
For large m and R, the coverage probability of ÎR is close to 1 − α. The bootstrap estimate of the variance is calculated as the empirical variance of the R bootstrap replicates (β̂*,r, r = 1, …, R):

V̂R = (R − 1)^{−1} ∑_{r=1}^{R} (β̂*,r − β̄*)², (28)

where β̄* = R^{−1} ∑_{r=1}^{R} β̂*,r is the sample mean of the bootstrap replicates.
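Given the R replicates, the variance estimate (28) and an interval of the form (27) can be computed as below, for instance applied to the output of the mbb_longitudinal sketch above. This sketch studentizes with a single bootstrap standard error rather than a replicate-specific estimate, a simplification of T̂*.

```python
import numpy as np

def bootstrap_summary(beta_hat, beta_star, alpha=0.05):
    """Bootstrap variance (28) and a percentile-t interval (27) for one
    coefficient.  beta_star is the length-R vector of bootstrap replicates."""
    R = len(beta_star)
    beta_bar = beta_star.mean()                    # sample mean of replicates
    var_boot = ((beta_star - beta_bar) ** 2).sum() / (R - 1)   # equation (28)
    se = np.sqrt(var_boot)
    # Studentized replicates T*_r = (beta*_r - beta_hat) / se.
    t_star = (beta_star - beta_hat) / se
    q_lo, q_hi = np.quantile(t_star, [alpha / 2, 1 - alpha / 2])
    # Interval [beta_hat - q_{1-alpha/2} se, beta_hat - q_{alpha/2} se].
    return var_boot, (beta_hat - q_hi * se, beta_hat - q_lo * se)
```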
Coverage accuracy, where coverage is the probability that a confidence interval includes β, is the key property of a confidence interval procedure. Bootstrap confidence interval methods differ in their asymptotic properties. Our simulation results are given in Table 1. MBB1 and MBB2 behave similarly to each other; both attained approximately correct coverage at the nominal 95% level. The standard bootstrap and the stratified ordinary bootstrap did not perform well in this highly correlated longitudinal setting, yielding low coverage probabilities.
Table 1. Simulated confidence intervals, coverage probabilities at the nominal 95% level, and average interval lengths.

Methods | CI | Probability | Length
---|---|---|---
MBB1 | (0.849, 1.168) | 0.949 | 0.318
MBB2 | (0.844, 1.163) | 0.952 | 0.320
SOB | (0.805, 1.091) | 0.768 | 0.286
SB | (0.804, 1.086) | 0.747 | 0.282
β̂1 | (0.840, 1.169) | 0.950 | 0.329

SOB, stratified ordinary bootstrap; SB, standard bootstrap.
5. Concluding Remarks and Discussion
I have applied moving block bootstrap methods to longitudinal data in which a small number of subjects have a large number of replications over time, investigating the efficacy and utility of the methodology theoretically and empirically through a small simulation study. These methods have second-order optimality for dependent stationary data under regularity conditions.
Quasi-likelihood approaches such as the GEE of Liang and Zeger (1986) and the quadratic inference function (QIF) of Qu et al. (2000) are useful for modeling longitudinal data in the form of a large sample of short time series. However, their advantages of simplicity and robustness against misspecification of the correlation structure are offset by a loss of estimation efficiency and a lack of procedures for model assessment and selection. In other words, these estimators are consistent but not fully efficient. To offset this inefficiency, subjects rather than observations can be resampled to obtain an estimation efficiency that takes the correlation structure into account.
In a longitudinal study with a small sample and a long time series, in which modeling the repeated time pattern is necessary, we assume a stationary process. The stationary process assumption represents a fundamentally different stochastic mechanism from other methods used to govern the structure and behavior of transitions over time. A moving block bootstrap estimation procedure is preferable for highly correlated clustered data with a large, equal number of observations per cluster, or for strongly correlated spatial data collected from a large number of locations.
When both the number of subjects and the number of replications are large, the choice of correlation structure is determined by a trade-off between the number of nuisance parameters and the closeness of the mathematical model to the true underlying structure. The question is which provides better estimation efficiency: a simpler correlation model with few nuisance parameters, or a model closer to the true structure with many nuisance parameters. Further simulations using the bootstrap procedure will be investigated in future work.
Appendix
Proof of Theorem 1
Proof. We follow Lahiri's (1996) notation and definitions. For a smooth function h : ℝp → ℝ, let Djh denote the partial derivative of h(x) with respect to the jth coordinate of x, 1 ≤ j ≤ p. For p × 1 vectors v = (v1, …, vp)′ with nonnegative integer entries and w = (w1, …, wp)′ ∈ ℝp, let |v| = v1 + ⋯ + vp, v! = v1!⋯vp!, and w^v = w1^{v1}⋯wp^{vp}. Let Dv denote the differential operator D1^{v1}⋯Dp^{vp}. For v with 1 ≤ |v| ≤ s, let χv denote the vth cumulant and μv the vth moment of w; these arise as coefficients in the expansions of log E exp(ιt′w) and E exp(ιt′w), t ∈ ℝp. Here w is an ℝp-valued random vector with Ew = 0 and E‖w‖^s < ∞ for some integer s ≥ 3.
Let m3 = [log m log log(3 + m)], v1m = m^{−1/2}(log m)^{1/2}, vm = m^{−1/2}, v2m = vm(log m)^{−1}, and v3m = vm(log m)^{−3/2}. Furthermore, define
A = τIp = Eψ′(eij)Ip, τ = τi = Eψ′(ei1) for each i, Wi1j = ψ(eij), Wi2j = ψ′(eij) − τ, Wi3j = ψ″(eij) − Eψ″(ei1), and Wi4j(k) = ψ(eij)ψ(ei(j+k)) − σi(k). Also, write … for a random vector U in ℝp.
Let …, t ∈ ℝp. Then, by Taylor expansion, one can get

(29)

where …, t ∈ ℝp.
Following Lahiri (1992), we obtain that

(30)

where …,

(31)
If we have Tn0m = T1n0m + Rn0ms, where the remainder term Rn0ms satisfies, under the moment condition E‖y11‖^s < ∞, P(‖Rn0ms‖ > δm,s) ≤ δm,s for some sequence δm,s = o(m^{−(s−2)/2}), then the random variable T1n0m is called an (s − 2)th-order stochastic approximation to Tn0m. Note that the (s − 2)th-order Edgeworth expansions for Tn0m and T1n0m coincide. The point of T1n0m is that its leading term is the same as that of Tn0m, while the remaining terms consist of independent variables, which yields a simpler expansion. The stochastic approximation T1n0m can be expressed in the form
(32)

where q1 = (1, 0, …, 0)′, …, qp = (0, 0, …, 1)′ form the standard basis of ℝp, and Λn0rm, Λ1n0rm, Λvm are nonrandom matrices satisfying max{m^{1/2}‖Λn0rm‖ + ‖Λ1n0rm‖ + ‖Λvm‖ : 1 ≤ r ≤ p, |v| = 1} = o(1). In what follows, C and C(·) denote generic constants that depend only on their arguments, and the dependence of C(·) on p, α, and the finite moments of ψ(eij), ψ′(eij), and ψ″(eij) is suppressed for notational simplicity. Using Lahiri's (1992, 1996) arguments, we can show that
(33)

(34)

(35)

for all 1 ≤ u, l, z ≤ p, where diju denotes the uth component of dij. We then have Tn0m = T1n0m + Rn0m, where P(‖Rn0m‖ > Cv2m) = o(vm), so that the first-order Edgeworth expansions for T1n0m and Tn0m coincide.
Let … with a = [m^{(1−2δ0)/2}], …, and

(36)
The point of T2n0m is that its leading term is the same as that of T1n0m, while the remaining terms consist of truncated independent variables, which gives simplified forms of the expansion. Using an Edgeworth expansion under dependence for T1n0m (Lahiri, 1994, 1996), we have

(37)

We have the same first-order Edgeworth expansions for T2n0m, T1n0m, and Tn0m; namely, the three statistics are close to each other.
We obtain that

(38)

with ‖pn0mϕ‖∞ = O(m^{−1/2}). The proof is then complete.
Proof of Theorem 2
Proof. Let the starred quantities …, for u = 1, …, k, denote the MBB analogs of the corresponding quantities in the proof of Theorem 1.
As in the proof of Theorem 1 and in Lahiri's (1996) result, we have

(39)

where …. We also use T*1n0m and T*2n0m, defined in the same way as T1n0m and T2n0m in the proof of Theorem 1.
Now, using the results of Bhattacharya and Ranga Rao (1986) and Lahiri (1996), we have the same first-order Edgeworth expansion forms for T*2n0m, T*1n0m, and T*n0m, since these are close to each other. We obtain

(40)
References
- Babu GJ. Bootstrapping statistics with linear combinations of chi-squares as weak limit. Sankhya Ser A. 1986;56:85–93.
- Bhattacharya RN, Ranga Rao R. Normal Approximation and Asymptotic Expansions. Malabar, FL: Krieger; 1986.
- Bühlmann P. Sieve bootstrap for time series. Bernoulli. 1997;3:123–148.
- Davison AC, Hinkley DV. Bootstrap Methods and Their Application. Cambridge, UK: Cambridge University Press; 1997.
- Efron B. Bootstrap methods: Another look at the jackknife. Annals Stat. 1979;7:1–26.
- Hall P, Horowitz JL, Jing BY. On blocking rules for the bootstrap with dependent data. Biometrika. 1995;82:561–574.
- Lahiri SN. Bootstrapping M-estimators of a multiple linear regression parameter. Annals Stat. 1992;20:1548–1570.
- Lahiri SN. On two-term Edgeworth expansions and bootstrap approximations for Studentized multivariate M-estimators. Sankhya Ser A. 1994;56:201–226.
- Lahiri SN. On Edgeworth expansion and moving block bootstrap for Studentized M-estimators in multiple linear regression models. J Multivar Anal. 1996;56:42–59.
- Liang KY, Zeger SL. Longitudinal data analysis using generalized linear models. Biometrika. 1986;73:13–22.
- Paparoditis E, Politis DN. The tapered block bootstrap for general statistics from stationary sequences. Econometrics J. 2002;5:131–148.
- Qu A, Lindsay BG, Li B. Improving generalized estimating equations using quadratic inference functions. Biometrika. 2000;87:823–836.
- Singh K. On the asymptotic accuracy of Efron's bootstrap. Annals Stat. 1981;9:1187–1195.
- Xie M, Yang Y. Asymptotics for generalized estimating equations with large cluster sizes. Annals Stat. 2003;31:310–347.