Summary
We propose a multiple imputation estimator for parameter estimation in a quantile regression model when some covariates are missing at random. The estimation procedure fully utilizes the entire dataset to achieve increased efficiency, and the resulting coefficient estimators are root-n consistent and asymptotically normal. To protect against possible model misspecification, we further propose a shrinkage estimator, which automatically adjusts for possible bias. The finite sample performance of our estimator is investigated in a simulation study. Finally, we apply our methodology to part of the Eating at America’s Table Study data, investigating the association between two measures of dietary intake.
Keywords: Missing data, Multiple imputation, Quantile regression, Regression quantile, Shrinkage estimation
1. Introduction
In many regression-type applications, some observations are missing. Ignoring the missing data will undermine study efficiency, and sometimes introduce substantial bias. There is a large literature dealing with missing data; see Little & Rubin (1987) for an early and still fundamental treatment. Quantile regression (Koenker & Bassett, 1978) has been an increasingly important modelling tool, due to its flexibility in exploring how covariates affect the distribution of the response. However, combining quantile regression with missing data is not a well-developed topic. In this paper, we consider a linear quantile regression model, where for τ ∈ (0, 1),
$$Q_\tau(y \mid x, z) = (x^{\mathrm T}, z^{\mathrm T})\beta_\tau. \tag{1}$$
Here (x, z) are both covariate vectors, but x may be missing, while z is always observed. We assume that z contains the constant 1, so the intercept term is not written out separately. We use n for the total sample size, and assume that n1 of these n observations are complete, while the remaining n0 of them have x missing. Thus, observations can be summarized as {(yi, xi, zi) : i = 1, …, n1} and {(yj, ·, zj) : j = n1 + 1, …, n}. To avoid trivial situations, we assume 0 < limn→∞ n0/n1 = λ < ∞. We make a missing at random assumption that conditional on z, missingness and x are independent. The main interest of this paper is in estimating the regression parameter given the assumed missing data mechanism. This research is motivated by the Eating at America’s Table Study (Subar et al., 2001), an important study in nutritional epidemiology. In § 5, we describe how this study fits our model framework.
It is not difficult to see that since missingness depends only upon the observed covariates z, using the complete data only yields a consistent estimate of βτ. However, since a part of the data is completely excluded from the analysis, this practice can be highly inefficient. The main goal of this paper is to propose a multiple imputation method to include the incomplete data, so as to improve estimation efficiency. Since additional assumptions on (x, z) are needed to facilitate the imputation procedure, the method risks being inconsistent and we propose a shrinkage estimator to attenuate this risk. The final estimator has an automatic data-driven shrinkage parameter, which guarantees that the resulting estimator is consistent regardless of the correctness of the additional assumptions, and at the same time is more efficient than using the complete data only.
Most existing methods handling missing data are likelihood-based, and hence cannot be applied to quantile regression directly, since there is no likelihood function for quantile regression. Lipsitz et al. (1997) considered an inverse probability approach for longitudinal data with drop-outs. For the same type of data, Yi & He (2009) extended the inverse probability weighted generalized estimating equations proposed by Robins et al. (1995) to correct for the bias from longitudinal drop-out. Our setting is different from those methods, since we are dealing with missing covariates, rather than missing outcomes.
Throughout the paper, we write Qτ (y) as the τth quantile of a random variable y. We write β(τ) as the quantile coefficient process for τ ∈ (0, 1), and βτ as the quantile coefficient specifically at the τth quantile. In addition, we use ‖x‖ to mean Euclidean norm, and write g′ (x) as the first derivative of an arbitrary function g(x). If x and y are two random variables, then E(x,y){g(x, y)} stands for the expectation of g(x, y) over the joint distribution of (x, y).
2. Estimation with multiple imputation
2·1. Method
In this section, we propose a multiple imputation estimator of the quantile coefficient in the linear quantile model (1). The method has the following steps.
Step 1. Perform quantile regression with the complete data only. Run a quantile regression using the complete data only and write the resulting coefficients as β̂τ. That is, for a set of τ values in (0, 1), obtain $\hat\beta_\tau = \arg\min_\beta \sum_{i=1}^{n_1} \rho_\tau\{y_i - (x_i^{\mathrm T}, z_i^{\mathrm T})\beta\}$, where ρτ (r) = r {τ − I (r < 0)} is an asymmetric L1 loss function. In practice, the τ values are typically chosen as evenly spaced and sufficiently dense grid points on (0, 1).
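As a minimal illustration of the loss used in Step 1, the following Python sketch evaluates ρτ and checks, on hypothetical toy data with an intercept-only model, that the minimizer of the summed loss is a sample quantile. The function names and the data are illustrative assumptions and not part of the original procedure.

```python
def check_loss(r, tau):
    """Asymmetric L1 loss: rho_tau(r) = r * (tau - 1{r < 0})."""
    return r * (tau - (1.0 if r < 0 else 0.0))


def total_loss(c, ys, tau):
    """Summed check loss when every response is predicted by the constant c."""
    return sum(check_loss(y - c, tau) for y in ys)


# Hypothetical toy responses. With an intercept-only model, a minimiser of
# the summed check loss can always be found among the observed values.
ys = [1.0, 2.0, 3.0, 4.0, 100.0]
best_median = min(ys, key=lambda c: total_loss(c, ys, 0.5))
best_upper = min(ys, key=lambda c: total_loss(c, ys, 0.9))
```

With covariates, the same loss is minimized over the coefficient vector, which is typically done by linear programming in standard quantile regression software.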
Step 2. Impute the missing x based on f (x | y, z). The main challenge is to estimate the conditional density f (x | y, z). Since f (x | y, z) ∝ f (y | x, z) f (x | z), it is determined uniquely by the two densities f (y | x, z) and f (x | z).
Step 2a: Estimate the conditional density f (y | x, z). Under the assumption that the linear quantile model (1) holds for all quantile levels τ, we can write the conditional density f (y | x, z) as a function of the quantile coefficient process, that is, f {y | x, z; β0(τ)} = F′{y | x, z; β0(τ)}, where F{y | x, z; β0(τ)} = inf {τ ∈ (0, 1) : (xT, zT)β0(τ) > y} and β0(τ) is the true quantile coefficient process. We write the conditional density f (y | x, z) as f {y | x, z; β0(τ)} to indicate its dependence on the quantile coefficient function β0(τ).
Although the unknown coefficient function β0(τ) is of infinite dimension, it can be well-approximated by a natural linear spline expanding from a series of estimated β̂τk at a fine grid of quantile levels (τk). Specifically, we choose quantile levels τk = k/(Kn + 1) (k =1, …, Kn), where Kn is the number of quantile levels. We then define β̂(τ) as a p-dimensional piecewise linear function on [0,1], which satisfies β̂(τk) = β̂τk and β̂′ (0) = β̂′ (1) = 0. Under the conditions in Wei & Carroll (2009), β̂(τ) converges uniformly to the true quantile coefficient process in probability. The quantile function is the inverse distribution function, so the density function can be expressed as the reciprocal of the first derivative of the quantile function at the corresponding quantile level. Consequently, we can approximate the conditional density function by
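$$\hat f\{y \mid x, z, \hat\beta(\tau)\} = \sum_{k=1}^{K_n-1} \frac{\tau_{k+1} - \tau_k}{(x^{\mathrm T}, z^{\mathrm T})\hat\beta_{\tau_{k+1}} - (x^{\mathrm T}, z^{\mathrm T})\hat\beta_{\tau_k}}\, I\{(x^{\mathrm T}, z^{\mathrm T})\hat\beta_{\tau_k} \le y < (x^{\mathrm T}, z^{\mathrm T})\hat\beta_{\tau_{k+1}}\}.$$
Here f̂{y | x, z, β̂(τ)} is the previously defined density function that is induced from the estimated conditional quantile function (xT, zT) β̂(τ).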
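The density induced by a quantile function that is linear between grid points is piecewise constant, and can be sketched as follows. The sketch assumes that the fitted conditional quantiles at the grid levels are strictly increasing (no quantile crossing); the function name and the toy quantile values are hypothetical.

```python
def density_from_quantiles(y, taus, q_values):
    """Piecewise-constant density implied by a piecewise-linear quantile
    function: (tau_{k+1} - tau_k) / (q_{k+1} - q_k) on [q_k, q_{k+1}),
    and zero outside the range of the fitted quantiles.

    q_values[k] plays the role of the fitted conditional quantile at level
    taus[k] for one fixed (x, z); it must be strictly increasing in k.
    """
    for k in range(len(taus) - 1):
        lo, hi = q_values[k], q_values[k + 1]
        if lo <= y < hi:
            return (taus[k + 1] - taus[k]) / (hi - lo)
    return 0.0


# Hypothetical example: K = 9 levels tau_k = k / (K + 1), and fitted
# quantiles of a response that is uniform on (0, 10), so q(tau) = 10 * tau.
taus = [k / 10 for k in range(1, 10)]
q_values = [10 * t for t in taus]
inside = density_from_quantiles(5.0, taus, q_values)    # close to 0.1
outside = density_from_quantiles(20.0, taus, q_values)  # 0.0
```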
Step 2b: Estimate the conditional density f (x | z). The remaining problem is to estimate f (x | z). We model x given z parametrically as f (x | z, η). The missing-at-random assumption facilitates the estimation of η based on the complete data. We write the estimate as η̂, and the estimated conditional density of x given z as f (x | z, η̂).
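For concreteness, one possible parametric choice in Step 2b is a normal linear model for a scalar x given a scalar z, fitted by maximum likelihood on the complete cases. This specific model is an assumption made for illustration (it corresponds to the joint normal setting used in the simulations of § 4), and the function names are hypothetical.

```python
import math


def fit_normal_x_given_z(xs, zs):
    """Maximum likelihood fit of x | z ~ N(a + b z, s^2) from complete cases.

    Returns eta = (a, b, s). The slope and intercept are the least-squares
    values and s^2 is the mean squared residual.
    """
    n = len(xs)
    zbar = sum(zs) / n
    xbar = sum(xs) / n
    szz = sum((z - zbar) ** 2 for z in zs)
    szx = sum((z - zbar) * (x - xbar) for x, z in zip(xs, zs))
    b = szx / szz
    a = xbar - b * zbar
    s2 = sum((x - a - b * z) ** 2 for x, z in zip(xs, zs)) / n
    return a, b, math.sqrt(s2)


def normal_density_x_given_z(x, z, eta):
    """Fitted conditional density f(x | z, eta) under the normal model."""
    a, b, s = eta
    u = (x - a - b * z) / s
    return math.exp(-0.5 * u * u) / (s * math.sqrt(2.0 * math.pi))


# Hypothetical complete cases generated as x = 1 + 2 z plus small residuals.
zs = [0.0, 1.0, 2.0, 3.0]
xs = [1.1, 2.9, 4.9, 7.1]
eta_hat = fit_normal_x_given_z(xs, zs)
```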
Step 2c: Estimate the conditional density f (x | y, z) and impute the missing x accordingly. The estimated conditional density function is f̂ (x | yj, zj) ∝ f̂{yj | x, zj, β̂(τ)} f (x | zj, η̂). For each j = n1 + 1, …, n, we simulate the missing xj from f̂(x | yj, zj) by drawing a Un(0,1) random variable and inserting it into the quantile function F̂−1(u | yj, zj), u ∈ (0, 1), that is derived from the estimated density f̂(x | yj, zj). Let uℓ be the ℓth generated Un(0,1) random variable. We then define x̃j(ℓ) = F̂−1(uℓ | yj, zj), the ℓth imputed x associated with (yj, zj). Consequently, x̃j(ℓ) ~ f̂(x | yj, zj).
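The inverse-distribution-function draw in Step 2c can be sketched for a scalar x by discretizing the support on a grid, which is an assumption of this sketch rather than the exact quantile function. The two density functions passed in are hypothetical stand-ins for the estimates from Steps 2a and 2b, and only need to be known up to a constant.

```python
import bisect
import math
import random


def draw_x(y, z, x_grid, f_y_given_xz, f_x_given_z, rng):
    """Draw one x from the density proportional to f(y | x, z) f(x | z),
    discretised on x_grid, by inverting the cumulative weights at a
    Uniform(0, 1) draw."""
    weights = [f_y_given_xz(y, x, z) * f_x_given_z(x, z) for x in x_grid]
    total = sum(weights)
    if total <= 0.0:
        raise ValueError("density is zero on the whole grid")
    cum, running = [], 0.0
    for w in weights:
        running += w / total
        cum.append(running)
    u = rng.random()
    # First grid index whose cumulative probability exceeds u.
    idx = bisect.bisect_right(cum, u)
    return x_grid[min(idx, len(x_grid) - 1)]


# Hypothetical stand-ins for the two estimated densities (unnormalised).
def toy_f_y_given_xz(y, x, z):
    return math.exp(-0.5 * (y - x - z) ** 2)


def toy_f_x_given_z(x, z):
    return math.exp(-0.5 * (x - 0.5 * z) ** 2)


rng = random.Random(1)
x_grid = [i / 100 for i in range(-500, 501)]
imputed = [draw_x(3.0, 1.0, x_grid, toy_f_y_given_xz, toy_f_x_given_z, rng)
           for _ in range(10)]  # m = 10 imputations for one (y_j, z_j)
```

In this toy setting the target density is normal with mean 1·25 and variance 0·5, which gives a simple check on the draws.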
Step 3. Re-estimate β including the imputed data. We assemble a new objective function including the completely observed data and the ℓth imputed dataset as
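$$S_{n(\ell)}(\beta) = \sum_{i=1}^{n_1} \rho_\tau\{y_i - (x_i^{\mathrm T}, z_i^{\mathrm T})\beta\} + \sum_{j=n_1+1}^{n} \rho_\tau\{y_j - (\tilde x_{j(\ell)}^{\mathrm T}, z_j^{\mathrm T})\beta\},$$
and define β̂*(ℓ) = argminβ Sn(ℓ)(β) as the estimated coefficient using the ℓth assembled complete data. We repeat this imputation-estimation step m times, and the multiple imputation estimator is $\tilde\beta_\tau = m^{-1}\sum_{\ell=1}^{m} \hat\beta_{*(\ell)}$.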
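The final combination step can be sketched as below, assuming the multiple imputation estimator is the componentwise average of the m coefficient vectors obtained from the m assembled datasets. The function name and the numerical values are hypothetical.

```python
def combine_imputations(estimates):
    """Average m coefficient vectors componentwise.

    estimates: list of m lists, each holding the p estimated coefficients
    from one imputed (assembled) dataset.
    """
    m = len(estimates)
    p = len(estimates[0])
    return [sum(est[j] for est in estimates) / m for j in range(p)]


# Hypothetical estimates from m = 3 imputed datasets with p = 2 coefficients.
beta_tilde = combine_imputations([[1.0, 2.0], [3.0, 4.0], [2.0, 3.0]])
```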
2·2. Large-sample properties of the multiple imputation estimator
In this section, we establish the consistency and asymptotic normality of the multiple imputation estimator β̃τ. Let δ = 0 when x is missing and δ = 1 otherwise. We first reiterate the assumption on the missingness mechanism.
Assumption 1. For all z, pr(δ =1 | x, y, z) = pr(δ =1 | z) > 0.
Assumption 1 ensures that, conditioning on z, the event that x is missing is independent of x and the response y. We then introduce two identifiability conditions.
Assumption 2. There exists a β0,τ ∈ ℝp such that β0,τ uniquely minimizes the objective function S0(β) = E(y,x,z)[ρτ {y − (xT, zT)β}].
Define S̃0(β) = E(y, x̃,z)[ρτ {y − (x̃T, zT)β}], where, given (y, z), x̃ follows the conditional distribution f̂ (x | y, z). Since f̂ is estimated from completely observed data, this expectation is also conditional on the n1 completely observed data. We then make the following assumptions.
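Assumption 3. There exists a compact set Ω ⊂ ℝp and a point $\tilde\beta_{0,\tau} \in \Omega$ such that $\tilde\beta_{0,\tau} = \arg\min_{\beta \in \Omega} \tilde S_0(\beta)$.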
Assumption 4. The covariate x has bounded support 𝒳. The true conditional density f (x | z) = f (x | z, η =η0), where f (x | z, η) is a continuous function of η uniformly for (x, z) in a neighbourhood of η0 and is bounded away from zero and infinity for all (x, z).
Recall that for any x and z, (xT, zT)β0(τ) defines the conditional quantile function of y given x and z. We further define a functional $h(\tau; x, z) = f\{(x^{\mathrm T}, z^{\mathrm T})\beta_0(\tau) \mid x, z\}$, which is the density of y given x and z at the τth quantile. We call this the conditional quantile density function. Its reciprocal is known as the sparsity function (Welsh, 1988; Koenker & Xiao, 2004). With these definitions, we now introduce the smoothness conditions on β0(τ).
Assumption 5. The true coefficient functions β0(τ) are smooth functions on (0, 1), and for any x ∈ 𝒳 and z,
(i) 0 < h(τ; x, z) < ∞, and limτ→0 h(τ; x, z) = limτ→1 h(τ; x, z) = 0;
(ii) there exist constants M and ν1, ν2 > −1 such that the first derivative of h(·) satisfies
$$|h'(\tau; x, z)| \le M\tau^{\nu_1}(1 - \tau)^{\nu_2}. \tag{2}$$
Assumption 5 is similar to Assumption 3 in Wei & Carroll (2009). Assumption 5(i) implies that the conditional density f (y | x, z) is continuous, bounded away from zero and infinity and diminishes to zero as τ converges to 0 and 1, while Assumption 5(ii) is on the tail behaviour of f (y | x, z), since h′ (τ; x, z) determines how smoothly the density function diminishes as the quantile level converges to 0 or 1. Smaller ν1 and ν2 indicate heavier tails of the conditional distribution of y given x and z. Assumption 5(ii) covers a wide range of distributions, such as the exponential, Gaussian and Student t-distributions. Assumption 5, together with Assumptions 2 and 4, ensures the uniform convergence of β̂(τ) over the intervals [1/(Kn + 1), Kn/(Kn + 1)], which in turn ensures consistent estimation of f (y | x, z).
Assumption 6. The matrix , is positive definite, where ϕτ (r) = τ − I {r < 0}.
In addition, we also make the definitions
With these assumptions and notation, we now present the asymptotic behaviour of β̃τ. Recall that 0 < limn→∞ n0/n1 = λ < ∞.
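Theorem 1. Under Assumptions 1–6, for Kn → ∞ and Kn/n → 0, the multiple imputation estimator satisfies $n^{1/2}(\tilde\beta_\tau - \beta_{0,\tau}) \to N(0, \Sigma)$ in distribution, where $\Sigma = (\lambda + 1)^{-1}V_1 + (1 + 1/\lambda)^{-1}[m^{-1}V_0 + \{(m - 1)/m\}U_0]$.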
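To see how the number of imputations m enters Σ, the following sketch evaluates the formula for a single coefficient, treating V1, V0 and U0 as given scalars. The numerical inputs are hypothetical and chosen only to make the arithmetic transparent; the function name is illustrative.

```python
def mi_asymptotic_variance(v1, v0, u0, lam, m):
    """Scalar version of
    Sigma = (lam + 1)^{-1} V1 + (1 + 1/lam)^{-1} [V0 / m + {(m - 1)/m} U0]."""
    complete_part = v1 / (lam + 1.0)
    imputed_part = (v0 / m + (m - 1.0) / m * u0) / (1.0 + 1.0 / lam)
    return complete_part + imputed_part


# Hypothetical inputs: equal numbers of complete and incomplete cases (lam = 1).
sigma_m1 = mi_asymptotic_variance(v1=2.0, v0=4.0, u0=1.0, lam=1.0, m=1)
sigma_m10 = mi_asymptotic_variance(v1=2.0, v0=4.0, u0=1.0, lam=1.0, m=10)
sigma_large_m = mi_asymptotic_variance(v1=2.0, v0=4.0, u0=1.0, lam=1.0, m=10**6)
```

In this toy case the V0 term is damped by 1/m while the U0 term approaches its full weight, so the variance moves from 3·0 at m = 1 towards 1·5 as m grows.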
The proof of Theorem 1 is provided in Appendix A1, while estimates of ψτ and Σ are provided in Appendix A2.
Remark 1. Throughout, we use the phrase complete-data analysis to mean an analysis based only on the completely observed data. The asymptotic variance of the estimator using the completely observed data only is . Comparing with the estimation variance of the imputed estimator, we see two sources of difference. First, the multiple imputation estimator has an effective sample size n, larger than that for the complete-data analysis, which helps to improve its efficiency. Second, the multiple imputation estimator has additional sources of variability, including the sampling variability from multiple imputation, the inherited variability from using the complete-data estimated parameters and their correlations. Hence, the multiple imputation estimators might be less efficient than the complete-data estimator. Such phenomena are common for multiple imputation estimators; see Tsiatis (2006, Ch. 14). In practice, one could assess the variabilities of both estimators to decide which to use; see Appendix A2.
3. Shrinkage estimation
The estimator β̂τ using the complete data only is consistent, but has a potential loss of efficiency. The multiple imputation estimator β̃τ is generally more efficient, as will be demonstrated via simulations in § 4. However, imputation may cause bias when the parametric likelihood for x given z is misspecified. There are many ways to balance the two estimators, including test-pretest estimation after testing for the parametric model, but a simple and general strategy that we adopt is a shrinkage estimator, as follows. Let θ̂τ = β̂τ − β̃τ be the vector of componentwise differences between the complete-data and multiple imputation estimators, with elements (θ̂1,τ, …, θ̂p,τ)T. Let V be the covariance matrix of θ̂τ with diagonal elements (υ11, …, υpp). Then Chen et al. (2009) suggest the estimator
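$$\hat\beta_\tau^{(s)} = \hat\beta_\tau + K(\tilde\beta_\tau - \hat\beta_\tau), \tag{3}$$
where K is a diagonal matrix with jth diagonal element $\upsilon_{jj}/(\upsilon_{jj} + \hat\theta_{j,\tau}^2)$. Recall that the asymptotic variances υjj (j = 1, …, p) are quantities of order n−1. The idea behind this method is that if there is no bias, then $\hat\theta_{j,\tau}^2 = O_p(n^{-1})$ and the shrinkage factor K is between 0 and I, so that the multiple imputation estimator and the complete-data estimator both receive weight, although emphasis is on the former. Conversely, if there is a bias, then $\hat\theta_{j,\tau}^2$ is of order 1 while υjj is of order n−1, and the elements of K → 0, so that the complete-data estimator asymptotically has weight 1.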
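The behaviour of the shrinkage factor can be sketched numerically. The sketch below assumes diagonal weights of the form υjj/(υjj + θ̂j²), which is an assumption consistent with the limiting behaviour described above; the function name and the inputs are hypothetical.

```python
def shrinkage_estimate(beta_cc, beta_mi, var_diff):
    """Componentwise shrinkage between the complete-data estimate (beta_cc)
    and the multiple imputation estimate (beta_mi).

    var_diff[j] is the variance of theta_j = beta_cc[j] - beta_mi[j].
    Assumed weight: k_j = v_j / (v_j + theta_j^2), applied to beta_mi.
    """
    combined = []
    for b_cc, b_mi, v in zip(beta_cc, beta_mi, var_diff):
        theta = b_cc - b_mi
        k = v / (v + theta ** 2)
        combined.append(b_cc + k * (b_mi - b_cc))
    return combined


# Difference comparable to its standard deviation: both estimators get weight.
balanced = shrinkage_estimate([0.0], [1.0], [1.0])
# Difference large relative to its variance: the complete-data value dominates.
biased = shrinkage_estimate([0.0], [1.0], [1e-12])
```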
Details of implementing the shrinkage estimator are given in Appendix A2. In Appendix A1, we show that the complete-data estimator and the multiple imputation estimator have linear expansions, based on which we outline in Appendix A2 estimation of the joint covariance matrix of (β̂τ, β̃τ). The results enable us to estimate V easily and also mean that the formulae in Chen et al. (2009) are applicable, so that we can construct an estimator of the covariance of the shrinkage estimator, $\mathrm{cov}(\hat\beta_\tau^{(s)})$. The general theory for such shrinkage estimators is given by Chen et al. (2009), although constructing the estimate of Σ is nontrivial because of our context.
4. Simulations
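Here we investigate the performance of our multiple imputation estimator β̃τ and shrinkage estimator $\hat\beta_\tau^{(s)}$ based on Monte Carlo simulations. We first consider two models,
$$y_i = 1 + x_i + z_i + e_{i1}, \tag{4}$$
$$y_i = 1 + x_i + z_i + 0{\cdot}5\,(x_i + z_i)\,e_{i2}, \tag{5}$$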
where the errors ei1 and ei2 are independent and standard normal, and the covariates (xi, zi) are jointly normal with mean vector (4, 4)T, variances (1, 1)T and correlation 0·5. In model (4), the true intercept at the τ th quantile is 1+Qτ (z), where z is a random variable with a standard normal distribution, and both coefficients associated with xi and zi equal 1 at every quantile. In model (5), the true intercept equals 1 at every quantile level, but the two slope coefficients vary across the quantiles, both equal to 1 + 0·5Qτ (z) at quantile level τ. In both models, we further assume that xi is missing with probability pr(xi is missing | zi) = max[0, {(zi − 3)/10}1/20], which results in approximately 25% missing xis. We then apply the multiple imputation estimation and shrinkage estimation procedures to the simulated data from the two models above. In both settings, the density f (x | z) is estimated by maximum likelihood estimation correctly assuming a joint normal distribution. When the covariates x and z are negative, there is an identifiability issue in model (5) since the distribution of ei1 is symmetric around 0. To avoid this trivial situation, we only kept the pairs (x, z) satisfying x + z > 0 in model (5). Because the probability of x + z < 0 is very small, the resulting true joint probability density function of (x; z) is very close to the joint normal distribution which we used in the imputation procedure. We choose m =10 in the multiple imputation estimation algorithm. The sample size was n = n0 + n1 = 200. The shrinkage factor is estimated following Appendix A2.
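The true coefficients described above can be computed directly from the standard normal quantile function. The sketch below uses `statistics.NormalDist` from the Python standard library; the function name and the dictionary layout are illustrative choices.

```python
from statistics import NormalDist


def true_coefficients(tau):
    """True quantile coefficients at level tau, as described in the text:
    model (4) has intercept 1 + Q_tau and unit slopes, while model (5) has
    intercept 1 and both slopes equal to 1 + 0.5 * Q_tau, where Q_tau is the
    standard normal tau-th quantile."""
    q = NormalDist().inv_cdf(tau)
    model4 = {"intercept": 1.0 + q, "x": 1.0, "z": 1.0}
    model5 = {"intercept": 1.0, "x": 1.0 + 0.5 * q, "z": 1.0 + 0.5 * q}
    return model4, model5


m4_low, m5_low = true_coefficients(0.1)
m4_high, m5_high = true_coefficients(0.9)
```

Rounded to two decimals, these values agree with the True columns reported in Table 1.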
Table 1 displays the means and the standard errors of the estimated quantile coefficients in models (4) and (5) from 500 simulations at τ = 0·1, 0·5 and 0·9, using the three estimation approaches. The upper half of Table 1 displays the coefficients from model (4), while the bottom half shows those from model (5). All three methods are nearly unbiased. However, as expected from the theory, the variances of the multiple imputation estimators are smaller than those of the complete-data estimators, especially for the coefficient associated with zi. Such efficiency improvement is more evident for the heteroscedastic model (5). For example, for estimating the zi slope at the 0·9 quantile, the relative efficiency of multiple imputation estimation compared with using the complete data only, i.e., the ratio of their variances, is 217%, and that of shrinkage estimation is 149%. To investigate the performance of our methods in various model settings, we also allowed higher missing proportions, and weaker or stronger correlation between the covariates x and z. The resulting estimated coefficients and their standard errors are included in the Supplementary Material. On the basis of those tables, the proposed estimators performed well across various model specifications.
Table 1.
Model | Coefficient | Estimator | True (τ = 0·1) | Mean (τ = 0·1) | SE (τ = 0·1) | True (τ = 0·5) | Mean (τ = 0·5) | SE (τ = 0·5) | True (τ = 0·9) | Mean (τ = 0·9) | SE (τ = 0·9)
---|---|---|---|---|---|---|---|---|---|---|---
(4) | Intercept | β̂ | −0·28 | −0·27 | 0·29 | 1·00 | 1·00 | 0·21 | 2·28 | 2·28 | 0·28
(4) | Intercept | β̃ | −0·28 | −0·25 | 0·23 | 1·00 | 1·03 | 0·17 | 2·28 | 2·29 | 0·24
(4) | Intercept | β̂(s) | −0·28 | −0·27 | 0·25 | 1·00 | 1·01 | 0·19 | 2·28 | 2·29 | 0·25
(4) | x | β̂ | 1·00 | 0·99 | 0·16 | 1·00 | 1·00 | 0·12 | 1·00 | 1·00 | 0·16
(4) | x | β̃ | 1·00 | 0·94 | 0·16 | 1·00 | 0·98 | 0·11 | 1·00 | 0·96 | 0·16
(4) | x | β̂(s) | 1·00 | 0·98 | 0·16 | 1·00 | 0·99 | 0·12 | 1·00 | 0·99 | 0·16
(4) | z | β̂ | 1·00 | 1·01 | 0·20 | 1·00 | 1·00 | 0·14 | 1·00 | 1·00 | 0·19
(4) | z | β̃ | 1·00 | 1·03 | 0·18 | 1·00 | 1·01 | 0·12 | 1·00 | 1·03 | 0·16
(4) | z | β̂(s) | 1·00 | 1·02 | 0·19 | 1·00 | 1·01 | 0·13 | 1·00 | 1·01 | 0·17
(5) | Intercept | β̂ | 1·00 | 0·69 | 3·88 | 1·00 | 0·61 | 2·68 | 1·00 | 1·00 | 3·16
(5) | Intercept | β̃ | 1·00 | 0·93 | 2·04 | 1·00 | 0·84 | 1·59 | 1·00 | 1·56 | 2·11
(5) | Intercept | β̂(s) | 1·00 | 0·58 | 3·22 | 1·00 | 0·69 | 2·24 | 1·00 | 1·37 | 2·64
(5) | x | β̂ | 0·36 | 0·43 | 0·62 | 1·00 | 1·01 | 0·51 | 1·64 | 1·68 | 0·62
(5) | x | β̃ | 0·36 | 0·39 | 0·52 | 1·00 | 0·92 | 0·46 | 1·64 | 1·45 | 0·53
(5) | x | β̂(s) | 0·36 | 0·42 | 0·58 | 1·00 | 0·98 | 0·49 | 1·64 | 1·62 | 0·60
(5) | z | β̂ | 0·36 | 0·39 | 0·92 | 1·00 | 1·08 | 0·64 | 1·64 | 1·59 | 0·77
(5) | z | β̃ | 0·36 | 0·36 | 0·52 | 1·00 | 1·11 | 0·41 | 1·64 | 1·68 | 0·52
(5) | z | β̂(s) | 0·36 | 0·41 | 0·78 | 1·00 | 1·09 | 0·54 | 1·64 | 1·60 | 0·64
β̂, the estimated coefficient using the completely observed data only; β̃, the multiple imputation estimator with 10 imputations; β̂(s), the shrinkage estimator; True, the true coefficients; SE, standard errors.
The results in Table 1 are obtained when f (x | z) is estimated from the correct model. To investigate the potential bias that could be induced from misspecified f (x | z), we simulate covariates (xi, zi) as xi = (0·18ui,1 + 0·68ui,2) + 3·14, and zi = (0·68ui,1 + 0·18ui,2) + 3·14, where ui,1 and ui,2 are two independent random variables. We choose the constants, 0·18, 0·68 and 3·14, such that (xi, zi) have mean 4, variance 1 and correlation of approximately 0·5, as in the earlier simulation. After simulating the nonnormally distributed covariates, we then generate the responses from model (5). For each generated sample, we allow xi to be missing completely at random with probability 0·25. We apply the same estimation procedures as above, pretending that (xi, zi) is jointly normal. Table 2 presents the mean squared errors and standard errors for the resulting estimated coefficients at τ = 0·1, 0·5 and 0·9. As a comparison, we also re-estimate the coefficients using the imputation method, but use the exact density f (x | z) in the algorithm. On the basis of Table 2, the mean squared errors from the multiple imputation estimators with the exact f (x | z) are the smallest. As expected, when f (x | z) is misspecified, the mean squared errors are inflated, and the shrinkage estimates have smaller mean squared errors due to the bias correction. Since the complete-data approach only uses part of the data for estimation, its mean squared errors are even larger than those of the multiple imputation estimator with misspecified f (x | z). Finally, the differences between the multiple imputation estimators using exact and misspecified densities are small relative to their standard errors, indicating that the multiple imputation estimator is also fairly robust against the misspecification of f (x | z).
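The stated constants can be checked with a short calculation. Write a = 0·18, b = 0·68 and c = 3·14, and assume only that ui,1 and ui,2 are independent with a common mean μ and variance σ²; their distribution is not specified in the text, so the values of μ and σ² below are simply what the stated targets imply.

```latex
\begin{aligned}
\operatorname{corr}(x_i, z_i)
  &= \frac{2ab\,\sigma^2}{(a^2 + b^2)\,\sigma^2}
   = \frac{0.2448}{0.4948} \approx 0.495, \\
E(x_i) = E(z_i) &= (a + b)\mu + c = 0.86\,\mu + 3.14 = 4
  \quad\Longrightarrow\quad \mu = 1, \\
\operatorname{var}(x_i) = \operatorname{var}(z_i) &= (a^2 + b^2)\,\sigma^2 = 0.4948\,\sigma^2 = 1
  \quad\Longrightarrow\quad \sigma^2 \approx 2.02 .
\end{aligned}
```

The correlation depends only on a and b, which is why it is approximately, rather than exactly, 0·5.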
Table 2.
Estimator | MSE (τ = 0·1) | SE (τ = 0·1) | MSE (τ = 0·5) | SE (τ = 0·5) | MSE (τ = 0·9) | SE (τ = 0·9)
---|---|---|---|---|---|---
β̂ | 1·58 | 0·09 | 0·76 | 0·05 | 1·33 | 0·08
β̃ | 1·49 | 0·08 | 0·70 | 0·04 | 1·14 | 0·06
β̂(s) | 1·36 | 0·08 | 0·72 | 0·04 | 1·07 | 0·07
β̃* | 1·31 | 0·07 | 0·68 | 0·04 | 1·02 | 0·06
β̂, the estimated coefficient using the completely observed data only; β̃, the multiple imputation estimator with 10 imputations; β̂(s), the shrinkage estimator; β̃*, the multiple imputation estimator using the exact f (x | z); SE, the standard error of the mean squared error; MSE, mean squared errors.
5. Application
We illustrate the performance of our methods using part of the Eating at America’s Table Study (Subar et al., 2001). The dataset consists of 1418 subjects who participated in this study from September 1997 to August 1998. They were required to complete a 24-hour recall on their dietary intakes, and they also completed a dietary history questionnaire. It is commonly thought that the 24-hour recall is an unbiased measure of dietary intake, but is expensive in cohort studies because it must be administered multiple times, and thus costs far more than the dietary history questionnaire. In measurement error modelling of diet and disease, the regression calibration method (Carroll et al., 2006) is to regress the 24-hour recall on the dietary history questionnaire. Since the distributions of nutrition intakes are commonly skewed, quantile regression is a desirable tool for this modelling.
Here we model carbohydrate intake, with yi being the 24-hour recall for the ith person, xi1 the dietary history questionnaire measurement, xi2 body mass index, xi3 the participant’s age, xi4 an indicator of Caucasian ethnic status and xi5 the gender. The model can be written as
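$$Q_\tau(y_i \mid x_{i1}, \ldots, x_{i5}) = \beta_{0,\tau} + \beta_{1,\tau}x_{i1} + \beta_{2,\tau}x_{i2} + \beta_{3,\tau}x_{i3} + \beta_{4,\tau}x_{i4} + \beta_{5,\tau}x_{i5}. \tag{6}$$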
There are 453 randomly selected subjects among the 1418 who do not have measurements of body mass index and did not complete the dietary history questionnaire, because the study was a designed experiment with some participants randomly assigned to complete an alternative questionnaire. Therefore, those covariates are missing completely at random. Here we apply our multiple imputation estimation methodology to obtain the estimate of the βs, with x as the carbohydrate intake in the dietary history questionnaire and body mass index, and z as gender, ethnicity and age.
In these data, we found that the carbohydrate intake measured in the dietary history questionnaire and body mass index are essentially uncorrelated, with partial correlation 0·0084 conditional on the subject’s age and gender. We can thus estimate the conditional density of carbohydrate intakes in the dietary history questionnaire and body mass index separately based on the two Box–Cox transformation models
$$\Lambda(x_{i1}, \lambda_1) = \gamma_{10} + \gamma_{11}x_{i3} + \gamma_{12}x_{i4} + \gamma_{13}x_{i5} + \sigma_1\varepsilon_{i1},$$
$$\Lambda(x_{i2}, \lambda_2) = \gamma_{20} + \gamma_{21}x_{i3} + \gamma_{22}x_{i4} + \gamma_{23}x_{i5} + \sigma_2\varepsilon_{i2},$$
where εi1 and εi2 are standard normal errors. Here Λ(u, λ) is the Box–Cox transformation function, i.e., Λ(u, λ) = log(u) if λ = 0, and $\Lambda(u, \lambda) = (u^{\lambda} - 1)/\lambda$ for λ ≠ 0. We used maximum likelihood estimates of the transformation parameters, these being close to 0 and −1, respectively, which suggests that logarithm and reciprocal transformations are needed for carbohydrate intake in the dietary history questionnaire and body mass index, respectively. In the Supplementary Material, we present the quantile-quantile plot of the residuals from the above two models with their respective best fitted powers, which shows that the transformed variables are approximately normally distributed.
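The Box–Cox transformation defined above is simple to write down; the sketch below is a direct transcription for a positive scalar argument, with a hypothetical function name.

```python
import math


def box_cox(u, lam):
    """Box-Cox transformation: log(u) if lam == 0, else (u**lam - 1) / lam.

    Defined for u > 0 only.
    """
    if u <= 0:
        raise ValueError("Box-Cox transformation requires u > 0")
    if lam == 0:
        return math.log(u)
    return (u ** lam - 1.0) / lam


log_case = box_cox(math.e, 0)          # logarithm: equals 1.0
reciprocal_case = box_cox(2.0, -1)     # 1 - 1/u: equals 0.5
square_root_case = box_cox(4.0, 0.5)   # 2 * (sqrt(u) - 1): equals 2.0
```

The logarithm is the limiting case as λ → 0, and λ = −1 gives 1 − 1/u, a linear function of the reciprocal.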
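On the basis of the estimated models, the conditional density of the untransformed carbohydrate intake in the dietary history questionnaire is $\hat f_c(\upsilon) = (\upsilon\hat\sigma_1)^{-1}\phi[\{\log(\upsilon) - \hat\gamma_{10} - \hat\gamma_{11}x_3 - \hat\gamma_{12}x_4 - \hat\gamma_{13}x_5\}/\hat\sigma_1]$, where φ is the standard normal density function. The conditional density of body mass index is $\hat f_b(\upsilon) = (\upsilon^2\hat\sigma_2)^{-1}\phi[\{1/\upsilon - \hat\gamma_{20} - \hat\gamma_{21}x_3 - \hat\gamma_{22}x_4 - \hat\gamma_{23}x_5\}/\hat\sigma_2]$.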
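Both densities follow from the change-of-variables rule: if g(υ) is normal with mean μ and standard deviation σ, then υ has density |g′(υ)| φ[{g(υ) − μ}/σ]/σ, with |g′(υ)| = 1/υ for the logarithm and 1/υ² for the reciprocal. The sketch below checks numerically that both forms integrate to one. The scalar μ stands in for the fitted linear predictor, and all numerical values are hypothetical.

```python
import math


def phi(t):
    """Standard normal density."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def density_log_scale(v, mu, sigma):
    """Density of v when log(v) ~ N(mu, sigma^2)."""
    return phi((math.log(v) - mu) / sigma) / (v * sigma)


def density_reciprocal_scale(v, mu, sigma):
    """Density of v when 1/v ~ N(mu, sigma^2)."""
    return phi((1.0 / v - mu) / sigma) / (v * v * sigma)


def integrate(f, a, b, n=20000):
    """Midpoint rule on [a, b] with n equal subintervals."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


# Hypothetical parameter values; the integration ranges cover essentially
# all of the probability mass in each case.
mass_log = integrate(lambda v: density_log_scale(v, 0.0, 0.5), 1e-6, 60.0)
mass_rec = integrate(lambda v: density_reciprocal_scale(v, 0.04, 0.005), 10.0, 100.0)
```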
Following our multiple imputation algorithm, we estimated model (6) at 50 evenly spaced quantile levels using the completely observed data only in the first step. On the basis of the resulting quantile coefficient process, and the estimated conditional densities f (x | z) using the models above, we imputed the missing carbohydrate intakes and body mass index m = 10 times. In Table 3, we listed the multiple imputation estimators at τ = 0·1, 0·5 and 0·9, as well as their standard errors. To illustrate the improved efficiency from multiple imputation, we calculated the relative efficiency. In addition, we also constructed the shrinkage estimator following (3). The shrinkage factors are estimated following Appendix A2.
Table 3.
Covariate | τ | Raw β̂ | SE | Multiple imputation β̃ | SE | RE (%) | Shrinkage β̂(s) | SE | RE (%)
---|---|---|---|---|---|---|---|---|---
Carbohydrate intake | 0·1 | 0·08 | 0·06 | 0·04 | 0·06 | 99 | 0·06 | 0·06 | 104
Carbohydrate intake | 0·5 | 0·27 | 0·04 | 0·24 | 0·03 | 109 | 0·27 | 0·03 | 102
Carbohydrate intake | 0·9 | 0·60 | 0·07 | 0·48 | 0·07 | 112 | 0·59 | 0·07 | 101
Body mass index | 0·1 | −0·94 | 0·88 | −0·84 | 0·91 | 94 | −0·85 | 0·90 | 96
Body mass index | 0·5 | −1·68 | 0·54 | −1·63 | 0·54 | 99 | −1·63 | 0·54 | 100
Body mass index | 0·9 | −0·70 | 1·20 | −0·35 | 1·21 | 98 | −0·51 | 1·19 | 101
Age | 0·1 | −0·53 | 0·36 | −0·39 | 0·35 | 108 | −0·42 | 0·33 | 117
Age | 0·5 | −0·86 | 0·28 | −1·00 | 0·24 | 136 | −0·95 | 0·25 | 132
Age | 0·9 | −1·38 | 0·62 | −1·71 | 0·51 | 147 | −1·54 | 0·55 | 126
Caucasian | 0·1 | 5·95 | 14·43 | 14·95 | 12·57 | 132 | 11·61 | 12·02 | 144
Caucasian | 0·5 | 4·67 | 10·87 | 6·22 | 8·38 | 168 | 6·16 | 8·37 | 169
Caucasian | 0·9 | −38·45 | 41·02 | −1·39 | 25·57 | 257 | −27·91 | 35·76 | 132
Gender | 0·1 | −47·34 | 11·03 | −38·34 | 10·25 | 116 | −43·20 | 10·11 | 119
Gender | 0·5 | −73·48 | 8·27 | −66·90 | 7·05 | 137 | −70·77 | 7·57 | 119
Gender | 0·9 | −108·07 | 15·58 | −114·92 | 12·96 | 145 | −113·41 | 13·59 | 131
SE, standard errors following the estimation method described in Appendix A2; RE, relative efficiency, which is defined as the ratio between the estimated variance of the complete-data estimator and that of the multiple imputation/shrinkage estimates.
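The relative efficiency defined in the footnote is a ratio of variances, so it can be approximated from the reported standard errors by squaring their ratio. The sketch below uses the rounded gender entries at τ = 0·5 from Table 3; because those standard errors are rounded to two decimals, the result only approximately matches the tabulated percentages. The function name is hypothetical.

```python
def relative_efficiency(se_complete, se_other):
    """Relative efficiency in percent: ratio of the complete-data variance
    to the variance of the competing estimator."""
    return 100.0 * (se_complete / se_other) ** 2


# Gender coefficient at tau = 0.5: complete-data SE 8.27,
# multiple imputation SE 7.05, shrinkage SE 7.57.
re_imputation = relative_efficiency(8.27, 7.05)  # about 138; Table 3 reports 137
re_shrinkage = relative_efficiency(8.27, 7.57)   # about 119; Table 3 reports 119
```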
Table 3 shows that the multiple imputation estimators are fairly consistent with those using the complete data only, but have much smaller standard errors for the estimates associated with age, ethnicity and gender. Those variables are completely observed when the dietary history questionnaire carbohydrate intakes and body mass index are missing. The multiple imputation estimators make full use of those observations, which improves their efficiency. The shrinkage estimator is generally consistent with the complete-data and multiple imputation estimators; while its standard errors are slightly larger than the multiple imputation estimators, they are still much smaller than those of the complete-data estimators.
6. Discussion
The validity of our multiple imputation method relies on a correct specification of the conditional density f (x | z), which we model parametrically. To further protect against the possible misspecification of f (x | z), a shrinkage estimator was proposed. One could also opt to estimate f (x | z) nonparametrically, which will automatically yield a consistent estimator without an additional shrinkage step. However, nonparametric conditional density estimation is very complex, especially when z is multivariate, and the slow rates of convergence would undermine the usefulness of such an approach.
The missing covariate problem in the quantile regression context is challenging, because the conditional density of y given the covariates is unspecified under a typical quantile regression setting. Consequently, classical likelihood-based approaches cannot be applied directly. Here, we adopted a joint modelling approach similar to Wei & Carroll (2009) to circumvent this difficulty. However, the proposed method is different from Wei & Carroll (2009) in many aspects. First, the objectives are different. This paper handles missing covariates, while Wei & Carroll (2009) handle mismeasured covariates. Second, the estimation approaches are different. Wei & Carroll (2009) is based on constructing unbiased estimating equations; while this paper uses a multiple imputation approach. Consequently, the estimation algorithms are different; the former involves iterative estimation, while the estimation procedure in this paper does not. Finally, the asymptotic properties are obtained in a very different fashion.
We assumed the conditional quantile functions to be linear at all quantile levels. This assumption holds for location-scale models, i.e., Y = XTβ + XTγe, where e is a random error with Qτ (e | X) = 0. If needed, one can easily relax the linear quantile function to an arbitrary nonlinear or even nonparametric function. The algorithm remains largely unchanged, with the minimal adaptation of replacing the linear function by the new regression function in the check function ρτ. Although the method is presented for an independent sample, it can also be extended to longitudinal data using the so-called working independence construction. For a longitudinal sample (yi,j, xi,j, zi,j), if the quantiles of yi,j are linear in (xi,j, zi,j), then we can estimate the quantile coefficients using a similar algorithm with the longitudinal quantile regression objective function $\sum_i \sum_j \rho_\tau\{y_{i,j} - (x_{i,j}^{\mathrm T}, z_{i,j}^{\mathrm T})\beta\}$. The estimation of the conditional density f (x | z) also needs to be adapted for the longitudinal data. The resulting estimators would still be consistent, but the limiting distribution would need to be derived separately.
Supplementary Material
Acknowledgement
Wei's research was supported by the National Science Foundation (DMS-0906568) and a career award from NIEHS Center for Environmental Health in Northern Manhattan (ES009089). Ma's research was supported by a grant from the National Science Foundation (DMS-0906341). Carroll's research was supported by a grant from the National Cancer Institute (CA57030).
Appendix
A1. Technical arguments
Recall that x̃j(ℓ) is the ℓth imputed x associated with (yj, zj), based on the estimated density f̂(x | yj, zj). We define a partial objective function with the imputed proportion of the data
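$$S_{n_0(\ell)}(\beta) = \sum_{j=n_1+1}^{n} \rho_\tau\{y_j - (\tilde x_{j(\ell)}^{\mathrm T}, z_j^{\mathrm T})\beta\},$$
and define its minimizer $\hat\beta_{0,(\ell)} = \arg\min_\beta S_{n_0(\ell)}(\beta)$.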
We say that β̂0,(ℓ) is the estimated coefficient using the ℓth imputed portion of the data only. In later steps, we show that the multiple imputation estimator β̃τ can be written as a linear combination of β̂τ and β̂0,(ℓ)s. Hence, to find the asymptotic distribution of β̃τ, a key step is to find the asymptotic distribution of β̂0,(ℓ) as n = n0 + n1 → ∞, and 0 < limn n0/n1 =λ < ∞. To do that, we first show that
$$\sup_{\beta \in \Omega} |\tilde S_0(\beta) - S_0(\beta)| \to 0 \tag{A1}$$
in probability as n1 → ∞. Here S̃0(β) and S0(β) are the two expected objective functions defined before Assumptions 2 and 3.
Recall that f̂{y | x, z, β̂(τ)} is the estimated conditional density of y given x and z using the complete data only. We first decompose the difference between the estimated density f̂{y | x, z, β̂(τ)} and its true value as
where
Following the definition of f̂{y | x, z, β̂(τ)}, and since for any given value of y, it can only be contained in one of those subintervals {(xT, zT) β̂τk, (xT, zT) β̂τk+1}, we have
Following the uniform convergence of β̂τ, readily available from the result in Wei & Carroll (2009) by considering in their context a special case where the measurement error variance is zero, the convergence holds uniformly for any k. Consequently, we can rewrite the upper bound as
By the mean value theorem, there exists a τ* ∈ (τk, τk+1) such that (τk+1 − τk)/{(xT, zT)(β0,τk+1 − β0,τk)} = h(τ*, x, z). On the other hand, let τy be the quantile level of y with respect to true quantile function (xT, zT)β0(τ) for y ∈ [(xT, zT) β̂τk, (xT, zT) β̂τk+1), then f {y | x, z, β0(τ)} = h(τy, x, z) by definition. Since the true quantile function (xT, zT)β0(τ) is a continuous function that satisfies the Lipschitz condition, the quantile level of (xT, zT) β̂τk+1 with respect to the true quantile function is . Moreover, due to the uniform convergence of β̂(τ), the quantile level of (xT, zT)β̂τk is , for any k. Therefore, together with the monotonicity of quantile function, we have . Following these arguments, we have
The last step follows from Assumption 5(i) and the fact that Kn/n → 0. Consequently, for any given values of y and z, as n1 → ∞ and Kn → ∞, we have
f̂{y | x, z, β̂(τ)} − f{y | x, z, β0(τ)} = op(1) (A2)
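The density estimate analysed in this argument is piecewise constant: for y between two adjacent fitted quantiles, it equals the ratio of the spacing in quantile levels to the spacing in fitted quantiles. A minimal Python sketch of that construction follows. The function name, the treatment of values outside the fitted grid (density set to zero) and the requirement that the fitted quantiles be strictly increasing are assumptions made here for illustration.

```python
from bisect import bisect_right


def density_from_quantiles(y, taus, fitted_quantiles):
    """Piecewise-constant density implied by a grid of fitted quantiles.

    taus             : increasing quantile levels tau_1 < ... < tau_K
    fitted_quantiles : strictly increasing fitted values (x^T, z^T) beta_hat(tau_k)
    Returns (tau_{k+1} - tau_k) / (q_{k+1} - q_k) when y lies in [q_k, q_{k+1}),
    and 0.0 outside the fitted range (an assumption of this sketch).
    """
    if len(taus) != len(fitted_quantiles) or len(taus) < 2:
        raise ValueError("need matching grids with at least two points")
    if y < fitted_quantiles[0] or y >= fitted_quantiles[-1]:
        return 0.0
    # Index k of the subinterval [q_k, q_{k+1}) that contains y.
    k = bisect_right(fitted_quantiles, y) - 1
    return (taus[k + 1] - taus[k]) / (fitted_quantiles[k + 1] - fitted_quantiles[k])


# Hypothetical check: quantiles of a Uniform(0, 2) variable are 2 * tau,
# so the implied density should be close to 0.5 inside the grid.
taus = [k / 10 for k in range(1, 10)]
quantiles = [2 * t for t in taus]
print(density_from_quantiles(1.0, taus, quantiles))
print(density_from_quantiles(5.0, taus, quantiles))
```

Refining the grid, so that Kn grows with the sample size, is what drives the difference-quotient approximation to the quantile density in the argument above.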
Let D0(y, z) = ∫ f{y | x, z, β0(τ)} f(x) dx, and Dn1(y, z) = ∫ f̂{y | x, z, β̂(τ)} f(x) dx. Since f(x | z) is an integrable function, the convergence (A2) also implies that
Dn1(y, z) − D0(y, z) = op(1) (A3)
It follows that, for any y and z,
(A4)
The last step is implied by (A2), (A3), together with the facts that D0(y, z) > 0 for any (y, z), and the density f (x) is bounded away from infinity under Assumption 4. Moreover, the distance between the two objective functions can be written as
where g(y, x, z) = supβ∈Ω ρτ {y − (xT, zT)β} f (y, z). Since x has bounded support, and Ω is a compact set, under the assumptions that E(y) < ∞ and E(z) < ∞, the function g(y, x, z) is integrable, i.e.,
(A5)
On the other hand, due to the uniform convergence of β̂(τ), there exists a constant C1 such that, for large enough n1, f̂{y | x, z, β̂(τ)} ≤ h(τy, x, z) + C1. By Assumption 5(i), the quantile density function h(τ, x, z) is bounded for any τ, x and z, so f̂{y | x, z, β̂(τ)} is bounded for any (y, x, z). Moreover, since f(x) is bounded with bounded support, Dn1(y, z) is also bounded. Consequently, the estimated density f̂(x | y, z) is bounded for any (y, x, z). By the dominated convergence theorem, the convergence (A4), the integrability (A5) and the boundedness of f̂(x | y, z) together imply the convergence supβ∈Ω |S̃0(β) − S0(β)| = op(1) as n1 and Kn → ∞.
Since S0(β) is a continuous function, and uniquely minimized at β0,τ, following the arguments in Amemiya (1985, pp. 106–8), the convergence (A1) suffices for the minimizer of S̃0(β) to converge to β0,τ in probability. Recall that the partial objective function defined above is minimized at β̂0(ℓ). Of course,
Then, following standard arguments for M-estimation (van der Vaart, 1998, pp. 44–7), the estimator β̂0(ℓ) converges to the minimizer of S̃0(β) in probability, conditioning on the completely observed data. Therefore,
β̂0(ℓ) − β0,τ = op(1) (A6)
as n0 + n1 → ∞. Thus, we have shown the consistency of β̂0(ℓ).
We now use a Taylor expansion to derive the asymptotic normality of β̂0(ℓ). Define the directional derivative function of as .
Arguments similar to those used in proving He & Shao (1996, Lemma 4.6) yield the uniform convergence result
(A7)
for any descending sequence δn. Combining (A6) and (A7), we have
(A8)
Since , we Taylor expand in (A8) around β0,τ, so that
and thus β̂0(ℓ) has Bahadur representation
(A9)
where . Since the conditional density of f̂(x | yj, zj) converges to the true density f (x | yj, zj) as n1 → ∞ for any x, the joint distribution of (yj, x̃j(ℓ), zj) converges to the joint distribution of (yi, x̃i, zi) as n1 → ∞. Consequently, using Assumption 1 and the dominated convergence theorem, we have that converges to ψτ in probability as n1 → ∞, and converges to V0 as n1 → ∞. It follows that β̂0(ℓ) is asymptotically normally distributed with mean β0,τ and covariance matrix . This finishes our analysis of β̂0(ℓ).
We now define
as the estimated coefficient using the ℓth assembled complete data. Following similar lines in proving (A9) by treating the observed xi as an imputed value using the true density function f (x | z, η0), we have
(A10)
where . Using the law of large numbers, the matrix ψn1,τ converges to ψτ in probability. On the other hand, recall that β̂τ is the estimated coefficient based on the n1 complete observations only. For any τ, β̂τ has the Bahadur representation (Koenker, 2005, equation (4.4)),
(A11)
Combining (A9)– (A11), and using , we obtain
(A12)
where
It follows immediately from the central limit theorem that 𝒰n → N(0, V1) in distribution. On the other hand, conditioning on the complete data, 𝒱n(ℓ) converges to N(0, Vn) in distribution, where . Since Vn converges to V0 with the increase of the total sample size, 𝒱n(ℓ) converges to N(0, V0) in distribution as n goes to infinity by Slutsky’s theorem. Because N(0, V0) does not depend on the complete data, this is also the limit of the marginal distribution of 𝒱n(ℓ). Moreover, it is easy to show that E(𝒰n𝒱n(ℓ)) → 0 and cov(𝒱n(ℓ), 𝒱n(ℓ′)) → U0. It follows that , where Σ = (λ + 1) −1V1 + (1 + 1/λ)−1[m−1V0 + {(m − 1)/m}U0], as claimed.
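To make the structure of Σ concrete, the following Python sketch combines V1, V0 and U0 with the weights (λ + 1)−1 and (1 + 1/λ)−1, which sum to one and correspond to the limiting fractions of complete and incomplete observations. The function name and the plain list-of-lists matrix representation are illustrative assumptions; the numerical inputs are hypothetical.

```python
def combine_covariance(V1, V0, U0, lam, m):
    """Entrywise Sigma = V1/(lam+1) + (1+1/lam)^{-1} * [V0/m + ((m-1)/m) * U0].

    V1, V0, U0 : square matrices of equal size, given as lists of lists
    lam        : limit of n0/n1, must be positive
    m          : number of imputations, must be at least 1
    """
    if lam <= 0 or m < 1:
        raise ValueError("need lam > 0 and m >= 1")
    w_complete = 1.0 / (lam + 1.0)       # limiting share n1/n
    w_imputed = 1.0 / (1.0 + 1.0 / lam)  # limiting share n0/n
    p = len(V1)
    return [
        [
            w_complete * V1[r][c]
            + w_imputed * (V0[r][c] / m + (m - 1) / m * U0[r][c])
            for c in range(p)
        ]
        for r in range(p)
    ]


# Hypothetical scalar illustration with equal numbers of complete and
# incomplete observations (lam = 1).
V1, V0, U0 = [[2.0]], [[4.0]], [[1.0]]
for m in (1, 2, 10):
    print(m, combine_covariance(V1, V0, U0, lam=1.0, m=m))
```

As m grows, the bracketed term moves from V0 towards U0, so the benefit of additional imputations is bounded by the between-imputation covariance U0.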
A2. Implementing the shrinkage estimator
Define . Let Γ̂ be the estimated covariance matrix of ℬ̂, which is derived on a case-by-case basis. Let θ̂τ = (θ̂1,τ, …, θ̂p,τ)T = β̃τ − β̂τ, and let V̂ be the estimated covariance matrix of θ̂τ, with diagonal elements (υ̂1, …, υ̂p). Define
and define G = (K, Ip − K). Then the shrinkage estimator is β̂s(τ) = Gℬ̂. Its estimated covariance matrix is côv{β̂s(τ)} = GΓ̂GT.
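The computation of β̂s(τ) = Gℬ̂ and GΓ̂GT can be sketched in Python as follows. Two points are assumptions of this sketch rather than statements taken from the text: the weights are taken to be K = diag{υ̂k/(υ̂k + θ̂k,τ²)}, a componentwise form in the spirit of Chen et al. (2009), and ℬ̂ is taken to stack β̃τ above β̂τ. The function names are hypothetical.

```python
def shrinkage_estimate(beta_mi, beta_cc, theta_var):
    """Componentwise shrinkage between two estimates.

    beta_mi   : multiple imputation estimate (beta-tilde)
    beta_cc   : complete-data estimate (beta-hat)
    theta_var : estimated variances of theta = beta_mi - beta_cc
    ASSUMPTION: K = diag{v_k / (v_k + theta_k^2)}.
    Returns (beta_s, weights), where weights holds the diagonal of K.
    """
    beta_s, weights = [], []
    for b_mi, b_cc, v in zip(beta_mi, beta_cc, theta_var):
        theta = b_mi - b_cc
        denom = v + theta * theta
        # If both terms vanish the two estimates agree; any weight works.
        k = v / denom if denom > 0 else 1.0
        beta_s.append(k * b_mi + (1.0 - k) * b_cc)
        weights.append(k)
    return beta_s, weights


def shrinkage_covariance(weights, gamma):
    """Compute G Gamma G^T with G = (K, I_p - K).

    gamma is the 2p x 2p estimated covariance of the stacked vector,
    ordered as (beta_mi, beta_cc) under the assumption stated above.
    """
    p = len(weights)
    G = [[0.0] * (2 * p) for _ in range(p)]
    for k, w in enumerate(weights):
        G[k][k] = w
        G[k][p + k] = 1.0 - w
    G_gamma = [
        [sum(G[r][t] * gamma[t][c] for t in range(2 * p)) for c in range(2 * p)]
        for r in range(p)
    ]
    return [
        [sum(G_gamma[r][t] * G[c][t] for t in range(2 * p)) for c in range(p)]
        for r in range(p)
    ]


# Hypothetical inputs: the first coefficient agrees across the two
# estimators, the second differs by much more than its standard error.
beta_s, K_diag = shrinkage_estimate([1.0, 2.0], [1.0, 3.0], [0.04, 0.25])
identity4 = [[1.0 if r == c else 0.0 for c in range(4)] for r in range(4)]
print(beta_s, K_diag)
print(shrinkage_covariance(K_diag, identity4))
```

Under this form of K, a coefficient whose two estimates agree is taken from the multiple imputation estimator, while a large discrepancy relative to υ̂k pulls the result back towards the complete-data estimator.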
We estimate the covariance matrices of β̃τ and β̂τ based on their Bahadur representations (A12) and (A11), respectively. This requires estimating the variance component matrices ψn1,τ and ψn0,τ, the variances of 𝒰n and 𝒱n(ℓ), and the covariance of 𝒱n(ℓ) and 𝒱n(ℓ′). In what follows, we provide sample estimates of those variance component matrices. First, , where
Here hτ is the bandwidth chosen by the method of Hall & Sheather (1988). Compared with the density estimator that we used in the estimation procedure, here we incorporated a bandwidth selection hτ to improve the stability of f̂i (τ). Of course, . Following similar lines, we approximate this last term by
where the estimated density function is
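For reference, the Hall & Sheather (1988) bandwidth mentioned above is commonly computed with the closed-form rule hτ = n−1/3 zα2/3 [1.5 φ{Φ−1(τ)}2 / {2Φ−1(τ)2 + 1}]1/3, where zα is the upper α/2 standard normal quantile. The Python sketch below implements that rule, together with a generic difference-quotient density estimate of the kind typically paired with it. The default α = 0.05, the function names and the specific difference-quotient form are assumptions of this sketch and are not claimed to match the estimator used in the paper.

```python
from statistics import NormalDist


def hall_sheather_bandwidth(n, tau, alpha=0.05):
    """Hall-Sheather rule-of-thumb bandwidth for sparsity estimation at level tau."""
    if n <= 0 or not 0.0 < tau < 1.0:
        raise ValueError("need n > 0 and 0 < tau < 1")
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1.0 - alpha / 2.0)
    q = std_normal.inv_cdf(tau)   # standard normal quantile at tau
    phi = std_normal.pdf(q)       # standard normal density at that quantile
    return (
        n ** (-1.0 / 3.0)
        * z_alpha ** (2.0 / 3.0)
        * (1.5 * phi ** 2 / (2.0 * q ** 2 + 1.0)) ** (1.0 / 3.0)
    )


def difference_quotient_density(h, fitted_upper, fitted_lower):
    """Density estimate 2h / {Q(tau + h) - Q(tau - h)} from two fitted quantiles.

    fitted_upper and fitted_lower are the fitted conditional quantiles at
    levels tau + h and tau - h for one observation (hypothetical inputs).
    Returns 0.0 when the fitted quantiles cross, an assumption of this sketch.
    """
    gap = fitted_upper - fitted_lower
    return 2.0 * h / gap if gap > 0 else 0.0


h = hall_sheather_bandwidth(n=1000, tau=0.5)
print(h)
print(difference_quotient_density(0.1, 1.2, 0.8))
```

The bandwidth shrinks at rate n−1/3 and is symmetric in τ about 0.5, so it is widest at the median and narrower in the tails.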
Following the linear expansions of β̃τ and β̂τ, we first estimate var(𝒰n) and var(𝒱n(ℓ)), and cov(𝒱n(ℓ), 𝒱n(ℓ′)) using sample variances, i.e., we define the estimator
Let Qℓ be the sample covariance matrix of . The variance component matrix var(𝒱n(ℓ)), for any ℓ, can be estimated by
For any ℓ ≠ ℓ′, we define Q(ℓ, ℓ′) as the sample covariance matrix between
We define
for any (ℓ, ℓ′). With the considerations above, we have
where λ̂ = n0/n1. Consequently, the estimated covariance matrix of β̃τ is , and the estimated covariance matrix of β̂τ is . Since 𝒰n and 𝒱n(ℓ) are asymptotically independent and have means zero, we have that E(𝒰n𝒱n(ℓ)) = o(1). We can estimate the covariance between β̃τ and β̂τ by . Assembling these components together, we obtain
Footnotes
Supplementary material
Supplementary material available at Biometrika online includes additional simulation results with higher rates of missing data and stronger and weaker covariate correlations. Also included are the quantile-quantile plots for the transformed covariates in the data analysis.
Contributor Information
Ying Wei, Email: ying.wei@columbia.edu, Department of Biostatistics, Columbia University, 722 West 168th St., New York, New York 10032, U.S.A.
Yanyuan Ma, Email: ma@stat.tamu.edu, Department of Statistics, Texas A&M University, College Station, Texas 77843-3143, U.S.A.
Raymond J. Carroll, Email: carroll@stat.tamu.edu, Department of Statistics, Texas A&M University, College Station, Texas 77843-3143, U.S.A.
References
- Amemiya T. Advanced Econometrics. Boston: Harvard University Press; 1985.
- Carroll RJ, Ruppert D, Stefanski LA, Crainiceanu CM. Measurement Error in Nonlinear Models: A Modern Perspective. 2nd ed. London: Chapman and Hall CRC Press; 2006.
- Chen YH, Chatterjee N, Carroll RJ. Shrinkage estimators for robust and efficient inference in haplotype-based case-control studies. J. Am. Statist. Assoc. 2009;104:220–233. doi: 10.1198/jasa.2009.0104.
- Hall P, Sheather S. On the distribution of a studentized quantile. J. R. Statist. Soc. B. 1988;50:381–391.
- He X, Shao QM. A general Bahadur representation of M-estimators and its application to linear regression with nonstochastic designs. Ann. Statist. 1996;24:2608–2630.
- Koenker R. Quantile Regression. Cambridge: Cambridge University Press; 2005.
- Koenker R, Bassett GJ. Regression quantiles. Econometrica. 1978;46:33–50.
- Koenker R, Xiao ZJ. Unit root quantile autoregression inference. J. Am. Statist. Assoc. 2004;99:775–787.
- Lipsitz SR, Fitzmaurice GM, Molenberghs G, Zhao LP. Quantile regression methods for longitudinal data with drop-outs: application to CD4 cell counts of patients infected with the human immunodeficiency virus. Appl. Statist. 1997;46:463–476.
- Little RJA, Rubin DB. Statistical Analysis with Missing Data. New York: Wiley; 1987.
- Robins JM, Rotnitzky A, Zhao LP. Analysis of semiparametric regression models for repeated outcomes in the presence of missing data. J. Am. Statist. Assoc. 1995;90:106–121.
- Subar AF, Thompson FE, Kipnis V, Midthune D, Hurwitz P, Mcnutt S, Mcintosh A, Rosenfeld S. Comparative validation of the Block, Willett, and National Cancer Institute Food Frequency Questionnaires: the Eating at American’s Table Study. Am. J. Epidemiol. 2001;154:1089–1099. doi: 10.1093/aje/154.12.1089.
- Tsiatis AA. Semiparametric Theory and Missing Data. New York: Springer; 2006.
- Van der Vaart AW. Asymptotic Statistics. Cambridge: Cambridge University Press; 1998.
- Wei Y, Carroll RJ. Quantile regression with measurement error. J. Am. Statist. Assoc. 2009;104:1129–1143. doi: 10.1198/jasa.2009.tm08420.
- Welsh AH. Asymptotically efficient estimation of the sparsity function at a point. Statist. Prob. Lett. 1988;6:427–432.
- Yi GY, He W. Median regression models for longitudinal data with dropouts. Biometrics. 2009;65:618–625. doi: 10.1111/j.1541-0420.2008.01105.x.