Multiple imputation in quantile regression

Ying Wei; Yanyuan Ma; Raymond J Carroll

doi:10.1093/biomet/ass007

Author manuscript; available in PMC: 2014 Jun 16.

Published in final edited form as: Biometrika. 2012;99(2):423–438. doi: 10.1093/biomet/ass007

PMCID: PMC4059083 NIHMSID: NIHMS568523 PMID: 24944347

Summary

We propose a multiple imputation estimator for parameter estimation in a quantile regression model when some covariates are missing at random. The estimation procedure fully utilizes the entire dataset to achieve increased efficiency, and the resulting coefficient estimators are root-n consistent and asymptotically normal. To protect against possible model misspecification, we further propose a shrinkage estimator, which automatically adjusts for possible bias. The finite sample performance of our estimator is investigated in a simulation study. Finally, we apply our methodology to part of the Eating at American’s Table Study data, investigating the association between two measures of dietary intake.

Keywords: Missing data, Multiple imputation, Quantile regression, Regression quantile, Shrinkage estimation

1. Introduction

In many regression-type applications, some observations are missing. Ignoring the missing data will undermine study efficiency, and sometimes introduce substantial bias. There is a large literature dealing with missing data; see Little & Rubin (1987) for an early and still fundamental treatment. Quantile regression (Koenker & Bassett, 1978) has been an increasingly important modelling tool, due to its flexibility in exploring how covariates affect the distribution of the response. However, combining quantile regression with missing data is not a well-developed topic. In this paper, we consider a linear quantile regression model, where for τ ∈ (0, 1),

$$Q_{τ}(y) = x^{T} β_{1,τ} + z^{T} β_{2,τ}. \qquad (1)$$

Here (x, z) are both covariate vectors, but x may be missing, while z is always observed. We assume that z contains the constant 1, so the intercept term is not written out separately. We use n for the total sample size, and assume that n₁ of these n observations are complete, while the remaining n₀ of them have x missing. Thus, observations can be summarized as {(y_i, x_i, z_i) : i =1, …, n₁} and {(y_j, ·, z_j) : j = n₁ + 1, …, n}. To avoid trivial situations, we assume 0 < lim_n→∞ n₀/n₁ = λ < ∞. We make a missing at random assumption that conditional on z, missingness and x are independent. The main interest of this paper is in estimating the regression parameter $β_{τ} = {(β_{1, τ}^{T}, β_{2, τ}^{T})}^{T}$ given the assumed missing data mechanism. This research is motivated by the Eating at American’s Table Study (Subar et al., 2001), an important study in nutritional epidemiology. In § 5, we describe how this study fits our model framework.

It is not difficult to see that since missingness depends only upon the observed covariates z, using the complete data only yields a consistent estimate of β_τ. However, since a part of the data is completely excluded from the analysis, this practice can be highly inefficient. The main goal of this paper is to propose a multiple imputation method to include the incomplete data, so as to improve estimation efficiency. Since additional assumptions on (x, z) are needed to facilitate the imputation procedure, the method risks being inconsistent and we propose a shrinkage estimator to attenuate this risk. The final estimator has an automatic data-driven shrinkage parameter, which guarantees that the resulting estimator is consistent regardless of the correctness of the additional assumptions, and at the same time is more efficient than using the complete data only.

Most existing methods handling missing data are likelihood-based, and hence cannot be applied to quantile regression directly, since there is no likelihood function for quantile regression. Lipsitz et al. (1997) considered an inverse probability approach for longitudinal data with drop-outs. For the same type of data, Yi & He (2009) extended the inverse probability weighted generalized estimating equations proposed by Robins et al. (1995) to correct for the bias from longitudinal drop-out. Our setting is different from those methods, since we are dealing with missing covariates, rather than missing outcomes.

Throughout the paper, we write Q_τ (y) as the τth quantile of a random variable y. We write β(τ) as the quantile coefficient process for τ ∈ (0, 1), and β_τ as the quantile coefficient specifically at the τth quantile. In addition, we use ‖x‖ to mean Euclidean norm, and write g′ (x) as the first derivative of an arbitrary function g(x). If x and y are two random variables, then E_(x,y){g(x, y)} stands for the expectation of g(x, y) over the joint distribution of (x, y).

2. Estimation with multiple imputation

2·1. Method

In this section, we propose a multiple imputation estimator of the quantile coefficient $β_{τ} = {(β_{1, τ}^{T}, β_{2, τ}^{T})}^{T}$ in the linear quantile model (1). The method has the following steps.

Step 1. Perform quantile regression with the complete data only. Write the resulting coefficients as β̂_τ; that is, for a set of τ values in (0, 1), obtain $β̂_{τ} = \arg\min_{β} \sum_{i=1}^{n_{1}} ρ_{τ}\{y_{i} - (x_{i}^{T}, z_{i}^{T}) β\}$, where ρ_τ (r) = r {τ − I (r < 0)} is an asymmetric L₁ loss function. In practice, the τ values are typically chosen as an evenly spaced and sufficiently dense grid on (0, 1).

Step 2a: Estimate the conditional density f (y | x, z). Under the assumption that the linear quantile model (1) holds for all quantile levels τ, we can write the conditional density f (y | x, z) as a function of the quantile coefficient process, that is, f {y | x, z; β₀(τ)} = F′{y | x, z; β₀(τ)}, where F{y | x, z; β₀(τ)} = inf {τ ∈ (0, 1) : (x^T, z^T)β₀(τ) > y} and β₀(τ) is the true quantile coefficient process. We write the conditional density f (y | x, z) as f {y | x, z; β₀(τ)} to indicate its dependence on the quantile coefficient function β₀(τ).

Although the unknown coefficient function β₀(τ) is of infinite dimension, it can be well-approximated by a natural linear spline expanding from a series of estimated β̂_{τ_k} at a fine grid of quantile levels (τ_k). Specifically, we choose quantile levels τ_k = k/(K_n + 1) (k =1, …, K_n), where K_n is the number of quantile levels. We then define β̂(τ) as a p-dimensional piecewise linear function on [0,1], which satisfies β̂(τ_k) = β̂_{τ_k} and β̂′ (0) = β̂′ (1) = 0. Under the conditions in Wei & Carroll (2009), β̂(τ) converges uniformly to the true quantile coefficient process in probability. The quantile function is the inverse distribution function, so the density function can be expressed as the reciprocal of the first derivative of the quantile function at the corresponding quantile level. Consequently, we can approximate the conditional density function by

$$f̂\{y \mid x, z, β̂(τ)\} = \sum_{k=1}^{K_{n}} \frac{τ_{k+1} - τ_{k}}{(x^{T}, z^{T}) β̂_{τ_{k+1}} - (x^{T}, z^{T}) β̂_{τ_{k}}} I\{(x^{T}, z^{T}) β̂_{τ_{k}} \leq y < (x^{T}, z^{T}) β̂_{τ_{k+1}}\}.$$

Here f̂ {y | x, z, β̂(τ)} is the density function induced by the estimated conditional quantile function (x^T, z^T) β̂(τ).

Step 2b: Estimate the conditional density f (x | z). The remaining problem is to estimate f (x | z). We model x given z parametrically as f (x | z, η). The missing-at-random assumption facilitates the estimation of η based on the complete data. We write the estimate as η̂, and the estimated conditional density of x given z as f (x | z, η̂).

Step 2c: Estimate the conditional density f (x | y, z) and impute the missing x accordingly. The estimated conditional density function is f̂ (x | y_j, z_j) ∝ f̂{y_j | x, z_j, β̂(τ)} f (x | z_j, η̂). For each j = n₁ + 1, …, n, we simulate the missing x_j from f̂(x | y_j, z_j) by drawing a Un(0,1) random variable and inserting it into the quantile function F̂⁻¹(u | y_j, z_j), u ∈ (0, 1), derived from the estimated f̂(x | y_j, z_j). Let u_ℓ be the ℓth generated Un(0,1) random variable. We then define x̃_j(ℓ) = F̂⁻¹(u_ℓ | y_j, z_j), the ℓth imputed x associated with (y_j, z_j). Consequently, x̃_j(ℓ) ~ f̂(x | y_j, z_j).

Step 3. Re-estimate β including the imputed data. We assemble a new objective function including the completely observed data and the ℓth imputed dataset as

$$S_{n(ℓ)}(β) = \sum_{i=1}^{n_{1}} ρ_{τ}\{y_{i} - (x_{i}^{T}, z_{i}^{T}) β\} + \sum_{j=n_{1}+1}^{n} ρ_{τ}\{y_{j} - (x̃_{j(ℓ)}^{T}, z_{j}^{T}) β\},$$

and define β̂_*(ℓ) = argmin_β S_n(ℓ)(β) as the estimated coefficient using the ℓth assembled complete data. We repeat this imputation-estimation step m times, and the multiple imputation estimator is $β̃_{τ} = m^{-1} \sum_{ℓ=1}^{m} β̂_{*(ℓ)}$.
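The building blocks of Steps 1 to 3 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the inverse-distribution draw in Step 2c is approximated on a discrete grid of candidate x values, and the density sum runs only over the intervals between consecutive fitted quantiles. Exact standard normal quantiles stand in for the fitted values (x^T, z^T)β̂_{τ_k}.

```python
import bisect
import random
from statistics import NormalDist


def check_loss(r, tau):
    """Step 1: asymmetric L1 loss rho_tau(r) = r * (tau - I(r < 0))."""
    return r * (tau - (1.0 if r < 0 else 0.0))


def density_from_quantiles(y, taus, fitted):
    """Step 2a: piecewise-constant density implied by fitted quantiles.

    fitted[k] stands for (x^T, z^T) beta_hat(tau_k) at one fixed (x, z) and
    must be strictly increasing; the density is set to 0 outside
    [fitted[0], fitted[-1]).
    """
    k = bisect.bisect_right(fitted, y) - 1
    if k < 0 or k >= len(fitted) - 1:
        return 0.0
    return (taus[k + 1] - taus[k]) / (fitted[k + 1] - fitted[k])


def impute_draws(grid, f_y_given_x, f_x_given_z, m, rng):
    """Step 2c: m draws from a density proportional to
    f_y_given_x(x) * f_x_given_z(x), discretized on the points in `grid`."""
    cum, acc = [], 0.0
    for x in grid:
        acc += f_y_given_x(x) * f_x_given_z(x)
        cum.append(acc)
    if acc <= 0.0:
        raise ValueError("estimated density vanishes on the whole grid")
    cdf = [c / acc for c in cum]
    draws = []
    for _ in range(m):
        u = rng.random()  # Un(0, 1) draw fed to the discretized inverse CDF
        idx = min(bisect.bisect_right(cdf, u), len(grid) - 1)
        draws.append(grid[idx])
    return draws


def average_estimates(estimates):
    """Step 3: average the m coefficient vectors, one per imputed dataset."""
    m = len(estimates)
    return [sum(col) / m for col in zip(*estimates)]


# Illustration only (hypothetical inputs): exact N(0, 1) quantiles play the
# role of the fitted conditional quantiles at tau_k = k / (K + 1).
K = 99
taus = [k / (K + 1) for k in range(1, K + 1)]
fitted = [NormalDist().inv_cdf(t) for t in taus]
```

With these inputs, `density_from_quantiles(0.0, taus, fitted)` is close to the standard normal density at zero, which illustrates why a fine quantile grid recovers f (y | x, z). In practice the quantile regression fits themselves would come from a dedicated solver.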

2·2. Large-sample properties of the multiple imputation estimator

In this section, we establish the consistency and asymptotic normality of the multiple imputation estimator β̃_τ. Let δ = 0 when x is missing and δ = 1 otherwise. We first reiterate the assumption on the missingness mechanism.

Assumption 1. For all z, pr(δ =1 | x, y, z) = pr(δ =1 | z) > 0.

Assumption 1 ensures that, conditioning on z, the event that x is missing is independent of x and the response y. We then introduce two identifiability conditions.

Assumption 2. There exists a β_0,τ ∈ ℝ^p such that β_0,τ uniquely minimizes the objective function S₀(β) = E_(y,x,z)[ρ_τ {y − (x^T, z^T)β}].

Define S̃₀(β) = E_{(y, x̃,z)}[ρ_τ {y − (x̃^T, z^T)β}], where, given (y, z), x̃ follows the conditional distribution f̂ (x | y, z). Since f̂ is estimated from completely observed data, this expectation is also conditional on the n₁ completely observed data. We then make the following assumptions.

Assumption 3. There exists a compact set Ω ⊂ ℝ^p and $β_{τ}^{*} \in Ω$ such that $β_{τ}^{*} = \arg\min_{β} S̃_{0}(β)$.

Assumption 4. The covariate x has bounded support 𝒳. The true conditional density f (x | z) = f (x | z, η =η₀), where f (x | z, η) is a continuous function of η uniformly for (x, z) in a neighbourhood of η₀ and is bounded away from zero and infinity for all (x, z).

Recall that for any x and z, (x^T, z^T)β₀(τ) defines the conditional quantile function of y given x and z. We further define a functional $h(τ; x, z) = 1/\{(x^{T}, z^{T}) β_{0}'(τ)\}$, which is the density of y given x and z at the τth quantile. We call this the conditional quantile density function. Its reciprocal is known as the sparsity function (Welsh, 1988; Koenker & Xiao, 2004). With these definitions, we now introduce the smoothness conditions on β₀(τ).
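The identification of h(τ; x, z) with the conditional density can be checked in one line. The derivation below is a sketch that assumes the conditional distribution function F(· | x, z) is differentiable and that model (1) holds at every τ.

```latex
F\{(x^{T}, z^{T})\beta_{0}(\tau) \mid x, z\} = \tau
\;\Longrightarrow\;
f\{(x^{T}, z^{T})\beta_{0}(\tau) \mid x, z\}\,(x^{T}, z^{T})\beta_{0}'(\tau) = 1
\;\Longrightarrow\;
f\{(x^{T}, z^{T})\beta_{0}(\tau) \mid x, z\} = \frac{1}{(x^{T}, z^{T})\beta_{0}'(\tau)} = h(\tau; x, z).
```

The middle step differentiates both sides with respect to τ.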

Assumption 5. The true coefficient functions β₀(τ) are smooth functions on (0, 1), and for any x ∈ 𝒳 and z,

(i) 0 < h(τ; x, z) < ∞, and lim_{τ→0} h(τ; x, z) = lim_{τ→1} h(τ; x, z) = 0;
(ii) there exist constants M and ν₁, ν₂ > −1 such that the first derivative of h(·) satisfies
$$\sup_{x} |h'(τ; x, z)| < M τ^{ν_{1}} (1 - τ)^{ν_{2}}. \qquad (2)$$

Assumption 5 is similar to Assumption 3 in Wei & Carroll (2009). Assumption 5(i) implies that the conditional density f (y | x, z) is continuous, positive and finite, and diminishes to zero as τ converges to 0 or 1, while Assumption 5(ii) concerns the tail behaviour of f (y | x, z), since h′ (τ; x, z) determines how smoothly the density function diminishes as the quantile level converges to 0 or 1. Smaller ν₁ and ν₂ indicate heavier tails of the conditional distribution of y given x and z. Assumption 5(ii) covers a wide range of distributions, such as the exponential, Gaussian and Student t-distributions. Assumption 5, together with Assumptions 2 and 4, ensures the uniform convergence of β̂(τ) over the intervals [1/(K_n + 1), K_n/(K_n + 1)], which in turn ensures consistent estimation of f (y | x, z).

Assumption 6. The matrix $ψ_{τ} = (\partial/\partial β_{0,τ}) E[φ_{τ}\{y_{i} - (x_{i}^{T}, z_{i}^{T}) β_{0,τ}\} (x_{i}^{T}, z_{i}^{T})^{T}]$ is positive definite, where φ_τ (r) = τ − I {r < 0}.

In addition, we also make the definitions

$$V_{1} = \mathrm{var}[φ_{τ}\{y_{i} - (x_{i}^{T}, z_{i}^{T}) β_{0,τ}\} (x_{i}^{T}, z_{i}^{T})^{T}],$$
$$V_{0} = \lim_{n \to \infty} \mathrm{var}[φ_{τ}\{y_{j} - (x̃_{j(ℓ)}^{T}, z_{j}^{T}) β_{0,τ}\} (x̃_{j(ℓ)}^{T}, z_{j}^{T})^{T}],$$
$$U_{0} = \lim_{n \to \infty} \mathrm{cov}[φ_{τ}\{y_{j} - (x̃_{j(ℓ)}^{T}, z_{j}^{T}) β_{0,τ}\} (x̃_{j(ℓ)}^{T}, z_{j}^{T})^{T}, φ_{τ}\{y_{j} - (x̃_{j(ℓ')}^{T}, z_{j}^{T}) β_{0,τ}\} (x̃_{j(ℓ')}^{T}, z_{j}^{T})^{T}].$$

With these assumptions and notation, we now present the asymptotic behaviour of β̃_τ. Recall that 0 < lim_n→∞ n₀/n₁ = λ < ∞.

Theorem 1. Under Assumptions 1–6, for K_n → ∞ and K_n n⁻¹ → 0, the multiple imputation estimator satisfies $n^{1/2}(β̃_{τ} - β_{0,τ}) \to N(0, ψ_{τ}^{-1} Σ ψ_{τ}^{-1})$ in distribution, where Σ = (λ + 1)⁻¹ V₁ + (1 + 1/λ)⁻¹[m⁻¹V₀ + {(m − 1)/m}U₀].

The proof of Theorem 1 is provided in Appendix A1, while estimates of ψ_τ and Σ are provided in Appendix A2.

Remark 1. Throughout, we use the phrase complete-data analysis to mean an analysis based only on the completely observed data. The asymptotic variance of the estimator using the completely observed data only is $n_{1}^{- 1} ψ_{τ}^{- 1} V_{1} ψ_{τ}^{- 1}$ . Comparing with the estimation variance $n^{- 1} ψ_{τ}^{- 1} Σ ψ_{τ}^{- 1}$ of the imputed estimator, we see two sources of difference. First, the multiple imputation estimator has an effective sample size n, larger than that for the complete-data analysis, which helps to improve its efficiency. Second, the multiple imputation estimator has additional sources of variability, including the sampling variability from multiple imputation, the inherited variability from using the complete-data estimated parameters and their correlations. Hence, the multiple imputation estimators might be less efficient than the complete-data estimator. Such phenomena are common for multiple imputation estimators; see Tsiatis (2006, Ch. 14). In practice, one could assess the variabilities of both estimators to decide which to use; see Appendix A2.
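To make the comparison in Remark 1 concrete, the following sketch evaluates Σ from Theorem 1 and the resulting variance ratio when all quantities are scalars. The function names and any numerical inputs are hypothetical and serve only as an illustration; the factor ψ_τ⁻² is common to both variances and cancels, and n is replaced by (1 + λ)n₁.

```python
def sigma_scalar(v1, v0, u0, lam, m):
    """Sigma = V1 / (lam + 1) + [V0 / m + (m - 1) / m * U0] / (1 + 1 / lam)."""
    return v1 / (lam + 1.0) + (v0 / m + (m - 1.0) / m * u0) / (1.0 + 1.0 / lam)


def variance_ratio(v1, v0, u0, lam, m):
    """Asymptotic variance of the multiple imputation estimator, Sigma / n,
    divided by that of the complete-data estimator, V1 / n1, with
    n = (1 + lam) * n1. Values below 1 favour multiple imputation."""
    return sigma_scalar(v1, v0, u0, lam, m) / ((1.0 + lam) * v1)
```

A ratio above 1 corresponds to the situation noted in the remark, where the extra variability from imputation outweighs the gain in effective sample size.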

3. Shrinkage estimation

The estimator β̂_τ using the complete data only is consistent, but has a potential loss of efficiency. The multiple imputation estimator β̃_τ is generally more efficient, as will be demonstrated via simulations in § 4. However, imputation may cause bias when the parametric likelihood for x given z is misspecified. There are many ways to balance the two estimators, including test-pretest estimation after testing for the parametric model, but a simple and general strategy that we adopt is a shrinkage estimator, as follows. Let θ̂_τ = β̂_τ − β̃_τ be the componentwise difference between the complete-data and multiple imputation estimators, with elements (θ̂_1,τ, …, θ̂_p,τ)^T. Let V be the covariance matrix of θ̂_τ with diagonal elements (υ₁₁, …, υ_pp). Then Chen et al. (2009) suggest the estimator

$$β̂_{τ}^{(s)} = β̂_{τ} + K(β̃_{τ} - β̂_{τ}), \qquad (3)$$

where K is a diagonal matrix with jth diagonal element $υ_{jj}/(υ_{jj} + θ̂_{j,τ}^{2})$. Recall that the asymptotic variances υ_jj (j = 1, …, p) are quantities of order n⁻¹. The idea behind this method is that if there is no bias, then $θ̂_{j,τ}^{2} = O_{p}(n^{-1})$ and the shrinkage factor K is between 0 and I, so that the multiple imputation estimator and the complete-data estimator both receive weight, although emphasis is on the former. Conversely, if there is a bias, then $θ̂_{j,τ}^{2} = O_{p}(1)$, and the elements of K → 0, so that the complete-data estimator asymptotically has weight 1.
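The componentwise form of (3) is short enough to write out directly. The sketch below assumes the diagonal variances υ_jj are already available (their estimation is described in Appendix A2 and is not reproduced here); the function name is hypothetical.

```python
def shrinkage_estimate(beta_hat, beta_tilde, v_diag):
    """Return beta_hat + K (beta_tilde - beta_hat) with diagonal K,
    K_jj = v_jj / (v_jj + theta_j^2) and theta_j = beta_hat_j - beta_tilde_j."""
    out = []
    for b_hat, b_tilde, v in zip(beta_hat, beta_tilde, v_diag):
        theta_sq = (b_hat - b_tilde) ** 2
        denom = v + theta_sq
        # If both v and theta are zero the two estimators coincide,
        # so the weight is irrelevant; use 1 by convention.
        k = v / denom if denom > 0.0 else 1.0
        out.append(b_hat + k * (b_tilde - b_hat))
    return out
```

When υ_jj is large relative to θ̂²_j,τ the weight K_jj is near 1 and the result is close to the multiple imputation estimate; when θ̂²_j,τ dominates, the result falls back to the complete-data estimate.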

Details of implementing the shrinkage estimator are given in Appendix A2. In Appendix A1, we show that the complete-data estimator and the multiple imputation estimator have linear expansions, based on which we outline in Appendix A2 estimation of the joint covariance matrix of (β̂_τ, β̃_τ). The results enable us to estimate V easily and also mean that the formulae in Chen et al. (2009) are applicable, so that we can construct an estimator of $\mathrm{cov}(β̂_{τ}^{(s)})$. The general theory for such shrinkage estimators is given by Chen et al. (2009), although constructing the estimate of Σ is nontrivial because of our context.

4. Simulations

Here we investigate the performance of our multiple imputation estimator β̃_τ and shrinkage estimator ${β̂}_{τ}^{(s)}$ based on Monte-Carlo simulations. We first consider two models.

$$y_{i} = 1 + x_{i} + z_{i} + e_{i2}, \qquad (4)$$

$$y_{i} = 1 + x_{i} + z_{i} + (0·5 x_{i} + 0·5 z_{i}) e_{i1}, \qquad (5)$$

where the errors e_i1 and e_i2 are independent and standard normal, and the covariates (x_i, z_i) are jointly normal with mean vector (4, 4)^T, variances (1, 1)^T and correlation 0·5. In model (4), the true intercept at the τ th quantile is 1+Q_τ (z), where z is a random variable with a standard normal distribution, and both coefficients associated with x_i and z_i equal 1 at every quantile. In model (5), the true intercept equals 1 at every quantile level, but the two slope coefficients vary across the quantiles, both equal to 1 + 0·5Q_τ (z) at quantile level τ. In both models, we further assume that x_i is missing with probability pr(x_i is missing | z_i) = max[0, {(z_i − 3)/10}^1/20], which results in approximately 25% missing x_is. We then apply the multiple imputation estimation and shrinkage estimation procedures to the simulated data from the two models above. In both settings, the density f (x | z) is estimated by maximum likelihood estimation correctly assuming a joint normal distribution. When the covariates x and z are negative, there is an identifiability issue in model (5) since the distribution of e_i1 is symmetric around 0. To avoid this trivial situation, we only kept the pairs (x, z) satisfying x + z > 0 in model (5). Because the probability of x + z < 0 is very small, the resulting true joint probability density function of (x; z) is very close to the joint normal distribution which we used in the imputation procedure. We choose m =10 in the multiple imputation estimation algorithm. The sample size was n = n₀ + n₁ = 200. The shrinkage factor is estimated following Appendix A2.
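The true coefficients stated above follow directly from the standard normal quantile function, and can be reproduced as a quick check. The function name below is hypothetical; the formulas are those given in the text, with Q_τ the standard normal quantile and 0·5x + 0·5z > 0 assumed in model (5).

```python
from statistics import NormalDist


def true_coefficients(tau):
    """True quantile coefficients of models (4) and (5) at level tau."""
    q = NormalDist().inv_cdf(tau)  # standard normal quantile Q_tau
    model4 = {"intercept": 1.0 + q, "x": 1.0, "z": 1.0}
    model5 = {"intercept": 1.0, "x": 1.0 + 0.5 * q, "z": 1.0 + 0.5 * q}
    return model4, model5
```

Rounded to two decimals, the values at τ = 0·1, 0·5 and 0·9 agree with the True columns of Table 1.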

Table 1 displays the means and the standard errors of the estimated quantile coefficients in models (4) and (5) from 500 simulations at τ = 0·1, 0·5 and 0·9, using the three estimation approaches. The upper half of Table 1 displays the coefficients from model (4), while the bottom half shows those from model (5). All three methods are nearly unbiased. However, as expected from the theory, the variances of the multiple imputation estimators are smaller than those of the complete-data estimators, especially for the coefficient associated with z_i. This efficiency improvement is more evident for the heteroscedastic model (5). For example, for estimating the z_i slope at the 0·9th quantile, the relative efficiency of multiple imputation estimation compared with using the complete data only, i.e., the ratio of their variances, is 217%, and that of shrinkage estimation is 149%. To investigate the performance of our methods in various model settings, we also allowed higher missing proportions, and weaker or stronger correlation between the covariates x and z. The resulting estimated coefficients and their standard errors are included in the Supplementary Material. On the basis of those tables, the proposed estimators performed well across various model specifications.

Table 1.

Means and standard errors of the estimated coefficients at quantile levels 0·1, 0·5 and 0·9 from 500 simulations in models (4) and (5)

		τ = 0·1			τ = 0·5			τ = 0·9
Coefficient	Estimator	True	Mean	SE	True	Mean	SE	True	Mean	SE
Model (4)
Intercept	β̂	−0·28	−0·27	0·29	1·00	1·00	0·21	2·28	2·28	0·28
	β̃	−0·28	−0·25	0·23	1·00	1·03	0·17	2·28	2·29	0·24
	β̂^(s)	−0·28	−0·27	0·25	1·00	1·01	0·19	2·28	2·29	0·25
x	β̂	1·00	0·99	0·16	1·00	1·00	0·12	1·00	1·00	0·16
	β̃	1·00	0·94	0·16	1·00	0·98	0·11	1·00	0·96	0·16
	β̂^(s)	1·00	0·98	0·16	1·00	0·99	0·12	1·00	0·99	0·16
z	β̂	1·00	1·01	0·20	1·00	1·00	0·14	1·00	1·00	0·19
	β̃	1·00	1·03	0·18	1·00	1·01	0·12	1·00	1·03	0·16
	β̂^(s)	1·00	1·02	0·19	1·00	1·01	0·13	1·00	1·01	0·17
Model (5)
Intercept	β̂	1·00	0·69	3·88	1·00	0·61	2·68	1·00	1·00	3·16
	β̃	1·00	0·93	2·04	1·00	0·84	1·59	1·00	1·56	2·11
	β̂^(s)	1·00	0·58	3·22	1·00	0·69	2·24	1·00	1·37	2·64
x	β̂	0·36	0·43	0·62	1·00	1·01	0·51	1·64	1·68	0·62
	β̃	0·36	0·39	0·52	1·00	0·92	0·46	1·64	1·45	0·53
	β̂^(s)	0·36	0·42	0·58	1·00	0·98	0·49	1·64	1·62	0·60
z	β̂	0·36	0·39	0·92	1·00	1·08	0·64	1·64	1·59	0·77
	β̃	0·36	0·36	0·52	1·00	1·11	0·41	1·64	1·68	0·52
	β̂^(s)	0·36	0·41	0·78	1·00	1·09	0·54	1·64	1·60	0·64

β̂, the estimated coefficient using the completely observed data only; β̃, the multiple imputation estimator with 10 imputations; β̂^(s), the shrinkage estimator; True, the true coefficients; SE, standard errors.

The results in Table 1 are obtained when f (x | z) is estimated from the correct model. To investigate the potential bias that could be induced by a misspecified f (x | z), we simulate covariates (x_i, z_i) as x_i = (0·18u_i,1 + 0·68u_i,2) + 3·14, and z_i = (0·68u_i,1 + 0·18u_i,2) + 3·14, where u_i,1 and u_i,2 are two independent $χ_{1}^{2}$ random variables. We choose the constants 0·18, 0·68 and 3·14 such that (x_i, z_i) have mean 4, variance approximately 1 and correlation approximately 0·5, as in the earlier simulation. After simulating the nonnormally distributed covariates, we then generate the responses from model (2). For each generated sample, we allow x_i to be missing completely at random with probability 0·25. We apply the same estimation procedures as above, pretending that (x_i, z_i) is jointly normal. Table 2 presents the mean squared errors and standard errors for the resulting estimated coefficients at τ = 0·1, 0·5 and 0·9. As a comparison, we also re-estimate the coefficients using the imputation method, but use the exact density f (x | z) in the algorithm. On the basis of Table 2, the mean squared errors from the multiple imputation estimators with the exact f (x | z) are the smallest. As expected, when f (x | z) is misspecified, the mean squared errors are inflated, and the shrinkage estimates have smaller mean squared errors due to the bias correction. Since the complete-data approach only uses part of the data for estimation, its mean squared errors are even larger than those of the multiple imputation estimator with misspecified f (x | z). Finally, the differences between the multiple imputation estimators using exact and misspecified densities are small relative to their standard errors, indicating that the multiple imputation estimator is also fairly robust against the misspecification of f (x | z).
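The choice of the constants 0·18, 0·68 and 3·14 can be verified from the moments of a $χ_{1}^{2}$ variable, which has mean 1 and variance 2. The helper below is a hypothetical illustration of that arithmetic, not part of the original simulation code.

```python
def linear_chisq_moments(a, b, shift):
    """Moments of x = a*u1 + b*u2 + shift and z = b*u1 + a*u2 + shift,
    where u1 and u2 are independent chi-squared(1) variables."""
    mean = a + b + shift              # E(u) = 1 for each component
    var = 2.0 * (a * a + b * b)       # var(u) = 2 for each component
    cov = 2.0 * (2.0 * a * b)         # cov(x, z) = a*b*var(u1) + a*b*var(u2)
    return mean, var, cov / var       # x and z share the same variance


mean, var, corr = linear_chisq_moments(0.18, 0.68, 3.14)
```

This gives a mean of 4, a variance of about 0·99 and a correlation of about 0·49, matching the targets stated in the text.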

Table 2.

Mean squared errors of the estimated coefficients at quantile levels 0·1, 0·5 and 0·9 from 500 simulations in model (2) when f (x | z) is misspecified

	τ =0·1		τ =0·5		τ =0·9
	MSE	SE	MSE	SE	MSE	SE
β̂	1·58	0·09	0·76	0·05	1·33	0·08
β̃	1·49	0·08	0·70	0·04	1·14	0·06
β̂^(s)	1·36	0·08	0·72	0·04	1·07	0·07
β̃*	1·31	0·07	0·68	0·04	1·02	0·06

β̂, the estimated coefficient using the completely observed data only; β̃, the multiple imputation estimator with 10 imputations; β̂^(s), the shrinkage estimator; β̃*, the multiple imputation estimator using the exact f (x | z); SE, the standard error of the mean squared error; MSE, mean squared errors.

5. Application

We illustrate the performance of our methods using part of the Eating at American’s Table Study (Subar et al., 2001). The dataset consists of 1418 subjects who participated in this study from September 1997 to August 1998. They were required to complete a 24-hour recall on their dietary intakes, and they also completed a dietary history questionnaire. It is commonly thought that the 24-hour recall is an unbiased measure of dietary intake, but is expensive in cohort studies because it must be administered multiple times, and thus costs far more than the dietary history questionnaire. In measurement error modelling of diet and disease, the regression calibration method (Carroll et al., 2006) is to regress the 24-hour recall on the dietary history questionnaire. Since the distributions of nutrition intakes are commonly skewed, quantile regression is a desirable tool for this modelling.

Here we model carbohydrate intake, with y_i being the 24-hour recall for the ith person, x_i1 the dietary history questionnaire measurement, x_i2 body mass index, x_i3 the participant’s age, x_i4 an indicator of Caucasian ethnic status and x_i5 the gender. The model can be written as

$$y_{i} = β_{0,τ} + β_{1,τ} x_{i,1} + β_{2,τ} x_{i,2} + β_{3,τ} x_{i,3} + β_{4,τ} x_{i,4} + β_{5,τ} x_{i,5} + e_{i}. \qquad (6)$$

There are 453 randomly selected subjects among the 1418 who do not have measurements of body mass index and did not complete the dietary history questionnaire, because the study was a designed experiment with some participants randomly assigned to complete an alternative questionnaire. Therefore, those covariates are missing completely at random. Here we apply our multiple imputation estimation methodology to obtain the estimate of the βs, with x as the carbohydrate intake in the dietary history questionnaire and body mass index, and z as gender, ethnicity and age.

In these data, we found that the carbohydrate intake measured in the dietary history questionnaire and body mass index are essentially uncorrelated, with partial correlation 0·0084 conditional on the subject’s age and gender. We can thus estimate the conditional density of carbohydrate intakes in the dietary history questionnaire and body mass index separately based on the two Box–Cox transformation models

$$Λ(x_{i1}, λ_{1}) = γ_{10} + γ_{11} x_{i3} + γ_{12} x_{i4} + γ_{13} x_{i5} + e_{i1}, \quad e_{i1} \sim N(0, σ_{1}^{2}),$$
$$Λ(x_{i2}, λ_{2}) = γ_{20} + γ_{21} x_{i3} + γ_{22} x_{i4} + γ_{23} x_{i5} + e_{i2}, \quad e_{i2} \sim N(0, σ_{2}^{2}).$$

Here Λ(u, λ) is the Box–Cox transformation function, i.e., Λ(u, λ) = log(u) if λ = 0, and Λ(u, λ) = (u^λ − 1)/λ for λ ≠ 0. We used maximum likelihood estimates of the transformation parameters, these being close to 0 and −1, respectively, which suggests that logarithm and reciprocal transformations are needed for carbohydrate intake in the dietary history questionnaire and body mass index, respectively. In the Supplementary Material, we present the quantile-quantile plot of the residuals from the above two models with their respective best fitted powers, which shows that the transformed variables are approximately normally distributed.

On the basis of the estimated models, the conditional density of the untransformed carbohydrate intake in the dietary history questionnaire is f̂_c(υ) = (υσ̂₁)⁻¹ φ[{log(υ)− γ̂₁₀ − γ̂₁₁x₃ − γ̂₁₂x₄ − γ̂₁₃x₅}/σ̂₁], where φ is the density function of standard normal. The conditional density of body mass index is f̂_b(υ) = (υ²σ̂₂)⁻¹φ[{1/υ − γ̂₂₀ − γ̂₂₁x₃ − γ̂₂₂x₄ − γ̂₂₃x₅}/σ̂₂].
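As a sanity check on the two change-of-variable densities above, the sketch below codes the Box–Cox transformation and both densities, and integrates each numerically. All names are hypothetical, and the location and scale values used in the check (5·5 and 0·4 on the log scale, 0·04 and 0·005 on the reciprocal scale) are made-up illustrations, not estimates from the study; `mu` stands for the fitted linear predictor γ̂₀ + γ̂₁x₃ + γ̂₂x₄ + γ̂₃x₅.

```python
import math


def box_cox(u, lam):
    """Box-Cox transformation: log(u) if lam == 0, else (u**lam - 1) / lam."""
    return math.log(u) if lam == 0 else (u ** lam - 1.0) / lam


def std_normal_pdf(t):
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def density_log_scale(v, mu, sigma):
    """Density of v when log(v) ~ N(mu, sigma^2); Jacobian 1 / v."""
    return std_normal_pdf((math.log(v) - mu) / sigma) / (v * sigma)


def density_reciprocal_scale(v, mu, sigma):
    """Density of v > 0 when 1 / v ~ N(mu, sigma^2); Jacobian 1 / v^2."""
    return std_normal_pdf((1.0 / v - mu) / sigma) / (v * v * sigma)


def trapezoid(f, lo, hi, n):
    """Composite trapezoidal rule with n equal subintervals."""
    h = (hi - lo) / n
    inner = sum(f(lo + i * h) for i in range(1, n))
    return h * (0.5 * f(lo) + inner + 0.5 * f(hi))


# Hypothetical parameter values, for illustration only.
total_c = trapezoid(lambda v: density_log_scale(v, 5.5, 0.4), 1.0, 5000.0, 20000)
total_b = trapezoid(lambda v: density_reciprocal_scale(v, 0.04, 0.005), 5.0, 200.0, 20000)
```

Both integrals should be close to 1, which confirms that the factors 1/υ and 1/υ² are the correct Jacobians for the logarithm and reciprocal transformations.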

Following our multiple imputation algorithm, we estimated model (6) at 50 evenly spaced quantile levels using the completely observed data only in the first step. On the basis of the resulting quantile coefficient process, and the estimated conditional densities f (x | z) using the models above, we imputed the missing carbohydrate intakes and body mass index m = 10 times. In Table 3, we listed the multiple imputation estimators at τ = 0·1, 0·5 and 0·9, as well as their standard errors. To illustrate the improved efficiency from multiple imputation, we calculated the relative efficiency. In addition, we also constructed the shrinkage estimator following (3). The shrinkage factors are estimated following Appendix A2.

Table 3.

Estimated coefficients in the Eating at American’s Table Study

		Raw		Multiple imputation			Shrinkage
Covariates	τ	β̂	$\hat{se}$	β̃	$\hat{se}$	$\hat{re}$ (%)	β̂^(s)	$\hat{se}$	$\hat{re}$ (%)
Carbohydrate intake	0·1	0·08	0·06	0·04	0·06	104			
	0·5	0·27	0·04	0·24	0·03	109	0·27	0·03	102
	0·9	0·60	0·07	0·48	0·07	112	0·59	0·07	101
Body mass index	0·1	−0·94	0·88	−0·84	0·91		−0·85	0·90	
	0·5	−1·68	0·54	−1·63	0·54		−1·63	0·54	100
	0·9	−0·70	1·20	−0·35	1·21		−0·51	1·19	101
Age	0·1	−0·53	0·36	−0·39	0·35	108	−0·42	0·33	117
	0·5	−0·86	0·28	−1·00	0·24	136	−0·95	0·25	132
	0·9	−1·38	0·62	−1·71	0·51	147	−1·54	0·55	126
Caucasian	0·1	5·95	14·43	14·95	12·57	132	11·61	12·02	144
	0·5	4·67	10·87	6·22	8·38	168	6·16	8·37	169
	0·9	−38·45	41·02	−1·39	25·57	257	−27·91	35·76	132
Gender	0·1	−47·34	11·03	−38·34	10·25	116	−43·20	10·11	119
	0·5	−73·48	8·27	−66·90	7·05	137	−70·77	7·57	119
	0·9	−108·07	15·58	−114·92	12·96	145	−113·41	13·59	131

$\hat{se}$, standard errors following the estimation method described in Appendix A2; $\hat{re}$, relative efficiency, defined as the ratio of the estimated variance of the complete-data estimator to that of the multiple imputation or shrinkage estimator.
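The relative efficiency defined in this note is a ratio of variances, so it can be recovered approximately from the tabulated standard errors. The helper below is a hypothetical illustration; because the standard errors in Table 3 are rounded to two decimals, recomputed values may differ from the printed ones by a point or two.

```python
def relative_efficiency(se_complete, se_other):
    """Relative efficiency in percent: 100 * (se_complete / se_other) ** 2."""
    return 100.0 * (se_complete / se_other) ** 2
```

For instance, the Age coefficient at τ = 0·5 has standard errors 0·28 (complete data) and 0·24 (multiple imputation), and `relative_efficiency(0.28, 0.24)` is about 136, in line with the table.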

Table 3 shows that the multiple imputation estimators are fairly consistent with those using the complete data only, but have much smaller standard errors for the estimates associated with age, ethnicity and gender. Those variables are completely observed when the dietary history questionnaire carbohydrate intakes and body mass index are missing. The multiple imputation estimators make full use of those observations, which improves their efficiency. The shrinkage estimator is generally consistent with the complete-data and multiple imputation estimators; while its standard errors are slightly larger than the multiple imputation estimators, they are still much smaller than those of the complete-data estimators.

6. Discussion

The validity of our multiple imputation method relies on a correct specification of the conditional density f (x | z), which we model parametrically. To further protect against the possible misspecification of f (x | z), a shrinkage estimator was proposed. One could also opt to estimate f (x | z) nonparametrically, which will automatically yield a consistent estimator without an additional shrinkage step. However, nonparametric conditional density estimation is very complex, especially when z is multivariate, and the slow rates of convergence would undermine the usefulness of such an approach.

The missing covariate problem in the quantile regression context is challenging, because the conditional density of y given the covariates is unspecified under a typical quantile regression setting. Consequently, classical likelihood-based approaches cannot be applied directly. Here, we adopted a joint modelling approach similar to Wei & Carroll (2009) to circumvent this difficulty. However, the proposed method is different from Wei & Carroll (2009) in many aspects. First, the objectives are different. This paper handles missing covariates, while Wei & Carroll (2009) handle mismeasured covariates. Second, the estimation approaches are different. Wei & Carroll (2009) is based on constructing unbiased estimating equations; while this paper uses a multiple imputation approach. Consequently, the estimation algorithms are different; the former involves iterative estimation, while the estimation procedure in this paper does not. Finally, the asymptotic properties are obtained in a very different fashion.

We assumed the conditional quantile functions to be linear at all quantile levels. This assumption holds for location-scale models, i.e., Y = X^Tβ + X^Tγe, where e is a random error with Q_τ (e | X) = 0. If needed, one can easily relax the linear quantile function to an arbitrary nonlinear or even nonparametric function. The algorithm remains largely unchanged, with the minimal adaptation of replacing the linear function by the new regression function in the check function ρ_τ. Although the method is presented for an independent sample, it can also be extended to longitudinal data using the so-called working independence construction. For a longitudinal sample (y_{i, j}, x_{i, j}, z_{i, j}), if the quantiles of y_{i, j} are linear in (x_{i, j}, z_{i, j}), then we can estimate the quantile coefficients using a similar algorithm with the longitudinal quantile regression objective function $\sum_{i} \sum_{j} ρ_{τ}(y_{i,j} - x_{i,j}^{T} β - z_{i,j}^{T} γ)$. The estimation of the conditional density f (x | z) also needs to be adapted for the longitudinal data. The resulting estimators would still be consistent, but the limiting distribution would need to be derived separately.

Supplementary Material

Supplement

NIHMS568523-supplement-Supplement.pdf (226.5KB, pdf)

Acknowledgement

Wei's research was supported by the National Science Foundation (DMS-0906568) and a career award from NIEHS Center for Environmental Health in Northern Manhattan (ES009089). Ma's research was supported by a grant from the National Science Foundation (DMS-0906341). Carroll's research was supported by a grant from the National Cancer Institute (CA57030).

Appendix

A1. Technical arguments

Recall that x̃_j(ℓ) is the ℓth imputed x associated with (y_j, z_j), based on the estimated density f̂(x | y_j, z_j). We define a partial objective function based on the imputed portion of the data,

{S̃}_{n_{0}}^{(ℓ)} (β) = \sum_{j = n_{1} + 1}^{n} ρ_{τ} {y_{j} - ({x̃}_{j (ℓ)}^{T}, z_{j}^{T}) β},

and define its minimizer ${β̂}_{0 (ℓ)} = {arg min}_{β} {S̃}_{n_{0}}^{(ℓ)} (β)$ .

We say that β̂_0(ℓ) is the estimated coefficient using the ℓth imputed portion of the data only. In later steps, we show that the multiple imputation estimator β̃_τ can be written as a linear combination of β̂_τ and the β̂_0(ℓ). Hence, to find the asymptotic distribution of β̃_τ, a key step is to find the asymptotic distribution of β̂_0(ℓ) as n = n₀ + n₁ → ∞ with 0 < lim_n→∞ n₀/n₁ = λ < ∞. To do that, we first show that

sup_{β \in Ω} | {S̃}_{0} (β) - S_{0} (β) | \to 0

(A1)

in probability as n₁ → ∞. Here S̃₀(β) and S₀(β) are the two expected objective functions defined before Assumptions 2 and 3.

Recall that f̂{y | x, z, β̂(τ)} is the estimated conditional density of y given x and z using the complete data only. We first decompose the difference between the estimated density f̂{y | x, z, β̂(τ)} and its true value as

sup_{x} | f̂ {y | x, z, β̂ (τ)} - f {y | x, z, β_{0} (τ)} | = sup_{x} | \sum_{k = 1}^{K_{n}} [f̂ {y | x, z, β̂ (τ)} - f {y | x, z, β_{0} (τ)}] I {(x^{T}, z^{T}) {β̂}_{τ_{k}} \leq y < (x^{T}, z^{T}) {β̂}_{τ_{k + 1}}} - f {y | x, z, β_{0} (τ)} I {y < (x^{T}, z^{T}) {β̂}_{τ_{1}}} - f {y | x, z, β_{0} (τ)} I {y > (x^{T}, z^{T}) {β̂}_{τ_{K_{n}}}} | \leq sup_{x} A_{1} + sup_{x} h (τ_{1}, x, z) + sup_{x} h (τ_{K_{n}}, x, z),

where

sup_{x} A_{1} = sup_{x} \sum_{k = 1}^{K_{n}} | f̂ {y | x, z, β̂ (τ)} - f {y | x, z, β_{0} (τ)} | I {(x^{T}, z^{T}) {β̂}_{τ_{k}} \leq y < (x^{T}, z^{T}) {β̂}_{τ_{k + 1}}} .

Following the definition of f̂{y | x, z, β̂(τ)}, and since any given value of y can be contained in only one of the subintervals [(x^T, z^T) β̂_{τ_k}, (x^T, z^T) β̂_{τ_{k+1}}), we have

sup_{x} A_{1} \leq sup_{x} max_{k} | \frac{τ_{k + 1} - τ_{k}}{(x^{T}, z^{T}) ({β̂}_{τ_{k + 1}} - {β̂}_{τ_{k}})} - f {y | x, z, β_{0} (τ)} | I {(x^{T}, z^{T}) {β̂}_{τ_{k}} \leq y < (x^{T}, z^{T}) {β̂}_{τ_{k + 1}}} .
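The difference-quotient form of the estimated density that appears in this bound can be illustrated as follows. This is a minimal sketch, not the authors' implementation: the function name and the toy grid are hypothetical, the fitted quantiles (x^T, z^T) β̂_{τ_k} are assumed to be strictly increasing in k, and the density is set to zero outside the fitted range, which is how the tail terms of the decomposition above are read here.

```python
from bisect import bisect_right


def piecewise_density(y, taus, fitted_quantiles):
    """Difference-quotient density estimate on a quantile grid.

    taus: increasing quantile levels tau_1 < ... < tau_K.
    fitted_quantiles: (x^T, z^T) beta_hat_{tau_k} for each k, assumed
        strictly increasing (no quantile crossing).
    Returns (tau_{k+1} - tau_k) / (q_{k+1} - q_k) when q_k <= y < q_{k+1},
    and 0.0 when y lies outside [q_1, q_K).
    """
    if y < fitted_quantiles[0] or y >= fitted_quantiles[-1]:
        return 0.0
    # Largest index k with fitted_quantiles[k] <= y.
    k = bisect_right(fitted_quantiles, y) - 1
    return (taus[k + 1] - taus[k]) / (fitted_quantiles[k + 1] - fitted_quantiles[k])


# Hypothetical grid of three quantile levels and their fitted values.
taus = [0.1, 0.5, 0.9]
quantiles = [0.0, 2.0, 3.0]
density_inside = piecewise_density(1.0, taus, quantiles)
density_outside = piecewise_density(-1.0, taus, quantiles)
```

Because each y falls in at most one subinterval, the estimate is a step function in y, which is why the supremum in the bound reduces to a maximum over k.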

Following the uniform convergence of β̂_τ, readily available from the result in Wei & Carroll (2009) by considering in their context a special case where the measurement error variance is zero, the convergence $(x^{T}, z^{T}) ({β̂}_{τ_{k}} - β_{0, τ_{k}}) = o_{p} (K_{n}^{1 / 2} n_{1}^{- 1 / 2})$ holds uniformly for any k. Consequently, we can rewrite the upper bound as

sup_{x} A_{1} \leq sup_{x} max_{k} | \frac{τ_{k + 1} - τ_{k}}{(x^{T}, z^{T}) (β_{0, τ_{k + 1}} - β_{0, τ_{k}}) + o_{p} (K_{n}^{1 / 2} n_{1}^{- 1 / 2})} - f {y | x, z, β_{0} (τ)} | \times I {(x^{T}, z^{T}) {β̂}_{τ_{k}} \leq y < (x^{T}, z^{T}) {β̂}_{τ_{k + 1}}} = sup_{x} max_{k} | \frac{τ_{k + 1} - τ_{k}}{(x^{T}, z^{T}) (β_{0, τ_{k + 1}} - β_{0, τ_{k}})} - f {y | x, z, β_{0} (τ)} + o_{p} (K_{n}^{1 / 2} n_{1}^{- 1 / 2}) | \times I {(x^{T}, z^{T}) {β̂}_{τ_{k}} \leq y < (x^{T}, z^{T}) {β̂}_{τ_{k + 1}}} .

By the mean value theorem, there exists a τ^* ∈ (τ_k, τ_{k+1}) such that (τ_{k+1} − τ_k)/{(x^T, z^T)(β_{0,τ_{k+1}} − β_{0,τ_k})} = h(τ^*, x, z). On the other hand, let τ_y be the quantile level of y with respect to the true quantile function (x^T, z^T)β₀(τ) for y ∈ [(x^T, z^T) β̂_{τ_k}, (x^T, z^T) β̂_{τ_{k+1}}); then f {y | x, z, β₀(τ)} = h(τ_y, x, z) by definition. Since the true quantile function (x^T, z^T)β₀(τ) is a continuous function that satisfies the Lipschitz condition, the quantile level of (x^T, z^T) β̂_{τ_{k+1}} with respect to the true quantile function is $τ_{k + 1} + o_{p} (K_{n}^{1 / 2} n_{1}^{- 1 / 2})$ . Moreover, due to the uniform convergence of β̂(τ), the quantile level of (x^T, z^T)β̂_{τ_k} is $τ_{k} + o_{p} (K_{n}^{1 / 2} n_{1}^{- 1 / 2})$ , for any k. Therefore, together with the monotonicity of the quantile function, we have $τ_{k} + o_{p} (K_{n}^{1 / 2} n_{1}^{- 1 / 2}) < τ_{y} < τ_{k + 1} + o_{p} (K_{n}^{1 / 2} n_{1}^{- 1 / 2})$ . Following these arguments, we have

sup_{x} A_{1} = sup_{x} max_{k} | h (τ^{*}, x, z) - h (τ_{y}, x, z) + o_{p} (K_{n}^{1 / 2} n_{1}^{- 1 / 2}) | I {τ_{k} + o_{p} (K_{n}^{1 / 2} n_{1}^{- 1 / 2}) \leq τ^{*}, τ_{y} \leq τ_{k + 1} + o_{p} (K_{n}^{1 / 2} n_{1}^{- 1 / 2})} \leq sup_{x} max_{k} [| h' (τ_{k}, x, z) | {O (K_{n}^{- 1}) + o_{p} (K_{n}^{1 / 2} n_{1}^{- 1 / 2})} + o_{p} (K_{n}^{1 / 2} n_{1}^{- 1 / 2})] = O (K_{n}^{- ν_{1} ∧ ν_{2} - 1}) + o_{p} (K_{n}^{1 / 2} n_{1}^{- 1 / 2}) = o_{p} (1) .

The last step follows from Assumption 5(i) and the fact that K_n/n → 0. Consequently, for any given values of y and z, as n₁ → ∞ and K_n → ∞, we have

sup_{x} | f̂ {y | x, z, β̂ (τ)} - f {y | x, z, β_{0} (τ)} | \leq O_{p} (K_{n}^{- ν_{1} ∧ ν_{2} - 1}) + o_{p} (K_{n}^{1 / 2} n_{1}^{- 1 / 2}) + sup_{x} h (τ_{1}, x, z) + sup_{x} h (τ_{K_{n}}, x, z) = o_{p} (1) .

(A2)
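The mean value step used in bounding A₁ can be spelled out as a short worked derivation. The shorthand Q(τ) = (x^T, z^T)β₀(τ) for the true conditional quantile function is introduced here only for illustration, and the derivation assumes that Q is differentiable in τ and that h(τ, x, z) is the quantile density 1/Q′(τ), which is how the definition of h is read in this sketch.

```latex
\begin{align*}
Q(\tau_{k+1}) - Q(\tau_k)
  &= Q'(\tau^{*})\,(\tau_{k+1} - \tau_k),
  \qquad \tau^{*} \in (\tau_k, \tau_{k+1}), \\
\frac{\tau_{k+1} - \tau_k}{Q(\tau_{k+1}) - Q(\tau_k)}
  &= \frac{1}{Q'(\tau^{*})} = h(\tau^{*}, x, z), \\
F\{Q(\tau) \mid x, z\} = \tau
  \;&\Longrightarrow\;
  f\{Q(\tau) \mid x, z\}\, Q'(\tau) = 1
  \;\Longrightarrow\;
  f\{y \mid x, z, \beta_0(\tau)\} = \frac{1}{Q'(\tau_y)} = h(\tau_y, x, z).
\end{align*}
```

The bound on A₁ then compares h at two quantile levels, τ^* and τ_y, that lie in the same grid cell up to the estimation error of β̂(τ), so their distance is governed by the grid width, of order 1/K_n, plus that error.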

Let D₀(y, z) = ∫_x f {y | x, z, β₀(τ)} f (x | z) dx, and D_{n₁}(y, z) = ∫_x f̂{y | x, z, β̂(τ)} f (x | z) dx. Since f (x | z) is an integrable function, the convergence (A2) also implies that

| D_{n_{1}} (y, z) - D_{0} (y, z) | \leq \int_{x} | f̂ {y | x, z, β̂ (τ)} - f {y | x, z, β_{0} (τ)} | f (x | z) d x = o_{p} (1) .

(A3)

It follows that, for any y and z,

sup_{x} | f̂ (x | y, z) - f (x | y, z) | = sup_{x} | \frac{f̂ {y | x, z, β̂ (τ)} f (x | z)}{D_{n_{1}} (y, z)} - \frac{f {y | x, z, β_{0} (τ)} f (x | z)}{D_{0} (y, z)} | = sup_{x} | \frac{f̂ {y | x, z, β̂ (τ)} f (x | z)}{D_{0} (y, z)} - \frac{f {y | x, z, β_{0} (τ)} f (x | z)}{D_{0} (y, z)} | + O_{p} {| D_{n_{1}} (y, z) - D_{0} (y, z) |} = sup_{x} | \frac{[f̂ {y | x, z, β̂ (τ)} - f {y | x, z, β_{0} (τ)}] f (x | z)}{D_{0} (y, z)} | + O_{p} {| D_{n_{1}} (y, z) - D_{0} (y, z) |} = o_{p} (1) .

(A4)

The last step is implied by (A2) and (A3), together with the facts that D₀(y, z) > 0 for any (y, z) and that the density f (x | z) is bounded under Assumption 4. Moreover, the distance between the two objective functions can be written as

sup_{β \in Ω} | {S̃}_{0} (β) - S_{0} (β) | = sup_{β \in Ω} | E_{(y, x̃, z)} [ρ_{τ} {y - ({x̃}^{T}, z^{T}) β}] - E_{(y, x, z)} [ρ_{τ} {y - (x^{T}, z^{T}) β}] | \leq \int_{(y, x, z)} sup_{β \in Ω} ρ_{τ} {y - (x^{T}, z^{T}) β} f (y, z) | f̂ (x | y, z) - f (x | y, z) | d (y, x, z) ≙ \int_{(y, x, z)} g (y, x, z) | f̂ (x | y, z) - f (x | y, z) | d (y, x, z),

where g(y, x, z) = sup_β∈Ω ρ_τ {y − (x^T, z^T)β} f (y, z). Since x has bounded support, and Ω is a compact set, under the assumptions that E(y) < ∞ and E(z) < ∞, the function g(y, x, z) is integrable, i.e.,

\int_{(y, x, z)} g (y, x, z) d (y, x, z) < \infty .

(A5)

On the other hand, due to the uniform convergence of β̂(τ), there exists a constant C₁ such that, for large enough n₁, f̂{y | x, z, β̂(τ)} ≤ h(τ_y, x, z) + C₁. By Assumption 5(i), the quantile density function h(τ, x, z) is bounded for any τ, x and z, so f̂{y | x, z, β̂(τ)} is bounded for any (y, x, z). Moreover, since f (x | z) is bounded with bounded support, D_{n₁}(y, z) is also bounded. Consequently, the estimated density f̂(x | y, z) is bounded for any (y, x, z). By the dominated convergence theorem, the convergence (A4), the integrability (A5) and the boundedness of f̂(x | y, z) together imply that sup_{β∈Ω} |S̃₀(β) − S₀(β)| = o_p(1) as n₁ → ∞ and K_n → ∞.
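The estimated conditional density f̂(x | y, z) analysed above is a Bayes-rule ratio: the estimated density of y given (x, z) times the density of x given z, divided by the normalizing term D_{n₁}(y, z). A minimal sketch of that ratio, and of drawing imputations from it, is given below. It rests on assumptions that are not in the paper: x is discretized on a finite grid, z is held fixed so that `prior_weights` stands in for f(x | z), and all function names and the toy density are hypothetical.

```python
import random


def posterior_weights(y, x_grid, prior_weights, cond_density):
    """Discrete analogue of f_hat(x | y, z) = f_hat(y | x, z) f(x | z) / D(y, z).

    cond_density(y, x) plays the role of f_hat{y | x, z, beta_hat(tau)}
    at the fixed z; prior_weights[k] plays the role of f(x_grid[k] | z).
    """
    unnormalised = [cond_density(y, x) * w for x, w in zip(x_grid, prior_weights)]
    denom = sum(unnormalised)  # discrete counterpart of D_{n1}(y, z)
    if denom <= 0.0:
        raise ValueError("y has zero estimated density at every grid point")
    return [u / denom for u in unnormalised]


def draw_imputations(y, x_grid, prior_weights, cond_density, m, seed=0):
    """Draw m imputed x values for one incomplete observation (y, z)."""
    rng = random.Random(seed)
    weights = posterior_weights(y, x_grid, prior_weights, cond_density)
    return rng.choices(x_grid, weights=weights, k=m)


def toy_density(y, x):
    """Hypothetical conditional density: y given x is uniform on [x, x + 1)."""
    return 1.0 if x <= y < x + 1.0 else 0.0


x_grid = [0.0, 0.5, 1.0]
prior = [0.2, 0.3, 0.5]
weights = posterior_weights(0.7, x_grid, prior, toy_density)
imputed = draw_imputations(0.7, x_grid, prior, toy_density, m=5)
```

In the toy example the grid point x = 1.0 is incompatible with y = 0.7, so it receives weight zero and is never drawn, even though it has the largest prior weight.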

Since S₀(β) is a continuous function, and uniquely minimized at β_0,τ, following the arguments in Amemiya (1985, pp. 106–8), the convergence (A1) suffices for $‖ β_{τ}^{*} - β_{0, τ} ‖ = o_{p} (1)$ , where $β_{τ}^{*}$ is the minimizer of S̃₀(β). Recall that ${S̃}_{n_{0}}^{(ℓ)} (β) = \sum_{j = n_{1} + 1}^{n} ρ_{τ} {y_{j} - ({x̃}_{j (ℓ)}^{T}, z_{j}^{T}) β}$ is the objective function which is minimized at β̂_0(ℓ). Of course,

n_{0}^{- 1} E {{S̃}_{n_{0}}^{(ℓ)} (β)} = n_{0}^{- 1} \sum_{j = n_{1} + 1}^{n} E_{(y_{j}, {x̃}_{j (ℓ)}, z_{j})} [ρ_{τ} {y_{j} - ({x̃}_{j (ℓ)}^{T}, z_{j}^{T}) β}] = {S̃}_{0} (β) .

Then, following standard arguments for M-estimation (van der Vaart, 1998, pp. 44–7), the estimator β̂_0(ℓ) converges to $β_{τ}^{*}$ in probability, conditioning on the completely observed data. Therefore,

‖ {β̂}_{0 (ℓ)} - β_{0, τ} ‖ \leq ‖ {β̂}_{0 (ℓ)} - β_{τ}^{*} ‖ + ‖ β_{τ}^{*} - β_{0, τ} ‖ = o_{p} (1)

(A6)

as n₀ + n₁ → ∞. Thus, we have shown the consistency of β̂_0(ℓ).

We now use a Taylor expansion to derive the asymptotic normality of β̂_0(ℓ). Define the directional derivative function of ${S̃}_{n_{0}}^{(ℓ)} (β)$ as ${S̃}_{n_{0}}^{' (ℓ)} (β) = \sum_{j = n_{1} + 1}^{n} φ_{τ} {y_{j} - ({x̃}_{j (ℓ)}^{T}, z_{j}^{T}) β} {({x̃}_{j (ℓ)}^{T}, z_{j}^{T})}^{T}$ .

Arguments similar to those used in proving He & Shao (1996, Lemma 4.6) yield the uniform convergence result

sup_{‖ β - β_{0, τ} ‖ < δ_{n} ↓ 0} n_{0}^{- 1 / 2} ‖ {S̃}_{n_{0}}^{' (ℓ)} (β) - {S̃}_{n_{0}}^{' (ℓ)} (β_{0, τ}) - E {{S̃}_{n_{0}}^{' (ℓ)} (β)} + E {{S̃}_{n_{0}}^{' (ℓ)} (β_{0, τ})} ‖ = o_{p} (1),

(A7)

for any descending sequence δ_n. Combining (A6) and (A7), we have

n_{0}^{- 1 / 2} ‖ {S̃}_{n_{0}}^{' (ℓ)} ({β̂}_{0 (ℓ)}) - {S̃}_{n_{0}}^{' (ℓ)} (β_{0, τ}) - E {S̃}_{n_{0}}^{' (ℓ)} ({β̂}_{0 (ℓ)}) + E {S̃}_{n_{0}}^{' (ℓ)} (β_{0, τ}) ‖ = o_{p} (1) .

(A8)

Since ${S̃}_{n_{0}}^{' (ℓ)} ({β̂}_{0 (ℓ)}) \approx 0$ , we Taylor expand $E {{S̃}_{n_{0}}^{' (ℓ)} ({β̂}_{0 (ℓ)})}$ in (A8) around β_0,τ, so that

0 \approx n_{0}^{- 1 / 2} \sum_{j = n_{1} + 1}^{n} φ_{τ} {y_{j} - ({x̃}_{j (ℓ)}^{T}, z_{j}^{T}) {β̂}_{0 (ℓ)}} {({x̃}_{j (ℓ)}^{T}, z_{j}^{T})}^{T} = n_{0}^{- 1 / 2} \sum_{j = n_{1} + 1}^{n} φ_{τ} {y_{j} - ({x̃}_{j (ℓ)}^{T}, z_{j}^{T}) β_{0, τ}} {({x̃}_{j (ℓ)}^{T}, z_{j}^{T})}^{T} + n_{0}^{- 1} \frac{\partial \sum_{j = n_{1} + 1}^{n} E {φ_{τ} {y_{j} - ({x̃}_{j (ℓ)}^{T}, z_{j}^{T}) β_{0, τ}} {({x̃}_{j (ℓ)}^{T}, z_{j}^{T})}^{T}}}{\partial β_{0, τ}^{T}} n_{0}^{1 / 2} ({β̂}_{0 (ℓ)} - β_{0, τ}) + o_{p} (1),

and thus β̂_0(ℓ) has Bahadur representation

n_{0}^{1 / 2} ({β̂}_{0 (ℓ)} - β_{0, τ}) = - {(Ψ_{n_{0}, τ} / n_{0})}^{- 1} n_{0}^{- 1 / 2} \sum_{j = n_{1} + 1}^{n} φ_{τ} {y_{j} - ({x̃}_{j (ℓ)}^{T}, z_{j}^{T}) β_{0, τ}} {({x̃}_{j (ℓ)}^{T}, z_{j}^{T})}^{T} + o_{p} (1),

(A9)

where $Ψ_{n_{0}, τ} = (\partial / \partial β_{0, τ}) \sum_{j = n_{1} + 1}^{n} E [φ_{τ} {y_{j} - ({x̃}_{j (ℓ)}^{T}, z_{j}^{T}) β_{0, τ}} {({x̃}_{j (ℓ)}^{T}, z_{j}^{T})}^{T}]$ . Since the estimated conditional density f̂(x | y_j, z_j) converges to the true density f (x | y_j, z_j) as n₁ → ∞ for any x, the joint distribution of (y_j, x̃_j(ℓ), z_j) converges to the joint distribution of (y_i, x_i, z_i) as n₁ → ∞. Consequently, using Assumption 1 and the dominated convergence theorem, we have that $n_{0}^{- 1} Ψ_{n_{0}, τ}$ converges to ψ_τ in probability as n₁ → ∞, and $var [φ_{τ} {y_{j} - ({x̃}_{j (ℓ)}^{T}, z_{j}^{T}) β_{0, τ}} {({x̃}_{j (ℓ)}^{T}, z_{j}^{T})}^{T}]$ converges to V₀ as n₁ → ∞. It follows that $n_{0}^{1 / 2} ({β̂}_{0 (ℓ)} - β_{0, τ})$ is asymptotically normally distributed with mean zero and covariance matrix $ψ_{τ}^{- 1} V_{0} ψ_{τ}^{- 1}$ . This finishes our analysis of β̂_0(ℓ).

We now define

{β̂}_{* (ℓ)} = arg min_{β} [\sum_{i = 1}^{n_{1}} ρ_{τ} {y_{i} - (x_{i}^{T}, z_{i}^{T}) β} + \sum_{j = n_{1} + 1}^{n} ρ_{τ} {y_{j} - ({x̃}_{j (ℓ)}^{T}, z_{j}^{T}) β}]

as the estimated coefficient using the ℓth assembled complete data. Following similar lines in proving (A9) by treating the observed x_i as an imputed value using the true density function f (x | z, η₀), we have

n^{1 / 2} ({β̂}_{* (ℓ)} - β_{0, τ}) = - n {(Ψ_{n_{1}, τ} + Ψ_{n_{0}, τ})}^{- 1} [n^{- 1 / 2} \sum_{i = 1}^{n_{1}} φ_{τ} {y_{i} - (x_{i}^{T}, z_{i}^{T}) β_{0, τ}} {(x_{i}^{T}, z_{i}^{T})}^{T} + n^{- 1 / 2} \sum_{j = n_{1} + 1}^{n} φ_{τ} {y_{j} - ({x̃}_{j (ℓ)}^{T}, z_{j}^{T}) β_{0, τ}} {({x̃}_{j (ℓ)}^{T}, z_{j}^{T})}^{T}] + o_{p} (1),

(A10)

where $Ψ_{n_{1}, τ} = (\partial / \partial β_{0, τ}) \sum_{i = 1}^{n_{1}} E [φ_{τ} {y_{i} - (x_{i}^{T}, z_{i}^{T}) β_{0, τ}} {(x_{i}^{T}, z_{i}^{T})}^{T}]$ . Using the law of large numbers, the matrix $n_{1}^{- 1} Ψ_{n_{1}, τ}$ converges to ψ_τ in probability. On the other hand, recall that β̂_τ is the estimated coefficient based on the n₁ complete observations only. For any τ, β̂_τ has the Bahadur representation (Koenker, 2005, Equation (4.4)),

n_{1}^{1 / 2} ({β̂}_{τ} - β_{0, τ}) = - {(Ψ_{n_{1}, τ} / n_{1})}^{- 1} n_{1}^{- 1 / 2} \sum_{i = 1}^{n_{1}} φ_{τ} {y_{i} - (x_{i}^{T}, z_{i}^{T}) β_{0, τ}} {(x_{i}^{T}, z_{i}^{T})}^{T} + o_{p} (n_{1}^{- 1 / 2}) .

(A11)
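To make the estimators entering (A10) and (A11) concrete, the sketch below fits a quantile regression on each assembled dataset and averages the m fits to obtain the multiple imputation estimate. It is an illustration only, under strong simplifying assumptions: x is scalar, z is the constant 1, and the fit is a brute-force search that uses the fact that some minimizer of the check loss interpolates two observations. The function names and toy data are hypothetical, and a linear programming solver would be the practical choice.

```python
from itertools import combinations


def check_loss(u, tau):
    """Quantile check function rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def fit_line_quantile(points, tau):
    """Minimise sum_i rho_tau(y_i - b1 * x_i - b2) over (b1, b2).

    points: list of (x, y). Only lines through pairs of observations
    with distinct x are tried; returns (slope, intercept).
    """
    best = None
    for (x1, y1), (x2, y2) in combinations(points, 2):
        if x1 == x2:
            continue
        slope = (y2 - y1) / (x2 - x1)
        intercept = y1 - slope * x1
        loss = sum(check_loss(y - slope * x - intercept, tau) for x, y in points)
        if best is None or loss < best[0]:
            best = (loss, slope, intercept)
    if best is None:
        raise ValueError("need at least two observations with distinct x")
    return best[1], best[2]


def multiple_imputation_estimate(complete, y_missing, imputed_x, tau):
    """Average the fits from the m assembled datasets.

    complete: list of (x_i, y_i) with x observed.
    y_missing: list of y_j whose x is missing.
    imputed_x: list of m lists; imputed_x[l][j] is the l-th imputed x for y_j.
    """
    fits = []
    for x_tilde in imputed_x:
        assembled = complete + list(zip(x_tilde, y_missing))
        fits.append(fit_line_quantile(assembled, tau))
    m = len(fits)
    return sum(f[0] for f in fits) / m, sum(f[1] for f in fits) / m


# Hypothetical toy data: four complete cases on y = 1 + 2x, two incomplete.
complete = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
y_missing = [9.0, 100.0]
imputed_x = [[4.0, 5.0], [4.0, 6.0]]  # m = 2 imputations
slope, intercept = multiple_imputation_estimate(complete, y_missing, imputed_x, tau=0.5)
```

The brute-force search costs on the order of n³ operations, so it is suitable only for small illustrations like this one.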

Combining (A9)–(A11), and using $n^{1 / 2} (β̃ - β_{0, τ}) = m^{- 1} \sum_{ℓ = 1}^{m} n^{1 / 2} ({β̂}_{* (ℓ)} - β_{0, τ})$ , we obtain

n^{1 / 2} (β̃ - β_{0, τ}) = - {{(λ + 1)}^{- 1} n_{1}^{- 1} Ψ_{n_{1}, τ} + {(1 + 1 / λ)}^{- 1} n_{0}^{- 1} Ψ_{n_{0}, τ}}^{- 1} \times ({(λ + 1)}^{- 1 / 2} n_{1}^{- 1 / 2} \sum_{i = 1}^{n_{1}} φ_{τ} {y_{i} - (x_{i}^{T}, z_{i}^{T}) β_{0, τ}} {(x_{i}^{T}, z_{i}^{T})}^{T} + [{(1 + 1 / λ)}^{- 1 / 2} m^{- 1} \sum_{ℓ = 1}^{m} n_{0}^{- 1 / 2} \sum_{j = n_{1} + 1}^{n} φ_{τ} {y_{j} - ({x̃}_{j (ℓ)}^{T}, z_{j}^{T}) β_{0, τ}} {({x̃}_{j (ℓ)}^{T}, z_{j}^{T})}^{T}]) = - {{(λ + 1)}^{- 1} n_{1}^{- 1} Ψ_{n_{1}, τ} + {(1 + 1 / λ)}^{- 1} n_{0}^{- 1} Ψ_{n_{0}, τ}}^{- 1} \times ({(λ + 1)}^{- 1 / 2} (n_{1}^{- 1} Ψ_{n_{1}, τ}) {n_{1}^{1 / 2} ({β̂}_{τ} - β_{0, τ})} + [{(1 + 1 / λ)}^{- 1 / 2} m^{- 1} \sum_{ℓ = 1}^{m} (Ψ_{n_{0}, τ} / n_{0}) {n_{0}^{1 / 2} ({β̂}_{0 (ℓ)} - β_{0, τ})}]) = - {{(λ + 1)}^{- 1} n_{1}^{- 1} Ψ_{n_{1}, τ} + {(1 + 1 / λ)}^{- 1} n_{0}^{- 1} Ψ_{n_{0}, τ}}^{- 1} \times {{(λ + 1)}^{- 1 / 2} 𝒰_{n} + {(1 + 1 / λ)}^{- 1 / 2} m^{- 1} \sum_{ℓ = 1}^{m} 𝒱_{n (ℓ)}},

(A12)

where

𝒰_{n} = n_{1}^{- 1 / 2} \sum_{i = 1}^{n_{1}} φ_{τ} {y_{i} - (x_{i}^{T}, z_{i}^{T}) β_{0, τ}} {(x_{i}^{T}, z_{i}^{T})}^{T}, 𝒱_{n (ℓ)} = n_{0}^{- 1 / 2} \sum_{j = n_{1} + 1}^{n} φ_{τ} {y_{j} - ({x̃}_{j (ℓ)}^{T}, z_{j}^{T}) β_{0, τ}} {({x̃}_{j (ℓ)}^{T}, z_{j}^{T})}^{T} .

It follows immediately from the central limit theorem that 𝒰_n → N(0, V₁) in distribution. On the other hand, conditioning on the complete data, 𝒱_n(ℓ) converges to N(0, V_n) in distribution, where $V_{n} = var [φ_{τ} {y_{j} - ({x̃}_{j (ℓ)}^{T}, z_{j}^{T}) β_{0, τ}} {({x̃}_{j (ℓ)}^{T}, z_{j}^{T})}^{T}]$ . Since V_n converges to V₀ with the increase of the total sample size, 𝒱_n(ℓ) converges to N(0, V₀) in distribution as n goes to infinity by Slutsky’s theorem. Because N(0, V₀) does not depend on the complete data, this is also the limit of the marginal distribution of 𝒱_n(ℓ). Moreover, it is easy to show that E(𝒰_n𝒱_n(ℓ)) → 0 and cov(𝒱_n(ℓ), 𝒱_n(ℓ′)) → U₀. It follows that $n^{1 / 2} (β̃ - β_{0, τ}) \to N (0, ψ_{τ}^{- 1} Σ ψ_{τ}^{- 1})$ , where Σ = (λ + 1) ⁻¹V₁ + (1 + 1/λ)⁻¹[m⁻¹V₀ + {(m − 1)/m}U₀], as claimed.
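The way the number of imputations m enters the middle matrix Σ can be seen by evaluating the formula directly. The helper below is a hypothetical illustration that combines given matrices V₁, V₀ and U₀ entrywise; the numerical values are toy inputs, not estimates from any data.

```python
def sigma_matrix(v1, v0, u0, lam, m):
    """Sigma = (lam + 1)^{-1} V1 + (1 + 1/lam)^{-1} [V0 / m + {(m - 1) / m} U0].

    v1, v0, u0: square matrices as nested lists of equal size.
    lam: limiting ratio n0 / n1 (must be positive); m: number of imputations.
    """
    w_complete = 1.0 / (lam + 1.0)
    w_imputed = 1.0 / (1.0 + 1.0 / lam)
    p = len(v1)
    return [
        [
            w_complete * v1[r][c]
            + w_imputed * (v0[r][c] / m + (m - 1) / m * u0[r][c])
            for c in range(p)
        ]
        for r in range(p)
    ]


# Hypothetical scalar example with equal numbers of complete and incomplete cases.
v1, v0, u0 = [[2.0]], [[4.0]], [[1.0]]
sigma_m1 = sigma_matrix(v1, v0, u0, lam=1.0, m=1)
sigma_m2 = sigma_matrix(v1, v0, u0, lam=1.0, m=2)
sigma_m100 = sigma_matrix(v1, v0, u0, lam=1.0, m=100)
```

In this toy example V₀ exceeds U₀, so Σ decreases as m grows, and the contribution of the imputed portion approaches (1 + 1/λ)⁻¹U₀ rather than zero.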

A2. Implementing the shrinkage estimator

Define $ℬ̂ = {({β̃}_{τ}^{T}, {β̂}_{τ}^{T})}^{T}$ . Let Γ̂ be the estimated covariance matrix of ℬ̂, which is derived on a case-by-case basis. Let θ̂_τ = (θ̂_1,τ, …, θ̂_p,τ)^T = β̃_τ − β̂_τ, and let V̂ be the estimated covariance matrix of θ̂_τ, with diagonal elements (υ̂₁, …, υ̂_p). Define

K = diag (\frac{{υ̂}_{1}}{{υ̂}_{1} + {θ̂}_{1, τ}^{2}}, \dots, \frac{{υ̂}_{p}}{{υ̂}_{p} + {θ̂}_{p, τ}^{2}});

and define G = (K, I_p − K). Then the shrinkage estimator is β̂_s(τ) = Gℬ̂. Its estimated covariance matrix is côv{β̂_s(τ)} = GΓ̂ G^T.
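The shrinkage step can be sketched as follows. The sketch assumes that the squared terms in the denominators of K are the squared components of θ̂_τ = β̃_τ − β̂_τ, that each υ̂_k is positive, and that K is treated as fixed when forming GΓ̂G^T; the function names and numbers are hypothetical.

```python
def shrinkage_estimate(beta_mi, beta_cc, v_diag):
    """Return (beta_s, weights) with beta_s = K beta_mi + (I - K) beta_cc.

    beta_mi: multiple imputation estimate (beta tilde).
    beta_cc: complete-case estimate (beta hat).
    v_diag: estimated variances of the components of beta_mi - beta_cc.
    K = diag(v_k / (v_k + theta_k^2)) with theta = beta_mi - beta_cc.
    """
    weights, shrunk = [], []
    for b_mi, b_cc, v in zip(beta_mi, beta_cc, v_diag):
        theta = b_mi - b_cc
        k = v / (v + theta ** 2)
        weights.append(k)
        shrunk.append(k * b_mi + (1.0 - k) * b_cc)
    return shrunk, weights


def shrinkage_covariance(weights, gamma):
    """G Gamma G^T with G = (K, I_p - K); gamma is the 2p x 2p covariance."""
    p = len(weights)
    # Row r of G has K_rr in column r and 1 - K_rr in column p + r.
    g = [[0.0] * (2 * p) for _ in range(p)]
    for r, k in enumerate(weights):
        g[r][r] = k
        g[r][p + r] = 1.0 - k
    g_gamma = [
        [sum(g[r][s] * gamma[s][c] for s in range(2 * p)) for c in range(2 * p)]
        for r in range(p)
    ]
    return [
        [sum(g_gamma[r][s] * g[c][s] for s in range(2 * p)) for c in range(p)]
        for r in range(p)
    ]


# Hypothetical inputs: the first coefficient agrees, the second differs by 2.
beta_s, k_weights = shrinkage_estimate([1.0, 2.0], [1.0, 4.0], [0.5, 4.0])
cov_s = shrinkage_covariance([0.5], [[1.0, 0.5], [0.5, 2.0]])
```

When the two estimates agree, the weight on β̃_τ is one; as their squared difference grows relative to υ̂_k, the weight shifts toward the complete-case estimate β̂_τ, which is the protection against misspecification that the shrinkage estimator is designed to give.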

We estimate the covariance matrices of β̃_τ and β̂_τ based on their Bahadur representations (A12) and (A11), respectively. That requires the estimation of the variance component matrices Ψ_n₁,τ and Ψ_n₀,τ, the variances of 𝒰_n and 𝒱_n(ℓ), and the covariance of 𝒱_n(ℓ) and 𝒱_n(ℓ′). In what follows, we provide sample estimates of those variance component matrices. First, ${Ψ̂}_{n_{1}, τ} = n_{1}^{- 1} \sum_{i = 1}^{n_{1}} - {f̂}_{i} (τ) {(x_{i}^{T}, z_{i}^{T})}^{T} (x_{i}^{T}, z_{i}^{T})$ , where

{f̂}_{i} (τ) = \frac{2 h_{τ}}{(x_{i}^{T}, z_{i}^{T}) {{β̂}_{(τ + h_{τ})} - {β̂}_{(τ - h_{τ})}}} .

Here h_τ is the bandwidth chosen by the method of Hall & Sheather (1988). Compared with the density estimator that we used in the estimation procedure, here we incorporated a bandwidth selection h_τ to improve the stability of f̂_i (τ). Of course, $Ψ_{n_{0}, τ} = {lim}_{n_{0} \to \infty} n_{0}^{- 1} \sum_{j} \partial E {φ_{τ} (y_{j} - ({x̃}_{j (ℓ)}^{T}, z_{j}^{T}) β_{0, τ}) {({x̃}_{j (ℓ)}^{T}, z_{j}^{T})}^{T}} / \partial β^{T}$ . Following similar lines, we approximate this last term by

{Ψ̂}_{n_{0}, τ} \approx n_{0}^{- 1} \sum_{j = n_{1} + 1}^{n} \frac{1}{m} \sum_{ℓ = 1}^{m} - {f̂}_{j (ℓ)} (τ) {({x̃}_{j (ℓ)}^{T}, z_{j}^{T})}^{T} ({x̃}_{j (ℓ)}^{T}, z_{j}^{T}),

where the estimated density function is

{f̂}_{j (ℓ)} (τ) = \frac{2 h_{τ}}{({x̃}_{j (ℓ)}^{T}, z_{j}^{T}) {{β̂}_{(τ + h_{τ})} - {β̂}_{(τ - h_{τ})}}} .
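The difference-quotient density estimates and the resulting Ψ̂ matrix can be sketched as below. The bandwidth h is taken as given rather than computed by the Hall & Sheather (1988) rule, the function names and toy inputs are hypothetical, and raising an error when the fitted quantiles at τ + h and τ − h cross is a choice made for this sketch rather than part of the method.

```python
def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(ai * bi for ai, bi in zip(a, b))


def sparsity_density(row, beta_upper, beta_lower, h):
    """f_hat = 2h / {w^T (beta_hat(tau + h) - beta_hat(tau - h))}.

    row: design vector w = (x^T, z^T)^T for one observation.
    beta_upper, beta_lower: coefficient estimates at tau + h and tau - h.
    """
    spread = dot(row, beta_upper) - dot(row, beta_lower)
    if spread <= 0.0:
        raise ValueError("fitted quantiles cross or coincide at this design point")
    return 2.0 * h / spread


def psi_hat(rows, beta_upper, beta_lower, h):
    """n^{-1} sum_i -f_hat_i w_i w_i^T over the supplied design rows."""
    n, p = len(rows), len(rows[0])
    out = [[0.0] * p for _ in range(p)]
    for w in rows:
        f = sparsity_density(w, beta_upper, beta_lower, h)
        for r in range(p):
            for c in range(p):
                out[r][c] -= f * w[r] * w[c] / n
    return out


# Hypothetical design with one covariate and a constant term.
rows = [[0.0, 1.0], [2.0, 1.0]]
psi = psi_hat(rows, beta_upper=[0.0, 0.6], beta_lower=[0.0, 0.4], h=0.1)
```

For the imputed portion the same function applies with the rows built from the imputed covariates, averaged over the m imputations.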

Following the linear expansions of β̃_τ and β̂_τ, we first estimate var(𝒰_n) and var(𝒱_n(ℓ)), and cov(𝒱_n(ℓ), 𝒱_n(ℓ′)) using sample variances, i.e., we define the estimator

{V̂}_{1} = \hat{var} (𝒰_{n}) = n_{1}^{- 1} \sum_{i = 1}^{n_{1}} φ_{τ}^{2} {y_{i} - (x_{i}^{T}, z_{i}^{T}) {β̂}_{τ}} {(x_{i}^{T}, z_{i}^{T})}^{T} (x_{i}^{T}, z_{i}^{T}) .

Let Q_ℓ be the sample covariance matrix of ${[φ_{τ} {y_{j} - ({x̃}_{j (ℓ)}^{T}, z_{j}^{T}) {β̂}_{τ}} {({x̃}_{j (ℓ)}^{T}, z_{j}^{T})}^{T}]}_{j = n_{1} + 1}^{n}$ . The variance component matrix var(𝒱_n(ℓ)), for any ℓ, can be estimated by

{V̂}_{0} = \hat{var} (𝒱_{n (ℓ)}) = m^{- 1} \sum_{ℓ = 1}^{m} Q_{ℓ} .
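The sample covariance Q_ℓ, its average over imputations, and the cross-imputation analogue Q(ℓ, ℓ′) defined in the following paragraph can be computed along these lines. The sketch assumes the usual score φ_τ(u) = τ − I(u < 0) and sample covariances with divisor n₀ − 1; both are assumptions of this illustration, and the function names and toy data are hypothetical.

```python
def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(ai * bi for ai, bi in zip(a, b))


def phi(u, tau):
    """Assumed form of the score: phi_tau(u) = tau - 1{u < 0}."""
    return tau - (1.0 if u < 0 else 0.0)


def scores(y, rows, beta, tau):
    """Score vectors phi_tau(y_j - w_j^T beta) w_j for one imputed dataset."""
    return [[phi(yj - dot(w, beta), tau) * wk for wk in w] for yj, w in zip(y, rows)]


def cross_cov(a, b):
    """Sample cross-covariance between paired score vectors a[j] and b[j]."""
    n, p = len(a), len(a[0])
    mean_a = [sum(row[k] for row in a) / n for k in range(p)]
    mean_b = [sum(row[k] for row in b) / n for k in range(p)]
    return [
        [
            sum((a[j][r] - mean_a[r]) * (b[j][c] - mean_b[c]) for j in range(n)) / (n - 1)
            for c in range(p)
        ]
        for r in range(p)
    ]


def variance_components(y, imputed_rows, beta, tau):
    """Return (V0_hat, U_hat): averages of Q_l and of Q(l, l') over l != l'."""
    s = [scores(y, rows, beta, tau) for rows in imputed_rows]
    m, p = len(s), len(beta)
    v0 = [[0.0] * p for _ in range(p)]
    u = [[0.0] * p for _ in range(p)]
    for l1 in range(m):
        for l2 in range(m):
            q = cross_cov(s[l1], s[l2])
            target, weight = (v0, 1.0 / m) if l1 == l2 else (u, 1.0 / (m * (m - 1)))
            for r in range(p):
                for c in range(p):
                    target[r][c] += weight * q[r][c]
    return v0, u


# Hypothetical toy data: three incomplete cases, m = 2 imputations, one covariate.
y = [1.0, -1.0, 2.0]
imputed_rows = [[[1.0], [1.0], [1.0]], [[2.0], [2.0], [2.0]]]
v0_hat, u_hat = variance_components(y, imputed_rows, beta=[0.0], tau=0.5)
```

The cross term requires at least two imputations; with m = 1 only the within-imputation variance is available.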

For any ℓ ≠ ℓ′, we define Q(ℓ, ℓ′) as the sample covariance matrix between

{[φ_{τ} {y_{j} - ({x̃}_{j (ℓ)}^{T}, z_{j}^{T}) {β̂}_{τ}} {({x̃}_{j (ℓ)}^{T}, z_{j}^{T})}^{T}]}_{j = n_{1} + 1}^{n} and {[φ_{τ} {y_{j} - ({x̃}_{j (ℓ')}^{T}, z_{j}^{T}) {β̂}_{τ}} {({x̃}_{j (ℓ')}^{T}, z_{j}^{T})}^{T}]}_{j = n_{1} + 1}^{n} .

We define

{Û}_{0} = \hat{cov} (𝒱_{n (ℓ)}, 𝒱_{n (ℓ')}) = {m (m - 1)}^{- 1} \sum_{ℓ} \sum_{ℓ' \neq ℓ} Q (ℓ, ℓ'),

for any (ℓ, ℓ′). With the considerations above, we have

M̂ = {(λ̂ + 1)}^{- 1} {Ψ̂}_{n_{1}, τ} + {(1 + 1 / λ̂)}^{- 1} {Ψ̂}_{n_{0}, τ}, Σ̂ = {(λ̂ + 1)}^{- 1} {V̂}_{1} + {(1 + 1 / λ̂)}^{- 1} [m^{- 1} {V̂}_{0} + {(m - 1) / m} {Û}_{0}],

where λ̂ = n₀/n₁. Consequently, the estimated covariance matrix of β̃_τ is $\hat{var} ({β̃}_{τ}) = n^{- 1} {M̂}^{- 1} Σ̂ {M̂}^{- 1}$ , and the estimated covariance matrix of β̂_τ is $\hat{var} ({β̂}_{τ}) = n^{- 1} (1 + λ̂) {Ψ̂}_{n_{1}, τ}^{- 1} {V̂}_{1} {Ψ̂}_{n_{1}, τ}^{- 1}$ . Since 𝒰_n and 𝒱_n(ℓ) are asymptotically independent and have means zero, we have that E(𝒰_n𝒱_n(ℓ)) = o(1). We can therefore estimate the covariance between β̃_τ and β̂_τ by $\hat{cov} ({β̃}_{τ}, {β̂}_{τ}) = n^{- 1} {M̂}^{- 1} {V̂}_{1} {Ψ̂}_{n_{1}, τ}^{- 1}$ . Assembling these components together, we obtain

Γ̂ = n^{- 1} (\begin{matrix} {M̂}^{- 1} Σ̂ {M̂}^{- 1} & {M̂}^{- 1} {V̂}_{1} {Ψ̂}_{n_{1}, τ}^{- 1} \\ {({M̂}^{- 1} {V̂}_{1} {Ψ̂}_{n_{1}, τ}^{- 1})}^{T} & (1 + λ̂) {Ψ̂}_{n_{1}, τ}^{- 1} {V̂}_{1} {Ψ̂}_{n_{1}, τ}^{- 1} \end{matrix}) .
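The scaling factors in the blocks of Γ̂ follow from the sample-size bookkeeping n = n₁ + n₀ = n₁(1 + λ̂). The worked steps below are a sketch of that bookkeeping, using the leading terms of the Bahadur representations (A11) and (A12), treating Ψ̂_{n₁,τ} as symmetric, and dropping the cross terms between 𝒰_n and 𝒱_n(ℓ) because they are asymptotically uncorrelated.

```latex
\begin{align*}
n &= n_1 + n_0 = n_1 (1 + \hat\lambda)
  \;\Longrightarrow\; n_1^{-1} = n^{-1} (1 + \hat\lambda), \\
\widehat{\mathrm{var}}(\hat\beta_\tau)
  &= n_1^{-1}\, \hat\Psi_{n_1,\tau}^{-1} \hat V_1 \hat\Psi_{n_1,\tau}^{-1}
   = n^{-1} (1 + \hat\lambda)\, \hat\Psi_{n_1,\tau}^{-1} \hat V_1 \hat\Psi_{n_1,\tau}^{-1}, \\
\widehat{\mathrm{cov}}(\tilde\beta_\tau, \hat\beta_\tau)
  &= \hat M^{-1} \bigl\{ n^{-1/2} (\hat\lambda + 1)^{-1/2} \bigr\}\, \hat V_1\,
     \bigl\{ n_1^{-1/2} \bigr\}\, \hat\Psi_{n_1,\tau}^{-1}
   = n^{-1} \hat M^{-1} \hat V_1 \hat\Psi_{n_1,\tau}^{-1}.
\end{align*}
```

The last equality uses n₁^{-1/2} = n^{-1/2}(1 + λ̂)^{1/2}, so the two factors involving λ̂ cancel and only n⁻¹ remains.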

Footnotes

Supplementary material

Supplementary material available at Biometrika online includes additional simulation results with higher rates of missing data and stronger and weaker covariate correlations. Also included are the quantile-quantile plots for the transformed covariates in the data analysis.

Contributor Information

Ying Wei, Email: ying.wei@columbia.edu, Department of Biostatistics, Columbia University, 722 West 168th St., New York, New York 10032, U.S.A.

Yanyuan Ma, Email: ma@stat.tamu.edu, Department of Statistics, Texas A&M University, College Station, Texas 77843-3143, U.S.A.

Raymond J. Carroll, Email: carroll@stat.tamu.edu, Department of Statistics, Texas A&M University, College Station, Texas 77843-3143, U.S.A.

References

Amemiya T. Advanced Econometrics. Boston: Harvard University Press; 1985.
Carroll RJ, Ruppert D, Stefanski LA, Crainiceanu CM. Measurement Error in Nonlinear Models: A Modern Perspective. 2nd ed. London: Chapman and Hall CRC Press; 2006.
Chen YH, Chatterjee N, Carroll RJ. Shrinkage estimators for robust and efficient inference in haplotype-based case-control studies. J. Am. Statist. Assoc. 2009;104:220–233. doi: 10.1198/jasa.2009.0104.
Hall P, Sheather S. On the distribution of a studentized quantile. J. R. Statist. Soc. B. 1988;50:381–391.
He X, Shao QM. A general Bahadur representation of M-estimators and its application to linear regression with nonstochastic designs. Ann. Statist. 1996;24:2608–2630.
Koenker R. Quantile Regression. Cambridge: Cambridge University Press; 2005.
Koenker R, Bassett GJ. Regression quantiles. Econometrica. 1978;46:33–50.
Koenker R, Xiao ZJ. Unit root quantile autoregression inference. J. Am. Statist. Assoc. 2004;99:775–787.
Lipsitz SR, Fitzmaurice GM, Molenberghs G, Zhao LP. Quantile regression methods for longitudinal data with drop-outs: application to CD4 cell counts of patients infected with the human immunodeficiency virus. Appl. Statist. 1997;46:463–476.
Little RJA, Rubin DB. Statistical Analysis with Missing Data. New York: Wiley; 1987.
Robins JM, Rotnitzky A, Zhao LP. Analysis of semiparametric regression models for repeated outcomes in the presence of missing data. J. Am. Statist. Assoc. 1995;90:106–121.
Subar AF, Thompson FE, Kipnis V, Midthune D, Hurwitz P, Mcnutt S, Mcintosh A, Rosenfeld S. Comparative validation of the Block, Willett, and National Cancer Institute Food Frequency Questionnaires: the Eating at American’s Table Study. Am. J. Epidemiol. 2001;154:1089–1099. doi: 10.1093/aje/154.12.1089.
Tsiatis AA. Semiparametric Theory and Missing Data. New York: Springer; 2006.
Van der Vaart AW. Asymptotic Statistics. Cambridge: Cambridge University Press; 1998.
Wei Y, Carroll RJ. Quantile regression with measurement error. J. Am. Statist. Assoc. 2009;104:1129–1143. doi: 10.1198/jasa.2009.tm08420.
Welsh AH. Asymptotically efficient estimation of the sparsity function at a point. Statist. Prob. Lett. 1988;6:427–432.
Yi GY, He W. Median regression models for longitudinal data with dropouts. Biometrics. 2009;65:618–625. doi: 10.1111/j.1541-0420.2008.01105.x.
