Abstract
The empirical likelihood method is a powerful tool for incorporating moment conditions in statistical inference. We propose a novel application of the empirical likelihood for handling item nonresponse in survey sampling. The proposed method takes the form of fractional imputation (Kim, 2011) but it does not require parametric model assumptions. Instead, only the first moment condition based on a regression model is assumed and the empirical likelihood method is applied to the observed residuals to get the fractional weights. The resulting semiparametric fractional imputation provides √n-consistent estimates for various parameters. Variance estimation is implemented using a jackknife method. Two limited simulation studies are presented to compare several imputation estimators.
Keywords: Item nonresponse, missing data, quantile estimation, robust estimation
1 Introduction
Missing data are frequently encountered in many fields, such as survey sampling and epidemiology. Simply ignoring missing values can lead to biased estimation (Little and Rubin 2002; Kim and Shao 2013). Two statistical approaches for handling missing data have been used in practice: propensity score weighting and imputation. Propensity score weighting is used mainly to correct for unit nonresponse, while imputation is mainly used to handle item nonresponse. Haziza (2009) provides a comprehensive overview of the imputation methods in survey sampling.
Multiple imputation (MI), proposed by Rubin (1987), is a popular imputation approach for general-purpose estimation because of its practical simplicity. However, Rubin's variance estimator may be biased in certain situations (Fay 1992; Wang and Robins 1998; Kim et al. 2006; Yang and Kim 2016), and its validity requires the congeniality condition of Meng (1994), which may not hold for general-purpose estimation.
Fractional imputation (FI), first proposed by Kalton and Kish (1984), provides an alternative method for handling item nonresponse. Fay (1996), Kim and Fuller (2004), Fuller and Kim (2005), Durrant (2005), and Durrant and Skinner (2006) discussed fractional hot deck imputation. Kim (2011) and Kim and Yang (2014) discussed a fully parametric approach to fractional imputation. Parametric fractional imputation provides a powerful tool for handling missing data in various situations. However, it relies on a strong parametric model assumption, and such an assumption is usually not preferred in survey sampling. The balanced random imputation of Chauvet et al. (2011) is also an attractive imputation technique, but it still requires parametric model assumptions for multipurpose estimation.
The empirical likelihood (EL) method, considered by Owen (2001) and Qin and Lawless (1994), is a useful tool for semiparametric inference in statistics. It involves a likelihood-based inference without making a parametric distributional assumption about the observed data. Qin (1993) addressed the missing survey data problem by using a biased sampling argument of Vardi (1985). Wang and Rao (2002) brought regression-type imputation approaches to empirical likelihood inference. Wang and Chen (2009) used a nonparametric regression imputation approach to handle missing data in the empirical likelihood inference. Müller (2009) considered a novel application of empirical likelihood method to handle missing data under a regression model assumption. In Müller (2009), the moment condition of the error term in the regression model is used to construct a fully imputed estimator.
In this paper, motivated by the fully imputed estimator of Müller (2009), we propose a semiparametric fractional imputation (SFI) method using empirical likelihood that can be used to handle item nonresponse in survey sampling. Because the proposed SFI uses only moment conditions in the semiparametric regression model, it is more robust than the parametric fractional imputation (PFI) method or the parametric MI method. By using a regression model assumption, the proposed SFI method is more efficient than the nonparametric regression imputation method of Wang and Chen (2009). The proposed method takes the form of fractional imputation, so the actual implementation is very attractive in practice. The proposed SFI method can be used to estimate various parameters, including nonsmooth parameters such as population quantiles.
The paper is organized as follows. The basic setup is introduced and the proposed method is presented in Section 2. The asymptotic properties of the SFI estimators are presented in Section 3. Extensions to non-smooth statistics as well as random imputations are covered in Section 4. In Section 5, variance estimation is discussed. Some numerical results are given in Section 6. Some concluding remarks are made in Section 7.
2 Basic Setup
Consider a finite population ℱN = {(xi, yi);i = 1, 2, …, N}, where xi is the vector of auxiliary variables that are always observed and yi is the study variable that is subject to missingness. We assume (xi, yi) are realizations from a regression model
$$y_i = m(x_i; \beta_0) + \varepsilon_i, \qquad (1)$$
where m(X; β0) is assumed to be known with unknown parameter β0 and ε satisfies E(ε|X) = 0. No parametric distributional assumption on X is made.
Let δi be the response indicator such that δi = 1 if yi is observed and δi = 0 otherwise. We assume missing at random (MAR) in the sense that
$$\Pr(\delta_i = 1 \mid x_i, y_i) = \Pr(\delta_i = 1 \mid x_i). \qquad (2)$$
Even though we observe δi only in the sample, we can conceptually assume that the δi are defined throughout the population. Such an extended definition of δi has been adopted in Fay (1992), Shao and Steel (1999), and Kim, Navarro, and Fuller (2006).
Given the finite population, suppose that sample A of size n is selected from the finite population by a probability sampling mechanism. Let πi, i = 1, 2, …, N, be the first-order inclusion probability of unit i in the population. We are interested in estimating η0, defined as a solution to the estimating equation E {U(η; x, y)} = 0, where U(η; x, y) is a known function with parameter η. To avoid unnecessary details, we assume that the solution to E {U(η; x, y)} = 0 is unique and the dimensions of η and U(η; x, y) are r. Thus, the parameter η is just-identified. Under complete response, a consistent estimator of η0 is obtained by solving
$$\sum_{i \in A} \pi_i^{-1} U(\eta; x_i, y_i) = 0$$
for η. If some of yi are missing, under the MAR assumption, a consistent estimator of η0 can be obtained by solving the following expected estimating equation
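$$\sum_{i \in A} \pi_i^{-1} \left[ \delta_i U(\eta; x_i, y_i) + (1 - \delta_i) E\{U(\eta; x_i, Y) \mid x_i\} \right] = 0 \qquad (3)$$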
for η. The conditional expectation in (3) is with respect to f(y | x), which is unknown as we only assume (1).
In fractional imputation, our goal is to approximate the conditional expectation in (3) by the weighted mean of the fractionally imputed estimating functions. That is, we wish to achieve
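$$E\{U(\eta; x_i, Y) \mid x_i\} \cong \sum_{j=1}^{m} w_{ij}^{*} U(\eta; x_i, y_{ij}^{*}) \qquad (4)$$
as closely as possible for some $w_{ij}^{*}$ satisfying $\sum_{j=1}^{m} w_{ij}^{*} = 1$, where the $w_{ij}^{*}$ are the desired fractional weights and the $y_{ij}^{*}$ are the m imputed values for unit i. Kim (2011) and Kim and Yang (2014) developed a fractional imputation satisfying (4) using a parametric model assumption on f(y | x).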
In our proposed method, we use the empirical likelihood approach to achieve the approximation in (4). To explain the idea, assume for now that the true parameter β0 in (1) is known. In this case, εi = yi − m(xi; β0) are available among δi = 1. Because E(ε | x) = 0 holds, we can compute
where fε(ε | x) is the (unknown) conditional density of ε given x. To apply the empirical likelihood method, we assume that the conditional distribution of ε given x can be approximated by
| (5) |
such that wi ≥ 0 with Σδiwi = 1 are the point mass assigned to the observed εi by assuming that the support of εi is equal to the set of observed εi. Using the approximation in (5), we can obtain
which can be written in the fractional imputation form in (4). To determine wj uniquely, we can use the idea of pseudo empirical likelihood method of Wu and Rao (2006) to maximize
| (6) |
subject to
| (7) |
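The constrained maximisation above can be sketched numerically. The block below is a minimal illustration, not the authors' implementation: it assumes the objective is the design-weighted log-likelihood Σ d_i log(w_i) over the respondents, with constraints Σ w_i = 1 and Σ w_i ε_i = 0, for which the standard Lagrange-multiplier solution is w_i = d_i / {D(1 + λ ε_i)} with D = Σ d_i. The function name `el_weights` and its arguments are hypothetical.

```python
def el_weights(eps, d=None, max_iter=200):
    """Pseudo empirical likelihood weights for observed residuals.

    Assumed problem (an illustration, not taken verbatim from the paper):
    maximise sum_i d_i * log(w_i) subject to sum_i w_i = 1 and
    sum_i w_i * eps_i = 0, over the respondents only.
    """
    n = len(eps)
    d = [1.0] * n if d is None else list(d)
    total = sum(d)
    e_min, e_max = min(eps), max(eps)
    if not (e_min < 0.0 < e_max):
        raise ValueError("0 must lie strictly inside the range of the residuals")

    def g(lam):
        # Score equation for the Lagrange multiplier; strictly decreasing in lam.
        return sum(di * ei / (1.0 + lam * ei) for di, ei in zip(d, eps))

    # On the open interval (lo, hi) every 1 + lam * eps_i stays positive.
    lo, hi = -1.0 / e_max, -1.0 / e_min
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return [di / (total * (1.0 + lam * ei)) for di, ei in zip(d, eps)]


if __name__ == "__main__":
    residuals = [-1.0, -0.5, 0.2, 2.0]
    w = el_weights(residuals)
    print(sum(w), sum(wi * ei for wi, ei in zip(w, residuals)))
```

Bisection is used here only because the score equation is monotone in λ, so no derivative or external solver is needed; a Newton step would converge faster. When the design-weighted residuals already sum to zero, the solution reduces to w_i = d_i / D.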
In practice, we do not know β0 and, hence, we do not observe εi = yi − m(xi; β0). We can use a √n-consistent estimator β̂ of β0 to obtain the residuals ε̂i = yi − m(xi; β̂) and apply the above empirical likelihood method to the observed residuals. In general, one can use
$$\hat{U}_\beta(\beta) \equiv \sum_{i \in A} \pi_i^{-1} \delta_i \{y_i - m(x_i; \beta)\} h(x_i; \beta) = 0 \qquad (8)$$
to obtain a √n-consistent estimator of β, where h(xi; β) is an arbitrary function that enables the above equation to have a solution. If the variance function is V(y|x) = σ2q(xi; β0) for a known function q, then one can choose h(xi; β) = ṁ(xi; β)/q(xi; β), where ṁ(xi; β) = ∂m(xi; β)/∂β. This choice is motivated by the quasi-likelihood equations for generalized linear models (McCullagh and Nelder, 1989, Ch. 9). The solution to (8) can be called the complete-case (CC) estimator. The CC estimator is not efficient in general, but it is efficient for estimating β under MAR. Thus, the resulting SFI estimator can be constructed as follows:
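- [Step 1] Obtain a √n-consistent estimator β̂ of β0 and compute the residuals ε̂i = yi − m(xi; β̂) among the respondents.
- [Step 2] Compute the EL weights ŵj by maximizing (6) subject to (7), with εi replaced by the residuals ε̂i.
- [Step 3] Use ŵj in Step 2 to approximate the conditional expectation in (3) by the fractional imputation form in (4), taking the ŵj as the fractional weights and m(xi; β̂) + ε̂j as the imputed values.
- [Step 4] The SFI estimator of η is computed by solving the resulting fractionally imputed estimating equation (11) for η.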
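The steps above can be sketched for the simple case of estimating E{g(y)}, which covers the mean (g(y) = y) and a proportion (g(y) = I(y < t)). This is a minimal sketch under stated assumptions, not the authors' code: the regression function is taken to be linear, m(x; β) = β0 + β1x, β is estimated by design-weighted least squares among respondents, and the imputed values are m(xi; β̂) + ε̂j. The names `fit_line`, `sfi_mean_of` and `weight_fn` are hypothetical. By default the fractional weights are dj / Σ dj, which is the exact EL solution here because the weighted least squares residuals already sum to zero; a general EL solver can be passed through `weight_fn`.

```python
def fit_line(x, y, d):
    """Design-weighted least squares for the assumed model m(x; b) = b0 + b1 * x."""
    sw = sum(d)
    xb = sum(di * xi for di, xi in zip(d, x)) / sw
    yb = sum(di * yi for di, yi in zip(d, y)) / sw
    sxx = sum(di * (xi - xb) ** 2 for di, xi in zip(d, x))
    sxy = sum(di * (xi - xb) * (yi - yb) for di, xi, yi in zip(d, x, y))
    b1 = sxy / sxx
    return yb - b1 * xb, b1


def sfi_mean_of(g, x, y, delta, d, weight_fn=None):
    """Fractionally imputed estimate of E{g(y)}; y[i] is ignored when delta[i] == 0."""
    resp = [i for i, r in enumerate(delta) if r == 1]
    d_resp = [d[i] for i in resp]
    # Step 1: estimate beta and compute residuals among respondents.
    b0, b1 = fit_line([x[i] for i in resp], [y[i] for i in resp], d_resp)
    res = [y[i] - (b0 + b1 * x[i]) for i in resp]
    # Step 2: fractional weights on the observed residuals.
    if weight_fn is None:
        w = [dj / sum(d_resp) for dj in d_resp]  # exact when weighted residuals sum to 0
    else:
        w = weight_fn(res, d_resp)
    # Steps 3-4: replace g(y_i) for nonrespondents by its fractionally imputed mean.
    num = 0.0
    for i in range(len(x)):
        if delta[i] == 1:
            num += d[i] * g(y[i])
        else:
            m_i = b0 + b1 * x[i]
            num += d[i] * sum(wj * g(m_i + ej) for wj, ej in zip(w, res))
    return num / sum(d)


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0, 5.0]
    y = [2.1, 3.9, None, 8.2, None]
    delta = [1, 1, 0, 1, 0]
    d = [1.0] * 5
    print(sfi_mean_of(lambda v: v, x, y, delta, d))
    print(sfi_mean_of(lambda v: 1.0 if v < 5.0 else 0.0, x, y, delta, d))
```

For a general estimating function U(η; x, y) the same fractionally imputed data set (respondent values with weight di, imputed values with weight di ŵj) would be passed to a root finder instead of being averaged directly.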
Instead of (11), one can also consider a fully imputed estimating equation based on
which was considered by Müller (2009) under the independent and identically distributed (i.i.d.) setup. The fully imputed estimating equation may lead to a more efficient estimator of η (Matloff, 1981), but such over-imputation does not appeal to survey practice since we usually do not want to replace the true values of respondents with imputed values. In the following section, we present the asymptotic properties of the SFI estimator under complex survey designs.
3 Asymptotic Properties
To discuss the asymptotic properties of the proposed SFI estimator of η, we first assume a sequence of finite populations and samples with finite fourth moments as in Fuller (2009, Ch.1). The following theorem presents the asymptotic normality of the proposed SFI estimator. The sketched proof of Theorem 1 is provided in Appendix A.
Theorem 1
Under the regularity conditions (C1)–(C13) in Appendix A, the SFI estimator η̂ defined in (11) is a √n-consistent estimator of η0, that is
where , and
| (12) |
and
with , and .
Remark 1
In (12), ζi can be written as the sum of four terms. The first two terms are the conditional expectation of U(η; x, y), the third term is the additional term due to approximating f(y | x) by the empirical likelihood method, and the fourth term is the additional term due to estimating β.
According to Theorem 1, a consistent variance estimator of can be written as
| (13) |
where
with and
| (14) |
where and is a plug-in estimator of ζi in (12). One can use
with
When nN−1 = o(1), the second term of (14) is of smaller order and can be safely ignored.
4 Extensions
In this section, we discuss two extensions of the proposed method. In Section 4.1, our proposed method is extended to handle non-smooth statistics including distribution functions and percentiles. In Section 4.2, an extension to stochastic imputation is discussed.
4.1 Inference for non-smooth statistics
Suppose that we are interested in estimating parameter η0, the solution of E {U(η; x, y)} = 0 with non-smooth function U(η; x, y), where the non-smoothness can be with respect to either η or y. For generality, we assume the non-smoothness is with respect to both η and y. Wang and Opsomer (2011) discussed asymptotic results for nondifferentiable survey estimators. Define Let and , where
Denote as the solution of estimating equation Ũn(θ) = 0. To discuss asymptotic properties, we replace regularity conditions (C7)–(C10) in Appendix A with the regularity conditions (C14)–(C17) in Appendix B. The following theorem presents the asymptotic expansion of under this scenario and the sketched proof is presented in Appendix B.
Theorem 2
Under regularity conditions (C1)–(C3), and (C11)–(C17) in Appendix A and Appendix B, has the following asymptotic expansion
where
where
and
evaluated at β0 and other terms are the same as those in Theorem 1.
By Theorem 2, we can obtain
where B = [E {∂U(η; x, y)/∂η}]−1 and . If we are interested in estimating the cumulative distribution function of y, which is Pr(y < t), then we can choose U(η; x, y) = I(y < t) − η and
where p(x) = Pr(δ = 1|x). Therefore, we have
A consistent estimator of D* can be written as
with
where Kx and Ky are kernel functions for x and y with bandwidth hx and hy. Thus, a consistent variance estimator of here can be obtained similarly to (13).
If the parameter of interest is the τ-th percentile of Y, given by , the SFI estimator of η can be obtained by solving the estimating equation (11) with U(η; x, y) = I(y < η) − τ. Since E {I(Y < η)} = FY(η), it can be shown that has the asymptotic expansion in Theorem 2 with
where fy is the density function for y. A consistent estimator of ∂E {U(η0; x, y)} /∂η can be written as
and a consistent estimator of D* can be written as
with .
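For the percentile case, solving the fractionally imputed estimating equation with U(η; x, y) = I(y < η) − τ amounts to taking a weighted quantile of the pooled data set, in which each respondent value carries its design weight and each imputed value carries the design weight times its fractional weight. The sketch below illustrates this; the function name `weighted_quantile` is hypothetical, and returning the smallest value whose cumulative weight share reaches τ is one assumed convention for resolving the non-unique solution of the step-function equation.

```python
def weighted_quantile(values, weights, tau):
    """Smallest value whose cumulative weight share is at least tau."""
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie strictly between 0 and 1")
    pairs = sorted(zip(values, weights))
    total = sum(w for _, w in pairs)
    cum = 0.0
    for v, w in pairs:
        cum += w
        if cum >= tau * total:
            return v
    return pairs[-1][0]


if __name__ == "__main__":
    # Three respondents with design weight 1 each.
    values = [2.0, 4.0, 9.0]
    weights = [1.0, 1.0, 1.0]
    # One nonrespondent with design weight 1 and three imputed values,
    # each carrying fractional weight 1/3 (illustrative numbers only).
    imputed = [3.0, 5.0, 10.0]
    values += imputed
    weights += [1.0 / 3.0] * 3
    print(weighted_quantile(values, weights, 0.5))
```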
4.2 Stochastic imputation
For a multi-purpose survey, stochastic imputation is often preferred to deterministic imputation since it can preserve distributional relationships better. In stochastic imputation, imputed values are generated from a stochastic imputation mechanism, which introduces additional variability due to the imputation. For simplicity, we only consider the case where the imputed estimating function is a smooth function of η and β. The results can be naturally extended to non-smooth statistics. The stochastic imputation estimator can be obtained by solving the following estimating equation
where are randomly selected from with the selection probability, where are the fractional weights in (11). Since
where the conditional expectation is with respect to the stochastic imputation mechanism, we have
Thus, using an argument similar to Theorem 1, we can obtain
| (15) |
where Therefore, a consistent variance estimator can be written as
where
| (16) |
and can be obtained similarly to (13) and
The second term of (16) estimates the additional variance due to stochastic imputation. If M is large, the second term is negligible.
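The selection step of the stochastic version can be sketched as follows. This is an illustration under assumptions rather than the authors' procedure: M residuals are drawn with replacement from the respondents' residuals with selection probabilities equal to the fractional weights, each draw is added to the fitted value for the nonrespondent, and each selected value is given weight 1/M. The function name `draw_imputations` and its arguments are hypothetical.

```python
import random


def draw_imputations(m_i, residuals, w_hat, M, rng):
    """Draw M imputed values for one nonrespondent.

    m_i       : fitted regression value for the nonrespondent
    residuals : residuals observed among respondents
    w_hat     : selection probabilities (fractional weights) for the residuals
    Returns a list of (imputed value, weight) pairs with equal weights 1/M.
    """
    picks = rng.choices(residuals, weights=w_hat, k=M)
    return [(m_i + e, 1.0 / M) for e in picks]


if __name__ == "__main__":
    rng = random.Random(2024)
    draws = draw_imputations(6.1, [-1.0, 0.0, 1.0], [0.2, 0.3, 0.5], M=5, rng=rng)
    for value, weight in draws:
        print(value, weight)
```

As M grows, the average of the selected values approaches the deterministic fractionally imputed mean Σ ŵj (m_i + ε̂j), which is consistent with the remark that the extra variance term becomes negligible for large M.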
5 Replication variance estimation
Estimating the variance of the estimator can be done through the linearization formulas presented in Section 3 for smooth statistics and the formulas in Section 4 for non-smooth statistics, respectively. However, it requires tedious algebra to compute all the terms. In this section, we consider an alternative approach using replication methods. Shao and Tu (1995) considered the theoretical aspects of replication methods such as Jackknife and Bootstrap. Wolter (2007) gives a comprehensive overview of replication variance estimation methods in survey sampling.
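Suppose we are interested in estimating the population total $T = \sum_{i=1}^{N} y_i$. Define the design weight as $d_i = \pi_i^{-1}$. The design-unbiased estimator of T is $\hat{T} = \sum_{i \in A} d_i y_i$, and a consistent replication variance estimator of $\hat{T}$ is given by
$$\hat{V}_{rep}(\hat{T}) = \sum_{k=1}^{L} c_k \left( \hat{T}^{(k)} - \hat{T} \right)^2,$$
where there are L replication weights, ck is the replication factor associated with the k-th replication and $\hat{T}^{(k)} = \sum_{i \in A} d_i^{(k)} y_i$, with $d_i^{(k)}$ being the k-th replicate of di. For example, ck = (L − 1)/L for the delete-one-group jackknife method. For details of the ck corresponding to different variance estimation approaches, see Wolter (2007).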
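The delete-one-group jackknife mentioned above can be sketched as follows. The replication factor ck = (L − 1)/L is taken from the text; the replicate-weight rule (zero for the deleted group, di L/(L − 1) otherwise) is the usual textbook choice and is an assumption here. The function names `ht_total` and `jackknife_variance` are hypothetical. Any estimator that takes data and weights, such as an SFI estimator recomputed with replicate weights, could be passed in place of `ht_total`.

```python
def ht_total(y, d):
    """Weighted total: sum of d_i * y_i over the sample."""
    return sum(di * yi for di, yi in zip(d, y))


def jackknife_variance(y, d, groups, estimator=ht_total):
    """Delete-one-group jackknife variance with c_k = (L - 1) / L."""
    labels = sorted(set(groups))
    L = len(labels)
    if L < 2:
        raise ValueError("need at least two groups")
    full = estimator(y, d)
    c_k = (L - 1) / L
    var = 0.0
    for k in labels:
        # Assumed replicate weights: drop group k, inflate the remaining groups.
        d_k = [0.0 if g == k else di * L / (L - 1) for di, g in zip(d, groups)]
        var += c_k * (estimator(y, d_k) - full) ** 2
    return var


if __name__ == "__main__":
    y = [1.0, 2.0, 3.0, 4.0]
    d = [1.0, 1.0, 1.0, 1.0]
    groups = [0, 1, 2, 3]  # each unit is its own group
    print(jackknife_variance(y, d, groups))
```

With each unit as its own group and equal weights, the result equals n times the sample variance of y, which is the familiar variance estimator of an equal-weight total.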
To obtain replication variance estimator of our proposed SFI estimator, we apply the same SFI method to each of the replicates. In the first step, we obtain the k-th replicate of by solving
In the second step, the replicated EL weights are computed by maximizing
subject to constraints
with . In the final step, the replicated SFI estimator is computed using the replicated EL weights. For smooth statistics, the k-th replicate of , denoted by , is obtained by the solution to the following estimating equation
where and . The final replication variance estimator of is given by
For non-smooth statistics, our estimator is similar to that of Wang and Opsomer (2011). Define
where Ê{εŪm(ε)} and are defined in Section 4.1, is defined in (11) with design weight replaced by replication weight and fractional weights replaced by replication fractional weights . Then the replication variance estimator can be written as:
with ∂Ê{U(η; x, y)} /∂η defined in Section 4.1.
6 Simulation studies
In this section, we conduct two limited simulation studies. The first one uses artificially generated data and the second one is based on real data treated as a finite population.
6.1 Simulation One
We repeatedly generate B = 2,000 finite populations of (xi, yi, δi) of size N = 10,000 from a super-population model
with xi ~ exp(1) and E(εi | xi) = 0. Two error distributions are considered: (E1) εi ~ N(0, 1) and (E2) ε ~ {χ2(2) − 2} /2. Given (x, y), the response indicator δ has a Bernoulli distribution with Pr(δ = 1|x) = {1 + exp(1 − x)}−1. The overall response rate is about 50%. Given each finite population (x, y, δ), we draw a sample by using a Poisson sampling design with the first-order inclusion probability , where n = 200 and zi = max{0.5yi + 2, 1} + ui, with ui ~ χ2(1) and χ2(1) corresponding to the chi-squared distribution with degrees of freedom equal to one. In this simulation, we are interested in estimating three parameters:
, the population mean of y.
, the proportion of y less than 1.
θ3 = F−1(0.5), the population median of y.
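The covariate, response mechanism and error distributions of this setup can be sketched in a few lines. The sketch covers only the parts stated in the text (x ~ Exp(1), Pr(δ = 1 | x) = {1 + exp(1 − x)}−1, and the two error laws); the regression function and the Poisson sampling step are omitted. The function names are hypothetical. Under these assumptions the overall response rate has the closed form ln(1 + e)/e ≈ 0.48, in line with the stated rate of about 50%.

```python
import math
import random


def response_prob(x):
    """Pr(delta = 1 | x) = 1 / {1 + exp(1 - x)}."""
    return 1.0 / (1.0 + math.exp(1.0 - x))


def draw_error(rng, dist):
    """Draw one error term with mean 0 and variance 1."""
    if dist == "E1":
        return rng.gauss(0.0, 1.0)
    if dist == "E2":
        # chi-square(2) is Gamma(shape=1, scale=2); centre and scale it.
        return (rng.gammavariate(1.0, 2.0) - 2.0) / 2.0
    raise ValueError("dist must be 'E1' or 'E2'")


def simulate_covariates(N, seed=0):
    """Generate x ~ Exp(1) and Bernoulli response indicators."""
    rng = random.Random(seed)
    x = [rng.expovariate(1.0) for _ in range(N)]
    delta = [1 if rng.random() < response_prob(xi) else 0 for xi in x]
    return x, delta


# Closed-form response rate: integral of exp(-x) / {1 + exp(1 - x)} over x > 0.
analytic_rate = math.log(1.0 + math.e) / math.e

if __name__ == "__main__":
    x, delta = simulate_covariates(10000, seed=1)
    print(sum(delta) / len(delta), analytic_rate)
```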
From each sample, we compute the following six estimators:
The complete-case (CC) estimator based only on the complete cases. The CC estimator is the solution to $\sum_{i \in A} \pi_i^{-1} \delta_i U(\eta; x_i, y_i) = 0$, where U(η; x, y) is the corresponding estimating function for each parameter.
-
Full sample estimator based on the original sampling without missing data and pseudo empirical likelihood method (Full). Specifically, we maximize , subject to the following constraints
where and is obtained by solving the following estimating equation:The full sample estimator serves as a benchmark for comparison.
The parametric fractional imputation (PFI) estimator of Kim (2011) assuming yi | xi ~ N(β0 + β1xi, σ2) with imputation size M = 100.
- The nonparametric fractional imputation (NFI) estimator that uses the following nonparametric fractional weights:
for each unit i ∈ A with δi = 0 and j ∈ A with δj = 1. We use the reference bandwidth with and . A Gaussian kernel density function Kx(t) = (2π)−1/2 exp(−t2/2) has also been used.
- The stochastic regression imputation (SRI) estimator assuming the following model: yi = β0 + β1xi + εi with E(εi) = 0 and V(εi) = σ2.
The proposed semiparametric fractional imputation (SFI) estimator .
From the Monte Carlo sample of size B = 2,000, the Monte Carlo bias, standard error and root mean squared error are computed for each point estimator. The results are presented in Table 1. Under (E1) and (E2), the CC estimators are heavily biased since the response mechanism is not missing completely at random (MCAR). Unless the response mechanism is MCAR, the CC estimator is biased. The FULL estimators always perform best since they assume no missing values and use moment condition (1). Under distribution (E1), the SFI and PFI estimators have similar performances. Among the four imputation estimators, the NFI and SRI estimators perform worst in terms of RMSE in all scenarios since they use less information.
Table 1.
The Monte Carlo Bias (×10−2), Standard Error (SE) (×10−2) and Root Mean Squared Error (RMSE) (×10−2) for six different methods with two error distributions in Simulation One.
| Par | Method | Bias (E1) | SE (E1) | RMSE (E1) | Bias (E2) | SE (E2) | RMSE (E2) |
|---|---|---|---|---|---|---|---|
| E(y) | CC | 18.6 | 13.0 | 22.7 | 20.0 | 19.8 | 28.1 |
| | FULL | 0.1 | 5.9 | 5.9 | 0.3 | 9.0 | 9.0 |
| | PFI | −0.1 | 7.8 | 7.8 | 0.7 | 12.0 | 12.0 |
| | NFI | 0.7 | 12.9 | 12.9 | 2.9 | 21.3 | 21.5 |
| | SRI | −0.2 | 13.9 | 13.9 | 1.7 | 23.3 | 23.3 |
| | SFI | −0.2 | 6.9 | 6.9 | 0.3 | 12.4 | 12.4 |
| Pr(y < 1) | CC | −6.3 | 5.3 | 8.2 | −3.5 | 4.8 | 5.9 |
| | FULL | 0.0 | 3.0 | 3.0 | 0.0 | 2.6 | 2.6 |
| | PFI | 0.2 | 3.1 | 3.1 | −5.4 | 3.1 | 6.2 |
| | NFI | −0.1 | 5.1 | 5.1 | −0.5 | 4.9 | 4.9 |
| | SRI | 0.2 | 5.0 | 5.0 | −0.4 | 5.0 | 5.0 |
| | SFI | 0.1 | 3.2 | 3.2 | 0.1 | 3.3 | 3.3 |
| Quantile | CC | 15.0 | 13.9 | 20.4 | 21.9 | 22.8 | 31.6 |
| | FULL | 0.1 | 7.8 | 7.8 | 0.2 | 13.2 | 13.2 |
| | PFI | −0.3 | 8.5 | 8.5 | 30.0 | 14.5 | 33.3 |
| | NFI | −0.8 | 14.9 | 14.9 | 2.7 | 25.3 | 25.4 |
| | SRI | −1.0 | 15.7 | 15.7 | 2.4 | 25.4 | 25.5 |
| | SFI | −0.5 | 9.1 | 9.1 | 0.2 | 17.0 | 17.0 |
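The three summary columns in Table 1 are linked by RMSE² = Bias² + SE² when the Monte Carlo SE is the standard deviation of the B point estimates computed with divisor B. The sketch below shows how such summaries can be computed from a list of replicated estimates; the function name `mc_summary` is hypothetical and the divisor-B convention is an assumption, since the text does not state which divisor was used.

```python
import math


def mc_summary(estimates, truth):
    """Monte Carlo bias, standard error and RMSE of a point estimator."""
    B = len(estimates)
    mean = sum(estimates) / B
    bias = mean - truth
    # Divisor B (not B - 1) so that rmse**2 == bias**2 + se**2 holds exactly.
    se = math.sqrt(sum((e - mean) ** 2 for e in estimates) / B)
    rmse = math.sqrt(sum((e - truth) ** 2 for e in estimates) / B)
    return bias, se, rmse


if __name__ == "__main__":
    bias, se, rmse = mc_summary([1.0, 2.0, 3.0], truth=1.5)
    print(bias, se, rmse)
    # Consistency check against one table row: CC estimator of E(y) under (E1).
    print(math.hypot(18.6, 13.0))
```

For the CC row of E(y) under (E1), the bias 18.6 and SE 13.0 combine to about 22.7, matching the reported RMSE.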
Under model (E2), the SFI estimator shows negligible bias for all parameters, but the PFI estimator has non-negligible bias for estimating the proportion and the quantile, which is due to the misspecification of the error distribution. The NFI and SRI estimators are not as efficient as the SFI estimator in terms of bias and variance. In terms of RMSE, the SFI estimator outperforms the NFI and SRI estimators for all three parameters and the PFI estimator for the proportion and the quantile; for the mean, the PFI and SFI estimators are comparable. The overall results indicate the robustness of SFI. For variance estimation, we computed the relative bias based on the Taylor linearization and replication methods, respectively. All relative biases are below 7%. In addition, we calculate the Monte Carlo coverage rate for the 95% confidence intervals. Under model (E1), the coverage rates are 94.8%, 93.4% and 95.0% for estimating the mean, proportion and quantile using the Taylor method and 94.9%, 93.6% and 95.1% using the replication method. The results under model (E2) are similar and the coverage rates are close to the nominal rate.
6.2 Simulation Two
In the second simulation study, we use the 2013–2014 U.S. National Health and Nutrition Examination Survey (NHANES) data as a pseudo finite population. The study variable is systolic blood pressure (BPXSY1) and the covariate is body mass index (BMXBMI). Keeping only the cases where both BPXSY1 and BMXBMI are greater than zero, the pseudo finite population contains 7104 cases. The scatter plot of BPXSY1 versus BMXBMI is presented in Figure 1. We assume BPXSY1 is roughly linear in BMXBMI. After performing a linear regression of BPXSY1 on BMXBMI, the QQ plot of the residuals and the plot of residuals versus fitted values are presented in Figure 2. The residual plots suggest deviation from normality. The p-value from the Anderson-Darling test for normality is less than 2.2 × 10−16. We first generate response indicators δi, i = 1, 2, …, 7104, from the following logistic regression model:
Figure 1.

Scatter plot of BPXSY1 vs BMXBMI
Figure 2.

QQ plot (left panel) and Residual vs fitted value plot (right panel)
The response rate is around 60%. Then, given (BPXSY1i, BMXBMIi, δi), B = 2,000 Monte Carlo samples of size n = 200 are generated by simple random sampling. The parameters of interest are:
(Mean). Finite population mean of BPXSY1, which is θm = 118.056.
- (Prop1). Finite population proportion one of BPXSY1:
- (Prop2). Finite population proportion two of BPXSY1:
- (Prop3). Finite population proportion three of BPXSY1:
We consider the same PFI, NFI, SRI and SFI estimators as discussed in Simulation One. The Monte Carlo bias, standard error and root mean squared error (RMSE) are presented in Table 2. For the population mean, PFI and SFI perform similarly and the NFI estimator has slightly larger bias and standard error. SRI has bias comparable to that of PFI and SFI, but it has a larger SE, as expected. For population proportions, the PFI estimator has substantially larger bias than NFI, SRI and SFI, which may be due to the misspecification of the error distribution. The NFI and SRI estimators have larger standard errors than the PFI and SFI estimators, since nonparametric methods are not as efficient as parametric or semiparametric methods and stochastic imputation produces larger variance. Overall, the SFI estimator performs best in terms of both bias and variance.
Table 2.
The Monte Carlo Bias (×10−2), Standard Error (SE) (×10−2) and Root Mean Squared Error (RMSE) (×10−2) for five different methods and four parameters.
| Par | Method | Bias | SE | RMSE |
|---|---|---|---|---|
| Mean | COM | −2.9 | 124.8 | 124.9 |
| | PFI | −2.3 | 153.2 | 153.2 |
| | NFI | −5.0 | 153.5 | 153.6 |
| | SRI | 1.4 | 169.7 | 169.7 |
| | SFI | −2.2 | 153.3 | 153.3 |
| Prop1 | COM | 0.0 | 0.2 | 0.2 |
| | PFI | 0.5 | 0.3 | 0.6 |
| | NFI | 0.0 | 0.3 | 0.3 |
| | SRI | 0.1 | 0.3 | 0.3 |
| | SFI | 0.0 | 0.2 | 0.2 |
| Prop2 | COM | 0.0 | 3.4 | 3.4 |
| | PFI | −2.2 | 3.8 | 4.4 |
| | NFI | −0.5 | 4.2 | 4.3 |
| | SRI | 0.5 | 4.2 | 4.3 |
| | SFI | 0.2 | 3.9 | 3.9 |
| Prop3 | COM | 0.0 | 1.2 | 1.2 |
| | PFI | 0.7 | 1.1 | 1.3 |
| | NFI | 0.2 | 1.4 | 1.4 |
| | SRI | −0.3 | 1.6 | 1.6 |
| | SFI | 0.1 | 1.4 | 1.4 |
7 Conclusions
Regression imputation is often used to handle item nonresponse in survey sampling. Unlike the usual regression imputation, the proposed semiparametric fractional imputation offers valid inference for a wide set of parameters, such as population proportions and quantiles. Moreover, only the first moment assumption is needed to obtain a consistent SFI estimator of the parameter, which leads to robust parameter estimation. The proposed SFI method performs well in the limited simulation studies.
The proposed method suggests several topics for future research. First, instead of assuming an ignorable response mechanism, we can consider an extension to nonignorable nonresponse (Kim and Yu, 2011) using an exponential tilting response model. An extension of SFI to multivariate missing data is another important topic for future research.
Appendix
A: Proof of Theorem 1
We first assume the following regularity conditions:
- (C1) The finite population is a random sample from the semiparametric regression model in (1). The regression function m(x; β) in (1) has a continuous first derivative ∂m(x; β)/∂β in the neighborhood of the true value β0, and E {m2(x; β)} and E {∂m(x; β)/∂β} are bounded in this neighborhood.
- (C2) The function h(x; β) in the estimating function Ûβ(β) in (8) has a continuous first derivative ∂h(x; β)/∂β in the neighborhood of the true value β0, and ‖h(x; β)‖2 and ‖∂h(x; β)/∂β‖ are bounded by some integrable function G1(x) in the neighborhood.
- (C3) The model error term in (1) satisfies E(ε2) < ∞ and max {‖εi‖: i ∈ A} = op(n1/2).
- (C4) Let Uβ(β) = E[δ{y − m(x; β)}h(x; β)]. Assume that Ûβ(β) converges to Uβ(β) in probability uniformly in the neighborhood of the true value β0. For every a > 0, .
- (C5) ∂Ûβ(β)/∂β converges to the continuous nonsingular derivative ∂Uβ(β)/∂β in probability uniformly in the neighborhood of the true value β0.
- (C6) , as n, N → ∞, where denotes the design model variance, the variance under the joint distribution of the superpopulation model and the sampling mechanism.
- (C7) The function U(η; x, y) has continuous partial derivatives ∂U(η; x, y)/∂η and ∂U(η; x, y)/∂y in the neighborhood of the true value η0, and ‖U(η; x, y)‖2, ‖∂U(η; x, y)/∂η‖ and ‖∂U(η; x, y)/∂y‖ are bounded by some integrable function G2(x, y) in the neighborhood.
- (C8) Let and U(η) = E{U(η; xi, yi)}; then Ûn(η) converges to U(η) in probability uniformly in the neighborhood of the true value η0. For every a > 0,
- (C9) ∂Ûn(η)/∂η converges to the continuous nonsingular derivative ∂U(η)/∂η in probability uniformly in the neighborhood of the true value η0.
- (C10) , as n, N → ∞, where denotes the design model variance.
- (C11) The first-order inclusion probabilities satisfy KL ≤ Nn−1πi ≤ KU for all i, where KL and KU are positive constants.
- (C12) for any i, j = 1, 2, …, N and i ≠ j, where πij is the second-order inclusion probability of units i and j in the population.
- (C13) The response probability satisfies (2) and a < Pr(δi = 1|xi) ≤ 1 for i = 1, 2, …, N, for some fixed a > 0.
Conditions (C1)–(C2) are the model assumptions about the finite population. Condition (C3) is used to control the asymptotic order of the quantity in (10). Chen and Sitter (1999, Appendix 2) argued that (C3) holds for common unequal probability sampling designs. Conditions (C4) and (C8) ensure the consistency of β̂ and η̂, respectively. Conditions (C5), (C6), (C9) and (C10) are the regularity conditions that ensure the asymptotic normality of β̂ and η̂. Van der Vaart (1998, Ch. 5) used similar regularity conditions. Specifically, Conditions (C6) and (C10) have been used widely in the existing literature, for example by Wu and Rao (2006) and Wang and Opsomer (2011). Hajek (1960, 1964) established asymptotic normality under simple random sampling and under rejective sampling with unequal selection probabilities. Visek (1979) established asymptotic normality for the Horvitz-Thompson estimator under Rao-Sampford sampling designs. Condition (C7) controls the smoothness and asymptotic behavior of the estimating function U(η; x, y). Conditions (C11) and (C12) are the standard assumptions for the sampling designs. Similar conditions have been used in Isaki and Fuller (1982) and Wang and Opsomer (2011). Condition (C13) controls the behavior of the individual response probability.
According to assumption (C3) and by using techniques similar to those of Wu and Rao (2006), we can show that . Assumption (C4) and Taylor linearization can establish
Therefore,
| (A.1) |
We know that is the solution of the following estimating equation
In addition, we have
| (A.2) |
and
| (A.3) |
Based on (A.2), (A.3), by using Taylor linearization, we have
| (A.4) |
According to (A.1)–(A.4) and after some algebra, it can be shown that
| (A.5) |
where σ2 is the variance for the residuals. With condition (C6), it can be shown that . In addition, we have
| (A.6) |
| (A.7) |
and
| (A.8) |
where Ūm(ε) = E{(1 − δ) U(η0; x, y)|ε} and
with l(ε) = −f′(ε)/f(ε). Define
then by using Taylor linearization,
with E(S) = E{(1 − δ)U(η0; x, y)} and . According to the Hoeffding decomposition,
Therefore,
| (A.9) |
According to Taylor linearization, we have
| (A.10) |
By (A.1),(A.5)–(A.10), after some algebra, we can show that
where ζi is defined in (12) of Theorem 1.
B: Proof of Theorem 2
We replace regularity conditions (C7)–(C10) in Appendix A with the following regularity conditions (C14)–(C17):
- (C14) Ũn(θ) converges to Ũ(θ) in probability uniformly in the neighborhood of the true value θ0. For every a > 0.
- (C15) There exists a measurable function L(δ, x, y) with E {L2(δ, x, y)} < ∞ such that, for every θ1 and θ2 in the neighborhood of the true value θ0, ‖Ũ(θ1; δ, x, y) − Ũ(θ2; δ, x, y)‖ ≤ L(δ, x, y)‖θ1 − θ2‖.
- (C16) Assume that and has continuous and invertible first derivatives with respect to θ and the corresponding first derivatives are bounded by some integrable function in the neighborhood of the true value θ0.
- (C17) , as n, N → ∞, where denotes the design model variance.
Similar to conditions (C4) and (C8), condition (C14) ensures the consistency of the proposed estimator. Conditions (C15) and (C16) are required to derive the asymptotic expansion of the proposed estimator. See Van der Vaart (1998, Ch. 5) for more details on these conditions. Similar to conditions (C6) and (C10), condition (C17) is used to derive the central limit theorem.
The proof of the consistency of and is similar to the relevant proof in Theorem 1. According to the regularity conditions (C10), (C11), (C12) and by using similar techniques as that of Theorem 19.26 of Van der Vaart (1998), we can show that
| (B.1) |
In addition, we have
| (B.2) |
and
| (B.3) |
where D* is defined in Theorem 2. According to (A.1), (A.5), (A.6), (A.9), (B.1)–(B.3), we have
where ζi is defined in Theorem 2.
References
- Chen J, Sitter R. A pseudo empirical likelihood approach to the effective use of auxiliary information in complex surveys. Statistica Sinica. 1999;9:385–406. [Google Scholar]
- Chauvet G, Deville JC, Haziza D. On balanced random imputation in surveys. Biometrika. 2011;98:459–471. [Google Scholar]
- Durrant GB. Imputation methods for handling item-nonresponse in the social sciences: a methodological review. ESRC National Center for Research Methods and Southampton Stat Sci.s Research Institute NCRM Methods Review Papers NCRM/002 2005 [Google Scholar]
- Durrant GB, Skinner C. Using missing data methods to correct for measurement error in a distribution function. Survey Methodology. 2006;32(1):25–36.
- Fay RE. When are inferences from multiple imputation valid? Proceedings of the Survey Research Methods Section of the American Statistical Association. 1992;81:227–32.
- Fay RE. Alternative paradigms for the analysis of imputed survey data. Journal of the American Statistical Association. 1996;91(434):490–498.
- Fuller WA, Kim JK. Hot deck imputation for the response model. Survey Methodology. 2005;31:139–149.
- Fuller WA. Sampling Statistics. Wiley; Hoboken, NJ: 2009.
- Haziza D. Imputation and inference in the presence of missing data. In: Pfeffermann D, Rao CR, editors. Handbook of Statistics. Vol. 29, Sample Surveys: Theory, Methods and Inference. Amsterdam: Elsevier BV; 2009. pp. 215–46.
- Kalton G, Kish L. Some efficient random imputation methods. Communications in Statistics A. 1984;13:1919–1939.
- Kim JK, Fuller WA. Fractional hot deck imputation. Biometrika. 2004;91(3):559–578.
- Kim JK, Brick J, Fuller WA, Kalton G. On the bias of the multiple-imputation variance estimator in survey sampling. Journal of Royal Statistical Society: Series B. 2006;68(3):509–521.
- Kim JK, Navarro A, Fuller WA. Replicate variance estimation after multi-phase stratified sampling. Journal of the American Statistical Association. 2006;101:312–320.
- Kim JK. Parametric fractional imputation for missing data analysis. Biometrika. 2011;98:119–132.
- Kim JK, Yu CL. A semi-parametric estimation of mean functionals with non-ignorable missing data. Journal of the American Statistical Association. 2011;106:157–165.
- Kim JK, Shao J. Statistical methods for handling incomplete data. London: Chapman and Hall/CRC; 2013.
- Kim JK, Yang S. Fractional hot deck imputation for robust inference under item nonresponse in survey sampling. Survey Methodology. 2014;40:211–230.
- Little RJA, Rubin DB. Statistical Analysis With Missing Data. 2nd ed. Hoboken, NJ: Wiley; 2002.
- Matloff NS. Use of regression functions for improved estimation of means. Biometrika. 1981;68:685–689.
- McCullagh P, Nelder J. Generalized Linear Models. London: Chapman and Hall; 1989.
- Meng XL. Multiple-imputation inferences with uncongenial sources of input. Statistical Science. 1994;9:538–558.
- Müller UU. Estimating linear functionals in nonlinear regression with response missing at random. Annals of Statistics. 2009;37:2245–2277.
- Owen AB. Empirical Likelihood. Chapman and Hall/CRC; New York: 2001.
- Qin J. Empirical likelihood in biased sample problems. Annals of Statistics. 1993;21(3):1182–1196.
- Qin J, Lawless J. Empirical likelihood and general estimating equations. The Annals of Statistics. 1994;22:300–325.
- Rubin DB. Multiple Imputation for Nonresponse in Surveys. Wiley; 1987.
- Shao J, Tu D. The Jackknife and Bootstrap. Springer; 1995.
- Shao J, Steel P. Variance estimation for survey data with composite imputation and nonnegligible sampling fractions. Journal of the American Statistical Association. 1999;94:254–265.
- Vardi Y. Empirical distributions in selection bias models. Annals of Statistics. 1985;13:178–203.
- Van der Vaart AW. Asymptotic Statistics. New York: Cambridge University Press; 1998.
- Víšek JA. Asymptotic distribution of simple estimate for rejective, Sampford and successive sampling. In: Jurečková J, editor. Contributions to Statistics: Jaroslav Hájek Memorial. Academia, Prague & D. Reidel; Dordrecht: 1979. pp. 263–275.
- Wang N, Robins JM. Large-sample theory for parametric multiple imputation procedures. Biometrika. 1998;85(4):935–948.
- Wang Q, Rao JNK. Empirical likelihood-based inference under imputation for missing response data. The Annals of Statistics. 2002;30:896–924.
- Wang D, Chen SX. Empirical likelihood for estimating equations with missing values. The Annals of Statistics. 2009;37:490–517.
- Wang JQ, Opsomer JD. On asymptotic normality and variance estimation for nondifferentiable survey estimators. Biometrika. 2011;98:91–106.
- Wolter KM. Introduction to Variance Estimation. Wiley; New York: 2007.
- Wu C, Rao JNK. Pseudo empirical likelihood ratio confidence intervals for complex surveys. The Canadian Journal of Statistics. 2006;34:359–375.
- Yang S, Kim JK. A Note on Multiple Imputation for General-Purpose Estimation. Biometrika. 2016;103:244–251.
