Abstract
For complex survey data, the parameters in a quantile regression can be estimated by minimizing an objective function with units weighted by the original design weights. However, when the complex survey sampling design is informative (i.e., when the design weights are correlated with the study variable even after conditioning on other covariates), the efficiency of this design-weighted estimator may be improved. In this article, we propose several weight-smoothing estimators for quantile regression analysis of complex survey data collected with an informative sampling design. Our new estimators incorporate nonparametric methods for modeling the weight functions and pseudo-population bootstrap methods for variance estimation. A simulation study compares our proposed methods with the original design-based method in terms of bias, standard error, mean squared error, and confidence coverage. Our proposed estimators have smaller bias and mean squared error than does the design-based estimator. We further illustrate and compare estimators for the 1988 US National Maternal and Infant Health Survey.
Keywords: Complex survey, Informative sampling, Nonparametric, Quantile regression, Weight-smoothing
1. INTRODUCTION
Researchers often use data collected from complex survey designs to draw scientific conclusions. For instance, Nelson, Powell-Griner, Town, and Kovar (2003) compared national estimates of smoking, height, and diabetes by using the National Health Interview Survey (NHIS) and the Behavioral Risk Factor Surveillance System (BRFSS). Harrington, Barreira, Staiano, and Katzmarzyk (2014) used the National Health and Nutrition Examination Survey (NHANES) 2009/2010 to estimate the amount of time that the US population spent sitting by age, sex, ethnicity, education, and body mass index. It is well known that statistical analysis ignoring design features, including stratification, clustering, and unequal weighting, may lead to biased results (see Pfeffermann and Sverchkov 1999, 2003, and 2009, among others). Generalized linear and mixed models with complex survey designs were developed in studies by Chambers and Skinner (2003) and Heeringa, West, and Berglund (2010).
The sampling design is informative when sample inclusion is related to the outcome variable conditional on covariates (Fuller 2009). For such designs, survey weights are often used in regression analysis of survey data to ensure consistent estimation of parameters. For example, the sampling design of NHANES (2013–2014) was informative since the first stage strata were built by using county-level health characteristics that are correlated with the study variables of interest, given the covariates. The traditional design-based approach using the original design weights leads to unbiased estimates, but the efficiency can be improved. One approach is a likelihood-based method that maximizes the conditional sample likelihood by using the joint model of study variable and sampling indicator (see Chambers 2003; Pfeffermann and Sverchkov 2009; Pfeffermann 2011; Scott and Wild 2011, among others). A second approach replaces the original design weights by predictions from a model for the conditional distribution of the design weights given the data, as in Magee (1998), Pfeffermann and Sverchkov (1999), Beaumont (2008), Fuller (2009), and Kim and Skinner (2013). In particular, Kim and Skinner (2013) proposed optimal weight modifications compared with other methods under generalized linear models.
These statistical methods model the conditional mean values of the study variables by regression. Such models may be suboptimal when the distribution of study variables is skewed or has outliers. In such cases, quantile regression (QR) (Koenker and Bassett 1978; Koenker 2005) is an effective tool for conditional modeling, providing robustness against outliers and a more comprehensive analysis of the relationship between variables than is offered by the conditional mean model.
There is a rich literature on the use of QR for data collected by simple random sampling; for example, see He and Shao (1996), Knight (1998), Mu and He (2007), and references therein. Deaton (1997) and Cameron and Trivedi (2005) applied QR to survey data while ignoring the complex sampling scheme, so their estimates may be biased if the original sampling design is informative. Only a few research articles discuss QR estimates that account for a complex survey sampling scheme, including Li, Graubard, and Korn (2010) and Geraci (2016). These articles do not discuss QR for data collected using informative sampling, which is the topic of the present article. The quantile regression coefficients are defined at the super-population level, and consistency depends on the quantile regression model assumption. Specifically, we extend several weight-smoothing estimators to the QR setting, including the unsmoothed and smoothed optimal estimators of Kim and Skinner (2013) and the estimators proposed by Beaumont (2008) and Pfeffermann and Sverchkov (1999). Some of the estimators we consider (Design-Weighted [DW], Pfeffermann-Sverchkov [PS], Unsmoothed Optimal [UOPT]) are design-consistent, even when model (1) does not hold. Other estimators (Smoothed Design-Weighted [SDW], Smoothed Pfeffermann-Sverchkov [SPS], Smoothed Optimal [SOPT]) are consistent if the corresponding weight models are correct.
The remainder of the article is organized as follows: After preliminaries in section 2, our weight-smoothing estimators are proposed and developed in section 3. In section 4, we describe algorithms for computing our proposed estimators. Variance estimation is presented in section 5. A simulation study is described in section 6, and a real-data-based simulation study using the 1988 US National Maternal and Infant Health Survey is presented in section 7. In section 8, we conclude with a discussion.
2. QUANTILE REGRESSION AND THE DESIGN-BASED ESTIMATOR
Suppose the finite population $\{(x_i, y_i, z_i) : i = 1, \ldots, N\}$ is generated from a super-population model, where $x_i$ is a vector of covariates, $y_i$ is the study variable, and $z_i$ is the design variable, which may not be observed. We assume that $y_i$ given $x_i$ follows the QR model

$$y_i = x_i^\top \beta_\tau + \epsilon_i, \tag{1}$$

where $\beta_\tau$ is an unknown coefficient vector, and $\epsilon_i$ is an error term such that $P(\epsilon_i \le 0 \mid x_i) = \tau$ for the $\tau$-th quantile, $\tau \in (0, 1)$. A complex survey sample $S$ is drawn with sampling indicator $I_i$ such that $I_i = 1$ if unit $i$ is selected and 0 otherwise, $i = 1, \ldots, N$. The first- and second-order inclusion probabilities are denoted as $\pi_i$ for selecting unit $i$ and $\pi_{ij}$ for jointly selecting unit $i$ and unit $j$. Consequently, the corresponding design weight for unit $i$ is $d_i = \pi_i^{-1}$, which is known for units in $S$.
Following Kim and Skinner (2013), the sampling design is assumed to be informative in the sense that the design weights are functions of the covariates and the design variable; that is, with , and . The sampling is informative if $y$ and $z$ are related, conditional on $x$. Under informative sampling, we have $P(I_i = 1 \mid x_i, y_i) \neq P(I_i = 1 \mid x_i)$ (i.e., the selection of unit $i$ depends not only on the covariates but also on the study variable). The design-weighted (DW) estimator of the QR coefficients is

$$\hat\beta_{DW} = \arg\min_{\beta} \sum_{i \in S} d_i \, \rho_\tau(y_i - x_i^\top \beta),$$

where $\rho_\tau(u) = u\{\tau - I(u < 0)\}$ is the check function. By an argument similar to that of Koenker (2005) and after some algebra, it can be shown that $\hat\beta_{DW}$ is the solution of the following estimating equation:

$$\sum_{i \in S} d_i \, x_i \{\tau - I(y_i - x_i^\top \beta < 0)\} = 0. \tag{2}$$

The estimator $\hat\beta_{DW}$ is consistent for estimating $\beta_\tau$ by an argument similar to that of Wang and Opsomer (2011). Its efficiency may be further improved by modeling the design weights as described in the next section.
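To make the design-weighted objective concrete, the following minimal sketch minimizes the weighted check loss for a model with an intercept and a single covariate. It is an illustration under stated assumptions, not the authors' implementation: the function names (`check_loss`, `weighted_objective`, `dw_quantile_fit`) and the toy data are hypothetical. The brute-force search over lines through pairs of sample points is used only because, for this piecewise-linear objective, a minimizer can be found among such lines when the sample contains at least two distinct covariate values; a linear-programming solver would be the practical choice for real data.

```python
from itertools import combinations


def check_loss(u, tau):
    """Quantile check function rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def weighted_objective(b0, b1, x, y, d, tau):
    """Design-weighted check loss: sum_i d_i * rho_tau(y_i - b0 - b1 * x_i)."""
    return sum(di * check_loss(yi - b0 - b1 * xi, tau)
               for xi, yi, di in zip(x, y, d))


def dw_quantile_fit(x, y, d, tau):
    """Exhaustive search over lines through pairs of sample points."""
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue  # two points with equal x do not define a regression line
        b1 = (y[j] - y[i]) / (x[j] - x[i])
        b0 = y[i] - b1 * x[i]
        loss = weighted_objective(b0, b1, x, y, d, tau)
        if best is None or loss < best[0]:
            best = (loss, b0, b1)
    return best[1], best[2]


# Toy sample (hypothetical values): covariate, outcome, design weights d_i = 1 / pi_i.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.2, 1.9, 3.4, 3.8, 5.5, 5.9]
d = [10.0, 10.0, 25.0, 25.0, 40.0, 40.0]

b0, b1 = dw_quantile_fit(x, y, d, tau=0.5)
print(f"DW median regression: intercept={b0:.3f}, slope={b1:.3f}")
```

Replacing `d` with any other positive weights gives the corresponding weighted estimator, which is the only change the weight-smoothing approaches make to this objective.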
3. PROPOSED METHODS
We propose five new weight-smoothing estimators of the QR coefficients in generalized linear models. Specifically, we consider estimators that satisfy the following estimating equation:

$$\sum_{i \in S} w_i \, x_i \{\tau - I(y_i - x_i^\top \beta < 0)\} = 0, \tag{3}$$

where the weights $d_i$ in (2) are replaced by new weights $w_i$ chosen to improve efficiency. All of the weight-smoothing methods were initially developed for regression analysis of mean values of the study variable. We adapt these methods to our QR problem.
3.1 Smoothed Design-Weight (SDW) Estimator
Beaumont (2008) used a smoothing weight to estimate the population mean of y. Kim and Skinner (2013) extended the idea by using in the context of linear regression to obtain regression coefficient estimates. For our QR analysis, we use the same weights in (4) and denote the corresponding estimator as . As in Kim and Skinner (2013), we can show that is consistent and if the conditional expectation is correctly modeled.
In general, in (3) is unknown. To estimate , one can use a parametric model, such as a linear or nonlinear regression method, or a nonparametric model, such as splines. However, the parametric model approach is vulnerable to model misspecification, and the nonparametric model approach is subject to the well-known curse of dimensionality if the dimension of covariate x is large. Instead, we fit the following generalized additive model (GAM) to estimate (Hastie and Tibshirani 1990).
| (4) |
where xit is the t-th variable in g0 is an unknown parameter, are unknown functions that satisfy certain regularity conditions, and ei is assumed to have normal distribution with mean 0 and variance . Model (4) is quite general and can be easily extended to more general cases with unequal variance and non-Gaussian exponential family distributions. For simplicity, we only consider (4) with lower-order spline functions and Gaussian errors with constant variance. After obtaining estimators , and we estimate by using
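The idea of replacing each design weight by an estimate of its conditional expectation given the covariate and the study variable can be sketched as follows. This is a simplified stand-in rather than the GAM described above: it uses a Nadaraya-Watson kernel smoother with one covariate, and the function names, bandwidths (`hx`, `hy`), and toy values are all hypothetical choices made for illustration.

```python
import math


def gaussian_kernel(u):
    """Unnormalized Gaussian kernel; the constant cancels in the ratio below."""
    return math.exp(-0.5 * u * u)


def smooth_weights(x, y, d, hx, hy):
    """Kernel-weighted average of d around each sample point (x_i, y_i).

    This approximates the sample conditional expectation of the design
    weight given (x, y); it is NOT the spline-based GAM fit of the paper.
    """
    smoothed = []
    for xi, yi in zip(x, y):
        k = [gaussian_kernel((xi - xj) / hx) * gaussian_kernel((yi - yj) / hy)
             for xj, yj in zip(x, y)]
        smoothed.append(sum(kj * dj for kj, dj in zip(k, d)) / sum(k))
    return smoothed


# Toy sample (hypothetical values): design weights tend to fall as y rises.
x = [0.1, 0.4, 0.5, 0.9, 1.3, 1.6, 2.0, 2.2]
y = [1.0, 1.4, 0.9, 2.1, 2.4, 3.3, 3.1, 4.0]
d = [30.0, 22.0, 35.0, 18.0, 15.0, 9.0, 12.0, 6.0]

d_smooth = smooth_weights(x, y, d, hx=0.5, hy=0.5)
for di, si in zip(d, d_smooth):
    print(f"design weight {di:5.1f} -> smoothed weight {si:6.2f}")
```

The smoothed weights vary less than the original ones, which is the source of the efficiency gain; the price is that consistency then relies on the weight model being adequate.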
3.2 Unsmoothed Pfeffermann-Sverchkov (PS) Estimator and Smoothed Pfeffermann-Sverchkov (SPS) Estimator
Pfeffermann and Sverchkov (1999) proposed weights where to produce efficient and consistent estimates of linear regression coefficients. We propose to obtain by using a similar technique to that used to obtain in section 3.1. The extension to quantile regression is trivial, and we denote the corresponding estimator as Note that is consistent even if the model is misspecified (see the justification for the consistency of UOPT estimator).
To further improve efficiency, Pfeffermann and Sverchkov (1999) proposed weights , which yield a consistent and a more efficient estimator if the weight model is correctly specified. We denote this SPS estimator as
The SPS estimator minimizes the following prediction distance function
Because
and
a consistent estimator can be obtained by solving (3) with
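As a hedged illustration of the two Pfeffermann-Sverchkov weightings, the sketch below assumes the forms commonly attributed to Pfeffermann and Sverchkov (1999): the design weight divided by its sample conditional mean given the covariate (PS), and the smoothed weight divided by the same quantity (SPS). To keep the example short, the covariate is taken to be categorical so that the conditional mean is a group mean; the function names and toy values are hypothetical, and the smoothed weights are simply supplied as input.

```python
from collections import defaultdict


def mean_by_group(groups, values):
    """Sample mean of `values` within each level of `groups`."""
    totals, counts = defaultdict(float), defaultdict(int)
    for g, v in zip(groups, values):
        totals[g] += v
        counts[g] += 1
    return {g: totals[g] / counts[g] for g in totals}


def ps_weights(x, d):
    """Assumed PS form: d_i divided by the sample mean of d at the level x_i."""
    m = mean_by_group(x, d)
    return [di / m[xi] for xi, di in zip(x, d)]


def sps_weights(x, d, d_smooth):
    """Assumed SPS form: smoothed weight divided by the sample mean of d at x_i."""
    m = mean_by_group(x, d)
    return [si / m[xi] for xi, si in zip(x, d_smooth)]


# Toy sample (hypothetical values) with a two-level covariate.
x = ["A", "A", "B", "B"]
d = [10.0, 30.0, 20.0, 40.0]
d_smooth = [12.0, 28.0, 24.0, 36.0]  # hypothetical smoothed weights

print("PS weights: ", ps_weights(x, d))
print("SPS weights:", sps_weights(x, d, d_smooth))
```

Dividing by the conditional mean removes the part of the weight variation that is explained by the covariate alone, which does not help correct for informativeness but does inflate variance.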
3.3 Unsmoothed and Smoothed Optimal (UOPT and SOPT) Estimators
In this section, we propose two novel optimal weight modification estimators. Under the correct weight models, one will be more efficient than , and the other will be more efficient than and We assume , and the sampling design is Poisson; Kim and Skinner (2013) also made this assumption to derive the optimal weight in linear regression models.
Consider a class of estimators that solves (3) with The UOPT estimator is obtained by choosing to minimize the variance of the following class of estimators:
or equivalently as the solution of the following estimating equations:
| (5) |
According to Koenker (2005), we have so
| (6) |
By the argument in Van der Vaart (1998, Chapter 5) and according to (6), it can be shown that is consistent for for arbitrary , under mild regularity conditions. After some algebra, it can be shown that has the following asymptotic expansion:
and the corresponding asymptotic conditional variance can be written as
| (7) |
where is the conditional density of y given x evaluated at and Thus, with minimizes the variance defined in (7). Specifically, we have
| (8) |
The estimator is discussed in section 4. Denote the estimator by using as . It is easy to see that estimators and belong to this class of estimators, so the UOPT estimator is more efficient.
For a more efficient estimator than and the SOPT estimator is obtained by minimizing variance for a class of estimators defined by By arguments similar to those for UOPT, the corresponding estimators are consistent, since
Under the correct weight models, SOPT is even more efficient than UOPT, as seen in our simulation studies. By similar techniques to those used for the UOPT estimator, it can be shown that the optimal choice of qi is with Specifically, we have
We discuss how to obtain the estimator of in section 4. We denote the estimator by using as
4. ALGORITHMS FOR COMPUTING THE UOPT AND SOPT ESTIMATORS
In this section, we discuss algorithms for computing the UOPT and SOPT estimators by the GAM approach. The UOPT estimator can be estimated by the following steps:
Step 1. Set the initial estimate to the DW estimator, the solution of the estimating equation (2).
Step 2. Estimate by using the GAM approach and assuming a normal distribution of where is the estimated conditional density of y given x. The conditional expectation is assumed to include main effects of xi and their second- and third-order interactions.
Step 3. Estimate by predictions
by assuming the following generalized additive model (GAM):
where S1 denotes the units in S such that , using techniques similar to the estimation of in section 3.1. Estimate by similar techniques. Estimate and by substituting the estimated density in Step 2. Then, according to (8),
Step 4. Estimate and the corresponding optimal estimator by solving (5) with
Step 5. Repeat steps 3 and 4 with the updated estimator until convergence.
The SOPT estimator is obtained as follows:
Step 1. Same as step 1 for the UOPT estimator.
Step 2. Estimate of by the GAM approach described in section 3.1.
Step 3. Same as step 2 for the UOPT estimator.
Step 4. Same as step 3 for the UOPT estimator, with di replaced by in the model.
Step 5. Estimate and the corresponding optimal estimator by solving equation (3) with
Step 6. Repeat steps 4 and 5 until convergence.
5. VARIANCE ESTIMATION
The Taylor linearization approach to variance estimation involves tedious technical derivations, especially when the estimation procedure includes semiparametric methods. We now describe bootstrap estimates of variance for our proposed estimators, with associated confidence regions.
We apply pseudo-population bootstrap methods (Gross 1980; Booth, Butler, and Hall 1994; Conti, Marella, and Mecatti 2017), which are simple and practical and have been shown to work effectively under high-entropy designs (Conti et al. 2017), such as Rao-Sampford (Rao 1965; Sampford 1967) and randomized proportional-to-size systematic sampling. Our proposed bootstrap method can be described as follows:
For choose a unit i from the original sample S independently with probability If at trial k the unit is selected, define
The pseudo-bootstrap population is then Draw a bootstrap sample from by using the same design as the original design with first-order inclusion probabilities If zi is unknown, then one can use , which is the corresponding original inclusion probability for the k-th element in the pseudo-bootstrap population.
Obtain the bootstrap sample estimator from by using our proposed method.
Generate B bootstrap samples by the previously described procedure, with corresponding estimators for Then, the bootstrap variance estimator is:
where The confidence region of is then
where is the -th percentile of a χ2distribution with degrees of freedom q, the dimension of Alternatively, one can use bootstrap percentiles of statistics to obtain the confidence region. For inference with individual parameter defined in where one can use the following normal-based confidence interval:
| (9) |
where is the a-th component of and is the corresponding estimated variance of
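The bootstrap procedure above can be sketched in a few lines for the simplest case, a weighted quantile (an intercept-only QR) under Poisson sampling. The sketch rests on several assumptions that are not taken from the text: units are drawn into the pseudo-population with probability proportional to their design weights, the bootstrap inclusion probability of a pseudo-population element is the original inclusion probability of the unit it copies, the variance uses divisor B, and all names and toy values are hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def weighted_quantile(y, w, tau):
    """Smallest y whose cumulative weight share reaches tau.

    This value minimizes the weighted check loss for an intercept-only model.
    """
    pairs = sorted(zip(y, w))
    total = sum(w)
    cum = 0.0
    for yi, wi in pairs:
        cum += wi
        if cum >= tau * total:
            return yi
    return pairs[-1][0]


def bootstrap_variance(y, pi, tau, N, B, rng):
    """Pseudo-population bootstrap variance of the weighted tau-quantile."""
    d = [1.0 / p for p in pi]
    units = list(range(len(y)))
    estimates = []
    for _ in range(B):
        # Build a pseudo-population of size N (assumed: probability proportional to d_i).
        pop = rng.choices(units, weights=d, k=N)
        # Redraw a Poisson sample using the original inclusion probabilities.
        boot = [i for i in pop if rng.random() < pi[i]]
        if boot:
            estimates.append(weighted_quantile([y[i] for i in boot],
                                               [d[i] for i in boot], tau))
    center = fmean(estimates)
    return fmean((e - center) ** 2 for e in estimates)  # divisor B (assumption)


rng = random.Random(2018)
# Hypothetical original sample of 40 units with known inclusion probabilities.
y = [rng.gauss(10.0, 2.0) for _ in range(40)]
pi = [rng.uniform(0.02, 0.08) for _ in range(40)]
d = [1.0 / p for p in pi]
N = round(sum(d))  # population size estimated by the sum of design weights

est = weighted_quantile(y, d, 0.5)
var_hat = bootstrap_variance(y, pi, 0.5, N, B=200, rng=rng)
z = NormalDist().inv_cdf(0.975)
lower, upper = est - z * sqrt(var_hat), est + z * sqrt(var_hat)
print(f"estimate={est:.3f}, bootstrap SE={sqrt(var_hat):.3f}, "
      f"95% CI=({lower:.3f}, {upper:.3f})")
```

For a regression estimator, `weighted_quantile` would be replaced by the full estimation procedure, including any weight smoothing, so that the variability of the weight model is reflected in each bootstrap replicate.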
6. SIMULATION STUDY
We now compare the performance of all six estimators in a simulation study. We generated M = 1,000 finite populations with population size N = 10,000 from the following population model: where , covariates were independently and identically distributed (iid) with a normal distribution with means and variances , and ϵi were iid with a standard normal distribution. The parameter ψ1 was set to zero (homoscedastic variance) or 0.2 (heteroscedastic variance).
For each generated finite population of size N, a Poisson sample was then selected with inclusion probabilities where n = 400 was the expected sample size and ki was the size variable such that and . Note that this sampling was informative because the inclusion probabilities depended on the outcome variable y. Specifically, the correlation between ϵ and π was about 0.6. Sample sizes varied but were all close to 400.
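A minimal sketch of Poisson sampling with probabilities proportional to a size variable is given below. It assumes the usual form in which the inclusion probability of a unit equals the expected sample size times its share of the total size; the size variable generated here is hypothetical and, unlike the one used in the simulation, is unrelated to the outcome.

```python
import random


def inclusion_probabilities(size, n):
    """pi_i = n * k_i / sum_j k_j, so that the expected sample size is n."""
    total = sum(size)
    pi = [n * k / total for k in size]
    if max(pi) > 1.0:
        raise ValueError("some inclusion probabilities exceed 1; reduce n")
    return pi


def poisson_sample(pi, rng):
    """Select each unit independently with its own inclusion probability."""
    return [i for i, p in enumerate(pi) if rng.random() < p]


rng = random.Random(400)
N, n = 10_000, 400
size = [rng.uniform(1.0, 5.0) for _ in range(N)]  # hypothetical size variable k_i
pi = inclusion_probabilities(size, n)
sample = poisson_sample(pi, rng)
weights = [1.0 / pi[i] for i in sample]  # design weights for the sampled units

print(f"expected size={sum(pi):.1f}, realized size={len(sample)}")
```

Because units are selected independently, the realized sample size is random, which is why the sample sizes in the simulation vary around the expected value.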
For the population model, it can be shown that the τ-th conditional quantile of yi is where with as the τ-th quantile of ϵi, which could be readily calculated. Our parameters of interest were the QR regression coefficients In the simulation study, and 0.6 were considered.
We compared the six estimators described previously in terms of Monte Carlo (MC) relative bias (RBias), MC relative standard error (RSE), MC relative root mean squared error (RRMSE), and MC coverage properties, including MC coverage probability (CP), standard error relative bias (SERBias), and relative average confidence interval length (RCILen). The formulas for those quantities are as follows:
where β represents the true value for parameters or represents the estimator based on the m-th MC sample for β, and represent the lower and upper 95 percent confidence interval bounds for β based on the formula (9) in section 5, and represents our proposed bootstrap variance estimator based on the m-th MC sample. We selected 200 bootstrap samples for variance estimation for each MC sample.
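The performance measures can be computed from the Monte Carlo output as in the sketch below. The definitions used here are the conventional ones and are assumptions: each quantity is scaled by the true parameter value, the Monte Carlo standard error uses divisor M, and SERBias compares the average bootstrap standard error with the Monte Carlo standard error. The function name and the toy inputs are hypothetical.

```python
from math import sqrt
from statistics import fmean, pstdev


def mc_summary(estimates, variances, lowers, uppers, beta):
    """Monte Carlo summaries for one parameter with true value beta."""
    mc_se = pstdev(estimates)  # Monte Carlo standard error (divisor M)
    return {
        "RBias": (fmean(estimates) - beta) / beta,
        "RSE": mc_se / abs(beta),
        "RRMSE": sqrt(fmean((e - beta) ** 2 for e in estimates)) / abs(beta),
        "CP": fmean(1.0 if lo <= beta <= hi else 0.0
                    for lo, hi in zip(lowers, uppers)),
        "SERBias": (fmean(sqrt(v) for v in variances) - mc_se) / mc_se,
        "RCILen": fmean(hi - lo for lo, hi in zip(lowers, uppers)) / abs(beta),
    }


# Toy Monte Carlo output (hypothetical values) for a parameter with true value 2.
estimates = [1.8, 2.1, 2.3, 1.9, 2.2]
variances = [0.04, 0.05, 0.03, 0.04, 0.05]
lowers = [e - 1.96 * sqrt(v) for e, v in zip(estimates, variances)]
uppers = [e + 1.96 * sqrt(v) for e, v in zip(estimates, variances)]

summary = mc_summary(estimates, variances, lowers, uppers, beta=2.0)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

With these definitions the squared RRMSE equals the sum of the squared RBias and the squared RSE, which provides a quick consistency check on reported tables.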
The point estimation results are presented in table 1 for ψ1 = 0 and table 2 for ψ1 = 0.2. Under the homoscedastic scenario in table 1, all estimators had small RBias, which was consistent with the underlying theory. The DW and SDW estimators had the largest RRMSE, since the DW estimator did not use any smoothing technique to reduce variance, and the single smoothing model in the SDW estimator was not efficient. The UOPT, PS, SPS, and SOPT estimators had comparable RSE and RRMSE. To test the sensitivity to model specification, we assumed an equal variance structure under the heteroscedastic scenario; the results were comparable with those obtained by assuming the correct heteroscedastic variance structure. As shown in table 2, all estimators had small bias for most of the cases. For all cases, the UOPT and SOPT estimators had smaller RSE and RRMSE than did the other estimators, with the largest gains relative to the DW and SDW estimators. For simplicity, we only present the results for coverage properties for the scenario in which τ = 0.6 and ψ1 = 0.2 (table 3). Other scenarios had similar results. The UOPT and SOPT estimators had better or comparable coverage than other estimators for most of the cases, and their CP was close to the nominal level of 95 percent. The SERBias for all estimators, based on our proposed bootstrap methods, did not exceed 8.8 percent, validating our proposed variance estimation approach. The DW and SDW estimators had larger RCILen than did other estimators. The UOPT and SOPT estimators had RCILen smaller than that of the PS and SPS estimators. We also considered the scenario where the correlation between ϵ and π is about 0.3, and the results were similar (results not presented here).
Table 1.
Monte Carlo (MC) Relative Bias (RBias) ( ), Relative Standard Error (RSE) ( ), and Relative Root Mean Squared Error (RRMSE) ( ) for Six Different Methods with ψ1 = 0.
| Tau | Par | Method | RBias | RSE | RRMSE |
|---|---|---|---|---|---|
| 0.4 | | DW | −1 | 94 | 94 |
| | | UOPT | 5 | 79 | 80 |
| | | SDW | 2 | 90 | 90 |
| | | PS | 2 | 79 | 79 |
| | | SPS | 5 | 77 | 77 |
| | | SOPT | 8 | 78 | 78 |
| | | DW | 7 | 169 | 169 |
| | | UOPT | 4 | 149 | 149 |
| | | SDW | 10 | 162 | 162 |
| | | PS | 2 | 148 | 148 |
| | | SPS | 6 | 143 | 143 |
| | | SOPT | 8 | 145 | 145 |
| 0.6 | | DW | −1 | 81 | 81 |
| | | UOPT | 6 | 68 | 68 |
| | | SDW | 2 | 78 | 78 |
| | | PS | 4 | 68 | 68 |
| | | SPS | 8 | 67 | 67 |
| | | SOPT | 9 | 68 | 68 |
| | | DW | −1 | 150 | 150 |
| | | UOPT | 5 | 133 | 133 |
| | | SDW | 4 | 145 | 145 |
| | | PS | 3 | 131 | 131 |
| | | SPS | 8 | 127 | 127 |
| | | SOPT | 9 | 129 | 129 |
Table 2.
Monte Carlo (MC) Relative Bias (RBias) ( ), Relative Standard Error (RSE) ( ), and Relative Root Mean Squared Error (RRMSE) ( ) for Six Different Methods with ψ1 = 0.2.
| Tau | Par | Method | RBias | RSE | RRMSE |
|---|---|---|---|---|---|
| 0.4 | | DW | 1 | 91 | 91 |
| | | UOPT | 9 | 54 | 55 |
| | | SDW | 5 | 89 | 89 |
| | | PS | 6 | 60 | 60 |
| | | SPS | 10 | 59 | 60 |
| | | SOPT | 11 | 55 | 56 |
| | | DW | 3 | 155 | 155 |
| | | UOPT | 9 | 104 | 104 |
| | | SDW | 9 | 152 | 152 |
| | | PS | 6 | 114 | 114 |
| | | SPS | 12 | 110 | 111 |
| | | SOPT | 12 | 102 | 102 |
| 0.6 | | DW | −1 | 86 | 86 |
| | | UOPT | 7 | 53 | 54 |
| | | SDW | 4 | 83 | 83 |
| | | PS | 5 | 56 | 56 |
| | | SPS | 8 | 55 | 56 |
| | | SOPT | 10 | 54 | 55 |
| | | DW | −3 | 155 | 155 |
| | | UOPT | 9 | 112 | 113 |
| | | SDW | 3 | 152 | 152 |
| | | PS | 5 | 116 | 116 |
| | | SPS | 10 | 113 | 114 |
| | | SOPT | 13 | 110 | 111 |
Table 3.
Coverage Probability (CP) ( ), Standard Error Relative Bias (SERBias) ( ), and Relative Average Confidence Interval Length (RCILen) ( ) for Six Different Methods with τ = 0.6 and ψ1 = 0.2.
| Par | Method | CP | SERBias | RCILen |
|---|---|---|---|---|
| | DW | 948 | −6 | 337 |
| | UOPT | 953 | 88 | 228 |
| | SDW | 939 | 14 | 330 |
| | PS | 954 | 64 | 234 |
| | SPS | 952 | 65 | 230 |
| | SOPT | 949 | 66 | 225 |
| | DW | 960 | 61 | 646 |
| | UOPT | 949 | 65 | 469 |
| | SDW | 959 | 60 | 631 |
| | PS | 945 | 60 | 483 |
| | SPS | 950 | 67 | 474 |
| | SOPT | 952 | 74 | 462 |
7. REAL-DATA-BASED SIMULATION STUDY
We further compare our estimators on a real data set previously analyzed by Korn and Graubard (1995) and Pfeffermann and Sverchkov (1999). The data were collected as part of the 1988 US National Maternal and Infant Health Survey, which used a stratified random sample of vital records corresponding to live births, late fetal deaths, and infant deaths in the United States. The strata were constructed using the mother’s race and child’s birth weight, and the sampling fractions varied according to strata.
Pfeffermann and Sverchkov (1999) treated birth weight (measured in grams) as the study variable Y and gestational age (measured in weeks) as the predictor X. After deleting 506 observations with missing values, the finite population size was reduced to 9,447. One can fit the following linear regression model using the finite population and obtain the estimated model
| (10) |
with and The p-values for all regression coefficients were highly significant (p <0.0001). The R2 value was about 0.6. The original design was informative because the strata were determined using the study variable birth weight. The correlation between d0 and was 0.32, where d0 was the original design weight in the survey and was the estimated residuals obtained from (10). In other words, even after adjusting for predictor variable gestational age, a correlation remained between the design weights and the study variable.
Rather than the mean model described in (10), we estimated the quantile regression of Y on X, with parameters of interest the τth quantile regression coefficients and . Before conducting the simulation, we first fitted the mean regression model, as well as quantile regression models with 0.6, and 0.8. The results are presented in figure 1. From figure 1, it is clear that the quantile regression–fitted lines are not parallel, unlike the conventional homoscedastic mean regression model. This result suggests the skewness of distribution for birth weight and shows that quantile regression provides a more comprehensive analysis.
Figure 1.
Mean Regression and Quantile Regression Models.
For the simulation, we chose for illustration. We conducted 1,000 Monte Carlo simulations to compare the six quantile regression coefficient estimators. In each simulation, one sample was generated from the finite population with an expected sample size of 400 by using the Poisson sampling design with inclusion probability , where was the design weight for the jth subject in the finite population. The bootstrap size was set to 200 for variance estimation for all six estimators, and 95 percent confidence intervals were constructed.
Before comparing the performance of our proposed estimators for quantile regression coefficients, we first compared the performance of the design-based estimator of median regression coefficients and mean regression coefficients using only the linear term for the purpose of illustration. The purpose of this comparison was to show that there is an advantage in using quantile regression instead of mean regression for data with certain features. The results in table 4 show that the estimators of median regression coefficients have smaller relative bias, relative standard error, and relative root mean squared error than do the estimators of mean regression. This occurs because the distribution of residual terms displays some skewness and the Kolmogorov-Smirnov test rejects the normality assumption (p < 0.05). Furthermore, there was some heteroscedastic trend in variance.
Table 4.
Monte Carlo (MC) Relative Bias (RBias), Relative Standard Error (RSE), and Relative Root Mean Squared Error (RRMSE) for Comparing Mean Regression with Median Regression.
| Parameters | Method | RBias | RSE | RRMSE |
|---|---|---|---|---|
| β0 | Mean | −0.013 | 0.194 | 0.195 |
| | Median | 0.000 | 0.116 | 0.116 |
| β1 | Mean | 0.005 | 0.097 | 0.097 |
| | Median | 0.001 | 0.071 | 0.071 |
Table 5 summarizes the simulation results, comparing the performance of all six estimators. For point estimation, the DW and SDW estimators had larger RBias than did the other estimators. The DW estimator had the largest RSE and RRMSE for all cases—as expected—because the efficiency of the DW estimator is improved through weight-smoothing. The SOPT estimator had the smallest RSE and RRMSE. The SPS estimator was the second-best estimator in terms of RSE and RRMSE. The UOPT estimator had similar RRMSE to that of the PS estimator, as in Kim and Skinner (2013). All confidence coverages were close to the nominal rate of 95 percent.
Table 5.
Monte Carlo (MC) Relative Bias (RBias), Relative Standard Error (RSE), and Relative Root Mean Squared Error (RRMSE) for Six Different Methods.
| Parameters | Method | RBias | RSE | RRMSE |
|---|---|---|---|---|
| | DW | 0.063 | 0.285 | 0.292 |
| | UOPT | −0.021 | 0.223 | 0.224 |
| | SDW | 0.081 | 0.228 | 0.242 |
| | PS | −0.043 | 0.213 | 0.218 |
| | SPS | −0.001 | 0.199 | 0.199 |
| | SOPT | 0.001 | 0.182 | 0.182 |
| | DW | −0.027 | 0.128 | 0.131 |
| | UOPT | 0.005 | 0.100 | 0.100 |
| | SDW | −0.035 | 0.102 | 0.108 |
| | PS | 0.018 | 0.097 | 0.098 |
| | SPS | −0.002 | 0.089 | 0.089 |
| | SOPT | −0.002 | 0.082 | 0.082 |
8. DISCUSSION
In this paper, we proposed several weight-smoothing estimators for estimating quantile regression coefficients in complex surveys under an informative sampling design. Our proposed estimators were compared in terms of point estimation and variance estimation by using both simulated data and a real-data-based simulation study. All proposed estimators had smaller standard errors than the original design-based estimator. The unsmoothed and smoothed optimal estimators showed a better balance of variance, bias, and coverage rate compared with the other estimators. Smoothed estimators, based on nonparametric weight-smoothing models, outperformed unsmoothed estimators. All related R code and an example data file are posted at the following website: https://github.com/yandzhao/Quantile-Regression-of-Survey-Data, last accessed August 28, 2018. For future research, we will consider estimating quantile regression coefficients with a clustered informative sampling design.
Acknowledgments
The authors sincerely thank Professor Danny Pfeffermann and Dr. Michael Sverchkov for sharing the 1988 US National Maternal and Infant Health Survey data with us. This work was partially supported by funding from the National Institutes of Health, National Institute of General Medical Sciences (Grant 1 U54GM104938), an IDeA-CTR to the University of Oklahoma Health Sciences Center.
References
- Beaumont J. F. (2008), “A New Approach to Weighting and Inference in Sample Surveys,” Biometrika, 95, 539–553.
- Booth J. G., Butler R. W., Hall P. (1994), “Bootstrap Methods for Finite Populations,” Journal of the American Statistical Association, 89, 1282–1289.
- Cameron A. C., Trivedi P. K. (2005), Microeconometrics: Methods and Applications, Cambridge: Cambridge University Press.
- Chambers R. L. (2003), “Introduction to Part A,” in Analysis of Survey Data, eds. Chambers R. L., Skinner C. J., Chichester: Wiley.
- Chambers R. L., Skinner C. J. (2003), Analysis of Survey Data, Chichester: Wiley.
- Conti P. L., Marella D., Mecatti F. (2017), “Recovering Sampling Distributions of Statistics of Finite Populations via Resampling: A Predictive Approach,” submitted.
- Deaton A. (1997), The Analysis of Household Surveys: A Microeconometric Approach to Development Policy, Baltimore and London: Johns Hopkins University Press.
- Fuller W. (2009), Sampling Statistics, Hoboken: Wiley.
- Geraci M. (2016), “Estimation of Regression Quantiles in Complex Surveys with Data Missing at Random: An Application to Birthweight Determinants,” Statistical Methods in Medical Research, 25, 1393–1421.
- Gross S. (1980), “Median Estimation in Sample Surveys,” Proceedings of the Section on Survey Research Methods, American Statistical Association, pp. 181–184.
- Harrington D. M., Barreira T. V., Staiano A. E., Katzmarzyk P. T. (2014), “The Descriptive Epidemiology of Sitting among US Adults, NHANES 2009/2010,” Journal of Science and Medicine in Sport, 17, 371–375.
- Hastie T., Tibshirani R. (1990), Generalized Additive Models, New York: Chapman and Hall.
- He X., Shao Q. (1996), “A General Bahadur Representation of M-Estimators and Its Application to Linear Regression with Nonstochastic Designs,” The Annals of Statistics, 24, 2608–2630.
- Heeringa S. G., West B. T., Berglund P. A. (2010), Applied Survey Data Analysis, Boca Raton, FL: Taylor and Francis Group.
- Kim J. K., Skinner C. J. (2013), “Weighting in Survey Analysis under Informative Sampling,” Biometrika, 100, 385–398.
- Knight K. (1998), “Limiting Distribution for L1 Regression Estimators under General Conditions,” The Annals of Statistics, 26, 755–770.
- Koenker R. (2005), Quantile Regression, Cambridge: Cambridge University Press.
- Koenker R., Bassett G. (1978), “Regression Quantiles,” Econometrica, 46, 33–50.
- Korn E. L., Graubard B. I. (1995), “Examples of Differing Weighted and Unweighted Estimates from a Sample Survey,” The American Statistician, 49, 291–295.
- Li Y., Graubard B. I., Korn E. L. (2010), “Application of Nonparametric Quantile Regression to Body Mass Percentile Curves from Survey Data,” Statistics in Medicine, 29, 558–572.
- Magee L. (1998), “Improving Survey-Weighted Least Squares Regression,” Journal of the Royal Statistical Society, Series B, 60, 115–126.
- Mu Y., He X. (2007), “Power Transformation toward a Linear Regression Quantile,” Journal of the American Statistical Association, 102, 269–279.
- Nelson D. E., Powell-Griner E., Town M., Kovar M. G. (2003), “A Comparison of National Estimates from the National Health Interview Survey and the Behavioral Risk Factor Surveillance System,” American Journal of Public Health, 93, 1335–1341.
- Pfeffermann D. (2011), “Modelling of Complex Survey Data: Why Model? Why Is It a Problem? How Can We Approach It?” Survey Methodology, 37, 115–136.
- Pfeffermann D., Sverchkov M. Y. (1999), “Parametric and Semi-Parametric Estimation of Regression Models Fitted to Survey Data,” Sankhya B, 61, 166–186.
- Pfeffermann D., Sverchkov M. Y. (2003), “Fitting Generalized Linear Models under Informative Sampling,” in Analysis of Survey Data, eds. Chambers R. L., Skinner C. J., Chichester: Wiley.
- Pfeffermann D., Sverchkov M. Y. (2009), “Inference under Informative Sampling,” in Handbook of Statistics 29B; Sample Surveys: Inference and Analysis, eds. Pfeffermann D., Rao C. R., Amsterdam: North Holland.
- Rao J. N. K. (1965), “On Two Simple Schemes of Unequal Probability Sampling without Replacement,” Journal of the Indian Statistical Association, 3, 173–180.
- Sampford M. R. (1967), “On Sampling without Replacement with Unequal Probabilities of Selection,” Biometrika, 54, 499–513.
- Scott A., Wild C. (2011), “Fitting Regression Models with Response-Biased Samples,” Canadian Journal of Statistics, 39, 519–536.
- Van der Vaart A. W. (1998), Asymptotic Statistics, New York: Cambridge University Press.
- Wang J. Q., Opsomer J. D. (2011), “On Asymptotic Normality and Variance Estimation for Nondifferentiable Survey Estimators,” Biometrika, 98, 91–106.

