Model Averaging Methods for Weight Trimming

Michael R Elliott

Author manuscript; available in PMC: 2009 Nov 26.

Published in final edited form as: J Off Stat. 2008 Dec 1;24(4):517–540.

PMCID: PMC2783643 NIHMSID: NIHMS153059 PMID: 19946471

Abstract

In sample surveys where sampled units have unequal probabilities of inclusion, associations between the inclusion probabilities and the statistic of interest can induce bias. Weights equal to the inverse of the probability of inclusion are often used to counteract this bias. Highly disproportional sample designs have highly variable weights, which can introduce undesirable variability in statistics such as the population mean or linear regression estimates. Weight trimming reduces large weights to a fixed maximum value, reducing variability but introducing bias. Most standard approaches are ad-hoc in that they do not use the data to optimize bias-variance tradeoffs. This manuscript develops variable selection models, termed “weight pooling” models, that extend weight trimming procedures in a Bayesian model averaging framework to produce “data driven” weight trimming estimators. We develop robust yet efficient models that approximate fully-weighted estimators when bias correction is of greatest importance, and approximate unweighted estimators when variance reduction is critical.

Keywords: Sample survey, sampling weights, Bayesian population inference, weight pooling, variable selection, fractional Bayes Factors

1 Introduction

Analysis of data from samples designed to have differential probabilities of inclusion typically uses case weights equal to the inverse of the probability of inclusion to reduce bias in the estimators of population quantities of interest. An example is the Horvitz-Thompson estimator (Horvitz and Thompson 1952) of a population mean $\bar{Y} = N^{- 1} \sum_{i = 1}^{N} y_{i}$ given by $\hat{\bar{Y}} = N^{- 1} \sum_{i \in s} w_{i} y_{i}$ , where w_i = 1/π_i, π_i is the probability of inclusion and s is the subset of the population units sampled. This fully-weighted estimator is unbiased for the population mean. For the wide class of non-linear estimators such as ratio estimators or linear regression slopes that are functions of linear statistics, bias can be reduced and consistent estimates of population values obtained by replacing implicit means or totals with their weighted equivalents (Binder 1983).
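As a minimal illustration of the estimator above, the following Python sketch computes the Horvitz-Thompson mean from sampled outcomes and their inclusion probabilities. The function name `horvitz_thompson_mean` and the toy inputs are assumptions introduced here for illustration; they are not part of the original analysis.

```python
import numpy as np

def horvitz_thompson_mean(y, pi, N):
    """Horvitz-Thompson estimate of the population mean.

    y  : sampled outcomes y_i, i in s
    pi : inclusion probabilities pi_i of the sampled units
    N  : population size
    """
    y = np.asarray(y, dtype=float)
    w = 1.0 / np.asarray(pi, dtype=float)  # case weights w_i = 1 / pi_i
    return np.sum(w * y) / N

# Toy check (hypothetical data): with equal inclusion probabilities n/N
# the estimator reduces to the ordinary sample mean.
y_toy = np.array([1.0, 2.0, 3.0, 4.0])
ht_equal = horvitz_thompson_mean(y_toy, np.full(4, 4 / 100), N=100)
```

With unequal π_i the same function upweights units that were unlikely to be sampled, which is the source of both the bias correction and the added variance discussed below.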

There is little debate that sampling weights should be utilized when considering descriptive statistics such as means and totals, although even here, highly variable probabilities of selection can give rise to bias-variance tradeoffs and the desire to employ weight trimming (Little et al. 1997). However, when estimating “analytical” models (Cochran 1977, p. 4) that focus on associations between, e.g., risk factors and health outcomes estimated via linear and generalized linear models, the decision to use sampling weights is less definitive (cf. Korn and Graubard 1999, p. 180–182). Consider a population generated from

Y_{i} = A + B X_{i} + C X_{i}^{2} + ε_{i}, \quad X_{i} \sim U (0, 1), \quad ε_{i} \overset{ind}{\sim} N (0, 1),

(1.1)

while the superpopulation model of interest is the conditional distribution of Y_i given X_i modeled by

Y_{i} = α + β X_{i} + ε_{i}, ε_{i} \overset{iid}{\sim} N (0, σ^{2});

(1.2)

the superpopulation model is correctly specified when C = 0 and misspecified when C ≠ 0. We consider two sampling schemes: an ignorable sampling scheme that oversamples large values of X_i, and a non-ignorable scheme that oversamples large values of Y_i at a given value of X_i. The sampling scheme is ignorable in the regression context when the sampling probability is a function of X_i only and thus the inclusion indicator I_i is independent of Y_i | X_i because our goal is to determine the distribution of Y | X; non-ignorable designs in the regression setting retain an association between Y_i and I_i even conditional on X_i. Of course, designs in which I_i depends on X_i are non-ignorable for parameters that describe the marginal distribution of Y_i, unless Y_i ⊥ X_i (see Section 2). We assume that the goal of the modeler is to describe the association between Y and X using the regression slope β from the superpopulation model. If the superpopulation model is correctly specified, the target quantity of interest could be either the superpopulation slope or the population slope defined by $B = \sum_{i = 1}^{N} A_{i} (Y_{i} - \bar{Y})$ , where $A_{i} = (X_{i} - \bar{X}) / \sum_{i = 1}^{N} {(X_{i} - \bar{X})}^{2}, \bar{Y} = N^{- 1} \sum_{i = 1}^{N} Y_{i}, \bar{X} = N^{- 1} \sum_{i = 1}^{N} X_{i}$ (the “corresponding descriptive population quantity” in Pfeffermann [1993]). If the superpopulation model is misspecified, then only the population slope makes sense as a target quantity. The unweighted ordinary least squares (OLS) estimator and the (case-)weighted least squares (WLS) estimator of α and β are given, respectively, by

\begin{aligned} \begin{pmatrix} \hat{α} \\ \hat{β} \end{pmatrix} &= {(X^{T} S X)}^{- 1} X^{T} S Y \\ \begin{pmatrix} {\hat{α}}^{w} \\ {\hat{β}}^{w} \end{pmatrix} &= {(X^{T} S^{w} X)}^{- 1} X^{T} S^{w} Y \end{aligned}

where the ith row of X, $X_{i}^{T}$ , is given by (1 X_i)^T, S is a diagonal matrix of sample inclusion indicators S_i, and S^w replaces S_i in S with S_i/π_i. Thus, the WLS estimator replaces the means and totals in the unweighted estimator with the Horvitz-Thompson equivalents.
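The two estimators can be sketched in a few lines of Python. This is a minimal illustration restricted to the sampled rows (so S reduces to the identity and S^w to a diagonal matrix of 1/π_i); the function name `weighted_ls`, the inclusion probabilities, and the noise-free test data are assumptions made here, not the simulation design of Table 1.

```python
import numpy as np

def weighted_ls(x, y, pi=None):
    """Return (alpha_hat, beta_hat) computed from the sampled units.

    With pi=None every sampled unit gets weight 1 (OLS); otherwise each
    unit is weighted by 1/pi_i (the case-weighted WLS estimator).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.ones_like(x) if pi is None else 1.0 / np.asarray(pi, dtype=float)
    X = np.column_stack([np.ones_like(x), x])   # rows (1, X_i)
    XtW = X.T * w                               # X^T S^w on the sampled rows
    return np.linalg.solve(XtW @ X, XtW @ y)

rng = np.random.default_rng(0)
x_s = rng.uniform(0, 1, size=50)
pi_s = 0.01 + 0.05 * x_s          # hypothetical inclusion probabilities
y_exact = 1.0 + 2.0 * x_s         # noise-free line: both estimators recover it
ols = weighted_ls(x_s, y_exact)
wls = weighted_ls(x_s, y_exact, pi_s)
```

When the line fits exactly, weighting changes nothing; the estimators diverge only when the mean model is misspecified or the design is non-ignorable, which is the comparison Table 1 reports.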

Table 1 shows the results from 500 simulations for equivalent populations of N = 10000, under correctly specified and misspecified models and ignorable and non-ignorable sample designs, for sample sizes of n = 50 and n = 500. When the sample design is ignorable (probability of selection depends only on X) and the mean model correctly specified, both the unweighted and fully-weighted estimators are essentially unbiased, and the larger variance of the weighted estimator results in a larger mean square error (MSE). When the sampling is ignorable but the mean model incorrectly specified (linear instead of quadratic), the weighted estimator provides protection against model misspecification, but can introduce large variability into the estimator (note the larger MSEs for the weighted estimator when n=50). When the sample design is non-ignorable for the population slope, the weighted population slope estimator β̂^w accounts for the underrepresentation of smaller values of Y when X is small, reducing the negative bias in the slope; in these simulations this bias in the unweighted estimator was a greater contributor to MSE than variance from the weighted estimator.

Table 1.

% Bias (MSE in parentheses) for the population slope, for a population generated under $Y_{i} ∣ X_{i} \sim N (A + B X_{i} + C X_{i}^{2}, 1)$ , i = 1,…, 10000, where the superpopulation model is given by Y_i | X_i ~ N (α + βX_i, σ²): correctly specified (A = 0, B = 2, C = 0), misspecified (A = 0, B = 2, C = −1). Sampling design ignorable for the population slope (P (S_i | Y_i, X_i) ∝ X_i^{.75}) or non-ignorable for the population slope (P (S_i | Y_i, X_i) ∝ exp(.5Y_i/(X_i + .25) − 1)). Results from 500 simulations.

Estimator	Ignorable, model correctly specified	Ignorable, model misspecified	Non-ignorable, model correctly specified	Non-ignorable, model misspecified
n = 50
β̂	−0.2 (.168)	−15.4 (.203)	−30.8 (.511)	−63.8 (.518)
β̂^w	0.7 (.264)	−3.8 (.274)	−5.1 (.239)	−10.5 (.303)
n = 500
β̂	−0.5 (.015)	−16.3 (.040)	−28.6 (.339)	−55.7 (.321)
β̂^w	−0.8 (.033)	−0.3 (.032)	−0.7 (.039)	0.0 (.041)


The fully-weighted estimators (α̂^w β̂^w)^T are sometimes termed “pseudo-maximum likelihood” estimators (PMLEs) (Binder 1983, Pfeffermann 1993) because they are “design consistent” for the MLEs that would solve the score equations under the sampling model defined in (1.2) if we had observed data for the entire population:

U (β) = \sum_{i = 1}^{N} \frac{d}{d β} log f (y_{i} ∣ β) = \sum_{i = 1}^{N} X_{i} (X_{i}^{T} β - y_{i}) = 0

(1.3)

In brief, design consistency implies that the difference between the population target quantity and the estimate derived from the sample tends to zero as the sample size and population size jointly increase, or that these differences will on average tend to 0 from repeated sampling of the population, where samples are selected in an identical fashion from t → ∞ replicates of the population: see Sarndal (1980) or Isaki and Fuller (1982).

1.1 Weight Trimming

While PMLEs are popular in practice for the reasons discussed above, their bias reduction typically comes at the cost of increased variance. This increase can overwhelm the reduction in bias, so that the mean square error (MSE) actually increases under a weighted analysis, as in the example in Table 1. Even in cases where disproportional sample designs do reduce variance, as in “optimal allocation” where strata with more variable outcomes are oversampled (Kish 1965), designs that are optimal for one outcome may not be optimal for another, or for examination of associations (e.g., regression models).

Perhaps the most common approach to dealing with this problem is weight trimming or winsorization (Potter 1990, Kish 1992, Alexander et al. 1997), in which weights larger than some value w₀ are fixed as w₀. Thus bias is introduced to reduce variance, with the goal of an overall reduction in MSE. This manipulation of the weights reflects a traditional design-based approach to survey inference.

Other design-based methods have been considered in the literature. Potter (1990) discusses systematic methods for choosing w₀, including the weight distribution and MSE trimming procedures. The weight distribution technique assumes that the weights follow an inverted and scaled beta distribution; the parameters of the inverse-beta distribution are estimated by method-of-moment estimators, and weights from the upper tail of the distribution, say where 1 − F (w_i) < .01, are trimmed to w₀ such that 1 − F (w₀) = .01. The MSE trimming procedure (Cox and McGrath 1981) determines the empirical MSE at a variety of trimming levels t = 1,…, T under the assumption that the true population mean is given by the fully weighted estimate: ${\hat{MSE}}_{t} = {({\hat{θ}}_{t} - {\hat{θ}}_{T})}^{2} + \hat{V} ({\hat{θ}}_{t})$ , where t = 1 corresponds to the unweighted data and t = T to the fully-weighted data, and θ̂_t is the value of the statistic using the trimmed weights at level t. The trimming level is then given by the level t that minimizes ${\hat{MSE}}_{t}$ over t.
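The MSE trimming rule can be sketched as follows. The function name `mse_trimming_level` is hypothetical, and the input estimates and variances are purely illustrative numbers chosen for this sketch; they are not taken from any survey or from Cox and McGrath (1981).

```python
import numpy as np

def mse_trimming_level(theta_hat, var_hat):
    """Pick the trimming level that minimizes the empirical MSE.

    theta_hat[t] : estimate at trimming level t (last entry = fully weighted)
    var_hat[t]   : estimated variance of theta_hat[t]
    Returns (index of the best level, array of empirical MSEs).
    """
    theta_hat = np.asarray(theta_hat, dtype=float)
    var_hat = np.asarray(var_hat, dtype=float)
    # The fully weighted estimate stands in for the true value.
    bias_sq = (theta_hat - theta_hat[-1]) ** 2
    mse = bias_sq + var_hat
    return int(np.argmin(mse)), mse

# Purely illustrative inputs (not from the paper).
theta_levels = [4.0, 4.6, 4.9, 5.0]
var_levels = [0.05, 0.10, 0.30, 0.60]
best, mse = mse_trimming_level(theta_levels, var_levels)
```

In this toy input the second level (index 1) wins: its squared distance from the fully weighted estimate is small relative to the variance it saves.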

In addition to adjusting for unequal probabilities of selection, case weights are also used to calibrate sample elements to known control totals in the population (Deville and Sarndal 1992), either jointly (poststratification weights) or marginally (raking weights). In the calibration literature, techniques have been developed that allow generalized poststratification or raking adjustments to be bounded to prevent the construction of extreme weights (Folsom and Singh 2000). Beaumont and Alavi (2004) extend this idea to develop estimators that focus on trimming large weights of highly influential or outlying observations. While these bounds trim extreme weights to a fixed cutpoint value, the choice of this cutpoint remains arbitrary. Another approach is to consider robust regression estimates (Hampel 1986) that downweight highly influential observations, although applications that consider downweighting influence statistics as an alternative to weight trimming in the context of survey designs are limited (Zaslavsky et al. 2001 considered their use with ratio estimators).

This manuscript develops an alternative approach to weight trimming that considers the case weights as stratifying variables within strata defined by the probability of inclusion. These “inclusion strata” may correspond to formal strata from a disproportional stratified sample design, or may be “pseudo-strata” based on collapsed or pooled weights derived from selection, poststratification, and/or non-response adjustments. Ordering these weight strata by the inverse of the probability of selection and collapsing together the largest valued strata mimics weight trimming by assuming the underlying data from these combined strata are exchangeable (conditional on any covariates of interest). In a regression setting, this model can be posed as a variable selection problem, where dummy variables for the inclusion strata interact with the regression parameters; subtracting from or adding to the inclusion strata design matrix allows for a greater or lesser degree of weight trimming. By averaging over all of these possible “weight pooling” models, we can compute an estimator of the population parameter of interest whose bias-variance tradeoff is data-driven. By allowing for all contiguous inclusion strata (strata whose weights are closest in value) to be considered for pooling, we induce a high degree of robustness into our model, protecting against the “over-pooling” that simpler models suffered from (Elliott and Little 2000). We embed this model in a Bayesian framework, as we believe it provides a natural setting for model averaging, as well as a proper framework for population inference.

Section 2 reviews Bayesian finite population inference. Section 3 develops the weight pooling models for linear regression models in a fully Bayesian setting. Section 4 provides simulation results to determine the repeated sampling properties of the weight pooling estimators of linear regression parameters in a disproportional-stratified sample design and compares them with standard design-based estimators. Section 5 illustrates the use of the weight pooling estimator using data from the National Health and Nutrition Examination Survey to consider evidence for “Barker’s Hypothesis” that low birth weight babies are at greater risk for cardiovascular disease later in life. Section 6 summarizes the results of the simulations and considers extensions to generalized linear models.

2 Bayesian Finite Population Inference

Let the population data for a population with i = 1,…, N units be given by Y = (y₁,…, y_N), with associated covariate vectors X = (x₁,…, x_N) and sampling indicator variable I = (I₁,…, I_N), where I_i = 1 if the ith element is sampled and 0 otherwise. Similar to design-based population inference, Bayesian population inference focuses on population quantities of interest Q(Y), such as population means Q(Y) = Ȳ. In contrast to design-based inference, however, one posits a model for the population data Y as a function of parameters θ: Y ~ f (Y |θ). Inference about Q(Y) is made based on the posterior predictive distribution of p(Y_nob | Y_obs, I), where Y_nob consists of the elements of Y_i for which I_i = 0:

p (Y_{nob} ∣ Y_{obs}, I) = \frac{\int \int p (Y_{nob} ∣ Y_{obs}, θ, ϕ) p (I ∣ Y, θ, ϕ) p (Y_{obs} ∣ θ) p (θ, ϕ) d θ d ϕ}{\int \int \int p (Y_{nob} ∣ Y_{obs}, θ, ϕ) p (I ∣ Y, θ, ϕ) p (Y_{obs} ∣ θ) p (θ, ϕ) d θ d ϕ d Y_{nob}}

(2.1)

where ϕ models the inclusion indicator. If we assume that ϕ and θ are a priori independent and if the distribution of sampling indicator I is independent of Y, the sampling design is said to be “unconfounded” or “noninformative”; if the distribution of I depends only on Y_obs, then the sampling mechanism is said to be “ignorable” (Rubin 1987), equivalent to the standard missing data terminology (the unobserved elements of the population can be thought of as missing by design). Under ignorable sampling designs p(θ, ϕ) = p (θ)p(ϕ) and p (I | Y, θ, ϕ) = p (I | Y_obs, ϕ), and thus (2.1) reduces to

\frac{\int p (Y_{nob} ∣ Y_{obs}, θ) p (Y_{obs} ∣ θ) p (θ) d θ}{\int \int p (Y_{nob} ∣ Y_{obs}, θ) p (Y_{obs} ∣ θ) p (θ) d θ d Y_{nob}} = p (Y_{nob} ∣ Y_{obs}),

allowing inference about Q(Y) to be made without explicitly modeling the sampling inclusion parameter I (Ericson 1969, Holt and Smith 1979, Little 1993, Rubin 1987, Skinner et al. 1989). In the regression setting, where inference is desired about parameters that govern the distribution of Y conditional on fixed and known covariates X, (2.1) becomes

p (Y_{nob} ∣ Y_{obs}, X, I) = \frac{\int \int p (Y_{nob} ∣ Y_{obs}, X, θ, ϕ) p (I ∣ Y, X, θ, ϕ) p (Y_{obs} ∣ X, θ) p (θ, ϕ) d θ d ϕ}{\int \int \int p (Y_{nob} ∣ Y_{obs}, X, θ, ϕ) p (I ∣ Y, X, θ, ϕ) p (Y_{obs} ∣ X, θ) p (θ, ϕ) d θ d ϕ d Y_{nob}}

which reduces to

p (Y_{nob} ∣ Y_{obs}, X) = \frac{\int p (Y_{nob} ∣ Y_{obs}, X, θ) p (Y_{obs} ∣ X, θ) p (θ) d θ}{\int \int p (Y_{nob} ∣ Y_{obs}, X, θ) p (Y_{obs} ∣ X, θ) p (θ) d θ d Y_{nob}}

if and only if I depends only on (Y_obs, X), of which dependence on X only is a special case. Thus if inference is desired about a regression parameter Q(Y, X)|X, then a noninformative or more generally ignorable sample design can allow inclusion to be a function of the fixed covariates.

2.1 Accommodating Unequal Probabilities of Selection

Maintaining the ignorability assumption for the sampling mechanism often requires accounting for the sample design in both the likelihood and prior model structure. In the case of the disproportional probability-of-inclusion sample designs, this can be accomplished by developing an index h = 1,…, H of the probability of inclusion (Little 1983, 1991); this could either be a one-to-one mapping of the case weight order statistics to their rankings, or a preliminary “pooling” of the case weights using, e.g., the 100/H percentiles of the case weights. Let n_h be the number of included units and N_h the population size in weight stratum h, so that w_h = N_h/n_h for h = 1,…, H. We assume here that N_h is known, as when the weight strata come from a stratified random sample. (If N_h is unknown, as would be the case when the weights are constructed from estimated probabilities of inclusion via calibration or non-response adjustments, it can be replaced with N̂_h = n_hw_h. N̂_h can be treated as known, or if the underlying within-stratum samples are small, uncertainty in N̂_h can be incorporated into the model by treating n₁, …, n_H as a multinomial distribution of size n parameterized by unknown inclusion stratum probabilities q₁,…, q_H with, e.g., a Dirichlet prior [Lu and Gelman 2003]. Draws of $N_{h}^{*}$ could then be obtained as $N w_{h} q_{h}^{*} / n$ , where $q_{h}^{*}$ is drawn from the Dirichlet posterior for q. If the weights within a stratum are not all equal, then w_h can be approximated by the inverse of the mean probability of inclusion within the stratum, given by $n_{h} / \sum_{i \in h} w_{h i}^{- 1}$ .) The data are then modeled by

y_{h i} ∣ θ_{h} \sim f (y_{h i}; θ_{h}), \quad i = 1, \dots, N_{h}

for all elements in the hth inclusion stratum, where θ_h allows for an interaction between the model parameter(s) θ and the inclusion stratum h. Putting a noninformative prior distribution on θ_h then reproduces a fully-weighted analysis with respect to the expectation of the posterior predictive distribution of Q(Y).

3 Weight Pooling Models

Weight trimming effectively pools units with high weights by assigning them a common, trimmed weight. The untrimmed (design-based) weighted mean estimator in a disproportionally stratified design is then ${\bar{y}}_{w} = \frac{\sum_{h} \sum_{i} w_{h} y_{h i}}{\sum_{h} \sum_{i} w_{h}} = \sum_{h} \frac{N_{h}}{N_{+}} {\bar{y}}_{h}$ , where N₊ = Σ_h N_h, the total population. Weight trimming typically proceeds by establishing an a priori cutpoint, say 3 for the normalized weights, and multiplying the remaining weights by a normalizing constant γ = (N₊ − Σκ_iw₀)/Σ(1 − κ_i)w_i, where κ_i is an indicator variable for whether or not w_i ≥ w₀. The trimmed mean estimator is thus given by

\begin{aligned} {\bar{y}}_{w t} &= \sum_{h = 1}^{l - 1} γ \frac{N_{h}}{N_{+}} {\bar{y}}_{h} + \sum_{h = l}^{H} \frac{w_{0} n_{h}}{N_{+}} {\bar{y}}_{h} \\ &= γ \sum_{h = 1}^{l - 1} \frac{N_{h}}{N_{+}} {\bar{y}}_{h} + \frac{w_{0} \sum_{h = l}^{H} n_{h}}{N_{+}} {\bar{y}}^{(l)} \end{aligned}

where $γ = \frac{N_{+} - w_{0} \sum_{h = l}^{H} n_{h}}{\sum_{h = 1}^{l - 1} N_{h}}$ and ${\bar{y}}^{(l)} = (1 / \sum_{h = l}^{H} n_{h}) \sum_{h = l}^{H} n_{h} {\bar{y}}_{h}$ . Choosing $w_{0} = \frac{\sum_{h = l}^{H} N_{h}}{\sum_{h = l}^{H} n_{h}}$ yields γ = 1 and ${\bar{y}}_{w t} = \sum_{h = 1}^{l - 1} \frac{N_{h}}{N_{+}} {\bar{y}}_{h} + \frac{(\sum_{h = l}^{H} N_{h})}{N_{+}} {\bar{y}}^{(l)}$ , which corresponds to the estimate for a model that assumes distinct stratum means for the smaller weight strata and a common mean for the larger weight strata, that is:

\begin{matrix} y_{h i} ∣ μ_{h} \sim N (μ_{h}, σ^{2}), \quad h < l \\ y_{h i} ∣ μ_{l} \sim N (μ_{l}, σ^{2}), \quad h \geq l \\ μ_{h}, μ_{l}, \log σ \propto const . \end{matrix}

(3.1)
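The equivalence between the trimmed estimator with γ = 1 and the pooled-mean estimator under (3.1) can be checked numerically. The sketch below borrows the stratum sizes of the n = 500 design from Section 4.1; the stratum sample means `ybar_h` and the cutpoint `l = 8` are hypothetical values chosen only for illustration.

```python
import numpy as np

# Stratum population and sample sizes (the n = 500 design of Section 4.1).
N_h = np.array([800, 1000, 1200, 1500, 2000, 3000, 4000, 5000, 7500, 10000], float)
n_h = np.array([90, 80, 70, 60, 50, 50, 40, 30, 20, 10], float)
ybar_h = np.linspace(1.0, 10.0, 10)   # hypothetical stratum sample means
l = 8                                 # pool strata h = 8, 9, 10 (1-based)

top = slice(l - 1, None)              # strata h >= l
low = slice(0, l - 1)                 # strata h < l
N_plus = N_h.sum()

w0 = N_h[top].sum() / n_h[top].sum()  # pooled weight for the trimmed strata
gamma = (N_plus - w0 * n_h[top].sum()) / N_h[low].sum()

# Trimmed estimator: gamma-scaled weights below the cutpoint, w0 above it.
ybar_wt = (gamma * N_h[low] * ybar_h[low]).sum() / N_plus \
    + (w0 * n_h[top] * ybar_h[top]).sum() / N_plus

# Model (3.1) estimator: distinct means for h < l, one pooled mean for h >= l.
ybar_pool = (n_h[top] * ybar_h[top]).sum() / n_h[top].sum()
ybar_model = (N_h[low] * ybar_h[low]).sum() / N_plus \
    + N_h[top].sum() / N_plus * ybar_pool
```

Under this choice of w₀ the trimmed strata keep their total population share, so no rescaling of the untrimmed weights is needed and the two estimators coincide.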

Elliott and Little (2000) considered an extension of this model where we no longer assume the cutpoint l is known:

\begin{matrix} y_{h i} ∣ μ_{h} \sim N (μ_{h}, σ^{2}), \quad h < l \\ y_{h i} ∣ μ_{l} \sim N (μ_{l}, σ^{2}), \quad h \geq l \\ p (L = l) = 1 / H \\ p (σ^{2} ∣ L = l) = σ^{- (l + 1 / 2)} \\ p (β ∣ σ^{2}, L = l) = {(2 π)}^{- l} \end{matrix}

where μ₁ = β₀,…, μ_l = β₀ + β_l₋₁. This “weight pooling” model averages the estimators obtained from all possible weight trimming cutpoints, where each estimator contributes to the final average based on the probability that the cutpoint is “correct”. This posterior probability is determined via Bayesian variable selection models that determine the posterior probability of each cutpoint model conditional on the observed data.
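Written out, the averaging described above takes the following form. This is a sketch using the law of total expectation and Bayes' rule; conditioning the posterior mean of Ȳ on the cutpoint L = l is notation introduced here for illustration.

```latex
E(\bar{Y} \mid y) = \sum_{l = 1}^{H} p(L = l \mid y) \, E(\bar{Y} \mid y, L = l),
\qquad
p(L = l \mid y) = \frac{p(y \mid L = l) \, p(L = l)}{\sum_{l' = 1}^{H} p(y \mid L = l') \, p(L = l')} .
```

Each term E(Ȳ | y, L = l) plays the role of a trimmed estimator with cutpoint l, and the weights p(L = l | y) shift toward small l (heavy pooling) when the high-weight strata look alike and toward l = H (no pooling) when they do not.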

3.1 Weight Pooling Models for Linear Regression

This manuscript extends Elliott and Little (2000) in two ways. First, we consider the linear regression of Y_i on fixed covariates x_i. Thus the most general model must allow for interactions between the probability of selection and the linear regression slopes; the full interaction model (a different slope within each probability-of-selection stratum, equivalent to no pooling) approximately reproduces the fully-weighted estimator, while the minimal model (a single slope across all probability-of-selection strata, equivalent to full pooling) approximately reproduces the unweighted estimator. Pooling of some, but not all, of the strata, reproduces the trimmed estimator where the degree of trimming is determined by the degree to which the data suggest that distinct probability-of-selection strata have similar linear regression slopes. Second, we allow for the pooling of all conterminous inclusion strata. This increases the robustness of the model, by permitting the lowest probability-of-selection strata to interact with the linear regression slopes even when higher probability-of-selection strata are pooled. Thus

\begin{matrix} y_{h i} ∣ x_{h i}, β_{l}, σ^{2}, L = l \overset{ind}{\sim} N (Z_{l i}^{T} β_{l}, σ^{2}) \\ β_{l} ∣ σ^{2}, L = l \sim N (β_{0}, σ^{2} Σ_{0}) \\ σ^{2} ∣ L = l \sim \text{Inv-}χ^{2} (a, s^{2}) \\ p (L = l) = 2^{- (H - 1)} \end{matrix}

(3.2)

where Z_li = D_hl ⊗ x_hi and D_hl is a vector of dummy variables that pool the appropriate conterminous inclusion strata based on the lth pooling pattern.

Table 2 shows the set of pooling patterns when H = 4. Under weak or non-informative priors, the first four pooling patterns mimic standard weight trimming estimators, with L = 1 corresponding to an unweighted analysis and L = 4 corresponding to a fully-weighted analysis.

Table 2.

The set of {D_hl} when 4 weight strata are present: all patterns of pooling conterminous strata.

Pooling pattern index	Dummy variable pattern	Number of pooled strata
L = 1 (complete pooling)	D_hl = (1) for all h	H^* = 1
L = 2 (pool highest three weight strata)	D_hl = (1 0) for h = 1; D_hl = (0 1) for h ≥ 2	H^* = 2
L = 3 (pool highest two weight strata)	D_hl = (1 0 0) for h = 1; D_hl = (0 1 0) for h = 2; D_hl = (0 0 1) for h ≥ 3	H^* = 3
L = 4 (no pooling)	D_hl = (1 0 0 0) for h = 1; D_hl = (0 1 0 0) for h = 2; D_hl = (0 0 1 0) for h = 3; D_hl = (0 0 0 1) for h = 4	H^* = 4
L = 5 (pool all but highest weight stratum)	D_hl = (1 0) for h ≤ 3; D_hl = (0 1) for h = 4	H^* = 2
L = 6 (pool first and last two weight strata)	D_hl = (1 0) for h ≤ 2; D_hl = (0 1) for h ≥ 3	H^* = 2
L = 7 (pool middle two strata)	D_hl = (1 0 0) for h = 1; D_hl = (0 1 0) for h = 2, 3; D_hl = (0 0 1) for h = 4	H^* = 3
L = 8 (pool lowest two weight strata)	D_hl = (1 0 0) for h ≤ 2; D_hl = (0 1 0) for h = 3; D_hl = (0 0 1) for h = 4	H^* = 3
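The pooling patterns in Table 2 can be enumerated mechanically: each of the H − 1 boundaries between adjacent strata is either kept or pooled across, giving 2^(H−1) patterns, which matches the prior p(L = l) = 2^−(H−1) in (3.2). The following Python sketch does this; the function names `pooling_patterns` and `dummy_vector` are hypothetical, and the order in which patterns are generated does not follow the index L used in Table 2.

```python
from itertools import product

def pooling_patterns(H):
    """All ways to pool adjacent weight strata 1..H.

    Each pattern is a list of groups; a group is a list of consecutive
    stratum indices that share regression parameters.
    """
    patterns = []
    # One binary choice per boundary between strata h-1 and h:
    # True = keep the boundary (separate groups), False = pool across it.
    for cuts in product([False, True], repeat=H - 1):
        groups, current = [], [1]
        for h, cut in enumerate(cuts, start=2):
            if cut:
                groups.append(current)
                current = [h]
            else:
                current.append(h)
        groups.append(current)
        patterns.append(groups)
    return patterns

def dummy_vector(groups, h):
    """D_hl for stratum h: indicator of the pooled group containing h."""
    return [1 if h in g else 0 for g in groups]

patterns_4 = pooling_patterns(4)
```

For H = 4 this yields the eight patterns of Table 2, for example `[[1], [2, 3], [4]]` for pooling the middle two strata, whose dummy vector for h = 3 is `[0, 1, 0]`.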


Our population quantity of interest B = (B₁,…, B_p)^T is the slope that solves the population score equation (1.3) where

U_{N} (β) = \sum_{i = 1}^{N} \frac{\partial}{\partial β} log f (y_{i}; β) = \sum_{i = 1}^{N} - \frac{1}{σ^{2}} (y_{i} - x_{i}^{T} β) x_{i} = 0

B = {(\sum_{i = 1}^{N} x_{i} x_{i}^{T})}^{- 1} (\sum_{i = 1}^{N} x_{i} y_{i})

Note that the quantity B such that U(B) = 0 is always a meaningful population quantity of interest even if the model is misspecified (i.e., y_i is not exactly linear with respect to the covariates), since it is the linear approximation of x_i to E(Y_i | x_i).
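To make the definition concrete, the sketch below computes B for a full finite population and verifies that it solves the score equation even when the mean is not linear in x. The function name `population_slope` and the simulated population (a quadratic mean with coefficients 2 and −1, echoing the misspecified case of Table 1) are assumptions made for illustration.

```python
import numpy as np

def population_slope(X, y):
    """Solve the population score equation: B = (sum x x^T)^{-1} (sum x y)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.linalg.solve(X.T @ X, X.T @ y)

# Hypothetical finite population with a quadratic mean (misspecified line).
rng = np.random.default_rng(1)
N = 10000
x = rng.uniform(0, 1, size=N)
y = 2.0 * x - 1.0 * x**2 + rng.normal(size=N)
X = np.column_stack([np.ones(N), x])   # rows x_i^T = (1, x_i)

B = population_slope(X, y)
score = X.T @ (y - X @ B)   # score at B, up to the constant factor 1/sigma^2
```

The score vanishes at B regardless of whether the linear model is correct, which is why B remains a well-defined target under misspecification.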

The posterior predictive distribution of B is then given by

p (B ∣ y, X) = \sum_{l} \int \int p (B ∣ y, X, θ_{l}) p (θ_{l} ∣ y, X) d θ_{l}

for θ_l = (β_l, σ², L = l). Simulations from p(B | y, X) can be obtained by first obtaining a draw from p(θ_l | y, X), and then computing $B = {[\sum_{h = 1}^{H} W_{h} \sum_{i = 1}^{n_{h}} Z_{l i} Z_{l i}^{T}]}^{- 1} [\sum_{h = 1}^{H} W_{h} (\sum_{i = 1}^{n_{h}} Z_{l i} Z_{l i}^{T}) β_{l}]$ where W_h = N_h/n_h for the population size N_h and sample size n_h in the hth inclusion stratum. Note that this preserves the distribution of the covariates under the sample design while allowing the slopes to still be fully-modeled.

A direct draw from p(θ_l | y, X) = p(β_l | σ², L = l, y, X)p(σ²|L = l, y, X)p(L = l | y, X) is possible if H is of modest size; otherwise a Metropolis step can be run to obtain an approximation to the marginal posterior of p(L = l | y, X), and direct draws obtained accordingly. Details are provided in the Appendix.

3.2 Fractional Bayes Factors

In the absence of strong prior information to define p(θ_l), the Bayes Factors comparing weight pooling model l with weight pooling model l′

B F (y, X) = \frac{p (L = l ∣ y, X)}{p (L = l^{'} ∣ y, X)} = \frac{p (y ∣ L = l, X) p (L = l)}{p (y ∣ L = l^{'}, X) p (L = l^{'})} = \frac{\int f (y ∣ θ_{l}) p (θ_{l}) d θ_{l} p (L = l)}{\int f (y ∣ θ_{l^{'}}) p (θ_{l^{'}}) d θ_{l^{'}} p (L = l^{'})}

can be quite sensitive to the choice of p(θ_l) (Kass and Raftery 1995). We have a similar issue in our weight pooling model, since our marginal pooling probabilities are simply Bayes Factors converted from the odds to the probability scale. To counter this, we consider the “fractional Bayes factor” approach proposed in O’Hagan (1995). The concept extends the training-sample idea first proposed in Spiegelhalter and Smith (1982). A fraction b of the sample is set aside to provide a data-based proper prior for θ_l. O’Hagan (1995) shows that the resulting Bayes factor for comparing model l with model l′ using the data-based prior, which he terms a fractional Bayes factor (FBF), is of the form $B F_{b} (y, X) = \frac{q_{l} (f, y, X) P (L = l)}{q_{l^{'}} (f, y, X) P (L = l^{'})}$ , where

q_{l} (f, y, X) = \frac{\int p (θ_{l}) f (y ∣ θ_{l}) d θ_{l}}{\int p (θ_{l}) f {(y ∣ θ_{l})}^{b} d θ_{l}} .

Small values of b should be most efficient at choosing correct models, while larger values of b are protective against outliers (data generated under a model not in the classes considered). O’Hagan proposed n⁻¹ log n and n^{−1/2} as increasingly “robust” choices of b. O’Hagan assumes a non-informative prior h(θ_l) in contrast to our proper prior, but very weakly informative priors, as we use in simulations and examples below, can be used as well. The Appendix provides details describing the use of FBF in the weight pooling application.
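For a sense of scale, the two training fractions can be evaluated at the sample sizes used in the simulations of Section 4. The helper name `training_fractions` and its dictionary keys are hypothetical; the natural logarithm is assumed for log n.

```python
import math

def training_fractions(n):
    """O'Hagan's two suggested training fractions for sample size n."""
    return {
        "log_n_over_n": math.log(n) / n,   # smaller fraction, more efficient
        "inv_sqrt_n": n ** -0.5,           # larger fraction, more robust
    }

b_100 = training_fractions(100)
b_500 = training_fractions(500)
```

For n = 100 these are roughly 0.046 and 0.1, so under either choice only a small share of the likelihood is spent on forming the data-based prior.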

4 Simulation Results

4.1 Mean Models

We consider the repeated sampling properties of our proposed models for estimating population means given by $\bar{Y} = N^{- 1} \sum_{i = 1}^{N} Y_{i}$ (i.e., x_i = 1 for all i). We generated data under the following model:

Y_{i} ∣ Y_{i} \in h, μ, σ^{2} \sim N (μ_{h}, σ^{2}),

The population sizes of the H = 10 selection strata were as follows:

N = (800, 1000, 1200, 1500, 2000, 3000, 4000, 5000, 7500, 10000)

from which disproportional samples of size 500 and 100 were drawn:

\begin{matrix} n = (90, 80, 70, 60, 50, 50, 40, 30, 20, 10) \\ n = (18, 16, 14, 12, 10, 10, 8, 6, 4, 2) \end{matrix}

(maximum normalized weight=13.9).
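The stated maximum normalized weight can be reproduced from the design above. The sketch assumes that a "normalized" weight is the case weight divided by the mean weight over sampled units, N/n; that reading is an assumption, though it does reproduce the value 13.9.

```python
import numpy as np

N_h = np.array([800, 1000, 1200, 1500, 2000, 3000, 4000, 5000, 7500, 10000], float)
designs = {
    500: np.array([90, 80, 70, 60, 50, 50, 40, 30, 20, 10], float),
    100: np.array([18, 16, 14, 12, 10, 10, 8, 6, 4, 2], float),
}

max_normalized = {}
for n, n_h in designs.items():
    w_h = N_h / n_h                    # stratum case weights w_h = N_h / n_h
    mean_w = N_h.sum() / n_h.sum()     # average weight over sampled units, N / n
    max_normalized[n] = (w_h / mean_w).max()
```

Both designs give the same value because the n = 100 allocation is the n = 500 allocation divided by five.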

We consider two patterns for the means across 10 inclusion strata:

μ_C = (22.5, 14.4, 9.0, 4.8, 1.8, −1.2, −1.8, −2.16, −1.92, −1.8)′
μ_D = (−1.8, −1.92, −2.16, −1.8, −1.2, 1.8, 4.8, 9.0, 14.4, 22.5)′

and considered values of σ² = 10^l, l = −1, 0,…, 3; 200 simulations were generated for each value of σ². The mean pattern μ_C would generally be favorable for weight trimming, since the means for the low probability-of-selection weight strata are approximately equal; μ_D would generally be unfavorable for weight trimming, since the means for the low probability-of-selection weight strata differ substantially. Generally, weight trimming should be more favorable as σ² → ∞ and the effect of the bias correction is minimized; the fully-weighted estimator will generally be favored as σ² → 0, and bias correction is paramount.

For priors, we considered μ₀ = μ̂ =(ȳ₁, …, ȳ_H)′, $Σ_{0} = c \sum_{h = 1}^{H} \sum_{i = 1}^{n_{h}} {(y_{h i} - {\bar{y}}_{h})}^{2}$ , and a = s = 10⁻⁸ (see (3.2)). This is a “data-based” prior that centers all the inclusion means at their unweighted sample values, with a variance scaled by the sample size n so that it is equivalent to a variance estimate based on a single observation. We further scale this prior by a factor c ≥ 1 to allow for reduced informativeness; we consider c = 1000 in the simulations below, making the prior effectively non-informative. We term the estimator of Ȳ obtained under this model PWT. We also consider two Fractional Bayes Factor data-based priors: PWTF1, which uses a training fraction of log n/n, and PWTF2, which uses a larger training fraction of n^{−1/2}. O’Hagan suggests that PWTF1 will be more efficient at choosing the correct model when the true model is among the models considered, whereas PWTF2 will be more robust (have better repeated sampling properties when the true model is not among the models considered).

In addition to these three weight pooling models, we consider the standard design-based (fully weighted) estimator (FWT), as well as two trimmed weight estimators (TWT3, TWT7) and the unweighted estimator (UNWT). The TWT3 estimator is obtained by replacing the weights w_hi with trimmed values $w_{h i}^{t}$ that set the maximum normalized value to 3: $w_{h i}^{t} = \frac{N {\tilde{w}}_{h i}^{t}}{\sum_{h = 1}^{H} n_{h} {\tilde{w}}_{h}^{t}}$ , where ${\tilde{w}}_{h i}^{t} = min (w_{h i}, 3 N / n)$ ; this approximately corresponds to the weight pooling model (3.1) with l = 6. The TWT7 estimator uses trimmed values that set the maximum normalized value to 7, which approximately corresponds to the weight pooling model (3.1) with l = 8. The UNWT estimator is obtained by fixing w_hi = N/n for all h, i. We estimate their variance using the Taylor Series (linearization) approximation (Binder 1983) that accounts for weighting and stratification.
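The trimmed weights can be constructed as in the sketch below, which applies the formula above to the n = 500 design of this section. The function name `trimmed_weights` and its `cap` argument are hypothetical; weights are assumed constant within each stratum.

```python
import numpy as np

def trimmed_weights(w_h, n_h, cap):
    """Trim stratum weights at cap * N / n, then rescale to sum to N."""
    w_h = np.asarray(w_h, dtype=float)
    n_h = np.asarray(n_h, dtype=float)
    N = np.sum(n_h * w_h)                    # untrimmed weights sum to N
    n = np.sum(n_h)
    w_tilde = np.minimum(w_h, cap * N / n)   # min(w_hi, cap * N / n)
    return N * w_tilde / np.sum(n_h * w_tilde)

N_h = np.array([800, 1000, 1200, 1500, 2000, 3000, 4000, 5000, 7500, 10000], float)
n_h = np.array([90, 80, 70, 60, 50, 50, 40, 30, 20, 10], float)

w_t3 = trimmed_weights(N_h / n_h, n_h, cap=3)   # TWT3 weights
w_t7 = trimmed_weights(N_h / n_h, n_h, cap=7)   # TWT7 weights
w_unwt = np.full(10, N_h.sum() / n_h.sum())     # UNWT: w_hi = N / n for all units
```

In this design the cap of 3 trims only the two highest-weight strata, which then share a common weight, and the rescaling keeps the weighted sample total equal to N.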

Table 3 shows the root mean square error (RMSE) relative to the fully-weighted estimator and nominal 95% coverage for the three design-based and three model-based estimators of the population mean, as a function of the variance σ², under μ_C, the structure that favors weight trimming. Table 4 shows the equivalent measures under μ_D, the structure that is not consistent with weight trimming.

Table 3.

Square root of mean square error (RMSE) relative to RMSE of fully-weighted estimator, and true coverage of the 95% CI or PPI of population mean estimator under the model μ_C that is consistent with weight trimming.

	RMSE relative to FWT					True Coverage
n = 100	Variance log₁₀					Variance log₁₀
Estimator	−2	0	2	4	6	−2	0	2	4	6
UNWT	358.35	37.72	2.90	0.54	0.42	0	0	0	93	98
FWT	1	1	1	1	1	93	92	85	92	88
TWT3	36.80	3.92	0.65	0.70	0.65	0	0	94	96	96
TWT7	3.65	1.62	0.88	0.84	0.84	6	77	93	92	96
PWT	1.01	0.94	0.92	0.91	0.89	95	98	90	96	95
PWTF1	1.06	0.89	0.83	0.81	0.80	94	97	91	96	95
PWTF2	1.04	0.89	0.86	0.85	0.83	94	97	92	96	94

n = 500
UNWT	739.48	72.52	7.78	0.69	0.39	0	0	0	80	98
FWT	1	1	1	1	1	98	96	98	96	96
TWT3	76.14	7.56	1.02	0.64	0.63	0	0	88	97	98
TWT7	8.96	2.63	1.15	0.82	0.80	0	29	97	99	100
PWT	1.03	0.91	0.84	0.83	0.81	94	94	98	92	96
PWTF1	1.09	0.85	0.76	0.76	0.72	92	96	99	95	96
PWTF2	1.06	0.87	0.82	0.82	0.79	94	95	98	95	96

Table 4.

Square root of mean square error (RMSE) relative to RMSE of fully-weighted estimator, and true coverage of the 95% CI or PPI of population mean estimator under the model μ_D that is not consistent with weight trimming. (Dotted line indicates 95% interval for 200 simulations of a binomial with a 95% success rate.)

	RMSE relative to FWT					True Coverage
n = 100	Variance log₁₀					Variance log₁₀
Estimator	−1	0	1	2	3	−1	0	1	2	3
UNWT	471.23	37.73	4.37	0.56	0.39	0	0	0	94	98
FWT	1	1	1	1	1	94	88	88	93	90
TWT3	192.64	15.45	2.04	0.68	0.65	0	0	28	98	96
TWT7	23.49	7.83	2.74	1.09	0.86	0	0	30	82	94
PWT	1.00	1.01	1.08	0.90	0.91	96	88	90	94	96
PWTF1	1.00	0.99	1.17	0.82	0.81	96	88	88	96	96
PWTF2	1.00	1.00	1.13	0.85	0.84	95	90	89	94	96

n = 500

UNWT	1015.9	92.02	9.81	1.01	0.45	0	0	0	58	98
FWT	1	1	1	1	1	96	96	98	95	98
TWT3	415.58	37.64	4.02	0.75	0.66	0	0	0	95	98
TWT7	56.42	17.71	5.79	2.03	0.91	0	0	0	70	96
PWT	1.01	1.00	1.08	0.91	0.80	96	96	94	95	94
PWTF1	1.00	1.01	1.15	0.88	0.70	95	94	92	94	96
PWTF2	1.00	1.00	1.11	0.89	0.80	96	94	93	94	96

Even when the mean structure is favorable for weight trimming, the unweighted estimator (UNWT) and crude trimming estimators (TWT3, TWT7) behave poorly when σ² is small, but have better MSE properties than the fully weighted estimator and conservative coverage when the within-stratum variance is considerably greater than the between-stratum variance. The trimmed estimator requires a smaller residual variance to have better MSE properties than the fully weighted estimator, but the unweighted estimator has the best MSE properties for the largest residual variance. The fully weighted estimator is design-unbiased; coverage is approximately correct for n = 500, but anti-conservative when n = 100 due to the poor asymptotic approximation. The pooled weight estimator under the flat prior nearly dominates the fully-weighted estimator with respect to MSE and has approximately correct coverage when n = 100, since asymptotic assumptions are not necessary for the Bayesian estimator. Similar results are found for the pooled weight estimator using the fractional Bayes Factor priors, except that the increase in efficiency is greater for larger σ² (RMSE reductions of nearly 30%).

When the mean structure is not favorable for weight trimming, the UNWT and TWT estimators both have larger MSE than the FWT estimator and very poor coverage except for very large σ². The pooled weight estimators are fairly robust, with slightly increased MSE relative to the fully weighted estimator for intermediate values of σ², and improved MSE relative to the fully weighted estimator for large values of σ². The true coverage of the pooled weight estimator is somewhat less than the nominal coverage when n = 100 but is still better than that of the fully weighted estimator, again reflecting the lack of need for asymptotic assumptions in the Bayesian paradigm.

4.2 Linear Regression Models

For the linear regression model, we generated population data under a linear spline as follows:

\begin{matrix} Y_{i} ∣ X_{i}, β, σ^{2} \sim N (β_{0} + \sum_{h = 1}^{10} β_{h} {(X_{i} - h)}_{+}, σ^{2}), \\ X_{i} \sim UNI (0, 10), i = 1, \dots, N = 20000. \end{matrix}

where (x)₊ = x if x ≥ 0 and (x)₊ = 0 if x < 0. A noninformative, disproportionally stratified sampling scheme sampled elements as a function of X_i (I_i equals 1 if sampled and 0 otherwise):

\begin{matrix} H_{i} = ⌈ X_{i} ⌉ \\ P (I_{i} = 1 ∣ H_{i}) = π_{h} \propto (1 + H_{i} / 2.5) H_{i} \end{matrix}

This created 10 strata, defined by the integer portions of the X_i values. A total of n = 1000 elements were sampled without replacement for each simulation (maximum normalized weight ≈ 14.9). The object of the analysis is to obtain the population slope $B_{1} = \frac{\sum_{i = 1}^{N} (Y_{i} - \bar{Y}) (X_{i} - \bar{X})}{\sum_{i = 1}^{N} {(X_{i} - \bar{X})}^{2}}$ .

We considered three patterns for β:

β_C = (0, 0, 0, 0, .5, .5, 1, 1, 2, 2, 4)′
β_D = (0, 11, −4, −2, −2, −1, −1, −.5, −.5, 0, 0)′
β_E = (0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0)′

and considered values of σ² = 10^l, l = 1,…, 5; 200 simulations were generated for each value of σ². The effect of model misspecification increases as σ² → 0 as the bias of the estimators becomes larger relative to the variance, and conversely decreases as σ² → ∞. Under β_C, weight trimming is likely to be a productive strategy under smaller values of σ² than under β_D, since the low probability-of-selection slopes are equal. Under β_E, the linear regression model for the population is correctly specified, and the unweighted estimator should be most efficient.
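The data-generating mechanism and sampling design described above can be sketched as follows. This is a rough illustration under stated assumptions: the spline knots are taken literally from the displayed model (h = 1, …, 10), the sample is drawn by simple random sampling within each stratum with allocations rounded to integers, and all function names are hypothetical.

```python
import numpy as np

def simulate_population(beta, sigma2, N=20000, seed=None):
    """Generate (X, Y, H) under the linear spline model with knots at 1, ..., 10."""
    rng = np.random.default_rng(seed)
    beta = np.asarray(beta, dtype=float)
    x = rng.uniform(0.0, 10.0, size=N)
    knots = np.arange(1, beta.size)                         # h = 1, ..., 10
    basis = np.clip(x[:, None] - knots[None, :], 0.0, None)  # (x - h)_+
    y = beta[0] + basis @ beta[1:] + rng.normal(0.0, np.sqrt(sigma2), size=N)
    h = np.clip(np.ceil(x), 1, 10).astype(int)              # stratum H = ceil(X)
    return x, y, h

def inclusion_probs(h, n=1000):
    """Inclusion probabilities proportional to (1 + H/2.5) * H, summing to n."""
    g = (1.0 + h / 2.5) * h
    return n * g / g.sum()

def stratified_sample(h, pi, seed=None):
    """Assumed design: SRS without replacement within each stratum."""
    rng = np.random.default_rng(seed)
    picked = []
    for stratum in np.unique(h):
        members = np.flatnonzero(h == stratum)
        n_h = int(round(pi[members[0]] * members.size))
        picked.append(rng.choice(members, size=n_h, replace=False))
    return np.concatenate(picked)

def population_slope(x, y):
    """Finite-population least-squares slope B_1."""
    xc = x - x.mean()
    return float(np.sum((y - y.mean()) * xc) / np.sum(xc ** 2))

beta_C = [0, 0, 0, 0, .5, .5, 1, 1, 2, 2, 4]
x, y, h = simulate_population(beta_C, sigma2=10.0, seed=1)
pi = inclusion_probs(h)
sample = stratified_sample(h, pi, seed=2)
w_norm = (1.0 / pi[sample]) * 1000 / x.size  # weights normalized by N/n
print(len(sample), round(float(w_norm.max()), 1), round(population_slope(x, y), 3))
```

Under these assumptions the largest normalized weight comes from the first stratum and lands close to the value of about 14.9 quoted in the text.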

We use priors equivalent to the “data-based” priors we used for population means, extended to population slopes: β₀ = β̂ = (X^T X)⁻¹X^Ty, Σ₀ = cnVar(β̂) for Var(β̂) = τ̂²(X^T X)⁻¹, τ̂² = (n − p)⁻¹(y − Xβ̂)^T (y − Xβ̂), a = s = 10⁻⁸, and c = 1000. We again consider Fractional Bayes Factor priors with training fractions of log n/n and n^(−1/2).
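As an illustration, the data-based prior for the regression setting can be assembled directly from the formulas above. The function name `data_based_prior` is a hypothetical helper, and the sketch assumes a full-rank design matrix X.

```python
import numpy as np

def data_based_prior(X, y, c=1000.0):
    """Return (beta0, Sigma0) with beta0 the OLS fit and Sigma0 = c * n * Var(beta_hat)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y             # (X'X)^{-1} X'y
    resid = y - X @ beta_hat
    tau2_hat = resid @ resid / (n - p)       # residual variance estimate
    Sigma0 = c * n * tau2_hat * XtX_inv      # scaled to roughly one observation, inflated by c
    return beta_hat, Sigma0
```

With c = 1000 the prior covariance is large relative to the sampling variance of β̂, which is what makes the prior effectively non-informative.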

As in the population mean evaluation, we consider the FWT, TWT3, TWT7 and UNWT estimators, again estimating their variance using the Taylor Series (linearization) approximation that accounts for weighting and stratification. As in the mean model, TWT3 approximately corresponds to the weight pooling model (3.1) with l = 6, and TWT7 approximately corresponds to the weight pooling model (3.1) with l = 8.

Table 5 shows the root mean square error (RMSE) relative to the fully-weighted estimator and nominal 95% coverage for the three design-based and three model-based estimators of the population slope (second component of B̂) as a function of the variance σ², under β_C, the structure that favors weight trimming for smaller values of σ²; Tables 6 and 7 show the equivalent measures under β_D and β_E, the structures that respectively favor weight trimming for only larger values of σ², and the correctly specified linear model. Under all three models, the nominal coverage of the 95% CI of the fully weighted estimator is approximately correct.

Table 5.

Square root of mean square error (RMSE) relative to RMSE of fully-weighted estimator, and true coverage of the 95% CI or PPI of population linear regression slope estimator under the misspecified model β_C that favors weight trimming. (Dotted line indicates 95% interval for 200 simulations of a binomial with a 95% success rate.)

	RMSE relative to FWT					True Coverage
	Variance log₁₀					Variance log₁₀
Estimator	1	2	3	4	5	1	2	3	4	5
UNWT	15.27	4.68	1.75	0.61	0.57	0	0	16	87	96
FWT	1	1	1	1	1	96	92	94	95	94
TWT3	5.45	1.83	0.80	0.61	0.57	0	22	95	98	98
TWT7	2.72	1.18	0.73	0.74	0.66	19	82	98	96	98
PWT	0.99	0.98	0.97	0.93	0.93	96	94	96	96	96
PWTF1	1.00	1.01	1.00	0.73	0.55	90	88	91	92	97
PWTF2	0.94	0.90	0.84	0.72	0.70	96	94	96	96	99

Table 6.

Square root of mean square error (RMSE) relative to RMSE of fully-weighted estimator, and true coverage of the 95% CI or PPI of population linear regression slope estimator under the misspecified model β_D that is not consistent with weight trimming. (Dotted line indicates 95% interval for 200 simulations of a binomial with a 95% success rate.)

	RMSE relative to FWT					True Coverage
	Variance log₁₀					Variance log₁₀
Estimator	1	2	3	4	5	1	2	3	4	5
UNWT	9.69	3.68	1.52	0.57	0.63	0	0	25	93	96
FWT	1	1	1	1	1	92	91	96	94	96
TWT3	5.40	2.22	1.00	0.65	0.68	0	7	88	98	99
TWT7	3.24	1.43	0.89	0.85	0.75	6	69	93	95	98
PWT	1.00	1.00	1.01	0.93	0.90	84	92	93	96	98
PWTF1	1.02	1.04	1.11	0.60	0.53	85	92	90	96	98
PWTF2	1.03	1.03	0.96	0.74	0.70	88	93	94	98	96

Table 7.

Square root of mean square error (RMSE) relative to RMSE of fully-weighted estimator, and true coverage of the 95% CI or PPI of population linear regression slope estimator under the correctly specified model β_E. (Dotted line indicates 95% interval for 200 simulations of a binomial with a 95% success rate.)

	RMSE relative to FWT					True Coverage
	Variance log₁₀					Variance log₁₀
Estimator	1	2	3	4	5	1	2	3	4	5
UNWT	0.55	0.46	0.55	0.50	0.49	94	96	94	96	96
FWT	1	1	1	1	1	96	95	96	94	96
TWT3	0.64	0.54	0.66	0.60	0.59	96	100	98	98	98
TWT7	0.75	0.70	0.68	0.71	0.72	98	97	98	98	98
PWT	0.93	0.91	0.93	0.93	0.93	94	98	98	94	96
PWTF1	0.62	0.56	0.63	0.61	0.59	98	98	94	96	95
PWTF2	0.69	0.70	0.72	0.71	0.68	97	97	97	98	98

The unweighted and trimmed estimators are always biased because of model misspecification, although the reduction in variance overwhelms bias correction for large σ², yielding approximately correct nominal 95% CI coverage and smaller MSEs relative to the fully weighted estimator. When the model is correctly specified, the unweighted and trimmed estimators reduce RMSE by 35–45%, and nominal 95% CI coverage is correct.

The weight pooling estimator with non-informative prior generally tracks the fully weighted estimator in the presence of model misspecification, although for large σ² there is a 10% reduction in RMSE. Nominal 95% coverage is correct except for small values of σ² under β_D, the model least favorable to weight trimming. Under the correctly specified model, the weight pooling estimator with non-informative prior has a 5–10% reduction in RMSE, with correct nominal 95% PPI coverage.

The weight pooling estimator with the smaller training fraction FBF prior (PWTF1) has equivalent RMSE to the fully-weighted estimator when σ² is small under β_C and weight trimming is not warranted, but has equivalent RMSE to the unweighted estimator when σ² is large and weight trimming is appropriate. A similar pattern is seen under β_D, except that PWTF1 “overpools” somewhat for intermediate levels of σ², leading to slightly higher RMSE than the fully-weighted estimator. Under the correctly specified model β_E, PWTF1 has RMSE properties similar to those of TWT3, with a 35–45% reduction in RMSE. There is modest undercoverage of the nominal 95% PPI when σ² is small and the model is misspecified.

The weight pooling estimator with the larger training fraction FBF prior (PWTF2) is more robust than PWTF1, with little increase in RMSE over the fully-weighted estimator even when the model is misspecified and σ² is small, while retaining substantial RMSE reductions (over 30%) when bias correction is unimportant or the model is correctly specified. Coverage properties of the 95% PPI are correct, except for modest undercoverage under the “worst case” model (β_D with small σ²).

5 Application: Consideration of the Barker Hypothesis using NHANES data

Barker et al. (1993) described an association between low birth weight, and adult cardiovascular disease and type 2 diabetes. It was postulated that in face of a nutritionally stressed fetal environment, the fetus adapts in a manner which predisposes to the development of insulin resistance and increased CVD risk factors in later life. This hypothesis has been evaluated by a number of others (Curhan et al. 1996, Rich-Edwards et al. 1997, among many), but usually in convenience samples. A few analyses have considered whether evidence of the “Barker Hypothesis” exists in children (Forrester et al. 1996, Matthes et al. 1994), again with convenience samples and limited ethnic diversity.

To evaluate the Barker hypothesis in children using a population-based sample, we use the National Health and Nutrition Examination Survey III (NHANES III). NHANES III (U.S. Department of Health and Human Services 1997) is a US-wide survey designed to collect information about the diet and health status of the US population. The survey was conducted between 1988 and 1994 with 33,394 subjects, drawn from a probability sample of the US population with a complex sample design. The primary sample units (PSUs), consisting of standard metropolitan statistical areas (SMSAs), counties, or groups of counties, were collapsed into strata. Strata containing a single large SMSA had that PSU selected with probability 1. Two PSUs were selected from the remaining strata using a “controlled selection” process that selected PSUs proportional to population while assuring balance on key covariates such as region, socioeconomic status, etc. Within each PSU, clusters of dwelling units were sampled using controlled selection as well, and a systematic sample of addresses was then selected from each cluster. Oversamples of minorities (African- and Mexican-Americans) and the young (<6) and old (60+) were also obtained. The NHANES III sampling weights are highly variable: 215 ≤ w_i ≤ 79,382, where 8% of the weights have normalized values greater than 3. The weights include a non-response adjustment as well as a post-stratification adjustment to known Census age-sex-geographic-ethnicity (non-Hispanic Caucasian, non-Hispanic African-American, Mexican-American, and other) totals that also account for the age-ethnicity oversampling, and included crude trimming adjustments at each step. (No detail is provided about the weight trimming procedures except that fewer than 1% of cases have trimmed weights [Mohadjer et al., 1996].) In the analysis below, the weights were grouped into 10 strata for the weight-pooling model.

To evaluate this hypothesis using the population-based estimates in NHANES, we regress non-HDL cholesterol on birth weight and birth weight² among 4–12 year olds, unadjusted and adjusting for age, gender, age × gender, and current body-mass index (BMI). Table 8 shows the unweighted, fully-weighted, weight trimming (to a maximum normalized value of 3), pooled weight, and fractional Bayes factor pooled weight estimators along with estimates of bias and mean squared error under the assumption that the fully weighted estimator is unbiased for both the unadjusted and adjusted models. Because a fully-weighted regression estimator β̂_w is unbiased only in expectation, the estimated squared bias of a regression estimator β̂^* is given by max((β̂^* − β̂_w)² − V̂₀₁, 0) where ${\hat{V}}_{01} = \hat{Var} ({\hat{β}}^{*}) + \hat{Var} ({\hat{β}}_{w}) - 2 \hat{Cov} ({\hat{β}}^{*}, {\hat{β}}_{w})$ (Kish 1992). To account for the effects of clustering and stratification in the multi-stage sample design, the variances of the regression estimators were calculated using a bootstrap (Davidson and Hinckley 1997, pp. 92–102) where PSUs were resampled with replacement within strata. For each resampled dataset, the unweighted and fully-weighted estimates were computed as B^u = (X′X)⁻¹X′y and B^w = (X′W X)⁻¹X′W y respectively, where W is the n × n diagonal case weight matrix. Point estimates under the weight pooling method were computed as $B^{p} = \sum_{l = 1}^{H^{*}} {\hat{B}}_{l} P (L = l ∣ y, X)$, where ${\hat{B}}_{l} = {[\sum_{h = 1}^{H} W_{h} \sum_{i = 1}^{n_{h}} Z_{l i} Z_{l i}^{T}]}^{- 1} [\sum_{h = 1}^{H} W_{h} (\sum_{i = 1}^{n_{h}} Z_{l i} Z_{l i}^{T}) {\hat{β}}_{l}]$, for ${\hat{β}}_{l} = {(Z_{l}^{'} Z_{l})}^{- 1} Z_{l}^{'} y$, where Z_l consists of the stacked vectors of Z_li. To compute P(L = l | y, X) we used a Fractional Bayes Factor data-based prior with a training fraction of n^(−1/2) (PWTF2).
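The building blocks of this comparison can be sketched as below: the unweighted and weighted least-squares estimates, the estimated squared bias relative to the fully-weighted estimate, and the posterior-probability-weighted average of model-specific estimates. The function names are hypothetical, and the variance and covariance inputs are assumed to come from a separate step such as the bootstrap, which is not shown.

```python
import numpy as np

def wls(X, y, w=None):
    """(X'WX)^{-1} X'Wy; with w=None this reduces to the unweighted estimate."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    if w is None:
        w = np.ones(y.size)
    XtW = X.T * np.asarray(w, dtype=float)   # scales column i of X' by w_i
    return np.linalg.solve(XtW @ X, XtW @ y)

def est_squared_bias(b_star, b_w, var_star, var_w, cov):
    """max((b* - b_w)^2 - V01, 0) with V01 = Var(b*) + Var(b_w) - 2 Cov(b*, b_w)."""
    v01 = var_star + var_w - 2.0 * cov
    return max((b_star - b_w) ** 2 - v01, 0.0)

def model_average(B_hat, post_prob):
    """Average model-specific estimates (rows of B_hat) using P(L = l | y, X)."""
    B_hat = np.asarray(B_hat, dtype=float)
    post_prob = np.asarray(post_prob, dtype=float)
    return post_prob @ B_hat
```

The truncation at zero keeps the squared-bias estimate non-negative when the observed difference between the two estimates is smaller than its own sampling variability.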

Table 8.

Change in non-HDL cholesterol (mg/dL) associated with each 1 lb. change in birth weight, among US 4–12 year-olds, using unweighted (UNWT), fully-weighted (FWT), trimmed weight (TWT3), pooled weight (PWT), and fractional Bayes factor pooled weight (PWTF2) estimators; unadjusted and adjusted for age, gender and age × gender interactions. Point estimates for PWT and PWTF2 models from posterior median; 95% CI or PPI in subscript. RMSE = estimated root mean square error, treating fully-weighted estimator as unbiased in expectation. Data from National Health and Nutrition Examination Survey III.

	Unadjusted		Adjusted
	Birth weight	Birth weight²	Birth weight	Birth weight²
UNWT
Est._95%_CI	0.25_{−0.41, 0.85}	−0.19_{−0.42, 0.03}	−0.08_{−0.84, 0.44}	−0.15_{−0.45, −0.01}
Bias	0.38	0.02	0.42	0.06
RMSE	0.32	0.12	0.36	0.11

FWT
Est._95%_CI	−0.13_{−1.39, 1.04}	−0.21_{−0.58, 0.08}	−0.51_{−1.75, 0.40}	−0.21_{−0.55, 0.00}
Bias	0	0	0	0
RMSE	0.61	0.15	0.57	0.13

TWT3
Est._95%_CI	−0.06_{−1.00, 0.81}	−0.19_{−0.48, 0.05}	−0.35_{−1.06, 0.32}	−0.24_{−0.63, 0.07}
Bias	0.07	0.02	0.16	−0.03
RMSE	0.46	0.12	0.34	0.18

PWT
Est._95%_CI	0.25_{−0.46, 0.81}	−0.19_{−0.40, 0.01}	−0.12_{−0.75, 0.39}	−0.28_{−0.45, −0.00}
Bias	0.38	0.02	0.40	−0.07
RMSE	0.31	0.11	0.29	0.11

PWTF2
Est._95%_CI	0.18_{−1.24, 0.92}	−0.19_{−0.46, 0.03}	−0.13_{−1.20, 0.39}	−0.21_{−0.52, 0.02}
Bias	0.31	0.02	0.38	0.00
RMSE	0.50	0.13	0.40	0.12

In this example, the unweighted estimator appears to have better RMSE properties than the fully-weighted estimator, particularly for the linear term; the unweighted and weighted quadratic terms are approximately equal under both models. The weight pooling estimator tracks the unweighted estimator in the unadjusted model and compromises between the weighted and unweighted estimator in the adjusted model; the fractional Bayes factor weight pooling estimator compromises between the unweighted and fully-weighted estimator for the unadjusted linear term, but tracks the unweighted estimator in the adjusted model. The weight pooling estimator has the best MSE properties, somewhat smaller than those of the unweighted estimator; the variance of the fractional Bayes factor weight pooling estimator is somewhat greater than that of the unweighted estimator. The crude trimming estimator has the next-best MSE properties, with the fully-weighted estimator having the maximum MSE for both the unadjusted and adjusted models.

Both the unadjusted and adjusted estimates suggest that a quadratic effect might be present, with extremely underweight, and normal and above-normal weight children having lower levels of non-HDL cholesterol than moderately underweight children. However, the trends were not jointly significant using a Wald test with 2 degrees of freedom using either the unweighted, fully-weighted, or weight-pooling estimators.

6 Discussion

In this manuscript we have developed a “weight smoothing” methodology that allows the data to make a principled tradeoff between bias and variance: approximating the fully weighted estimator when bias is of great importance, but moving toward the unweighted estimator when variance overwhelms the square of the bias correction factor. This model generalizes the work of Elliott and Little (2000), where population inference was restricted to population means using a weight pooling model that mimicked weight trimming. A shortcoming of the previous model was lack of robustness: by considering submodels that pooled only the largest weight strata, data structures that favored fully-weighted estimators were “overpooled” and the resulting bias yielded MSEs that were larger than the fully-weighted estimators’ MSEs. Here we consider a model that allows for the pooling of all conterminous inclusion strata. This yields weight pooling estimators that are protected against overpooling, but have limited efficiency gains over fully-weighted estimators. By considering the “Fractional Bayes Factors” of O’Hagan (1995), in which a fraction b of the sample is set aside to provide a data-based proper prior, we showed that our resulting estimators retained their robustness properties while gaining considerable efficiencies over standard fully-weighted estimators. This manuscript also extends the weight pooling method to consider population linear regression slopes as well as population means.

We also applied the methods to assess “Barker’s Hypothesis,” an association between low birth weight and adult cardiovascular disease and type 2 diabetes (Barker et al. 1993), using the nationally-representative National Health and Nutrition Examination Survey III. In this situation, the unweighted estimates of the quadratic effect of birth weight on non-HDL cholesterol generally had the best RMSE properties; however, the weight pooling estimators outperformed the fully-weighted estimators.

When sampling weights are used to account for misspecification of the mean in a regression setting, it could be argued that the correct approach is to correctly specify the mean to eliminate discrepancies between the fully-weighted and unweighted estimates of the regression parameters. However, perfect specification is an unattainable goal, and even good approximations might be highly biased if case weights are ignored when the sampling probabilities are highly variable, even if the sampling itself is noninformative. In the informative sampling setting, it may be impossible to determine whether discrepancies between weighted and unweighted estimates are due to model misspecification or to the sample design itself. Finally, even misspecified regression models have the attractive feature in the finite population setting of yielding a unique target population quantity. Consequently, accounting for the probability of inclusion in linear model settings continues to be advised, and methods that balance between a low-bias, high-variance fully-weighted analysis and a high-bias, low-variance unweighted analysis remain useful.

The next logical extension of the weight pooling methods is to the generalized linear model setting. The situation is complicated here by the lack of a closed-form solution for p(y | L = l, X) outside of the Gaussian special case, making it difficult to compute a Fractional Bayes Factor to enhance efficiency. One possibility is to utilize Laplace approximations (Tierney and Kadane 1986). In general, we have p(L = l | y) = C_l/Σ_l C_l, where C_l = ∫ f(y | θ_l)p(θ_l)dθ_l. By approximating the posterior with a normal distribution, we estimate C_l with (2π)^(l/2) |Σ̂|^(1/2) f(y | θ̂_l) p(θ̂_l), where θ̂_l is a value with high posterior probability (a median or mode) and Σ̂ is the corresponding estimate of the posterior covariance matrix. DiCiccio et al. (1997) discuss improvements on this approximation that may be utilized as well.
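A small numerical check may help fix ideas about the Laplace approximation. The sketch below uses a toy model that is not from the manuscript (normal observations with unit variance and a normal prior on the mean), chosen because the integrand is exactly Gaussian in θ, so the approximation should reproduce the exact marginal likelihood. The function name and the use of d for the dimension of θ are assumptions of this illustration.

```python
import numpy as np

def laplace_log_evidence(log_joint, theta_hat, neg_hessian):
    """log of (2*pi)^(d/2) |Sigma_hat|^(1/2) f(y | theta_hat) p(theta_hat).

    log_joint(theta) returns log f(y | theta) + log p(theta);
    neg_hessian is minus its Hessian at theta_hat, so Sigma_hat = neg_hessian^{-1}.
    """
    theta_hat = np.atleast_1d(np.asarray(theta_hat, dtype=float))
    d = theta_hat.size
    _, logdet = np.linalg.slogdet(np.atleast_2d(neg_hessian))
    return 0.5 * d * np.log(2 * np.pi) - 0.5 * logdet + log_joint(theta_hat)

# Toy model (assumption): y_i ~ N(theta, 1), theta ~ N(0, tau2).
rng = np.random.default_rng(0)
y = rng.normal(1.0, 1.0, size=20)
n, tau2 = y.size, 4.0

def log_joint(theta):
    t = theta[0]
    loglik = -0.5 * n * np.log(2 * np.pi) - 0.5 * np.sum((y - t) ** 2)
    logprior = -0.5 * np.log(2 * np.pi * tau2) - 0.5 * t ** 2 / tau2
    return loglik + logprior

precision = n + 1.0 / tau2                  # minus the second derivative of log_joint
theta_mode = np.array([y.sum() / precision])
approx = laplace_log_evidence(log_joint, theta_mode, [[precision]])

# Exact log marginal: y ~ N(0, I + tau2 * 11').
cov = np.eye(n) + tau2 * np.ones((n, n))
_, logdet_cov = np.linalg.slogdet(cov)
exact = -0.5 * n * np.log(2 * np.pi) - 0.5 * logdet_cov - 0.5 * y @ np.linalg.solve(cov, y)
print(approx, exact)
```

In a generalized linear model the integrand is no longer Gaussian, so the two quantities would differ and the quality of the approximation would need to be assessed case by case.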

Acknowledgments

This research was supported by National Institute of Heart, Lung, and Blood grant R01-HL-068987-01. The author acknowledges Jack Chen for his assistance with programming, and Dr. Andrew Tershakovec for his assistance with the Barker’s Hypothesis analysis, as well as the Editor, Associate Editor, and three anonymous reviewers whose comments improved the manuscript.

7 Appendix

From (3.2), we obtain a direct draw from the posterior p(β_l, σ², L = l | y, X) as follows:

1. $p (L = l ∣ y, X) = \frac{p (y ∣ L = l, X) P (L = l)}{\sum_{l} p (y ∣ L = l, X) P (L = l)}$, where $p (y ∣ L = l, X) \propto ∣ Ψ_{l} ∣^{1 / 2} {[Δ_{l} - θ_{l}^{T} Ψ_{l} θ_{l}]}^{- (n + a) / 2}$ for $Ψ_{l} = {((Z_{l}^{T} Z_{l}) + Σ_{0}^{- 1})}^{- 1}, θ_{l} = (Z_{l}^{T} Z_{l}) b + Σ_{0}^{- 1} β_{0}, Δ_{l} = b^{T} (Z_{l}^{T} Z_{l}) b + β_{0}^{T} Σ_{0}^{- 1} β_{0} + Q_{l}^{2} + a s^{2}, b = {(Z_{l}^{T} Z_{l})}^{- 1} Z_{l}^{T} y$, and $Q_{l}^{2} = y^{T} (I_{n} - H_{l}) y, H_{l} = Z_{l} {(Z_{l}^{T} Z_{l})}^{- 1} Z_{l}^{T}$.
2. $σ^{2} ∣ L = l, y, X \sim Inv - χ^{2} (n + a, Δ_{l} - θ_{l}^{T} Ψ_{l} θ_{l})$
3. β_l | σ², L = l, y, X ~ N(Γ_l A_l, σ²Γ_l), $A_{l} = Z_{l}^{T} y + Σ_{0}^{- 1} β_{0}, Γ_{l} = {[Σ_{0}^{- 1} + (Z_{l}^{T} Z_{l})]}^{- 1}$
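The three-step draw can be sketched in code as follows. This is an illustrative reading of the steps rather than the implementation used for the analyses: it evaluates the marginal likelihood only up to the proportionality constant stated in step 1, reads the scaled inverse chi-square in step 2 through its kernel (so σ² is the scaling factor divided by a chi-square draw with n + a degrees of freedom), and uses hypothetical names throughout. Each entry of `priors` is assumed to hold (β₀, Σ₀⁻¹) for one pooling model.

```python
import numpy as np

def pooling_pieces(Z, y, beta0, Sigma0_inv, a, s):
    """Return Psi_l, theta_l and the scale Delta_l - theta_l' Psi_l theta_l."""
    Psi = np.linalg.inv(Z.T @ Z + Sigma0_inv)
    Psi = 0.5 * (Psi + Psi.T)                      # guard against rounding asymmetry
    theta = Z.T @ y + Sigma0_inv @ beta0           # (Z'Z) b = Z'y
    # b'(Z'Z)b + Q_l^2 equals y'y, which avoids forming the hat matrix.
    Delta = y @ y + beta0 @ Sigma0_inv @ beta0 + a * s ** 2
    return Psi, theta, Delta - theta @ Psi @ theta

def draw_posterior(Z_list, y, priors, a, s, prior_l=None, seed=None):
    """One joint draw of (L, sigma^2, beta_L); also returns P(L = l | y, X)."""
    rng = np.random.default_rng(seed)
    n = y.size
    pieces = [pooling_pieces(Z, y, b0, S0inv, a, s)
              for Z, (b0, S0inv) in zip(Z_list, priors)]
    log_marg = np.array([0.5 * np.linalg.slogdet(Psi)[1] - 0.5 * (n + a) * np.log(scale)
                         for Psi, _, scale in pieces])
    if prior_l is None:
        prior_l = np.full(len(Z_list), 1.0 / len(Z_list))
    log_post = log_marg + np.log(prior_l)
    p = np.exp(log_post - log_post.max())          # stabilize before normalizing
    p /= p.sum()
    l = int(rng.choice(len(Z_list), p=p))          # step 1: pooling model
    Psi, theta, scale = pieces[l]
    sigma2 = scale / rng.chisquare(n + a)          # step 2: residual variance
    beta = rng.multivariate_normal(Psi @ theta, sigma2 * Psi)  # step 3: coefficients
    return l, sigma2, beta, p
```

Repeating the call many times and summarizing the resulting draws would give posterior summaries that average over the pooling models.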

We derive these marginal and conditional distributions in reverse order to simplify computation and notation.

Step 3 is derived by noting that

\begin{matrix} p (β_{l} ∣ σ^{2}, L = l, y, X) \propto f (y ∣ X, β_{l}, σ^{2}, L = l) p (β_{l} ∣ σ^{2}, L = l) \propto \\ exp (- \frac{1}{2 σ^{2}} [{(b - β_{l})}^{T} (Z_{l}^{T} Z_{l}) (b - β_{l}) + {(β_{l} - β_{0})}^{T} Σ_{0}^{- 1} (β_{l} - β_{0})]) \propto \\ f (b ∣ β_{l}, σ^{2}, L = l) f (β_{l} ∣ σ^{2}, L = l) \end{matrix}

for

\begin{matrix} b ∣ β_{l}, σ^{2}, L = l \sim N (β_{l}, σ^{2} {(Z_{l}^{T} Z_{l})}^{- 1}) \\ β_{l} ∣ σ^{2}, L = l \sim N (β_{0}, σ^{2} Σ_{0}) \end{matrix}

and thus by standard results (Gelman et al., 2004, p. 85–86)

β_{l} ∣ b, σ^{2}, L = l \sim N (\tilde{β}, \tilde{Σ})

where $\tilde{β} = {[{(σ^{2} Σ_{0})}^{- 1} + {(σ^{2} {(Z_{l}^{T} Z_{l})}^{- 1})}^{- 1}]}^{- 1} [{(σ^{2} {(Z_{l}^{T} Z_{l})}^{- 1})}^{- 1} b + {(σ^{2} Σ_{0})}^{- 1} β_{0}] = {[Σ_{0}^{- 1} + Z_{l}^{T} Z_{l}]}^{- 1} [Z_{l}^{T} y + Σ_{0}^{- 1} β_{0}]$ and $\tilde{Σ} = σ^{2} {[Σ_{0}^{- 1} + Z_{l}^{T} Z_{l}]}^{- 1}$.

Step 2 is derived by

\begin{matrix} p (σ^{2} ∣ y, X, L = l) \propto \int_{- \infty}^{\infty} f (y ∣ β_{l}, σ^{2}, L = l, X) p (β_{l} ∣ σ^{2}, L = l) p (σ^{2} ∣ L = l) d β_{l} \propto \\ {(2 π)}^{- \frac{n + p H^{*}}{2}} {(σ^{2})}^{- (\frac{n + p H^{*} + a}{2} + 1)} \times \\ \int_{- \infty}^{\infty} exp (- \frac{1}{2 σ^{2}} [{(β_{l} - b)}^{T} (Z_{l}^{T} Z_{l}) (β_{l} - b) + {(β_{l} - β_{0})}^{T} Σ_{0}^{- 1} (β_{l} - β_{0}) + Q_{l}^{2} + a s^{2}]) d β_{l} . \end{matrix}

Now

\begin{matrix} {(β_{l} - b)}^{T} (Z_{l}^{T} Z_{l}) (β_{l} - b) + {(β_{l} - β_{0})}^{T} Σ_{0}^{- 1} (β_{l} - β_{0}) + Q_{l}^{2} + a s^{2} = \\ β_{l}^{T} (Z_{l}^{T} Z_{l} + Σ_{0}^{- 1}) β_{l} - 2 β_{l}^{T} [(Z_{l}^{T} Z_{l}) b + Σ_{0}^{- 1} β_{0}] + b^{T} (Z_{l}^{T} Z_{l}) b + β_{0}^{T} Σ_{0}^{- 1} β_{0} + Q_{l}^{2} + a s^{2} = \\ {(β_{l} - Ψ_{l} θ_{l})}^{T} Ψ_{l}^{- 1} (β_{l} - Ψ_{l} θ_{l}) + Δ_{l} - θ_{l}^{T} Ψ_{l} θ_{l} \end{matrix}

Thus

\begin{matrix} \int_{- \infty}^{\infty} exp (- \frac{1}{2 σ^{2}} [{(β_{l} - b)}^{T} (Z_{l}^{T} Z_{l}) (β_{l} - b) + {(β_{l} - β_{0})}^{T} Σ_{0}^{- 1} (β_{l} - β_{0}) + Q_{l}^{2} + a s^{2}]) d β_{l} = \\ {(2 π σ^{2})}^{\frac{p H^{*}}{2}} ∣ Ψ_{l} ∣^{1 / 2} exp (- \frac{1}{2 σ^{2}} [Δ_{l} - θ_{l}^{T} Ψ_{l} θ_{l}]) \end{matrix}

from the normalizing constant for a N (μ, Σ) distribution, and thus

p (σ^{2} ∣ L = l, y, X) \propto {(2 π)}^{- \frac{n}{2}} {(σ^{2})}^{- (\frac{n + a}{2} + 1)} ∣ Ψ_{l} ∣^{1 / 2} exp (- \frac{1}{2 σ^{2}} [Δ_{l} - θ_{l}^{T} Ψ_{l} θ_{l}])

which is the kernel of a scaled inverse chi-square distribution with n + a degrees of freedom and scaling factor $Δ_{l} - θ_{l}^{T} Ψ_{l} θ_{l}$ .

Step 1 then follows from Step 2:

p (L = l ∣ y, X) \propto p (y ∣ L = l, X) p (L = l)

where

\begin{matrix} p (y ∣ L = l, X) = \int_{0}^{\infty} \int_{- \infty}^{\infty} f (y ∣ β_{l}, σ^{2}, L = l, X) p (β_{l} ∣ σ^{2}, L = l) p (σ^{2} ∣ L = l) d β_{l} d σ^{2} \propto \\ \int_{0}^{\infty} {(2 π)}^{- \frac{n}{2}} {(σ^{2})}^{- (\frac{n + a}{2} + 1)} ∣ Ψ_{l} ∣^{1 / 2} exp (- \frac{1}{2 σ^{2}} [Δ_{l} - θ_{l}^{T} Ψ_{l} θ_{l}]) d σ^{2} \propto \\ {(2 π)}^{- \frac{n}{2}} ∣ Ψ_{l} ∣^{1 / 2} Γ (\frac{n + a}{2}) {(\frac{n + a}{2})}^{- (n + a) / 2} {[\frac{Δ_{l} - θ_{l}^{T} Ψ_{l} θ_{l}}{n + a}]}^{- (n + a) / 2} \propto \\ ∣ Ψ_{l} ∣^{1 / 2} {[Δ_{l} - θ_{l}^{T} Ψ_{l} θ_{l}]}^{- (n + a) / 2} \end{matrix}

from the normalizing constant for the Inv − χ²(n, s²) distribution.

7.1 Fractional Bayes Factors

To implement O’Hagan’s (1995) Fractional Bayes Factors for the marginal weight pooling selection probability, we replaced

p (L = l ∣ y, X) \propto p (L = l) \int_{0}^{\infty} \int_{- \infty}^{\infty} f (y ∣ β_{l}, σ^{2}, L = l, X) p (β_{l} ∣ σ^{2}, L = l) p (σ^{2} ∣ L = l) d β_{l} d σ^{2}

with

p (L = l ∣ y, X) \propto p (L = l) \frac{\int_{0}^{\infty} \int_{- \infty}^{\infty} f (y ∣ β_{l}, σ^{2}, L = l, X) p (β_{l} ∣ σ^{2}, L = l) p (σ^{2} ∣ L = l) d β_{l} d σ^{2}}{\int_{0}^{\infty} \int_{- \infty}^{\infty} f {(y ∣ β_{l}, σ^{2}, L = l, X)}^{b} p (β_{l} ∣ σ^{2}, L = l) p (σ^{2} ∣ L = l) d β_{l} d σ^{2}} .

where 0 < b < 1 represents a “training fraction” of the data set aside to provide prior information for the parameters for the lth pooling model. From the derivation of 1. above we have

\begin{matrix} \int_{0}^{\infty} \int_{- \infty}^{\infty} f {(y ∣ β_{l}, σ^{2}, L = l, X)}^{b} p (β_{l} ∣ σ^{2}, L = l) p (σ^{2} ∣ L = l) d β_{l} d σ^{2} \propto \\ ∣ Ψ_{b l} ∣^{1 / 2} {[Δ_{b l} - θ_{b l}^{T} Ψ_{b l} θ_{b l}]}^{- (b n + a) / 2} \end{matrix}

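for $Ψ_{b l} = {((b Z_{l}^{T} Z_{l}) + Σ_{0}^{- 1})}^{- 1}, θ_{b l} = b (Z_{l}^{T} Z_{l}) b + Σ_{0}^{- 1} β_{0}, Δ_{b l} = b [b^{T} (Z_{l}^{T} Z_{l}) b + Q_{l}^{2}] + β_{0}^{T} Σ_{0}^{- 1} β_{0} + a s^{2}$ (here the leading scalar b is the training fraction, while the vector b remains the least-squares estimate ${(Z_{l}^{T} Z_{l})}^{- 1} Z_{l}^{T} y$ defined above). Thus using FBF, we have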

p (L = l ∣ y, X) \propto p (L = l) \frac{{[Δ_{b l} - θ_{b l}^{T} Ψ_{b l} θ_{b l}]}^{(b n + a) / 2} ∣ Ψ_{l} ∣^{1 / 2}}{{[Δ_{l} - θ_{l}^{T} Ψ_{l} θ_{l}]}^{(n + a) / 2} ∣ Ψ_{b l} ∣^{1 / 2}}
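The FBF-based selection probabilities can be computed on the log scale as sketched below. This is a minimal sketch under the same reading of the formulas as above: marginal likelihoods are evaluated only up to the stated proportionality, the training fraction is named `frac` to keep it distinct from the least-squares vector b, and each entry of `priors` is assumed to hold (β₀, Σ₀⁻¹). All function names are hypothetical.

```python
import numpy as np

def log_marginal(Z, y, beta0, Sigma0_inv, a, s, frac=1.0):
    """log of |Psi|^(1/2) [Delta - theta' Psi theta]^(-(frac*n + a)/2).

    frac = 1 gives the full-data quantity; 0 < frac < 1 gives the
    fractional version in which the likelihood is raised to the power frac.
    """
    n = y.size
    Psi = np.linalg.inv(frac * (Z.T @ Z) + Sigma0_inv)
    theta = frac * (Z.T @ y) + Sigma0_inv @ beta0
    # frac * [b'(Z'Z)b + Q^2] equals frac * y'y.
    Delta = frac * (y @ y) + beta0 @ Sigma0_inv @ beta0 + a * s ** 2
    scale = Delta - theta @ Psi @ theta
    return 0.5 * np.linalg.slogdet(Psi)[1] - 0.5 * (frac * n + a) * np.log(scale)

def fbf_model_probs(Z_list, y, priors, a, s, frac, prior_l=None):
    """P(L = l | y, X) using the ratio of full to fractional marginal likelihoods."""
    if prior_l is None:
        prior_l = np.full(len(Z_list), 1.0 / len(Z_list))
    log_ratio = np.array([
        log_marginal(Z, y, b0, S0inv, a, s)
        - log_marginal(Z, y, b0, S0inv, a, s, frac=frac)
        for Z, (b0, S0inv) in zip(Z_list, priors)
    ])
    log_post = np.log(prior_l) + log_ratio
    p = np.exp(log_post - log_post.max())   # stabilize before normalizing
    return p / p.sum()
```

For example, `fbf_model_probs(Z_list, y, priors, a=1e-8, s=1e-8, frac=len(y) ** -0.5)` would correspond to the larger of the two training fractions considered in the manuscript.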

References

Alexander CH, Dahl S, Weidman L. Making estimates from the American Community Survey. Proceedings of the Social Statistics Section, American Statistical Association. 1997;2000:88–97.
Barker DJP, Gluckman PD, Godfrey KM, Harding JE, Owens JA, Robinson JS. Fetal Nutrition and Cardiovascular Disease in Adult Life. Lancet. 1993;341:938–941. doi: 10.1016/0140-6736(93)91224-a.
Beaumont J-F, Alavi A. Robust Generalized Regression Estimation. Survey Methodology. 2004;30:195–208.
Binder DA. On the Variances of Asymptotically Normal Estimators from Complex Surveys. International Statistical Review. 1983;51:279–292.
Cox BG, McGrath DS. An Examination of the Effect of Sample Weight Truncation on the Mean Square Error of Survey Estimates. Paper presented at the 1981 Biometric Society ENAR meeting; Richmond, VA. 1981.
Curhan GC, Willett WC, Rimm EB, Spiegelman D, Ascherio AL, Stampfer MJ. Birth Weight and Adult Hypertension, Diabetes Mellitus and Obesity in US Men. Circulation. 1996;94:3246–3250. doi: 10.1161/01.cir.94.12.3246.
Davidson AC, Hinckley DV. Bootstrap Methods and their Applications. Cambridge Press; Cambridge: 1997.
Deville JC, Sarndal CE. Calibration Estimators in Survey Sampling. Journal of the American Statistical Association. 1992;87:376–382.
DiCiccio TJ, Kass RE, Raftery A, Wasserman L. Computing Bayes factors by Combining Simulation and Asymptotic Approximations. Journal of the American Statistical Association. 1997;92:903–915.
Elliott MR, Little RJA. Model-based Approaches to Weight Trimming. Journal of Official Statistics. 2000;16:191–210.
Ericson WA. Subjective Bayesian Modeling in Sampling Finite Populations. Journal of the Royal Statistical Society. 1969;B31:195–234.
Folsom RE, Singh AC. The Generalized Exponential Model for Sampling Weight Calibration for Extreme Values, Nonresponse, and Poststratification. Proceedings of the Survey Research Methods Section, American Statistical Association. 2000;2000:598–603.
Forrester TE, Wilks RJ, Bennett FI, Simeon D, Osmond C, Allen M, Chung AP, Scott P. Fetal Growth and Cardiovascular Risk factors in Jamaican Schoolchildren. British Medical Journal. 1996;312:156–160. doi: 10.1136/bmj.312.7024.156.
Gelman A, Carlin JB, Stern HS, Rubin DB. Bayesian Data Analysis. 2. Boca Raton, FL: Chapman and Hall/CRC; 2004.
Hampel FR, Ronchetti EM, Rousseeuw PJ, Stahel WA. Robust Statistics: The Approach Based on Influence Functions. New York: Wiley; 1986.
Holt D, Smith TMF. Poststratification. Journal of the Royal Statistical Society. 1979;A142:33–46.
Horvitz DG, Thompson DJ. A Generalization of Sampling Without Replacement from a Finite Universe. Journal of the American Statistical Association. 1952;47:663–685.
Isaki CT, Fuller WA. Survey Design under a Regression Superpopulation Model. Journal of the American Statistical Association. 1982;77:89–96.
Kass RE, Raftery AE. Bayes Factors. Journal of the American Statistical Association. 1995;90:773–795.
Kish L. Survey Sampling. New York: John Wiley and Sons; 1965.
Kish L. Weighting for Unequal Pi. Journal of Official Statistics. 1992;8:183–200.
Korn EL, Graubard BI. Analysis of Health Surveys. Wiley; New York: 1999.
Little RJA. Estimating a Finite Population Mean from Unequal Probability Samples. Journal of the American Statistical Association. 1983;78:596–604.
Little RJA. Inference with Survey Weights. Journal of Official Statistics. 1991;7:405–424.
Little RJA. Poststratification: A Modeler’s Perspective. Journal of the American Statistical Association. 1993;88:1001–1012.
Little RJA, Lewitzky S, Heeringa S, Lepkowski J, Kessler RC. Assessment of Weighting Methodology for the National Comorbidity Survey. American Journal of Epidemiology. 1997;146:439–449. doi: 10.1093/oxfordjournals.aje.a009297.
Lu H, Gelman A. A Method for Estimating Design-based Sampling Variances for Surveys with Weighting, Poststratification, and Raking. Journal of Official Statistics. 2003;19:133–152.
Matthes JW, Lewis PA, Davies DP, Bethel JA. Relation between Birth Weight at Term and Systolic Blood Pressure in Adolescence. British Medical Journal. 1994;308:1074–1077. doi: 10.1136/bmj.308.6936.1074.
Mohadjer L, Montaquila J, Waksberg J, Bell B, James P, Flores-Cervantes I, Montes M. Prepared by Westat, Inc., for the National Center for Health Statistics; Hyattsville, MD: 1996. National Health and Nutrition Examination Survey III: Weighting and Estimation Methodology, Executive Summary. http://www.cdc.gov/nchs/data/nhanes/nhanes3/cdrom/NCHS/MANUALS/WGTEXEC.PDF.
O’Hagan A. Fractional Bayes Factors for Model Comparison. Journal of the Royal Statistical Society. 1995;B57:99–138.
Potter F. A Study of Procedures to Identify and Trim Extreme Sample Weights. Proceedings of the Survey Research Methods Section, American Statistical Association. 1990;1990:225–230. [Google Scholar]
Pfeffermann D. The Role of Sampling Weights when Modeling Survey Data. International Statistical Review. 1993;61:317–337. [Google Scholar]
Pfeffermann D. The Use of Sampling Weights for Survey Data Analysis. Statistical Methods in Medical Research. 1996;5:239–261. doi: 10.1177/096228029600500303. [DOI] [PubMed] [Google Scholar]
Rich-Edwards JW, Stampfer MJ, Manson JE, Rosner B, Hankinson SE, Colditz GA, Willett WC, Hennekens CH. Birth Weight and Risk of Cardiovascular Disease in a Cohort of Women Followed up since 1976. British Medical Journal. 1997;315:396–400. doi: 10.1136/bmj.315.7105.396. [DOI] [PMC free article] [PubMed] [Google Scholar]
Rubin DB. Multiple Imputation for Non-Response in Surveys. New York: Wiley; 1987. [Google Scholar]
Sarndal CE. On p-inverse Weighting Verses Best Linear Unbiased Weighting in Probability Sampling. Biometrika. 1980;67:639–650. [Google Scholar]
Skinner CJ, Holt D, Smith TMF. Analysis of Complex Surveys. Wiley; New York: 1989. [Google Scholar]
Spiegelhalter DJ, Smith AFM. Bayes factors for Linear and Log-linear Models with Vague Prior Information. Journal of the Royal Statistical Society. 1982;B44:377–387. [Google Scholar]
Tierney L, Kadane J. Accurate Approximations for Posterior Moments and Marginal Densities. Journal of the American Statistical Association. 1986;81:82–86. [Google Scholar]
Zaslavsky AM, Schenker N, Belin TR. Downweighting Influential Clusters in Surveys: Application to the 1990 Post Enumeration Survey. Journal of the American Statistical Association. 2001;96:858–869. [Google Scholar]

