Predicting Random Effects with an Expanded Finite Population Mixed Model

Edward J Stanek, III; Julio M Singer

doi:10.1016/j.jspi.2007.11.012

Author manuscript; available in PMC: 2009 Oct 1.

Published in final edited form as: J Stat Plan Inference. 2008 Oct 1;138(10):2991–3004. doi: 10.1016/j.jspi.2007.11.012

PMCID: PMC2597867 NIHMSID: NIHMS60208 PMID: 19802323

Abstract

Prediction of random effects is an important problem with expanding applications. In the simplest context, the problem corresponds to prediction of the latent value (the mean) of a realized cluster selected via two-stage sampling. Recently, Stanek and Singer (JASA, 2004) developed best linear unbiased predictors (BLUP) under a finite population mixed model that outperform BLUPs from mixed models and superpopulation models. Their setup, however, does not allow for unequally sized clusters. To overcome this drawback, we consider an expanded finite population mixed model based on a larger set of random variables that span a higher dimensional space than those typically applied to such problems. We show that BLUPs for linear combinations of the realized cluster means derived under such a model have considerably smaller mean squared error (MSE) than those obtained from mixed models, superpopulation models, and finite population mixed models. We motivate our general approach by an example developed for two-stage cluster sampling and show that it faithfully captures the stochastic aspects of sampling in the problem. We also consider simulation studies to illustrate the increased accuracy of the BLUP obtained under the expanded finite population mixed model.

Keywords: superpopulation, best linear unbiased predictor, random permutation, optimal estimation, design-based inference, mixed models

1. INTRODUCTION

Optimal estimation of average costs for hospitals that typically vary in size is an important practical problem because of its impact on health care economics and on patients' choice of hospital care (see http://www.healthgrades.com, for example). In many cases, such estimates are based on information obtained from patients (units) in hospitals (clusters) realized under a two-stage sampling scheme.

The best linear unbiased predictor (BLUP) developed under a mixed model is often offered as a solution to this problem (Searle et al. 1992). Although the mixed model accounts for unequal numbers of units in sample clusters, it does not use the often available information about cluster sizes. The superpopulation model of Scott and Smith (1969) is an alternative that incorporates this information. Both models can plausibly be used to represent the problem of interest, but neither is formally linked to the finite population from which the two-stage sample is drawn, as is the finite population mixed model recently proposed by Stanek and Singer (2004) for situations where clusters are of equal size. Under this model, predictors have smaller mean squared error (MSE) than the competitors, even when the variance components are replaced by estimates, as indicated by San Martino, Singer and Stanek (2007). We extend the approach of Stanek and Singer (2004) by developing predictors under a new expanded finite population mixed model that outperform the competitors in both equal and unequal size two-stage cluster sampling problems.

Suppose our interest is in the average cost of appendectomies (the latent value) for each of three hospitals in the past year (Table 1), and that such costs are known (without error) for some patients in two of the hospitals. When the data are obtained from a stratified simple random sample of appendectomy patients, with hospitals as strata, the best linear unbiased estimate is the average cost for the available patients in each hospital (i.e., $2000 for Central, and $1800 for Mercy).

Table 1.

Population of hospitals’ appendectomy patients in the past year and observed data

Hospital (s)	M_s	Mean	Variance	Patient* t=1	t=2	t=3	t=4
s=1 (County)	2	μ₁	$σ_{1}^{2}$	y₁₁	y₁₂
  observed				(none)
s=2 (Central)	4	μ₂	$σ_{2}^{2}$	y₂₁	y₂₂	y₂₃	y₂₄
  observed				$2100 (Jane Blake)		$1400 (Sam Evans)	$2500 (Hong Yao)
s=3 (Mercy)	3	μ₃	$σ_{3}^{2}$	y₃₁	y₃₂	y₃₃
  observed				$1700 (Mary Slokum)	$1900 (Juan Marcus)


* Names are fictitious.

Now assume that a simple random sample of appendectomy patients is selected from each of a simple random sample of hospitals (Table 2) according to a two-stage sampling scheme. We refer to a sample hospital as a primary sampling unit (PSU) to distinguish it from a specific hospital, and to a sample patient as a secondary sampling unit (SSU) to distinguish it from a specific patient. Under the usual mixed model, the sample appendectomy cost for SSU j in PSU i is

Y_{i j} = μ + B_{i} + E_{i j} \qquad (1)

where μ is the overall mean, B_i is the random effect for PSU i, and E_ij is a random variable corresponding to the deviation of the response of SSU j from the latent value of PSU i, namely T_i = μ + B_i. The random variables B_i and E_ij are usually considered independent with null expected values and variances given by σ² and $σ_{i}^{2}$, respectively. Model (1) is an example of the general linear mixed model

Y = X α + Z B + E \qquad (2)

where, for the sample in Table 2, $X = 1_{r}$ with $r = \sum_{i = 1}^{n} m_{i}$, $Z = \oplus_{i = 1}^{n} 1_{m_{i}}$, $α = μ$, and $B = (B_{1}, …, B_{n})'$, with $Γ = \mathrm{var}(B) = σ^{2} I_{n}$, $Σ = \mathrm{var}(E) = \oplus_{i = 1}^{n} σ_{i}^{2} I_{m_{i}}$, and $\mathrm{var}(Y) = Ω = ZΓZ' + Σ$; here $1_{a}$ denotes an a×1 vector with all elements equal to one, $I_{a}$ represents an a×a identity matrix, and $\oplus_{i = 1}^{n} A_{i}$ indicates a block diagonal matrix with blocks given by $A_{i}$ (Graybill 1983). This model has a long history (see, for example, Harville 1978; Laird and Ware 1982) and is the main topic in several recent texts such as Brown and Prescott (1999), Verbeke and Molenberghs (2000), McCulloch and Searle (2001), Bryk and Raudenbush (2002), Diggle et al. (2002), Singer and Willett (2003), Demidenko (2004), Littell et al. (2006), and Jiang (2007). Under (1), the BLUP of the latent value for PSU i is

{\hat{P}}_{i} = \hat{μ} + k_{i} ({\bar{Y}}_{i} - \hat{μ}) \qquad (3)

where $\hat{μ} = \sum_{i = 1}^{n} \left(w_{i} / \sum_{i' = 1}^{n} w_{i'}\right) {\bar{Y}}_{i}$ is a weighted sample mean with $w_{i} = 1 / (σ^{2} + σ_{i}^{2} / m_{i})$, ${\bar{Y}}_{i} = \frac{1}{m_{i}} \sum_{j = 1}^{m_{i}} Y_{i j}$, and $k_{i} = \frac{σ^{2}}{σ^{2} + σ_{i}^{2} / m_{i}}$ (Goldberger 1962; Henderson 1984; McLean, Sanders and Stroup 1991; Robinson 1991). The predictor ${\hat{P}}_{i}$ is a linear function of Y (i.e., ${\hat{P}}_{i} = L'Y$), is unbiased (i.e., $E({\hat{P}}_{i} - T_{i}) = 0$), and has minimum MSE among linear unbiased predictors. Using the realized random variables represented in Table 1 (so that ${\bar{Y}}_{1} = 2000$ and ${\bar{Y}}_{2} = 1800$), and assuming that σ = 100, σ₁ = 300 and σ₂ = 50, it follows that $\hat{μ} = 1844$, $k_{1} = 0.25$, $k_{2} = 0.89$, and the predictor of the latent value for the realized hospital corresponding to i = 1 (i.e., Central) is ${\hat{P}}_{1} = 1883$, while the predictor of the latent value for i = 2 (i.e., Mercy) is ${\hat{P}}_{2} = 1805$ (all amounts in dollars).
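The arithmetic in the preceding paragraph can be reproduced with a short Python sketch. It assumes, as the text does, that the variance components are known (σ = 100, σ₁ = 300 for Central, σ₂ = 50 for Mercy); the function name `mixed_model_blup` is illustrative only.

```python
from statistics import fmean

# Observed appendectomy costs from Table 1; PSU i = 1 is Central, i = 2 is Mercy.
samples = {"Central": [2100.0, 1400.0, 2500.0], "Mercy": [1700.0, 1900.0]}
sigma2 = 100.0 ** 2                                     # between-PSU variance sigma^2
sigma2_i = {"Central": 300.0 ** 2, "Mercy": 50.0 ** 2}  # within-PSU variances sigma_i^2


def mixed_model_blup(samples, sigma2, sigma2_i):
    """Predictor (3) with known variance components.

    Returns mu_hat and, for each PSU, the pair (k_i, P_i).
    """
    ybar = {p: fmean(y) for p, y in samples.items()}
    # w_i = 1 / (sigma^2 + sigma_i^2 / m_i), so k_i = sigma^2 * w_i
    w = {p: 1.0 / (sigma2 + sigma2_i[p] / len(y)) for p, y in samples.items()}
    mu_hat = sum(w[p] * ybar[p] for p in samples) / sum(w.values())
    out = {}
    for p in samples:
        k = sigma2 * w[p]
        out[p] = (k, mu_hat + k * (ybar[p] - mu_hat))
    return mu_hat, out


mu_hat, preds = mixed_model_blup(samples, sigma2, sigma2_i)
print(round(mu_hat))                                    # 1844
for p, (k, P) in preds.items():
    print(p, round(k, 2), round(P))                     # Central 0.25 1883, Mercy 0.89 1805
```

Because Mercy's sample mean is estimated far more precisely (σ₂²/m₂ is small), its shrinkage constant is close to one, while Central's noisy mean is pulled strongly toward μ̂.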

Table 2.

Notation for common mixed model (sample) and superpopulation model (sample and remainder)

Sample Hospital PSU (i)	Sample Size m_i	Latent Value μ + B_i	Sample Patient SSU j=1	j=2	j=3	j=4
Sample
i=1	3	μ + B₁	Y₁₁	Y₁₂	Y₁₃
i=2=n	2	μ + B₂	Y₂₁	Y₂₂
Remainder
i=1 (Central)						Y₁₄
i=2 (Mercy)					Y₂₃
i=3=N (County)			Y₃₁	Y₃₂


Neither the estimate of a realized hospital’s latent value derived from the stratified model nor the corresponding predictor obtained from the mixed model uses additional information, such as the number of hospitals in the population or the number of appendectomy patients in each hospital, even though such information may be available (as illustrated by the remainder in Table 2). The combined sample and remainder represent a superpopulation that is constructed by first (conceptually) selecting a finite population (presumably from some larger population in time or space), and then selecting a two-stage sample from it. Scott and Smith (1969) show that the latent value for a hospital in the superpopulation, $T_{i} = \sum_{j = 1}^{M_{i}} Y_{i j} / M_{i}$, is predicted by

{\hat{P}}_{i} = f_{i} {\bar{Y}}_{i} + (1 - f_{i}) [\hat{μ} + k_{i} ({\bar{Y}}_{i} - \hat{μ})], \qquad (4)

where f_i = m_i/M_i. Using the data in Table 1 (so that f₁ = 3/4 and f₂ = 2/3) and the variance components above, the resulting predictor for i = 1 (i.e., Central) is ${\hat{P}}_{1} = 1971$, and for i = 2 (i.e., Mercy) is ${\hat{P}}_{2} = 1802$.
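A similar sketch reproduces the superpopulation predictor (4). It recomputes the mixed-model BLUP (3) internally and takes the cluster sizes M₁ = 4 (Central) and M₂ = 3 (Mercy) from the population listing; `superpopulation_predictor` is a hypothetical helper name.

```python
from statistics import fmean

samples = {"Central": [2100.0, 1400.0, 2500.0], "Mercy": [1700.0, 1900.0]}
cluster_size = {"Central": 4, "Mercy": 3}               # M_i from the population listing
sigma2 = 100.0 ** 2
sigma2_i = {"Central": 300.0 ** 2, "Mercy": 50.0 ** 2}


def superpopulation_predictor(samples, cluster_size, sigma2, sigma2_i):
    """Predictor (4): f_i * Ybar_i + (1 - f_i) * [mixed-model BLUP (3)]."""
    ybar = {p: fmean(y) for p, y in samples.items()}
    w = {p: 1.0 / (sigma2 + sigma2_i[p] / len(y)) for p, y in samples.items()}
    mu_hat = sum(w[p] * ybar[p] for p in samples) / sum(w.values())
    out = {}
    for p, y in samples.items():
        k = sigma2 * w[p]                               # k_i
        blup = mu_hat + k * (ybar[p] - mu_hat)          # predictor (3)
        f = len(y) / cluster_size[p]                    # f_i = m_i / M_i
        out[p] = f * ybar[p] + (1.0 - f) * blup
    return out


preds = superpopulation_predictor(samples, cluster_size, sigma2, sigma2_i)
print({p: round(v) for p, v in preds.items()})          # {'Central': 1971, 'Mercy': 1802}
```

Only the unobserved fraction 1 − f_i of each hospital is predicted from the model; the observed fraction enters through its sample mean.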

The superpopulation model does not clearly separate the labeled clusters (as in Table 1) from the random variables that represent a sample of clusters (note, for example, how Central Hospital is uniquely associated with i = 1 in Table 2). This separation is clear when the two-stage sampling process is represented with indicator random variables, as in the finite population mixed model developed by Stanek and Singer (2004). The resulting predictor (limited to equal cluster sizes and equal within-cluster sample sizes) is

{\hat{P}}_{i} = f {\bar{Y}}_{i} + (1 - f) [\bar{Y} + k ({\bar{Y}}_{i} - \bar{Y})] \qquad (5)

where $\bar{Y} = \frac{1}{n} \sum_{i = 1}^{n} {\bar{Y}}_{i}$, $f = \frac{m}{M}$, $k = \frac{σ^{*2}}{σ^{*2} + σ_{e}^{2} / m}$, $σ^{*2} = σ^{2} - \frac{σ_{e}^{2}}{M}$, and $σ_{e}^{2} = \frac{1}{N} \sum_{s = 1}^{N} σ_{s}^{2}$. With equal size clusters, predictor (4) differs from (5) since the variance components have different definitions. Theoretically, the expected MSE of (5) is less than the expected MSE of (4) or (3), as shown by Stanek and Singer (2004), while the empirical version of (5), formed by replacing variance components with their sample estimates, in general outperforms the empirical versions of the other predictors, as summarized by San Martino, Singer and Stanek (2007). However, (5) cannot be used for data like those in Tables 1 and 2, since cluster and sample sizes differ.
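Predictor (5) can be sketched as follows. The inputs are hypothetical (three sample PSUs with m = 2 of M = 4 units each, σ² = 10000, σ_e² = 20000), the variance components are treated as known, and the code assumes σ*² > 0 so that 0 < k < 1; `fpmm_predictor` is an illustrative name.

```python
from statistics import fmean


def fpmm_predictor(ybar, i, m, M, sigma2, sigma2_e):
    """Sketch of predictor (5) for sample PSU i with equal cluster and sample sizes.

    ybar     : list of the n sample PSU means (Ybar_1, ..., Ybar_n)
    m, M     : common within-cluster sample size and common cluster size
    sigma2   : between-cluster variance sigma^2
    sigma2_e : average within-cluster variance, (1/N) * sum_s sigma_s^2
    """
    f = m / M
    sigma2_star = sigma2 - sigma2_e / M                 # sigma^{*2}; assumed positive here
    k = sigma2_star / (sigma2_star + sigma2_e / m)
    ybar_all = fmean(ybar)                              # Ybar: unweighted mean of PSU means
    return f * ybar[i] + (1 - f) * (ybar_all + k * (ybar[i] - ybar_all))


# Hypothetical inputs: three sample PSUs, m = 2 of M = 4 units observed in each.
ybar = [2000.0, 1800.0, 1900.0]
print(fpmm_predictor(ybar, 0, m=2, M=4, sigma2=10000.0, sigma2_e=20000.0))
```

When m = M (so f = 1) the predictor returns the observed PSU mean, which matches the intuition that nothing remains to be predicted.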

When clusters are of equal size, the finite population mixed model can be used to represent the remaining random variables (as in Table 2) without the need to identify the realized clusters for sample PSUs. When clusters differ in size, we do not know how many SSUs remain in a sample PSU, since we do not know the size of the realized cluster. The representation in Table 2 has other problems as well: for example, PSU i = 1 cannot be County Hospital, even though the first-stage sampling is assumed to be simple random sampling; and the second-stage sample size, PSU size, and SSU variance appear to be random because of the first-stage sampling.

To overcome these problems, we extend the expanded model used by Stanek, Singer, and Lencina (2004) for simple random sampling to unbalanced two-stage sampling. The expanded model simultaneously retains the cluster identity and the PSU position and, for each PSU, distinguishes the contributions of sampled and non-sampled SSUs to a target random variable such as a PSU mean. For such purposes, we first define an expanded set of random variables, and subsequently show that a lower dimensional (collapsed) set can adequately represent the problem without loss of information. Following the steps in Stanek and Singer (2004), we specify the expanded finite population mixed model in Section 2, derive the corresponding BLUP along with its theoretical expected MSE in Section 3, compare the proposed predictor to others via simulation studies in Section 4, and conclude with a discussion in Section 5.

2. AN EXPANDED MIXED MODEL FOR A FINITE CLUSTERED POPULATION

Let a finite population be defined (as in Table 1) by a listing of units, labeled t = 1,…,M_s, in each cluster, labeled s = 1,…,N, where the non-stochastic, potentially observable response for unit t in cluster s is given by y_st. The finite population mean and variance for cluster s are respectively defined as $μ_{s} = \frac{1}{M_{s}} \sum_{t = 1}^{M_{s}} y_{s t}$ and $\left(\frac{M_{s} - 1}{M_{s}}\right) σ_{s}^{2} = \frac{1}{M_{s}} \sum_{t = 1}^{M_{s}} (y_{s t} - μ_{s})^{2}$. Similarly, the population mean and between-cluster variance are respectively defined as $μ = \frac{1}{N} \sum_{s = 1}^{N} μ_{s}$ and $\left(\frac{N - 1}{N}\right) σ^{2} = \frac{1}{N} \sum_{s = 1}^{N} (μ_{s} - μ)^{2}$. We represent the potentially observable response for unit t in cluster s as y_st = μ + β_s + ε_st, where β_s = (μ_s − μ) is the deviation of the mean for cluster s from the overall mean, and ε_st = (y_st − μ_s) is the deviation of the response for unit t (in cluster s) from the mean for cluster s. Letting $y = (y_{1}' \; y_{2}' \; \dots \; y_{N}')'$, where $y_{s} = (y_{s1} \; y_{s2} \; \dots \; y_{s M_{s}})'$, the reparameterized finite population can be summarized as

y = X μ + Z β + ε \qquad (6)

where $X = 1_{ℕ}, \underset{ℕ \times N}{Z} = \oplus_{s = 1}^{N} 1_{M_{s}}, ℕ = \sum_{s = 1}^{N} M_{s}, β = (β_{1} β_{2} … β_{N})',$ and ε is defined similarly to y. None of the terms in (6) are random variables.
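The definitions above can be checked numerically. The sketch below uses a small hypothetical population with cluster sizes 2, 4 and 3 (the values are invented for illustration), computes μ_s, σ_s² (divisor M_s − 1), μ and σ² (divisor N − 1), builds Z = ⊕ 1_{M_s}, and confirms that y = Xμ + Zβ + ε holds exactly. Exact `Fraction` arithmetic avoids rounding noise; the helper names are illustrative.

```python
from fractions import Fraction as F

# Hypothetical finite population: cluster s -> responses y_st (values are invented).
population = [
    [F(1500), F(1700)],                      # s = 1, M_1 = 2
    [F(2100), F(1800), F(1400), F(2500)],    # s = 2, M_2 = 4
    [F(1700), F(1900), F(2000)],             # s = 3, M_3 = 3
]


def cluster_parameters(pop):
    """Return mu_s, sigma_s^2 (divisor M_s - 1), mu, and sigma^2 (divisor N - 1)."""
    mu_s = [sum(y) / len(y) for y in pop]
    sigma2_s = [sum((v - m) ** 2 for v in y) / (len(y) - 1) for y, m in zip(pop, mu_s)]
    N = len(pop)
    mu = sum(mu_s) / N                       # unweighted mean of the cluster means
    sigma2 = sum((m - mu) ** 2 for m in mu_s) / (N - 1)
    return mu_s, sigma2_s, mu, sigma2


def block_diag_ones(sizes):
    """Z = block diagonal of 1_{M_s}: a (sum M_s) x N matrix of zeros and ones."""
    return [[int(c == s) for c in range(len(sizes))] for s, M in enumerate(sizes) for _ in range(M)]


mu_s, sigma2_s, mu, sigma2 = cluster_parameters(population)
beta = [m - mu for m in mu_s]                                   # beta_s = mu_s - mu
eps = [v - m for y, m in zip(population, mu_s) for v in y]      # eps_st = y_st - mu_s, stacked
Z = block_diag_ones([len(y) for y in population])
y_vec = [v for y in population for v in y]
# y = X mu + Z beta + eps with X = 1_NN (a column of ones)
fitted = [mu + sum(z * b for z, b in zip(row, beta)) + e for row, e in zip(Z, eps)]
print(fitted == y_vec)                                          # True
```

Because β_s and ε_st are defined as deviations, the decomposition is an identity rather than a fitted model, which is why none of its terms are random.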

2.1. The Expanded Set of Random Variables

We define a vector of random variables to represent equally likely two-stage random permutations of the population (i.e., with probability $\frac{1}{N! \prod_{s = 1}^{N} M_{s}!},$ as in Cochran 1977). Without loss of generality, we assume that the sample clusters are in the first n positions in a permutation of clusters and that the sample units in cluster s correspond to the units in the first m_s positions in a permutation of that cluster’s units. The ordering of the random variables is important, since any realization can be re-ordered to exactly match the finite population values.

We use indicator random variables to relate the response for unit t in cluster s, namely, y_st, to the response for SSU j in PSU i. To do so, we let $U_{j t}^{(s)}$ be an indicator random variable that takes on a value of one when SSU j in cluster s is unit t, and zero otherwise, so that the response for SSU j in cluster s may be expressed as ${\tilde{Y}}_{s j} = \sum_{t = 1}^{M_{s}} U_{j t}^{(s)} y_{s t} .$ We include a fixed non-stochastic weight w_sj for SSU j in cluster s, and define the weighted response as ${\tilde{Y}}_{w s j} = w_{s j} {\tilde{Y}}_{s j}$ so that the sum, $\sum_{j = 1}^{M_{s}} {\tilde{Y}}_{w s j},$ will correspond to a cluster total when w_sj = 1 for all j = 1,…,M_s, or to a cluster mean when $w_{s j} = \frac{1}{M_{s}}$ for all j = 1,…,M_s, for example. Letting $U_{j}^{(s)} = (U_{j 1}^{(s)} U_{j 2}^{(s)} … U_{j M_{s}}^{(s)})',$ it follows that ${\tilde{Y}}_{w s j} = w_{s j} y_{s}^{'} U_{j}^{(s)} .$ The vector ${\tilde{Y}}_{w s} = ({\tilde{Y}}_{w s 1} {\tilde{Y}}_{w s 2} … {\tilde{Y}}_{w s M_{s}})'$ represents a permutation of weighted responses for the SSUs in cluster s.

We also let U_is be an indicator random variable that takes on a value of one when PSU i is cluster s, and a value of zero otherwise. If all clusters were equal in size, we could represent a permutation of SSUs for PSU i by $\sum_{s = 1}^{N} U_{i s} {\tilde{Y}}_{w s}$. When cluster sizes differ, this sum is not defined, since the vectors composing it do not all have the same dimension. We solve this problem by expanding the set of random variables associated with PSU i into the ℕ×1 vector ${\overset{\leftrightarrow}{Y}}_{w i} = ((U_{i s} {\tilde{Y}}_{w s})) = (U_{i 1} {\tilde{Y}}_{w 1}' \; U_{i 2} {\tilde{Y}}_{w 2}' \; … \; U_{i N} {\tilde{Y}}_{w N}')'$, so that a two-stage random permutation of the population is represented by the Nℕ × 1 vector ${\overset{\leftrightarrow}{Y}}_{w} = (({\overset{\leftrightarrow}{Y}}_{w i})) = ({\overset{\leftrightarrow}{Y}}_{w 1}' \; {\overset{\leftrightarrow}{Y}}_{w 2}' \; … \; {\overset{\leftrightarrow}{Y}}_{w N}')'$, where the jth element of the sub-vector of ${\overset{\leftrightarrow}{Y}}_{w i}$ that corresponds to cluster s is $U_{i s} {\tilde{Y}}_{w s j}$.
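To make the bookkeeping concrete, the following sketch draws one equally likely two-stage permutation and assembles the Nℕ × 1 vector described above. The population values and the helper names `two_stage_permutation` and `expanded_vector` are hypothetical; weights w_sj = 1/M_s are used so that the non-zero part of each PSU block sums to the mean of the cluster that PSU turns out to be.

```python
import random

# Hypothetical population: cluster s -> responses y_st (cluster sizes differ).
population = [[15.0, 17.0], [21.0, 18.0, 14.0, 25.0], [17.0, 19.0, 20.0]]


def two_stage_permutation(pop, rng):
    """One equally likely two-stage permutation.

    Returns cluster_order (cluster_order[i] = s means U_is = 1) and, for each
    cluster s, its responses in permuted unit order (the vector tilde-Y_s).
    """
    cluster_order = rng.sample(range(len(pop)), len(pop))
    unit_orders = [rng.sample(y, len(y)) for y in pop]
    return cluster_order, unit_orders


def expanded_vector(weights, cluster_order, unit_orders):
    """N*NN x 1 vector (Y_w1', ..., Y_wN')'.

    PSU block i contains, for every cluster s, the sub-vector U_is * w_sj * tilde-Y_sj,
    so only the sub-vector of the cluster that PSU i turns out to be is non-zero.
    """
    N = len(unit_orders)
    return [(1.0 if cluster_order[i] == s else 0.0) * w * y
            for i in range(N) for s in range(N)
            for w, y in zip(weights[s], unit_orders[s])]


rng = random.Random(1)
order, units = two_stage_permutation(population, rng)
weights = [[1.0 / len(y)] * len(y) for y in population]     # w_sj = 1/M_s
Yw = expanded_vector(weights, order, units)
NN = sum(len(y) for y in population)
print(len(Yw) == len(population) * NN)                       # True
```

Every PSU block has the same length ℕ regardless of which cluster it holds, which is exactly what makes the sum over s well defined when cluster sizes differ.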

2.2 The Expanded Finite Population Mixed Model

We construct a mixed model for the expanded response vector ${\overset{\leftrightarrow}{Y}}_{w}$ next. We index expectation with respect to permutations of clusters with the subscript ξ₁ and expectation with respect to permutations of units within a cluster with the subscript ξ₂. For PSU i, we let

{\overset{\leftrightarrow}{Y}}_{w i} = E_{ξ_{1} ξ_{2}} ({\overset{\leftrightarrow}{Y}}_{w i}) + [E_{ξ_{2} | ξ_{1}} ({\overset{\leftrightarrow}{Y}}_{w i}) - E_{ξ_{1} ξ_{2}} ({\overset{\leftrightarrow}{Y}}_{w i})] + {\overset{\leftrightarrow}{E}}_{w i}

where $E_{ξ_{1} ξ_{2}}({\overset{\leftrightarrow}{Y}}_{w i}) = \frac{1}{N} (\oplus_{s = 1}^{N} w_{s}) μ$, $E_{ξ_{2} | ξ_{1}}({\overset{\leftrightarrow}{Y}}_{w i}) = (\oplus_{s = 1}^{N} w_{s} μ_{s}) U_{i}$, ${\overset{\leftrightarrow}{E}}_{w i} = {\overset{\leftrightarrow}{Y}}_{w i} - E_{ξ_{2} | ξ_{1}}({\overset{\leftrightarrow}{Y}}_{w i})$, $w_{s} = ((w_{s j})) = (w_{s 1} \; w_{s 2} \; … \; w_{s M_{s}})'$, $μ = ((μ_{s})) = (μ_{1} \; μ_{2} \; … \; μ_{N})'$, and $U_{i} = ((U_{i s})) = (U_{i 1} \; U_{i 2} \; … \; U_{i N})'$; ${\overset{\leftrightarrow}{E}}_{w i}$ denotes the deviation of the response from the expected response within a PSU. The fixed effects are given by μ, the vector of cluster means, while the random effects correspond to $E_{ξ_{2} | ξ_{1}}({\overset{\leftrightarrow}{Y}}_{w i}) - E_{ξ_{1} ξ_{2}}({\overset{\leftrightarrow}{Y}}_{w i})$. In the finite population mixed model of Stanek and Singer (2004), the random effect for PSU i was defined as $\sum_{s = 1}^{N} U_{i s} β_{s} = \sum_{s = 1}^{N} \left(U_{i s} μ_{s} - U_{i s} \frac{1}{N} \sum_{s^{*} = 1}^{N} μ_{s^{*}}\right)$, with the random variables U_is explicitly linking the clusters to PSU i. In the expanded finite population mixed model, random effects are defined for SSU j in PSU i as $w_{s j} μ_{s} (U_{i s} - E_{ξ_{1}}(U_{i s}))$. For example, when $w_{s j} = \frac{1}{M_{s}}$ for all j = 1,…,M_s (corresponding to the PSU mean), it follows that $\sum_{j = 1}^{M_{s}} w_{s j} μ_{s} (U_{i s} - E_{ξ_{1}}(U_{i s})) = U_{i s} μ_{s} - \frac{1}{N} μ_{s}$. For both models, the expected value of the random effects (with respect to ξ₁) is zero. We combine the fixed and random effects to define the expanded finite population mixed model as

{\overset{\leftrightarrow}{Y}}_{w} = \left[\frac{1}{N} 1_{N} \otimes (\oplus_{s = 1}^{N} w_{s})\right] μ + \left[I_{N} \otimes (\oplus_{s = 1}^{N} w_{s} μ_{s})\right] \mathrm{vec}(U - E_{ξ_{1}}(U)) + {\overset{\leftrightarrow}{E}}_{w} \qquad (7)

where U = (U₁ U₂ … U_N). The covariance matrix of the random effects is

\mathrm{var}_{ξ_{1} ξ_{2}}\left(\left[I_{N} \otimes (\oplus_{s = 1}^{N} w_{s} μ_{s})\right] \mathrm{vec}(U - E_{ξ_{1}}(U))\right) = \frac{1}{N - 1} P_{N} \otimes \left[(\oplus_{s = 1}^{N} w_{s} μ_{s}) P_{N} (\oplus_{s = 1}^{N} w_{s} μ_{s})'\right]

while the covariance matrix of ${\overset{\leftrightarrow}{E}}_{w}$ is

{var}_{ξ_{1} ξ_{2}} ({\overset{\leftrightarrow}{E}}_{w}) = I_{N} \otimes (\oplus_{s = 1}^{N} [\frac{σ_{s}^{2}}{N} (\oplus_{j = 1}^{M_{s}} w_{s j}) P_{M_{s}} (\oplus_{j = 1}^{M_{s}} w_{s j})]),

where $P_{a} = I_{a} - \frac{1}{a} J_{a}$ and J_a denotes an a × a matrix with all elements equal to one.
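The factor (1/(N − 1)) P_N ⊗ [·] comes from the covariance of vec(U) over equally likely cluster permutations, which is (1/(N − 1)) P_N ⊗ P_N; pre- and post-multiplying by I_N ⊗ (⊕ w_s μ_s) and its transpose then gives the random-effects covariance. The sketch below is an illustration with N = 4, not part of the original derivation: it checks the vec(U) covariance by exact enumeration of all N! permutations, with hypothetical helper names.

```python
from fractions import Fraction as F
from itertools import permutations


def vec_U(perm):
    """vec(U) for one cluster permutation: entry (i, s) is U_is = 1 if PSU i is cluster s.

    vec stacks the columns U_i = (U_i1, ..., U_iN)', so the PSU index i varies slowest.
    """
    N = len(perm)
    return [1 if perm[i] == s else 0 for i in range(N) for s in range(N)]


def centering(N):
    """P_N = I_N - J_N / N as nested lists of Fractions."""
    return [[F(int(a == b)) - F(1, N) for b in range(N)] for a in range(N)]


def kron(A, B):
    """Kronecker product of two matrices given as nested lists."""
    return [[a * b for a in ra for b in rb] for ra in A for rb in B]


def covariance_over_permutations(N):
    """Exact covariance matrix of vec(U) over all N! equally likely cluster permutations."""
    draws = [vec_U(p) for p in permutations(range(N))]
    K = len(draws)
    mean = [F(sum(col), K) for col in zip(*draws)]
    dim = N * N
    return [[sum((d[a] - mean[a]) * (d[b] - mean[b]) for d in draws) / K
             for b in range(dim)] for a in range(dim)]


N = 4
P = centering(N)
expected = [[x / (N - 1) for x in row] for row in kron(P, P)]
print(covariance_over_permutations(N) == expected)    # True
```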

2.3. Defining Target Quantities

Model (7) is an expanded version of a finite population mixed model that retains the identity of clusters, while accounting for a two-stage random permutation. Our interest is to predict target linear combinations defined by $T = g' {\overset{\leftrightarrow}{Y}}_{w},$ where g is non-stochastic. For simplicity, we limit discussion to the case

g' = c' \otimes 1_{ℕ}' \qquad (8)

where c is an N × 1 vector of constants. In particular, we focus on the setting where c = e_i, i.e., an N × 1 vector with all elements equal to zero, except for element i which has the value of one. The principal interest lies in the setting where i≤n, i.e., in the clusters realized in the sample. When $w_{s j} = \frac{1}{M_{s}}$ for all s = 1,…,N, j = 1,…,M_s, the target, $T = \sum_{s = 1}^{N} U_{i s} (\sum_{j = 1}^{M_{s}} w_{s j} {\tilde{Y}}_{s j})$ is the mean of PSU i; when w_sj = 1 for all s = 1,…,N, j = 1, …,M_s, the target, $T = \sum_{s = 1}^{N} U_{i s} (\sum_{j = 1}^{M_{s}} w_{s j} {\tilde{Y}}_{s j})$ is the total of PSU i. Note that in both cases, the target is a random variable.
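As a small illustration of (8), the sketch below forms g' = c' ⊗ 1_ℕ' with c = e₁ and applies it to an expanded vector built from one hypothetical two-stage permutation. With w_sj = 1/M_s the target equals the mean of the cluster realized as PSU 1, and with w_sj = 1 it equals that cluster's total. The population and the function names are illustrative assumptions.

```python
import random

# Hypothetical population: cluster s -> responses y_st.
population = [[15.0, 17.0], [21.0, 18.0, 14.0, 25.0], [17.0, 19.0, 20.0]]
N = len(population)
NN = sum(len(y) for y in population)            # total number of units in the population


def expanded_vector(weights, cluster_order, unit_orders):
    """Y_w: PSU block i holds U_is * w_sj * tilde-Y_sj for every cluster s."""
    return [(1.0 if cluster_order[i] == s else 0.0) * w * y
            for i in range(N) for s in range(N)
            for w, y in zip(weights[s], unit_orders[s])]


def target(Yw, c):
    """T = g' Y_w with g' = c' kron 1_NN' (definition (8))."""
    g = [ci for ci in c for _ in range(NN)]
    return sum(gk * yk for gk, yk in zip(g, Yw))


rng = random.Random(7)
order = rng.sample(range(N), N)                 # order[i] = cluster realized as PSU i
units = [rng.sample(y, len(y)) for y in population]
e_1 = [1.0] + [0.0] * (N - 1)                   # c = e_i for the first PSU
mean_w = [[1.0 / len(y)] * len(y) for y in population]     # w_sj = 1/M_s
total_w = [[1.0] * len(y) for y in population]             # w_sj = 1
T_mean = target(expanded_vector(mean_w, order, units), e_1)
T_total = target(expanded_vector(total_w, order, units), e_1)
print(order[0], T_mean, T_total)
```

The printed cluster label changes with the seed, which is the sense in which the target itself is a random variable.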

3. PREDICTING A PSU MEAN IN THE EXPANDED FINITE POPULATION MIXED MODEL

To obtain the BLUP, we adopt the basic strategy employed by Scott and Smith (1969), Royall (1976), Bolfarine and Zacks (1992), Valliant et al. (2000), and Stanek and Singer (2004), among others. We assume that the elements in the sample portion of ${\overset{\leftrightarrow}{Y}}_{w}$ will be observed, and express the target T as the sum of two parts, one which is a function of the sample, and the other, a function of the remaining random variables. Then, requiring the predictor to be a linear function of the sample random variables and to be unbiased, we obtain coefficients that minimize the MSE. While in theory, an optimal predictor can be obtained via this recipe, in practice, the high dimensionality of the expanded random vectors may result in singularities that lead to multiple solutions as discussed in Stanek, Singer, and Lencina (2004). For this reason, we explore projections of the expanded random variables into lower dimensional spaces that retain the necessary information for an optimal solution.

3.1. Partial Collapsing of the Expanded Finite Population Mixed Model Random Variables

Following Rao and Bellhouse (1978), we provide a way of determining whether the optimal linear unbiased predictor of a target random variable $T = g' {\overset{\leftrightarrow}{Y}}_{w}$ can be obtained as the optimal linear unbiased predictor of $T = g_{p}' {\overset{\leftrightarrow}{Y}}_{w p}$ based on a vector of collapsed random variables, ${\overset{\leftrightarrow}{Y}}_{w p} = \overset{\leftrightarrow}{C}' {\overset{\leftrightarrow}{Y}}_{w}$, that spans a lower dimensional space, where $\overset{\leftrightarrow}{C}$ is a matrix of dimension Nℕ × c with c < Nℕ. We take

\overset{\leftrightarrow}{C}' = \begin{pmatrix} \oplus_{i = 1}^{N} \oplus_{s = 1}^{N} (1_{m_{s}}' \,|\, 0_{M_{s} - m_{s}}') \\ \oplus_{i = 1}^{N} \oplus_{s = 1}^{N} (0_{m_{s}}' \,|\, 1_{M_{s} - m_{s}}') \end{pmatrix}

and $g_{p}' = g' [\overset{\leftrightarrow}{C} (\overset{\leftrightarrow}{C}' \overset{\leftrightarrow}{C})^{-1}]$, so that the effect of collapsing is to generate, for each PSU, the sums of the sample SSUs and of the remaining SSUs in each cluster, thus reducing the number of random variables from Nℕ to 2N².
Since ${\overset{\leftrightarrow}{Y}}_{w} = [\overset{\leftrightarrow}{C} (\overset{\leftrightarrow}{C}' \overset{\leftrightarrow}{C})^{-1}] {\overset{\leftrightarrow}{Y}}_{w p} + P_{\overset{\leftrightarrow}{C}} {\overset{\leftrightarrow}{Y}}_{w}$, where $P_{\overset{\leftrightarrow}{C}} = I_{N ℕ} - \overset{\leftrightarrow}{C} (\overset{\leftrightarrow}{C}' \overset{\leftrightarrow}{C})^{-1} \overset{\leftrightarrow}{C}'$, we can write $g' {\overset{\leftrightarrow}{Y}}_{w} = g' [\overset{\leftrightarrow}{C} (\overset{\leftrightarrow}{C}' \overset{\leftrightarrow}{C})^{-1}] {\overset{\leftrightarrow}{Y}}_{w p} + g' P_{\overset{\leftrightarrow}{C}} {\overset{\leftrightarrow}{Y}}_{w}$. Using (8), it follows that $g_{p}' = 1_{2}' \otimes [c' \otimes 1_{N}']$, $g' P_{\overset{\leftrightarrow}{C}} {\overset{\leftrightarrow}{Y}}_{w} = 0$, and $T = g_{p}' {\overset{\leftrightarrow}{Y}}_{w p}$. Let ${\hat{L}}_{p}$ represent an nN × 1 vector of constants, and let ${\overset{\leftrightarrow}{Y}}_{w I}$ be the first nN random variables (corresponding to the sample) in ${\overset{\leftrightarrow}{Y}}_{w p}$. Then, letting ${\hat{T}}_{p} = {\hat{L}}_{p}' {\overset{\leftrightarrow}{Y}}_{w I}$ be the optimal linear unbiased predictor of T based on ${\overset{\leftrightarrow}{Y}}_{w I}$, and ${\hat{B}}_{p}$ be a linear unbiased predictor of $g' P_{\overset{\leftrightarrow}{C}} {\overset{\leftrightarrow}{Y}}_{w} = 0$, it follows from Rao and Bellhouse (1978, Theorem 1.1) that ${\hat{T}}_{p}$ will be optimal for $T = g' {\overset{\leftrightarrow}{Y}}_{w}$ if and only if $E_{ξ_{1} ξ_{2}}[({\hat{T}}_{p} - T) {\hat{B}}_{p}] = 0$. Expressing $E_{ξ_{1} ξ_{2}}[({\hat{T}}_{p} - T) {\hat{B}}_{p}]$ as a function of $E_{ξ_{1} ξ_{2}}({\overset{\leftrightarrow}{Y}}_{w} {\overset{\leftrightarrow}{Y}}_{w}')$ and simplifying terms, it follows that $E_{ξ_{1} ξ_{2}}[({\hat{T}}_{p} - T) {\hat{B}}_{p}] = 0$ when w_sj = w_s for all j = 1,…,M_s (see details at http://www.umass.edu/cluster/). This implies that we can obtain the optimal predictor using the partially collapsed random variables as long as, within each cluster, the weights are equal for all SSUs.
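The collapsing step can be verified on a toy configuration. The sketch assumes hypothetical cluster sizes M = (2, 4, 3) and sample sizes m = (1, 2, 2), so that every cluster has both sample and remaining SSUs; it builds the 2N² × Nℕ matrix C', and checks that g_p' = 1₂' ⊗ [c' ⊗ 1_N'] and g' P_C = 0 for an arbitrary c. The helper name `collapse_matrix` is illustrative.

```python
from fractions import Fraction as F

M = [2, 4, 3]            # hypothetical cluster sizes M_s
m = [1, 2, 2]            # hypothetical sample sizes m_s, with 0 < m_s < M_s
N, NN = len(M), sum(M)
start = [sum(M[:s]) for s in range(N)]       # position of cluster s inside a PSU block


def collapse_matrix():
    """C' (2N^2 x N*NN): rows summing the sample SSUs, then the remaining SSUs,
    for every (PSU i, cluster s) pair, in the block order of the expanded vector."""
    rows = []
    for part in ("sample", "remainder"):
        for i in range(N):
            for s in range(N):
                lo = i * NN + start[s]
                cols = range(lo, lo + m[s]) if part == "sample" else range(lo + m[s], lo + M[s])
                rows.append([int(k in cols) for k in range(N * NN)])
    return rows


Ct = collapse_matrix()
c = [F(2), F(-1), F(3)]                           # an arbitrary illustrative c
g = [ci for ci in c for _ in range(NN)]           # g' = c' kron 1_NN'
# Rows of C' have disjoint supports, so C'C is diagonal and its inverse is elementwise.
g_p = [sum(gk * ck for gk, ck in zip(g, row)) / sum(row) for row in Ct]
# g' P_C = g' - g_p' C' should vanish identically.
residual = [gk - sum(g_p[r] * Ct[r][k] for r in range(len(Ct))) for k, gk in enumerate(g)]
print(len(Ct), len(Ct[0]))                        # 18 27
print(g_p == [ci for _ in range(2) for ci in c for _ in range(N)], all(v == 0 for v in residual))
```

The check works because g is constant within each PSU block, so averaging it over any sample or remainder group loses nothing.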

Having this in mind, we assume that w_sj = w_s for all j = 1,…,M_s and develop the BLUP of $T = g_{p}' {\overset{\leftrightarrow}{Y}}_{w p}$ based on the 2N² collapsed random variables contained in ${\overset{\leftrightarrow}{Y}}_{w p}$. The first N² of them are of the form $U_{i s} w_{s} m_{s} {\bar{Y}}_{s I}$, while the remaining N² are of the form $U_{i s} w_{s} (M_{s} - m_{s}) {\bar{Y}}_{s I I}$, where ${\bar{Y}}_{s I} = \frac{1}{m_{s}} \sum_{j = 1}^{m_{s}} {\tilde{Y}}_{s j}$ and ${\bar{Y}}_{s I I} = \frac{1}{M_{s} - m_{s}} \sum_{j = m_{s} + 1}^{M_{s}} {\tilde{Y}}_{s j}$.

3.2. Predicting Linear Combinations of PSU Latent Values Using the Expanded Finite Population Mixed Model

We partition ${\overset{\leftrightarrow}{Y}}_{w p}$ into the first nN random variables, corresponding to the sample, ${\overset{\leftrightarrow}{Y}}_{w I}$, and the N(2N − n) remaining random variables, ${\overset{\leftrightarrow}{Y}}_{w I I}$, and write the target as $T = g_{I}' {\overset{\leftrightarrow}{Y}}_{w I} + g_{I I}' {\overset{\leftrightarrow}{Y}}_{w I I}$, where $g_{I}' = c_{I}' \otimes 1_{N}'$ and $g_{I I}' = (c_{I I}' \otimes 1_{N}' \,|\, c' \otimes 1_{N}')$, with c_I and c_II denoting the first n and the last N − n elements of c. Explicitly, the partitioned expanded finite population mixed model is

\begin{pmatrix} {\overset{\leftrightarrow}{Y}}_{w I} \\ {\overset{\leftrightarrow}{Y}}_{w I I} \end{pmatrix} = \begin{pmatrix} X_{I} \\ X_{I I} \end{pmatrix} μ + \left[E_{ξ_{2} | ξ_{1}} \begin{pmatrix} {\overset{\leftrightarrow}{Y}}_{w I} \\ {\overset{\leftrightarrow}{Y}}_{w I I} \end{pmatrix} - E_{ξ_{1} ξ_{2}} \begin{pmatrix} {\overset{\leftrightarrow}{Y}}_{w I} \\ {\overset{\leftrightarrow}{Y}}_{w I I} \end{pmatrix}\right] + \begin{pmatrix} {\overset{\leftrightarrow}{E}}_{w I} \\ {\overset{\leftrightarrow}{E}}_{w I I} \end{pmatrix} \qquad (9)

Requiring the predictor of T to be a linear function of ${\overset{\leftrightarrow}{Y}}_{w I},$ to be unbiased, and to have minimum MSE, the BLUP of T in (9) is

{\hat{T}}_{p} = \sum_{i = 1}^{n} c_{i} ({\hat{Y}}_{i} - \bar{\hat{Y}}) + \bar{c} \frac{N}{n} \left(\sum_{s = 1}^{N} I_{s} M_{s} w_{s} {\bar{Y}}_{s I}\right) + {\bar{c}}_{I I} \frac{N - n}{n} \left(\sum_{s = 1}^{N} I_{s} M_{s} f_{s} w_{s} {\bar{Y}}_{s I}\right) \qquad (10)

where ${\hat{Y}}_{i} = \sum_{s = 1}^{N} U_{i s} M_{s} w_{s} k_{s}^{*} {\bar{Y}}_{s I}$, $\bar{\hat{Y}} = \frac{1}{n} \sum_{i = 1}^{n} {\hat{Y}}_{i}$, $k_{s}^{*} = k_{s} - \frac{k_{s}}{d_{s}} \frac{1}{N} \sum_{s^{*} = 1}^{N} \left(\frac{1 - k_{s^{*}}}{1 - \bar{k}}\right) d_{s^{*}}$, $d_{s} = M_{s} w_{s} μ_{s}$, $k_{s} = \frac{f_{s}^{2} d_{s}^{2}}{f_{s}^{2} d_{s}^{2} + (N - 1) ν_{s e}^{2}}$ (see note a at http://www.umass.edu/cluster/ for details), $\bar{k} = \frac{1}{N} \sum_{s = 1}^{N} k_{s}$, ${\bar{c}}_{I I} = \frac{1}{N - n} \sum_{i = n + 1}^{N} c_{i}$, $\bar{c} = \frac{1}{N} \sum_{i = 1}^{N} c_{i}$, and $I_{s} = \sum_{i = 1}^{n} U_{i s}$ is an indicator ‘inclusion’ random variable for cluster s in the sample (see derivation in Appendix A). An expression for the MSE of the predictor can be developed directly using expressions for the variance, and simplifies to

\begin{aligned} \mathrm{var}_{ξ_{1} ξ_{2}}({\hat{T}}_{p} - T) = {} & \left(\sum_{i = 1}^{N} (c_{i} - {\bar{c}}_{I})^{2}\right) \left(σ_{k d}^{2} - 2 σ_{k d, d} + \frac{1}{N} \sum_{s = 1}^{N} \frac{k_{s}^{*2} ν_{s e}^{2}}{f_{s}^{2}}\right) + \frac{(N \bar{c})^{2}}{n} \left(\frac{1}{N} \sum_{s = 1}^{N} \frac{ν_{s e}^{2}}{f_{s}^{2}}\right) \\ & + \left[\sum_{i = 1}^{N} c_{i}^{2} + \frac{N \bar{c}}{n} (N \bar{c} - 2 n {\bar{c}}_{I})\right] σ_{d}^{2} \end{aligned}

where ${\bar{c}}_{I} = \frac{1}{n} \sum_{i = 1}^{n} c_{i}$, $μ_{d} = \frac{1}{N} \sum_{s = 1}^{N} d_{s}$, $μ_{k d} = \frac{1}{N} \sum_{s = 1}^{N} k_{s}^{*} d_{s}$, $σ_{k d}^{2} = \frac{1}{N - 1} \sum_{s = 1}^{N} (k_{s}^{*} d_{s} - μ_{k d})^{2}$, $σ_{d}^{2} = \frac{1}{N - 1} \sum_{s = 1}^{N} (d_{s} - μ_{d})^{2}$, and $σ_{k d, d} = \frac{1}{N - 1} \sum_{s = 1}^{N} (k_{s}^{*} d_{s} - μ_{k d}) (d_{s} - μ_{d})$ (see note b at http://www.umass.edu/cluster/ for details).

When predicting a PSU mean, i.e., using $w_{s} = \frac{1}{M_{s}}$, (10) simplifies to ${\hat{T}}_{p} = \bar{Y} + ({\hat{Y}}_{i} - \bar{\hat{Y}})$ if i ≤ n and to ${\hat{T}}_{p} = \bar{Y}$ when i > n, where $\bar{Y} = \frac{1}{n} \sum_{i = 1}^{n} {\bar{Y}}_{i}$ and ${\bar{Y}}_{i} = \sum_{s = 1}^{N} U_{i s} {\bar{Y}}_{s I}$. The MSE for a sample PSU mean predictor (when i ≤ n) simplifies to

\begin{aligned} \mathrm{var}_{ξ_{1} ξ_{2}}({\hat{T}}_{p} - T) = {} & \left(\frac{n - 1}{n}\right) \frac{1}{N - 1} \sum_{s = 1}^{N} \left[(k_{s}^{*} - 1) μ_{s} - \frac{1}{N} \sum_{s^{*} = 1}^{N} (k_{s^{*}}^{*} - 1) μ_{s^{*}}\right]^{2} \\ & + \frac{1}{n N} \sum_{s = 1}^{N} [1 + (n - 1) k_{s}^{*2}] (1 - f_{s}) \frac{σ_{s}^{2}}{m_{s}} \end{aligned}

(see note c at http://www.umass.edu/cluster/ for details), while the MSE for a PSU not in the sample is given by

\mathrm{var}_{ξ_{1} ξ_{2}}({\hat{T}}_{p} - T) = \left(\frac{n + 1}{n}\right) σ^{2} + \frac{1}{n N} \sum_{s = 1}^{N} (1 - f_{s}) \frac{σ_{s}^{2}}{m_{s}}.
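The PSU-mean form of the predictor can be illustrated with the sketch below. The shrinkage constants k*_s are taken as given inputs, because computing them requires ν_se², which is defined in the authors' supplementary material rather than here. The second function checks, by exact enumeration over cluster permutations with no within-cluster variation (so that only the σ² term remains), that predicting a non-sample PSU mean by Ȳ has MSE ((n + 1)/n)σ². All numbers and function names are hypothetical.

```python
from fractions import Fraction as F
from itertools import permutations
from statistics import fmean


def predict_psu_mean(ybar_sample, k_star, i):
    """Simplified predictor (10) for a PSU mean (w_s = 1/M_s).

    ybar_sample[i] : sample mean Ybar_i of sample PSU i (i = 0, ..., n - 1)
    k_star[i]      : k*_s of the cluster realized as PSU i (assumed already computed)
    i              : PSU index; i >= n denotes a PSU outside the sample
    """
    ybar = fmean(ybar_sample)
    if i >= len(ybar_sample):
        return ybar
    yhat = [k * y for k, y in zip(k_star, ybar_sample)]    # Yhat_i = k*_s Ybar_sI
    return ybar + (yhat[i] - fmean(yhat))


def nonsample_mse_between(mu, n):
    """Exact E[(Ybar - T)^2] over all cluster permutations when units within a
    cluster are identical, so Ybar averages the first n cluster means."""
    total, count = F(0), 0
    for perm in permutations(range(len(mu))):
        ybar = sum(F(mu[s]) for s in perm[:n]) / n
        target = F(mu[perm[n]])                 # latent value of a PSU outside the sample
        total += (ybar - target) ** 2
        count += 1
    return total / count


mu = [3, 8, 4, 10, 5]                           # hypothetical cluster means
n = 2
N = len(mu)
mu_bar = F(sum(mu), N)
sigma2 = sum((F(v) - mu_bar) ** 2 for v in mu) / (N - 1)
print(nonsample_mse_between(mu, n) == F(n + 1, n) * sigma2)   # True
```

Setting every k*_s to 1 returns the sample PSU mean itself, while setting them to 0 returns Ȳ for every PSU, so k*_s plays the role of a cluster-specific shrinkage constant.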

4. COMPARISON OF PREDICTORS

We compare the MSE of (10) to that of the sample mean and of predictors (3) and (4). When clusters are of equal size, have homogeneous unit variances, and sample sizes are equal, the MSE of each predictor can be explicitly calculated. In this setting, we also compare the results with the MSE of predictor (5). For the sample mean, $\mathrm{MSE}({\bar{Y}}_{i}) = \frac{1}{N} \sum_{s = 1}^{N} (1 - f_{s}) \frac{σ_{s}^{2}}{m_{s}}$, while for (5), $\mathrm{MSE}({\hat{T}}_{R P}) = (1 - f) \left[\frac{σ_{e}^{2}}{n m} + \left(\frac{n - 1}{n}\right) (1 - k) σ^{2}\right]$. The MSEs of predictors (3) and (4) are given by $\mathrm{MSE}({\hat{T}}_{R P}) + \left(\frac{n - 1}{n k}\right) \left(σ^{2} - \frac{σ_{e}^{2}}{M}\right) (c - [f + (1 - f) k])^{2}$, with $c = \frac{m σ^{2}}{m σ^{2} + σ_{e}^{2}}$ for (3) and $c = f + \frac{(1 - f) m σ^{2}}{m σ^{2} + σ_{e}^{2}}$ for (4), as shown in Stanek and Singer (2004). Although we have explicit expressions for the MSE of these predictors, the differences between them are complicated functions of the population parameters. Since the shrinkage constants of the expanded predictor depend on the cluster latent values, we compare the MSEs relative to that of the expanded finite population mixed model predictor in four settings with different values of the unit intra-class correlation coefficient, $ρ_{s} = \frac{σ^{2}}{σ^{2} + σ_{s}^{2}}$. In each setting, the cluster latent values are set equal to evenly spaced quantiles from a specified distribution. The results, expressed as the percent increase in MSE relative to the MSE of (10), are presented in Figure 1 and illustrate that in all settings considered, using (10) results in a substantial reduction in MSE (over 40% when f < 0.2). This is true even for (5), illustrating that a smaller MSE can be achieved with the BLUP derived under the expanded finite population mixed model than with the BLUP obtained under the finite population mixed model of Stanek and Singer (2004).
There was little difference in the MSE comparisons across different distributions of the cluster latent values. The results illustrate that predictor (3) has a larger MSE than the other predictors when f > 0.5. The MSEs of predictors (4) and (5) are similar, and differ more from the MSE of (10) when f is small.
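The equal-size MSE expressions can be coded directly. In the sketch below, all inputs are hypothetical (n = 30, m = 4, M = 20, σ² = 1, σ_e² = 4, i.e., ρ = 0.2 and f = 0.2), and percent increases are reported relative to (5) rather than (10), since the MSE of (10) needs ν_se², which is not defined in this section. We read the formula for (3) and (4) as the MSE of Ȳ + c(Ȳ_i − Ȳ), a quadratic in c minimized at f + (1 − f)k; setting c = 1 reproduces the sample-mean MSE, which serves as a consistency check. The function name is illustrative.

```python
def mse_equal_size(n, m, M, sigma2, sigma2_e):
    """Section 4 MSE formulas for equal cluster sizes M, sample sizes m, common sigma_e^2."""
    f = m / M
    s_star = sigma2 - sigma2_e / M                        # sigma^{*2}, assumed positive
    k = s_star / (s_star + sigma2_e / m)                  # shrinkage constant of (5)
    mse_5 = (1 - f) * (sigma2_e / (n * m) + (n - 1) / n * (1 - k) * sigma2)

    def mse_shrink(c):
        # Our reading: MSE of Ybar + c * (Ybar_i - Ybar), minimized at c = f + (1 - f) k.
        return mse_5 + (n - 1) / (n * k) * s_star * (c - (f + (1 - f) * k)) ** 2

    k_mm = m * sigma2 / (m * sigma2 + sigma2_e)           # mixed-model shrinkage
    return {
        "mean": (1 - f) * sigma2_e / m,                   # (1/N) sum (1 - f_s) sigma_s^2 / m_s
        "(5)": mse_5,
        "(3)": mse_shrink(k_mm),
        "(4)": mse_shrink(f + (1 - f) * k_mm),
        "c=1": mse_shrink(1.0),                           # should equal "mean"
    }


r = mse_equal_size(n=30, m=4, M=20, sigma2=1.0, sigma2_e=4.0)   # hypothetical inputs
for name in ("mean", "(3)", "(4)"):
    print(name, f"{100 * (r[name] / r['(5)'] - 1):.1f}% above (5)")
```

Scanning m (and hence f) in such a function is one way to reproduce the qualitative pattern that (3) deteriorates as f grows while (4) approaches (5).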

Percent increase in MSE for the Finite Population Mixed Model (FP), Superpopulation Model (SP), Mixed Model (MM), and Sample Mean (Mean) Predictors Relative to Finite Expanded Mixed Model Predictor of a Realized PSU Mean where N = 100, n = 30, and M = 20 for all clusters. Equal Size Clusters and Equal Unit Sampling Fractions per Cluster

Figure 2 summarizes increases in MSE for different intra-class correlation coefficients; quantiles of a uniform distribution were used to determine cluster latent values and unit parameters. The results illustrate that for low intra-class correlation coefficients, the relative increase in MSE can be dramatic. Once again, for low sampling fractions, similar patterns in MSE are evident for (3), (4) and (5).

In Figure 3, we compare the sample mean and predictors (3) and (4) in two settings where cluster sizes differ; predictor (5) is not applicable in such settings. These results are based on simulation studies (with 5000 trials each) that repeat a two-stage sampling process from a finite population. The MSE is estimated by the average squared difference between the predictor and the latent PSU value. In the left column, cluster sizes vary 10-fold, with sample sizes proportional to cluster size; the results illustrate the performance of the predictors for different sampling fractions. The right column of Figure 3 compares the MSEs of the predictors when the sample size per cluster is constant.
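The simulation design described above can be sketched for the sample mean, the one predictor here that needs no shrinkage constants. The Python below is a hedged illustration, not the authors' simulation code: the population generator `make_population` and its distributions are hypothetical, clusters and units are drawn by simple random sampling without replacement, and the target is the latent value (mean) of the cluster that lands at a fixed sample position. Assuming σ_s² is defined with the M_s − 1 divisor, the Monte Carlo MSE should agree with the closed form (1/N) Σ_s (1 − f_s) σ_s²/m_s quoted earlier for the sample mean.

```python
import random
import statistics


def make_population(sizes, seed=1):
    """Hypothetical finite population: cluster s holds sizes[s] unit values."""
    rng = random.Random(seed)
    pop = []
    for s, M_s in enumerate(sizes):
        mu = rng.gauss(0.0, 2.0)  # illustrative cluster latent level
        pop.append([mu + rng.gauss(0.0, 1.0 + 0.1 * s) for _ in range(M_s)])
    return pop


def simulate_mse_sample_mean(pop, n, m_sizes, trials=5000, seed=2, position=0):
    """Monte Carlo MSE of the sample mean for the PSU at a fixed sample position."""
    rng = random.Random(seed)
    N = len(pop)
    latent = [statistics.fmean(units) for units in pop]  # cluster latent values
    total = 0.0
    for _ in range(trials):
        clusters = rng.sample(range(N), n)      # stage 1: SRS of n clusters
        s = clusters[position]                  # realized PSU at this position
        units = rng.sample(pop[s], m_sizes[s])  # stage 2: SRS of m_s units
        total += (statistics.fmean(units) - latent[s]) ** 2
    return total / trials


def analytic_mse_sample_mean(pop, m_sizes):
    """(1/N) * sum_s (1 - f_s) S_s^2 / m_s, with S_s^2 using the M_s - 1 divisor."""
    N = len(pop)
    return sum(
        (1 - m / len(units)) * statistics.variance(units) / m
        for units, m in zip(pop, m_sizes)
    ) / N


if __name__ == "__main__":
    sizes = [10 * (s + 1) for s in range(10)]  # cluster sizes vary 10-fold
    pop = make_population(sizes)
    m_sizes = [M_s // 5 for M_s in sizes]      # sample sizes proportional to size
    print(simulate_mse_sample_mean(pop, n=3, m_sizes=m_sizes),
          analytic_mse_sample_mean(pop, m_sizes))
```

Each trial draws only what is needed to evaluate one realized PSU; the other n − 1 selected clusters would matter for the shrinkage predictors, which pool information across the sample.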

5. DISCUSSION

The expanded finite population mixed model uses a larger set of 2N² random variables than the ℕ random variables typically used in superpopulation models or in the finite population mixed model of Stanek and Singer (2004). These random variables are fewer than the ℕ₂ random variables resulting from an expansion that retains the identity of units and SSUs, and even fewer than those in the very general representation of the model used by Godambe (1955). We show that this intermediate set of random variables allows a clear representation of a two-stage sample while accounting for different cluster and sample sizes. Other approaches do not appear to connect the potentially observable data to the random variables in the stochastic model. Since more than one finite population mixed model can be used, we have shown how such models can be compared by placing them in a hierarchy and determining whether an additional set of orthogonal random variables adds information about the target quantity. Further reductions in the number of random variables of the expanded finite population mixed model were considered (Appendix B); each leads to a loss of information.

Note that these results depend on the choice of target quantity. For example, if interest lies in the relationship between two variables among units in a cluster, the collapsed set of 2N² expanded random variables is not likely to be sufficient.

The BLUP obtained under the new model offers substantial gains over previous predictors. These gains are likely to be reduced by the need to estimate shrinkage constants in practice. Simulation studies comparing the performance of the empirical versions of predictors (3), (4), and (5) in settings with equal cluster sizes and equal sample sizes indicate some loss in efficiency, but with a similar ordering of MSEs (San Martino, Singer, and Stanek 2007). Limited simulation studies conducted with the expanded model predictor indicate a greater loss of efficiency for (10) than for the other predictors. Iterative estimation procedures may be possible and are currently being investigated; this area requires further study.
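To make the estimation step concrete, here is a hedged Python sketch of one common plug-in approach for the balanced case: ANOVA (method-of-moments) estimates of σ_e² and σ², substituted into the shrinkage constant c = mσ²/(mσ² + σ_e²) quoted for predictor (3). The shrunk form ȳ + ĉ(ȳ_i − ȳ) and the function names are assumptions made for illustration; this is not claimed to reproduce the empirical predictors studied by San Martino, Singer, and Stanek (2007).

```python
import statistics


def anova_components(samples):
    """Method-of-moments variance components from a balanced two-stage sample.

    samples: list of n lists, each holding the m responses from one sampled cluster.
    Returns (sigma2_hat, sigma2_e_hat); sigma2_hat is truncated at zero.
    """
    m = len(samples[0])
    assert all(len(y) == m for y in samples), "balanced design assumed"
    # Pooled within-cluster variance estimates sigma_e^2.
    sigma2_e_hat = statistics.fmean(statistics.variance(y) for y in samples)
    # Variance of cluster means overstates sigma^2 by about sigma_e^2 / m.
    means = [statistics.fmean(y) for y in samples]
    sigma2_hat = max(statistics.variance(means) - sigma2_e_hat / m, 0.0)
    return sigma2_hat, sigma2_e_hat


def empirical_shrunk_means(samples):
    """Plug-in ybar + c_hat (ybar_i - ybar) with c_hat = m s2 / (m s2 + s2_e)."""
    m = len(samples[0])
    s2, s2e = anova_components(samples)
    c_hat = m * s2 / (m * s2 + s2e) if s2e > 0 else 1.0
    means = [statistics.fmean(y) for y in samples]
    grand = statistics.fmean(means)
    return c_hat, [grand + c_hat * (yb - grand) for yb in means]


c_hat, preds = empirical_shrunk_means([[1.0, 3.0], [2.0, 4.0], [6.0, 8.0]])
print(c_hat, preds)
```

The truncation at zero is a common practical choice; with very few clusters the estimate of σ² is noisy, which is one concrete source of the efficiency loss described above.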

ACKNOWLEDGEMENT

This work was developed with the support of the Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq), Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP), Brazil and the National Institutes of Health (NIH-PHS-R01-HD36848, R01-HL071828-02), USA. We wish to thank Dr. Wenjun Li for helpful comments that have improved the manuscript.

APPENDIX A

Consider the partitioned expanded finite population mixed model in (9), where $X_{I} = \frac{1}{N} 1_{n} \otimes (\oplus_{s = 1}^{N} w_{s} m_{s})$ and $X_{II} = \begin{pmatrix} \frac{1}{N} 1_{N - n} \otimes (\oplus_{s = 1}^{N} w_{s} m_{s}) \\ \frac{1}{N} 1_{N} \otimes (\oplus_{s = 1}^{N} w_{s} (M_{s} - m_{s})) \end{pmatrix}$, and the random effects are given by

E_{ξ_{2}} \begin{pmatrix} {\overset{\leftrightarrow}{Y}}_{wI} \\ {\overset{\leftrightarrow}{Y}}_{wII} \end{pmatrix} - E_{ξ_{1} ξ_{2}} \begin{pmatrix} {\overset{\leftrightarrow}{Y}}_{wI} \\ {\overset{\leftrightarrow}{Y}}_{wII} \end{pmatrix} = \begin{pmatrix} I_{n} \otimes (\oplus_{s = 1}^{N} f_{s} d_{s})\, \mathrm{vec}(U_{I} - E_{ξ_{1}}(U_{I})) \\ I_{N - n} \otimes (\oplus_{s = 1}^{N} f_{s} d_{s})\, \mathrm{vec}(U_{II} - E_{ξ_{1}}(U_{II})) \\ I_{N} \otimes (\oplus_{s = 1}^{N} (1 - f_{s}) d_{s})\, \mathrm{vec}[U - E_{ξ_{1}}(U)] \end{pmatrix},

with $\begin{pmatrix} {\overset{\leftrightarrow}{E}}_{wI} \\ {\overset{\leftrightarrow}{E}}_{wII} \end{pmatrix} = \begin{pmatrix} {\overset{\leftrightarrow}{Y}}_{wI} \\ {\overset{\leftrightarrow}{Y}}_{wII} \end{pmatrix} - E_{ξ_{2}} \begin{pmatrix} {\overset{\leftrightarrow}{Y}}_{wI} \\ {\overset{\leftrightarrow}{Y}}_{wII} \end{pmatrix}$, $U = (U_{I} \; U_{II})$, $U_{I} = (U_{1} \; U_{2} \; \cdots \; U_{n})$, and $U_{II} = (U_{n+1} \; U_{n+2} \; \cdots \; U_{N})$. The corresponding covariance matrix is given by

\mathrm{var}({\overset{\leftrightarrow}{Y}}_{wp}) = \frac{1}{N - 1} \begin{pmatrix} P_{N} \otimes [(\oplus_{s = 1}^{N} f_{s} d_{s}) P_{N} (\oplus_{s = 1}^{N} f_{s} d_{s})] & P_{N} \otimes [(\oplus_{s = 1}^{N} f_{s} d_{s}) P_{N} (\oplus_{s = 1}^{N} (1 - f_{s}) d_{s})] \\ P_{N} \otimes [(\oplus_{s = 1}^{N} (1 - f_{s}) d_{s}) P_{N} (\oplus_{s = 1}^{N} f_{s} d_{s})] & P_{N} \otimes [(\oplus_{s = 1}^{N} (1 - f_{s}) d_{s}) P_{N} (\oplus_{s = 1}^{N} (1 - f_{s}) d_{s})] \end{pmatrix} + \begin{pmatrix} I_{N} & -I_{N} \\ -I_{N} & I_{N} \end{pmatrix} \otimes (\oplus_{s = 1}^{N} v_{se}^{2})

(see a. http://www.umass.edu/cluster/ for details), which we partition as $\mathrm{var}_{ξ_{1} ξ_{2}} \begin{pmatrix} {\overset{\leftrightarrow}{Y}}_{wI} \\ {\overset{\leftrightarrow}{Y}}_{wII} \end{pmatrix} = \begin{pmatrix} V_{I} & V_{I,II} \\ V_{I,II}' & V_{II} \end{pmatrix}$, where $v_{se}^{2} = f_{s} (1 - f_{s}) \frac{M_{s} w_{s}^{2} σ_{s}^{2}}{N}$ (see b. http://www.umass.edu/cluster/ for details). Letting L be a vector of constants, it follows that $L' {\overset{\leftrightarrow}{Y}}_{wI} - T = (L' - g_{I}' \;\; -g_{II}') \begin{pmatrix} {\overset{\leftrightarrow}{Y}}_{wI} \\ {\overset{\leftrightarrow}{Y}}_{wII} \end{pmatrix}$, and the unbiasedness constraint is given by $(L' - g_{I}') X_{I} - g_{II}' X_{II} = 0$. Using Lagrange multipliers, we minimize $\mathrm{var}_{ξ_{1} ξ_{2}} (L' {\overset{\leftrightarrow}{Y}}_{wI} - T)$ subject to the unbiasedness constraint and obtain the familiar solution

\hat{L} = g_{I} + [V_{I}^{- 1} - V_{I}^{- 1} X_{I} {(X_{I}^{'} V_{I}^{- 1} X_{I})}^{- 1} X_{I}^{'} V_{I}^{- 1}] V_{I, I I} g_{I I} + V_{I}^{- 1} X_{I} {(X_{I}^{'} V_{I}^{- 1} X_{I})}^{- 1} X_{I I}^{'} g_{I I} .
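The general solution above can be checked numerically. The numpy sketch below codes the formula for $\hat{L}$ for arbitrary conformable $X_I$, $X_{II}$, $V_I$, $V_{I,II}$, $g_I$, $g_{II}$; the function name is hypothetical and the random inputs are illustrative, not the matrices of the expanded model. Two properties worth verifying are the unbiasedness constraint and the fact that the prediction error is uncorrelated with every linear unbiased predictor of zero, the optimality criterion invoked in Appendix B.

```python
import numpy as np


def blup_coefficients(X_I, X_II, V_I, V_I_II, g_I, g_II):
    """Coefficients L_hat such that L_hat' y_I is the BLUP of T = g_I' y_I + g_II' y_II.

    L_hat = g_I + [V_I^-1 - V_I^-1 X_I (X_I' V_I^-1 X_I)^-1 X_I' V_I^-1] V_I,II g_II
                + V_I^-1 X_I (X_I' V_I^-1 X_I)^-1 X_II' g_II
    """
    V_inv = np.linalg.inv(V_I)
    A = V_inv @ X_I                # V_I^-1 X_I
    G = np.linalg.inv(X_I.T @ A)   # (X_I' V_I^-1 X_I)^-1
    Q = V_inv - A @ G @ A.T        # bracketed matrix; A.T = X_I' V_I^-1 as V_I is symmetric
    return g_I + Q @ V_I_II @ g_II + A @ G @ X_II.T @ g_II


# Illustrative random example (not the expanded model's matrices).
rng = np.random.default_rng(0)
n_I, n_II, p = 6, 4, 2
B = rng.normal(size=(n_I + n_II, n_I + n_II))
V = B @ B.T + (n_I + n_II) * np.eye(n_I + n_II)  # positive definite covariance
X = rng.normal(size=(n_I + n_II, p))
g = rng.normal(size=n_I + n_II)
L_hat = blup_coefficients(X[:n_I], X[n_I:], V[:n_I, :n_I], V[:n_I, n_I:],
                          g[:n_I], g[n_I:])
# Unbiasedness constraint (L' - g_I') X_I - g_II' X_II should be numerically zero.
print((L_hat - g[:n_I]) @ X[:n_I] - g[n_I:] @ X[n_I:])
```

Explicit inverses keep the sketch close to the printed formula; for large or ill-conditioned $V_I$, solving linear systems (for example with `np.linalg.solve`) would be the more stable choice.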

This result simplifies to

\hat{L} = (P_{n} c_{I} \otimes [\oplus_{s = 1}^{N} \frac{k_{s}^{*}}{f_{s}}] 1_{N}) + \frac{N}{n} \bar{c} [1_{n} \otimes (\oplus_{s = 1}^{N} \frac{1}{f_{s}}) 1_{N}] + \frac{N - n}{n} {\bar{c}}_{I I} (1_{n} \otimes 1_{N})

(see c. http://www.umass.edu/cluster/ for details), where $k_{s}^{*} = k_{s} - \frac{k_{s}}{d_{s}} \frac{1}{N} \sum_{s^{*} = 1}^{N} \left(\frac{1 - k_{s^{*}}}{1 - \bar{k}}\right) d_{s^{*}}$, $k_{s} = \frac{f_{s}^{2} d_{s}^{2}}{f_{s}^{2} d_{s}^{2} + (N - 1) v_{se}^{2}}$ (see d. http://www.umass.edu/cluster/ for details), $\bar{k} = \frac{1}{N} \sum_{s = 1}^{N} k_{s}$, ${\bar{c}}_{II} = \frac{1}{N - n} \sum_{i = n + 1}^{N} c_{i}$, and $\bar{c} = \frac{1}{N} \sum_{i = 1}^{N} c_{i}$. The predictor ${\hat{T}}_{p} = \hat{L}' {\overset{\leftrightarrow}{Y}}_{wI}$ can then be expressed as (10) (see e. http://www.umass.edu/cluster/ for details).
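The cluster-level constants that enter (10) can be computed one cluster at a time. The Python sketch below evaluates $v_{se}^{2}$, $d_s$, $k_s$, $\bar{k}$, and $k_s^{*}$ as written above, under the stated assumptions that $f_s = m_s/M_s$, that $d_s = M_s w_s μ_s$ (as defined in Appendix B) is nonzero, and that $\bar{k} < 1$. The function name and all input values are hypothetical; the weights $w_s$ are simply taken as inputs.

```python
def expanded_constants(M, m, w, mu, sigma2):
    """Return (k, k_star) lists for clusters s = 1..N.

    Assumptions: f_s = m_s / M_s, d_s = M_s * w_s * mu_s is nonzero,
    and the average k_bar is strictly less than 1.
    """
    N = len(M)
    f = [m_s / M_s for m_s, M_s in zip(m, M)]
    d = [M_s * w_s * mu_s for M_s, w_s, mu_s in zip(M, w, mu)]
    # v_se^2 = f_s (1 - f_s) M_s w_s^2 sigma_s^2 / N
    v2 = [f_s * (1 - f_s) * M_s * w_s**2 * s2 / N
          for f_s, M_s, w_s, s2 in zip(f, M, w, sigma2)]
    # k_s = f_s^2 d_s^2 / (f_s^2 d_s^2 + (N - 1) v_se^2)
    k = [(f_s * d_s) ** 2 / ((f_s * d_s) ** 2 + (N - 1) * v)
         for f_s, d_s, v in zip(f, d, v2)]
    k_bar = sum(k) / N
    # Common term (1/N) sum_{s*} [(1 - k_{s*}) / (1 - k_bar)] d_{s*}
    a = sum((1 - k_s) / (1 - k_bar) * d_s for k_s, d_s in zip(k, d)) / N
    k_star = [k_s - k_s / d_s * a for k_s, d_s in zip(k, d)]
    return k, k_star


k, k_star = expanded_constants(M=[10, 20, 40], m=[2, 4, 8],
                               w=[1.0, 1.0, 1.0], mu=[1.0, 2.0, 3.0],
                               sigma2=[1.0, 1.5, 2.0])
print(k, k_star)
```

A quick consistency check follows from the definition: $(k_s - k_s^{*}) d_s / k_s$ equals the same common term for every cluster.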

APPENDIX B

We discuss whether several other plausible reductions in the dimension of the set of expanded random variables (given by Nℕ), including a reduction to the set of 2N random variables used by Stanek and Singer (2004), can be made without loss of information. First, it is natural to ask whether it is sufficient to predict $T = g_{p}' {\overset{\leftrightarrow}{Y}}_{wp}$ using the 2N collapsed random variables defined by $Y_{w} = C^{*\prime} {\overset{\leftrightarrow}{Y}}_{wp}$. This set of random variables is similar to that used by Stanek and Singer (2004) for a population with equal-size clusters and equal-size samples per cluster with no response error. Since $g^{*\prime} = g_{p}' [C^{*} {(C^{*\prime} C^{*})}^{-1}] = 1_{2}' \otimes c'$, the target $T = g^{*\prime} Y_{w}$ defines a linear combination of PSU means.

To begin, let $\hat{T} = \hat{L}' Y_{wI}$, where $\hat{L}$ is an n × 1 vector of constants and $Y_{wI}$ contains the first n random variables (corresponding to the sample) in $Y_{w}$. In this case, the bias $E_{ξ_{1} ξ_{2}}(\hat{L}' Y_{wI} - T) = (\hat{L}' 1_{n}) m_{s} - (c' 1_{N})(M_{s} - m_{s})$ is zero only if clusters are sampled with probability proportional to size (PPS) (see a. http://www.umass.edu/cluster/).

Now assume a PPS sampling scheme, and note that since $g_{p}' (P_{C^{*}} {\overset{\leftrightarrow}{Y}}_{wp}) = 0$, we may write $T = g^{*\prime} Y_{w} + g_{p}' (P_{C^{*}} {\overset{\leftrightarrow}{Y}}_{wp})$. Letting $\hat{B} = \hat{b}' (I_{n} \otimes P_{N}) {\overset{\leftrightarrow}{Y}}_{wI}$ be a linear unbiased predictor of $g_{p}' (P_{C^{*}} {\overset{\leftrightarrow}{Y}}_{wp})$ based on the sample part of $P_{C^{*}} {\overset{\leftrightarrow}{Y}}_{wp}$, given by $(I_{n} \otimes P_{N}) {\overset{\leftrightarrow}{Y}}_{wI}$, the predictor $\hat{T}$ is optimal if and only if $E_{ξ_{1} ξ_{2}}[(\hat{T} - T) \hat{B}] = 0$. Simplifying this expectation, we find

\begin{aligned} E_{ξ_{1} ξ_{2}} [(\hat{T} - T) \hat{B}] = {} & [f (f \hat{L}' - c_{I}') \otimes 1_{N}'] \left[ I_{n} \otimes \left( \frac{1}{N - 1} (\oplus_{s = 1}^{N} d_{s}) P_{N} (\oplus_{s = 1}^{N} d_{s}) P_{N} \right) \right] \hat{b} \\ & + \left[ \frac{f (1 - f)}{N} (\hat{L}' + c_{I}') \otimes [1_{N}' (\oplus_{s = 1}^{N} M_{s} w_{s}^{2} σ_{s}^{2}) P_{N}] \right] \hat{b} \end{aligned}

where $d_{s} = M_{s} w_{s} μ_{s}$, $c = (c_{I}' \; c_{II}')'$, $c_{I}$ is an n × 1 vector, and f denotes the common sampling fraction (see b. http://www.umass.edu/cluster/). This expression is not equal to zero, even when the population consists of equal-size clusters with homogeneous variances and equal-size samples are taken from the sample clusters. By Theorem 1.1 in Rao and Bellhouse (1978), this result implies that some efficiency is lost in prediction when ${\overset{\leftrightarrow}{Y}}_{wp}$ is collapsed to $Y_{w}$. The predictor based on ${\overset{\leftrightarrow}{Y}}_{wp}$ will have a smaller MSE than the predictor based on $Y_{w}$, even in the settings considered by Stanek and Singer (2004), with no response error, equal-size clusters, and equal-size samples.

Footnotes

We refer to such models as finite population mixed models rather than random permutation models, the term used in Stanek and Singer (2004), to avoid confusion with the homonymous, but different, model considered in Hedayat and Sinha (1991).

Contributor Information

Edward J. Stanek, III, Department of Public Health, 401 Arnold House, University of Massachusetts, 715 North Pleasant Street, Amherst, MA 01003-9304 USA, stanek@schoolph.umass.edu.

Julio M. Singer, Departamento de Estatística, Universidade de São Paulo, São Paulo, Brazil, jmsinger@ime.usp.br

REFERENCES

Bolfarine H, Zacks S. Prediction Theory for Finite Populations. New York: Springer-Verlag; 1992.
Brown H, Prescott R. Applied Mixed Models in Medicine. New York: Wiley; 1999.
Bryk AS, Raudenbush SW. Hierarchical linear modeling: Applications, Data Analysis Methods. 2nd edition. Thousand Oaks, California: Sage Publishing; 2002.
California Cluster Web Site: http://www.umass.edu/cluster/. (Details for Section 3.1 in document c06ed54.doc; Details for Section 3.2: a. c07ed15.doc, page 41. b. c07ed15.doc, from page 41. c. c07ed25.doc, page 3; Details of Appendix A: a. c07ed27.doc, page 5. b. c07ed15.doc, from page 41. c. c06ed56.doc, from page 42, and c07ed01.doc, from page 1. d. c07ed15.doc, page 41. e. c07ed01.doc, page 6; Details of Appendix B: a. c07ed29.doc, page 8. b. c07ed29.doc, page 19.)
Cochran W. Survey Sampling. New York: Wiley; 1977.
Demidenko E. Mixed models: Theory and Application. New York: John Wiley; 2004.
Diggle PL, Heagerty P, Liang KY, Zeger S. Analysis of Longitudinal Data. Oxford University Press; 2002.
Godambe VP. A unified theory of sampling from finite populations. Journal of the Royal Statistical Society B. 1955;17:269–278.
Goldberger AS. Best Linear Unbiased Prediction in the Generalized Linear Regression Model. Journal of the American Statistical Association. 1962;57:369–375.
Graybill FA. Matrices with applications in statistics. Belmont, California: Wadsworth International; 1983.
Harville DA. Alternative formulations and procedures for the two-way mixed model. Biometrics. 1978;34:441–453.
Hedayat AS, Sinha BK. Design and Inference in Finite Population Sampling. New York: John Wiley and Sons; 1991.
Henderson CR. Applications of Linear Models in Animal Breeding. Guelph, Canada: University of Guelph; 1984.
Jiang J. Linear and generalized linear mixed models and their applications. New York: Springer; 2007.
Laird NM, Ware JH. Random-effects models for longitudinal data. Biometrics. 1982;38:963–974.
Littell RC, Milliken GA, Stroup WW, Wolfinger RD, Schabenberger O. SAS for Mixed Models: Second Edition. Cary N.C.: SAS Institute; 2006.
McCulloch CE, Searle SR. Generalized, Linear, and Mixed Models. New York: John Wiley and Sons; 2001.
McLean RA, Sanders WL, Stroup WW. A Unified Approach to Mixed Linear Models. The American Statistician. 1991;45(1):54–64.
Rao JNK, Bellhouse DR. Optimal estimation of a finite population mean under generalized random permutation models. Journal of Statistical Planning and Inference. 1978;2:125–141.
Robinson GK. That BLUP is a Good Thing: the Estimation of Random Effects. Statistical Science. 1991;6(1):15–51.
Royall RM. The Linear Least-squares Prediction Approach to Two-stage Sampling. Journal of the American Statistical Association. 1976;71:657–664.
San Martino S, Singer JM, Stanek EJ, III. Performance of balanced two-stage empirical predictors of realized cluster latent values from finite populations: A simulation Study. Computational Statistics and Data Analysis. 2007. doi: 10.1016/j.csda.2007.07.013.
Scott A, Smith TMF. Estimation in Multi-stage Surveys. Journal of the American Statistical Association. 1969;64(327):830–840.
Searle SR, Casella G, McCulloch CE. Variance Components. New York: Wiley and Sons; 1992.
Singer JD, Willett JB. Applied Longitudinal Data Analysis. New York: Oxford Press; 2003.
Stanek EJ, III, Singer JM. Predicting Random Effects from Finite Population Clustered Samples with Response Error. Journal of the American Statistical Association. 2004;99:119–130.
Stanek EJ, III, Singer JM, Lencina VB. A unified approach to estimation and prediction under simple random sampling. Journal of Statistical Planning and Inference. 2004;121:325–338.
Valliant R, Dorfman HA, Royall RM. Finite Population Sampling and Inference. New York: Wiley; 2000.
Verbeke G, Molenberghs G. Linear Mixed Models for Longitudinal Data. New York: Springer-Verlag; 2000.