Author manuscript; available in PMC: 2009 Oct 1.
Published in final edited form as: J Stat Plan Inference. 2008 Oct 1;138(10):2991–3004. doi: 10.1016/j.jspi.2007.11.012

Predicting Random Effects with an Expanded Finite Population Mixed Model

Edward J Stanek III, Julio M Singer
PMCID: PMC2597867  NIHMSID: NIHMS60208  PMID: 19802323

Abstract

Prediction of random effects is an important problem with expanding applications. In the simplest context, the problem corresponds to prediction of the latent value (the mean) of a realized cluster selected via two-stage sampling. Recently, Stanek and Singer (JASA, 2004) developed best linear unbiased predictors (BLUPs) under a finite population mixed model that outperform BLUPs from mixed models and superpopulation models. Their setup, however, does not allow for unequally sized clusters. To overcome this drawback, we consider an expanded finite population mixed model based on a larger set of random variables that span a higher dimensional space than those typically applied to such problems. We show that BLUPs for linear combinations of the realized cluster means derived under such a model have considerably smaller mean squared error (MSE) than those obtained from mixed models, superpopulation models, and finite population mixed models. We motivate our general approach by an example developed for two-stage cluster sampling and show that it faithfully captures the stochastic aspects of sampling in the problem. We also present simulation studies that illustrate the increased accuracy of the BLUP obtained under the expanded finite population mixed model.

Keywords: superpopulation, best linear unbiased predictor, random permutation, optimal estimation, design-based inference, mixed models

1. INTRODUCTION

Optimal estimation of average costs for hospitals, which typically vary in size, is an important practical problem because of its impact on health care economics and on patients' choice of hospital care (see http://www.healthgrades.com, for example). In many cases, such estimates are based on information obtained from patients (units) in hospitals (clusters) realized under a two-stage sampling scheme.

The best linear unbiased predictor (BLUP) developed under a mixed model is often offered as a solution to this problem (Searle et al. 1992). Although the mixed model accounts for unequal numbers of units in sample clusters, it does not use information about cluster sizes, which is often available. The superpopulation model of Scott and Smith (1969) is an alternative that incorporates this information. Both models can plausibly be used to represent the problem of interest, but neither is formally linked to the finite population from which the two-stage sample is drawn, as is the finite population mixed model recently proposed by Stanek and Singer (2004)¹ for situations where clusters are of equal size. Under this model, predictors have smaller mean squared error (MSE) than the competitors, even when the variance components are replaced by estimates, as indicated by San Martino, Singer and Stanek (2007). We extend the approach of Stanek and Singer (2004) by developing predictors under a new expanded finite population mixed model that outperform the competitors in both equal and unequal size two-stage cluster sampling problems.

Suppose our interest is in the average cost of appendectomies (the latent value) for each of three hospitals in the past year (Table 1), and that such costs are known (without error) for some patients in two of the hospitals. When the data are obtained from a stratified simple random sample of appendectomy patients, with hospitals as strata, the best linear unbiased estimate is the average cost for the available patients in each hospital (i.e., $2000 for Central, and $1800 for Mercy).

Table 1.

Population of hospitals' appendectomy patients in the past year and observed data

Hospital (s) | \(M_s\) | Mean | Variance | Patients* (t)
s=1 (County) | 2 | \(\mu_1\) | \(\sigma_1^2\) | \(y_{11}, y_{12}\)
s=2 (Central) | 4 | \(\mu_2\) | \(\sigma_2^2\) | \(y_{21}, y_{22}, y_{23}, y_{24}\); observed: $2100 (Jane Blake), $1400 (Sam Evans), $2500 (Hong Yao)
s=3 (Mercy) | 3 | \(\mu_3\) | \(\sigma_3^2\) | \(y_{31}, y_{32}, y_{33}\); observed: $1700 (Mary Slokum), $1900 (Juan Marcus)

* Names are fictitious.

Now assume that a simple random sample of appendectomy patients is selected from each of a simple random sample of hospitals (Table 2) according to a two-stage sampling scheme. We refer to a sample hospital as a primary sampling unit (PSU) to distinguish it from a specific hospital, and to a sample patient as a secondary sampling unit (SSU) to distinguish it from a specific patient. Under the usual mixed model, the sample appendectomy cost for SSU j in PSU i is

\[ Y_{ij} = \mu + B_i + E_{ij} \qquad (1) \]

where μ is the overall mean, \(B_i\) is the random effect for PSU i, and \(E_{ij}\) is a random variable corresponding to the deviation of the response of SSU j from the latent value of PSU i, namely \(T_i = \mu + B_i\). The random variables \(B_i\) and \(E_{ij}\) are usually considered independent with null expected values and variances given by \(\sigma^2\) and \(\sigma_i^2\), respectively. Model (1) is an example of the general linear mixed model

\[ Y = X\alpha + ZB + E \qquad (2) \]

where, for the sample in Table 2, \(X = 1_r\) with \(r = \sum_{i=1}^n m_i\), \(Z = \bigoplus_{i=1}^n 1_{m_i}\), \(\alpha = \mu\), and \(B = (B_1, \ldots, B_n)'\), with \(\Gamma = \sigma^2 I_n\), \(\Sigma = \bigoplus_{i=1}^n \sigma_i^2 I_{m_i}\), and \(\operatorname{var}(Y) = \Omega = Z\Gamma Z' + \Sigma\); here \(1_a\) denotes an a × 1 vector with all elements equal to one, \(I_a\) represents an a × a identity matrix, and \(\bigoplus_{i=1}^n A_i\) indicates a block diagonal matrix with blocks given by \(A_i\) (Graybill 1983). This model has a long history (see, for example, Harville 1978; Laird and Ware 1982) and is the main topic in several recent texts such as Brown and Prescott (1999), Verbeke and Molenberghs (2000), McCulloch and Searle (2001), Bryk and Raudenbush (2002), Diggle et al. (2002), Singer and Willett (2003), Demidenko (2004), Littell et al. (2006), and Jiang (2007). Under (1), the BLUP of the latent value for PSU i is

\[ \hat P_i = \hat\mu + k_i\left(\bar Y_i - \hat\mu\right) \qquad (3) \]

where \(\hat\mu = \sum_{i=1}^n \left(w_i \big/ \sum_{i'=1}^n w_{i'}\right)\bar Y_i\) is a weighted sample mean with \(w_i = 1/(\sigma^2 + \sigma_i^2/m_i)\), \(\bar Y_i = \frac{1}{m_i}\sum_{j=1}^{m_i} Y_{ij}\), and \(k_i = \frac{\sigma^2}{\sigma^2 + \sigma_i^2/m_i}\) (Goldberger 1962; Henderson 1984; McLean, Sanders and Stroup 1991; Robinson 1991). The predictor \(\hat P_i\) is a linear function of Y (i.e., \(\hat P_i = L'Y\)), is unbiased (i.e., \(E(\hat P_i - T_i) = 0\)), and has minimum MSE among such predictors. Using the realized random variables represented in Table 1, and assuming that \(\sigma = 100\), \(\sigma_1 = 300\) and \(\sigma_2 = 50\), it follows that \(\hat\mu\) = $1844, \(k_1\) = 0.25 and \(k_2\) = 0.89, and the predictor of the latent value for the realized hospital corresponding to i = 1 (i.e., Central) is \(\hat P_1\) = $1883, while the predictor of the latent value for i = 2 (i.e., Mercy) is \(\hat P_2\) = $1805.
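As a check on the arithmetic, the following Python sketch recomputes the stratified means, \(\hat\mu\), \(k_i\) and the BLUPs (3) from the observed Table 1 costs, using the variance values assumed in the text (σ = 100, σ1 = 300, σ2 = 50). The dictionary layout and variable names are illustrative choices, not part of the paper.

```python
# Observed appendectomy costs for the two sample hospitals (Table 1).
samples = {"Central": [2100.0, 1400.0, 2500.0], "Mercy": [1700.0, 1900.0]}
sigma2 = 100.0 ** 2                                          # between-PSU variance sigma^2
sigma2_within = {"Central": 300.0 ** 2, "Mercy": 50.0 ** 2}  # sigma_i^2 assumed in the text

ybar = {h: sum(y) / len(y) for h, y in samples.items()}      # stratified estimates
m = {h: len(y) for h, y in samples.items()}

# Weights w_i = 1/(sigma^2 + sigma_i^2/m_i); shrinkage k_i = sigma^2 * w_i.
w = {h: 1.0 / (sigma2 + sigma2_within[h] / m[h]) for h in samples}
k = {h: sigma2 * w[h] for h in samples}

mu_hat = sum(w[h] * ybar[h] for h in samples) / sum(w.values())
blup = {h: mu_hat + k[h] * (ybar[h] - mu_hat) for h in samples}  # predictor (3)

print(ybar)
print(round(mu_hat), {h: round(v, 2) for h, v in k.items()}, {h: round(p) for h, p in blup.items()})
```

Rounded, these quantities should agree with the values quoted above ($2000 and $1800 for the stratified means; $1844, 0.25, 0.89, $1883 and $1805 for the mixed model).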

Table 2.

Notation for common mixed model (sample) and superpopulation model (sample and remainder)

Sample hospital PSU (i) | Sample size \(m_i\) | Latent value \(\mu + B_i\) | Sample patient SSU (j = 1, 2, 3, 4)
Sample:
i=1 | 3 | \(\mu + B_1\) | \(Y_{11}, Y_{12}, Y_{13}\)
i=2=n | 2 | \(\mu + B_2\) | \(Y_{21}, Y_{22}\)
Remainder:
i=1 (Central) | | | \(Y_{14}\)
i=2 (Mercy) | | | \(Y_{23}\)
i=3=N (County) | | | \(Y_{31}, Y_{32}\)

Neither the estimate of a realized hospital's latent value derived from the stratified model nor the corresponding predictor obtained from the mixed model uses additional information, such as the number of hospitals in the population or the number of appendectomy patients in each hospital, even though such information may be available (as illustrated by the remainder in Table 2). The combined sample and remainder represent a superpopulation that is constructed by first (conceptually) selecting a finite population (presumably from some larger population in time or space), and then selecting a two-stage sample from it. Scott and Smith (1969) show that the latent value for a hospital in the superpopulation, \(T_i = \sum_{j=1}^{M_i} Y_{ij}/M_i\), is predicted by

\[ \hat P_i = f_i\bar Y_i + (1-f_i)\left[\hat\mu + k_i\left(\bar Y_i - \hat\mu\right)\right], \qquad (4) \]

where \(f_i = m_i/M_i\). Using the data in Table 1, the resulting predictor for i = 1 (i.e., Central) is \(\hat P_1\) = $1971, and for i = 2 (i.e., Mercy) is \(\hat P_2\) = $1802.
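The superpopulation predictor (4) only adds the known hospital sizes to the mixed-model calculation. A minimal sketch, assuming the same variance values as for (3) and reading the hospital sizes off Table 2 (Central has three sample patients plus one in the remainder, Mercy two plus one):

```python
samples = {"Central": [2100.0, 1400.0, 2500.0], "Mercy": [1700.0, 1900.0]}
M = {"Central": 4, "Mercy": 3}          # hospital sizes M_i: sample plus remainder in Table 2
sigma2 = 100.0 ** 2
sigma2_within = {"Central": 300.0 ** 2, "Mercy": 50.0 ** 2}

ybar = {h: sum(y) / len(y) for h, y in samples.items()}
m = {h: len(y) for h, y in samples.items()}
v = {h: sigma2 + sigma2_within[h] / m[h] for h in samples}      # var(Ybar_i) under model (1)
mu_hat = sum(ybar[h] / v[h] for h in samples) / sum(1.0 / v[h] for h in samples)

pred = {}
for h in samples:
    k = sigma2 / v[h]                          # shrinkage constant k_i, as in (3)
    f = m[h] / M[h]                            # sampling fraction f_i = m_i / M_i
    blup_mm = mu_hat + k * (ybar[h] - mu_hat)  # mixed-model BLUP (3)
    pred[h] = f * ybar[h] + (1 - f) * blup_mm  # superpopulation predictor (4)

print({h: round(p) for h, p in pred.items()})
```

The observed part of each hospital enters with weight \(f_i\); only the unobserved fraction \(1 - f_i\) is shrunk toward \(\hat\mu\), which is why Central (f = 0.75) moves much closer to its sample mean than under (3).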

The superpopulation model does not clearly separate the labeled clusters (as in Table 1) from the random variables that represent a sample of clusters (note, e.g., how Central Hospital is uniquely associated with i = 1 in Table 2). This separation is clear when the two-stage sampling process is represented with indicator random variables in the finite population mixed model developed by Stanek and Singer (2004). The resulting predictor (limited to clusters of equal size with equal within-cluster sample sizes) is

\[ \hat P_i = f\bar Y_i + (1-f)\left[\bar Y + k\left(\bar Y_i - \bar Y\right)\right] \qquad (5) \]

where \(\bar Y = \frac{1}{n}\sum_{i=1}^n \bar Y_i\), \(f = m/M\), \(k = \frac{\sigma_*^2}{\sigma_*^2 + \sigma_e^2/m}\), \(\sigma_*^2 = \sigma^2 - \sigma_e^2/M\), and \(\sigma_e^2 = \frac{1}{N}\sum_{s=1}^N \sigma_s^2\). With equal size clusters, the predictor specified by (4) differs from (5) since the variance components have different definitions. Theoretically, the expected MSE of (5) is less than the expected MSE of (4) or (3), as shown by Stanek and Singer (2004), while the empirical version of (5), formed by replacing variance components with their sample estimates, in general outperforms the empirical versions of the other predictors, as summarized by San Martino, Singer and Stanek (2007). However, (5) cannot be used for data like those in Tables 1 and 2, since cluster and sample sizes differ.
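A minimal sketch of predictor (5), assuming the variance components \(\sigma^2\) and \(\sigma_e^2\) are known (in practice they are replaced by estimates). The function name and the numerical inputs in the usage line are hypothetical.

```python
def predictor_fp(ybar, i, M, m, sigma2, sigma_e2):
    """Predictor (5) for the latent value of sample PSU i.

    ybar: sample PSU means (all clusters of size M, all samples of size m).
    sigma2, sigma_e2: variance components sigma^2 and sigma_e^2, assumed known.
    """
    n = len(ybar)
    grand = sum(ybar) / n                      # Ybar, the mean of the sample PSU means
    f = m / M                                  # common sampling fraction
    sigma_star2 = sigma2 - sigma_e2 / M        # sigma_*^2 (not guaranteed to be positive)
    k = sigma_star2 / (sigma_star2 + sigma_e2 / m)
    return f * ybar[i] + (1 - f) * (grand + k * (ybar[i] - grand))


# Hypothetical inputs: three sample PSUs, M = 20 units per cluster, m = 5 sampled.
print(predictor_fp([10.0, 14.0, 12.0], 0, M=20, m=5, sigma2=4.0, sigma_e2=8.0))
```

Two limiting cases are easy to read off: when the whole cluster is sampled (f = 1), or when there is no within-cluster variability (\(\sigma_e^2 = 0\), so k = 1), the predictor returns the PSU's own sample mean.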

When clusters are of equal size, the finite population mixed model can be used to represent the remaining random variables (as in Table 2) without the need to identify the realized clusters for sample PSUs. When clusters differ in size, we do not know how many SSUs remain, since we do not know the size of the realized PSU. Other problems occur with the representation in Table 2: for example, PSU i = 1 cannot be County Hospital (Table 2), even though the first stage sampling is assumed to be simple random sampling, and the second stage sample size, PSU size, and SSU variance appear random because of the first stage sampling.

We extend the expanded model used by Stanek, Singer, and Lencina (2004) for simple random sampling to two-stage unbalanced sampling to overcome these problems. The expanded model simultaneously retains the cluster identity and the PSU position and, for each PSU, distinguishes the contributions of sampled and non-sampled SSUs to a target random variable such as a PSU mean. For such purposes, we first define an expanded set of random variables and subsequently show that a lower dimensional (collapsed) set can adequately represent the problem without loss of information. Following the steps in Stanek and Singer (2004), we specify the expanded finite population mixed model in Section 2, derive the corresponding BLUP along with its theoretical expected MSE in Section 3, compare the proposed predictor to others via simulation studies in Section 4, and conclude with a discussion in Section 5.

2. AN EXPANDED MIXED MODEL FOR A FINITE CLUSTERED POPULATION

Let a finite population be defined (as in Table 1) by a listing of units, labeled by t = 1, …, \(M_s\), in each cluster, labeled by s = 1, …, N, where the non-stochastic potentially observable response for unit t in cluster s is given by \(y_{st}\). The finite population mean and variance for cluster s are respectively defined as \(\mu_s = \frac{1}{M_s}\sum_{t=1}^{M_s} y_{st}\) and \(\left(\frac{M_s-1}{M_s}\right)\sigma_s^2 = \frac{1}{M_s}\sum_{t=1}^{M_s}(y_{st} - \mu_s)^2\). Similarly, the population mean and between-cluster variance are respectively defined as \(\mu = \frac{1}{N}\sum_{s=1}^N \mu_s\) and \(\left(\frac{N-1}{N}\right)\sigma^2 = \frac{1}{N}\sum_{s=1}^N (\mu_s - \mu)^2\). We represent the potentially observable response for unit t in cluster s as \(y_{st} = \mu + \beta_s + \varepsilon_{st}\), where \(\beta_s = \mu_s - \mu\) is the deviation of the mean for cluster s from the overall mean, and \(\varepsilon_{st} = y_{st} - \mu_s\) is the deviation of the response for unit t (in cluster s) from the mean for cluster s. Letting \(y = (y_1' \; y_2' \; \cdots \; y_N')'\), where \(y_s = (y_{s1} \; y_{s2} \; \cdots \; y_{sM_s})'\), the reparameterized finite population can be summarized as

\[ y = X\mu + Z\beta + \varepsilon \qquad (6) \]

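where \(X = 1_{\mathbb{N}}\), \(Z_{\mathbb{N}\times N} = \bigoplus_{s=1}^N 1_{M_s}\), \(\mathbb{N} = \sum_{s=1}^N M_s\), \(\beta = (\beta_1 \; \beta_2 \; \cdots \; \beta_N)'\), and ε is defined similarly to y. None of the terms in (6) are random variables.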
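The definitions above can be illustrated numerically. The sketch below uses a small hypothetical population (not the hospital data) and Python's `statistics` module, whose `variance` uses the divisor n − 1, matching the definitions of \(\sigma_s^2\) and \(\sigma^2\); it confirms that (6) reproduces every \(y_{st}\) exactly.

```python
import statistics

# Hypothetical finite population (illustrative values only): y[s][t] = y_st.
y = [[3.0, 5.0], [2.0, 4.0, 6.0, 8.0], [1.0, 2.0, 6.0]]
N = len(y)

mu_s = [statistics.fmean(ys) for ys in y]           # cluster means mu_s
sigma2_s = [statistics.variance(ys) for ys in y]    # sigma_s^2 (divisor M_s - 1)
mu = statistics.fmean(mu_s)                         # unweighted mean of the cluster means
sigma2 = statistics.variance(mu_s)                  # between-cluster sigma^2 (divisor N - 1)

beta = [ms - mu for ms in mu_s]                              # beta_s = mu_s - mu
eps = [[v - mu_s[s] for v in ys] for s, ys in enumerate(y)]  # eps_st = y_st - mu_s

# Model (6) reproduces every y_st exactly; nothing here is random.
max_gap = max(abs(y[s][t] - (mu + beta[s] + eps[s][t]))
              for s in range(N) for t in range(len(y[s])))
print(mu_s, mu, sigma2, max_gap)
```

Note that μ is the unweighted mean of the cluster means, so it differs from the mean over all units whenever cluster sizes differ (here the unit mean is 37/9, not 4).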

2.1. The Expanded Set of Random Variables

We define a vector of random variables to represent equally likely two-stage random permutations of the population (i.e., each with probability \(1\big/\left(N!\prod_{s=1}^N M_s!\right)\), as in Cochran 1977). Without loss of generality, we assume that the sample clusters are in the first n positions in a permutation of clusters and that the sample units in cluster s correspond to the units in the first \(m_s\) positions in a permutation of that cluster's units. The ordering of the random variables is important, since any realization can be re-ordered to exactly match the finite population values.
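The count behind the probability \(1/(N!\prod_s M_s!)\) can be checked by enumeration for hypothetical cluster sizes. The sketch also checks that reading the sample off the first n positions of a random cluster permutation gives each cluster the simple random sampling inclusion probability n/N.

```python
import itertools
import math

sizes = [2, 3, 2]          # hypothetical cluster sizes M_s
N, n = len(sizes), 2       # sample clusters occupy the first n positions

cluster_orders = list(itertools.permutations(range(N)))
unit_orders = [list(itertools.permutations(range(M))) for M in sizes]

# A two-stage permutation is a cluster order plus a unit order inside every cluster.
count = sum(1 for _ in itertools.product(cluster_orders, *unit_orders))
formula = math.factorial(N) * math.prod(math.factorial(M) for M in sizes)

# Reading off the first n positions gives every cluster inclusion probability n/N.
p_incl = sum(0 in order[:n] for order in cluster_orders) / len(cluster_orders)
print(count, formula, 1 / formula, p_incl)
```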

We use indicator random variables to relate the response for unit t in cluster s, namely \(y_{st}\), to the response for SSU j in PSU i. To do so, we let \(U_{jt(s)}\) be an indicator random variable that takes on a value of one when SSU j in cluster s is unit t, and zero otherwise, so that the response for SSU j in cluster s may be expressed as \(\tilde Y_{sj} = \sum_{t=1}^{M_s} U_{jt(s)}\,y_{st}\). We include a fixed non-stochastic weight \(w_{sj}\) for SSU j in cluster s, and define the weighted response as \(\tilde Y_{wsj} = w_{sj}\tilde Y_{sj}\), so that the sum \(\sum_{j=1}^{M_s}\tilde Y_{wsj}\) corresponds, for example, to a cluster total when \(w_{sj} = 1\) for all j = 1, …, \(M_s\), or to a cluster mean when \(w_{sj} = 1/M_s\) for all j = 1, …, \(M_s\). Letting \(U_{j(s)} = (U_{j1(s)} \; U_{j2(s)} \; \cdots \; U_{jM_s(s)})'\), it follows that \(\tilde Y_{wsj} = w_{sj}\,y_s'U_{j(s)}\). The vector \(\tilde Y_{ws} = (\tilde Y_{ws1} \; \tilde Y_{ws2} \; \cdots \; \tilde Y_{wsM_s})'\) represents a permutation of weighted responses for the SSUs in cluster s.

We also let \(U_{is}\) be an indicator random variable that takes on a value of one when PSU i is cluster s, and a value of zero otherwise. If all clusters were equal in size, we could represent a permutation of SSUs for PSU i by \(\sum_{s=1}^N U_{is}\tilde Y_{ws}\). When cluster sizes differ, this sum is not defined, since the vectors composing it do not all have the same dimension. We solve this problem by expanding the set of random variables associated with PSU i into the ℕ × 1 vector \(Y_{wi} = ((U_{is}\tilde Y_{ws})) = (U_{i1}\tilde Y_{w1}' \; U_{i2}\tilde Y_{w2}' \; \cdots \; U_{iN}\tilde Y_{wN}')'\), so that a two-stage random permutation of the population is represented by the Nℕ × 1 vector \(Y_w = ((Y_{wi})) = (Y_{w1}' \; Y_{w2}' \; \cdots \; Y_{wN}')'\), where the jth element of \(Y_w\) that corresponds to \(U_{is}\) is \(U_{is}\tilde Y_{wsj}\).
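To make the construction concrete, the sketch below builds \(Y_w\) for one realized two-stage permutation of a hypothetical population with unequal cluster sizes. The function `expanded_vector` and its argument encoding (a cluster order and one unit order per cluster) are illustrative assumptions. With weights \(w_{sj} = 1/M_s\), summing the ith block of length ℕ picks out the mean of whichever cluster occupies PSU i, because all other sub-blocks of that block are zero.

```python
# Hypothetical population with unequal cluster sizes: y[s][t] = y_st.
y = [[3.0, 5.0], [2.0, 4.0, 6.0, 8.0], [1.0, 2.0, 6.0]]
N = len(y)
sizes = [len(ys) for ys in y]
NN = sum(sizes)                                  # script-N: total number of units


def expanded_vector(cluster_order, unit_orders, w):
    """Return the N * NN vector Y_w for one two-stage permutation.

    cluster_order[i] = cluster s occupying PSU i (so U_is = 1)
    unit_orders[s][j] = unit t occupying SSU j in cluster s (so U_jt(s) = 1)
    w[s][j] = fixed weight w_sj
    """
    yw = []
    for i in range(N):                           # block Y_wi for PSU i
        for s in range(N):                       # sub-block U_is * Ytilde_ws
            u_is = 1.0 if cluster_order[i] == s else 0.0
            yw.extend(u_is * w[s][j] * y[s][unit_orders[s][j]] for j in range(sizes[s]))
    return yw


w_mean = [[1.0 / M] * M for M in sizes]          # w_sj = 1/M_s targets PSU means
yw = expanded_vector([1, 2, 0], [[1, 0], [3, 0, 2, 1], [2, 0, 1]], w_mean)
psu_means = [sum(yw[i * NN:(i + 1) * NN]) for i in range(N)]
print(len(yw), psu_means)
```

Here PSU 0 is cluster 1 (mean 5), PSU 1 is cluster 2 (mean 3) and PSU 2 is cluster 0 (mean 4), whatever the within-cluster unit orders.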

2.2 The Expanded Finite Population Mixed Model

We construct a mixed model for the expanded response vector \(Y_w\) next. Indexing expectation with respect to permutations of clusters with the subscript \(\xi_1\) and expectation with respect to permutations of units in a cluster with the subscript \(\xi_2\), and for PSU i, we let

\[ Y_{wi} = E_{\xi_1\xi_2}(Y_{wi}) + \left[E_{\xi_2|\xi_1}(Y_{wi}) - E_{\xi_1\xi_2}(Y_{wi})\right] + E_{wi} \]

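where \(E_{\xi_1\xi_2}(Y_{wi}) = \frac{1}{N}\left(\bigoplus_{s=1}^N w_s\right)\mu\), \(E_{\xi_2|\xi_1}(Y_{wi}) = \left(\bigoplus_{s=1}^N w_s\mu_s\right)U_i\), \(E_{wi} = Y_{wi} - E_{\xi_2|\xi_1}(Y_{wi})\), \(w_s = ((w_{sj})) = (w_{s1} \; w_{s2} \; \cdots \; w_{sM_s})'\), \(\mu = ((\mu_s)) = (\mu_1 \; \mu_2 \; \cdots \; \mu_N)'\), \(U_i = ((U_{is})) = (U_{i1} \; U_{i2} \; \cdots \; U_{iN})'\), and \(E_{wi}\) denotes the deviation of the response from the expected response within a PSU. The fixed effects are given by μ, the vector of cluster means, while the random effects correspond to \(E_{\xi_2|\xi_1}(Y_{wi}) - E_{\xi_1\xi_2}(Y_{wi})\). In the finite population mixed model of Stanek and Singer (2004), the random effect for PSU i was defined as \(\sum_{s=1}^N U_{is}\beta_s = \sum_{s=1}^N\left(U_{is}\mu_s - U_{is}\sum_{s^*=1}^N \frac{1}{N}\mu_{s^*}\right)\), with the random variables \(U_{is}\) explicitly linking the clusters to PSU i. In the expanded finite population mixed model, random effects are defined for SSU j in PSU i as \(w_{sj}\mu_s\left(U_{is} - E_{\xi_1}(U_{is})\right)\). For example, when \(w_{sj} = 1/M_s\) for all j = 1, …, \(M_s\) (corresponding to the PSU mean), it follows that \(\sum_{j=1}^{M_s} w_{sj}\mu_s\left(U_{is} - E_{\xi_1}(U_{is})\right) = U_{is}\mu_s - \frac{1}{N}\mu_s\). For both models, the expected value of the random effects (with respect to \(\xi_1\)) is zero. We combine the fixed and random effects to define the expanded finite population mixed model as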

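\[ Y_w = \left[1_N \otimes \frac{1}{N}\left(\bigoplus_{s=1}^N w_s\right)\right]\mu + \left[I_N \otimes \left(\bigoplus_{s=1}^N w_s\mu_s\right)\right]\operatorname{vec}\left(U - E_{\xi_1}(U)\right) + E_w \qquad (7) \]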

where U = (U1 U2UN). The covariance matrix of the random effects is

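\[ \operatorname{var}_{\xi_1\xi_2}\left(\left[I_N \otimes \left(\bigoplus_{s=1}^N w_s\mu_s\right)\right]\operatorname{vec}\left(U - E_{\xi_1}(U)\right)\right) = \frac{1}{N-1}\,P_N \otimes \left[\left(\bigoplus_{s=1}^N w_s\mu_s\right)P_N\left(\bigoplus_{s=1}^N w_s\mu_s\right)'\right] \]

while the covariance matrix of \(E_w\) is

\[ \operatorname{var}_{\xi_1\xi_2}(E_w) = I_N \otimes \left(\bigoplus_{s=1}^N \left[\frac{\sigma_s^2}{N}\left(\bigoplus_{j=1}^{M_s} w_{sj}\right)P_{M_s}\left(\bigoplus_{j=1}^{M_s} w_{sj}\right)\right]\right), \]

where \(P_a = I_a - \frac{1}{a}J_a\) and \(J_a\) denotes an a × a matrix with all elements equal to one.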
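The mean structure of (7) and the two covariance matrices can be checked by brute force. The sketch below, on a hypothetical three-cluster population with arbitrary weights \(w_{sj}\), enumerates all \(N!\prod_s M_s!\) = 144 equally likely two-stage permutations (exact enumeration, not simulation) and reports the largest discrepancy between the enumerated moments and the expressions above. Helper names such as `indicator` and `expected_cov_E` are illustrative.

```python
import itertools
import statistics

# Hypothetical tiny population and arbitrary fixed weights w_sj (illustrative only).
pop = [[3.0, 7.0], [1.0, 4.0, 10.0], [2.0, 6.0]]
w = [[0.5, 0.5], [0.2, 0.3, 0.5], [1.0, 1.0]]
N = len(pop)
sizes = [len(ys) for ys in pop]
mu = [statistics.fmean(ys) for ys in pop]                 # mu_s
sig2 = [statistics.variance(ys) for ys in pop]            # sigma_s^2, divisor M_s - 1
index = [(i, s, j) for i in range(N) for s in range(N) for j in range(sizes[s])]


def P(a, x, y):
    """Element (x, y) of P_a = I_a - J_a / a."""
    return (1.0 if x == y else 0.0) - 1.0 / a


def indicator(order, i, s):
    """U_is: 1 if cluster s occupies PSU position i."""
    return 1.0 if order[i] == s else 0.0


sums = [0.0] * len(index)
R_draws, E_draws = [], []
unit_perms = [list(itertools.permutations(ys)) for ys in pop]
for order in itertools.permutations(range(N)):
    for perms in itertools.product(*unit_perms):
        yw = [indicator(order, i, s) * w[s][j] * perms[s][j] for (i, s, j) in index]
        # Random effects w_sj mu_s (U_is - 1/N) and within deviations E_w = Y_w - E_{xi2|xi1}(Y_w).
        R_draws.append([w[s][j] * mu[s] * (indicator(order, i, s) - 1.0 / N) for (i, s, j) in index])
        E_draws.append([yw[a] - w[s][j] * mu[s] * indicator(order, i, s)
                        for a, (i, s, j) in enumerate(index)])
        sums = [acc + v for acc, v in zip(sums, yw)]
K = len(R_draws)
means = [v / K for v in sums]


def second_moment(draws):
    d = len(draws[0])
    return [[sum(x[a] * x[b] for x in draws) / len(draws) for b in range(d)] for a in range(d)]


def expected_cov_E(i, s, j, i2, s2, j2):
    if i != i2 or s != s2:
        return 0.0
    return sig2[s] / N * w[s][j] * P(sizes[s], j, j2) * w[s][j2]


cov_R, cov_E = second_moment(R_draws), second_moment(E_draws)   # both have zero mean
pairs = [(a, b, ia, ib) for a, ia in enumerate(index) for b, ib in enumerate(index)]
err_mean = max(abs(means[a] - w[s][j] * mu[s] / N) for a, (i, s, j) in enumerate(index))
err_R = max(abs(cov_R[a][b] - P(N, ia[0], ib[0]) * P(N, ia[1], ib[1])
                * w[ia[1]][ia[2]] * mu[ia[1]] * w[ib[1]][ib[2]] * mu[ib[1]] / (N - 1))
            for a, b, ia, ib in pairs)
err_E = max(abs(cov_E[a][b] - expected_cov_E(*ia, *ib)) for a, b, ia, ib in pairs)
print(K, err_mean, err_R, err_E)
```

All three discrepancies should be at the level of floating-point rounding.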

2.3. Defining Target Quantities

Model (7) is an expanded version of a finite population mixed model that retains the identity of clusters, while accounting for a two-stage random permutation. Our interest is to predict target linear combinations defined by \(T = g'Y_w\), where g is non-stochastic. For simplicity, we limit discussion to the case

\[ g = c \otimes 1_{\mathbb{N}} \qquad (8) \]

where c is an N × 1 vector of constants. In particular, we focus on the setting where \(c = e_i\), i.e., an N × 1 vector with all elements equal to zero except element i, which has the value one. The principal interest lies in the setting where i ≤ n, i.e., in the clusters realized in the sample. When \(w_{sj} = 1/M_s\) for all s = 1, …, N, j = 1, …, \(M_s\), the target \(T = \sum_{s=1}^N U_{is}\left(\sum_{j=1}^{M_s} w_{sj}\tilde Y_{sj}\right)\) is the mean of PSU i; when \(w_{sj} = 1\) for all s = 1, …, N, j = 1, …, \(M_s\), the same target is the total of PSU i. Note that in both cases, the target is a random variable.

3. PREDICTING A PSU MEAN IN THE EXPANDED FINITE POPULATION MIXED MODEL

To obtain the BLUP, we adopt the basic strategy employed by Scott and Smith (1969), Royall (1976), Bolfarine and Zacks (1992), Valliant et al. (2000), and Stanek and Singer (2004), among others. We assume that the elements in the sample portion of \(Y_w\) will be observed, and express the target T as the sum of two parts: one that is a function of the sample, and another that is a function of the remaining random variables. Then, requiring the predictor to be a linear function of the sample random variables and to be unbiased, we obtain coefficients that minimize the MSE. While in theory an optimal predictor can be obtained via this recipe, in practice the high dimensionality of the expanded random vectors may result in singularities that lead to multiple solutions, as discussed in Stanek, Singer, and Lencina (2004). For this reason, we explore projections of the expanded random variables into lower dimensional spaces that retain the information necessary for an optimal solution.

3.1. Partial Collapsing of the Expanded Finite Population Mixed Model Random Variables

Following Rao and Bellhouse (1978), we provide a way of determining whether the optimal linear unbiased predictor of a target random variable \(T = g'Y_w\) can be obtained as the optimal linear unbiased predictor of \(T = g_p'Y_{wp}\) based on a vector of collapsed random variables that spans a lower dimensional space, defined by \(Y_{wp} = C'Y_w\), where C is a matrix of dimension Nℕ × c with c < Nℕ. We take \(C = \left(\bigoplus_{i=1}^N\bigoplus_{s=1}^N \begin{pmatrix}1_{m_s}\\ 0_{M_s-m_s}\end{pmatrix} \;\middle|\; \bigoplus_{i=1}^N\bigoplus_{s=1}^N \begin{pmatrix}0_{m_s}\\ 1_{M_s-m_s}\end{pmatrix}\right)\) and \(g_p' = g'\left[C(C'C)^{-1}\right]\), so that collapsing generates, for each PSU, the sums of the SSUs in the sample and in the remainder of each cluster, thus reducing the number of random variables from Nℕ to 2N². Since \(Y_w = \left[C(C'C)^{-1}\right]Y_{wp} + P_C Y_w\), where \(P_C = I_{N\mathbb{N}} - C(C'C)^{-1}C'\), we can write \(g'Y_w = g'\left[C(C'C)^{-1}\right]Y_{wp} + g'P_C Y_w\). Using (8), it follows that \(g_p = 1_2 \otimes (c \otimes 1_N)\), \(g'P_C Y_w = 0\), and \(T = g_p'Y_{wp}\). Let \(\hat L_p\) represent an nN × 1 vector of constants, and let \(Y_{wI}\) be the first nN random variables (corresponding to the sample) in \(Y_{wp}\). Then, letting \(\hat T_p = \hat L_p'Y_{wI}\) be the optimal linear unbiased predictor of T based on \(Y_{wI}\), and \(\hat B_p\) be a linear unbiased predictor of \(g'P_C Y_w = 0\), it follows from Rao and Bellhouse (1978, Theorem 1.1) that \(\hat T_p\) will be optimal for \(T = g'Y_w\) if and only if \(E_{\xi_1\xi_2}\left[(\hat T_p - T)\hat B_p\right] = 0\). Expressing \(E_{\xi_1\xi_2}\left[(\hat T_p - T)\hat B_p\right]\) as a function of \(E_{\xi_1\xi_2}(Y_wY_w')\) and simplifying terms, it follows that \(E_{\xi_1\xi_2}\left[(\hat T_p - T)\hat B_p\right] = 0\) when \(w_{sj} = w_s\) for all j = 1, …, \(M_s\) (see details at http://www.umass.edu/cluster/). This implies that we can obtain the optimal predictor using the partially collapsed random variables as long as, within each cluster, the weights are equal for all SSUs.
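The algebra of the collapsing step can be checked on hypothetical sizes. Because the columns of C have disjoint supports, \(C'C\) is diagonal, so \(g_p' = g'C(C'C)^{-1}\) can be computed column by column; the sketch confirms that \(g_p = 1_2 \otimes (c \otimes 1_N)\) and that \(P_C g = 0\) (hence \(g'P_C Y_w = 0\) for every \(Y_w\)). The helper `column` and the chosen sizes are illustrative assumptions, with \(0 < m_s < M_s\) so that no column of C is empty.

```python
# Hypothetical cluster sizes M_s and within-cluster sample sizes m_s (0 < m_s < M_s).
M = [2, 4, 3]
m = [1, 3, 2]
N = len(M)
NN = sum(M)                                        # script-N: total number of units


def column(i, s, part):
    """Column of C summing the sample (part=0) or remainder (part=1) SSUs of (PSU i, cluster s)."""
    col = []
    for i2 in range(N):
        for s2 in range(N):
            for j in range(M[s2]):
                in_part = j < m[s2] if part == 0 else j >= m[s2]
                col.append(1.0 if (i2, s2) == (i, s) and in_part else 0.0)
    return col


# First N^2 columns: sample sums; last N^2 columns: remainder sums.
C = [column(i, s, part) for part in (0, 1) for i in range(N) for s in range(N)]

c = [0.0, 1.0, 0.0]                                # target coefficients, here c = e_2
g = [c[i] for i in range(N) for _ in range(NN)]    # g = c (x) 1_NN, as in (8)

# C'C is diagonal, so g_p = (C'C)^{-1} C'g can be computed columnwise.
g_p = [sum(a * b for a, b in zip(g, col)) / sum(b * b for b in col) for col in C]

# P_C g = g - C (C'C)^{-1} C'g; a zero residual means g'P_C Y_w = 0 for every Y_w.
fitted = [sum(gp * col[r] for gp, col in zip(g_p, C)) for r in range(len(g))]
residual = max(abs(a - b) for a, b in zip(g, fitted))
print(len(C), g_p, residual)
```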

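Having this in mind, we assume that \(w_{sj} = w_s\) for all j = 1, …, \(M_s\), and develop the BLUP of \(T = g_p'Y_{wp}\) based on the 2N² collapsed random variables contained in \(Y_{wp}\). The first N² of them are of the form \(U_{is}w_s m_s\bar Y_{sI}\), while the remaining N² are of the form \(U_{is}w_s(M_s - m_s)\bar Y_{sII}\), where \(\bar Y_{sI} = \frac{1}{m_s}\sum_{j=1}^{m_s}\tilde Y_{sj}\) and \(\bar Y_{sII} = \frac{1}{M_s - m_s}\sum_{j=m_s+1}^{M_s}\tilde Y_{sj}\).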

3.2. Predicting Linear Combinations of PSU Latent Values Using the Expanded Finite Population Mixed Model

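We partition \(Y_{wp}\) into the first nN random variables corresponding to the sample, \(Y_{wI}\), and the N(2N − n) remaining random variables, \(Y_{wII}\), and write the target as \(T = g_I'Y_{wI} + g_{II}'Y_{wII}\), where \(g_I = c_I \otimes 1_N\) and \(g_{II} = \left(c_{II}' \otimes 1_N' \;\middle|\; c' \otimes 1_N'\right)'\), with \(c_I\) and \(c_{II}\) containing the first n and the last N − n elements of c. Explicitly, the partitioned expanded finite population mixed model is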

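\[ \begin{pmatrix}Y_{wI}\\ Y_{wII}\end{pmatrix} = \begin{pmatrix}X_I\\ X_{II}\end{pmatrix}\mu + \left[E_{\xi_2}\begin{pmatrix}Y_{wI}\\ Y_{wII}\end{pmatrix} - E_{\xi_1\xi_2}\begin{pmatrix}Y_{wI}\\ Y_{wII}\end{pmatrix}\right] + \begin{pmatrix}E_{wI}\\ E_{wII}\end{pmatrix}. \qquad (9) \]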

Requiring the predictor of T to be a linear function of \(Y_{wI}\), to be unbiased, and to have minimum MSE, the BLUP of T in (9) is

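\[ \hat T_p = \sum_{i=1}^n c_i\left(\hat Y_i - \bar{\hat Y}\right) + \bar c\,\frac{N}{n}\left(\sum_{s=1}^N I_s\left(M_s w_s\bar Y_{sI}\right)\right) + \bar c_{II}\,\frac{N-n}{n}\left(\sum_{s=1}^N I_s\left(M_s f_s w_s\bar Y_{sI}\right)\right) \qquad (10) \]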

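where \(\hat Y_i = \sum_{s=1}^N U_{is}M_s w_s k_s^*\bar Y_{sI}\), \(\bar{\hat Y} = \frac{1}{n}\sum_{i=1}^n \hat Y_i\), \(k_s^* = k_s - \frac{k_s}{d_s}\,\frac{1}{N}\sum_{s^*=1}^N\left(\frac{1-k_{s^*}}{1-\bar k}\right)d_{s^*}\), \(d_s = M_s w_s\mu_s\), \(k_s = \frac{f_s^2 d_s^2}{f_s^2 d_s^2 + (N-1)v_{se}^2}\) (see a. http://www.umass.edu/cluster/ for details), \(\bar k = \frac{1}{N}\sum_{s=1}^N k_s\), \(\bar c_{II} = \frac{1}{N-n}\sum_{i=n+1}^N c_i\), \(\bar c = \frac{1}{N}\sum_{i=1}^N c_i\), and \(I_s = \sum_{i=1}^n U_{is}\) is an indicator 'inclusion' random variable for cluster s in the sample (see derivation in Appendix A). An expression for the MSE of the predictor can be developed directly using expressions for the variance, and simplifies to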

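\[ \operatorname{var}_{\xi_1\xi_2}(\hat T_p - T) = \left(\sum_{i=1}^n (c_i - \bar c_I)^2\right)\left(\sigma_{kd}^2 - 2\sigma_{kd,d} + \frac{1}{N}\sum_{s=1}^N \frac{k_s^{*2}v_{se}^2}{f_s^2}\right) + \frac{(N\bar c)^2}{n}\left(\frac{1}{N}\sum_{s=1}^N \frac{v_{se}^2}{f_s^2}\right) + \left[\sum_{i=1}^N c_i^2 + \frac{N\bar c}{n}\left(N\bar c - 2n\bar c_I\right)\right]\sigma_d^2 \]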

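where \(\bar c_I = \frac{1}{n}\sum_{i=1}^n c_i\), \(\mu_d = \frac{1}{N}\sum_{s=1}^N d_s\), \(\mu_{kd} = \frac{1}{N}\sum_{s=1}^N k_s^*d_s\), \(\sigma_{kd}^2 = \frac{1}{N-1}\sum_{s=1}^N (k_s^*d_s - \mu_{kd})^2\), \(\sigma_d^2 = \frac{1}{N-1}\sum_{s=1}^N (d_s - \mu_d)^2\), and \(\sigma_{kd,d} = \frac{1}{N-1}\sum_{s=1}^N (k_s^*d_s - \mu_{kd})(d_s - \mu_d)\) (see b. http://www.umass.edu/cluster/ for details).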

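When predicting a PSU mean, i.e., using \(w_s = 1/M_s\), (10) simplifies to \(\hat T_p = \bar Y + (\hat Y_i - \bar{\hat Y})\) if i ≤ n, and to \(\hat T_p = \bar Y\) when i > n, where \(\bar Y = \frac{1}{n}\sum_{i=1}^n \bar Y_i\) and \(\bar Y_i = \sum_{s=1}^N U_{is}\bar Y_{sI}\). The MSE for a sample PSU mean predictor (when i ≤ n) simplifies to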

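\[ \operatorname{var}_{\xi_1\xi_2}(\hat T_p - T) = \left(\frac{n-1}{n}\right)\frac{1}{N-1}\sum_{s=1}^N\left[(k_s^*-1)\mu_s - \frac{1}{N}\sum_{s'=1}^N (k_{s'}^*-1)\mu_{s'}\right]^2 + \frac{1}{nN}\sum_{s=1}^N\left[1 + (n-1)k_s^{*2}\right](1-f_s)\frac{\sigma_s^2}{m_s} \]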

(see c. http://www.umass.edu/cluster/ for details) while the MSE for a PSU not in the sample is given by

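\[ \operatorname{var}_{\xi_1\xi_2}(\hat T_p - T) = \left(\frac{n+1}{n}\right)\sigma^2 + \frac{1}{nN}\sum_{s=1}^N (1-f_s)\frac{\sigma_s^2}{m_s}. \]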
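The MSE for a PSU outside the sample can be checked by exact enumeration. For \(w_s = 1/M_s\) the predictor is simply \(\bar Y\), the average of the sample PSU means, and the target is the mean of the cluster occupying a non-sample position. The sketch below uses a hypothetical population with unequal cluster and sample sizes and averages the squared error over all two-stage permutations.

```python
import itertools
import statistics

# Hypothetical population (illustrative values only) and sample sizes m_s < M_s.
pop = [[2.0, 4.0, 9.0], [1.0, 5.0], [6.0, 7.0, 8.0, 12.0]]
m = [2, 1, 3]
N, n = len(pop), 2            # PSUs 0..n-1 are sampled; PSU n is not


def exact_mse_nonsample():
    """Average of (Ybar - T)^2 over all two-stage permutations, T = latent value of PSU n."""
    unit_perms = [list(itertools.permutations(ys)) for ys in pop]
    sq_errors = []
    for order in itertools.permutations(range(N)):        # order[i] = cluster at PSU i
        for perms in itertools.product(*unit_perms):
            ybar = statistics.fmean(statistics.fmean(perms[s][:m[s]]) for s in order[:n])
            target = statistics.fmean(pop[order[n]])
            sq_errors.append((ybar - target) ** 2)
    return statistics.fmean(sq_errors)


sigma2 = statistics.variance([statistics.fmean(ys) for ys in pop])   # divisor N - 1
within = sum((1 - m[s] / len(pop[s])) * statistics.variance(pop[s]) / m[s] for s in range(N))
formula = (n + 1) / n * sigma2 + within / (n * N)
print(exact_mse_nonsample(), formula)
```

The factor (n + 1)/n arises because the between-cluster error combines the variance of the non-sampled cluster mean with that of the average of n sampled cluster means, plus twice their (negative) covariance under simple random sampling of clusters.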

4. COMPARISON OF PREDICTORS

We compare the MSE of (10) to that of the simple mean and of predictors (3) and (4). When clusters are of equal size, have homogeneous unit variances, and sample sizes are equal, the MSE for each predictor can be calculated explicitly. In this setting, we also compare the results with the MSE for predictor (5). For the sample mean, \(\text{MSE}(\bar Y_i) = \frac{1}{N}\sum_{s=1}^N (1-f_s)\frac{\sigma_s^2}{m_s}\), while for (5), \(\text{MSE}(\hat T_{RP}) = (1-f)\left[\frac{\sigma_e^2}{nm} + \left(\frac{n-1}{n}\right)(1-k)\sigma^2\right]\). The MSEs for predictors (3) and (4) are given by \(\text{MSE}(\hat T_{RP}) + \left(\frac{n-1}{nk}\right)\left(\sigma^2 - \frac{\sigma_e^2}{M}\right)\left(c - [f + (1-f)k]\right)^2\), with \(c = \frac{m\sigma^2}{m\sigma^2 + \sigma_e^2}\) for (3) and \(c = f + (1-f)\frac{m\sigma^2}{m\sigma^2 + \sigma_e^2}\) for (4), as shown in Stanek and Singer (2004). Although we have explicit expressions for the MSE of these predictors, the differences between them are complicated functions of the population parameters. Since the shrinkage constants for the expanded predictor depend on the cluster latent values, we compare the MSE relative to that of the expanded finite population mixed model predictor in four settings with different values of the unit intra-class correlation coefficient, \(\rho_s = \frac{\sigma^2}{\sigma^2 + \sigma_s^2}\). In each setting, the cluster latent values are set equal to evenly spaced quantiles from some specified distribution. The results, expressed as percent increase in MSE relative to the MSE of (10), are presented in Figure 1 and illustrate that, in all settings considered, using (10) results in a substantial reduction in MSE (over 40% when f < 0.2). This is true even for (5), illustrating that a smaller MSE can be achieved with the BLUP derived under the expanded finite population mixed model than with the BLUP obtained under the finite population mixed model of Stanek and Singer (2004). There was little difference in the MSE comparisons across different distributions of the cluster latent values. The results illustrate that predictor (3) has larger MSE relative to the other predictors when f > 0.5. The MSEs for predictors (4) and (5) are similar, and differ more from the MSE of (10) when f is small.
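In the balanced case, the sample mean and predictors (3), (4) and (5) all have the form \(\bar Y + c(\bar Y_i - \bar Y)\), with c = 1 for the mean and the constants quoted above for the others, so the closed-form MSEs can be checked against an exact enumeration. The sketch assumes a hypothetical population with equal cluster sizes and homogeneous within-cluster variances, and treats the variance components as known; it does not cover predictor (10).

```python
import itertools
import statistics

# Hypothetical population: equal cluster sizes, homogeneous within-cluster variances.
pop = [[1.0, 2.0, 4.0], [5.0, 6.0, 8.0], [10.0, 11.0, 13.0]]
N, M = len(pop), len(pop[0])
n, m = 2, 2                                   # sample PSUs and SSUs per sample PSU

mu_s = [statistics.fmean(ys) for ys in pop]
sigma2 = statistics.variance(mu_s)                                  # divisor N - 1
sigma_e2 = statistics.fmean(statistics.variance(ys) for ys in pop)  # mean of sigma_s^2
f = m / M
sigma_star2 = sigma2 - sigma_e2 / M
k = sigma_star2 / (sigma_star2 + sigma_e2 / m)

c_mm = m * sigma2 / (m * sigma2 + sigma_e2)   # constant for predictor (3)
c_sp = f + (1 - f) * c_mm                     # constant for predictor (4)
c_rp = f + (1 - f) * k                        # constant implied by predictor (5)
c_mean = 1.0                                  # sample mean Ybar_i

mse_mean = (1 - f) * sigma_e2 / m
mse_rp = (1 - f) * (sigma_e2 / (n * m) + (n - 1) / n * (1 - k) * sigma2)


def mse_formula(c):
    """Closed-form MSE quoted in Section 4 for a predictor with constant c."""
    return mse_rp + (n - 1) / (n * k) * sigma_star2 * (c - c_rp) ** 2


def mse_exact(c):
    """Exact MSE for the latent value of sample PSU 0, over all two-stage permutations."""
    unit_perms = [list(itertools.permutations(ys)) for ys in pop]
    sq_errors = []
    for order in itertools.permutations(range(N)):
        for perms in itertools.product(*unit_perms):
            ybar = [statistics.fmean(perms[s][:m]) for s in order[:n]]
            grand = statistics.fmean(ybar)
            pred = grand + c * (ybar[0] - grand)
            sq_errors.append((pred - mu_s[order[0]]) ** 2)
    return statistics.fmean(sq_errors)


for name, c in [("MM (3)", c_mm), ("SP (4)", c_sp), ("FP (5)", c_rp), ("Mean", c_mean)]:
    print(name, round(mse_exact(c), 6), round(mse_formula(c), 6))
```

Because the extra term is a squared distance from the constant \(f + (1-f)k\), predictor (5) has the smallest MSE of the four in this balanced setting.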

Figure 1.


Percent increase in MSE for the Finite Population Mixed Model (FP), Superpopulation Model (SP), Mixed Model (MM), and Sample Mean (Mean) Predictors Relative to Finite Expanded Mixed Model Predictor of a Realized PSU Mean where N = 100, n = 30, and M = 20 for all clusters. Equal Size Clusters and Equal Unit Sampling Fractions per Cluster

Figure 2 summarizes increases in MSE for different intra-class correlation coefficients; quantiles of a uniform distribution were used to determine cluster latent values and unit parameters. The results illustrate that for low intra-class correlation coefficients, the relative increase in MSE can be dramatic. Once again, for low sampling fractions, similar patterns in MSE are evident for (3), (4) and (5).

Figure 2.


Percent increase in MSE for the Finite Population Mixed Model (FP), Superpopulation Model (SP), Mixed Model (MM), and Sample Mean (Mean) Predictors Relative to Finite Expanded Mixed Model Predictor of a Realized PSU Mean by Unit Intra-class Correlation and Unit Sampling Fraction where N = 100, n = 30, and M = 20 for all clusters. Equal Size Clusters and Equal Unit Sampling Fractions per Cluster

In Figure 3, we compare the sample mean and predictors (3) and (4) in two settings where cluster sizes differ; predictor (5) is not applicable in such settings. These results are based on simulation studies (with 5000 trials each) that repeat a two-stage sampling process from a finite population. In each case, the MSE is estimated by the average squared difference between the predictor and the latent PSU value. In the left column, cluster sizes differ 10-fold, with sample sizes for clusters proportional to cluster size; the results illustrate the performance of the predictors for different sampling fractions. The right column in Figure 3 compares the MSE of the predictors when the sample size per cluster is constant.

Figure 3.


Percent increase in MSE for the Finite Population Mixed Model (FP), Superpopulation Model (SP), Mixed Model (MM), and Sample Mean (Mean) Predictors Relative to Finite Expanded Mixed Model Predictor of a Realized PSU Mean with Probability Proportional to Size SSU sampling and for Equal SSU size sampling for different Cluster Sizes where N = 100, n = 30, and M = 20 for all clusters.

5. DISCUSSION

The expanded finite population mixed model uses a larger set of 2N² random variables than the ℕ random variables typically used in superpopulation models or in the finite population mixed model of Stanek and Singer (2004). These random variables are fewer than the ℕ² random variables resulting from an expansion that retains the identity of units and SSUs, and even fewer than those in the very general representation of the model used by Godambe (1955). We show that this intermediate set of random variables allows a clear representation of a two-stage sample, while accounting for different cluster and sample sizes. Other approaches do not appear to connect the potentially observable data to the random variables in the stochastic model. Since more than one finite population mixed model can be used, we have shown how they can be compared by considering them in a hierarchy and identifying whether the additional set of orthogonal random variables adds to the information about the target quantity. Further reductions in the number of random variables from the expanded finite population mixed model were considered (Appendix B), each of which leads to a loss of information.

Note that these results depend on the choice of target quantity. For example, if interest lies in the relationship between two variables among units (in a cluster), the collapsed set of 2N² expanded random variables is not likely to be sufficient.

The BLUP obtained under the new model offers substantial gains over previous predictors. These gains are likely to be mitigated by the need to estimate shrinkage constants in practical settings. Simulation studies comparing the performance of the empirical versions of predictors (3), (4), and (5) in the equal cluster size/sample size settings indicate some loss in efficiency, but with a similar ordering of MSE (San Martino, Singer, and Stanek 2007). Limited simulation studies conducted with the expanded model predictor indicate a greater loss in MSE for (10) than for the other predictors. Iterative estimation procedures may be possible and are currently being investigated; this area requires more study.

ACKNOWLEDGEMENT

This work was developed with the support of the Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq), Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP), Brazil and the National Institutes of Health (NIH-PHS-R01-HD36848, R01-HL071828-02), USA. We wish to thank Dr. Wenjun Li for helpful comments that have improved the manuscript.

APPENDIX A

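Consider the partitioned expanded finite population mixed model in (9), where \(X_I = \frac{1}{N}\left[1_n \otimes \left(\bigoplus_{s=1}^N w_s m_s\right)\right]\) and \(X_{II} = \begin{pmatrix}\frac{1}{N}\,1_{N-n} \otimes \left(\bigoplus_{s=1}^N w_s m_s\right)\\ \frac{1}{N}\,1_N \otimes \left(\bigoplus_{s=1}^N w_s(M_s - m_s)\right)\end{pmatrix}\), and the random effects are given by

\[ E_{\xi_2}\begin{pmatrix}Y_{wI}\\ Y_{wII}\end{pmatrix} - E_{\xi_1\xi_2}\begin{pmatrix}Y_{wI}\\ Y_{wII}\end{pmatrix} = \begin{pmatrix}\left[I_n \otimes \left(\bigoplus_{s=1}^N f_s d_s\right)\right]\operatorname{vec}\left(U_I - E_{\xi_1}(U_I)\right)\\ \left[I_{N-n} \otimes \left(\bigoplus_{s=1}^N f_s d_s\right)\right]\operatorname{vec}\left(U_{II} - E_{\xi_1}(U_{II})\right)\\ \left[I_N \otimes \left(\bigoplus_{s=1}^N (1-f_s)d_s\right)\right]\operatorname{vec}\left(U - E_{\xi_1}(U)\right)\end{pmatrix}, \]

with \(\begin{pmatrix}E_{wI}\\ E_{wII}\end{pmatrix} = \begin{pmatrix}Y_{wI}\\ Y_{wII}\end{pmatrix} - E_{\xi_2}\begin{pmatrix}Y_{wI}\\ Y_{wII}\end{pmatrix}\), and \(U = (U_I \; U_{II})\), \(U_I = ((U_i)) = (U_1 \; U_2 \; \cdots \; U_n)\) and \(U_{II} = ((U_i)) = (U_{n+1} \; U_{n+2} \; \cdots \; U_N)\). The corresponding covariance matrix is given by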

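\[ \operatorname{var}(Y_{wp}) = \frac{1}{N-1}\begin{pmatrix} P_N \otimes \left[\left(\bigoplus_{s=1}^N f_s d_s\right)P_N\left(\bigoplus_{s=1}^N f_s d_s\right)\right] & P_N \otimes \left[\left(\bigoplus_{s=1}^N f_s d_s\right)P_N\left(\bigoplus_{s=1}^N (1-f_s)d_s\right)\right]\\ P_N \otimes \left[\left(\bigoplus_{s=1}^N (1-f_s)d_s\right)P_N\left(\bigoplus_{s=1}^N f_s d_s\right)\right] & P_N \otimes \left[\left(\bigoplus_{s=1}^N (1-f_s)d_s\right)P_N\left(\bigoplus_{s=1}^N (1-f_s)d_s\right)\right]\end{pmatrix} + \begin{pmatrix} I_N & -I_N\\ -I_N & I_N\end{pmatrix} \otimes \left(\bigoplus_{s=1}^N v_{se}^2\right) \]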

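(see a. http://www.umass.edu/cluster/ for details), which we partition as \(\operatorname{var}_{\xi_1\xi_2}\begin{pmatrix}Y_{wI}\\ Y_{wII}\end{pmatrix} = \begin{pmatrix}V_I & V_{I,II}\\ V_{I,II}' & V_{II}\end{pmatrix}\), where \(v_{se}^2 = f_s(1-f_s)M_s w_s^2\sigma_s^2/N\) (see b. http://www.umass.edu/cluster/ for details). Letting L be a vector of constants, it follows that \(L'Y_{wI} - T = \begin{pmatrix}L - g_I\\ -g_{II}\end{pmatrix}'\begin{pmatrix}Y_{wI}\\ Y_{wII}\end{pmatrix}\), and the unbiasedness constraint is given by \((L - g_I)'X_I - g_{II}'X_{II} = 0\). Using Lagrange multipliers, we minimize \(\operatorname{var}_{\xi_1\xi_2}(L'Y_{wI} - T)\) subject to the unbiasedness constraint and obtain the familiar solution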

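\[ \hat L = g_I + \left[V_I^{-1} - V_I^{-1}X_I\left(X_I'V_I^{-1}X_I\right)^{-1}X_I'V_I^{-1}\right]V_{I,II}\,g_{II} + V_I^{-1}X_I\left(X_I'V_I^{-1}X_I\right)^{-1}X_{II}'\,g_{II}. \]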
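The general solution above is the standard BLUP weight formula of the prediction approach, and it can be evaluated for any small partitioned problem. The sketch below is pure Python with a hypothetical Gauss-Jordan helper; the matrices \(V_I\), \(V_{I,II}\), \(X_I\), \(X_{II}\), \(g_I\), \(g_{II}\) are made-up illustrative values rather than the structured matrices of this appendix. It computes \(\hat L\) and the two properties that characterize it: the unbiasedness constraint and the first-order condition that \(V_I(\hat L - g_I) - V_{I,II}g_{II}\) lies in the column space of \(X_I\).

```python
def transpose(A):
    return [list(row) for row in zip(*A)]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def inverse(A):
    """Gauss-Jordan inverse with partial pivoting (small, well-conditioned matrices only)."""
    n = len(A)
    aug = [[float(v) for v in row] + [1.0 if r == c else 0.0 for c in range(n)]
           for r, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * pv for v, pv in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def col_vector(v):
    return [[x] for x in v]


def blup_weights(V_I, V_I_II, X_I, X_II, g_I, g_II):
    """L = g_I + [V^-1 - V^-1 X A X'V^-1] V_I,II g_II + V^-1 X A X_II' g_II, A = (X'V^-1 X)^-1."""
    Vi = inverse(V_I)
    XtVi = matmul(transpose(X_I), Vi)                    # X_I' V_I^-1
    A = inverse(matmul(XtVi, X_I))                       # (X_I' V_I^-1 X_I)^-1
    ViXA = matmul(matmul(Vi, X_I), A)                    # V_I^-1 X_I A
    Q = [[Vi[r][c] - sum(ViXA[r][t] * XtVi[t][c] for t in range(len(A)))
          for c in range(len(Vi))] for r in range(len(Vi))]
    t1 = matmul(Q, matmul(V_I_II, col_vector(g_II)))
    t2 = matmul(ViXA, matmul(transpose(X_II), col_vector(g_II)))
    return [g_I[r] + t1[r][0] + t2[r][0] for r in range(len(g_I))]


# Hypothetical example: 3 observed and 2 unobserved variables, one common mean.
V_I = [[4.0, 1.0, 0.5], [1.0, 3.0, 0.2], [0.5, 0.2, 2.0]]
V_I_II = [[0.5, 0.1], [0.2, 0.3], [0.1, 0.4]]
X_I, X_II = [[1.0], [1.0], [1.0]], [[1.0], [1.0]]
g_I, g_II = [1.0, 0.0, 0.0], [1.0, 1.0]
L = blup_weights(V_I, V_I_II, X_I, X_II, g_I, g_II)
print(L, sum(L))   # unbiasedness here requires sum(L) = 1 + 2 = 3
```

Applying this generic routine to the structured \(V_I\) of this appendix would require \(V_I\) to be invertible, which is one reason the closed-form simplification below is preferable.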

This result simplifies to

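\[ \hat L = \left(P_n c_I \otimes \left[\left(\bigoplus_{s=1}^N \frac{k_s^*}{f_s}\right)1_N\right]\right) + \frac{N}{n}\,\bar c\left[1_n \otimes \left(\bigoplus_{s=1}^N \frac{1}{f_s}\right)1_N\right] + \frac{N-n}{n}\,\bar c_{II}\left(1_n \otimes 1_N\right) \]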

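(see c. http://www.umass.edu/cluster/ for details), where \(k_s^* = k_s - \frac{k_s}{d_s}\,\frac{1}{N}\sum_{s^*=1}^N\left(\frac{1-k_{s^*}}{1-\bar k}\right)d_{s^*}\) and \(k_s = \frac{f_s^2 d_s^2}{f_s^2 d_s^2 + (N-1)v_{se}^2}\) (see d. http://www.umass.edu/cluster/ for details), \(\bar k = \frac{1}{N}\sum_{s=1}^N k_s\), \(\bar c_{II} = \frac{1}{N-n}\sum_{i=n+1}^N c_i\), and \(\bar c = \frac{1}{N}\sum_{i=1}^N c_i\). The predictor \(\hat T_p = \hat L'Y_{wI}\) can then be expressed as (10) (see e. http://www.umass.edu/cluster/ for details).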

APPENDIX B

We discuss whether several other plausible reductions in the dimension of the set of expanded random variables (given by Nℕ), including a reduction to the set of 2N random variables used by Stanek and Singer (2004), may be considered without loss of information. First, it is natural to consider whether it is sufficient to predict \(T = g_p'Y_{wp}\) using the 2N collapsed random variables defined by \(Y_w^* = C^{*\prime}Y_{wp}\). This set of random variables is similar to that used by Stanek and Singer (2004) for a population with equal size clusters and equal size samples per cluster with no response error. Since \(g^{*\prime} = g_p'\left[C^*(C^{*\prime}C^*)^{-1}\right] = (1_2 \otimes c)'\), the target \(T = g^{*\prime}Y_w^*\) defines a linear combination of PSU means.

Let \(\hat T = \hat L'Y_{wI}^*\), where \(\hat L\) represents an n × 1 vector of constants and \(Y_{wI}^*\) represents the first n random variables (corresponding to the sample) in \(Y_w^*\). In this case, the bias \(E_{\xi_1\xi_2}(\hat L'Y_{wI}^* - T)\) = (1n)ms − (c′1N)(Ms − ms) is zero only if clusters are sampled with probability proportional to size (PPS) (see a. http://www.umass.edu/cluster/).

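Now assume a PPS sampling scheme, and notice that since \(g_p'(P_{C^*}Y_{wp}) = 0\), we may write \(T = g^{*\prime}Y_w^* + g_p'(P_{C^*}Y_{wp})\). Letting \(\hat B = \hat b'(I_n \otimes P_N)Y_{wI}\) be a linear unbiased predictor of \(g_p'(P_{C^*}Y_{wp})\) based on the sample part of \(P_{C^*}Y_{wp}\), given by \((I_n \otimes P_N)Y_{wI}\), the predictor \(\hat T\) will be optimal if and only if \(E_{\xi_1\xi_2}\left[(\hat T - T)\hat B\right] = 0\). Simplifying this expectation, we find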

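\[ E_{\xi_1\xi_2}\left[(\hat T - T)\hat B\right] = \left[f(f\hat L - c_I) \otimes 1_N\right]'\left[I_n \otimes \left(\frac{1}{N-1}\left(\bigoplus_{s=1}^N d_s\right)P_N\left(\bigoplus_{s=1}^N d_s\right)P_N\right)\right]\hat b + \left[\frac{f(1-f)}{N}(\hat L + c_I)' \otimes \left[1_N'\left(\bigoplus_{s=1}^N M_s w_s^2\sigma_s^2\right)P_N\right]\right]\hat b \]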

where \(d_s = M_s w_s\mu_s\), \(c = (c_I' \; c_{II}')'\), \(c_I\) is an n × 1 vector, and f denotes the common sampling fraction (see b. http://www.umass.edu/cluster/). This expression is not equal to zero, even when the population consists of equal size clusters with homogeneous variances and equal size samples are taken from sample clusters. By Theorem 1.1 in Rao and Bellhouse (1978), this result implies that some efficiency is lost in prediction when collapsing \(Y_{wp}\) to \(Y_w^*\). The predictor based on \(Y_{wp}\) will have smaller MSE than the predictor based on \(Y_w^*\), even in the settings considered by Stanek and Singer (2004) with no response error, when clusters are of equal size and equal size samples are selected.

Footnotes

1

We refer to such models as finite population mixed models instead of random permutation models (the term used in Stanek and Singer 2004) to avoid confusion with the homonymous, but different, model considered in Hedayat and Sinha (1991).

Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

Contributor Information

Edward J. Stanek, III, Department of Public Health, 401 Arnold House, University of Massachusetts, 715 North Pleasant Street, Amherst, MA 01003-9304 USA, stanek@schoolph.umass.edu.

Julio M. Singer, Departamento de Estatística, Universidade de São Paulo, São Paulo, Brazil, jmsinger@ime.usp.br

REFERENCES

  1. Bolfarine H, Zacks S. Prediction Theory for Finite Populations. New York: Springer-Verlag; 1992.
  2. Brown H, Prescott R. Applied Mixed Models in Medicine. New York: Wiley; 1999.
  3. Bryk AS, Raudenbush SW. Hierarchical Linear Models: Applications and Data Analysis Methods. 2nd edition. Thousand Oaks, California: Sage Publications; 2002.
  4. California Cluster Web Site: http://www.umass.edu/cluster/. (Details for Section 3.1 in document c06ed54.doc; Details for Section 3.2: a. c07ed15.doc, page 41. b. c07ed15.doc, from page 41. c. c07ed25.doc, page 3; Details of Appendix A: a. c07ed27.doc, page 5. b. c07ed15.doc, from page 41. c. c06ed56.doc, from page 42, and c07ed01.doc, from page 1. d. c07ed15.doc, page 41. e. c07ed01.doc, page 6; Details of Appendix B: a. c07ed29.doc, pages 8. b. c07ed29.doc, page 19).
  5. Cochran WG. Sampling Techniques. 3rd edition. New York: Wiley; 1977.
  6. Demidenko E. Mixed models: Theory and Application. New York: John Wiley; 2004.
  7. Diggle PJ, Heagerty P, Liang KY, Zeger S. Analysis of Longitudinal Data. Oxford University Press; 2002.
  8. Godambe VP. A unified theory of sampling from finite populations. Journal of the Royal Statistical Society B. 1955;17:269–278.
  9. Goldberger AS. Best Linear Unbiased Prediction in the Generalized Linear Regression Model. Journal of the American Statistical Association. 1962;57:369–375.
  10. Graybill FA. Matrices with applications in statistics. Belmont, California: Wadsworth International; 1983.
  11. Harville DA. Alternative formulations and procedures for the two-way mixed model. Biometrics. 1978;34:441–453.
  12. Hedayat AS, Sinha BK. Design and Inference in Finite Population Sampling. New York: John Wiley and Sons; 1991.
  13. Henderson CR. Applications of Linear Models in Animal Breeding. Guelph, Canada: University of Guelph; 1984.
  14. Jiang J. Linear and generalized linear mixed models and their applications. New York: Springer; 2007.
  15. Laird NM, Ware JH. Random-effects models for longitudinal data. Biometrics. 1982;38:963–974.
  16. Littell RC, Milliken GA, Stroup WW, Wolfinger RD, Schabenberger O. SAS for Mixed Models: Second Edition. Cary N.C.: SAS Institute; 2006.
  17. McCulloch CE, Searle SR. Generalized, Linear, and Mixed Models. New York: John Wiley and Sons; 2001.
  18. McLean RA, Sanders WL, Stroup WW. A Unified Approach to Mixed Linear Models. The American Statistician. 1991;45(1):54–64.
  19. Rao JNK, Bellhouse DR. Optimal estimation of a finite population mean under generalized random permutation models. Journal of Statistical Planning and Inference. 1978;2:125–141.
  20. Robinson GK. That BLUP is a Good Thing: the Estimation of Random Effects. Statistical Science. 1991;6(1):15–51.
  21. Royall RM. The Linear Least-squares Prediction Approach to Two-stage Sampling. Journal of the American Statistical Association. 1976;71:657–664.
  22. San Martino S, Singer JM, Stanek EJ, III. Performance of balanced two-stage empirical predictors of realized cluster latent values from finite populations: A simulation Study. Computational Statistics and Data Analysis. 2007. doi: 10.1016/j.csda.2007.07.013.
  23. Scott A, Smith TMF. Estimation in Multi-stage Surveys. Journal of the American Statistical Association. 1969;64(327):830–840.
  24. Searle SR, Casella G, McCulloch CE. Variance Components. New York: Wiley and Sons; 1992.
  25. Singer JD, Willett JB. Applied Longitudinal Data Analysis. New York: Oxford Press; 2003.
  26. Stanek EJ, III, Singer JM. Predicting Random Effects from Finite Population Clustered Samples with Response Error. Journal of the American Statistical Association. 2004;99:119–130.
  27. Stanek EJ, III, Singer JM, Lencina VB. A unified approach to estimation and prediction under simple random sampling. Journal of Statistical Planning and Inference. 2004;121:325–338.
  28. Valliant R, Dorfman AH, Royall RM. Finite Population Sampling and Inference. New York: Wiley; 2000.
  29. Verbeke G, Molenberghs G. Linear Mixed Models for Longitudinal Data. New York: Springer-Verlag; 2000.
