Oversampling of Minority Populations Through Dual-Frame Surveys

Sixia Chen; Alexander Stubblefield; Julie A Stoner

Journal of Survey Statistics and Methodology. 2020 Jan 11;9(3):626–649. doi: 10.1093/jssam/smz054


PMCID: PMC8308969 PMID: 34322557

Abstract

Previous studies have shown disparities in health conditions and behaviors among different ethnic groups. Sampling designs that do not oversample certain minority populations, such as American Indians or African Americans, may not produce sample sizes sufficient for estimating health parameters for these populations. Oversampling is one of the most common approaches researchers use to achieve required precision levels for small domain estimation. However, it has not been rigorously investigated in dual-frame survey settings. To take advantage of extra information on minority populations in the Marketing Systems Group database, we propose a novel optimal oversampling strategy that minimizes the domain variance subject to a total cost constraint, or vice versa. We further extend the method to oversample multiple minorities simultaneously. An empirical study using a population-based community survey shows the benefits of the proposed methods compared with traditional methods in terms of statistical efficiency and cost.

Keywords: Minority, Oversampling, Stratification, Telephone survey

1. INTRODUCTION

The dual-frame telephone survey, which samples both cell phone and landline numbers, has been used frequently because of its time and cost efficiency and its high coverage rate. Although response rates for telephone surveys have declined in recent years, the telephone survey remains an effective data collection tool: it is more cost effective than in-person surveys and has several advantages over mail or email surveys (see Marcus and Crane 1986; Chang and Krosnick 2009; and Szolnoki and Hoffmann 2013, among others). According to the 2016 National Health Interview Survey (NHIS), the estimated coverage rate of cell phones and landlines combined in the United States is more than 95 percent. Many surveys implement such a sampling design, including the 2016 Behavioral Risk Factor Surveillance System (BRFSS), the 2016 Tobacco Settlement Endowment Trust Healthy Living Program Community Member Survey (TSET HLP Survey), and the 2015–2016 California Health Interview Survey (CHIS). Statistical inference for dual-frame surveys has been discussed in Lohr and Rao (2000), and estimators and weight adjustments for dual-frame surveys have been considered in Hartley (1962, 1974), Skinner and Rao (1996), Fuller and Burmeister (1972), Bankier (1986), Kalton and Anderson (1986), and Rao and Wu (2010), among others.

Most existing dual-frame telephone surveys were designed to estimate parameters for the general population. For example, Wolter, Tao, Montgomery, and Smith (2015) discussed optimal allocation for a dual-frame telephone survey under an overall cost constraint: the variance is minimized subject to the budget constraint, or vice versa. However, they considered only parameters for the overall population, so their approach cannot be used to target minority populations such as American Indians. Estimating health-related parameters for minority populations is nonetheless crucial in public health research, since these parameters may differ from those of the general population. For instance, Espey, Jim, Cobb, Bartholomew, Becker, et al. (2014) and Mowery, Dube, Thorne, Garrett, Homa, et al. (2015) showed that American Indians and Alaska Natives have a higher risk of tobacco-related disease and death due to a high prevalence of cigarette smoking and other commercial tobacco use. The 2014 Minnesota Survey on Adult Substance Use (MSASU; Helba, Love, Wivagg, Frissell, Lee, et al. 2015) is one of the few dual-frame random digit dialing surveys that have oversampled certain minority populations, such as African American, Asian American, and American Indian populations. However, it used an ad hoc manual adjustment procedure to oversample these populations.

Several methods for oversampling rare populations have been proposed (see Kalton and Anderson 1986; Kalsbeek 2003; Elliot, Finch, Klein, Sai, Phuong Do, et al. 2008; Kalton 2009; Tourangeau, Edwards, Johnson, Wolter, and Bates 2014; and Chen and Kalton 2015). Chen and Kalton (2015) proposed an optimal sampling design for oversampling single or multiple rare populations simultaneously, achieving oversampling by increasing the sampling fractions for phone numbers from areas with a greater prevalence of the rare populations of interest. Similar area-based oversampling was used in the 2003 US National Assessment of Adult Literacy to oversample the African American population (Mohadjer and Krenzke 2009) and in the US National Health and Nutrition Examination Survey (NHANES) to oversample several race/ethnicity populations (Curtin, Mohadjer, Dohrmann, Kruszon-Moran, Mirel, et al. 2013).

Despite the well-established theory and practice of both dual-frame telephone surveys and oversampling rare domains, to the best of our knowledge there has been little rigorous discussion of combining the two. Although dual-frame optimization for area and telephone surveys was considered in the sample design report of the 1997 and 1999 National Survey of America’s Families (NSAF; Judkins, Brick, Broene, Ferraro, and Strickler 2001), the literature lacks a rigorously justified analysis of optimal allocation for single and multiple minorities in a dual-frame telephone survey. The optimal allocation method detailed in this article is a novel approach to improving the precision of estimators for rare populations by oversampling in dual-frame telephone surveys, and we achieve this through constrained optimization. Given that the dual-frame approach is becoming increasingly standard for telephone surveys and that more precise estimators for rare populations are often needed, this method could contribute significantly to future survey design.

Section 2 introduces the notation. Section 3 presents our proposed approach for oversampling a single minority using a dual-frame design. The corresponding design effect due to oversampling stratification is derived in section 4. We then discuss oversampling multiple minorities simultaneously in section 5. Section 6 applies the allocation approach to the TSET HLP Survey. Section 7 concludes the article with final remarks and discussion. All technical proofs are contained in the appendix.

2. NOTATIONS

We consider only dual-frame telephone surveys in the following sections, although the proposed approach applies to any dual-frame sampling design. Let A and B denote the landline and cell phone frames, Y the study variable of interest, and D the domain of interest. Population-level notation, including populations, domain populations, population sizes, domain population sizes, domain totals, domain means, and domain variances, is defined in table 1. Let y_i be the value of the study variable for unit i, and let D_i be the indicator for domain D, so that D_i = 1 if unit i is in D and D_i = 0 otherwise. Assume that the cost per screening interview is $c'_{A}$ for frame A and $c'_{B}$ for frame B, and that the cost per full interview is $c_{A}^{*}$ and $c_{B}^{*}$, respectively. The total budget is denoted by C. The parameter of interest is the population domain mean $θ_{D} = N_{D}^{- 1} \sum_{i \in U_{D}} y_{i}$.

Table 1.

Notations for Different Populations

Category	Pop	Dom pop	Pop size	Dom size	Dom total	Dom mean	Dom var
Overall	U	U_D	N	N_D	Y_D	${\bar{Y}}_{D}$	$S_{D}^{2}$
LL	U_A	$U_{A, D}$	N_A	$N_{A, D}$	$Y_{A, D}$	${\bar{Y}}_{A, D}$	$S_{A, D}^{2}$
Cell	U_B	$U_{B, D}$	N_B	$N_{B, D}$	$Y_{B, D}$	${\bar{Y}}_{B, D}$	$S_{B, D}^{2}$
LL only	U_a	$U_{a, D}$	N_a	$N_{a, D}$	$Y_{a, D}$	${\bar{Y}}_{a, D}$	$S_{a, D}^{2}$
LL dual	U_ab	$U_{ab, D}$	N_ab	$N_{ab, D}$	$Y_{ab, D}$	${\bar{Y}}_{ab, D}$	$S_{ab, D}^{2}$
Cell dual	U_ba	$U_{ba, D}$	N_ba	$N_{ba, D}$	$Y_{ba, D}$	${\bar{Y}}_{ba, D}$	$S_{ba, D}^{2}$
Cell only	U_b	$U_{b, D}$	N_b	$N_{b, D}$	$Y_{b, D}$	${\bar{Y}}_{b, D}$	$S_{b, D}^{2}$


Note.— Pop, population; Dom, domain; LL, landline.

3. PROPOSED METHOD

Denote the oversampling strata for frame A by $h_{A} = 1, 2, \dots, H_{A}$ and those for frame B by $h_{B} = 1, 2, \dots, H_{B}$. Such strata can be constructed using the cumulative root frequency rule of Dalenius (1957) together with population-level aggregated information obtained from Marketing Systems Group. Specifically, Marketing Systems Group provides rate center–level ethnicity information for the cell phone frame and six-digit group-level ethnicity information for the landline frame. Let $n_{A, h_{A}}$ and $n_{B, h_{B}}$ denote the sample sizes and $N_{A, h_{A}}$ and $N_{B, h_{B}}$ the population sizes of strata h_A and h_B. Stratified simple random sampling without replacement is applied to both frames. Write the corresponding samples as $s_{A, h_{A}}$ and $s_{B, h_{B}}$ for $h_{A} = 1, \dots, H_{A}$ and $h_{B} = 1, \dots, H_{B}$. The consistent weighted estimator for θ_D can be written as

{\hat{θ}}_{D} = \frac{1}{{\hat{N}}_{D}} ({\hat{Y}}_{a, D} + p_{D} {\hat{Y}}_{ab, D} + q_{D} {\hat{Y}}_{ba, D} + {\hat{Y}}_{b, D}),

(1)

where p_D and q_D depend on D and satisfy $p_{D} + q_{D} = 1$; ${\hat{Y}}_{a, D}$, ${\hat{Y}}_{ab, D}$, ${\hat{Y}}_{ba, D}$, and ${\hat{Y}}_{b, D}$ are consistent estimators of the corresponding population domain totals; and ${\hat{N}}_{D}$ is a consistent estimator of N_D. They can be written as

\begin{matrix} {\hat{Y}}_{a, D} = \sum_{h_{A} = 1}^{H_{A}} \frac{N_{A, h_{A}}}{n_{A, h_{A}}} \sum_{i \in s_{A, h_{A}}} a_{i} D_{i} y_{i}, {\hat{Y}}_{ab, D} = \sum_{h_{A} = 1}^{H_{A}} \frac{N_{A, h_{A}}}{n_{A, h_{A}}} \sum_{i \in s_{A, h_{A}}} a b_{i} D_{i} y_{i}, \\ {\hat{Y}}_{b, D} = \sum_{h_{B} = 1}^{H_{B}} \frac{N_{B, h_{B}}}{n_{B, h_{B}}} \sum_{i \in s_{B, h_{B}}} b_{i} D_{i} y_{i}, {\hat{Y}}_{ba, D} = \sum_{h_{B} = 1}^{H_{B}} \frac{N_{B, h_{B}}}{n_{B, h_{B}}} \sum_{i \in s_{B, h_{B}}} b a_{i} D_{i} y_{i}, \end{matrix}

and

{\hat{N}}_{D} = \sum_{h_{A} = 1}^{H_{A}} \frac{N_{A, h_{A}}}{n_{A, h_{A}}} \sum_{i \in s_{A, h_{A}}} a_{i} D_{i} + p_{D} \sum_{h_{A} = 1}^{H_{A}} \frac{N_{A, h_{A}}}{n_{A, h_{A}}} \sum_{i \in s_{A, h_{A}}} a b_{i} D_{i} + q_{D} \sum_{h_{B} = 1}^{H_{B}} \frac{N_{B, h_{B}}}{n_{B, h_{B}}} \sum_{i \in s_{B, h_{B}}} b a_{i} D_{i} + \sum_{h_{B} = 1}^{H_{B}} \frac{N_{B, h_{B}}}{n_{B, h_{B}}} \sum_{i \in s_{B, h_{B}}} b_{i} D_{i},

where $a_{i}$, $ab_{i}$, $b_{i}$, and $ba_{i}$ are indicator variables (zero or one) for membership in $U_{a}$, $U_{ab}$, $U_{b}$, and $U_{ba}$. Optimal choices of p_D and q_D were discussed in Hartley (1962, 1974) and Skinner and Rao (1996), among others. By Taylor linearization and some algebra (see appendix A), we have

V ({\hat{θ}}_{D}) = \frac{1}{N_{D}^{2}} (\sum_{h_{A} = 1}^{H_{A}} N_{A, h_{A}}^{2} \frac{E_{A, h_{A}}}{n_{A, h_{A}}} + \sum_{h_{B} = 1}^{H_{B}} N_{B, h_{B}}^{2} \frac{E_{B, h_{B}}}{n_{B, h_{B}}}) + o (n^{- 1}),

(2)

where $n_{A} = \sum_{h_{A} = 1}^{H_{A}} n_{A, h_{A}}$, $n_{B} = \sum_{h_{B} = 1}^{H_{B}} n_{B, h_{B}}$, $n = \min (n_{A}, n_{B})$,

\begin{matrix} E_{A, h_{A}} = \frac{N_{A, h_{A}, a, D}}{N_{A, h_{A}}} S_{A, h_{A}, a, D}^{2} + p_{D}^{2} \frac{N_{A, h_{A}, ab, D}}{N_{A, h_{A}}} S_{A, h_{A}, ab, D}^{2} \\ + \frac{N_{A, h_{A}, a, D}}{N_{A, h_{A}}} {({\bar{Y}}_{A, h_{A}, a, D} - θ_{D})}^{2} + p_{D}^{2} \frac{N_{A, h_{A}, ab, D}}{N_{A, h_{A}}} {({\bar{Y}}_{A, h_{A}, ab, D} - θ_{D})}^{2} \\ - \left\{ \frac{N_{A, h_{A}, a, D}}{N_{A, h_{A}}} ({\bar{Y}}_{A, h_{A}, a, D} - θ_{D}) + p_{D} \frac{N_{A, h_{A}, ab, D}}{N_{A, h_{A}}} ({\bar{Y}}_{A, h_{A}, ab, D} - θ_{D}) \right\}^{2}, \end{matrix}

and

\begin{matrix} E_{B, h_{B}} = \frac{N_{B, h_{B}, b, D}}{N_{B, h_{B}}} S_{B, h_{B}, b, D}^{2} + q_{D}^{2} \frac{N_{B, h_{B}, ba, D}}{N_{B, h_{B}}} S_{B, h_{B}, ba, D}^{2} \\ + \frac{N_{B, h_{B}, b, D}}{N_{B, h_{B}}} {({\bar{Y}}_{B, h_{B}, b, D} - θ_{D})}^{2} + q_{D}^{2} \frac{N_{B, h_{B}, ba, D}}{N_{B, h_{B}}} {({\bar{Y}}_{B, h_{B}, ba, D} - θ_{D})}^{2} \\ - \left\{ \frac{N_{B, h_{B}, b, D}}{N_{B, h_{B}}} ({\bar{Y}}_{B, h_{B}, b, D} - θ_{D}) + q_{D} \frac{N_{B, h_{B}, ba, D}}{N_{B, h_{B}}} ({\bar{Y}}_{B, h_{B}, ba, D} - θ_{D}) \right\}^{2}, \end{matrix}

where $N_{A, h_{A}, a, D}$, $N_{A, h_{A}, ab, D}$, $N_{B, h_{B}, b, D}$, and $N_{B, h_{B}, ba, D}$ are the population sizes of domain D within the landline-only, landline dual, cell-only, and cell dual parts of the corresponding strata; $S_{A, h_{A}, a, D}^{2}$, $S_{A, h_{A}, ab, D}^{2}$, $S_{B, h_{B}, b, D}^{2}$, and $S_{B, h_{B}, ba, D}^{2}$ are the corresponding domain population variances; and ${\bar{Y}}_{A, h_{A}, a, D}$, ${\bar{Y}}_{A, h_{A}, ab, D}$, ${\bar{Y}}_{B, h_{B}, b, D}$, and ${\bar{Y}}_{B, h_{B}, ba, D}$ are the corresponding domain population means. The total cost can be written as

C = \sum_{h_{A} = 1}^{H_{A}} n_{A, h_{A}} \left\{ P_{A, h_{A}} c_{A}^{*} + (1 - P_{A, h_{A}}) c'_{A} \right\} + \sum_{h_{B} = 1}^{H_{B}} n_{B, h_{B}} \left\{ P_{B, h_{B}} c_{B}^{*} + (1 - P_{B, h_{B}}) c'_{B} \right\},

(3)

where $P_{A, h_{A}}$ and $P_{B, h_{B}}$ are the prevalences of domain D in strata h_A and h_B. Minimizing $V ({\hat{θ}}_{D})$ in (2) subject to the cost constraint (3) gives

f_{A, h_{A}, opt} = K \sqrt{\frac{E_{A, h_{A}}}{P_{A, h_{A}} (c_{A}^{*} - c'_{A}) + c'_{A}}}, f_{B, h_{B}, opt} = K \sqrt{\frac{E_{B, h_{B}}}{P_{B, h_{B}} (c_{B}^{*} - c'_{B}) + c'_{B}}},

(4)

where $f_{A, h_{A}, opt} = n_{A, h_{A}, opt} / N_{A, h_{A}}$ and $f_{B, h_{B}, opt} = n_{B, h_{B}, opt} / N_{B, h_{B}}$, and K is the constant for which the total cost (3) equals the budget C.
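A minimal Python sketch of (2)–(4), under stated assumptions: the unit-level arrays (`y`, `in_domain`, `landline_only`) and the stratum summaries are hypothetical, and the function names are ours, not the authors'. The first function evaluates the closed form of $E_{A, h_{A}}$ using cell variances with divisor equal to the cell count; in that convention it coincides exactly with the population variance of the linearized variable $z_{i} = D_{i} (y_{i} - θ_{D})(a_{i} + p_{D} ab_{i})$, which the script checks. The second function stacks the frame-A and frame-B strata, gives each stratum its own frame's costs, and picks the common K so that the cost (3) equals the budget.

```python
import numpy as np


def e_stratum(y, in_domain, landline_only, p_D, theta_D):
    """Closed form of E_{A,h_A} for one frame-A stratum (hypothetical helper).

    y, in_domain, landline_only: arrays over every number in the stratum.
    landline_only marks a_i = 1; the remaining units are dual (ab_i = 1).
    Cell variances use divisor equal to the cell count (numpy ddof=0).
    """
    N = len(y)
    second_moment, first_moment = 0.0, 0.0
    for cell, weight in ((in_domain & landline_only, 1.0),
                         (in_domain & ~landline_only, p_D)):
        if cell.any():
            share = cell.sum() / N                 # e.g. N_{A,h_A,a,D} / N_{A,h_A}
            ybar = y[cell].mean()                  # domain mean within the cell
            second_moment += weight**2 * share * (y[cell].var() + (ybar - theta_D) ** 2)
            first_moment += weight * share * (ybar - theta_D)
    return second_moment - first_moment**2


def optimal_allocation(N_h, E_h, P_h, c_full, c_screen, budget):
    """Allocation (4) over strata stacked across both frames; returns n_h."""
    unit_cost = P_h * (c_full - c_screen) + c_screen   # per-number cost term in (3)
    shape = np.sqrt(E_h / unit_cost)                   # f_h / K
    K = budget / np.sum(N_h * shape * unit_cost)       # makes (3) equal the budget
    return K * shape * N_h                             # n_h = f_h * N_h


rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=500).astype(float)
in_domain = rng.random(500) < 0.2
landline_only = rng.random(500) < 0.3
p_D, theta_D = 0.5, 0.6

# Linearized variable: its population variance should equal the closed form.
z = in_domain * (y - theta_D) * np.where(landline_only, 1.0, p_D)
E_closed = e_stratum(y, in_domain, landline_only, p_D, theta_D)
print(np.isclose(E_closed, z.var()))  # → True

# Two frame-A strata followed by two frame-B strata (hypothetical inputs).
N_h = np.array([20000.0, 80000.0, 30000.0, 120000.0])
E_h = np.array([0.030, 0.008, 0.025, 0.006])
P_h = np.array([0.15, 0.04, 0.12, 0.03])
c_full = np.array([1.0, 1.0, 1.0, 1.0])
c_screen = np.array([0.2, 0.2, 0.25, 0.25])
n_h = optimal_allocation(N_h, E_h, P_h, c_full, c_screen, budget=3000.0)
```

Because K is shared by both frames, a stratum receives a larger sampling fraction when its E is large or its expected cost per sampled number is small, which is the trade-off that (4) expresses.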

Remark 3.1. If we assume that $S_{A, h_{A}, a, D}^{2} = S_{A, h_{A}, ab, D}^{2} = S_{B, h_{B}, b, D}^{2} = S_{B, h_{B}, ba, D}^{2} = S_{D}^{2}$ and ${\bar{Y}}_{A, h_{A}, a, D} = {\bar{Y}}_{A, h_{A}, ab, D} = {\bar{Y}}_{B, h_{B}, b, D} = {\bar{Y}}_{B, h_{B}, ba, D} = θ_{D}$, then (4) simplifies to

f_{A, h_{A}, opt} = \tilde{K} \sqrt{\frac{P_{A, h_{A}} (Q_{A, h_{A}, a} + p_{D}^{2} Q_{A, h_{A}, ab})}{P_{A, h_{A}} (c_{A}^{*} - c'_{A}) + c'_{A}}},

and

f_{B, h_{B}, opt} = \tilde{K} \sqrt{\frac{P_{B, h_{B}} (Q_{B, h_{B}, b} + q_{D}^{2} Q_{B, h_{B}, ba})}{P_{B, h_{B}} (c_{B}^{*} - c'_{B}) + c'_{B}}},

where $Q_{A, h_{A}, a} = N_{A, h_{A}, a, D} N_{A, h_{A}, D}^{- 1},$ $Q_{A, h_{A}, ab} = N_{A, h_{A}, ab, D} N_{A, h_{A}, D}^{- 1},$ $Q_{B, h_{B}, b} = N_{B, h_{B}, b, D} N_{B, h_{B}, D}^{- 1}$ and $Q_{B, h_{B}, ba} = N_{B, h_{B}, ba, D} N_{B, h_{B}, D}^{- 1} .$ Therefore, the sampling fractions $f_{A, h_{A}, opt}$ and $f_{B, h_{B}, opt}$ increase as $P_{A, h_{A}},$ $Q_{A, h_{A}, a},$ $Q_{A, h_{A}, ab}$ , and $P_{B, h_{B}},$ $Q_{B, h_{B}, b},$ $Q_{B, h_{B}, ba}$ increase.

Remark 3.1 shows that the sampling fractions $f_{A, h_{A}, opt}$ and $f_{B, h_{B}, opt}$ increase with the stratum prevalences $P_{A, h_{A}}$ and $P_{B, h_{B}}$, and that this dependence on prevalence, and hence the degree of oversampling, is strongest when the cost ratios $c_{A}^{*} / c'_{A}$ and $c_{B}^{*} / c'_{B}$ are small, consistent with previous literature.
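The trade-off in Remark 3.1 can be illustrated numerically. In this sketch the prevalences and landline-only shares are hypothetical, $c_{A}^{*} = 1$ is held fixed and $c'_{A} = c_{A}^{*} / r$ for cost ratio r (the convention used later in section 6), and because $\tilde{K}$ and $S_{D}^{2}$ are common to all strata, each fraction is reported relative to the smallest one.

```python
import numpy as np


def relative_fractions(P, Q_a, Q_ab, p_D, cost_ratio, c_full=1.0):
    """Simplified allocation of Remark 3.1 for frame-A strata, scaled so min = 1."""
    c_screen = c_full / cost_ratio
    f = np.sqrt(P * (Q_a + p_D**2 * Q_ab) / (P * (c_full - c_screen) + c_screen))
    return f / f.min()


P = np.array([0.20, 0.08, 0.03])      # hypothetical high / moderate / low density strata
Q_a = np.array([0.3, 0.3, 0.3])       # hypothetical landline-only share of the domain
Q_ab = 1.0 - Q_a                      # the rest of the frame-A domain is dual
for r in (1, 5, 30):
    print(r, np.round(relative_fractions(P, Q_a, Q_ab, 0.5, r), 3))
```

With r = 1 the relative fractions are proportional to $\sqrt{P_{A, h_{A}}}$; as r grows they flatten toward equal fractions, so oversampling high-density strata pays off most when a screening interview costs nearly as much as a full interview.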

4. DESIGN EFFECT DUE TO STRATIFICATION

Without oversampling stratification, that is, with $H_{A} = H_{B} = 1$, the optimal allocation in (4) becomes

f_{A, opt} = K_{0} \sqrt{\frac{E_{A}}{P_{A} (c_{A}^{*} - c_{A}^{'}) + c_{A}^{'}}}, f_{B, opt} = K_{0} \sqrt{\frac{E_{B}}{P_{B} (c_{B}^{*} - c_{B}^{'}) + c_{B}^{'}}},

(5)

where

\begin{matrix} E_{A} = \frac{N_{A, a, D}}{N_{A}} S_{A, a, D}^{2} + p_{D}^{2} \frac{N_{A, ab, D}}{N_{A}} S_{A, ab, D}^{2} + \frac{N_{A, a, D}}{N_{A}} {({\bar{Y}}_{A, a, D} - θ_{D})}^{2} \\ + p_{D}^{2} \frac{N_{A, ab, D}}{N_{A}} {({\bar{Y}}_{A, ab, D} - θ_{D})}^{2} - \left\{ \frac{N_{A, a, D}}{N_{A}} ({\bar{Y}}_{A, a, D} - θ_{D}) + p_{D} \frac{N_{A, ab, D}}{N_{A}} ({\bar{Y}}_{A, ab, D} - θ_{D}) \right\}^{2}, \end{matrix}

and

\begin{matrix} E_{B} = \frac{N_{B, b, D}}{N_{B}} S_{B, b, D}^{2} + q_{D}^{2} \frac{N_{B, ba, D}}{N_{B}} S_{B, ba, D}^{2} + \frac{N_{B, b, D}}{N_{B}} {({\bar{Y}}_{B, b, D} - θ_{D})}^{2} \\ + q_{D}^{2} \frac{N_{B, ba, D}}{N_{B}} {({\bar{Y}}_{B, ba, D} - θ_{D})}^{2} - \left\{ \frac{N_{B, b, D}}{N_{B}} ({\bar{Y}}_{B, b, D} - θ_{D}) + q_{D} \frac{N_{B, ba, D}}{N_{B}} ({\bar{Y}}_{B, ba, D} - θ_{D}) \right\}^{2} . \end{matrix}

The optimal allocation in (4) can be viewed as an extension of Wolter et al. (2015) to the estimation of a domain parameter. Suppose we wish to compare the proposed sampling design without oversampling stratification with the proposed design with oversampling stratification, and let $n_{D}^{*}$ be the target sample size for minority domain D. Then we have

n_{D}^{*} = f_{A, opt} N_{A} P_{A} + f_{B, opt} N_{B} P_{B} .

Together with (5), we have

K_{0} = \left\{ N_{A} P_{A} \sqrt{\frac{E_{A}}{P_{A} (c_{A}^{*} - c'_{A}) + c'_{A}}} + N_{B} P_{B} \sqrt{\frac{E_{B}}{P_{B} (c_{B}^{*} - c'_{B}) + c'_{B}}} \right\}^{- 1} n_{D}^{*} .

So, the total cost can be written as

C_{0}^{*} = f_{A, opt}^{*} N_{A} \left\{ P_{A} c_{A}^{*} + (1 - P_{A}) c'_{A} \right\} + f_{B, opt}^{*} N_{B} \left\{ P_{B} c_{B}^{*} + (1 - P_{B}) c'_{B} \right\},

where $f_{A, opt}^{*}$ and $f_{B, opt}^{*}$ are the sampling fractions in (5) evaluated at the K₀ given above. Let ${\hat{θ}}_{D}^{(0)}$ denote the estimator (1) computed under this unstratified design. Its design variance can be written as

V_{opt} ({\hat{θ}}_{D}^{(0)}) = \frac{1}{N_{D}^{2}} (N_{A}^{2} \frac{E_{A}}{n_{A, opt}^{*}} + N_{B}^{2} \frac{E_{B}}{n_{B, opt}^{*}}) + o (n^{- 1}),

(6)

where $n_{A, opt}^{*} = f_{A, opt}^{*} N_{A}$ and $n_{B, opt}^{*} = f_{B, opt}^{*} N_{B}$. Using (3) and (4), the value of K that gives the stratified design the same total cost $C_{0}^{*}$ is

K^{*} = {(Δ_{A} + Δ_{B})}^{- 1} C_{0}^{*},

(7)

with

Δ_{A} = \sum_{h_{A} = 1}^{H_{A}} N_{A, h_{A}} \sqrt{\frac{E_{A, h_{A}}}{P_{A, h_{A}} (c_{A}^{*} - c'_{A}) + c'_{A}}} \left\{ P_{A, h_{A}} c_{A}^{*} + (1 - P_{A, h_{A}}) c'_{A} \right\},

and

Δ_{B} = \sum_{h_{B} = 1}^{H_{B}} N_{B, h_{B}} \sqrt{\frac{E_{B, h_{B}}}{P_{B, h_{B}} (c_{B}^{*} - c'_{B}) + c'_{B}}} \left\{ P_{B, h_{B}} c_{B}^{*} + (1 - P_{B, h_{B}}) c'_{B} \right\} .

According to (2), (4), and (7), we have

V_{opt} ({\hat{θ}}_{D}) = \frac{1}{N_{D}^{2}} (\sum_{h_{A} = 1}^{H_{A}} N_{A, h_{A}}^{2} \frac{E_{A, h_{A}}}{n_{A, h_{A}, opt}^{*}} + \sum_{h_{B} = 1}^{H_{B}} N_{B, h_{B}}^{2} \frac{E_{B, h_{B}}}{n_{B, h_{B}, opt}^{*}}) + o (n^{- 1}),

(8)

where $n_{A, h_{A}, opt}^{*} = N_{A, h_{A}} f_{A, h_{A}, opt}^{*}$ and $n_{B, h_{B}, opt}^{*} = N_{B, h_{B}} f_{B, h_{B}, opt}^{*} .$ Therefore, according to (6) and (8), the corresponding design effect for oversampling stratification is

\mathrm{Deff}_{s} = \frac{V_{opt} ({\hat{θ}}_{D})}{V_{opt} ({\hat{θ}}_{D}^{(0)})} \approx \frac{\sum_{h_{A} = 1}^{H_{A}} N_{A, h_{A}}^{2} E_{A, h_{A}} n_{A, h_{A}, opt}^{* - 1} + \sum_{h_{B} = 1}^{H_{B}} N_{B, h_{B}}^{2} E_{B, h_{B}} n_{B, h_{B}, opt}^{* - 1}}{N_{A}^{2} E_{A} n_{A, opt}^{* - 1} + N_{B}^{2} E_{B} n_{B, opt}^{* - 1}} .

(9)

The design effect defined in (9) can be used to compare the design efficiency for designs with and without oversampling stratification. Such comparisons are conducted in section 6.
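The comparison in (5)–(9) can be sketched in Python as follows, assuming frame-level and stratum-level summaries (N, E, P) are already available, for example estimated from a previous survey as in section 6. All numbers, the dictionary layout, and the function name are hypothetical; the hypothetical frame-level E values are set at least as large as the stratum-weighted average of the stratum E values, as the within/between decomposition of the linearized variance requires. The common factor $N_{D}^{-2}$ cancels in (9), and the $o(n^{-1})$ terms are dropped.

```python
import numpy as np


def unit_cost(P, c_full, c_screen):
    return P * (c_full - c_screen) + c_screen          # braced cost term in (3)


def stratification_deff(frames, strata, n_target):
    """Design effect (9). frames[k]: scalars N, E, P, c_full, c_screen;
    strata[k]: arrays N, E, P for the oversampling strata of frame k."""
    # Unstratified design (5): K0 fixes the expected domain sample at n_target.
    g0 = {k: np.sqrt(v["E"] / unit_cost(v["P"], v["c_full"], v["c_screen"]))
          for k, v in frames.items()}
    K0 = n_target / sum(v["N"] * v["P"] * g0[k] for k, v in frames.items())
    n0 = {k: K0 * g0[k] * v["N"] for k, v in frames.items()}
    C0 = sum(n0[k] * unit_cost(v["P"], v["c_full"], v["c_screen"])
             for k, v in frames.items())
    V0 = sum(v["N"] ** 2 * v["E"] / n0[k] for k, v in frames.items())   # (6) times N_D^2

    # Stratified design (4) with K* from (7), i.e. the same total cost C0.
    shape, cost = {}, {}
    for k, s in strata.items():
        cost[k] = unit_cost(s["P"], frames[k]["c_full"], frames[k]["c_screen"])
        shape[k] = np.sqrt(s["E"] / cost[k])
    delta = sum(np.sum(s["N"] * shape[k] * cost[k]) for k, s in strata.items())
    K_star = C0 / delta
    V1 = sum(np.sum(s["N"] ** 2 * s["E"] / (K_star * shape[k] * s["N"]))
             for k, s in strata.items())                                   # (8) times N_D^2
    return V1 / V0


frames = {
    "A": dict(N=100000.0, E=0.013, P=0.07, c_full=1.0, c_screen=0.2),
    "B": dict(N=150000.0, E=0.0115, P=0.065, c_full=1.0, c_screen=0.2),
}
strata = {
    "A": dict(N=np.array([20000.0, 80000.0]), E=np.array([0.030, 0.008]),
              P=np.array([0.15, 0.05])),
    "B": dict(N=np.array([30000.0, 120000.0]), E=np.array([0.025, 0.0075]),
              P=np.array([0.12, 0.05125])),
}
print(stratification_deff(frames, strata, n_target=1000))
```

Two properties make convenient sanity checks: with a single stratum per frame the two designs coincide, so the ratio is 1, and the ratio does not depend on $n_{D}^{*}$, because at equal cost both variances scale as $1 / n_{D}^{*}$.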

5. EXTENSION TO OVERSAMPLE MULTIPLE MINORITIES

For simplicity, we consider only two minority populations here, although the proposed approach extends naturally to more than two. Denote the more prevalent minority population by M and the less prevalent one by L. Similar to Chen and Kalton (2015), we first construct the final oversampling strata by crossing the oversampling strata for M and L. For example, if African Americans are M and American Indians are L, we can first construct two oversampling strata for each population separately using the cumulative root frequency rule of Dalenius (1957); the four final oversampling strata are then obtained by crossing the two stratifications. Waksberg, Judkins, and Massey (1997) and Jewett and Judkins (1988) also discussed multivariate stratification with minority populations. Suppose that in the first phase we draw $n_{A, h_{A}}$ and $n_{B, h_{B}}$ cases from stratum h_A of frame A and stratum h_B of frame B by simple random sampling without replacement (SRSWOR). We then retain all sampled cases in L and draw a second-phase sample from the sampled cases in M by stratified SRSWOR with sampling fractions $f_{A, h_{A}}^{(M)}$ and $f_{B, h_{B}}^{(M)}$. The total cost can be written as

\begin{matrix} C = \sum_{h_{A} = 1}^{H_{A}} n_{A, h_{A}} \left\{ P_{A, h_{A}}^{(L)} c_{A}^{*} + f_{A, h_{A}}^{(M)} P_{A, h_{A}}^{(M)} c_{A}^{*} + c'_{A} (1 - P_{A, h_{A}}^{(L)} - f_{A, h_{A}}^{(M)} P_{A, h_{A}}^{(M)}) \right\} \\ + \sum_{h_{B} = 1}^{H_{B}} n_{B, h_{B}} \left\{ P_{B, h_{B}}^{(L)} c_{B}^{*} + f_{B, h_{B}}^{(M)} P_{B, h_{B}}^{(M)} c_{B}^{*} + c'_{B} (1 - P_{B, h_{B}}^{(L)} - f_{B, h_{B}}^{(M)} P_{B, h_{B}}^{(M)}) \right\}, \end{matrix}

(10)

where $P_{A, h_{A}}^{(L)}$, $P_{A, h_{A}}^{(M)}$, $P_{B, h_{B}}^{(L)}$, and $P_{B, h_{B}}^{(M)}$ are the population prevalences of the corresponding minority in the corresponding stratum. Suppose we are interested in estimating the two minority population means $θ_{D}^{(L)} = N_{D}^{(L) - 1} \sum_{i \in U_{D}^{(L)}} y_{i}$ and $θ_{D}^{(M)} = N_{D}^{(M) - 1} \sum_{i \in U_{D}^{(M)}} y_{i}$, where $N_{D}^{(L)}$ and $N_{D}^{(M)}$ are the population sizes of L and M, and $U_{D}^{(L)}$ and $U_{D}^{(M)}$ denote the corresponding universes. The proposed sample sizes $n_{A, h_{A}}$ and $n_{B, h_{B}}$ and subsampling fractions $f_{A, h_{A}}^{(M)}$ and $f_{B, h_{B}}^{(M)}$ can be obtained by minimizing the total cost (10) subject to the following precision requirements for the two minority population estimates:

CV ({\hat{θ}}_{D}^{(L)}) \leq α^{(L)}, CV ({\hat{θ}}_{D}^{(M)}) \leq α^{(M)},

(11)

where $CV (\hat{θ}) = V^{1 / 2} (\hat{θ}) / θ$ is the coefficient of variation of $\hat{θ}$, and $α^{(L)}$ and $α^{(M)}$ are prespecified thresholds. In practice, θ is unknown, and one can use weighted estimates from a previous or similar survey. Using the Lagrange multiplier approach (see appendix B), we obtain

n_{A, h_{A}} = \frac{N_{A, h_{A}}}{N_{D}^{(L)}} \sqrt{\frac{λ_{1} E_{A, h_{A}}^{(L)}}{(c_{A}^{*} - c'_{A}) P_{A, h_{A}}^{(L)} + c'_{A}}}, n_{B, h_{B}} = \frac{N_{B, h_{B}}}{N_{D}^{(L)}} \sqrt{\frac{λ_{1} E_{B, h_{B}}^{(L)}}{(c_{B}^{*} - c'_{B}) P_{B, h_{B}}^{(L)} + c'_{B}}},

(12)

f_{A, h_{A}}^{(M)} = \frac{N_{D}^{(L)}}{N_{D}^{(M)}} \sqrt{\frac{λ_{2} E_{A, h_{A}}^{(M)} \left\{ P_{A, h_{A}}^{(L)} + c'_{A} (c_{A}^{*} - c'_{A})^{- 1} \right\}}{λ_{1} P_{A, h_{A}}^{(M)} E_{A, h_{A}}^{(L)}}},

(13)

and

f_{B, h_{B}}^{(M)} = \frac{N_{D}^{(L)}}{N_{D}^{(M)}} \sqrt{\frac{λ_{2} E_{B, h_{B}}^{(M)} \left\{ P_{B, h_{B}}^{(L)} + c'_{B} (c_{B}^{*} - c'_{B})^{- 1} \right\}}{λ_{1} P_{B, h_{B}}^{(M)} E_{B, h_{B}}^{(L)}}},

(14)

where $E_{A, h_{A}}^{(L)}$, $E_{B, h_{B}}^{(L)}$, $E_{A, h_{A}}^{(M)}$, and $E_{B, h_{B}}^{(M)}$ are defined as in section 3 with D taken to be L or M, and λ₁ and λ₂ are the Lagrange multipliers. From (12), the sample sizes $n_{A, h_{A}}$ and $n_{B, h_{B}}$ decrease as the cost ratios $c_{A}^{*} / c'_{A}$ and $c_{B}^{*} / c'_{B}$ increase, and increase as the prevalences $P_{A, h_{A}}^{(L)}$ and $P_{B, h_{B}}^{(L)}$ increase. According to (13) and (14), the subsampling fractions $f_{A, h_{A}}^{(M)}$ and $f_{B, h_{B}}^{(M)}$ increase as the population ratio $N_{D}^{(L)} / N_{D}^{(M)}$ increases, as $P_{A, h_{A}}^{(L)}$ and $P_{B, h_{B}}^{(L)}$ increase, or as $P_{A, h_{A}}^{(M)}$ and $P_{B, h_{B}}^{(M)}$ decrease. The Lagrange multipliers λ₁ and λ₂ can be obtained by substituting (12)–(14) into (11) with equality and solving the two resulting equations. If $f_{A, h_{A}}^{(M)} \geq 1$ in some stratum, we set $f_{A, h_{A}}^{(M)} = 1$ there and redo the Lagrange multiplier procedure to obtain

n_{A, h_{A}} = N_{A, h_{A}} \sqrt{\frac{λ_{1} E_{A, h_{A}}^{(L)} N_{D}^{(L) - 2} + λ_{2} E_{A, h_{A}}^{(M)} N_{D}^{(M) - 2}}{(P_{A, h_{A}}^{(L)} + P_{A, h_{A}}^{(M)}) (c_{A}^{*} - c'_{A}) + c'_{A}}},

while $n_{A, h_{A}}$ and $f_{A, h_{A}}^{(M)}$ in strata with $f_{A, h_{A}}^{(M)} < 1$ keep the forms in (12) and (13). Similarly, if $f_{B, h_{B}}^{(M)} \geq 1$ in some stratum, we set $f_{B, h_{B}}^{(M)} = 1$ there and redo the Lagrange multiplier procedure to obtain

n_{B, h_{B}} = N_{B, h_{B}} \sqrt{\frac{λ_{1} E_{B, h_{B}}^{(L)} N_{D}^{(L) - 2} + λ_{2} E_{B, h_{B}}^{(M)} N_{D}^{(M) - 2}}{(P_{B, h_{B}}^{(L)} + P_{B, h_{B}}^{(M)}) (c_{B}^{*} - c'_{B}) + c'_{B}}} .

λ₁ and λ₂ are then obtained by solving the equations in (11). The corresponding total cost follows by substituting $n_{A, h_{A}}$, $n_{B, h_{B}}$, $f_{A, h_{A}}^{(M)}$, and $f_{B, h_{B}}^{(M)}$ into (10), so the proposed oversampling design can be compared with other designs in terms of total cost. Such comparisons are conducted in section 6.
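When no subsampling fraction is capped, the multipliers in (12)–(14) can be found in closed form. Our own algebra (not stated in the article), assuming the L and M variances take the form $N_{D}^{(L) - 2} \sum N_{h}^{2} E_{h}^{(L)} / n_{h}$ and $N_{D}^{(M) - 2} \sum N_{h}^{2} E_{h}^{(M)} / (n_{h} f_{h}^{(M)})$ over the strata of both frames (the form that reproduces (12)–(14)), gives $\sqrt{λ_{1}} = \sum N_{h} \{E_{h}^{(L)} u_{h}\}^{1/2} / \{N_{D}^{(L)} (α^{(L)} θ_{D}^{(L)})^{2}\}$ with $u_{h} = (c^{*} - c') P_{h}^{(L)} + c'$, and $\sqrt{λ_{2}} = \sum N_{h} \{(c^{*} - c') P_{h}^{(M)} E_{h}^{(M)}\}^{1/2} / \{N_{D}^{(M)} (α^{(M)} θ_{D}^{(M)})^{2}\}$. The sketch below applies these to two hypothetical strata with common costs and stops if any $f^{(M)} \geq 1$, in which case the capped procedure described above is needed.

```python
import numpy as np


def two_minority_design(N, E_L, E_M, P_L, P_M, c_full, c_screen,
                        N_DL, N_DM, theta_L, theta_M, alpha_L, alpha_M):
    """Uncapped solution of (10)-(14); strata of both frames stacked in arrays."""
    d = c_full - c_screen                          # must be positive for (13)-(14)
    u = d * P_L + c_screen                         # denominator of (12)
    V_L = (alpha_L * theta_L) ** 2                 # CV bounds in (11) as variances
    V_M = (alpha_M * theta_M) ** 2
    root_lam1 = np.sum(N * np.sqrt(E_L * u)) / (N_DL * V_L)
    root_lam2 = np.sum(N * np.sqrt(d * P_M * E_M)) / (N_DM * V_M)
    lam1, lam2 = root_lam1**2, root_lam2**2
    n = N / N_DL * np.sqrt(lam1 * E_L / u)                                    # (12)
    f_M = (N_DL / N_DM) * np.sqrt(lam2 * E_M * (P_L + c_screen / d)
                                  / (lam1 * P_M * E_L))                       # (13)-(14)
    if np.any(f_M >= 1):
        raise ValueError("f_M >= 1 in some stratum: cap it and re-solve as in the text")
    cost = np.sum(n * (P_L * c_full + f_M * P_M * c_full
                       + c_screen * (1 - P_L - f_M * P_M)))                   # (10)
    return n, f_M, cost


# Hypothetical inputs for two strata sharing c* = 1 and c' = 0.2.
N = np.array([40000.0, 60000.0])
P_L = np.array([0.10, 0.03])
P_M = np.array([0.06, 0.09])
E_L, E_M = 0.2 * P_L, 0.2 * P_M
N_DL, N_DM = np.sum(N * P_L), np.sum(N * P_M)
n, f_M, cost = two_minority_design(N, E_L, E_M, P_L, P_M, 1.0, 0.2,
                                   N_DL, N_DM, 0.5, 0.6, 0.05, 0.05)
```

Recomputing the two CVs from the returned n and f_M recovers the targets in (11), which is a useful check on any implementation of the allocation.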

6. REAL APPLICATION

We consider two scenarios in this section. Scenario one (section 6.1) compares the stratified optimal and unstratified suboptimal sampling designs described in sections 3 and 4 for oversampling a single minority population. Scenario two (section 6.2) compares the stratified optimal and unstratified suboptimal sampling designs described in section 5 for oversampling multiple minority populations simultaneously.

Data are taken from the TSET HLP Survey, a dual-frame (landline and cell) random digit dialing (RDD) survey of adults living in Oklahoma. The survey was designed to gather information on Oklahomans’ knowledge, attitudes, and behaviors regarding physical activity, nutrition, tobacco use, and overall wellness, and to identify descriptive norms related to these health topics. About 4,500 surveys were completed. The original design was stratified simple random sampling without replacement with frame (landline or cell) as the stratification variable, and it did not oversample minority populations. Final weights accounted for the probability of selection, the combination of the landline and cell frames, nonresponse, raking, and trimming. For comparison purposes, we use the survey data to estimate the unknown quantities defined in the previous sections. In addition, we used population-level aggregated data files obtained from Marketing Systems Group: rate center–level ethnicity information for the cell phone frame and six-digit group-level ethnicity information for the landline frame were used to create the oversampling strata.

6.1 Oversampling a Single Minority Population

This section compares, in terms of design effect, stratified oversampling of a single minority population in a dual-frame telephone survey with a nonstratified sampling approach. We treat the American Indian and African American populations separately and create one design for each. Design effects were calculated using the methods in sections 3 and 4 to compare the variance under stratified oversampling with that under the nonstratified design.

We consider two study variables. The first is based on the question “Do you believe that being overweight or obese can cause Type 2 Diabetes?”: $Y_{1} = 1$ if the answer is “yes” and zero if it is “no.” The second is based on the question “Do you believe that being overweight or obese can cause heart disease?”: $Y_{2} = 1$ if the answer is “yes” and zero if it is “no.” Missing responses to either question were imputed using gender, age group, race, phone type, household income, and marital status as covariates.

Using the population-level aggregated ethnicity information provided by Marketing Systems Group, we establish three strata for each minority population. Stratum one consists of rate centers or six-digit groups with a relatively high density of the minority population, stratum two of those with a moderate density, and stratum three of those with a low density. The stratum boundaries were determined using the cumulative root frequency rule of Dalenius (1957) and the aggregated ethnicity information. The density cutoffs were 5 percent and 13 percent for the American Indian population and 4 percent and 14 percent for the African American population. The prevalence of African Americans was about 7.56 percent, and that of American Indians was about 6.63 percent.
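The stratum construction can be sketched with a generic implementation of the cumulative square-root-of-frequency rule; this is not the authors' code. The 50 equal-width density classes, the use of telephone-number counts as frequency weights, and the simulated densities are assumptions for illustration. The minority density of each rate center or six-digit group is binned, the square roots of the class frequencies are accumulated, and each cutoff is placed at the class boundary whose cumulative total is closest to an equal share.

```python
import numpy as np


def cum_sqrt_f_cutoffs(density, weights, n_strata, n_classes=50):
    """Cutoffs by the cumulative sqrt(f) rule on equal-width density classes."""
    edges = np.linspace(density.min(), density.max(), n_classes + 1)
    freq, _ = np.histogram(density, bins=edges, weights=weights)
    cum = np.cumsum(np.sqrt(freq))
    targets = cum[-1] * np.arange(1, n_strata) / n_strata     # equal shares of cum sqrt(f)
    idx = np.array([np.argmin(np.abs(cum - t)) for t in targets])
    return edges[idx + 1]                                     # upper edge of chosen class


def assign_strata(density, cutoffs):
    """Stratum 1 = highest density, as in the text."""
    return len(cutoffs) + 1 - np.digitize(density, cutoffs)


rng = np.random.default_rng(1)
density = rng.beta(1.0, 12.0, size=2000)        # hypothetical area-level minority shares
weights = rng.integers(500, 5000, size=2000)    # hypothetical phone numbers per area
cutoffs = cum_sqrt_f_cutoffs(density, weights, n_strata=3)
strata = assign_strata(density, cutoffs)
```

The number of classes is a tuning choice; with very few classes the cutoffs are coarse, and with very many the class frequencies become noisy.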

Without loss of generality, we assume that the costs of a complete full interview with landline and cell are $c_{A}^{*} = 1$ and $c_{B}^{*} = 1$. We consider seven cost structures for the ratio of the full interview cost to the screening cost: $c_{A}^{*} / c'_{A} = c_{B}^{*} / c'_{B} =$ 1, 2, 3, 5, 10, 20, and 30. Assume that the desired number of completes for the minority population is $n_{D}^{*} = 1{,}000$. The design effects are calculated using the formulas in section 4, and the results for the first and second study variables are presented in tables 2 and 3. For both study variables and both minority populations, the design effect increases with the cost ratio, so oversampling stratification is most beneficial when the cost ratio is low. The gain from oversampling stratification also depends on the study variable: for instance, the design effects for study variable one are smaller than those for study variable two at every cost ratio.

Table 2.

Ratios of the Sampling Variances of the Designs With and Without Oversampling Stratification by Different Ethnicity for “Type 2 Diabetes” Study Variable

Cost ratio	American Indians	African Americans
1	0.684	0.700
2	0.699	0.715
3	0.711	0.728
5	0.732	0.747
10	0.766	0.778
20	0.801	0.807
30	0.820	0.820


Table 3.

Ratios of the Sampling Variances of the Designs With and Without Oversampling Stratification by Different Ethnicity for “Heart Disease” Study Variable

Cost ratio	American Indians	African Americans
1	0.898	0.928
2	0.908	0.935
3	0.916	0.941
5	0.928	0.949
10	0.946	0.962
20	0.960	0.972
30	0.965	0.976
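The trends described for tables 2 and 3 can be checked directly from the tabulated design effects. This is a simple arithmetic check, not part of the authors' procedure; "AI" and "AA" abbreviate American Indians and African Americans.

```python
import numpy as np

# Design effects from tables 2 and 3 at cost ratios 1, 2, 3, 5, 10, 20, 30.
deff = {
    ("diabetes", "AI"): [0.684, 0.699, 0.711, 0.732, 0.766, 0.801, 0.820],
    ("diabetes", "AA"): [0.700, 0.715, 0.728, 0.747, 0.778, 0.807, 0.820],
    ("heart", "AI"): [0.898, 0.908, 0.916, 0.928, 0.946, 0.960, 0.965],
    ("heart", "AA"): [0.928, 0.935, 0.941, 0.949, 0.962, 0.972, 0.976],
}
# The design effect grows with the cost ratio in every column.
increasing = all(bool(np.all(np.diff(v) > 0)) for v in deff.values())
# Study variable one ("Type 2 Diabetes") has the smaller design effect throughout.
var1_smaller = all(
    bool(np.all(np.array(deff[("diabetes", g)]) < np.array(deff[("heart", g)])))
    for g in ("AI", "AA")
)
print(increasing, var1_smaller)  # → True True
```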


6.2 Oversampling Multiple Minority Populations

In this section, we compare the total costs between designs with and without oversampling stratification in terms of targeting multiple minority populations simultaneously. The TSET HLP Survey was again used, and the two minorities oversampled were African Americans and American Indians. African Americans were the more prevalent minority, representing 7.56 percent of the population, compared with American Indians, who represented 6.63 percent of the population.

This time, a single density cutoff for each minority divided the rate centers and six-digit groups into two strata for that minority, and crossing the two stratifications gave four strata. The cutoffs were again determined using the cumulative root frequency rule: 9 percent for the American Indian population and 7 percent for the African American population. The four strata were thus high prevalence of both American Indians (greater than 9 percent) and African Americans (greater than 7 percent); high prevalence of American Indians (greater than 9 percent) and low prevalence of African Americans (less than 7 percent); low prevalence of American Indians (less than 9 percent) and high prevalence of African Americans (greater than 7 percent); and low prevalence of both (less than 9 percent American Indians and less than 7 percent African Americans). The study variable of interest is again $Y_{1}$: “Do you believe that being overweight or obese can cause Type 2 Diabetes?” Missing responses were imputed using gender, age group, race, phone type, household income, and marital status as covariates.
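Crossing the two single-cutoff stratifications amounts to a lookup on two indicators. A minimal sketch with the 9 percent and 7 percent cutoffs from the text; the function name and labels are hypothetical, and a density exactly at a cutoff is placed in the low group, a boundary case the text does not specify.

```python
import numpy as np

LABELS = np.array([
    "low AI / low AA", "low AI / high AA",
    "high AI / low AA", "high AI / high AA",
])


def crossed_stratum(ai_density, aa_density, ai_cut=0.09, aa_cut=0.07):
    """Map area-level American Indian (AI) and African American (AA) densities to 4 strata."""
    high_ai = (np.asarray(ai_density) > ai_cut).astype(int)
    high_aa = (np.asarray(aa_density) > aa_cut).astype(int)
    return LABELS[2 * high_ai + high_aa]


print(crossed_stratum([0.12, 0.12, 0.02, 0.02], [0.10, 0.01, 0.10, 0.01]))
```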

We assume a cost structure with $c_{A}^{*} = 1$ and $c'_{B} = 2$, where the cost ratio $c = c_{A}^{*} {(c_{A}^{'})}^{- 1} = c_{B}^{*} {(c_{B}^{'})}^{- 1}$ takes the values 2, 3, 5, 10, 20, and 30. For simplicity, we set $p = q = 0.5$, where $p$ and $q$ are defined in (1), and we assume $α^{(L)} = α^{(M)} = 0.05$. We followed the method described in section 5 to determine the total costs of both the stratified oversampling and the nonstratified approaches, and the ratios of the stratified to the nonstratified costs are presented in table 4. The total costs of our proposed approach with oversampling stratification are about 0.83 to 0.88 times those of the design without oversampling stratification. The benefit of stratification decreases as the cost ratio increases, which is consistent with the single-minority results in table 3.

Table 4.

Ratios of the Total Costs With and Without Oversampling Stratification for Multiple Minorities by Different Cost Structures for “Type 2 Diabetes” Study Variable

Cost ratio c	Stratified cost	Nonstratified cost	Ratio of total costs
2	118.931	143.161	0.831
3	88.385	105.251	0.840
5	63.718	74.822	0.852
10	44.984	51.894	0.867
20	35.460	40.369	0.878
30	32.238	36.513	0.883
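As a quick arithmetic check, the last column of table 4 can be reproduced from the two cost columns. The snippet below only transcribes the tabulated costs; it does not recompute them from the allocation formulas of section 5.

```python
# (stratified cost, nonstratified cost) by cost ratio c, transcribed from table 4.
TABLE_4 = {
    2: (118.931, 143.161),
    3: (88.385, 105.251),
    5: (63.718, 74.822),
    10: (44.984, 51.894),
    20: (35.460, 40.369),
    30: (32.238, 36.513),
}

# Ratio of stratified to nonstratified total cost, rounded to three decimals as in the table.
ratios = {c: round(s / ns, 3) for c, (s, ns) in TABLE_4.items()}

for c, r in ratios.items():
    print(f"c = {c:>2}: cost ratio {r:.3f} (about {1 - r:.0%} cheaper with stratification)")
```

The ratios rise steadily with c, which is the shrinking benefit of stratification described in the text.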


7. CONCLUDING REMARKS

In this article, we propose a novel oversampling procedure for targeting one or multiple minority populations under a dual-frame sampling design. One application of our proposed approach is the dual-frame random digit dialing (RDD) telephone survey. We derived theoretical formulas for the optimal allocation, including the corresponding sample sizes and subsampling fractions for all strata. We applied our proposed approach to the TSET HLP Survey and demonstrated its benefits compared with traditional methods. Weighting and statistical analysis based on our sampling design can be conducted using techniques similar to those described in the dual-frame literature (Lohr and Rao, 2000; Lohr, 2011). Fahimi and Judkins (1991) considered differential sampling at the second stage of a multistage sampling design to oversample small populations. Oversampling minority populations under a stratified multistage sampling design will be investigated in future work.

Appendix

A. SKETCHED PROOF OF (2)

By using Taylor linearization, we have:

\begin{aligned} {\hat{θ}}_{D} &= θ_{D} + \frac{1}{N_{D}} ({\hat{Y}}_{a, D} + p_{D} {\hat{Y}}_{ab, D} + q_{D} {\hat{Y}}_{ba, D} + {\hat{Y}}_{b, D} - Y_{D}) - \frac{Y_{D}}{N_{D}^{2}} ({\hat{N}}_{D} - N_{D}) + o_{p} (n^{- 1/2}) \\ &= θ_{D} + \frac{1}{N_{D}} \sum_{h_{A} = 1}^{H_{A}} \frac{N_{A, h_{A}}}{n_{A, h_{A}}} \sum_{i \in S_{A, h_{A}}} a_{i} D_{i} (y_{i} - θ_{D}) + p_{D} \frac{1}{N_{D}} \sum_{h_{A} = 1}^{H_{A}} \frac{N_{A, h_{A}}}{n_{A, h_{A}}} \sum_{i \in S_{A, h_{A}}} ab_{i} D_{i} (y_{i} - θ_{D}) \\ &\quad + q_{D} \frac{1}{N_{D}} \sum_{h_{B} = 1}^{H_{B}} \frac{N_{B, h_{B}}}{n_{B, h_{B}}} \sum_{i \in S_{B, h_{B}}} ba_{i} D_{i} (y_{i} - θ_{D}) + \frac{1}{N_{D}} \sum_{h_{B} = 1}^{H_{B}} \frac{N_{B, h_{B}}}{n_{B, h_{B}}} \sum_{i \in S_{B, h_{B}}} b_{i} D_{i} (y_{i} - θ_{D}) + o_{p} (n^{- 1/2}) \\ &= θ_{D} + T_{1} + p_{D} T_{2} + q_{D} T_{3} + T_{4} + o_{p} (n^{- 1/2}), \end{aligned}

where

\begin{aligned} T_{1} &= \frac{1}{N_{D}} \sum_{h_{A} = 1}^{H_{A}} \frac{N_{A, h_{A}}}{n_{A, h_{A}}} \sum_{i \in S_{A, h_{A}}} a_{i} D_{i} (y_{i} - θ_{D}), & T_{2} &= \frac{1}{N_{D}} \sum_{h_{A} = 1}^{H_{A}} \frac{N_{A, h_{A}}}{n_{A, h_{A}}} \sum_{i \in S_{A, h_{A}}} ab_{i} D_{i} (y_{i} - θ_{D}), \\ T_{3} &= \frac{1}{N_{D}} \sum_{h_{B} = 1}^{H_{B}} \frac{N_{B, h_{B}}}{n_{B, h_{B}}} \sum_{i \in S_{B, h_{B}}} ba_{i} D_{i} (y_{i} - θ_{D}), & T_{4} &= \frac{1}{N_{D}} \sum_{h_{B} = 1}^{H_{B}} \frac{N_{B, h_{B}}}{n_{B, h_{B}}} \sum_{i \in S_{B, h_{B}}} b_{i} D_{i} (y_{i} - θ_{D}). \end{aligned}

Then, because the samples from frames A and B are selected independently,

\begin{aligned} V ({\hat{θ}}_{D}) &= V (T_{1} + p_{D} T_{2} + q_{D} T_{3} + T_{4}) + o (n^{- 1}) \\ &= V (T_{1} + p_{D} T_{2}) + V (q_{D} T_{3} + T_{4}) + o (n^{- 1}) \\ &= V (T_{1}) + p_{D}^{2} V (T_{2}) + 2 p_{D} \operatorname{cov} (T_{1}, T_{2}) + q_{D}^{2} V (T_{3}) + V (T_{4}) + 2 q_{D} \operatorname{cov} (T_{3}, T_{4}) + o (n^{- 1}). \end{aligned}

Let $n_{1 i} = a_{i} D_{i} (y_{i} - θ_{D})$. Then,

V (T_{1}) = \frac{1}{N_{D}^{2}} \sum_{h_{A} = 1}^{H_{A}} \frac{N_{A, h_{A}}^{2}}{n_{A, h_{A}}} (1 - \frac{n_{A, h_{A}}}{N_{A, h_{A}}}) S_{n_{1}, A, h_{A}}^{2},

(1)

where

\begin{aligned} S_{n_{1}, A, h_{A}}^{2} &= \frac{1}{N_{A, h_{A}} - 1} \sum_{i = 1}^{N_{A, h_{A}}} {(n_{1 i} - {\bar{n}}_{1, A, h_{A}})}^{2} = \frac{1}{N_{A, h_{A}} - 1} \sum_{i = 1}^{N_{A, h_{A}}} (n_{1 i}^{2} - 2 n_{1 i} {\bar{n}}_{1, A, h_{A}} + {\bar{n}}_{1, A, h_{A}}^{2}) \\ &= \frac{1}{N_{A, h_{A}} - 1} \Big(\sum_{i = 1}^{N_{A, h_{A}}} n_{1 i}^{2} - N_{A, h_{A}} {\bar{n}}_{1, A, h_{A}}^{2}\Big) \\ &= \frac{1}{N_{A, h_{A}} - 1} \Big(\sum_{i = 1}^{N_{A, h_{A}}} a_{i} D_{i} {(y_{i} - θ_{D})}^{2} - N_{A, h_{A}} {\bar{n}}_{1, A, h_{A}}^{2}\Big), \end{aligned}

(2)

and

\begin{aligned} {\bar{n}}_{1, A, h_{A}} &= \frac{1}{N_{A, h_{A}}} \sum_{i = 1}^{N_{A, h_{A}}} a_{i} D_{i} (y_{i} - θ_{D}) = \frac{1}{N_{A, h_{A}}} \Big(\sum_{i = 1}^{N_{A, h_{A}}} a_{i} D_{i} y_{i} - N_{A, h_{A}, a, D} θ_{D}\Big) \\ &= \frac{N_{A, h_{A}, a, D}}{N_{A, h_{A}}} {\bar{Y}}_{A, h_{A}, a, D} - \frac{N_{A, h_{A}, a, D}}{N_{A, h_{A}}} θ_{D} \\ &= \frac{N_{A, h_{A}, a, D}}{N_{A, h_{A}}} ({\bar{Y}}_{A, h_{A}, a, D} - θ_{D}). \end{aligned}

(3)

Because the cross term vanishes (the deviations $y_{i} - {\bar{Y}}_{A, h_{A}, a, D}$ sum to zero over the units with $a_{i} D_{i} = 1$), we have

\begin{aligned} \sum_{i = 1}^{N_{A, h_{A}}} a_{i} D_{i} {(y_{i} - θ_{D})}^{2} &= \sum_{i = 1}^{N_{A, h_{A}}} a_{i} D_{i} {(y_{i} - {\bar{Y}}_{A, h_{A}, a, D} + {\bar{Y}}_{A, h_{A}, a, D} - θ_{D})}^{2} \\ &= \sum_{i = 1}^{N_{A, h_{A}}} a_{i} D_{i} {(y_{i} - {\bar{Y}}_{A, h_{A}, a, D})}^{2} + N_{A, h_{A}, a, D} {({\bar{Y}}_{A, h_{A}, a, D} - θ_{D})}^{2}. \end{aligned}

(4)

According to (2), (3), and (4), and ignoring lower-order terms, we have

S_{n_{1}, A, h_{A}}^{2} \approx \frac{N_{A, h_{A}, a, D}}{N_{A, h_{A}}} S_{A, h_{A}, a, D}^{2} + \frac{N_{A, h_{A}, a, D}}{N_{A, h_{A}}} {({\bar{Y}}_{A, h_{A}, a, D} - θ_{D})}^{2} - {(\frac{N_{A, h_{A}, a, D}}{N_{A, h_{A}}})}^{2} {({\bar{Y}}_{A, h_{A}, a, D} - θ_{D})}^{2} .

(5)
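Approximation (5) can be checked numerically in a single synthetic stratum. The sketch below is a hypothetical illustration, not part of the derivation: it draws independent Bernoulli values for $a_i$, $D_i$, and $y_i$, computes $S_{n_{1}, A, h_{A}}^{2}$ exactly with the $N_{A, h_{A}} - 1$ divisor of (2), and compares it with the right-hand side of (5), assuming that $S_{A, h_{A}, a, D}^{2}$ uses the divisor $N_{A, h_{A}, a, D} - 1$. The two should differ only by terms of relative order $1/N_{A, h_{A}, a, D}$.

```python
import random
import statistics


def check_approximation_5(N=5000, theta_D=0.30, seed=2020):
    """Return (exact S^2_{n1,A,h_A}, right-hand side of (5)) for a synthetic stratum."""
    rng = random.Random(seed)
    # Hypothetical population values: independent Bernoulli draws, for illustration only.
    a = [int(rng.random() < 0.6) for _ in range(N)]     # a_i: unit belongs to domain a
    D = [int(rng.random() < 0.2) for _ in range(N)]     # D_i: minority indicator
    y = [float(rng.random() < 0.4) for _ in range(N)]   # y_i: binary study variable

    # Exact value: sample variance of n_{1i} = a_i D_i (y_i - theta_D), divisor N - 1 as in (2).
    n1 = [ai * di * (yi - theta_D) for ai, di, yi in zip(a, D, y)]
    exact = statistics.variance(n1)

    # Domain quantities in (5); statistics.variance uses the N_{A,h_A,a,D} - 1 divisor.
    dom = [yi for ai, di, yi in zip(a, D, y) if ai * di == 1]
    share = len(dom) / N                    # N_{A,h_A,a,D} / N_{A,h_A}
    gap = statistics.mean(dom) - theta_D    # Ybar_{A,h_A,a,D} - theta_D
    approx = share * statistics.variance(dom) + share * gap**2 - share**2 * gap**2
    return exact, approx


exact, approx = check_approximation_5()
print(f"exact = {exact:.6f}, approximation (5) = {approx:.6f}, ratio = {approx / exact:.4f}")
```

With roughly 600 units in the domain, the relative discrepancy stays well under 1 percent.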

By using similar techniques, we have

V (T_{2}) = \frac{1}{N_{D}^{2}} \sum_{h_{A} = 1}^{H_{A}} \frac{N_{A, h_{A}}^{2}}{n_{A, h_{A}}} (1 - \frac{n_{A, h_{A}}}{N_{A, h_{A}}}) S_{n_{2}, A, h_{A}}^{2}, \quad \text{where } n_{2 i} = ab_{i} D_{i} (y_{i} - θ_{D}),

(6)

\begin{aligned} S_{n_{2}, A, h_{A}}^{2} \approx{}& \frac{N_{A, h_{A}, ab, D}}{N_{A, h_{A}}} S_{A, h_{A}, ab, D}^{2} + \frac{N_{A, h_{A}, ab, D}}{N_{A, h_{A}}} {({\bar{Y}}_{A, h_{A}, ab, D} - θ_{D})}^{2} \\ & - {\Big(\frac{N_{A, h_{A}, ab, D}}{N_{A, h_{A}}}\Big)}^{2} {({\bar{Y}}_{A, h_{A}, ab, D} - θ_{D})}^{2}, \end{aligned}

V (T_{3}) = \frac{1}{N_{D}^{2}} \sum_{h_{B} = 1}^{H_{B}} \frac{N_{B, h_{B}}^{2}}{n_{B, h_{B}}} (1 - \frac{n_{B, h_{B}}}{N_{B, h_{B}}}) S_{n_{3}, B, h_{B}}^{2}, \quad \text{where } n_{3 i} = ba_{i} D_{i} (y_{i} - θ_{D}),

\begin{aligned} S_{n_{3}, B, h_{B}}^{2} \approx{}& \frac{N_{B, h_{B}, ba, D}}{N_{B, h_{B}}} S_{B, h_{B}, ba, D}^{2} + \frac{N_{B, h_{B}, ba, D}}{N_{B, h_{B}}} {({\bar{Y}}_{B, h_{B}, ba, D} - θ_{D})}^{2} \\ & - {\Big(\frac{N_{B, h_{B}, ba, D}}{N_{B, h_{B}}}\Big)}^{2} {({\bar{Y}}_{B, h_{B}, ba, D} - θ_{D})}^{2}, \end{aligned}

(7)

V (T_{4}) = \frac{1}{N_{D}^{2}} \sum_{h_{B} = 1}^{H_{B}} \frac{N_{B, h_{B}}^{2}}{n_{B, h_{B}}} (1 - \frac{n_{B, h_{B}}}{N_{B, h_{B}}}) S_{n_{4}, B, h_{B}}^{2}, \quad \text{where } n_{4 i} = b_{i} D_{i} (y_{i} - θ_{D}),

(8)

\begin{aligned} S_{n_{4}, B, h_{B}}^{2} \approx{}& \frac{N_{B, h_{B}, b, D}}{N_{B, h_{B}}} S_{B, h_{B}, b, D}^{2} + \frac{N_{B, h_{B}, b, D}}{N_{B, h_{B}}} {({\bar{Y}}_{B, h_{B}, b, D} - θ_{D})}^{2} \\ & - {\Big(\frac{N_{B, h_{B}, b, D}}{N_{B, h_{B}}}\Big)}^{2} {({\bar{Y}}_{B, h_{B}, b, D} - θ_{D})}^{2}, \end{aligned}

\operatorname{cov} (T_{1}, T_{2}) = \frac{1}{N_{D}^{2}} \sum_{h_{A} = 1}^{H_{A}} \frac{N_{A, h_{A}}^{2}}{n_{A, h_{A}}} (1 - \frac{n_{A, h_{A}}}{N_{A, h_{A}}}) S_{n_{1} n_{2}, A, h_{A}}^{2},

(9)

\begin{aligned} S_{n_{1} n_{2}, A, h_{A}}^{2} &= \frac{1}{N_{A, h_{A}} - 1} \sum_{i = 1}^{N_{A, h_{A}}} (n_{1 i} - {\bar{n}}_{1, A, h_{A}}) (n_{2 i} - {\bar{n}}_{2, A, h_{A}}) \\ &\approx - \frac{N_{A, h_{A}, a, D}}{N_{A, h_{A}}} \frac{N_{A, h_{A}, ab, D}}{N_{A, h_{A}}} ({\bar{Y}}_{A, h_{A}, a, D} - θ_{D}) ({\bar{Y}}_{A, h_{A}, ab, D} - θ_{D}), \end{aligned}

\operatorname{cov} (T_{3}, T_{4}) = \frac{1}{N_{D}^{2}} \sum_{h_{B} = 1}^{H_{B}} \frac{N_{B, h_{B}}^{2}}{n_{B, h_{B}}} (1 - \frac{n_{B, h_{B}}}{N_{B, h_{B}}}) S_{n_{3} n_{4}, B, h_{B}}^{2},

(10)

\begin{aligned} S_{n_{3} n_{4}, B, h_{B}}^{2} &= \frac{1}{N_{B, h_{B}} - 1} \sum_{i = 1}^{N_{B, h_{B}}} (n_{3 i} - {\bar{n}}_{3, B, h_{B}}) (n_{4 i} - {\bar{n}}_{4, B, h_{B}}) \\ &\approx - \frac{N_{B, h_{B}, b, D}}{N_{B, h_{B}}} \frac{N_{B, h_{B}, ba, D}}{N_{B, h_{B}}} ({\bar{Y}}_{B, h_{B}, b, D} - θ_{D}) ({\bar{Y}}_{B, h_{B}, ba, D} - θ_{D}). \end{aligned}
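The covariance approximations rest on the domains $a$ and $ab$ being disjoint: $n_{1i} n_{2i} = 0$ for every unit, so the exact covariance equals $-N_{A, h_{A}} {\bar{n}}_{1, A, h_{A}} {\bar{n}}_{2, A, h_{A}} / (N_{A, h_{A}} - 1)$, which is the approximation multiplied by $N_{A, h_{A}} / (N_{A, h_{A}} - 1)$. The hypothetical sketch below confirms this ratio on synthetic data; the population values and the even split of units between the two domains are assumptions for illustration only. The same argument applies to $S_{n_{3} n_{4}, B, h_{B}}^{2}$ with domains $b$ and $ba$.

```python
import random


def sample_cov(u, v):
    """Covariance with the N - 1 divisor used for S^2_{n1 n2, A, h_A}."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((ui - mu) * (vi - mv) for ui, vi in zip(u, v)) / (len(u) - 1)


def check_cross_covariance(N=5000, theta_D=0.30, seed=7):
    """Return (exact covariance of n_{1i} and n_{2i}, approximation for S^2_{n1 n2})."""
    rng = random.Random(seed)
    # Hypothetical frame-A stratum: each unit is in the overlap domain ab or in domain a, never both.
    ab = [int(rng.random() < 0.5) for _ in range(N)]
    a = [1 - x for x in ab]
    D = [int(rng.random() < 0.2) for _ in range(N)]
    y = [float(rng.random() < 0.4) for _ in range(N)]

    n1 = [ai * di * (yi - theta_D) for ai, di, yi in zip(a, D, y)]
    n2 = [bi * di * (yi - theta_D) for bi, di, yi in zip(ab, D, y)]
    exact = sample_cov(n1, n2)

    def share_and_gap(indicator):
        # (N_{A,h_A,dom,D} / N_{A,h_A}, Ybar_{A,h_A,dom,D} - theta_D) for one domain.
        dom = [yi for ii, di, yi in zip(indicator, D, y) if ii * di == 1]
        return len(dom) / N, sum(dom) / len(dom) - theta_D

    share_a, gap_a = share_and_gap(a)
    share_ab, gap_ab = share_and_gap(ab)
    approx = -share_a * share_ab * gap_a * gap_ab
    return exact, approx


exact, approx = check_cross_covariance()
print(f"exact = {exact:.3e}, approximation = {approx:.3e}, ratio = {exact / approx:.5f}")
```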

According to (1)–(10) and after some algebra, we have

V ({\hat{θ}}_{D}) = \frac{1}{N_{D}^{2}} (\sum_{h_{A} = 1}^{H_{A}} N_{A, h_{A}}^{2} \frac{E_{A, h_{A}}}{n_{A, h_{A}}} + \sum_{h_{B} = 1}^{H_{B}} N_{B, h_{B}}^{2} \frac{E_{B, h_{B}}}{n_{B, h_{B}}}) + o (n^{- 1}),

where $E_{A, h_{A}}$ and $E_{B, h_{B}}$ are defined in section 2.
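The appendix formula for $V(T_{1})$ labeled (1) is the usual stratified simple random sampling without replacement variance applied to the linearized values $n_{1i}$. A small Monte Carlo experiment can confirm that the empirical variance of $T_{1}$ matches the formula up to simulation error. The sketch below is hypothetical throughout: two made-up frame-A strata, a made-up $N_D$, and $θ_D$ treated as a known constant.

```python
import random
import statistics


def var_T1_formula(strata, N_D):
    """Appendix formula (1): N_D^-2 * sum_h N_h^2 / n_h * (1 - n_h / N_h) * S_h^2."""
    total = 0.0
    for values, n_h in strata:
        N_h = len(values)
        total += N_h**2 / n_h * (1 - n_h / N_h) * statistics.variance(values)
    return total / N_D**2


def draw_T1(strata, N_D, rng):
    """One realization of T1 under stratified simple random sampling without replacement."""
    return sum(len(values) / n_h * sum(rng.sample(values, n_h)) for values, n_h in strata) / N_D


def monte_carlo_check(reps=10000, theta_D=0.30, N_D=400.0, seed=11):
    rng = random.Random(seed)
    strata = []
    # Two hypothetical frame-A strata given as (N_h, n_h, minority density).
    for N_h, n_h, density in [(2000, 100, 0.25), (3000, 150, 0.05)]:
        values = []
        for _ in range(N_h):
            a_i = int(rng.random() < 0.6)
            D_i = int(rng.random() < density)
            y_i = float(rng.random() < 0.4)
            values.append(a_i * D_i * (y_i - theta_D))  # linearized value n_{1i}
        strata.append((values, n_h))
    draws = [draw_T1(strata, N_D, rng) for _ in range(reps)]
    return var_T1_formula(strata, N_D), statistics.variance(draws)


formula, simulated = monte_carlo_check()
print(f"formula (1): {formula:.4e}, Monte Carlo: {simulated:.4e}, ratio: {simulated / formula:.3f}")
```

With 10,000 replicates the Monte Carlo variance has a relative standard error of roughly 1.5 percent, so the ratio should sit close to 1.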

B. SKETCHED PROOF FOR TWO MINORITY POPULATIONS

We seek to minimize

\begin{aligned} C = {}& \sum_{h_{A} = 1}^{H_{A}} n_{A, h_{A}} [P_{A, h_{A}}^{(L)} c_{A}^{⋆} + f_{A, h_{A}}^{(M)} P_{A, h_{A}}^{(M)} c_{A}^{⋆} + c_{A}^{'} (1 - P_{A, h_{A}}^{(L)} - f_{A, h_{A}}^{(M)} P_{A, h_{A}}^{(M)})] \\ & + \sum_{h_{B} = 1}^{H_{B}} n_{B, h_{B}} [P_{B, h_{B}}^{(L)} c_{B}^{⋆} + f_{B, h_{B}}^{(M)} P_{B, h_{B}}^{(M)} c_{B}^{⋆} + c_{B}^{'} (1 - P_{B, h_{B}}^{(L)} - f_{B, h_{B}}^{(M)} P_{B, h_{B}}^{(M)})] \end{aligned}

subject to the constraints:

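\frac{\sqrt{V ({\hat{θ}}_{D}^{(L)})}}{θ_{D}^{(L)}} \leq α_{L}, \quad \frac{\sqrt{V ({\hat{θ}}_{D}^{(M)})}}{θ_{D}^{(M)}} \leq α_{M},

or, equivalently, $V ({\hat{θ}}_{D}^{(L)}) \leq C_{L}$ and $V ({\hat{θ}}_{D}^{(M)}) \leq C_{M}$, where $C_{L} = {(α_{L} θ_{D}^{(L)})}^{2}$ and $C_{M} = {(α_{M} θ_{D}^{(M)})}^{2}$. The two variances are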

\begin{aligned} V ({\hat{θ}}_{D}^{(L)}) &= {(N_{D}^{(L)})}^{- 2} \Big(\sum_{h_{A} = 1}^{H_{A}} N_{A, h_{A}}^{2} \frac{E_{A, h_{A}}^{(L)}}{n_{A, h_{A}}} + \sum_{h_{B} = 1}^{H_{B}} N_{B, h_{B}}^{2} \frac{E_{B, h_{B}}^{(L)}}{n_{B, h_{B}}}\Big), \\ V ({\hat{θ}}_{D}^{(M)}) &= {(N_{D}^{(M)})}^{- 2} \Big(\sum_{h_{A} = 1}^{H_{A}} N_{A, h_{A}}^{2} \frac{E_{A, h_{A}}^{(M)}}{n_{A, h_{A}} f_{A, h_{A}}^{(M)}} + \sum_{h_{B} = 1}^{H_{B}} N_{B, h_{B}}^{2} \frac{E_{B, h_{B}}^{(M)}}{n_{B, h_{B}} f_{B, h_{B}}^{(M)}}\Big). \end{aligned}

Using Lagrange multipliers, we have

\begin{aligned} F = {}& \sum_{h_{A} = 1}^{H_{A}} n_{A, h_{A}} [P_{A, h_{A}}^{(L)} c_{A}^{⋆} + f_{A, h_{A}}^{(M)} P_{A, h_{A}}^{(M)} c_{A}^{⋆} + c_{A}^{'} (1 - P_{A, h_{A}}^{(L)} - f_{A, h_{A}}^{(M)} P_{A, h_{A}}^{(M)})] \\ & + \sum_{h_{B} = 1}^{H_{B}} n_{B, h_{B}} [P_{B, h_{B}}^{(L)} c_{B}^{⋆} + f_{B, h_{B}}^{(M)} P_{B, h_{B}}^{(M)} c_{B}^{⋆} + c_{B}^{'} (1 - P_{B, h_{B}}^{(L)} - f_{B, h_{B}}^{(M)} P_{B, h_{B}}^{(M)})] \\ & + λ_{1} [V ({\hat{θ}}_{D}^{(L)}) - C_{L}] + λ_{2} [V ({\hat{θ}}_{D}^{(M)}) - C_{M}]. \end{aligned}

Taking the derivative of F with respect to $n_{A, h_{A}}$ and setting it equal to zero yields

\begin{aligned} \frac{\partial F}{\partial n_{A, h_{A}}} = {}& P_{A, h_{A}}^{(L)} c_{A}^{⋆} + f_{A, h_{A}}^{(M)} P_{A, h_{A}}^{(M)} c_{A}^{⋆} + c_{A}^{'} (1 - P_{A, h_{A}}^{(L)} - f_{A, h_{A}}^{(M)} P_{A, h_{A}}^{(M)}) \\ & - λ_{1} \frac{N_{A, h_{A}}^{2} E_{A, h_{A}}^{(L)}}{{(N_{D}^{(L)})}^{2} n_{A, h_{A}}^{2}} - λ_{2} \frac{N_{A, h_{A}}^{2} E_{A, h_{A}}^{(M)}}{{(N_{D}^{(M)})}^{2} f_{A, h_{A}}^{(M)} n_{A, h_{A}}^{2}} = 0. \end{aligned}

Then

\frac{N_{A, h_{A}}^{2}}{n_{A, h_{A}}^{2}} [\frac{λ_{1} E_{A, h_{A}}^{(L)}}{{(N_{D}^{(L)})}^{2}} + \frac{λ_{2} E_{A, h_{A}}^{(M)}}{{(N_{D}^{(M)})}^{2} f_{A, h_{A}}^{(M)}}] = P_{A, h_{A}}^{(L)} c_{A}^{⋆} + f_{A, h_{A}}^{(M)} P_{A, h_{A}}^{(M)} c_{A}^{⋆} + c_{A}^{'} (1 - P_{A, h_{A}}^{(L)} - f_{A, h_{A}}^{(M)} P_{A, h_{A}}^{(M)}) .

Similarly, taking the derivative of F with respect to $n_{B, h_{B}}$ and setting it equal to zero yields

\begin{aligned} \frac{\partial F}{\partial n_{B, h_{B}}} = {}& P_{B, h_{B}}^{(L)} c_{B}^{⋆} + f_{B, h_{B}}^{(M)} P_{B, h_{B}}^{(M)} c_{B}^{⋆} + c_{B}^{'} (1 - P_{B, h_{B}}^{(L)} - f_{B, h_{B}}^{(M)} P_{B, h_{B}}^{(M)}) \\ & - λ_{1} \frac{N_{B, h_{B}}^{2} E_{B, h_{B}}^{(L)}}{{(N_{D}^{(L)})}^{2} n_{B, h_{B}}^{2}} - λ_{2} \frac{N_{B, h_{B}}^{2} E_{B, h_{B}}^{(M)}}{{(N_{D}^{(M)})}^{2} f_{B, h_{B}}^{(M)} n_{B, h_{B}}^{2}} = 0 \end{aligned}

and

\frac{N_{B, h_{B}}^{2}}{n_{B, h_{B}}^{2}} \Big[\frac{λ_{1} E_{B, h_{B}}^{(L)}}{{(N_{D}^{(L)})}^{2}} + \frac{λ_{2} E_{B, h_{B}}^{(M)}}{{(N_{D}^{(M)})}^{2} f_{B, h_{B}}^{(M)}}\Big] = P_{B, h_{B}}^{(L)} c_{B}^{⋆} + f_{B, h_{B}}^{(M)} P_{B, h_{B}}^{(M)} c_{B}^{⋆} + c_{B}^{'} (1 - P_{B, h_{B}}^{(L)} - f_{B, h_{B}}^{(M)} P_{B, h_{B}}^{(M)}).

Taking the derivative of F with respect to $f_{A, h_{A}}^{(M)}$ and setting it equal to zero yields

\frac{\partial F}{\partial f_{A, h_{A}}^{(M)}} = n_{A, h_{A}} [P_{A, h_{A}}^{(M)} c_{A}^{⋆} - P_{A, h_{A}}^{(M)} c_{A}^{'}] - λ_{2} \frac{N_{A, h_{A}}^{2} E_{A, h_{A}}^{(M)}}{{(N_{D}^{(M)})}^{2} n_{A, h_{A}} {(f_{A, h_{A}}^{(M)})}^{2}} = 0.

Then

\begin{aligned} n_{A, h_{A}}^{2} P_{A, h_{A}}^{(M)} (c_{A}^{⋆} - c_{A}^{'}) &= \frac{λ_{2} N_{A, h_{A}}^{2} E_{A, h_{A}}^{(M)}}{{(N_{D}^{(M)})}^{2} {(f_{A, h_{A}}^{(M)})}^{2}}, \\ \frac{N_{A, h_{A}}^{2}}{n_{A, h_{A}}^{2}} &= \frac{P_{A, h_{A}}^{(M)} (c_{A}^{⋆} - c_{A}^{'}) {(N_{D}^{(M)})}^{2} {(f_{A, h_{A}}^{(M)})}^{2}}{λ_{2} E_{A, h_{A}}^{(M)}}. \end{aligned}

Similarly,

\frac{N_{B, h_{B}}^{2}}{n_{B, h_{B}}^{2}} = \frac{P_{B, h_{B}}^{(M)} (c_{B}^{⋆} - c_{B}^{'}) {(N_{D}^{(M)})}^{2} {(f_{B, h_{B}}^{(M)})}^{2}}{λ_{2} E_{B, h_{B}}^{(M)}} .

Substituting this expression into the first-order condition for $n_{A, h_{A}}$ yields

\frac{P_{A, h_{A}}^{(M)} (c_{A}^{⋆} - c_{A}^{'}) {(N_{D}^{(M)})}^{2} {(f_{A, h_{A}}^{(M)})}^{2}}{λ_{2} E_{A, h_{A}}^{(M)}} \times \frac{λ_{1} E_{A, h_{A}}^{(L)}}{{(N_{D}^{(L)})}^{2}} + P_{A, h_{A}}^{(M)} (c_{A}^{⋆} - c_{A}^{'}) f_{A, h_{A}}^{(M)} = P_{A, h_{A}}^{(L)} (c_{A}^{⋆} - c_{A}^{'}) + c_{A}^{'} + f_{A, h_{A}}^{(M)} P_{A, h_{A}}^{(M)} (c_{A}^{⋆} - c_{A}^{'}) .

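Canceling $f_{A, h_{A}}^{(M)} P_{A, h_{A}}^{(M)} (c_{A}^{⋆} - c_{A}^{'})$ from both sides and dividing by $c_{A}^{⋆} - c_{A}^{'}$ gives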

\begin{aligned} \frac{P_{A, h_{A}}^{(M)} λ_{1} {(N_{D}^{(M)})}^{2} E_{A, h_{A}}^{(L)} {(f_{A, h_{A}}^{(M)})}^{2}}{λ_{2} E_{A, h_{A}}^{(M)} {(N_{D}^{(L)})}^{2}} &= P_{A, h_{A}}^{(L)} + \frac{c_{A}^{'}}{c_{A}^{⋆} - c_{A}^{'}}, \\ f_{A, h_{A}}^{(M)} &= \sqrt{\frac{(P_{A, h_{A}}^{(L)} + c_{A}^{'} {(c_{A}^{⋆} - c_{A}^{'})}^{- 1}) λ_{2} E_{A, h_{A}}^{(M)} {(N_{D}^{(L)})}^{2}}{P_{A, h_{A}}^{(M)} {(N_{D}^{(M)})}^{2} λ_{1} E_{A, h_{A}}^{(L)}}} \\ &= \frac{N_{D}^{(L)}}{N_{D}^{(M)}} \sqrt{\frac{λ_{2} E_{A, h_{A}}^{(M)} (P_{A, h_{A}}^{(L)} + c_{A}^{'} {(c_{A}^{⋆} - c_{A}^{'})}^{- 1})}{P_{A, h_{A}}^{(M)} λ_{1} E_{A, h_{A}}^{(L)}}}. \end{aligned}

Similarly, we have

f_{B, h_{B}}^{(M)} = \frac{N_{D}^{(L)}}{N_{D}^{(M)}} \sqrt{\frac{λ_{2} E_{B, h_{B}}^{(M)} (P_{B, h_{B}}^{(L)} + c_{B}^{'} {(c_{B}^{⋆} - c_{B}^{'})}^{- 1})}{P_{B, h_{B}}^{(M)} λ_{1} E_{B, h_{B}}^{(L)}}}.

Substituting $f_{A, h_{A}}^{(M)}$ into the expression for $N_{A, h_{A}}^{2} / n_{A, h_{A}}^{2}$ obtained from the first-order condition for $f_{A, h_{A}}^{(M)}$ gives

\begin{aligned} n_{A, h_{A}} &= \sqrt{\frac{λ_{2} N_{A, h_{A}}^{2} E_{A, h_{A}}^{(M)}}{{(N_{D}^{(M)})}^{2} P_{A, h_{A}}^{(M)} (c_{A}^{⋆} - c_{A}^{'})} \times \frac{P_{A, h_{A}}^{(M)} {(N_{D}^{(M)})}^{2} λ_{1} E_{A, h_{A}}^{(L)}}{(P_{A, h_{A}}^{(L)} + c_{A}^{'} {(c_{A}^{⋆} - c_{A}^{'})}^{- 1}) λ_{2} E_{A, h_{A}}^{(M)} {(N_{D}^{(L)})}^{2}}} \\ &= \sqrt{\frac{N_{A, h_{A}}^{2} λ_{1} E_{A, h_{A}}^{(L)}}{[(c_{A}^{⋆} - c_{A}^{'}) P_{A, h_{A}}^{(L)} + c_{A}^{'}] {(N_{D}^{(L)})}^{2}}}, \end{aligned}

that is,

n_{A, h_{A}} = \frac{N_{A, h_{A}}}{N_{D}^{(L)}} \sqrt{\frac{λ_{1} E_{A, h_{A}}^{(L)}}{(c_{A}^{⋆} - c_{A}^{'}) P_{A, h_{A}}^{(L)} + c_{A}^{'}}}

and

n_{B, h_{B}} = \frac{N_{B, h_{B}}}{N_{D}^{(L)}} \sqrt{\frac{λ_{1} E_{B, h_{B}}^{(L)}}{(c_{B}^{⋆} - c_{B}^{'}) P_{B, h_{B}}^{(L)} + c_{B}^{'}}} .
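Because the closed forms for $f_{A, h_{A}}^{(M)}$ and $n_{A, h_{A}}$ come out of several algebraic steps, a numerical check can be reassuring. The sketch below plugs arbitrary, hypothetical inputs (none taken from the article) into those closed forms, confirms that both partial derivatives of $F$ vanish at the resulting point, and checks that the unsimplified and simplified expressions for $n_{A, h_{A}}$ agree. It assumes $c_{A}^{⋆} > c_{A}^{'}$, which the square roots require.

```python
import math

# Hypothetical inputs for one frame-A stratum; none of these values come from the article.
N_h, E_L, E_M = 1000.0, 0.20, 0.25      # N_{A,h_A}, E^{(L)}_{A,h_A}, E^{(M)}_{A,h_A}
P_L, P_M = 0.15, 0.08                   # P^{(L)}_{A,h_A}, P^{(M)}_{A,h_A}
c_star, c_prime = 10.0, 1.0             # c_A^*, c_A' (assumes c_A^* > c_A')
N_DL, N_DM = 5000.0, 6000.0             # N_D^{(L)}, N_D^{(M)}
lam1, lam2 = 3.0e6, 2.5e5               # arbitrary positive multipliers

d = c_star - c_prime
# Closed forms for the subsampling fraction and the sample size.
f = (N_DL / N_DM) * math.sqrt(lam2 * E_M * (P_L + c_prime / d) / (P_M * lam1 * E_L))
n = N_h / N_DL * math.sqrt(lam1 * E_L / (d * P_L + c_prime))
# Unsimplified product form of n_{A,h_A} before cancellation.
n_unsimplified = math.sqrt(
    lam2 * N_h**2 * E_M / (N_DM**2 * P_M * d)
    * (P_M * N_DM**2 * lam1 * E_L) / ((P_L + c_prime / d) * lam2 * E_M * N_DL**2))

# Partial derivatives of the Lagrangian F evaluated at (n, f).
unit_cost = P_L * c_star + f * P_M * c_star + c_prime * (1 - P_L - f * P_M)
dF_dn = (unit_cost
         - lam1 * N_h**2 * E_L / (N_DL**2 * n**2)
         - lam2 * N_h**2 * E_M / (N_DM**2 * f * n**2))
dF_df = n * P_M * (c_star - c_prime) - lam2 * N_h**2 * E_M / (N_DM**2 * n * f**2)

print(f"n = {n:.2f}, f = {f:.3f}, dF/dn = {dF_dn:.2e}, dF/df = {dF_df:.2e}")
```

Both derivatives come out at floating-point rounding level, which is what the first-order conditions require.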

Then $λ_{1}$ and $λ_{2}$ can be obtained by solving

\begin{aligned} V ({\hat{θ}}_{D}^{(L)}) &= {(N_{D}^{(L)})}^{- 2} \Big(\sum_{h_{A} = 1}^{H_{A}} N_{A, h_{A}}^{2} \frac{E_{A, h_{A}}^{(L)}}{n_{A, h_{A}}} + \sum_{h_{B} = 1}^{H_{B}} N_{B, h_{B}}^{2} \frac{E_{B, h_{B}}^{(L)}}{n_{B, h_{B}}}\Big) = C_{L} \quad \text{and} \\ V ({\hat{θ}}_{D}^{(M)}) &= {(N_{D}^{(M)})}^{- 2} \Big(\sum_{h_{A} = 1}^{H_{A}} N_{A, h_{A}}^{2} \frac{E_{A, h_{A}}^{(M)}}{n_{A, h_{A}} f_{A, h_{A}}^{(M)}} + \sum_{h_{B} = 1}^{H_{B}} N_{B, h_{B}}^{2} \frac{E_{B, h_{B}}^{(M)}}{n_{B, h_{B}} f_{B, h_{B}}^{(M)}}\Big) = C_{M}, \end{aligned}

where $C_{L} = {(θ_{D}^{(L)} α_{L})}^{2}$ and $C_{M} = {(θ_{D}^{(M)} α_{M})}^{2}$.

If $f_{A, h_{A}}^{(M)} \geq 1$, then set $f_{A, h_{A}}^{(M)} = 1$, and the first-order condition for $n_{A, h_{A}}$ becomes

\begin{aligned} \frac{N_{A, h_{A}}^{2}}{n_{A, h_{A}}^{2}} \Big[\frac{λ_{1} E_{A, h_{A}}^{(L)}}{{(N_{D}^{(L)})}^{2}} + \frac{λ_{2} E_{A, h_{A}}^{(M)}}{{(N_{D}^{(M)})}^{2}}\Big] &= P_{A, h_{A}}^{(L)} c_{A}^{⋆} + P_{A, h_{A}}^{(M)} c_{A}^{⋆} + c_{A}^{'} (1 - P_{A, h_{A}}^{(L)} - P_{A, h_{A}}^{(M)}) \\ &= (P_{A, h_{A}}^{(L)} + P_{A, h_{A}}^{(M)}) (c_{A}^{⋆} - c_{A}^{'}) + c_{A}^{'}, \end{aligned}

so that

n_{A, h_{A}} = \sqrt{\frac{N_{A, h_{A}}^{2} (λ_{1} E_{A, h_{A}}^{(L)} {(N_{D}^{(L)})}^{- 2} + λ_{2} E_{A, h_{A}}^{(M)} {(N_{D}^{(M)})}^{- 2})}{(P_{A, h_{A}}^{(L)} + P_{A, h_{A}}^{(M)}) (c_{A}^{⋆} - c_{A}^{'}) + c_{A}^{'}}}.

Similarly, if $f_{B, h_{B}}^{(M)} \geq 1$ , then set $f_{B, h_{B}}^{(M)} = 1$ .

Then, we have

n_{B, h_{B}} = \sqrt{\frac{N_{B, h_{B}}^{2} (λ_{1} E_{B, h_{B}}^{(L)} {(N_{D}^{(L)})}^{- 2} + λ_{2} E_{B, h_{B}}^{(M)} {(N_{D}^{(M)})}^{- 2})}{(P_{B, h_{B}}^{(L)} + P_{B, h_{B}}^{(M)}) (c_{B}^{⋆} - c_{B}^{'}) + c_{B}^{'}}} .
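When no subsampling fraction reaches 1, the two constraints can be solved for $λ_{1}$ and $λ_{2}$ without iteration: $n_{h}$ is proportional to $\sqrt{λ_{1}}$, so $V({\hat{θ}}_{D}^{(L)})$ scales as $λ_{1}^{-1/2}$, and $n_{h} f_{h}^{(M)}$ is proportional to $\sqrt{λ_{2}}$, so $V({\hat{θ}}_{D}^{(M)})$ scales as $λ_{2}^{-1/2}$. The Python sketch below exploits this scaling. It is a hypothetical illustration, not the authors' software: the `Stratum` container and every input value are assumptions, the $E$ components (defined in section 2) are placeholders, and strata with $f_{h}^{(M)} \geq 1$ are only flagged, because the truncated case couples $λ_{1}$ and $λ_{2}$ and would need a joint numerical solve.

```python
import math
from dataclasses import dataclass


@dataclass
class Stratum:
    """One stratum of frame A or frame B (all values are user-supplied inputs)."""
    N: float        # N_h: number of frame units in the stratum
    P_L: float      # P_h^(L): share of minority L
    P_M: float      # P_h^(M): share of minority M
    E_L: float      # E_h^(L): variance component for minority L (section 2)
    E_M: float      # E_h^(M): variance component for minority M (section 2)
    c_star: float   # c^* for the stratum's frame
    c_prime: float  # c' for the stratum's frame; the closed forms need c' < c^*


def n_and_f(s, lam1, lam2, N_DL, N_DM):
    """Untruncated closed forms for n_h and f_h^(M) given the multipliers."""
    d = s.c_star - s.c_prime
    n = s.N / N_DL * math.sqrt(lam1 * s.E_L / (d * s.P_L + s.c_prime))
    f = (N_DL / N_DM) * math.sqrt(
        lam2 * s.E_M * (s.P_L + s.c_prime / d) / (s.P_M * lam1 * s.E_L))
    return n, f


def variances(strata, lam1, lam2, N_DL, N_DM):
    """V(theta_hat^(L)) and V(theta_hat^(M)) implied by the allocation (n_h, f_h)."""
    vL = vM = 0.0
    for s in strata:
        n, f = n_and_f(s, lam1, lam2, N_DL, N_DM)
        vL += s.N**2 * s.E_L / n
        vM += s.N**2 * s.E_M / (n * f)
    return vL / N_DL**2, vM / N_DM**2


def solve_allocation(strata, N_DL, N_DM, theta_L, theta_M, alpha_L, alpha_M):
    C_L = (alpha_L * theta_L) ** 2
    C_M = (alpha_M * theta_M) ** 2
    # n_h is proportional to sqrt(lam1), so V^(L) = V^(L)|_{lam1=1} / sqrt(lam1).
    vL_unit, _ = variances(strata, 1.0, 1.0, N_DL, N_DM)
    lam1 = (vL_unit / C_L) ** 2
    # n_h * f_h is proportional to sqrt(lam2), so V^(M) = V^(M)|_{lam2=1} / sqrt(lam2).
    _, vM_unit = variances(strata, lam1, 1.0, N_DL, N_DM)
    lam2 = (vM_unit / C_M) ** 2
    alloc = [n_and_f(s, lam1, lam2, N_DL, N_DM) for s in strata]
    cost = sum(n * (s.P_L * s.c_star + f * s.P_M * s.c_star
                    + s.c_prime * (1 - s.P_L - f * s.P_M))
               for s, (n, f) in zip(strata, alloc))
    # Strata with f >= 1 need the truncated solution, which this sketch does not implement.
    truncated = [f >= 1 for _, f in alloc]
    return lam1, lam2, alloc, cost, truncated


# Hypothetical strata: two in frame A (costs 10 and 1), two in frame B (costs 20 and 2).
strata = [
    Stratum(20000, 0.20, 0.10, 0.040, 0.022, 10.0, 1.0),
    Stratum(80000, 0.04, 0.06, 0.008, 0.014, 10.0, 1.0),
    Stratum(15000, 0.18, 0.09, 0.035, 0.020, 20.0, 2.0),
    Stratum(60000, 0.05, 0.07, 0.010, 0.016, 20.0, 2.0),
]
lam1, lam2, alloc, cost, truncated = solve_allocation(
    strata, N_DL=12900, N_DM=12350, theta_L=0.30, theta_M=0.50, alpha_L=0.05, alpha_M=0.05)
for s, (n, f) in zip(strata, alloc):
    print(f"N_h = {s.N:>6.0f}: n_h = {n:8.1f}, f_h = {f:.3f}")
print(f"total cost = {cost:.1f}, truncation needed: {any(truncated)}")
```

With these made-up inputs all four subsampling fractions stay below 1, so no truncation is needed. Frame A and frame B strata can share one list because the closed forms have the same structure in both frames and differ only in the frame-specific costs.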

We thank a referee for useful comments and suggestions that improved the quality of this article. Data were collected by the Sooner Survey Center as part of the evaluation of the Tobacco Settlement Endowment Trust’s statewide Healthy Living Program (PI Rebekah Rhoades). The research of Drs. Sixia Chen and Alexander Stubblefield is supported by Presbyterian Health Foundation seed grant C5101401 (ORA #20171573). Support for Drs. Chen and Stoner was provided by the National Institutes of Health, National Institute of General Medical Sciences (Grant 2U54GM104938-06, PI Judith James) and the National Institutes of Health, National Institute on Minority Health and Health Disparities (Grant 1R25MD011564, PI Julie Stoner/Courtney Houchen).

References

Bankier M. D. (1986), “Estimators Based on Several Stratified Samples with Applications to Multiple Frame Surveys,” Journal of the American Statistical Association, 81, 1074–1079.
Chang L., Krosnick J. A. (2009), “National Surveys via RDD Telephone Interviewing versus the Internet,” Public Opinion Quarterly, 73, 641–678.
Chen S., Kalton G. (2015), “Geographic Oversampling for Race/Ethnicity Using Data from the 2010 US Population Census,” Journal of Survey Statistics and Methodology, 3, 543–565.
Curtin L. R., Mohadjer L. K., Dohrmann S. M., Kruszon-Moran D., Mirel L. B., Carroll M. D., Hirsch R., Burt V. L., Johnson C. L. (2013), National Health and Nutrition Examination Survey: Sample Design, 2007–2010, Washington, DC: US Government Printing Office.
Dalenius T. (1957), Sampling in Sweden, Stockholm: Almquist and Wicksell.
Elliot M. N., Finch B. K., Klein D., Sai M., Phuong Do D., Beckett M. K., Orr N., Lurie N. (2008), “Sample Designs for Measuring the Health of Small Racial/Ethnic Subgroups,” Statistics in Medicine, 27, 4016–4029.
Espey D. K., Jim M. A., Cobb N., Bartholomew M., Becker T., Haverkamp D., Plescia M. (2014), “Leading Causes of Death and All-Cause Mortality in American Indians and Alaska Natives,” American Journal of Public Health, 104, S303–S311.
Fahimi M., Judkins D. (1991), “PSU Probabilities Given Differential Sampling at Second Stage,” Proceedings of the Section on Survey Research Methods, American Statistical Association, pp. 538–543.
Fuller W. A., Burmeister L. F. (1972), “Estimators for Samples Selected from Two Overlapping Frames,” Proceedings of the Social Statistics Section, American Statistical Association, pp. 245–249.
Hartley H. O. (1962), “Multiple Frame Surveys,” Proceedings of the Social Statistics Section, American Statistical Association, pp. 203–206.
Hartley H. O. (1974), “Multiple Frame Methodology and Selected Applications,” Sankhyā, 36, 99–118.
Helba C., Love C., Wivagg J., Frissell K., Lee K. C., Whitwell C. (2015), “Estimating the Need for Treatment for Substance Use Disorders among Minnesota Adults: Results of the 2014/2015 Minnesota Survey on Adult Substance Use,” unpublished report, available at https://mn.gov/dhs/partners-and-providers/news-initiatives-reports-workgroups/alcohol-drug-other-addictions/mn-adult-su-survey/.
Jewett R. S., Judkins D. R. (1988), “Multivariate Stratification with Size Constraints,” SIAM Journal on Scientific and Statistical Computing, 9, 1091–1097.
Judkins D. R., Brick J. M., Broene P., Ferraro D., Strickler T. (2001), 1999 NSAF Sample Design: Report No. 2, Washington, DC: Urban Institute Press.
Kalsbeek W. D. (2003), “Sampling Minority Groups in Health Surveys,” Statistics in Medicine, 22, 1527–1549.
Kalton G. (2009), “Methods for Oversampling Rare Subpopulations in Social Surveys,” Survey Methodology, 35, 125–141.
Kalton G., Anderson D. (1986), “Sampling Rare Populations,” Journal of the Royal Statistical Society, 149, 65–82.
Lohr S. L. (2011), “Alternative Survey Sample Designs: Sampling with Multiple Overlapping Frames,” Survey Methodology, 37, 197–213.
Lohr S. L., Rao J. N. K. (2000), “Inference from Dual Frame Surveys,” Journal of the American Statistical Association, 95, 271–280.
Marcus A. C., Crane L. A. (1986), “Telephone Surveys in Public Health Research,” Medical Care, 24, 97–112.
Mohadjer L., Krenzke T. (2009), “Sample Design,” in Technical Report and Data File User’s Manual for the 2003 National Assessment of Adult Literacy, eds. Baldi S. and US National Center for Education Statistics, Washington, DC: US Government Printing Office.
Mowery P. D., Dube S. R., Thorne S. L., Garrett B. E., Homa D. M., Nez-Henderson P. (2015), “Disparities in Smoking-Related Mortality among American Indians/Alaska Natives,” American Journal of Preventive Medicine, 49, 738–744.
Rao J. N. K., Wu C. (2010), “Pseudo-Empirical Likelihood Inference for Multiple Frame Surveys,” Journal of the American Statistical Association, 105, 1494–1503.
Skinner C. J., Rao J. N. K. (1996), “Estimation in Dual Frame Surveys with Complex Designs,” Journal of the American Statistical Association, 91, 349–356.
Szolnoki G., Hoffmann D. (2013), “Online, Face-to-Face and Telephone Surveys: Comparing Different Sampling Methods in Wine Consumer Research,” Wine Economics and Policy, 2, 57–66.
Tourangeau R., Edwards B., Johnson T. P., Wolter K. M., Bates N. (2014), Hard-to-Survey Populations, Cambridge University Press.
Waksberg J., Judkins D., Massey J. T. (1997), “Geographic-Based Oversampling in Demographic Surveys of the United States,” Survey Methodology, 23, 61–71.
Wolter K. M., Tao X., Montgomery R., Smith P. J. (2015), “Optimum Allocation for a Dual-Frame Telephone Survey,” Survey Methodology, 41, 389–401.
