Improved estimation of population variance in stratified successive sampling using calibrated weights under non-response

MK Pandey; GN Singh; Tolga Zaman; Aned Al Mutairi; Manahil SidAhmed Mustafa

Heliyon. 2024 Mar 12;10(6):e27738. doi: 10.1016/j.heliyon.2024.e27738

PMCID: PMC10965521 PMID: 38545218

Abstract

This paper introduces a new method to estimate the population variance of a study variable in stratified successive sampling over two occasions, while accounting for random non-response. The method uses a logarithmic type estimator and leverages information from a highly positively correlated auxiliary variable. The paper also presents calibrated weights for the new estimator and examines its properties through numerical and simulation studies. The results indicate that the suggested estimator is more effective than the standard estimator for estimating the population variance.

MSC: 62D05

Keywords: Stratified sampling, Successive sampling, Auxiliary information, Bias, Mean square error, Calibration technique, Random non-response

1. Introduction

Stratified successive sampling is a statistical method for estimating population variance when obtaining a random sample from the population is challenging or cost-prohibitive. The technique divides the population into distinct strata and selects units at random within each stratum. Successive sampling is particularly advantageous in socio-economic research, where populations may be widely dispersed or access may be limited. The concept of stratified sampling for estimating population parameters, including population variance, was first introduced by [16]. Subsequently, [14] extended this method to population variance estimation in stratified successive sampling. [19] and [25] explored population variance estimation in stratified successive sampling with unequal probabilities, while [15] did the same with equal probabilities. [23] developed an effective variance estimation strategy for two-occasion successive sampling that accounts for random non-response, whereas [27] worked on a family of robust-type estimators for population variance in simple and stratified random sampling. [1] improved the estimation of finite population variance using dual supplementary information under stratified random sampling, and [6] introduced efficient classes of estimators of population variance in two-phase successive sampling with random non-response. Moreover, [5] and [4] developed memory-type ratio and product estimators under ranked-based sampling schemes. These works, among others, provide a foundation for the use of stratified successive sampling and calibration approaches in estimating population variance, especially under non-response.

[7] introduced a calibration approach using least squares adjustment, a method later adopted by statistical authorities in various organizations. The primary aim of the calibration approach is to formulate unbiased estimation procedures with minimal dispersion by leveraging information from auxiliary variables. [8] proposed a calibration estimation procedure that reduces the discrepancy between initial and final weights while still adhering to calibration equations and constraints. [10] explored model-assisted higher-order calibration of variance estimators, and [20] applied the calibration approach in survey theory and practice. [12] and [11] contributed to calibration approach estimators in stratified sampling. [26] introduced a calibration approach-based regression-type estimator to model the inverse relationship between study and auxiliary variables, and [13] applied calibration weighting in stratified random sampling. [17] introduced a new calibration estimator specifically designed for stratified sampling, and [9] developed calibration estimation for ratio estimators in stratified sampling with proportion allocation. [22] extended the calibration approach to estimate population variance in stratified successive sampling while accounting for random non-response. Recently, [2] conducted work on estimating the calibration of the mean by employing a double use of auxiliary information. Additionally, [3] focused on determining optimal calibrated weights while minimizing a variance function. In a related study, [21] explored L-moments and calibration-based variance estimators under a double stratified random sampling scheme, with a specific focus on their application during the COVID-19 pandemic. Moreover, [18] contributed to the field by developing a general class of improved population variance estimators under non-sampling errors using calibrated weights in stratified sampling.

Calibration finds its place as a pivotal technique in the estimation of population variance through stratified successive sampling. This approach enriches estimation accuracy by imbuing resulting estimates with greater population representativeness and reduced bias. This significance is particularly pronounced in stratified successive sampling, where the goal centers on ensuring representation of each stratum within the final estimate. Additionally, this proposed estimation strategy finds application in domains as diverse as the estimation of variance in gas turbine exhaust pressure, as evidenced by real-data illustration in later sections of this manuscript.

2. Motivation of the study

The dynamic nature of our world entails constant variation, with implications across various realms. In medicine, variations in body temperature, blood pressure, and pulse rate hold diagnostic significance. Similarly, diverse consumer responses to products drive pricing and quality decisions, while variations in climatic factors guide agricultural planning.

The estimation of population variance assumes a pivotal role in diverse fields, ranging from socio-economic studies to engineering applications. Stratified successive sampling offers a strategic approach, particularly when random sampling from the entire population is intricate. However, the presence of non-response poses a challenge, potentially introducing bias.

In this context, this article proposes a novel solution. We introduce a logarithmic-type estimator that incorporates information from a highly correlated auxiliary variable. Moreover, calibrated weights are integrated into the estimation process to counteract the impact of non-response bias.

The motivation for this study emanates from various factors, including:

Practical Relevance: Stratified successive sampling is commonly used in scenarios where populations are geographically dispersed or hard to access due to non-homogeneity. However, non-response can distort estimates, making it essential to account for this bias through calibrated weights.

Calibration Approach: We use the calibration technique to improve the accuracy of the estimates. By incorporating calibration into the estimation process, resulting estimates are more representative of the population and have reduced bias. Leveraging auxiliary variables to enhance estimation accuracy has shown effectiveness. Applying this calibration technique to stratified successive sampling aims to yield more representative and unbiased population variance estimates.

Engineering Applications: The proposed methodology holds practical significance in fields like engineering, particularly in quality control and process improvement. Precise population variance estimation is vital for assessing manufacturing process variability, enabling informed decisions for process optimization and quality control strategies.

Socio-Economic Research: In socio-economic research, understanding the relationship between variables like education and income is pivotal for informed decision-making. By enhancing population variance estimation, the proposed method provides deeper insights into these relationships, guiding policy and strategy decisions.

Unique Challenges Addressed: The use of logarithmic-type estimators underscores a focus on addressing rare challenges, such as estimating diseases or socio-economic issues requiring specialized techniques. This approach showcases our dedication to tackling complex issues with precision.

In conclusion, the motivation behind this research lies in the imperative to enhance population variance estimation accuracy in stratified successive sampling, particularly by addressing non-response challenges. Through the utilization of calibrated weights and auxiliary information, we aspire to provide more resilient and effective estimation techniques, applicable across diverse fields from engineering to socio-economic research. By obtaining more accurate estimates of population variance, one can better understand the relationship between education level and income in the studied population. This understanding may hold important implications for policy decisions related to education and income inequality, as well as for businesses and organizations interested in targeting college-educated individuals for employment or marketing purposes.

3. Sample structure

Consider a finite population of size N divided into L non-overlapping strata, each containing $N_{k}$ (k=1,2,..., L) units. Let us use X and Y to represent the study character on the first and second occasions. It is assumed that information regarding an auxiliary variable Z is accessible on both occasions, and the population variance of Z is known.

Consider the $k^{th}$ stratum, where k ranges from 1 to L. On the first occasion, we use simple random sampling without replacement (SRSWOR) to draw a preliminary sample of size $n_{k}$ from the stratum population, of which $r_{1 k}$ units do not respond. From the responding part of this sample, we draw a second-stage SRSWOR sample of size $m_{k} = n_{k} λ_{k}^{″}$ , where $λ_{k}^{″}$ is the fraction of matched samples, and $r_{2 k}$ of these units do not respond. We use this sample on the second occasion to collect information on the study variable Y. Additionally, we draw a fresh SRSWOR sample of size $u_{k} = n_{k} - m_{k} = n_{k} μ_{k}^{″}$ from the population and again observe Y; here, $r_{3 k}$ units do not respond. The fractions of matched and fresh samples on the current (second) occasion are represented by $λ_{k}^{″}$ and $μ_{k}^{″}$ , respectively, where $λ_{k}^{″} + μ_{k}^{″} = 1$ .
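The two-occasion design within a single stratum can be sketched in code. The following is a minimal illustration rather than the authors' implementation: the function name `draw_two_occasion_sample`, the seed, and the choice to draw the fresh sample from units not selected on the first occasion are assumptions made here, and non-response is ignored so that only the sample sizes $n_{k}$ , $m_{k}$ and $u_{k}$ are shown.

```python
import random


def draw_two_occasion_sample(N_k, n_k, lam_k, seed=0):
    """Draw first-occasion, matched and fresh SRSWOR samples for one stratum.

    N_k   : stratum size
    n_k   : first-occasion sample size
    lam_k : fraction of matched units, so m_k = n_k * lam_k (rounded)
    Non-response is NOT simulated here (simplifying assumption).
    """
    rng = random.Random(seed)
    units = list(range(N_k))

    # First occasion: SRSWOR sample of size n_k.
    first = rng.sample(units, n_k)

    # Matched part: SRSWOR subsample of size m_k from the first sample.
    m_k = round(n_k * lam_k)
    matched = rng.sample(first, m_k)

    # Fresh part: u_k = n_k - m_k units drawn from the units not used
    # on the first occasion (assumption of this sketch).
    u_k = n_k - m_k
    first_set = set(first)
    remaining = [u for u in units if u not in first_set]
    fresh = rng.sample(remaining, u_k)
    return first, matched, fresh


first, matched, fresh = draw_two_occasion_sample(N_k=10000, n_k=3000, lam_k=2 / 3)
print(len(first), len(matched), len(fresh))
```

With a matched fraction of 2/3, the matched and fresh parts together contain as many units as the first-occasion sample, which mirrors the identity $λ_{k}^{″} + μ_{k}^{″} = 1$ .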

3.1. Notations

From now on, we will use the following notations:

$σ_{Y}^{2}$ : The population variance of Y, i.e., the characteristic under study.

${\bar{Z}}_{N_{k}} = \frac{1}{N_{k}} \sum_{l = 1}^{N_{k}} Z_{kl}$ : The population mean of Z in the $k^{th}$ stratum.

$\bar{Z} = \frac{1}{N} \sum_{k = 1}^{L} N_{k} {\bar{Z}}_{N_{k}}$ : The population mean of Z.

${\bar{x}}_{n_{k} - r_{1k}} = \frac{1}{n_{k} - r_{1k}} \sum_{l = 1}^{n_{k} - r_{1k}} x_{kl}, {\bar{x}}_{m_{k} - r_{2k}} = \frac{1}{m_{k} - r_{2k}} \sum_{l = 1}^{m_{k} - r_{2k}} x_{kl}$ : The sample means of the study variable X in the $k^{th}$ stratum based on the responding part of samples of sizes $n_{k}$ and $m_{k}$ , respectively.

${\bar{z}}_{n_{k}} = \frac{1}{n_{k}} \sum_{l = 1}^{n_{k}} z_{kl}, {\bar{z}}_{u_{k}} = \frac{1}{u_{k}} \sum_{l = 1}^{u_{k}} z_{kl}$ : The sample means of the auxiliary variable Z in the $k^{th}$ stratum based on samples of sizes $n_{k}$ and $u_{k}$ , respectively.

$S_{Y_{N_{k}}}^{2}$ = $\frac{1}{N_{k} - 1} \sum_{l = 1}^{N_{k}} {(Y_{kl} - {\bar{Y}}_{N_{k}})}^{2}$ , $S_{X_{N_{k}}}^{2}$ = $\frac{1}{N_{k} - 1} \sum_{l = 1}^{N_{k}} {(X_{kl} - {\bar{X}}_{N_{k}})}^{2}$ : The population mean squares of the $k^{th}$ stratum of the study variables Y and X, respectively.

$S_{Z_{N_{k}}}^{2}$ = $\frac{1}{N_{k} - 1} \sum_{l = 1}^{N_{k}} {(Z_{kl} - {\bar{Z}}_{N_{k}})}^{2}$ : The population mean squares of the $k^{th}$ stratum for the auxiliary variable Z.

$s_{x_{n_{k} - r_{1k}}}^{⁎ 2} = \frac{1}{n_{k} - r_{1k} - 1} \sum_{l = 1}^{n_{k} - r_{1k}} {(x_{kl} - {\bar{x}}_{n_{k} - r_{1k}})}^{2}$ : The sample mean square of the study variable X in the $k^{th}$ stratum, based on the responding part of the sample of size $n_{k}$ .

$s_{x_{m_{k} - r_{2k}}}^{⁎ 2} = \frac{1}{m_{k} - r_{2k} - 1} \sum_{l = 1}^{m_{k} - r_{2k}} {(x_{kl} - {\bar{x}}_{m_{k} - r_{2k}})}^{2}$ : The sample mean square of the study variable X in the $k^{th}$ stratum, based on the responding part of the sample of size $m_{k}$ .

$s_{y_{n_{k} - r_{1k}}}^{⁎ 2} = \frac{1}{n_{k} - r_{1k} - 1} \sum_{l = 1}^{n_{k} - r_{1k}} {(y_{kl} - {\bar{y}}_{n_{k} - r_{1k}})}^{2}$ : The sample mean square of the study variable Y in the $k^{th}$ stratum, based on the responding part of the sample of size $n_{k}$ .

$s_{y_{u_{k} - r_{3k}}}^{⁎ 2} = \frac{1}{u_{k} - r_{3k} - 1} \sum_{l = 1}^{u_{k} - r_{3k}} {(y_{kl} - {\bar{y}}_{u_{k} - r_{3k}})}^{2}$ : The sample mean square of the study variable Y in the $k^{th}$ stratum, based on the responding part of the sample of size $u_{k}$ .

$s_{z_{n_{k}}}^{⁎ 2} = \frac{1}{n_{k} - 1} \sum_{l = 1}^{n_{k}} {(z_{kl} - {\bar{z}}_{n_{k}})}^{2}$ : The sample mean square of the auxiliary variable Z in the $k^{th}$ stratum, based on the sample of size $n_{k}$ .

$s_{z_{m_{k}}}^{⁎ 2} = \frac{1}{m_{k} - 1} \sum_{l = 1}^{m_{k}} {(z_{kl} - {\bar{z}}_{m_{k}})}^{2}$ : The sample mean square of the auxiliary variable Z in the $k^{th}$ stratum, based on the sample of size $m_{k}$ .

$s_{z_{u_{k}}}^{⁎ 2} = \frac{1}{u_{k} - 1} \sum_{l = 1}^{u_{k}} {(z_{kl} - {\bar{z}}_{u_{k}})}^{2}$ : The sample mean square of the auxiliary variable Z in the $k^{th}$ stratum, based on the sample of size $u_{k}$ .

$W_{k}$ = $\frac{N_{k}}{N}$ : The original weight of the $k^{th}$ stratum, k= 1, 2,...,L

$Ω_{k}^{⁎}$ : The calibrated weight of the $k^{th}$ stratum, k= 1, 2,...,L based on the sample of size $m_{k}$

$Ω_{k}^{⁎ ⁎}$ : The calibrated weight of the $k^{th}$ stratum, k= 1, 2,...,L based on the sample of size $u_{k}$

$Q_{k}$ : The independent weight of the $k^{th}$ stratum, k= 1, 2,...,L.

4. Non-response probability model

For the $k^{th}$ stratum, we adopt the random non-response model of [24]. Consider a sample $S_{n_{k}}$ of size $n_{k}$ for which some data on X could not be collected due to random non-response, and let $r_{1 k}$ denote the number of such cases in $S_{n_{k}}$ . Similarly, for a sample $S_{m_{k}}$ of size $m_{k}$ , let $r_{2 k}$ denote the number of units for which information on Y on the second occasion could not be acquired due to random non-response, and let $r_{3 k}$ denote the same for the sample of size $u_{k}$ . We assume that $0 \leq r_{1 k} \leq (n_{k} - 2)$ , $0 \leq r_{2 k} \leq (m_{k} - 2)$ and $0 \leq r_{3 k} \leq (u_{k} - 2)$ . If $p_{1}$ , $p_{2}$ , and $p_{3}$ denote the probabilities of non-response among the ( $n_{k}$ -2), ( $m_{k}$ -2), and ( $u_{k}$ -2) possible values of non-responses, respectively, then the discrete probability distributions of $r_{1 k}$ , $r_{2 k}$ , and $r_{3 k}$ are given by

$$P (r_{1 k}) = \frac{n_{k} - r_{1 k}}{n_{k} q_{1} + 2 p_{1}} \binom{n_{k} - 2}{r_{1 k}} p_{1}^{r_{1 k}} q_{1}^{n_{k} - r_{1 k} - 2}; \quad r_{1 k} = 0, 1, 2, \ldots, n_{k} - 2,$$

$$P (r_{2 k}) = \frac{m_{k} - r_{2 k}}{m_{k} q_{2} + 2 p_{2}} \binom{m_{k} - 2}{r_{2 k}} p_{2}^{r_{2 k}} q_{2}^{m_{k} - r_{2 k} - 2}; \quad r_{2 k} = 0, 1, 2, \ldots, m_{k} - 2,$$

and

$$P (r_{3 k}) = \frac{u_{k} - r_{3 k}}{u_{k} q_{3} + 2 p_{3}} \binom{u_{k} - 2}{r_{3 k}} p_{3}^{r_{3 k}} q_{3}^{u_{k} - r_{3 k} - 2}; \quad r_{3 k} = 0, 1, 2, \ldots, u_{k} - 2,$$

respectively, where $q_{i} = 1 - p_{i}$ (i = 1, 2, 3). Each denominator uses the size of the sample to which the distribution refers, which is what makes the probabilities sum to one.

The number of ways to obtain $r_{l k}$ (l = 1, 2, 3) non-responses from all potential non-response values for the three samples are represented by $\binom{n_{k} - 2}{r_{1 k}}$ , $\binom{m_{k} - 2}{r_{2 k}}$ , and $\binom{u_{k} - 2}{r_{3 k}}$ .
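The non-response distribution can be evaluated numerically for a given sample size. The sketch below is illustrative only: the function name `nonresponse_pmf` and the example values are assumptions made here, with q = 1 - p and a normalizing constant of the form (sample size)·q + 2p, the value for which the probabilities over r = 0, 1, ..., size - 2 sum to one.

```python
from math import comb


def nonresponse_pmf(size, p):
    """Return [P(r) for r = 0, ..., size - 2] under the random non-response model.

    size : sample size (n_k, m_k or u_k)
    p    : probability of non-response, with q = 1 - p
    """
    q = 1.0 - p
    norm = size * q + 2.0 * p  # normalizing constant
    return [
        (size - r) / norm * comb(size - 2, r) * p**r * q ** (size - 2 - r)
        for r in range(size - 1)  # r runs from 0 to size - 2 inclusive
    ]


# Hypothetical example: a sample of 10 units with 10% non-response probability.
pmf = nonresponse_pmf(10, 0.1)
print(len(pmf), sum(pmf))
```

Summing the returned list is a quick way to confirm that a chosen sample size and non-response probability define a proper distribution.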

5. Proposed estimator

We have developed a set of estimators to estimate the population mean square $S_{Y_{N_{k}}}^{2}$ of the $k^{th}$ stratum, where k may range from 1 to L. These estimators are based on two samples: $S_{m_{k}}$ , which is a common sample of size $m_{k}$ collected on previous occasions, and $S_{u_{k}}$ , which is a fresh sample of size $u_{k}$ collected on the current occasion. We refer to the estimators based on these samples as $T_{m_{k}}$ and $T_{u_{k}}$ , respectively.

$$T_{m_{k}} = s_{y_{n_{k} - r_{1 k}}}^{2} + a_{k} \log \left[1 + \left| s_{x_{n_{k}}}^{⁎ 2} - s_{x_{m_{k}}}^{⁎ 2} \right|\right] \tag{1}$$

$$T_{u_{k}} = s_{y_{u_{k} - r_{3 k}}}^{2} + b_{k} \log \left[1 + \left| 1 - \frac{s_{z_{u_{k}}}^{⁎ 2}}{S_{Z_{N_{k}}}^{2}} \right|\right] \tag{2}$$

where

$$s_{x_{n_{k}}}^{⁎ 2} = s_{x_{n_{k} - r_{1 k}}}^{2} + \log \left[1 + \left| 1 - \frac{s_{z_{n_{k}}}^{⁎ 2}}{S_{Z_{N_{k}}}^{2}} \right|\right]$$

$$s_{x_{m_{k}}}^{⁎ 2} = s_{x_{m_{k} - r_{2 k}}}^{2} + \log \left[1 + \left| 1 - \frac{s_{z_{m_{k}}}^{⁎ 2}}{S_{Z_{N_{k}}}^{2}} \right|\right]$$

The constants $a_{k}$ and $b_{k}$ may be established by minimizing the mean square errors of the estimators.
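The stratum-level estimators can be written as a few small functions. This is a hedged sketch under stated assumptions: the function and argument names are hypothetical, the sample mean squares are taken as already computed inputs, and the constants a_k and b_k are passed in rather than optimized.

```python
from math import log


def log_term(x):
    """log[1 + |x|], defined for every real x."""
    return log(1.0 + abs(x))


def adjusted_x_var(s2_x, s2_z, S2_Z):
    """Sample mean square of X plus the log correction based on Z."""
    return s2_x + log_term(1.0 - s2_z / S2_Z)


def t_matched(s2_y, s2_x_n, s2_z_n, s2_x_m, s2_z_m, S2_Z, a_k):
    """Matched-sample estimator T_mk for one stratum, as in Equation (1)."""
    x_n = adjusted_x_var(s2_x_n, s2_z_n, S2_Z)
    x_m = adjusted_x_var(s2_x_m, s2_z_m, S2_Z)
    return s2_y + a_k * log_term(x_n - x_m)


def t_fresh(s2_y_u, s2_z_u, S2_Z, b_k):
    """Fresh-sample estimator T_uk for one stratum, as in Equation (2)."""
    return s2_y_u + b_k * log_term(1.0 - s2_z_u / S2_Z)


# Hypothetical inputs: when the sample mean square of Z equals the known
# population value, the log correction vanishes because log(1) = 0.
print(t_fresh(s2_y_u=4.0, s2_z_u=2.0, S2_Z=2.0, b_k=0.7))
print(t_fresh(s2_y_u=4.0, s2_z_u=2.5, S2_Z=2.0, b_k=0.7))
```

Because the correction enters through log[1 + |x|], the functions accept sample mean squares that fall above or below the population value without any domain check.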

In order to estimate the population variance $σ_{Y}^{2}$ , we use the estimators $T_{m_{k}}$ and $T_{u_{k}}$ , which are defined in Equations (1) and (2), and we suggest the matched sample estimator $T_{m}$ and fresh sample estimator $T_{u}$ as follows:

$$T_{m} = \sum_{k = 1}^{L} Ω_{k}^{⁎ 2} T_{m_{k}} \tag{3}$$

and

$$T_{u} = \sum_{k = 1}^{L} Ω_{k}^{⁎ ⁎ 2} T_{u_{k}} \tag{4}$$

Remark 1

The argument of the logarithmic function must be strictly positive; otherwise, the function is undefined. When assessing the practical applicability of an estimator, it is crucial to establish its domain of valid values. Estimators built on the log(x) function require x to be greater than zero, which limits their use to situations with positive values of x. In contrast, the $\log [1 + | x |]$ function used here has no such restriction, because $1 + | x | \geq 1$ for every real x, which allows a broader range of values and greater versatility in real-world applications.

Remark 2

We have observed that the structure remains consistent in large sample approximations. Specifically, if $m_{k}$ , $u_{k}$ , and $n_{k}$ → $N_{k}$ , then $s_{z_{n_{k}}}^{2}$ → $S_{Z_{N_{k}}}^{2}$ , $s_{z_{m_{k}}}^{2}$ → $S_{Z_{N_{k}}}^{2}$ , $s_{z_{u_{k}}}^{2}$ → $S_{Z_{N_{k}}}^{2}$ , $s_{y_{n_{k} - r_{1k}}}^{2}$ → $S_{Y_{N_{k}}}^{2}$ , and $s_{y_{u_{k} - r_{3k}}}^{2}$ → $S_{Y_{N_{k}}}^{2}$ . Using the fact that $\log (1) = 0$ , we may conclude that $s_{x_{n_{k}}}^{2}$ → $S_{X_{N_{k}}}^{2}$ , and also $s_{x_{m_{k}}}^{2}$ → $S_{X_{N_{k}}}^{2}$ . Therefore, the estimator is consistent in large-sample approximations, as it may be inferred that $T_{m_{k}}$ → $S_{Y_{N_{k}}}^{2}$ and $T_{u_{k}}$ → $S_{Y_{N_{k}}}^{2}$ .

6. Suggested calibration technique

The new calibrated weights for the estimator of the population variance ( $T_{m} = \sum_{k = 1}^{L} Ω_{k}^{⁎ 2} T_{m_{k}}$ ) under stratified sampling are obtained by minimizing the chi-square distance function $\sum_{k = 1}^{L} \frac{{(Ω_{k}^{⁎} - W_{k})}^{2}}{Q_{k} W_{k}}$ while adhering to specific calibration constraints. These constraints are as follows:

1. $\sum_{k = 1}^{L} Ω_{k}^{⁎} = 1$
2. $\sum_{k = 1}^{L} Ω_{k}^{⁎} \log ({\bar{z}}_{n_{k}}) = \sum_{k = 1}^{L} W_{k} \log ({\bar{Z}}_{k})$
3. $\sum_{k = 1}^{L} Ω_{k}^{⁎} c_{x_{m_{k} - r_{2 k}}} = \sum_{k = 1}^{L} W_{k} c_{x_{n_{k} - r_{1 k}}}$

Where $Ω_{k}^{⁎}$ represents the calibrated weights, $W_{k}$ are the initial weights, and $c_{x_{m_{k} - r_{2 k}}}$ = $\frac{s_{x_{m_{k} - r_{2 k}}}}{{\bar{x}}_{m_{k} - r_{2 k}}}$ and $c_{x_{n_{k} - r_{1 k}}}$ = $\frac{s_{x_{n_{k} - r_{1 k}}}}{{\bar{x}}_{n_{k} - r_{1 k}}}$ .

Next, we consider the estimation of the population variance on the current occasion ( $T_{u} = \sum_{k = 1}^{L} Ω_{k}^{⁎ ⁎ 2} T_{u_{k}}$ ) using another set of calibrated weights. These calibrated weights are derived through the minimization of the chi-square distance function $\sum_{k = 1}^{L} \frac{{(Ω_{k}^{⁎ ⁎} - W_{k})}^{2}}{Q_{k} W_{k}}$ while adhering to specific calibration constraints, which are as follows:

1. $\sum_{k = 1}^{L} Ω_{k}^{⁎ ⁎} = 1$
2. $\sum_{k = 1}^{L} Ω_{k}^{⁎ ⁎} c_{{\bar{z}}_{u_{k}}} = C_{Z}$

Here, $Ω_{k}^{⁎ ⁎}$ represents the calibrated weights, $c_{{\bar{z}}_{u_{k}}}$ = $\frac{s_{z_{u_{k}}}}{{\bar{z}}_{u_{k}}}$ and $C_{Z}$ = $\frac{S_{Z}}{\bar{Z}}$ .

It is important to note that $Q_{k} > 0$ are appropriately determined weights that will determine the estimator form.

In Appendix A, detailed derivations have been given.
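Minimizing a chi-square distance under linear constraints has a standard Lagrange-multiplier solution, which can be sketched as follows. This is an illustration under stated assumptions, not the derivation in Appendix A: the names `solve_linear` and `calibrate` are hypothetical, $Q_{k} = 1$ is assumed, and the coefficients of variation and the target $C_{Z}$ are made-up numbers. The calibrated weight takes the form $W_{k} + W_{k} Q_{k} \sum_{i} λ_{i} a_{ik}$ , where $a_{ik}$ is the value of the i-th constraint variable in stratum k and the multipliers $λ_{i}$ solve a small linear system.

```python
def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular calibration system")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def calibrate(W, Q, aux, targets):
    """Minimize sum (Omega_k - W_k)^2 / (Q_k W_k) subject to
    sum_k Omega_k * aux[i][k] = targets[i] for every constraint i."""
    L = len(W)
    p = len(aux)
    d = [w * q for w, q in zip(W, Q)]
    # Weighted cross-products of the constraint variables.
    gram = [[sum(d[k] * aux[i][k] * aux[j][k] for k in range(L))
             for j in range(p)] for i in range(p)]
    # How far the original weights are from satisfying each constraint.
    gap = [targets[i] - sum(W[k] * aux[i][k] for k in range(L))
           for i in range(p)]
    lam = solve_linear(gram, gap)
    return [W[k] + d[k] * sum(lam[i] * aux[i][k] for i in range(p))
            for k in range(L)]


W = [0.2631579, 0.3947368, 0.1315789, 0.2105263]  # original stratum weights
Q = [1.0, 1.0, 1.0, 1.0]                          # assumed Q_k = 1
c_z = [0.30, 0.25, 0.40, 0.35]                    # hypothetical stratum CVs of Z
C_Z = 0.33                                        # hypothetical population CV of Z

# Two constraints: weights sum to one, weighted CVs reproduce C_Z.
omega = calibrate(W, Q, [[1.0] * 4, c_z], [1.0, C_Z])
print(omega)
```

The same function accepts a third constraint column, so the three-constraint problem for the matched sample can be handled by passing the logarithms of the stratum means and the coefficients of variation of X as additional rows of `aux`.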

We propose a way to estimate the population variance of a study variable on the current (second) occasion of a two-occasion successive sampling. Our proposed estimator T is a convex combination of the two estimators $T_{u}$ and $T_{m}$ , with a scalar ϕ $(0 \leq ϕ \leq 1)$ chosen to minimize the mean square error of T, i.e.,

$$T = ϕ T_{u} + (1 - ϕ) T_{m}$$

If we want to estimate the population variance on each occasion, we choose ϕ=1 and use $T_{u}$ . If we want to estimate the change from one occasion to the next, we choose ϕ=0 and use $T_{m}$ . To deal with both problems at the same time, we need to determine an optimum value for ϕ.
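The role of ϕ can be illustrated with a simplified calculation. The sketch below assumes, purely for illustration, that $T_{u}$ and $T_{m}$ are uncorrelated, so that the mean square error of T is approximated by ϕ²·MSE($T_{u}$) + (1 - ϕ)²·MSE($T_{m}$); the paper's own optimum may contain further terms. The function names and the numerical values are hypothetical.

```python
def optimal_phi(mse_u, mse_m):
    """Weight on T_u that minimizes phi^2 * mse_u + (1 - phi)^2 * mse_m.

    Setting the derivative to zero gives phi = mse_m / (mse_u + mse_m).
    Assumes zero covariance between T_u and T_m (simplifying assumption).
    """
    return mse_m / (mse_u + mse_m)


def combined_mse(mse_u, mse_m, phi):
    """Approximate MSE of T under the same zero-covariance assumption."""
    return phi**2 * mse_u + (1.0 - phi) ** 2 * mse_m


def combined_estimate(t_u, t_m, phi):
    """T = phi * T_u + (1 - phi) * T_m with 0 <= phi <= 1."""
    if not 0.0 <= phi <= 1.0:
        raise ValueError("phi must lie in [0, 1]")
    return phi * t_u + (1.0 - phi) * t_m


# Hypothetical MSE values for the fresh and matched estimators.
phi = optimal_phi(mse_u=2.0, mse_m=6.0)
print(phi, combined_mse(2.0, 6.0, phi), combined_estimate(10.0, 14.0, phi))
```

Under this simplification the more precise component receives the larger weight, and the combined error is smaller than that of either component used alone.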

7. Properties of the estimator that has been proposed

The first-order bias and mean squared errors of the proposed estimators $T_{u}$ and $T_{m}$ are derived assuming large sample sizes and using the following transformations:

$s_{x_{n_{k} - r_{1 k}}}^{2} = S_{X_{N_{k}}}^{2} (1 + ϵ_{0 k})$	$s_{z_{n_{k}}}^{2} = S_{Z_{N_{k}}}^{2} (1 + ϵ_{1 k})$	$s_{x_{m_{k} - r_{2 k}}}^{2} = S_{X_{N_{k}}}^{2} (1 + ϵ_{2 k})$
$s_{z_{m_{k}}}^{2} = S_{Z_{N_{k}}}^{2} (1 + ϵ_{3 k})$	$s_{y_{n_{k} - r_{1 k}}}^{2} = S_{Y_{N_{k}}}^{2} (1 + ϵ_{4 k})$	$s_{y_{u_{k} - r_{3 k}}}^{2} = S_{Y_{N_{k}}}^{2} (1 + ϵ_{5 k})$
$s_{z_{u_{k}}}^{2} = S_{Z_{N_{k}}}^{2} (1 + ϵ_{6 k})$	$s_{y_{m_{k} - r_{2 k}}}^{2} = S_{Y_{N_{k}}}^{2} (1 + ϵ_{7 k})$

Stratum	N_k	n_k	r_1k	m_k	r_2k	u_k	ρ_xy	ρ_yz	ρ_zx
1	10000	3000	500	2000	400	1000	0.9	0.9	0.9
2	15000	3000	500	2000	400	1000	0.8	0.8	0.8
3	5000	2000	400	1200	300	800	0.8	0.8	0.8
4	8000	800	50	500	50	300	0.6	0.6	0.6
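As a quick arithmetic check, the original stratum weights $W_{k} = N_{k} / N$ and the fresh sample sizes $u_{k} = n_{k} - m_{k}$ can be recomputed from the four-stratum population listed above. The snippet is only a verification aid; the variable names are illustrative.

```python
# Stratum sizes and sample sizes copied from the four-stratum table above.
N_k = [10000, 15000, 5000, 8000]
n_k = [3000, 3000, 2000, 800]
m_k = [2000, 2000, 1200, 500]

N = sum(N_k)                               # total population size
W = [size / N for size in N_k]             # original weights W_k = N_k / N
u_k = [n - m for n, m in zip(n_k, m_k)]    # fresh sample sizes u_k = n_k - m_k

print(N)
print([round(w, 7) for w in W])
print(u_k)
```

The rounded weights agree with the $W_{k}$ column reported for this population, and the computed $u_{k}$ values agree with the table.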

Stratum	N_k	n_k	r_1k	m_k	r_2k	u_k	ρ_xy	ρ_yz	ρ_zx
1	12000	3600	600	2400	480	1200	0.95	0.95	0.95
2	17000	3600	600	2400	480	1200	0.85	0.85	0.85
3	6000	2400	480	1440	360	960	0.85	0.85	0.85
4	9000	900	57	563	57	337	0.65	0.65	0.65

Case	Stratum	W_k	$Ω_{k}^{⁎}$	$Ω_{k}^{⁎ ⁎}$
I	1	0.2631579	0.30121385	0.3129923
	2	0.3947368	0.55203331	0.2801177
	3	0.1315789	0.08592402	0.1564961
	4	0.2105263	0.06082881	0.2503938

II	1	0.2631579	0.33327331	0.3013643
	2	0.3947368	0.52933590	0.2801177
	3	0.1315789	0.07974788	0.1697853
	4	0.2105263	0.05764291	0.2487327

III	1	0.2631579	0.30140728	0.3132156
	2	0.3947368	0.55189637	0.2801177
	3	0.1315789	0.08588676	0.1564564
	4	0.2105263	0.06080959	0.2502102

IV	1	0.2631579	0.30107846	0.3124016
	2	0.3947368	0.55212917	0.2801177
	3	0.1315789	0.08595011	0.1565251
	4	0.2105263	0.06084227	0.2509556

		PRE				Bias
p₁	p₂	p₃=0.05	0.10	0.15	0.20	0.05	0.10	0.15	0.20
0.05	0.05	142.9	141.9	141.1	140.4	0.0001505	0.0001462	0.0001416	0.0001367
0.05	0.10	146.1	145.3	144.6	144.0	0.0001505	0.0001462	0.0001416	0.0001367
0.05	0.15	149.5	148.7	148.1	147.7	0.0001505	0.0001462	0.0001416	0.0001367
0.05	0.20	152.9	152.3	151.8	151.6	0.0001505	0.0001462	0.0001416	0.0001367
0.10	0.05	139.3	138.2	137.3	136.5	0.0001505	0.0001462	0.0001416	0.0001367
0.10	0.10	142.5	141.5	140.6	140.0	0.0001505	0.0001462	0.0001416	0.0001367
0.10	0.15	145.8	144.9	144.1	143.6	0.0001505	0.0001462	0.0001416	0.0001367
0.10	0.20	149.1	148.3	147.7	147.3	0.0001505	0.0001462	0.0001416	0.0001367
0.15	0.05	136.1	134.9	133.8	132.9	0.0001505	0.0001462	0.0001416	0.0001367
0.15	0.10	139.2	138.1	137.1	136.3	0.0001505	0.0001462	0.0001416	0.0001367
0.15	0.15	142.4	141.4	140.5	139.8	0.0001505	0.0001462	0.0001416	0.0001367
0.15	0.20	145.7	144.8	144.0	143.4	0.0001505	0.0001462	0.0001416	0.0001367
0.20	0.05	133.2	131.9	130.7	129.7	0.0001505	0.0001462	0.0001416	0.0001367
0.20	0.10	136.3	135.0	133.9	133.0	0.0001505	0.0001462	0.0001416	0.0001367
0.20	0.15	139.4	138.2	137.3	136.4	0.0001505	0.0001462	0.0001416	0.0001367
0.20	0.20	142.6	141.6	140.7	140.0	0.0001505	0.0001462	0.0001416	0.0001367

Case	PRE	Bias
Case I	144.5398	0.0001416434
Case II	144.5906	0.0001352958
Case III	144.5827	0.0001416345
Case IV	144.4316	0.0001415639

Case	Stratum	W_k	$Ω_{k}^{⁎}$	$Ω_{k}^{⁎ ⁎}$
I	1	0.2727273	0.3808514	0.2609850
	2	0.3863636	0.1446900	0.3997344
	3	0.1363636	0.1269082	0.1304925
	4	0.2045455	0.3475505	0.2087880

II	1	0.2727273	0.3558239	0.2614920
	2	0.3863636	0.1629070	0.3997344
	3	0.1363636	0.1002457	0.1299131
	4	0.2045455	0.3810234	0.2088605

III	1	0.2727273	0.3825238	0.2607414
	2	0.3863636	0.1434726	0.3997344
	3	0.1363636	0.1286899	0.1306599
	4	0.2045455	0.3453136	0.2088643

IV	1	0.2727273	0.3815501	0.2609781
	2	0.3863636	0.1441814	0.3997344
	3	0.1363636	0.1276526	0.1304788
	4	0.2045455	0.3466160	0.2088087

Case	PRE	Bias
Case I	182.8948	0.000407048
Case II	184.5569	0.0004524072
Case III	182.7551	0.0004040932
Case IV	182.8089	0.0004057739

Stratum	N_k	n_k	r_1k	m_k	r_2k	u_k	ρ_xy	ρ_yz	ρ_zx
1	2500	875	125	625	125	250	0.8999226	0.8283662	0.7473441
2	2600	910	130	650	130	260	0.8914957	0.9164524	0.8335361
3	2311	809	116	578	116	231	0.9689409	0.9337467	0.9209707

Case	Stratum	W_k	$Ω_{k}^{⁎}$	$Ω_{k}^{⁎ ⁎}$
I	1	0.3373364	0.4930872	0.3101796
	2	0.3508298	0.3163187	0.4557245
	3	0.3118338	0.1905941	0.2340959

II	1	0.3373364	0.4930873	0.3137647
	2	0.3508298	0.3163187	0.4546208
	3	0.3118338	0.1905940	0.2316144

III	1	0.3373364	0.4930873	0.3100504
	2	0.3508298	0.3163187	0.4557643
	3	0.3118338	0.1905940	0.2341853

IV	1	0.3373364	0.4930873	0.3195387
	2	0.3508298	0.3163187	0.4528433
	3	0.3118338	0.1905940	0.2276180

Case	PRE	Bias
Case I	145.1036	0.01079086
Case II	145.0877	0.01079694
Case III	145.1042	0.01079061
Case IV	145.0727	0.01080245

		PRE				Bias
p₁	p₂	p₃=0.05	0.10	0.15	0.20	0.05	0.10	0.15	0.20
0.05	0.05	143.0	142.1	141.4	140.8	0.000144	0.0001399	0.0001353	0.0001305
0.05	0.10	146.4	145.6	145.0	144.6	0.000144	0.0001399	0.0001353	0.0001305
0.05	0.15	149.9	149.3	148.8	148.5	0.000144	0.0001399	0.0001353	0.0001305
0.05	0.20	153.5	153.0	152.7	152.5	0.000144	0.0001399	0.0001353	0.0001305
0.10	0.05	139.2	138.3	137.4	136.7	0.000144	0.0001399	0.0001353	0.0001305
0.10	0.10	142.6	141.7	140.9	140.3	0.000144	0.0001399	0.0001353	0.0001305
0.10	0.15	146.0	145.2	144.6	144.1	0.000144	0.0001399	0.0001353	0.0001305
0.10	0.20	149.5	148.9	148.4	148.0	0.000144	0.0001399	0.0001353	0.0001305
0.15	0.05	135.9	134.8	133.8	133.0	0.000144	0.0001399	0.0001353	0.0001305
0.15	0.10	139.1	138.1	137.2	136.5	0.000144	0.0001399	0.0001353	0.0001305
0.15	0.15	142.5	141.6	140.8	140.2	0.000144	0.0001399	0.0001353	0.0001305
0.15	0.20	145.9	145.1	144.5	144.0	0.000144	0.0001399	0.0001353	0.0001305
0.20	0.05	132.9	131.7	130.6	129.6	0.000144	0.0001399	0.0001353	0.0001305
0.20	0.10	136.1	134.9	133.9	133.1	0.000144	0.0001399	0.0001353	0.0001305
0.20	0.15	139.3	138.3	137.4	136.6	0.000144	0.0001399	0.0001353	0.0001305
0.20	0.20	142.7	141.8	141.0	140.4	0.000144	0.0001399	0.0001353	0.0001305

		PRE				Bias
p₁	p₂	p₃=0.05	0.10	0.15	0.20	0.05	0.10	0.15	0.20
0.05	0.05	177.6	172.9	168.8	165.3	0.0004008	0.0003805	0.0003597	0.0003385
0.05	0.10	182.6	177.9	173.8	170.4	0.0004008	0.0003805	0.0003597	0.0003385
0.05	0.15	187.8	183.1	179.1	175.8	0.0004008	0.0003805	0.0003597	0.0003385
0.05	0.20	193.2	188.6	184.7	181.4	0.0004008	0.0003805	0.0003597	0.0003385
0.10	0.05	172.9	168.1	163.8	160.2	0.0004008	0.0003805	0.0003597	0.0003385
0.10	0.10	177.8	172.9	168.7	165.1	0.0004008	0.0003805	0.0003597	0.0003385
0.10	0.15	182.8	178.0	173.8	170.3	0.0004008	0.0003805	0.0003597	0.0003385
0.10	0.20	188.1	183.3	179.2	175.7	0.0004008	0.0003805	0.0003597	0.0003385
0.15	0.05	168.5	163.5	159.1	155.3	0.0004008	0.0003805	0.0003597	0.0003385
0.15	0.10	173.2	168.2	163.8	160.1	0.0004008	0.0003805	0.0003597	0.0003385
0.15	0.15	178.1	173.1	168.8	165.1	0.0004008	0.0003805	0.0003597	0.0003385
0.15	0.20	183.2	178.3	174.0	170.3	0.0004008	0.0003805	0.0003597	0.0003385
0.20	0.05	164.3	159.2	154.7	150.8	0.0004008	0.0003805	0.0003597	0.0003385
0.20	0.10	168.9	163.7	159.2	155.3	0.0004008	0.0003805	0.0003597	0.0003385
0.20	0.15	173.6	168.5	164.0	160.2	0.0004008	0.0003805	0.0003597	0.0003385
0.20	0.20	178.6	173.5	169.0	165.2	0.0004008	0.0003805	0.0003597	0.0003385

		PRE				Bias
p₁	p₂	p₃=0.05	0.10	0.15	0.20	0.05	0.10	0.15	0.20
0.05	0.05	178.7	173.4	168.8	164.9	0.0004473	0.0004280	0.000408	0.0003874
0.05	0.10	183.0	177.7	173.2	169.3	0.0004473	0.0004280	0.000408	0.0003874
0.05	0.15	187.4	182.2	177.7	173.8	0.0004473	0.0004280	0.000408	0.0003874
0.05	0.20	192.0	186.8	182.4	178.6	0.0004473	0.0004280	0.000408	0.0003874
0.10	0.05	174.7	169.3	164.6	160.5	0.0004473	0.0004280	0.000408	0.0003874
0.10	0.10	178.9	173.5	168.8	164.7	0.0004473	0.0004280	0.000408	0.0003874
0.10	0.15	183.2	177.8	173.2	169.2	0.0004473	0.0004280	0.000408	0.0003874
0.10	0.20	187.7	182.3	177.7	173.8	0.0004473	0.0004280	0.000408	0.0003874
0.15	0.05	170.9	165.4	160.5	156.2	0.0004473	0.0004280	0.000408	0.0003874
0.15	0.10	175.0	169.4	164.6	160.4	0.0004473	0.0004280	0.000408	0.0003874
0.15	0.15	179.2	173.6	168.8	164.7	0.0004473	0.0004280	0.000408	0.0003874
0.15	0.20	183.6	178.1	173.3	169.2	0.0004473	0.0004280	0.000408	0.0003874
0.20	0.05	167.3	161.6	156.6	152.2	0.0004473	0.0004280	0.000408	0.0003874
0.20	0.10	171.2	165.6	160.6	156.2	0.0004473	0.0004280	0.000408	0.0003874
0.20	0.15	175.3	169.7	164.7	160.4	0.0004473	0.0004280	0.000408	0.0003874
0.20	0.20	179.6	173.9	169.0	164.7	0.0004473	0.0004280	0.000408	0.0003874

tot_am= $\sum_{k = 1}^{L} W_{k} Q_{k}$	tot_bm= $\sum_{k = 1}^{L} W_{k} Q_{k} \log ({\bar{z}}_{n_{k}})$
tot_cm= $\sum_{k = 1}^{L} W_{k} Q_{k} c_{x_{m_{k} - r_{2 k}}}$	tot_dm=1- $\sum_{k = 1}^{L} W_{k}$
tot_em= $\sum_{k = 1}^{L} W_{k} Q_{k} {(\log ({\bar{z}}_{n_{k}}))}^{2}$	tot_fm= $\sum_{k = 1}^{L} W_{k} Q_{k} \log ({\bar{z}}_{n_{k}}) c_{x_{m_{k} - r_{2 k}}}$
tot_gm= $\sum_{k = 1}^{L} W_{k} (\log ({\bar{Z}}_{k}) - \log ({\bar{z}}_{n_{k}}))$	tot_hm= $\sum_{k = 1}^{L} W_{k} Q_{k} c_{x_{m_{k} - r_{2 k}}}^{2}$
tot_im= $\sum_{k = 1}^{L} W_{k} c_{x_{n_{k} - r_{1 k}}} - \sum_{k = 1}^{L} W_{k} c_{x_{m_{k} - r_{2 k}}}$

tot_au= $\sum_{k = 1}^{L} W_{k} Q_{k}$	tot_bu= $\sum_{k = 1}^{L} W_{k} Q_{k} c_{{\bar{z}}_{u_{k}}}$
tot_cu=1- $\sum_{k = 1}^{L} W_{k}$	tot_du= $\sum_{k = 1}^{L} W_{k} Q_{k} c_{{\bar{z}}_{u_{k}}}^{2}$
tot_eu= $C_{Z} - \sum_{k = 1}^{L} W_{k} c_{{\bar{z}}_{u_{k}}}$

$E(ϵ_{0 k}^{2}) = f_{3 k} (λ_{400 k} - 1)$	$E(ϵ_{1 k}^{2}) = f_{8 k} (λ_{004 k} - 1)$	$E(ϵ_{2 k}^{2}) = f_{10 k} (λ_{400 k} - 1)$
$E(ϵ_{3 k}^{2}) = f_{9 k} (λ_{004 k} - 1)$	$E(ϵ_{4 k}^{2}) = f_{3 k} (λ_{040 k} - 1)$	$E(ϵ_{0 k} ϵ_{2 k}) = f_{3 k} (λ_{400 k} - 1)$
$E(ϵ_{0 k} ϵ_{4 k}) = f_{3 k} (λ_{220 k} - 1)$	$E(ϵ_{2 k} ϵ_{4 k}) = f_{3 k} (λ_{220 k} - 1)$	$E(ϵ_{3 k} ϵ_{4 k}) = f_{3 k} (λ_{022 k} - 1)$
$E(ϵ_{1 k} ϵ_{4 k}) = f_{8 k} (λ_{022 k} - 1)$	$E(ϵ_{0 k} ϵ_{3 k}) = f_{3 k} (λ_{202 k} - 1)$	$E(ϵ_{2 k} ϵ_{3 k}) = f_{2 k} (λ_{202 k} - 1)$
$E(ϵ_{0 k} ϵ_{1 k}) = f_{8 k} (λ_{202 k} - 1)$	$E(ϵ_{1 k} ϵ_{2 k}) = f_{8 k} (λ_{202 k} - 1)$	$E(ϵ_{1 k} ϵ_{3 k}) = f_{8 k} (λ_{004 k} - 1)$
$E(ϵ_{5 k}^{2}) = f_{6 k} (λ_{040 k} - 1)$	$E(ϵ_{6 k}^{2}) = f_{7 k} (λ_{004 k} - 1)$	$E(ϵ_{5 k} ϵ_{6 k}) = f_{7 k} (λ_{022 k} - 1)$
$E(ϵ_{7 k}^{2}) = f_{10 k} (λ_{040 k} - 1)$
