Heliyon. 2024 Mar 12;10(6):e27738. doi: 10.1016/j.heliyon.2024.e27738

Improved estimation of population variance in stratified successive sampling using calibrated weights under non-response

MK Pandey a, GN Singh a, Tolga Zaman b, Aned Al Mutairi c, Manahil SidAhmed Mustafa d
PMCID: PMC10965521  PMID: 38545218

Abstract

This paper introduces a new method to estimate the population variance of a study variable in stratified successive sampling over two occasions, while accounting for random non-response. The method uses a logarithmic type estimator and leverages information from a highly positively correlated auxiliary variable. The paper also presents calibrated weights for the new estimator and examines its properties through numerical and simulation studies. The results indicate that the suggested estimator is more effective than the standard estimator for estimating the population variance.

MSC: 62D05

Keywords: Stratified sampling, Successive sampling, Auxiliary information, Bias, Mean square error, Calibration technique, Random non-response

1. Introduction

Stratified successive sampling is a statistical method used to estimate population variance when obtaining a random sample from the whole population is difficult or cost-prohibitive. The technique divides the population into distinct strata and selects units randomly within each stratum. Successive sampling is particularly advantageous in socio-economic research, where populations may be widely dispersed or access may be limited. The concept of stratified sampling for estimating population parameters, including population variance, was first introduced by [16]. Subsequently, [14] extended this method to population variance estimation in stratified successive sampling; [19] and [25] explored the problem under unequal probabilities, and [15] did the same under equal probability. [23] developed an effectual variance estimation strategy for two-occasion successive sampling that accounts for random non-response, whereas [27] worked on a family of robust-type estimators for population variance in simple and stratified random sampling. [1] improved the estimation of finite population variance using dual supplementary information under stratified random sampling, and [6] introduced efficient classes of estimators of population variance in two-phase successive sampling with random non-response. Moreover, [5] and [4] developed memory-type ratio and product estimators under ranked-based sampling schemes. Together, these works, among others, provide a foundation for the use of stratified successive sampling and calibration approaches in estimating population variance, especially under non-response.

[7] introduced a calibration approach using least squares adjustment, a method later adopted by statistical authorities in various organizations. The primary aim of the calibration approach is to formulate unbiased estimation procedures with minimal dispersion by leveraging information from auxiliary variables. [8] proposed a calibration estimation procedure that reduces the discrepancy between initial and final weights while still adhering to calibration equations and constraints. [10] explored model-assisted higher-order calibration of variance estimators, and [20] applied the calibration approach in survey theory and practice. [12] and [11] contributed to calibration approach estimators in stratified sampling. [26] introduced a calibration approach-based regression-type estimator to model the inverse relationship between study and auxiliary variables, and [13] applied calibration weighting in stratified random sampling. [17] introduced a new calibration estimator specifically designed for stratified sampling, and [9] developed calibration estimation for ratio estimators in stratified sampling with proportion allocation. [22] extended the calibration approach to estimate population variance in stratified successive sampling while accounting for random non-response. Recently, [2] conducted work on estimating the calibration of the mean by employing a double use of auxiliary information. Additionally, [3] focused on determining optimal calibrated weights while minimizing a variance function. In a related study, [21] explored L-moments and calibration-based variance estimators under a double stratified random sampling scheme, with a specific focus on their application during the COVID-19 pandemic. Moreover, [18] contributed to the field by developing a general class of improved population variance estimators under non-sampling errors using calibrated weights in stratified sampling.

Calibration finds its place as a pivotal technique in the estimation of population variance through stratified successive sampling. This approach enriches estimation accuracy by imbuing resulting estimates with greater population representativeness and reduced bias. This significance is particularly pronounced in stratified successive sampling, where the goal centers on ensuring representation of each stratum within the final estimate. Additionally, this proposed estimation strategy finds application in domains as diverse as the estimation of variance in gas turbine exhaust pressure, as evidenced by real-data illustration in later sections of this manuscript.

2. Motivation of the study

The dynamic nature of our world entails constant variation, with implications across various realms. In medicine, variations in body temperature, blood pressure, and pulse rate hold diagnostic significance. Similarly, diverse consumer responses to products drive pricing and quality decisions, while variations in climatic factors guide agricultural planning.

The estimation of population variance assumes a pivotal role in diverse fields, ranging from socio-economic studies to engineering applications. Stratified successive sampling offers a strategic approach, particularly when random sampling from the entire population is intricate. However, the presence of non-response poses a challenge, potentially introducing bias.

In this context, this article proposes a novel solution. We introduce a logarithmic-type estimator that incorporates information from a highly correlated auxiliary variable. Moreover, calibrated weights are integrated into the estimation process to counteract the impact of non-response bias.

The motivation for this study emanates from various factors, including:

Practical Relevance: Stratified successive sampling is commonly used in scenarios where populations are geographically dispersed or hard to access due to non-homogeneity. However, non-response can distort estimates, making it essential to account for this bias through calibrated weights.

Calibration Approach: We use the calibration technique to improve the accuracy of the estimates. By incorporating calibration into the estimation process, resulting estimates are more representative of the population and have reduced bias. Leveraging auxiliary variables to enhance estimation accuracy has shown effectiveness. Applying this calibration technique to stratified successive sampling aims to yield more representative and unbiased population variance estimates.

Engineering Applications: The proposed methodology holds practical significance in fields like engineering, particularly in quality control and process improvement. Precise population variance estimation is vital for assessing manufacturing process variability, enabling informed decisions for process optimization and quality control strategies.

Socio-Economic Research: In socio-economic research, understanding the relationship between variables like education and income is pivotal for informed decision-making. By enhancing population variance estimation, the proposed method provides deeper insights into these relationships, guiding policy and strategy decisions.

Unique Challenges Addressed: The use of logarithmic-type estimators underscores a focus on addressing rare challenges, such as estimating diseases or socio-economic issues requiring specialized techniques. This approach showcases our dedication to tackling complex issues with precision.

In conclusion, the motivation behind this research lies in the imperative to enhance population variance estimation accuracy in stratified successive sampling, particularly by addressing non-response challenges. Through the utilization of calibrated weights and auxiliary information, we aspire to provide more resilient and effective estimation techniques, applicable across diverse fields from engineering to socio-economic research. By obtaining more accurate estimates of population variance, one can better understand the relationship between education level and income in the studied population. This understanding may hold important implications for policy decisions related to education and income inequality, as well as for businesses and organizations interested in targeting college-educated individuals for employment or marketing purposes.

3. Sample structure

Consider a finite population of size $N$ divided into $L$ non-overlapping strata, the $k$th of which contains $N_k$ ($k = 1, 2, \ldots, L$) units. Let $X$ and $Y$ represent the study character on the first and second occasions, respectively. It is assumed that information on an auxiliary variable $Z$ is available on both occasions and that the population variance of $Z$ is known.

Consider the $k$th stratum, $k = 1, 2, \ldots, L$. On the first occasion, a preliminary sample of size $n_k$ is drawn from the stratum by simple random sampling without replacement (SRSWOR), of which $r_{1k}$ units do not respond. From the responding part of this sample, a second-stage SRSWOR sample of size $m_k = n_k\lambda_k$ is drawn, where $\lambda_k$ is the fraction of matched units; this matched sample is used on the second occasion to collect information on the study variable $Y$, and $r_{2k}$ of its units do not respond. In addition, a fresh SRSWOR sample of size $u_k = n_k - m_k = n_k\mu_k$ is drawn from the population on the second occasion to observe $Y$, of which $r_{3k}$ units do not respond. The fractions of matched and fresh samples on the current (second) occasion are $\lambda_k$ and $\mu_k$, respectively, with $\lambda_k + \mu_k = 1$.

3.1. Notations

From now on, we will use the following notations:

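$\sigma_Y^2$: The population variance of $Y$, the characteristic under study.

$\bar{Z}_{N_k} = \frac{1}{N_k}\sum_{l=1}^{N_k} Z_{kl}$: The population mean of $Z$ in the $k$th stratum.

$\bar{Z} = \frac{1}{N}\sum_{k=1}^{L} N_k \bar{Z}_{N_k}$: The population mean of $Z$.

$\bar{x}_{n_k - r_{1k}} = \frac{1}{n_k - r_{1k}}\sum_{l=1}^{n_k - r_{1k}} x_{kl}$, $\bar{x}_{m_k - r_{2k}} = \frac{1}{m_k - r_{2k}}\sum_{l=1}^{m_k - r_{2k}} x_{kl}$: The sample means of the study variable $X$ in the $k$th stratum based on the responding parts of the samples of sizes $n_k$ and $m_k$, respectively.

$\bar{z}_{n_k} = \frac{1}{n_k}\sum_{l=1}^{n_k} z_{kl}$, $\bar{z}_{u_k} = \frac{1}{u_k}\sum_{l=1}^{u_k} z_{kl}$: The sample means of the auxiliary variable $Z$ in the $k$th stratum based on the samples of sizes $n_k$ and $u_k$, respectively.

$S_{YN_k}^{2} = \frac{1}{N_k - 1}\sum_{l=1}^{N_k}(Y_{kl} - \bar{Y}_{N_k})^2$, $S_{XN_k}^{2} = \frac{1}{N_k - 1}\sum_{l=1}^{N_k}(X_{kl} - \bar{X}_{N_k})^2$: The population mean squares of the $k$th stratum for the study variables $Y$ and $X$, respectively.

$S_{ZN_k}^{2} = \frac{1}{N_k - 1}\sum_{l=1}^{N_k}(Z_{kl} - \bar{Z}_{N_k})^2$: The population mean square of the $k$th stratum for the auxiliary variable $Z$.

$s_{x(n_k - r_{1k})}^{2} = \frac{1}{n_k - r_{1k} - 1}\sum_{l=1}^{n_k - r_{1k}}(x_{kl} - \bar{x}_{n_k - r_{1k}})^2$: The sample mean square of the study variable $X$ for the $k$th stratum, based on the responding part of the sample of size $n_k$.

$s_{x(m_k - r_{2k})}^{2} = \frac{1}{m_k - r_{2k} - 1}\sum_{l=1}^{m_k - r_{2k}}(x_{kl} - \bar{x}_{m_k - r_{2k}})^2$: The sample mean square of the study variable $X$ for the $k$th stratum, based on the responding part of the sample of size $m_k$.

$s_{y(n_k - r_{1k})}^{2} = \frac{1}{n_k - r_{1k} - 1}\sum_{l=1}^{n_k - r_{1k}}(y_{kl} - \bar{y}_{n_k - r_{1k}})^2$: The sample mean square of the study variable $Y$ for the $k$th stratum, based on the responding part of the sample of size $n_k$.

$s_{y(u_k - r_{3k})}^{2} = \frac{1}{u_k - r_{3k} - 1}\sum_{l=1}^{u_k - r_{3k}}(y_{kl} - \bar{y}_{u_k - r_{3k}})^2$: The sample mean square of the study variable $Y$ for the $k$th stratum, based on the responding part of the sample of size $u_k$.

$s_{zn_k}^{2} = \frac{1}{n_k - 1}\sum_{l=1}^{n_k}(z_{kl} - \bar{z}_{n_k})^2$: The sample mean square of the auxiliary variable $Z$ for the $k$th stratum, based on the sample of size $n_k$.

$s_{zm_k}^{2} = \frac{1}{m_k - 1}\sum_{l=1}^{m_k}(z_{kl} - \bar{z}_{m_k})^2$: The sample mean square of the auxiliary variable $Z$ for the $k$th stratum, based on the sample of size $m_k$.

$s_{zu_k}^{2} = \frac{1}{u_k - 1}\sum_{l=1}^{u_k}(z_{kl} - \bar{z}_{u_k})^2$: The sample mean square of the auxiliary variable $Z$ for the $k$th stratum, based on the sample of size $u_k$.

$W_k = \frac{N_k}{N}$: The original weight of the $k$th stratum, $k = 1, 2, \ldots, L$.

$\Omega_k$: The calibrated weight of the $k$th stratum, $k = 1, 2, \ldots, L$, based on the sample of size $m_k$.

$\Omega_k$: The calibrated weight of the $k$th stratum, $k = 1, 2, \ldots, L$, based on the sample of size $u_k$.

$Q_k$: The independent weight of the $k$th stratum, $k = 1, 2, \ldots, L$.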

4. Non-response probability model

The $k$th stratum is treated under the random non-response model of [24]. Consider a sample $S_{n_k}$ of size $n_k$ for which some data on $X$ could not be collected because of random non-response, and let $r_{1k}$ ($r_{1k} = 0, 1, 2, \ldots, n_k - 2$) denote the number of such units in $S_{n_k}$. Similarly, for the sample $S_{m_k}$ of size $m_k$, let $r_{2k}$ denote the number of units for which information on $Y$ could not be obtained on the second occasion because of random non-response, and let $r_{3k}$ denote the same for the sample of size $u_k$. We assume that $0 \le r_{1k} \le n_k - 2$, $0 \le r_{2k} \le m_k - 2$ and $0 \le r_{3k} \le u_k - 2$. If $p_1$, $p_2$ and $p_3$ denote the probabilities of non-response among the $(n_k - 2)$, $(m_k - 2)$ and $(u_k - 2)$ possible values of non-response, respectively, with $q_i = 1 - p_i$ ($i = 1, 2, 3$), the discrete probability distributions of $r_{1k}$, $r_{2k}$ and $r_{3k}$ are

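$$P(r_{1k}) = \frac{n_k - r_{1k}}{n_k q_1 + 2p_1}\binom{n_k - 2}{r_{1k}} p_1^{r_{1k}} q_1^{n_k - r_{1k} - 2}; \quad r_{1k} = 0, 1, 2, \ldots, n_k - 2,$$

$$P(r_{2k}) = \frac{m_k - r_{2k}}{m_k q_2 + 2p_2}\binom{m_k - 2}{r_{2k}} p_2^{r_{2k}} q_2^{m_k - r_{2k} - 2}; \quad r_{2k} = 0, 1, 2, \ldots, m_k - 2$$

and

$$P(r_{3k}) = \frac{u_k - r_{3k}}{u_k q_3 + 2p_3}\binom{u_k - 2}{r_{3k}} p_3^{r_{3k}} q_3^{u_k - r_{3k} - 2}; \quad r_{3k} = 0, 1, 2, \ldots, u_k - 2,$$

respectively.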

The numbers of ways of obtaining $r_{lk}$ ($l = 1, 2, 3$) non-responses out of all possible non-response values for the three samples are $\binom{n_k - 2}{r_{1k}}$, $\binom{m_k - 2}{r_{2k}}$ and $\binom{u_k - 2}{r_{3k}}$, respectively.
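The non-response model above can be checked numerically. The following is a minimal sketch, not part of the original study, that assumes $q = 1 - p$ and uses illustrative values of $n$ and $p$; the function name `nonresponse_pmf` is hypothetical. It confirms that the probabilities sum to one and that $E[1/(n - r)] = 1/(nq + 2p)$, which is where factors of the form $1/(n_k q_1 + 2p_1)$ in the later bias and MSE expressions come from.

```python
from math import comb


def nonresponse_pmf(n, p):
    """Return [P(r) for r = 0, ..., n-2] under the random non-response model.

    Assumes q = 1 - p. The name and signature are illustrative only.
    """
    q = 1.0 - p
    norm = n * q + 2.0 * p  # normalising constant n*q + 2*p
    return [
        (n - r) / norm * comb(n - 2, r) * p**r * q ** (n - 2 - r)
        for r in range(n - 1)  # r runs over 0, 1, ..., n-2
    ]


# Illustrative (hypothetical) sample size and non-response probability.
n, p = 30, 0.10
pmf = nonresponse_pmf(n, p)

total = sum(pmf)  # should equal 1 up to rounding error
expected_inverse = sum(prob / (n - r) for r, prob in enumerate(pmf))
target = 1.0 / (n * (1.0 - p) + 2.0 * p)

print(total)
print(expected_inverse, target)
```

The two printed values on the last line should agree up to floating-point error, since the factor $(n - r)$ in the probability cancels the $1/(n - r)$ inside the expectation.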

5. Proposed estimator

We have developed a set of estimators for the population mean square $S_{YN_k}^{2}$ of the $k$th stratum, $k = 1, 2, \ldots, L$. These estimators are based on two samples: $S_{m_k}$, the matched sample of size $m_k$ common to both occasions, and $S_{u_k}$, the fresh sample of size $u_k$ drawn on the current occasion. We refer to the estimators based on these samples as $T_{mk}$ and $T_{uk}$, respectively.

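$$T_{mk} = s_{y(n_k - r_{1k})}^{2} + a_k \log\left[1 + \left|s_{xn_k}^{2} - s_{xm_k}^{2}\right|\right] \tag{1}$$

$$T_{uk} = s_{y(u_k - r_{3k})}^{2} + b_k \log\left[1 + \left|1 - \frac{s_{zu_k}^{2}}{S_{ZN_k}^{2}}\right|\right] \tag{2}$$

where

$$s_{xn_k}^{2} = s_{x(n_k - r_{1k})}^{2} + \log\left[1 + \left|1 - \frac{s_{zn_k}^{2}}{S_{ZN_k}^{2}}\right|\right]$$

$$s_{xm_k}^{2} = s_{x(m_k - r_{2k})}^{2} + \log\left[1 + \left|1 - \frac{s_{zm_k}^{2}}{S_{ZN_k}^{2}}\right|\right]$$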

The constants ak and bk may be established by minimizing the mean square errors of the estimators.
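To make the structure of the fresh-sample estimator concrete, the sketch below evaluates it for one stratum. It is an illustration under stated assumptions rather than the authors' implementation: it reads the logarithm's argument in Eq. (2) as $1 + |1 - s_{zu_k}^{2}/S_{ZN_k}^{2}|$, and the function name, the data and the value of $b_k$ are all hypothetical.

```python
import math
from statistics import variance


def log_type_fresh_estimator(y_resp, z_sample, S2_Z, b_k):
    """Log-type estimate of the stratum mean square of Y from the fresh sample.

    y_resp   : responding y-values of the fresh sample (hypothetical data)
    z_sample : auxiliary values of the fresh sample
    S2_Z     : known population mean square of Z in the stratum
    b_k      : tuning constant (in the paper it is chosen to minimise the MSE)
    """
    s2_y = variance(y_resp)    # sample mean square with divisor (size - 1)
    s2_z = variance(z_sample)  # sample mean square of the auxiliary variable
    # log[1 + |x|] is defined for every real x, so no domain check is needed.
    return s2_y + b_k * math.log(1.0 + abs(1.0 - s2_z / S2_Z))


# Hypothetical data for a single stratum.
y = [4.1, 5.3, 3.8, 6.0, 5.1, 4.7]
z = [2.0, 2.6, 1.9, 3.1, 2.4, 2.2]

t_no_adjust = log_type_fresh_estimator(y, z, S2_Z=variance(z), b_k=0.8)
t_adjusted = log_type_fresh_estimator(y, z, S2_Z=0.25, b_k=0.8)
print(t_no_adjust, t_adjusted)
```

When the sample mean square of $Z$ coincides with its known population value, the logarithmic term vanishes and the estimate reduces to the ordinary sample mean square of $Y$, which is the first call above.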

To estimate the population variance $\sigma_Y^2$, we use the estimators $T_{mk}$ and $T_{uk}$ defined in Equations (1) and (2), and we suggest the matched-sample estimator $T_m$ and the fresh-sample estimator $T_u$ as follows:

$$T_m = \sum_{k=1}^{L} \Omega_k^{2} T_{mk} \tag{3}$$

and

$$T_u = \sum_{k=1}^{L} \Omega_k^{2} T_{uk} \tag{4}$$

Remark 1

The argument of a logarithmic function must be positive; otherwise, the function is undefined. When assessing the practical applicability of an estimator, it is therefore crucial to establish its domain of valid values. Estimators built on $\log(x)$ require $x > 0$, which limits their use to situations with positive values of $x$. In contrast, $\log[1 + |x|]$ is defined for every real $x$, since $1 + |x| \ge 1$, which allows a broader range of values and greater versatility in real-world applications.

Remark 2

The structure of the estimators remains consistent in large-sample approximations. Specifically, if $m_k$, $u_k$ and $n_k \to N_k$, then $s_{zn_k}^{2} \to S_{ZN_k}^{2}$, $s_{zm_k}^{2} \to S_{ZN_k}^{2}$, $s_{zu_k}^{2} \to S_{ZN_k}^{2}$, $s_{y(n_k - r_{1k})}^{2} \to S_{YN_k}^{2}$ and $s_{y(u_k - r_{3k})}^{2} \to S_{YN_k}^{2}$. Since $\log(1) = 0$, it follows that $s_{xn_k}^{2} \to S_{XN_k}^{2}$ and $s_{xm_k}^{2} \to S_{XN_k}^{2}$. Therefore, the estimators are consistent in large-sample approximations, because $T_{mk} \to S_{YN_k}^{2}$ and $T_{uk} \to S_{YN_k}^{2}$.

6. Suggested calibration technique

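The new calibrated weights for the matched-sample estimator of the population variance, $T_m = \sum_{k=1}^{L}\Omega_k^{2} T_{mk}$, under stratified sampling are obtained by minimizing the chi-square distance function $\sum_{k=1}^{L}\frac{(\Omega_k - W_k)^2}{Q_k W_k}$ subject to the following calibration constraints:

  • 1. $\sum_{k=1}^{L}\Omega_k = 1$

  • 2. $\sum_{k=1}^{L}\Omega_k \log(\bar{z}_{n_k}) = \sum_{k=1}^{L} W_k \log(\bar{Z}_{N_k})$

  • 3. $\sum_{k=1}^{L}\Omega_k\, c_{x(m_k - r_{2k})} = \sum_{k=1}^{L} W_k\, c_{x(n_k - r_{1k})}$

Here $\Omega_k$ are the calibrated weights, $W_k$ are the initial weights, $c_{x(m_k - r_{2k})} = \frac{s_{x(m_k - r_{2k})}}{\bar{x}_{m_k - r_{2k}}}$ and $c_{x(n_k - r_{1k})} = \frac{s_{x(n_k - r_{1k})}}{\bar{x}_{n_k - r_{1k}}}$.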

Next, we consider the estimation of the population variance on the current occasion, $T_u = \sum_{k=1}^{L}\Omega_k^{2} T_{uk}$, using another set of calibrated weights. These weights are obtained by minimizing the chi-square distance function $\sum_{k=1}^{L}\frac{(\Omega_k - W_k)^2}{Q_k W_k}$ subject to the following calibration constraints:

  • 1. $\sum_{k=1}^{L}\Omega_k = 1$

  • 2. $\sum_{k=1}^{L}\Omega_k\, c_{zu_k} = C_Z$

Here $\Omega_k$ are the calibrated weights, $c_{zu_k} = \frac{s_{zu_k}}{\bar{z}_{u_k}}$ and $C_Z = \frac{S_Z}{\bar{Z}}$.

It is important to note that Qk>0 are appropriately determined weights that will determine the estimator form.

In Appendix A, detailed derivations have been given.
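As an illustration of the fresh-sample calibration problem, the sketch below solves the chi-square minimization with its two linear constraints by Lagrange multipliers. It is a minimal sketch under stated assumptions, not the derivation of Appendix A: the stationarity condition gives $\Omega_k = W_k + Q_k W_k(\lambda_1 + \lambda_2 c_k)$, and the two multipliers follow from a 2×2 linear system. The function name, the stratum coefficients of variation `c` and the target `C_Z` are hypothetical; only the weights $W_k$ are taken from Table 2.

```python
def calibrate_two_constraints(W, Q, c, C_target):
    """Minimise sum((Omega - W)^2 / (Q * W)) subject to
    sum(Omega) = 1 and sum(Omega * c) = C_target.

    Closed form via Lagrange multipliers:
        Omega_k = W_k + Q_k * W_k * (lam1 + lam2 * c_k).
    """
    d = [q * w for q, w in zip(Q, W)]                 # d_k = Q_k * W_k
    A = sum(d)
    B = sum(dk * ck for dk, ck in zip(d, c))
    D = sum(dk * ck * ck for dk, ck in zip(d, c))
    g1 = 1.0 - sum(W)                                 # gap in the sum constraint
    g2 = C_target - sum(w * ck for w, ck in zip(W, c))  # gap in the CV constraint
    det = A * D - B * B
    if det == 0.0:
        raise ValueError("constraints are linearly dependent")
    lam1 = (g1 * D - g2 * B) / det
    lam2 = (A * g2 - B * g1) / det
    return [w + dk * (lam1 + lam2 * ck) for w, dk, ck in zip(W, d, c)]


W = [0.2631579, 0.3947368, 0.1315789, 0.2105263]  # stratum weights from Table 2
Q = [1.0, 1.0, 1.0, 1.0]                          # Q_k = 1.0 for every stratum
c = [0.30, 0.33, 0.29, 0.35]                      # hypothetical sample CVs of Z
C_Z = 0.32                                        # hypothetical population CV of Z

omega = calibrate_two_constraints(W, Q, c, C_Z)
print(omega)
print(sum(omega), sum(o * ck for o, ck in zip(omega, c)))
```

The closed form does not force the calibrated weights to be non-negative, so in practice the result should be inspected before use. The matched-sample problem has one more linear constraint and leads to a 3×3 system of the same kind.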

We propose an estimator of the population variance of the study variable on the current (second) occasion of two-occasion successive sampling. The proposed estimator $T$ is a combination of the two estimators $T_u$ and $T_m$ through a scalar $\phi$ ($0 \le \phi \le 1$) chosen to minimize the mean square error of $T$, i.e.

$$T = \phi T_u + (1 - \phi) T_m$$

If we want to estimate the population variance on each occasion, we choose ϕ=1 and use Tu. If we want to estimate the change from one occasion to the next, we choose ϕ=0 and use Tm. To deal with both problems at the same time, we need to determine an optimum value for ϕ.

7. Properties of the estimator that has been proposed

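The first-order bias and mean squared errors of the proposed estimators $T_u$ and $T_m$ are derived for large samples using the following transformations:

$$s_{x(n_k - r_{1k})}^{2} = S_{XN_k}^{2}(1 + \epsilon_{0k}), \quad s_{zn_k}^{2} = S_{ZN_k}^{2}(1 + \epsilon_{1k}), \quad s_{x(m_k - r_{2k})}^{2} = S_{XN_k}^{2}(1 + \epsilon_{2k}),$$

$$s_{zm_k}^{2} = S_{ZN_k}^{2}(1 + \epsilon_{3k}), \quad s_{y(n_k - r_{1k})}^{2} = S_{YN_k}^{2}(1 + \epsilon_{4k}), \quad s_{y(u_k - r_{3k})}^{2} = S_{YN_k}^{2}(1 + \epsilon_{5k}),$$

$$s_{zu_k}^{2} = S_{ZN_k}^{2}(1 + \epsilon_{6k}), \quad s_{y(m_k - r_{2k})}^{2} = S_{YN_k}^{2}(1 + \epsilon_{7k}).$$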

Based on the calculations, the bias of the suggested estimators Tm and Tu, as well as the mean squared error (MSE) of Tm and Tu, have been computed accurately up to the first order of approximation.

The bias of Tm:

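$$\mathrm{Bias}(T_m) = \sum_{k=1}^{L} a_k \Omega_k^{2}\left[\frac{S_{XN_k}^{2}}{2} f_{4k}(\lambda_{202k} - 1) - \frac{S_{XN_k}^{4}}{2} f_{5k}(\lambda_{400k} - 1)\right] \tag{5}$$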

Mean Squared Error (MSE) of Tm:

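$$\mathrm{MSE}(T_m) = \sum_{k=1}^{L} \Omega_k^{4}\left[f_{3k} S_{YN_k}^{4}(\lambda_{040k} - 1) + a_k^{2} f_{5k} S_{XN_k}^{4}(\lambda_{400k} - 1) + a_k^{2} f_{2k}(\lambda_{004k} - 1) - 2a_k^{2} f_{4k} S_{XN_k}^{2}(\lambda_{202k} - 1) - 2a_k f_{1k} S_{YN_k}^{2}(\lambda_{022k} - 1)\right] \tag{6}$$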

and bias of Tu:

$$\mathrm{Bias}(T_u) = \sum_{k=1}^{L} b_k \Omega_k^{2} f_{7k}(\lambda_{004k} - 1) \tag{7}$$

Mean Squared Error (MSE) of Tu:

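$$\mathrm{MSE}(T_u) = \sum_{k=1}^{L} \Omega_k^{4}\left[f_{6k} S_{YN_k}^{4}(\lambda_{040k} - 1) + b_k^{2} f_{7k}(\lambda_{004k} - 1) - 2b_k f_{7k} S_{YN_k}^{2}(\lambda_{022k} - 1)\right] \tag{8}$$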

where

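$$\begin{aligned}
f_{1k} &= \left(\frac{1}{n_k q_1 + 2p_1} - \frac{1}{n_k}\right), & f_{2k} &= \left(\frac{1}{m_k} - \frac{1}{n_k}\right), & f_{3k} &= \left(\frac{1}{n_k q_1 + 2p_1} - \frac{1}{N_k}\right), \\
f_{4k} &= \left(\frac{1}{n_k q_1 + 2p_1} - \frac{1}{m_k}\right), & f_{5k} &= \left(\frac{1}{m_k q_2 + 2p_2} - \frac{1}{n_k q_1 + 2p_1}\right), & f_{6k} &= \left(\frac{1}{u_k q_3 + 2p_3} - \frac{1}{N_k}\right), \\
f_{7k} &= \left(\frac{1}{u_k} - \frac{1}{N_k}\right), & f_{8k} &= \left(\frac{1}{n_k} - \frac{1}{N_k}\right), & f_{9k} &= \left(\frac{1}{m_k} - \frac{1}{N_k}\right), \\
f_{10k} &= \left(\frac{1}{m_k q_2 + 2p_2} - \frac{1}{N_k}\right)
\end{aligned}$$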

and

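$$\lambda_{\alpha\beta\gamma k} = \frac{\mu_{\alpha\beta\gamma k}}{\mu_{200k}^{\alpha/2}\,\mu_{020k}^{\beta/2}\,\mu_{002k}^{\gamma/2}}, \quad \mu_{\alpha\beta\gamma k} = \frac{1}{N_k - 1}\sum_{j=1}^{N_k}(X_{kj} - \bar{X}_{N_k})^{\alpha}(Y_{kj} - \bar{Y}_{N_k})^{\beta}(Z_{kj} - \bar{Z}_{N_k})^{\gamma}$$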

In Appendix B, detailed derivations have been given.
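The moment ratios $\lambda_{\alpha\beta\gamma k}$ that drive the bias and MSE expressions can be computed directly from stratum data. The sketch below is illustrative only and rests on two assumptions: the three indices refer to $X$, $Y$ and $Z$ in that order (as the use of $\lambda_{040k}$ with $S_{YN_k}^{4}$ and $\lambda_{400k}$ with $S_{XN_k}^{4}$ in Eqs. (6) and (8) suggests), and each second-order moment in the denominator is raised to half the corresponding index. The function names and data are hypothetical.

```python
def central_moment(x, y, z, a, b, c):
    """mu_{abc}: joint central moment with divisor (N_k - 1)."""
    n = len(x)
    mx, my, mz = sum(x) / n, sum(y) / n, sum(z) / n
    total = sum(
        (xi - mx) ** a * (yi - my) ** b * (zi - mz) ** c
        for xi, yi, zi in zip(x, y, z)
    )
    return total / (n - 1)


def moment_ratio(x, y, z, a, b, c):
    """lambda_{abc} = mu_{abc} / (mu_200^(a/2) * mu_020^(b/2) * mu_002^(c/2))."""
    mu = central_moment(x, y, z, a, b, c)
    m200 = central_moment(x, y, z, 2, 0, 0)
    m020 = central_moment(x, y, z, 0, 2, 0)
    m002 = central_moment(x, y, z, 0, 0, 2)
    return mu / (m200 ** (a / 2) * m020 ** (b / 2) * m002 ** (c / 2))


# Hypothetical stratum values of X, Y and Z.
X = [10.0, 12.0, 9.0, 14.0, 11.0, 13.0, 8.0, 12.5]
Y = [11.0, 13.5, 9.5, 15.0, 11.5, 14.0, 8.5, 13.0]
Z = [5.0, 6.1, 4.4, 7.2, 5.6, 6.8, 4.1, 6.3]

lam_040 = moment_ratio(X, Y, Z, 0, 4, 0)  # kurtosis-type ratio of Y
lam_022 = moment_ratio(X, Y, Z, 0, 2, 2)  # joint ratio of Y and Z
print(lam_040, lam_022)
```

By construction $\lambda_{200k} = \lambda_{020k} = \lambda_{002k} = 1$, which is a convenient sanity check on any implementation.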

Thus, we may obtain the bias and MSE of estimator T by combining the biases and MSEs of two non-overlapping samples Tu and Tm, as shown below:

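$$\mathrm{Bias}(T) = \phi\,\mathrm{Bias}(T_u) + (1 - \phi)\,\mathrm{Bias}(T_m) \tag{9}$$

and

$$\mathrm{MSE}(T) = \phi^{2}\,\mathrm{MSE}(T_u) + (1 - \phi)^{2}\,\mathrm{MSE}(T_m) \tag{10}$$

where Equations (5), (7), (6) and (8) give the expressions for $\mathrm{Bias}(T_m)$, $\mathrm{Bias}(T_u)$, $\mathrm{MSE}(T_m)$ and $\mathrm{MSE}(T_u)$, respectively. Because $T_u$ and $T_m$ are based on non-overlapping samples of sizes $u$ and $m$, respectively, the covariance term is of order $N^{-1}$ and is ignored, i.e. $\mathrm{Cov}(T_u, T_m) = 0$.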

8. The estimator's minimum mean squared error (MSE)

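The MSEs of $T_m$ and $T_u$ depend on $a_k$ and $b_k$. Minimizing the MSEs in Equations (6) and (8) with respect to $a_k$ and $b_k$ gives the optimal values

$$a_{k\,\mathrm{opt}} = \frac{f_{1k} S_{YN_k}^{2}(\lambda_{022k} - 1)}{f_{5k} S_{XN_k}^{4}(\lambda_{400k} - 1) + f_{2k}(\lambda_{004k} - 1) - 2f_{4k} S_{XN_k}^{2}(\lambda_{202k} - 1)} \tag{11}$$

and

$$b_{k\,\mathrm{opt}} = \frac{S_{YN_k}^{2}(\lambda_{022k} - 1)}{(\lambda_{004k} - 1)} \tag{12}$$

Substituting $a_{k\,\mathrm{opt}}$ from Eq. (11) into Eq. (6) and $b_{k\,\mathrm{opt}}$ from Eq. (12) into Eq. (8) gives the minimum MSEs of $T_m$ and $T_u$:

$$\mathrm{Min\,MSE}(T_m) = \sum_{k=1}^{L} \Omega_k^{4} S_{YN_k}^{4}\left[f_{3k}(\lambda_{040k} - 1) - \frac{f_{1k}^{2}(\lambda_{022k} - 1)^{2}}{f_{5k} S_{XN_k}^{4}(\lambda_{400k} - 1) + f_{2k}(\lambda_{004k} - 1) - 2f_{4k} S_{XN_k}^{2}(\lambda_{202k} - 1)}\right] \tag{13}$$

and

$$\mathrm{Min\,MSE}(T_u) = \sum_{k=1}^{L} \Omega_k^{4} S_{YN_k}^{4}\left[f_{6k}(\lambda_{040k} - 1) - \frac{f_{7k}(\lambda_{022k} - 1)^{2}}{\lambda_{004k} - 1}\right] \tag{14}$$

The MSE of $T$ depends on $\phi$. Minimizing it with respect to $\phi$ gives the optimal value

$$\phi_{\mathrm{opt}} = \frac{\mathrm{Min\,MSE}(T_m)}{\mathrm{Min\,MSE}(T_m) + \mathrm{Min\,MSE}(T_u)} \tag{15}$$

Using $\phi_{\mathrm{opt}}$, the optimum MSE of $T$ is

$$\mathrm{MSE}(T)_{\mathrm{opt}} = \frac{\mathrm{Min\,MSE}(T_m)\,\mathrm{Min\,MSE}(T_u)}{\mathrm{Min\,MSE}(T_m) + \mathrm{Min\,MSE}(T_u)} \tag{16}$$

Equations (13) and (14) give the expressions for $\mathrm{Min\,MSE}(T_m)$ and $\mathrm{Min\,MSE}(T_u)$, respectively.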
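For readers who wish to verify Eqs. (15) and (16), the short derivation below fills in the intermediate step. It is a sketch that writes $A = \mathrm{Min\,MSE}(T_m)$ and $B = \mathrm{Min\,MSE}(T_u)$ for brevity (this shorthand is introduced here and is not the paper's notation) and uses the combined MSE of Eq. (10) with the covariance term ignored.

```latex
\begin{align*}
\mathrm{MSE}(T) &= \phi^{2} B + (1-\phi)^{2} A, \\
\frac{\partial\,\mathrm{MSE}(T)}{\partial \phi} &= 2\phi B - 2(1-\phi) A = 0
  \;\Longrightarrow\; \phi_{\mathrm{opt}} = \frac{A}{A+B}, \\
\frac{\partial^{2}\,\mathrm{MSE}(T)}{\partial \phi^{2}} &= 2(A+B) > 0
  \quad \text{(so the stationary point is a minimum)}, \\
\mathrm{MSE}(T)_{\mathrm{opt}} &= \frac{A^{2}B}{(A+B)^{2}} + \frac{B^{2}A}{(A+B)^{2}}
  = \frac{AB(A+B)}{(A+B)^{2}} = \frac{AB}{A+B}.
\end{align*}
```

Since $AB/(A+B)$ is smaller than both $A$ and $B$ whenever both are positive, the combined estimator is, to this order of approximation, at least as precise as either component.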

9. Empirical study

Before employing an estimator in practical situations, it is crucial to assess its performance based on its inherent properties. In light of this, an empirical analysis has been carried out in this section utilizing both real and simulated data to evaluate the suggested estimator.

To this end, we compare the suggested estimator $T$ with the standard estimator $\tau$, which is defined in the same manner and also handles random non-response. The standard estimator serves as the benchmark because no other estimator has been proposed for population variance in stratified successive sampling under non-response.

$$\tau = \psi\,\tau_u + (1 - \psi)\,\tau_m$$

where $\tau_u = \sum_{k=1}^{L}\Omega_k^{2}\, s_{y(u_k - r_{3k})}^{2}$, $\tau_m = \sum_{k=1}^{L}\Omega_k^{2}\, s_{y(m_k - r_{2k})}^{2}$, and $\psi$ ($0 \le \psi \le 1$) is an unknown constant determined by minimizing the mean squared error (MSE) of the estimator $\tau$.

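$$\mathrm{MSE}(\tau_m) = \sum_{k=1}^{L} \Omega_k^{4} S_{YN_k}^{4}\left(\frac{1}{m_k q_2 + 2p_2} - \frac{1}{N_k}\right)(\lambda_{040k} - 1) \tag{17}$$

and

$$\mathrm{MSE}(\tau_u) = \sum_{k=1}^{L} \Omega_k^{4} S_{YN_k}^{4}\left(\frac{1}{u_k q_3 + 2p_3} - \frac{1}{N_k}\right)(\lambda_{040k} - 1) \tag{18}$$

The minimum mean squared error of the estimator $\tau$, obtained by combining (17) and (18) up to the first order of approximation, may be expressed as:

$$\mathrm{MSE}(\tau) = \frac{\mathrm{MSE}(\tau_u)\,\mathrm{MSE}(\tau_m)}{\mathrm{MSE}(\tau_u) + \mathrm{MSE}(\tau_m)} \tag{19}$$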

The proposed estimator $T$ is evaluated through its percentage relative efficiency (PRE) with respect to the estimator $\tau$, calculated as

$$\mathrm{PRE} = \frac{\mathrm{MSE}(\tau)}{\mathrm{MSE}(T)_{\mathrm{opt}}} \times 100, \tag{20}$$

where $\mathrm{MSE}(T)_{\mathrm{opt}}$ and $\mathrm{MSE}(\tau)$ are defined in Equations (16) and (19), respectively.
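The PRE computation can be sketched in a few lines. The example below is illustrative: it combines the component MSEs as in Eqs. (16) and (19) and then forms the ratio in Eq. (20). The function names are hypothetical, and the four input MSE values are made-up numbers, not results from the paper.

```python
def combined_min_mse(mse_a, mse_b):
    """Minimum MSE of an optimally weighted combination of two
    uncorrelated estimators: product over sum, as in Eqs. (16) and (19)."""
    return mse_a * mse_b / (mse_a + mse_b)


def percent_relative_efficiency(mse_tau_u, mse_tau_m, min_mse_Tu, min_mse_Tm):
    """PRE of T with respect to tau, following Eq. (20)."""
    mse_tau = combined_min_mse(mse_tau_u, mse_tau_m)
    mse_T_opt = combined_min_mse(min_mse_Tu, min_mse_Tm)
    return 100.0 * mse_tau / mse_T_opt


# Hypothetical component MSEs (not taken from the paper's tables).
pre = percent_relative_efficiency(
    mse_tau_u=4.0, mse_tau_m=6.0, min_mse_Tu=3.0, min_mse_Tm=4.0
)
print(pre)
```

A PRE above 100 indicates that $T$ has the smaller MSE, which is the pattern reported throughout the simulation tables.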

The following choices of $Q_k$ have been considered:

  • Case I: $Q_k = 1.0$

  • Case II: $Q_k = \frac{1}{W_k}$

  • Case III: $Q_k = \bar{Z}_{N_k}$

  • Case IV: $Q_k = S_{ZN_k}^{2}$

The resulting calibrated stratum weights are shown in Tables (2), (9) and (16), respectively.
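As a small worked check, the stratum weights $W_k = N_k/N$ and the choice $Q_k = 1/W_k$ can be computed from the stratum sizes listed in Table 1. The sketch below uses only those four stratum sizes; the variable names are illustrative.

```python
# Stratum sizes N_k for the Poisson simulation, as listed in Table 1.
N_k = [10000, 15000, 5000, 8000]
N = sum(N_k)

W = [size / N for size in N_k]          # original stratum weights W_k = N_k / N
Q_inverse_weight = [1.0 / w for w in W]  # the choice Q_k = 1 / W_k

print([round(w, 7) for w in W])
print([round(q, 2) for q in Q_inverse_weight])
```

The rounded weights agree with the $W_k$ column of Table 2, and the rounded $Q_k$ values agree with those quoted in the caption of Table 4.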

9.1. Simulation study

We generated data in line with our theoretical set-up using the statistical computing software R. To generate data with specified parameters and correlation coefficients for the study and auxiliary variables, we used the mvrnorm function from the MASS package and the genCorGen function from the simstudy package. The population parameters for the data generated from a Poisson distribution are given in Table 1, and those for the data generated from a Normal distribution are given in Table 8. Simulations were conducted to analyze the effect of the controlling parameter Qk, both with and without random non-response. The results of these simulations are presented in Tables 2 to 14.
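For readers without R, the idea of generating a stratum with a common pairwise correlation among X, Y and Z can be sketched in plain Python. This is a swapped-in technique, not the authors' R code: it builds three standard Normal variables from one shared factor so that every pair has correlation ρ, and it does not reproduce mvrnorm or genCorGen. The function names, the seed and the stratum size are hypothetical.

```python
import random
from math import sqrt


def equicorrelated_normals(n, rho, seed=0):
    """Generate n triples (x, y, z) of standard Normal values whose
    pairwise correlation is rho, using a single shared factor."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        shared = rng.gauss(0.0, 1.0)
        rows.append(tuple(
            sqrt(rho) * shared + sqrt(1.0 - rho) * rng.gauss(0.0, 1.0)
            for _ in range(3)
        ))
    return rows


def correlation(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    va = sum((ai - ma) ** 2 for ai in a)
    vb = sum((bi - mb) ** 2 for bi in b)
    return cov / sqrt(va * vb)


# Hypothetical stratum of 20000 units with target correlation 0.9.
data = equicorrelated_normals(20000, rho=0.9, seed=1)
x, y, z = (list(col) for col in zip(*data))
r_xy = correlation(x, y)
r_yz = correlation(y, z)
print(r_xy, r_yz)
```

Shifting and scaling each column then gives any desired stratum means and variances without changing the correlations.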

Table 1.

Population parameters of the simulated data generated from a Poisson distribution.

Stratum Nk nk r1k mk r2k uk ρxy ρyz ρzx
1 10000 3000 500 2000 400 1000 0.9 0.9 0.9
2 15000 3000 500 2000 400 1000 0.8 0.8 0.8
3 5000 2000 400 1200 300 800 0.8 0.8 0.8
4 8000 800 50 500 50 300 0.6 0.6 0.6

Table 8.

Population parameters of the simulated data generated from a Normal distribution.

Stratum Nk nk r1k mk r2k uk ρxy ρyz ρzx
1 12000 3600 600 2400 480 1200 0.95 0.95 0.95
2 17000 3600 600 2400 480 1200 0.85 0.85 0.85
3 6000 2400 480 1440 360 960 0.85 0.85 0.85
4 9000 900 57 563 57 337 0.65 0.65 0.65

Table 2.

Calibrated strata weights for simulated data following a Poisson distribution.

Case Stratum Wk Ωk(mk-based) Ωk(uk-based)
I 1 0.2631579 0.30121385 0.3129923
I 2 0.3947368 0.55203331 0.2801177
I 3 0.1315789 0.08592402 0.1564961
I 4 0.2105263 0.06082881 0.2503938
II 1 0.2631579 0.33327331 0.3013643
II 2 0.3947368 0.52933590 0.2801177
II 3 0.1315789 0.07974788 0.1697853
II 4 0.2105263 0.05764291 0.2487327
III 1 0.2631579 0.30140728 0.3132156
III 2 0.3947368 0.55189637 0.2801177
III 3 0.1315789 0.08588676 0.1564564
III 4 0.2105263 0.06080959 0.2502102
IV 1 0.2631579 0.30107846 0.3124016
IV 2 0.3947368 0.55212917 0.2801177
IV 3 0.1315789 0.08595011 0.1565251
IV 4 0.2105263 0.06084227 0.2509556

Table 3.

Bias and PRE for Case I (Q1 = 1.0, Q2 = 1.0, Q3 = 1.0, and Q4 = 1.0), obtained from simulated data following a Poisson distribution, where p1, p2, and p3 represent the probabilities of non-response among the (nk - 2), (mk - 2), and (uk - 2) possible values of non-response, respectively.

p1 p2 PRE(p3=0.05) PRE(p3=0.10) PRE(p3=0.15) PRE(p3=0.20) Bias(p3=0.05) Bias(p3=0.10) Bias(p3=0.15) Bias(p3=0.20)
0.05 0.05 142.9 141.9 141.1 140.4 0.0001505 0.0001462 0.0001416 0.0001367
0.05 0.10 146.1 145.3 144.6 144.0 0.0001505 0.0001462 0.0001416 0.0001367
0.05 0.15 149.5 148.7 148.1 147.7 0.0001505 0.0001462 0.0001416 0.0001367
0.05 0.20 152.9 152.3 151.8 151.6 0.0001505 0.0001462 0.0001416 0.0001367
0.10 0.05 139.3 138.2 137.3 136.5 0.0001505 0.0001462 0.0001416 0.0001367
0.10 0.10 142.5 141.5 140.6 140.0 0.0001505 0.0001462 0.0001416 0.0001367
0.10 0.15 145.8 144.9 144.1 143.6 0.0001505 0.0001462 0.0001416 0.0001367
0.10 0.20 149.1 148.3 147.7 147.3 0.0001505 0.0001462 0.0001416 0.0001367
0.15 0.05 136.1 134.9 133.8 132.9 0.0001505 0.0001462 0.0001416 0.0001367
0.15 0.10 139.2 138.1 137.1 136.3 0.0001505 0.0001462 0.0001416 0.0001367
0.15 0.15 142.4 141.4 140.5 139.8 0.0001505 0.0001462 0.0001416 0.0001367
0.15 0.20 145.7 144.8 144.0 143.4 0.0001505 0.0001462 0.0001416 0.0001367
0.20 0.05 133.2 131.9 130.7 129.7 0.0001505 0.0001462 0.0001416 0.0001367
0.20 0.10 136.3 135.0 133.9 133.0 0.0001505 0.0001462 0.0001416 0.0001367
0.20 0.15 139.4 138.2 137.3 136.4 0.0001505 0.0001462 0.0001416 0.0001367
0.20 0.20 142.6 141.6 140.7 140.0 0.0001505 0.0001462 0.0001416 0.0001367

Table 4.

Bias and PRE for Case II (Q1 = 3.80, Q2 = 2.53, Q3 = 7.60, and Q4 = 4.75), obtained from simulated data following a Poisson distribution.

p1 p2 PRE(p3=0.05) PRE(p3=0.10) PRE(p3=0.15) PRE(p3=0.20) Bias(p3=0.05) Bias(p3=0.10) Bias(p3=0.15) Bias(p3=0.20)
0.05 0.05 143.0 142.1 141.4 140.8 0.000144 0.0001399 0.0001353 0.0001305
0.05 0.10 146.4 145.6 145.0 144.6 0.000144 0.0001399 0.0001353 0.0001305
0.05 0.15 149.9 149.3 148.8 148.5 0.000144 0.0001399 0.0001353 0.0001305
0.05 0.20 153.5 153.0 152.7 152.5 0.000144 0.0001399 0.0001353 0.0001305
0.10 0.05 139.2 138.3 137.4 136.7 0.000144 0.0001399 0.0001353 0.0001305
0.10 0.10 142.6 141.7 140.9 140.3 0.000144 0.0001399 0.0001353 0.0001305
0.10 0.15 146.0 145.2 144.6 144.1 0.000144 0.0001399 0.0001353 0.0001305
0.10 0.20 149.5 148.9 148.4 148.0 0.000144 0.0001399 0.0001353 0.0001305
0.15 0.05 135.9 134.8 133.8 133.0 0.000144 0.0001399 0.0001353 0.0001305
0.15 0.10 139.1 138.1 137.2 136.5 0.000144 0.0001399 0.0001353 0.0001305
0.15 0.15 142.5 141.6 140.8 140.2 0.000144 0.0001399 0.0001353 0.0001305
0.15 0.20 145.9 145.1 144.5 144.0 0.000144 0.0001399 0.0001353 0.0001305
0.20 0.05 132.9 131.7 130.6 129.6 0.000144 0.0001399 0.0001353 0.0001305
0.20 0.10 136.1 134.9 133.9 133.1 0.000144 0.0001399 0.0001353 0.0001305
0.20 0.15 139.3 138.3 137.4 136.6 0.000144 0.0001399 0.0001353 0.0001305
0.20 0.20 142.7 141.8 141.0 140.4 0.000144 0.0001399 0.0001353 0.0001305

Table 5.

Bias and PRE for Case III (Q1 = 10.141, Q2 = 10.07, Q3 = 10.08, and Q4 = 10.05), obtained from simulated data following a Poisson distribution.

p1 p2 PRE(p3=0.05) PRE(p3=0.10) PRE(p3=0.15) PRE(p3=0.20) Bias(p3=0.05) Bias(p3=0.10) Bias(p3=0.15) Bias(p3=0.20)
0.05 0.05 142.8901 141.9 141.1 140.5 0.0001505 0.0001462 0.0001416 0.0001367
0.05 0.10 146.1507 145.3 144.6 144.1 0.0001505 0.0001462 0.0001416 0.0001367
0.05 0.15 149.5087 148.8 148.2 147.8 0.0001505 0.0001462 0.0001416 0.0001367
0.05 0.20 152.9686 152.3 151.9 151.6 0.0001505 0.0001462 0.0001416 0.0001367
0.10 0.05 139.3401 138.3 137.3 136.5 0.0001505 0.0001462 0.0001416 0.0001367
0.10 0.10 142.5191 141.5 140.7 140.0 0.0001505 0.0001462 0.0001416 0.0001367
0.10 0.15 145.7932 144.9 144.2 143.6 0.0001505 0.0001462 0.0001416 0.0001367
0.10 0.20 149.1664 148.4 147.8 147.3 0.0001505 0.0001462 0.0001416 0.0001367
0.15 0.05 136.1238 134.9 133.8 132.9 0.0001505 0.0001462 0.0001416 0.0001367
0.15 0.10 139.2287 138.1 137.1 136.3 0.0001505 0.0001462 0.0001416 0.0001367
0.15 0.15 142.4261 141.4 140.5 139.8 0.0001505 0.0001462 0.0001416 0.0001367
0.15 0.20 145.7203 144.8 144.0 143.5 0.0001505 0.0001462 0.0001416 0.0001367
0.20 0.05 133.2532 131.9 130.8 129.7 0.0001505 0.0001462 0.0001416 0.0001367
0.20 0.10 136.2912 135.1 134.0 133.0 0.0001505 0.0001462 0.0001416 0.0001367
0.20 0.15 139.4197 138.3 137.3 136.4 0.0001505 0.0001462 0.0001416 0.0001367
0.20 0.20 142.6426 141.6 140.7 140.0 0.0001505 0.0001462 0.0001416 0.0001367

Table 6.

Bias and PRE for Case IV (Q1 = 9.93, Q2 = 10.05, Q3 = 10.06, and Q4 = 10.19), obtained from simulated data following a Poisson distribution.

p1 p2 PRE(p3=0.05) PRE(p3=0.10) PRE(p3=0.15) PRE(p3=0.20) Bias(p3=0.05) Bias(p3=0.10) Bias(p3=0.15) Bias(p3=0.20)
0.05 0.05 142.8 141.8 141.0 140.4 0.0001505 0.0001461 0.0001415 0.0001367
0.05 0.10 146.0 145.2 144.5 144.0 0.0001505 0.0001461 0.0001415 0.0001367
0.05 0.15 149.4 148.6 148.1 147.7 0.0001505 0.0001461 0.0001415 0.0001367
0.05 0.20 152.8 152.2 151.8 151.5 0.0001505 0.0001461 0.0001415 0.0001367
0.10 0.05 139.2 138.1 137.2 136.4 0.0001505 0.0001461 0.0001415 0.0001367
0.10 0.10 142.4 141.4 140.6 139.9 0.0001505 0.0001461 0.0001415 0.0001367
0.10 0.15 145.7 144.8 144.1 143.5 0.0001505 0.0001461 0.0001415 0.0001367
0.10 0.20 149.0 148.3 147.7 147.2 0.0001505 0.0001461 0.0001415 0.0001367
0.15 0.05 136.0 134.8 133.7 132.8 0.0001505 0.0001461 0.0001415 0.0001367
0.15 0.10 139.1 138.0 137.0 136.2 0.0001505 0.0001461 0.0001415 0.0001367
0.15 0.15 142.3 141.3 140.4 139.7 0.0001505 0.0001461 0.0001415 0.0001367
0.15 0.20 145.6 144.7 143.9 143.4 0.0001505 0.0001461 0.0001415 0.0001367
0.20 0.05 133.1 131.8 130.7 129.6 0.0001505 0.0001461 0.0001415 0.0001367
0.20 0.10 136.1 134.9 133.9 132.9 0.0001505 0.0001461 0.0001415 0.0001367
0.20 0.15 139.3 138.2 137.2 136.3 0.0001505 0.0001461 0.0001415 0.0001367
0.20 0.20 142.5 141.5 140.6 139.9 0.0001505 0.0001461 0.0001415 0.0001367

Table 7.

Bias and PRE in the absence of non-response (p1 = p2 = p3 = 0), obtained from simulated data following a Poisson distribution.

Case PRE Bias
Case I 144.5398 0.0001416434
Case II 144.5906 0.0001352958
Case III 144.5827 0.0001416345
Case IV 144.4316 0.0001415639

Table 9.

Calibrated strata weights for simulated data following a Normal distribution.

Case Stratum Wk Ωk(mk-based) Ωk(uk-based)
I 1 0.2727273 0.3808514 0.2609850
I 2 0.3863636 0.1446900 0.3997344
I 3 0.1363636 0.1269082 0.1304925
I 4 0.2045455 0.3475505 0.2087880
II 1 0.2727273 0.3558239 0.2614920
II 2 0.3863636 0.1629070 0.3997344
II 3 0.1363636 0.1002457 0.1299131
II 4 0.2045455 0.3810234 0.2088605
III 1 0.2727273 0.3825238 0.2607414
III 2 0.3863636 0.1434726 0.3997344
III 3 0.1363636 0.1286899 0.1306599
III 4 0.2045455 0.3453136 0.2088643
IV 1 0.2727273 0.3815501 0.2609781
IV 2 0.3863636 0.1441814 0.3997344
IV 3 0.1363636 0.1276526 0.1304788
IV 4 0.2045455 0.3466160 0.2088087

Table 10.

Bias and PRE for Case I (Q1 = 1.0, Q2 = 1.0, Q3 = 1.0, and Q4 = 1.0), obtained from simulated data following a Normal distribution, where p1, p2, and p3 represent the probabilities of non-response among the (nk - 2), (mk - 2), and (uk - 2) possible values of non-response, respectively.

p1 p2 PRE(p3=0.05) PRE(p3=0.10) PRE(p3=0.15) PRE(p3=0.20) Bias(p3=0.05) Bias(p3=0.10) Bias(p3=0.15) Bias(p3=0.20)
0.05 0.05 177.6 172.9 168.8 165.3 0.0004008 0.0003805 0.0003597 0.0003385
0.05 0.10 182.6 177.9 173.8 170.4 0.0004008 0.0003805 0.0003597 0.0003385
0.05 0.15 187.8 183.1 179.1 175.8 0.0004008 0.0003805 0.0003597 0.0003385
0.05 0.20 193.2 188.6 184.7 181.4 0.0004008 0.0003805 0.0003597 0.0003385
0.10 0.05 172.9 168.1 163.8 160.2 0.0004008 0.0003805 0.0003597 0.0003385
0.10 0.10 177.8 172.9 168.7 165.1 0.0004008 0.0003805 0.0003597 0.0003385
0.10 0.15 182.8 178.0 173.8 170.3 0.0004008 0.0003805 0.0003597 0.0003385
0.10 0.20 188.1 183.3 179.2 175.7 0.0004008 0.0003805 0.0003597 0.0003385
0.15 0.05 168.5 163.5 159.1 155.3 0.0004008 0.0003805 0.0003597 0.0003385
0.15 0.10 173.2 168.2 163.8 160.1 0.0004008 0.0003805 0.0003597 0.0003385
0.15 0.15 178.1 173.1 168.8 165.1 0.0004008 0.0003805 0.0003597 0.0003385
0.15 0.20 183.2 178.3 174.0 170.3 0.0004008 0.0003805 0.0003597 0.0003385
0.20 0.05 164.3 159.2 154.7 150.8 0.0004008 0.0003805 0.0003597 0.0003385
0.20 0.10 168.9 163.7 159.2 155.3 0.0004008 0.0003805 0.0003597 0.0003385
0.20 0.15 173.6 168.5 164.0 160.2 0.0004008 0.0003805 0.0003597 0.0003385
0.20 0.20 178.6 173.5 169.0 165.2 0.0004008 0.0003805 0.0003597 0.0003385

Table 11.

Case II (Q1 = 3.80, Q2 = 2.53, Q3 = 7.60, Q4 = 4.75): Bias and PRE obtained from simulated data following a Normal distribution.

p1 p2 PRE(p3=0.05) PRE(p3=0.10) PRE(p3=0.15) PRE(p3=0.20) Bias(p3=0.05) Bias(p3=0.10) Bias(p3=0.15) Bias(p3=0.20)
0.05 0.05 178.7 173.4 168.8 164.9 0.0004473 0.0004280 0.000408 0.0003874
0.05 0.10 183.0 177.7 173.2 169.3 0.0004473 0.0004280 0.000408 0.0003874
0.05 0.15 187.4 182.2 177.7 173.8 0.0004473 0.0004280 0.000408 0.0003874
0.05 0.20 192.0 186.8 182.4 178.6 0.0004473 0.0004280 0.000408 0.0003874
0.10 0.05 174.7 169.3 164.6 160.5 0.0004473 0.0004280 0.000408 0.0003874
0.10 0.10 178.9 173.5 168.8 164.7 0.0004473 0.0004280 0.000408 0.0003874
0.10 0.15 183.2 177.8 173.2 169.2 0.0004473 0.0004280 0.000408 0.0003874
0.10 0.20 187.7 182.3 177.7 173.8 0.0004473 0.0004280 0.000408 0.0003874
0.15 0.05 170.9 165.4 160.5 156.2 0.0004473 0.0004280 0.000408 0.0003874
0.15 0.10 175.0 169.4 164.6 160.4 0.0004473 0.0004280 0.000408 0.0003874
0.15 0.15 179.2 173.6 168.8 164.7 0.0004473 0.0004280 0.000408 0.0003874
0.15 0.20 183.6 178.1 173.3 169.2 0.0004473 0.0004280 0.000408 0.0003874
0.20 0.05 167.3 161.6 156.6 152.2 0.0004473 0.0004280 0.000408 0.0003874
0.20 0.10 171.2 165.6 160.6 156.2 0.0004473 0.0004280 0.000408 0.0003874
0.20 0.15 175.3 169.7 164.7 160.4 0.0004473 0.0004280 0.000408 0.0003874
0.20 0.20 179.6 173.9 169.0 164.7 0.0004473 0.0004280 0.000408 0.0003874

Table 12.

Case III (Q1 = 11.764729, Q2 = 9.509398, Q3 = 8.948854, Q4 = 10.113946): Bias and PRE obtained from simulated data following a Normal distribution.

p1 p2 PRE(p3=0.05) PRE(p3=0.10) PRE(p3=0.15) PRE(p3=0.20) Bias(p3=0.05) Bias(p3=0.10) Bias(p3=0.15) Bias(p3=0.20)
0.05 0.05 177.5 172.8 168.8 165.3 0.0003978 0.000377 0.0003566 0.0003354
0.05 0.10 182.5 177.8 173.9 170.5 0.0003978 0.000377 0.0003566 0.0003354
0.05 0.15 187.8 183.1 179.2 175.9 0.0003978 0.000377 0.0003566 0.0003354
0.05 0.20 193.3 188.7 184.8 181.6 0.0003978 0.000377 0.0003566 0.0003354
0.10 0.05 172.8 168.0 163.8 160.2 0.0003978 0.000377 0.0003566 0.0003354
0.10 0.10 177.7 172.8 168.7 165.1 0.0003978 0.000377 0.0003566 0.0003354
0.10 0.15 182.8 178.0 173.9 170.4 0.0003978 0.000377 0.0003566 0.0003354
0.10 0.20 188.1 183.3 179.3 175.9 0.0003978 0.000377 0.0003566 0.0003354
0.15 0.05 168.3 163.3 159.0 155.3 0.0003978 0.000377 0.0003566 0.0003354
0.15 0.10 173.0 168.1 163.8 160.0 0.0003978 0.000377 0.0003566 0.0003354
0.15 0.15 178.0 173.0 168.8 165.1 0.0003978 0.000377 0.0003566 0.0003354
0.15 0.20 183.2 178.2 174.0 170.4 0.0003978 0.000377 0.0003566 0.0003354
0.20 0.05 164.1 159.0 154.6 150.7 0.0003978 0.000377 0.0003566 0.0003354
0.20 0.10 168.7 163.6 159.1 155.3 0.0003978 0.000377 0.0003566 0.0003354
0.20 0.15 173.5 168.4 164.0 160.1 0.0003978 0.000377 0.0003566 0.0003354
0.20 0.20 178.5 173.4 169.0 165.2 0.0003978 0.000377 0.0003566 0.0003354

Table 13.

Case IV (Q1 = 0.9961224, Q2 = 1.0018931, Q3 = 1.0054827, Q4 = 0.9811516): Bias and PRE obtained from simulated data following a Normal distribution.

p1 p2 PRE(p3=0.05) PRE(p3=0.10) PRE(p3=0.15) PRE(p3=0.20) Bias(p3=0.05) Bias(p3=0.10) Bias(p3=0.15) Bias(p3=0.20)
0.05 0.05 177.6 172.9 168.8 165.3 0.0003995 0.0003792 0.0003584 0.0003372
0.05 0.10 182.6 177.9 173.8 170.4 0.0003995 0.0003792 0.0003584 0.0003372
0.05 0.15 187.8 183.1 179.2 175.8 0.0003995 0.0003792 0.0003584 0.0003372
0.05 0.20 193.3 188.6 184.7 181.5 0.0003995 0.0003792 0.0003584 0.0003372
0.10 0.05 172.9 168.0 163.8 160.2 0.0003995 0.0003792 0.0003584 0.0003372
0.10 0.10 177.7 172.9 168.7 165.1 0.0003995 0.0003792 0.0003584 0.0003372
0.10 0.15 182.8 178.0 173.8 170.3 0.0003995 0.0003792 0.0003584 0.0003372
0.10 0.20 188.1 183.3 179.3 175.8 0.0003995 0.0003792 0.0003584 0.0003372
0.15 0.05 168.4 163.4 159.1 155.3 0.0003995 0.0003792 0.0003584 0.0003372
0.15 0.10 173.1 168.1 163.8 160.1 0.0003995 0.0003792 0.0003584 0.0003372
0.15 0.15 178.0 173.1 168.8 165.1 0.0003995 0.0003792 0.0003584 0.0003372
0.15 0.20 183.2 178.3 174.0 170.4 0.0003995 0.0003792 0.0003584 0.0003372
0.20 0.05 164.2 159.1 154.7 150.7 0.0003995 0.0003792 0.0003584 0.0003372
0.20 0.10 168.8 163.7 159.2 155.3 0.0003995 0.0003792 0.0003584 0.0003372
0.20 0.15 173.6 168.4 164.0 160.1 0.0003995 0.0003792 0.0003584 0.0003372
0.20 0.20 178.6 173.5 169.0 165.2 0.0003995 0.0003792 0.0003584 0.0003372

Table 14.

In the absence of non-response, bias and PRE are observed from simulated data following a Normal distribution when p1= p2= p3=0.

Case PRE Bias
Case I 182.8948 0.000407048
Case II 184.5569 0.0004524072
Case III 182.7551 0.0004040932
Case IV 182.8089 0.0004057739

9.2. Study based on real data

This section illustrates the application of the suggested class of estimators to real data. The data come from the UCI Machine Learning Repository dataset “Gas Turbine CO and NOx Emission Data Set,” which consists of 36733 instances of 11 sensor measurements from a gas turbine in Turkey's northwestern region. The measurements were aggregated over one hour (by means of average or sum) to analyze CO and NOx (NO + NO2) flue gas emissions. For this analysis, the gt2011.csv file was chosen.

The study used the following study and auxiliary variables:

  • Y: Gas turbine exhaust pressure (GTEP)

  • X: Air filter difference pressure (AFDP)

  • Z: Turbine inlet temperature (TIT)

The strata were formed based on the Ambient temperature (AT) into three categories:

  • Stratum 1: 2.1163 to 12.707 °C

  • Stratum 2: 12.708 to 21.759 °C

  • Stratum 3: 21.760 to 34.532 °C
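
The stratification rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `stratum_for_at` and the sample readings are hypothetical, and the upper bounds of strata 1 and 2 are treated as inclusive so that every temperature in the listed ranges falls into exactly one stratum.

```python
from bisect import bisect_left

# Upper ambient-temperature bounds (deg C) of strata 1 and 2, as listed above.
# Anything above the second bound is assigned to stratum 3.
UPPER_BOUNDS = [12.707, 21.759]


def stratum_for_at(at_celsius: float) -> int:
    """Return the stratum number (1, 2 or 3) for an ambient temperature."""
    # bisect_left counts how many bounds are strictly below the reading,
    # so a reading equal to a bound stays in the lower stratum.
    return bisect_left(UPPER_BOUNDS, at_celsius) + 1


# Hypothetical readings chosen at the edges of the three ranges.
readings = [2.1163, 12.707, 12.708, 21.760, 34.532]
strata = [stratum_for_at(t) for t in readings]
print(strata)
```

In practice the same function would be applied to the AT column of gt2011.csv before drawing the samples within each stratum.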

When estimating a population variance, for example that of income among college graduates in a region, non-response bias may be corrected using calibrated weights in stratified successive sampling. This process involves adjusting the sampling weights based on response probabilities. The population parameters for the real data are displayed in Table 15. Calibrated weights are presented in Fig. 1 and Table 16, while the effects of the controlling parameter Qk under random non-response are shown in Figures 2 to 5 and Tables 17 to 20. Table 21 and Figures 6 and 7 illustrate the scenario without random non-response.

Table 15.

The statistical parameters corresponding to the real data.

Stratum Nk nk r1k mk r2k uk ρxy ρyz ρzx
1 2500 875 125 625 125 250 0.8999226 0.8283662 0.7473441
2 2600 910 130 650 130 260 0.8914957 0.9164524 0.8335361
3 2311 809 116 578 116 231 0.9689409 0.9337467 0.9209707
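
As a quick consistency check on Table 15, the following Python sketch recomputes the stratum weights from the stratum sizes and confirms that each first-occasion sample splits into its matched and unmatched parts. It assumes the usual definition Wk = Nk / N with N the sum of the three stratum sizes; the variable names are illustrative only.

```python
# Stratum sizes and sample sizes taken from Table 15.
N = [2500, 2600, 2311]   # N_k
n = [875, 910, 809]      # n_k, first-occasion sample
m = [625, 650, 578]      # m_k, matched part
u = [250, 260, 231]      # u_k, unmatched (fresh) part

total = sum(N)                     # population size over the three strata
W = [Nk / total for Nk in N]       # stratum weights, assuming W_k = N_k / N

for k, (Wk, nk, mk, uk) in enumerate(zip(W, n, m, u), start=1):
    # The first-occasion sample is the union of matched and unmatched units.
    assert nk == mk + uk
    print(f"stratum {k}: W_k = {Wk:.7f}, m_k/n_k = {mk / nk:.4f}")
```

Under this assumption the computed weights agree, to seven decimals, with the Wk column reported for the real data in Table 16.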

Figure 1.

Calibrated strata weights for real data.

Table 16.

Calibrated strata weights for real data.

Case Stratum Wk Ωk(Tm) Ωk(Tu)
I 1 0.3373364 0.4930872 0.3101796
I 2 0.3508298 0.3163187 0.4557245
I 3 0.3118338 0.1905941 0.2340959
II 1 0.3373364 0.4930873 0.3137647
II 2 0.3508298 0.3163187 0.4546208
II 3 0.3118338 0.1905940 0.2316144
III 1 0.3373364 0.4930873 0.3100504
III 2 0.3508298 0.3163187 0.4557643
III 3 0.3118338 0.1905940 0.2341853
IV 1 0.3373364 0.4930873 0.3195387
IV 2 0.3508298 0.3163187 0.4528433
IV 3 0.3118338 0.1905940 0.2276180

Figure 2.

Case I (Q1 = 1.0, Q2 = 1.0, Q3 = 1.0): Bias (red line) and PRE (blue line) obtained from real data.

Figure 3.

Case II (Q1 = 2.96, Q2 = 2.85, Q3 = 3.21): Bias (red line) and PRE (blue line) obtained from real data.

Figure 4.

Case III (Q1 = 0.0009208365, Q2 = 0.0009248637, Q3 = 0.0009196857): Bias (red line) and PRE (blue line) obtained from real data.

Figure 5.

Case IV (Q1 = 0.0045, Q2 = 0.0031, Q3 = 0.0046): Bias (red line) and PRE (blue line) obtained from real data.

Table 17.

Case I (Q1 = 1.0, Q2 = 1.0, Q3 = 1.0): Bias and PRE obtained from real data.

p1 p2 PRE(p3=0.05) PRE(p3=0.10) PRE(p3=0.15) PRE(p3=0.20) Bias(all p3)
0.05 0.05 143.7177 146.5793 149.5242 152.5562 0.01287393
0.05 0.10 150.4350 153.5733 156.8091 160.1470 0.01287393
0.05 0.15 157.5665 161.0128 164.5733 168.2538 0.01287393
0.05 0.20 165.1517 168.9418 172.8659 176.9313 0.01287393
0.10 0.05 135.8471 138.5520 141.3357 144.2016 0.01287393
0.10 0.10 142.1950 145.1614 148.2200 151.3750 0.01287393
0.10 0.15 148.9341 152.1916 155.5571 159.0360 0.01287393
0.10 0.20 156.1018 159.6842 163.3933 167.2359 0.01287393
0.15 0.05 128.5160 131.0749 133.7084 136.4196 0.01287393
0.15 0.10 134.5185 137.3248 140.2182 143.2030 0.01287393
0.15 0.15 140.8907 143.9722 147.1559 150.4469 0.01287393
0.15 0.20 147.6676 151.0564 154.5651 158.2001 0.01287393
0.20 0.05 121.6989 124.1220 126.6158 129.1832 0.01287393
0.20 0.10 127.3784 130.0357 132.7756 135.6019 0.01287393
0.20 0.15 133.4072 136.3251 139.3397 142.4558 0.01287393
0.20 0.20 139.8183 143.0270 146.3492 149.7909 0.01287393

Table 18.

Case II (Q1 = 2.964400, Q2 = 2.850385, Q3 = 3.206837): Bias and PRE obtained from real data.

p1 p2 PRE(p3=0.05) PRE(p3=0.10) PRE(p3=0.15) PRE(p3=0.20) Bias(all p3)
0.05 0.05 143.7036 146.5620 149.5034 152.5316 0.01288324
0.05 0.10 150.4240 153.5588 156.7909 160.1249 0.01288324
0.05 0.15 157.5591 161.0018 164.5585 168.2348 0.01288324
0.05 0.20 165.1485 168.9349 172.8550 176.9159 0.01288324
0.10 0.05 135.8289 138.5306 141.3108 144.1731 0.01288324
0.10 0.10 142.1795 145.1425 148.1974 151.3487 0.01288324
0.10 0.15 148.9218 152.1757 155.5374 159.0122 0.01288324
0.10 0.20 156.0932 159.6719 163.3770 167.2153 0.01288324
0.15 0.05 128.4939 131.0497 133.6798 136.3875 0.01288324
0.15 0.10 134.4987 137.3017 140.1916 143.1726 0.01288324
0.15 0.15 140.8736 143.9517 147.1317 150.4187 0.01288324
0.15 0.20 147.6538 151.0391 154.5439 158.1746 0.01288324
0.20 0.05 121.6731 124.0933 126.5838 129.1477 0.01288324
0.20 0.10 127.3546 130.0087 132.7451 135.5678 0.01288324
0.20 0.15 133.3858 136.3003 139.3112 142.4235 0.01288324
0.20 0.20 139.7997 143.0049 146.3233 149.7609 0.01288324

Table 19.

Case III (Q1 = 0.0009208365, Q2 = 0.0009248637, Q3 = 0.0009196857): Bias and PRE obtained from real data.

p1 p2 PRE(p3=0.05) PRE(p3=0.10) PRE(p3=0.15) PRE(p3=0.20) Bias(all p3)
0.05 0.05 143.7183 146.5800 149.5251 152.5571 0.01287356
0.05 0.10 150.4355 153.5739 156.8098 160.1478 0.01287356
0.05 0.15 157.5669 161.0133 164.5739 168.2546 0.01287356
0.05 0.20 165.1519 168.9422 172.8664 176.9319 0.01287356
0.10 0.05 135.8478 138.5528 141.3366 144.2026 0.01287356
0.10 0.10 142.1957 145.1622 148.2208 151.3760 0.01287356
0.10 0.15 148.9347 152.1923 155.5579 159.0369 0.01287356
0.10 0.20 156.1022 159.6848 163.3940 167.2367 0.01287356
0.15 0.05 128.5169 131.0759 133.7094 136.4208 0.01287356
0.15 0.10 134.5193 137.3257 140.2192 143.2041 0.01287356
0.15 0.15 140.8914 143.9730 147.1569 150.4480 0.01287356
0.15 0.20 147.6682 151.0571 154.5660 158.2011 0.01287356
0.20 0.05 121.6998 124.1231 126.6169 129.1845 0.01287356
0.20 0.10 127.3793 130.0367 132.7767 135.6031 0.01287356
0.20 0.15 133.4080 136.3260 139.3407 142.4570 0.01287356
0.20 0.20 139.8190 143.0279 146.3502 149.7921 0.01287356

Table 20.

Case IV (Q1 = 0.004542526, Q2 = 0.003128517, Q3 = 0.004646731): Bias and PRE obtained from real data.

p1 p2 PRE(p3=0.05) PRE(p3=0.10) PRE(p3=0.15) PRE(p3=0.20) Bias(all p3)
0.05 0.05 143.6914 146.5429 149.4771 152.4975 0.01289368
0.05 0.10 150.4193 153.5470 156.7715 160.0972 0.01289368
0.05 0.15 157.5632 160.9985 164.5471 168.2148 0.01289368
0.05 0.20 165.1629 168.9416 172.8533 176.9052 0.01289368
0.10 0.05 135.8075 138.5026 141.2758 144.1305 0.01289368
0.10 0.10 142.1648 145.1209 148.1684 151.3116 0.01289368
0.10 0.15 148.9149 152.1616 155.5155 158.9819 0.01289368
0.10 0.20 156.0955 159.6667 163.3637 167.1931 0.01289368
0.15 0.05 128.4640 131.0134 133.6366 136.3370 0.01289368
0.15 0.10 134.4747 137.2709 140.1536 143.1268 0.01289368
0.15 0.15 140.8566 143.9276 147.1000 150.3788 0.01289368
0.15 0.20 147.6450 151.0228 154.5196 158.1418 0.01289368
0.20 0.05 121.6354 124.0492 126.5330 129.0899 0.01289368
0.20 0.10 127.3220 129.9695 132.6988 135.5139 0.01289368
0.20 0.15 133.3592 136.2668 139.2703 142.3746 0.01289368
0.20 0.20 139.7804 142.9784 146.2889 149.7181 0.01289368

Table 21.

In the absence of non-response, Bias and PRE are observed from real data when p1= p2= p3=0.

Case PRE Bias
Case I 145.1036 0.01079086
Case II 145.0877 0.01079694
Case III 145.1042 0.01079061
Case IV 145.0727 0.01080245

Figure 6.

In the absence of non-response, PRE is observed from real data when p1 = p2 = p3 = 0.

Figure 7.

In the absence of non-response, biases are observed in real data when p1 = p2 = p3 = 0.

10. Discussion and conclusions

Based on the empirical results presented above:

  • 1. Tables 2, 9 and 16, along with Fig. 1, show that the calibrated strata weights closely resemble the original weights, particularly for the fresh sample. The estimator effectively mitigates the adverse effects of non-response, leading to negligible bias across all choices of Qk. This supports the improved estimation of population variance in stratified successive sampling using calibrated weights in the presence of non-response.

  • 2. The results in Tables 3 to 6 for simulated data show that the proposed method keeps non-response bias small: the bias of the estimator is of the order of 10⁻⁴ for data simulated from a Poisson distribution, and the same order is observed in Tables 10 to 13 for data simulated from a Normal distribution. The findings in Tables 17 to 20 and Figures 2 to 5 for real data are consistent with this, with bias of the order of 10⁻².

  • 3. Tables 3 to 6 (simulated data from the Poisson distribution) and Tables 10 to 13 (simulated data from the Normal distribution) show that the proposed estimator has lower mean squared error (MSE) and higher percentage relative efficiency (PRE) than the standard estimator under random non-response. Tables 17 to 20 and Figures 2 to 5, based on real data, confirm the same observation, with the PRE exceeding 100 throughout. These findings indicate that the proposed method is more effective.

  • 4. Tables 7, 14 and 21, together with Figures 6 and 7, show that the proposed estimator has negligible bias and higher PRE than the standard estimator in the absence of non-response. The proposed method is therefore more effective even in scenarios without non-response.

  • 5. In the simulation study, as the correlation coefficient increases, the PRE increases and the bias decreases; conversely, as the correlation coefficient decreases, the PRE decreases and the bias increases.

  • 6. For fixed p1 and p2, as p3 increases, both the bias and the PRE decrease for simulated data, whereas for real data the bias remains fixed and the PRE increases. For fixed p1 and p3, increasing p2 leaves the bias constant and increases the PRE for both simulated and real data. For fixed p2 and p3, increasing p1 leaves the bias constant and decreases the PRE for both simulated and real data. In particular, the PRE increases as the non-response probability p2 increases, as observed in Tables 17 to 20 and Figures 2 to 5 (real data) and in Tables 3 to 6 and 10 to 13 (simulated data). This is an important outcome of the study.
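
The direction of these effects can be checked directly against the tabulated values. The short Python sketch below does so for three rows of Table 17 (real data, Case I); the values are copied from that table, while the helper name `increasing` and the choice of rows are illustrative only.

```python
# PRE values copied from Table 17 (real data, Case I). Keys are (p1, p2);
# list entries correspond to p3 = 0.05, 0.10, 0.15, 0.20.
pre = {
    (0.05, 0.05): [143.7177, 146.5793, 149.5242, 152.5562],
    (0.05, 0.10): [150.4350, 153.5733, 156.8091, 160.1470],
    (0.10, 0.05): [135.8471, 138.5520, 141.3357, 144.2016],
}


def increasing(values):
    """True if each value is strictly larger than the one before it."""
    return all(a < b for a, b in zip(values, values[1:]))


base = pre[(0.05, 0.05)]
rises_with_p3 = increasing(base)
rises_with_p2 = all(a < b for a, b in zip(base, pre[(0.05, 0.10)]))
falls_with_p1 = all(a > b for a, b in zip(base, pre[(0.10, 0.05)]))

print("PRE rises with p3:", rises_with_p3)
print("PRE rises with p2:", rises_with_p2)
print("PRE falls with p1:", falls_with_p1)
```

The same comparison can be repeated for the remaining rows and for Tables 18 to 20 by extending the dictionary.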

The results obtained from the analysis of the “Gas Turbine CO and NOx Emission Data Set” and from the simulation studies demonstrate the effectiveness of the proposed methodology in correcting non-response bias and obtaining more accurate estimates. By employing the suggested class of estimators with calibrated weights, the estimation of population variance may also be improved on other real-world data sets, for example when estimating the variance of income among college graduates. Survey statisticians are therefore recommended to use the estimator in such practical applications.

CRediT authorship contribution statement

M.K. Pandey: Writing – review & editing, Writing – original draft, Resources, Methodology, Investigation, Formal analysis, Data curation, Conceptualization. G.N. Singh: Visualization, Validation, Supervision. Tolga Zaman: Writing – review & editing. Aned Al Mutairi: Software, Writing – review & editing. Manahil SidAhmed Mustafa: Writing – review & editing.

Declaration of Competing Interest

The authors have no conflicts of interest to disclose.

Acknowledgements

We are thankful to IIT (ISM) Dhanbad for providing the infrastructural and financial support to accomplish the present work. We are also thankful to the esteemed editor and reviewers for their encouraging and valuable suggestions, which greatly contributed to improving the manuscript.

Appendix A. Deriving calibrated strata weights

The calibrated weights are determined by minimizing the Lagrange function while satisfying the given constraints. The Lagrange function, considering the chi-square distance measure and the calibration constraints, is formulated as follows:

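$$L_m = \sum_{k=1}^{L} \frac{(\Omega_k - W_k)^2}{Q_k W_k} - 2 l_{1m}\left(\sum_{k=1}^{L}\Omega_k - 1\right) - 2 l_{2m}\left(\sum_{k=1}^{L}\Omega_k \log(\bar{z}_{n_k}) - \sum_{k=1}^{L} W_k \log(\bar{Z}_k)\right) - 2 l_{3m}\left(\sum_{k=1}^{L}\Omega_k c_{x m_k r_{2k}} - \sum_{k=1}^{L} W_k c_{x n_k r_{1k}}\right) \tag{21}$$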

where l1m, l2m and l3m are Lagrange multipliers.

Differentiating the Lagrange function with respect to Ω and setting the derivative to zero yields the calibrated weight equation:

$$\Omega_k = W_k + \left(l_{1m} + l_{2m}\log(\bar{z}_{n_k}) + l_{3m}\, c_{x m_k r_{2k}}\right) W_k Q_k \tag{22}$$

By substituting the derived Ω values into the given calibration constraints, a matrix equation is obtained:

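$$\begin{bmatrix} \mathrm{tota}_m & \mathrm{totb}_m & \mathrm{totc}_m \\ \mathrm{totb}_m & \mathrm{tote}_m & \mathrm{totf}_m \\ \mathrm{totc}_m & \mathrm{totf}_m & \mathrm{toth}_m \end{bmatrix} \begin{bmatrix} l_{1m} \\ l_{2m} \\ l_{3m} \end{bmatrix} = \begin{bmatrix} \mathrm{totd}_m \\ \mathrm{totg}_m \\ \mathrm{toti}_m \end{bmatrix}$$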

The solution of the matrix equation provides the Lagrange multipliers, which can be expressed as follows:

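$$l_{1m} = \frac{\mathrm{det}_{\alpha m}}{\mathrm{det}_m}, \quad l_{2m} = \frac{\mathrm{det}_{\beta m}}{\mathrm{det}_m}, \quad \text{and} \quad l_{3m} = \frac{\mathrm{det}_{\gamma m}}{\mathrm{det}_m}, \tag{23}$$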

where the determinants are given by:

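$$\mathrm{det}_m = \mathrm{tota}_m\,\mathrm{tote}_m\,\mathrm{toth}_m - \mathrm{tota}_m\,\mathrm{totf}_m^2 - \mathrm{totb}_m^2\,\mathrm{toth}_m + 2\,\mathrm{totb}_m\,\mathrm{totc}_m\,\mathrm{totf}_m - \mathrm{tote}_m\,\mathrm{totc}_m^2 \tag{24}$$
$$\mathrm{det}_{\alpha m} = \mathrm{totd}_m\,\mathrm{tote}_m\,\mathrm{toth}_m - \mathrm{totd}_m\,\mathrm{totf}_m^2 - \mathrm{totb}_m\,\mathrm{totg}_m\,\mathrm{toth}_m + \mathrm{totb}_m\,\mathrm{toti}_m\,\mathrm{totf}_m + \mathrm{totc}_m\,\mathrm{totg}_m\,\mathrm{totf}_m - \mathrm{totc}_m\,\mathrm{toti}_m\,\mathrm{tote}_m \tag{25}$$
$$\mathrm{det}_{\beta m} = \mathrm{tota}_m\,\mathrm{totg}_m\,\mathrm{toth}_m - \mathrm{tota}_m\,\mathrm{toti}_m\,\mathrm{totf}_m - \mathrm{totb}_m\,\mathrm{totd}_m\,\mathrm{toth}_m + \mathrm{totc}_m\,\mathrm{totd}_m\,\mathrm{totf}_m + \mathrm{totb}_m\,\mathrm{totc}_m\,\mathrm{toti}_m - \mathrm{totc}_m^2\,\mathrm{totg}_m \tag{26}$$
$$\mathrm{det}_{\gamma m} = \mathrm{tota}_m\,\mathrm{tote}_m\,\mathrm{toti}_m - \mathrm{tota}_m\,\mathrm{totg}_m\,\mathrm{totf}_m - \mathrm{totb}_m^2\,\mathrm{toti}_m + \mathrm{totb}_m\,\mathrm{totc}_m\,\mathrm{totg}_m + \mathrm{totb}_m\,\mathrm{totd}_m\,\mathrm{totf}_m - \mathrm{totc}_m\,\mathrm{totd}_m\,\mathrm{tote}_m \tag{27}$$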

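The terms tota_m, totb_m, totc_m, totd_m, tote_m, totf_m, totg_m, toth_m and toti_m are defined as follows:

$$\begin{aligned}
\mathrm{tota}_m &= \sum_{k=1}^{L} W_k Q_k, & \mathrm{totb}_m &= \sum_{k=1}^{L} W_k Q_k \log(\bar{z}_{n_k}), \\
\mathrm{totc}_m &= \sum_{k=1}^{L} W_k Q_k\, c_{x m_k r_{2k}}, & \mathrm{totd}_m &= 1 - \sum_{k=1}^{L} W_k, \\
\mathrm{tote}_m &= \sum_{k=1}^{L} W_k Q_k \left(\log(\bar{z}_{n_k})\right)^2, & \mathrm{totf}_m &= \sum_{k=1}^{L} W_k Q_k \log(\bar{z}_{n_k})\, c_{x m_k r_{2k}}, \\
\mathrm{totg}_m &= \sum_{k=1}^{L} W_k \left(\log(\bar{Z}_k) - \log(\bar{z}_{n_k})\right), & \mathrm{toth}_m &= \sum_{k=1}^{L} W_k Q_k\, c_{x m_k r_{2k}}^2, \\
\mathrm{toti}_m &= \sum_{k=1}^{L} W_k\, c_{x n_k r_{1k}} - \sum_{k=1}^{L} W_k\, c_{x m_k r_{2k}}.
\end{aligned}$$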
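
The steps above can be sketched numerically. The Python function below is a minimal illustration, not the authors' code: it builds the nine totals, obtains the three Lagrange multipliers from the determinant formulas (23) to (27), and returns the calibrated weights of Equation (22). The function name `calibrate_tm` and all argument names are hypothetical. The Wk values are those of Table 16 and Qk = 1 corresponds to Case I, but the sample means and the c-values are made-up numbers used only to exercise the formulas.

```python
from math import log


def calibrate_tm(W, Q, zbar_n, Zbar, cx_m, cx_n):
    """Calibrated weights via Eq. (22), with multipliers from Eqs. (23)-(27)."""
    lz = [log(v) for v in zbar_n]                                  # log(z-bar_nk)
    a = sum(w * q for w, q in zip(W, Q))                           # tota_m
    b = sum(w * q * l for w, q, l in zip(W, Q, lz))                # totb_m
    c = sum(w * q * v for w, q, v in zip(W, Q, cx_m))              # totc_m
    d = 1.0 - sum(W)                                               # totd_m
    e = sum(w * q * l * l for w, q, l in zip(W, Q, lz))            # tote_m
    f = sum(w * q * l * v for w, q, l, v in zip(W, Q, lz, cx_m))   # totf_m
    g = sum(w * (log(Zk) - l) for w, Zk, l in zip(W, Zbar, lz))    # totg_m
    h = sum(w * q * v * v for w, q, v in zip(W, Q, cx_m))          # toth_m
    i = sum(w * (vn - vm) for w, vn, vm in zip(W, cx_n, cx_m))     # toti_m

    # Cramer's rule for the symmetric 3x3 system.
    det = a * e * h - a * f * f - b * b * h + 2 * b * c * f - e * c * c
    det_alpha = d * e * h - d * f * f - b * g * h + b * i * f + c * g * f - c * i * e
    det_beta = a * g * h - a * i * f - b * d * h + c * d * f + b * c * i - c * c * g
    det_gamma = a * e * i - a * g * f - b * b * i + b * c * g + b * d * f - c * d * e
    l1, l2, l3 = det_alpha / det, det_beta / det, det_gamma / det

    return [w + (l1 + l2 * l + l3 * v) * w * q
            for w, q, l, v in zip(W, Q, lz, cx_m)]


W = [0.3373364, 0.3508298, 0.3118338]   # W_k from Table 16
Q = [1.0, 1.0, 1.0]                     # Case I
zbar_n = [12.0, 20.0, 35.0]             # hypothetical sample means of z
Zbar = [12.5, 19.0, 36.0]               # hypothetical population means of z
cx_m = [0.21, 0.18, 0.25]               # hypothetical matched-sample c-values
cx_n = [0.20, 0.19, 0.24]               # hypothetical first-sample c-values

omega = calibrate_tm(W, Q, zbar_n, Zbar, cx_m, cx_n)
print([round(o, 7) for o in omega], round(sum(omega), 10))
```

With three strata and three calibration constraints the weights are fully determined by the constraints, so changing Qk in this particular setting would not change the result; with more strata the choice of Qk matters.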

Similarly, the calibrated weights for the Tu estimator are obtained by minimizing the Lagrange function while satisfying the given calibration constraints. The Lagrange function, considering the chi-square distance measure and the calibration constraints, is expressed as follows:

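$$L_u = \sum_{k=1}^{L} \frac{(\Omega_k - W_k)^2}{Q_k W_k} - 2 l_{1u}\left(\sum_{k=1}^{L}\Omega_k - 1\right) - 2 l_{2u}\left(\sum_{k=1}^{L}\Omega_k\, c_{\bar{z}_{u_k}} - C_Z\right) \tag{28}$$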

where l1u and l2u are Lagrange multipliers.

Differentiating the Lagrange function with respect to Ω and setting the derivative to zero gives the following calibrated weight equation:

$$\Omega_k = W_k + \left(l_{1u} + l_{2u}\, c_{\bar{z}_{u_k}}\right) W_k Q_k \tag{29}$$

By substituting the derived Ω values into the given calibration constraints, a matrix equation is formed:

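$$\begin{bmatrix} \mathrm{tota}_u & \mathrm{totb}_u \\ \mathrm{totb}_u & \mathrm{totd}_u \end{bmatrix} \begin{bmatrix} l_{1u} \\ l_{2u} \end{bmatrix} = \begin{bmatrix} \mathrm{totc}_u \\ \mathrm{tote}_u \end{bmatrix}$$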

Solving this matrix equation provides the Lagrange multipliers, given by:

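$$l_{1u} = \frac{\mathrm{det}_{\alpha u}}{\mathrm{det}_u} \quad \text{and} \quad l_{2u} = \frac{\mathrm{det}_{\beta u}}{\mathrm{det}_u} \tag{30}$$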

where the determinants are calculated as follows:

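$$\mathrm{det}_u = \mathrm{tota}_u\,\mathrm{totd}_u - \mathrm{totb}_u^2 \tag{31}$$
$$\mathrm{det}_{\alpha u} = \mathrm{totc}_u\,\mathrm{totd}_u - \mathrm{totb}_u\,\mathrm{tote}_u \tag{32}$$
$$\mathrm{det}_{\beta u} = \mathrm{tota}_u\,\mathrm{tote}_u - \mathrm{totb}_u\,\mathrm{totc}_u \tag{33}$$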

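The terms tota_u, totb_u, totc_u, totd_u and tote_u are defined as follows:

$$\begin{aligned}
\mathrm{tota}_u &= \sum_{k=1}^{L} W_k Q_k, & \mathrm{totb}_u &= \sum_{k=1}^{L} W_k Q_k\, c_{\bar{z}_{u_k}}, \\
\mathrm{totc}_u &= 1 - \sum_{k=1}^{L} W_k, & \mathrm{totd}_u &= \sum_{k=1}^{L} W_k Q_k\, c_{\bar{z}_{u_k}}^2, \\
\mathrm{tote}_u &= C_Z - \sum_{k=1}^{L} W_k\, c_{\bar{z}_{u_k}}.
\end{aligned}$$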

The calibrated weights for the estimators of population variance, Tm and Tu, are determined using these Lagrange multipliers and the initial weights Wk.
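
For the Tu estimator the system is 2x2, so the multipliers have a simple closed form. The sketch below is a minimal illustration under stated assumptions: the function name `calibrate_tu`, the c-values `cz_u` and the target `CZ` are hypothetical, while the Wk values are taken from Table 16 and the two sets of Qk from the captions of Tables 17 and 18.

```python
def calibrate_tu(W, Q, cz_u, CZ):
    """Calibrated weights via Eq. (29), with multipliers from Eqs. (30)-(33)."""
    a = sum(w * q for w, q in zip(W, Q))                    # tota_u
    b = sum(w * q * c for w, q, c in zip(W, Q, cz_u))       # totb_u
    c0 = 1.0 - sum(W)                                       # totc_u
    d = sum(w * q * c * c for w, q, c in zip(W, Q, cz_u))   # totd_u
    e = CZ - sum(w * c for w, c in zip(W, cz_u))            # tote_u

    det = a * d - b * b                  # Eq. (31)
    l1 = (c0 * d - b * e) / det          # Eqs. (30) and (32)
    l2 = (a * e - b * c0) / det          # Eqs. (30) and (33)
    return [w + (l1 + l2 * c) * w * q for w, q, c in zip(W, Q, cz_u)]


W = [0.3373364, 0.3508298, 0.3118338]   # W_k from Table 16
cz_u = [0.012, 0.010, 0.015]            # hypothetical c-values of z in the fresh sample
CZ = 0.0125                             # hypothetical population value C_Z

omega_case1 = calibrate_tu(W, [1.0, 1.0, 1.0], cz_u, CZ)
omega_case2 = calibrate_tu(W, [2.964400, 2.850385, 3.206837], cz_u, CZ)

for name, om in (("Case I", omega_case1), ("Case II", omega_case2)):
    check = sum(o * c for o, c in zip(om, cz_u))   # should reproduce CZ
    print(name, [round(o, 7) for o in om], round(sum(om), 10), round(check, 10))
```

Here there are three strata but only two constraints, so the calibrated weights depend on the choice of Qk, which is why the two cases give slightly different weights while both satisfy the constraints.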

Appendix B. Deriving bias and MSE of the proposed estimator

The expectations obtained after applying the transformations specified in section 7 are as follows:

$$E(\epsilon_{lk}) = 0 \quad \text{and} \quad |\epsilon_{lk}| < 1, \quad \forall\, l = 0, 1, \ldots, 7.$$

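$$\begin{aligned}
E(\epsilon_{0k}^2) &= f_{3k}(\lambda_{400k} - 1), & E(\epsilon_{1k}^2) &= f_{8k}(\lambda_{004k} - 1), & E(\epsilon_{2k}^2) &= f_{10k}(\lambda_{400k} - 1), \\
E(\epsilon_{3k}^2) &= f_{9k}(\lambda_{004k} - 1), & E(\epsilon_{4k}^2) &= f_{3k}(\lambda_{040k} - 1), & E(\epsilon_{0k}\epsilon_{2k}) &= f_{3k}(\lambda_{400k} - 1), \\
E(\epsilon_{0k}\epsilon_{4k}) &= f_{3k}(\lambda_{220k} - 1), & E(\epsilon_{2k}\epsilon_{4k}) &= f_{3k}(\lambda_{220k} - 1), & E(\epsilon_{3k}\epsilon_{4k}) &= f_{3k}(\lambda_{022k} - 1), \\
E(\epsilon_{1k}\epsilon_{4k}) &= f_{8k}(\lambda_{022k} - 1), & E(\epsilon_{0k}\epsilon_{3k}) &= f_{3k}(\lambda_{202k} - 1), & E(\epsilon_{2k}\epsilon_{3k}) &= f_{2k}(\lambda_{202k} - 1), \\
E(\epsilon_{0k}\epsilon_{1k}) &= f_{8k}(\lambda_{202k} - 1), & E(\epsilon_{1k}\epsilon_{2k}) &= f_{8k}(\lambda_{202k} - 1), & E(\epsilon_{1k}\epsilon_{3k}) &= f_{8k}(\lambda_{004k} - 1), \\
E(\epsilon_{5k}^2) &= f_{6k}(\lambda_{040k} - 1), & E(\epsilon_{6k}^2) &= f_{7k}(\lambda_{004k} - 1), & E(\epsilon_{5k}\epsilon_{6k}) &= f_{7k}(\lambda_{022k} - 1), \\
E(\epsilon_{7k}^2) &= f_{10k}(\lambda_{040k} - 1).
\end{aligned}$$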

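Here the notations are as previously defined in section 7, with

$$\begin{aligned}
f_{1k} &= \left(\frac{1}{n_k q_1 + 2p_1} - \frac{1}{n_k}\right), & f_{2k} &= \left(\frac{1}{m_k} - \frac{1}{n_k}\right), & f_{3k} &= \left(\frac{1}{n_k q_1 + 2p_1} - \frac{1}{N_k}\right), \\
f_{4k} &= \left(\frac{1}{n_k q_1 + 2p_1} - \frac{1}{m_k}\right), & f_{5k} &= \left(\frac{1}{m_k q_2 + 2p_2} - \frac{1}{n_k q_1 + 2p_1}\right), & f_{6k} &= \left(\frac{1}{u_k q_3 + 2p_3} - \frac{1}{N_k}\right), \\
f_{7k} &= \left(\frac{1}{u_k} - \frac{1}{N_k}\right), & f_{8k} &= \left(\frac{1}{n_k} - \frac{1}{N_k}\right), & f_{9k} &= \left(\frac{1}{m_k} - \frac{1}{N_k}\right), \\
f_{10k} &= \left(\frac{1}{m_k q_2 + 2p_2} - \frac{1}{N_k}\right)
\end{aligned}$$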

and

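$$\lambda_{\alpha\beta\gamma k} = \frac{\mu_{\alpha\beta\gamma k}}{\sqrt{\mu_{200k}^{\alpha}\,\mu_{020k}^{\beta}\,\mu_{002k}^{\gamma}}}, \qquad \mu_{\alpha\beta\gamma k} = \frac{1}{N_k - 1}\sum_{j=1}^{N_k} (Y_{kj} - \bar{Y}_k)^{\alpha} (X_{kj} - \bar{X}_k)^{\beta} (Z_{kj} - \bar{Z}_k)^{\gamma}$$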

Remark

When the data are free of random non-response, or for p1=0 and p2=0, the above assumptions agree with the typical results.
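
The Remark can be illustrated numerically. The Python sketch below reads f3k as 1/(nk q1 + 2 p1) - 1/Nk and f8k as 1/nk - 1/Nk, and assumes that q1 = 1 - p1 (an assumption here, since q1 is defined in an earlier section). The function names are hypothetical; Nk and nk are the stratum 1 values of Table 15.

```python
from math import isclose


def f3(N_k, n_k, p1):
    """Factor f_3k under random non-response with probability p1."""
    q1 = 1.0 - p1   # assumption: q1 is the complement of p1
    return 1.0 / (n_k * q1 + 2.0 * p1) - 1.0 / N_k


def f8(N_k, n_k):
    """Factor f_8k, the usual finite-population factor without non-response."""
    return 1.0 / n_k - 1.0 / N_k


N_k, n_k = 2500, 875   # stratum 1 of Table 15

for p1 in (0.0, 0.05, 0.10, 0.20):
    print(f"p1 = {p1:.2f}: f3k = {f3(N_k, n_k, p1):.8f}")

# With p1 = 0 the non-response factor collapses to the standard one.
no_response_matches = isclose(f3(N_k, n_k, 0.0), f8(N_k, n_k))
print("f3k equals f8k when p1 = 0:", no_response_matches)
```

The factor grows with p1, which reflects the loss of effective sample size caused by non-response.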

By applying these transformations to sxnk2 and sxmk2, we obtain:

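$$s_{xn_k}^2 = S_{XN_k}^2(1 + \epsilon_{0k}) + |\epsilon_{1k}| - \frac{|\epsilon_{1k}^2|}{2} + \frac{|\epsilon_{1k}^3|}{3} + \ldots \tag{34}$$
$$s_{xm_k}^2 = S_{XN_k}^2(1 + \epsilon_{2k}) + |\epsilon_{3k}| - \frac{|\epsilon_{3k}^2|}{2} + \frac{|\epsilon_{3k}^3|}{3} + \ldots \tag{35}$$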
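
The terms ε - ε²/2 + ε³/3 in Equations (34) and (35) have the form of the Maclaurin series of log(1 + ε), which is consistent with the logarithmic type of the estimator. As a hedged illustration of why truncating this series is reasonable when |ε| < 1, the Python sketch below compares the truncated series with the exact logarithm; the function name `log_series` and the chosen ε values are illustrative only.

```python
from math import log1p


def log_series(eps, order):
    """Truncated Maclaurin series of log(1 + eps): eps - eps^2/2 + eps^3/3 - ..."""
    return sum((-1) ** (j + 1) * eps ** j / j for j in range(1, order + 1))


for eps in (0.01, 0.05, 0.20):
    exact = log1p(eps)
    # Absolute truncation error when keeping 1, 2 or 3 terms.
    errors = [abs(log_series(eps, r) - exact) for r in (1, 2, 3)]
    print(eps, ["%.2e" % err for err in errors])
```

For small ε the error after two terms is of the order of ε³, which is why second-order expansions are commonly used when deriving bias and MSE to the first degree of approximation.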

By utilizing Equations (34) and (35), we may transform Equation (1) into:

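$$T_{mk} = S_{YN_k}^2(1 + \epsilon_{4k}) + a_k\left[S_{XN_k}^2(\epsilon_{0k} - \epsilon_{2k}) + (\epsilon_{1k} - \epsilon_{3k}) - \frac{(\epsilon_{1k}^2 - \epsilon_{3k}^2)}{2} - \frac{S_{XN_k}^4(\epsilon_{0k} - \epsilon_{2k})^2}{2} - \frac{(\epsilon_{3k} - \epsilon_{1k})^2}{2} - \frac{2 S_{XN_k}^2(\epsilon_{0k}\epsilon_{1k} - \epsilon_{0k}\epsilon_{3k} - \epsilon_{2k}\epsilon_{1k} + \epsilon_{2k}\epsilon_{3k})}{2}\right] \tag{36}$$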

We obtain, based on equations (3) and (36), the following

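$$T_m = \sum_{k=1}^{L}\Omega_k^2 S_{YN_k}^2 + \sum_{k=1}^{L}\Omega_k^2 S_{YN_k}^2\,\epsilon_{4k} + \sum_{k=1}^{L} a_k \Omega_k^2\left[S_{XN_k}^2(\epsilon_{0k} - \epsilon_{2k}) + (\epsilon_{1k} - \epsilon_{3k}) - \frac{(\epsilon_{1k}^2 - \epsilon_{3k}^2)}{2} - \frac{S_{XN_k}^4(\epsilon_{0k} - \epsilon_{2k})^2}{2} - \frac{(\epsilon_{3k} - \epsilon_{1k})^2}{2} - \frac{2 S_{XN_k}^2(\epsilon_{0k}\epsilon_{1k} - \epsilon_{0k}\epsilon_{3k} - \epsilon_{2k}\epsilon_{1k} + \epsilon_{2k}\epsilon_{3k})}{2}\right] \tag{37}$$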

Since the calibrated weight Ωk is close to the stratum weight Wk, it is reasonable to assume that

$$\sum_{k=1}^{L}\Omega_k^2 S_{YN_k}^2 \approx \sum_{k=1}^{L} W_k^2 S_{YN_k}^2 = \sigma_Y^2$$

so that Equation (37) becomes

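$$T_m = \sigma_Y^2 + \sum_{k=1}^{L}\Omega_k^2 S_{YN_k}^2\,\epsilon_{4k} + \sum_{k=1}^{L} a_k \Omega_k^2\left[S_{XN_k}^2(\epsilon_{0k} - \epsilon_{2k}) + (\epsilon_{1k} - \epsilon_{3k}) - \frac{(\epsilon_{1k}^2 - \epsilon_{3k}^2)}{2} - \frac{S_{XN_k}^4(\epsilon_{0k} - \epsilon_{2k})^2}{2} - \frac{(\epsilon_{3k} - \epsilon_{1k})^2}{2} - \frac{2 S_{XN_k}^2(\epsilon_{0k}\epsilon_{1k} - \epsilon_{0k}\epsilon_{3k} - \epsilon_{2k}\epsilon_{1k} + \epsilon_{2k}\epsilon_{3k})}{2}\right] \tag{38}$$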

In the same way, when we apply the transformations to Equation (2), we obtain:

$$T_{uk} = S_{YN_k}^2(1 + \epsilon_{5k}) + b_k|\epsilon_{6k}| - \frac{b_k|\epsilon_{6k}^2|}{2} + \ldots \tag{39}$$

By utilizing Equation (4) and (39), we obtain:

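$$T_u = \sum_{k=1}^{L}\Omega_k^2 S_{YN_k}^2 + \sum_{k=1}^{L}\Omega_k^2\left[S_{YN_k}^2\,\epsilon_{5k} + b_k\epsilon_{6k} - \frac{b_k\epsilon_{6k}^2}{2} + \ldots\right] \tag{40}$$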

Since the calibrated weight Ωk is close to the stratum weight Wk, it is again reasonable to assume that

$$\sum_{k=1}^{L}\Omega_k^2 S_{YN_k}^2 \approx \sum_{k=1}^{L} W_k^2 S_{YN_k}^2 = \sigma_Y^2$$

so that Equation (40) becomes

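$$T_u = \sigma_Y^2 + \sum_{k=1}^{L}\Omega_k^2\left[S_{YN_k}^2\,\epsilon_{5k} + b_k\epsilon_{6k} - \frac{b_k\epsilon_{6k}^2}{2} + \ldots\right] \tag{41}$$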

The equations for the bias of the proposed estimator Tm and Tu, as well as the mean squared error (MSE) of Tm and Tu, are derived by taking expectations on both sides of Equations (38) and (41).

Data availability statement

Data will be made available on request.

References

  • 1. Ahmad S., Hussain S., Shabbir J., Zahid E., Aamir M., Onyango R., et al. Improved estimation of finite population variance using dual supplementary information under stratified random sampling. Math. Probl. Eng. 2022;2022.
  • 2. Alam S., Shabbir J. Calibration estimation of mean by using double use of auxiliary information. Commun. Stat., Simul. Comput. 2022;51(8):4769–4787.
  • 3. Alam S., Singh S., Shabbir J. Optimal calibrated weights while minimizing a variance function. Commun. Stat., Theory Methods. 2023;52(5):1634–1651.
  • 4. Aslam I., Noor-ul Amin M., Hanif M., Sharma P. Memory type ratio and product estimators under ranked-based sampling schemes. Commun. Stat., Theory Methods. 2023;52(4):1155–1177.
  • 5. Aslam I., Noor-ul Amin M., Yasmeen U., Hanif M. Memory type ratio and product estimators in stratified sampling. J. Reliab. Stat. Stud. 2020:1–20.
  • 6. Basit Z., Bhatti M.I. Efficient classes of estimators of population variance in two-phase successive sampling under random non-response. Statistica. 2022;82(2):177–198.
  • 7. Deming W.E., Stephan F.F. On a least squares adjustment of a sampled frequency table when the expected marginal totals are known. Ann. Math. Stat. 1940;11(4):427–444.
  • 8. Deville J.C., Särndal C.E. Calibration estimators in survey sampling. J. Am. Stat. Assoc. 1992;87(418):376–382.
  • 9. El-Sheikh A.A., El-Kossaly H.A. Calibration estimation for ratio estimators in stratified sampling for proportion allocation. J. Progressive Res. Math. 2020;16(4):3199–3205.
  • 10. Farrell P.J., Singh S. Model-assisted higher-order calibration of estimators of variance. Aust. N. Z. J. Stat. 2005;47(3):375–383.
  • 11. Kim J.K., Park M. Calibration estimation in survey sampling. Int. Stat. Rev. 2010;78(1):21–39.
  • 12. Kim J.M., Sungur E.A., Heo T.Y. Calibration approach estimators in stratified sampling. Stat. Probab. Lett. 2007;77(1):99–103.
  • 13. Koyuncu N., Kadilar C. Calibration weighting in stratified random sampling. Commun. Stat., Simul. Comput. 2016;45(7):2267–2275.
  • 14. Krewski D., Hocking R.R. Estimation of population variance in stratified successive sampling. J. Am. Stat. Assoc. 1978;73(361):137–141.
  • 15. Masry E.S., Hedayat A.S. Estimation of population variance in stratified successive sampling with equal probabilities. J. Stat. Plan. Inference. 1987;15(1):75–88.
  • 16. Neyman J., Vereshchagin N. On the use of stratified sampling in the estimation of population parameters. Ann. Math. Stat. 1938;9(3):293–296.
  • 17. Özgül N. New calibration estimator in stratified sampling. J. Stat. Comput. Simul. 2018;88(13):2561–2572.
  • 18. Pandey M.K., Singh G.N., Zaman T., Mutairi A.A., Mustafa M.S.A. A general class of improved population variance estimators under non-sampling errors using calibrated weights in stratified sampling. Sci. Rep. 2024;14(1):2948. doi: 10.1038/s41598-023-47234-1.
  • 19. Raghunathan T.E., Grizzle J.E. Estimation of population variance in stratified successive sampling with unequal probabilities. Biometrika. 1995;82(3):519–528.
  • 20. Särndal C.E. The calibration approach in survey theory and practice. Surv. Methodol. 2007;33(2):99–119.
  • 21. Shahzad U., Ahmad I., García-Luengo A.V., Zaman T., Al-Noor N.H., Kumar A. Estimation of coefficient of variation using calibrated estimators in double stratified random sampling. Mathematics. 2023;11(1).
  • 22. Singh G.N., Bhattacharyya D., Bandyopadhyay A. Calibration estimation of population variance under stratified successive sampling in presence of random non response. Commun. Stat., Theory Methods. 2021;50(19):4487–4509.
  • 23. Singh G.N., Sharma A.K., Bandyopadhyay A. Effectual variance estimation strategy in two-occasion successive sampling in presence of random non response. Commun. Stat., Theory Methods. 2017;46(14):7201–7224.
  • 24. Singh S., Joarder A.H. Estimation of finite population variance using random non-response in survey sampling. Metrika. 1998;47:241–249.
  • 25. Srivastava D.K. Estimation of population variance in stratified successive sampling with unequal probabilities. J. Am. Stat. Assoc. 1987;82(398):516–523.
  • 26. Sud U.C., Chandra H., Gupta V.K. Calibration approach-based regression-type estimator for inverse relationship between study and auxiliary variable. J. Stat. Theory Pract. 2014;8(4):707–721.
  • 27. Zaman T., Bulut H. An efficient family of robust-type estimators for the population variance in simple and stratified random sampling. Commun. Stat., Theory Methods. 2021:1–15.

Articles from Heliyon are provided here courtesy of Elsevier
