Abstract
This paper introduces a new method to estimate the population variance of a study variable in stratified successive sampling over two occasions, while accounting for random non-response. The method uses a logarithmic type estimator and leverages information from a highly positively correlated auxiliary variable. The paper also presents calibrated weights for the new estimator and examines its properties through numerical and simulation studies. The results indicate that the suggested estimator is more effective than the standard estimator for estimating the population variance.
MSC: 62D05
Keywords: Stratified sampling, Successive sampling, Auxiliary information, Bias, Mean square error, Calibration technique, Random non-response
1. Introduction
Stratified successive sampling is a statistical method employed to estimate population variance in situations where obtaining a random sample from the whole population is difficult or cost-prohibitive. The technique divides the population into distinct strata and selects units randomly from within these strata. Successive sampling is particularly advantageous in socio-economic research, where populations may be widely dispersed or access may be limited. The concept of stratified sampling for estimating population parameters, including population variance, was first introduced by [16]. Subsequently, [14] extended this method to the estimation of population variance in stratified successive sampling, while [19] and [25] explored population variance estimation in stratified successive sampling with unequal probabilities and [15] did the same with equal probability. [23] developed an effective variance estimation strategy for two-occasion successive sampling under random non-response, whereas [27] worked on a family of robust-type estimators for population variance in simple and stratified random sampling. [1] improved the estimation of finite population variance using dual supplementary information under stratified random sampling, and [6] introduced efficient classes of estimators of population variance in two-phase successive sampling with random non-response. Moreover, [5] and [4] developed memory-type ratio and product estimators under ranked-based sampling schemes. Together, these works provide a foundation for the use of stratified successive sampling and calibration approaches in estimating population variance, especially under non-response.
[7] introduced a calibration approach using least squares adjustment, a method later adopted by statistical authorities in various organizations. The primary aim of the calibration approach is to formulate unbiased estimation procedures with minimal dispersion by leveraging information from auxiliary variables. [8] proposed a calibration estimation procedure that reduces the discrepancy between initial and final weights while still adhering to calibration equations and constraints. [10] explored model-assisted higher-order calibration of variance estimators, and [20] applied the calibration approach in survey theory and practice. [12] and [11] contributed to calibration approach estimators in stratified sampling. [26] introduced a calibration approach-based regression-type estimator to model the inverse relationship between study and auxiliary variables, and [13] applied calibration weighting in stratified random sampling. [17] introduced a new calibration estimator specifically designed for stratified sampling, and [9] developed calibration estimation for ratio estimators in stratified sampling with proportion allocation. [22] extended the calibration approach to estimate population variance in stratified successive sampling while accounting for random non-response. Recently, [2] conducted work on estimating the calibration of the mean by employing a double use of auxiliary information. Additionally, [3] focused on determining optimal calibrated weights while minimizing a variance function. In a related study, [21] explored L-moments and calibration-based variance estimators under a double stratified random sampling scheme, with a specific focus on their application during the COVID-19 pandemic. Moreover, [18] contributed to the field by developing a general class of improved population variance estimators under non-sampling errors using calibrated weights in stratified sampling.
Calibration is a key technique for estimating population variance in stratified successive sampling. It improves accuracy by making the resulting estimates more representative of the population and less biased. This matters particularly in stratified successive sampling, where each stratum must be represented in the final estimate. The proposed estimation strategy also applies to problems such as estimating the variance of gas turbine exhaust pressure, as shown by the real-data illustration in later sections of this manuscript.
2. Motivation of the study
The dynamic nature of our world entails constant variation, with implications across various realms. In medicine, variations in body temperature, blood pressure, and pulse rate hold diagnostic significance. Similarly, diverse consumer responses to products drive pricing and quality decisions, while variations in climatic factors guide agricultural planning.
The estimation of population variance assumes a pivotal role in diverse fields, ranging from socio-economic studies to engineering applications. Stratified successive sampling offers a strategic approach, particularly when random sampling from the entire population is intricate. However, the presence of non-response poses a challenge, potentially introducing bias.
In this context, this research article proposes a novel solution. We introduce a logarithmic-type estimator that incorporates information from a highly correlated auxiliary variable. Moreover, calibrated weights are integrated into the estimation process to counteract the impact of non-response bias.
The motivation for this study emanates from various factors, including:
Practical Relevance: Stratified successive sampling is commonly used in scenarios where populations are geographically dispersed or hard to access due to non-homogeneity. However, non-response can distort estimates, making it essential to account for this bias through calibrated weights.
Calibration Approach: We use the calibration technique to improve the accuracy of the estimates. By incorporating calibration into the estimation process, resulting estimates are more representative of the population and have reduced bias. Leveraging auxiliary variables to enhance estimation accuracy has shown effectiveness. Applying this calibration technique to stratified successive sampling aims to yield more representative and unbiased population variance estimates.
Engineering Applications: The proposed methodology holds practical significance in fields like engineering, particularly in quality control and process improvement. Precise population variance estimation is vital for assessing manufacturing process variability, enabling informed decisions for process optimization and quality control strategies.
Socio-Economic Research: In socio-economic research, understanding the relationship between variables like education and income is pivotal for informed decision-making. By enhancing population variance estimation, the proposed method provides deeper insights into these relationships, guiding policy and strategy decisions.
Unique Challenges Addressed: The use of logarithmic-type estimators underscores a focus on addressing rare challenges, such as estimating diseases or socio-economic issues requiring specialized techniques. This approach showcases our dedication to tackling complex issues with precision.
In conclusion, the motivation behind this research lies in the imperative to enhance population variance estimation accuracy in stratified successive sampling, particularly by addressing non-response challenges. Through the utilization of calibrated weights and auxiliary information, we aspire to provide more resilient and effective estimation techniques, applicable across diverse fields from engineering to socio-economic research. By obtaining more accurate estimates of population variance, one can better understand the relationship between education level and income in the studied population. This understanding may hold important implications for policy decisions related to education and income inequality, as well as for businesses and organizations interested in targeting college-educated individuals for employment or marketing purposes.
3. Sample structure
Consider a finite population of size N divided into L non-overlapping strata, the kth stratum containing $N_k$ (k = 1, 2, ..., L) units. Let X and Y denote the study character on the first and second occasions, respectively. It is assumed that information regarding an auxiliary variable Z is accessible on both occasions, and the population variance of Z is known.
Consider the kth stratum, where k ranges from 1 to L. On the first occasion, we use simple random sampling without replacement (SRSWOR) to draw a preliminary sample of size $n_k$ from the population, in which some units do not respond. From the responding part of this sample, we draw a second-stage SRSWOR sample of size $m_k$ (the matched sample), in which some units again do not respond. We use this sample on the second occasion and collect information on the study variable Y. Additionally, we draw a fresh SRSWOR sample of size $u_k = n_k - m_k$ from the population and observe Y on it; here too, some units do not respond. The fractions of matched and fresh samples on the current (second) occasion, $m_k/n_k$ and $u_k/n_k$, sum to 1.
3.1. Notations
From now on, we will use the following notations:
: The population variance of Y, i.e., the characteristics under study.
: The population mean of Z in the strata.
: The population mean of Z.
: The sample means of the study variable X in the strata based on the responding part of samples of sizes and , respectively.
: The sample means of the auxiliary variable Z in the strata based on samples of sizes and , respectively.
=, =: The population mean squares of the stratum of the study variables Y and X, respectively.
=: The population mean squares of the stratum for the auxiliary variable Z.
: Depending on the responding part of sample of size , the sample mean square of study variable X for the stratum.
: Depending on the responding part of sample of size , the sample mean square of study variable X for the stratum.
: Depending on the responding part of sample of size , the sample mean square of study variable Y for the stratum.
: Depending on the responding part of sample of size , the sample mean square of study variable Y for the stratum.
: Depending on the sample of size , the sample mean square of auxiliary variable Z for the stratum.
: Depending on the sample of size , the sample mean square of auxiliary variable Z for the stratum.
: Depending on the sample of size , the sample mean square of auxiliary variable Z for the stratum.
=: The original weight of the stratum, k= 1, 2,...,L
: The calibrated weight of the stratum, k= 1, 2,...,L based on the sample of size
: The calibrated weight of the stratum, k= 1, 2,...,L based on the sample of size
: The independent weight of the stratum, k= 1, 2,...,L.
4. Non-response probability model
The stratum is considered using [24] random non-response model. Consider a sample of size for which some data on X could not be collected due to random non-response. Let , where ranges from 0, 1, 2,...,( - 2), represent the number of such cases in . Similarly, for a sample of size , let represent the number of units for which Y information on the second occasion could not be acquired due to random non-response, and represent the same for a sample of size . It is presumed that , , and fall within their respective bounds. We assume that , and . If , , and probabilities of non-response among the (-2), (-2), and (-2) possible values of non-responses respectively, the discrete probability distributions for , , and are represented by
and
The number of ways to obtain (l = 1, 2, 3) non-responses from all potential non-response values for the three samples are represented by , , and .
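The probability model described above can be explored numerically. The sketch below assumes that the model of [24] has the form P(r) = (n - r)/(nq + 2p) · C(n - 2, r) · p^r · q^(n - 2 - r) with q = 1 - p and r = 0, 1, ..., n - 2; this closed form, the function name `nonresponse_pmf`, and the example values are assumptions for illustration, since the displayed distributions are not reproduced in this section.

```python
from math import comb


def nonresponse_pmf(n, p):
    """Assumed pmf of the number of non-respondents r = 0, ..., n - 2.

    n : sample size (hypothetical example value below)
    p : probability of non-response
    Intended for small n only; very large n would overflow float conversion.
    """
    if n < 2 or not 0.0 <= p <= 1.0:
        raise ValueError("need n >= 2 and 0 <= p <= 1")
    q = 1.0 - p
    norm = n * q + 2.0 * p  # normalising constant of the assumed form
    return [
        (n - r) / norm * comb(n - 2, r) * p**r * q ** (n - 2 - r)
        for r in range(n - 1)
    ]


pmf = nonresponse_pmf(50, 0.10)
total = sum(pmf)  # equals 1 up to rounding error
expected_r = sum(r * pr for r, pr in enumerate(pmf))  # mean number of non-respondents
```

The normalising constant follows from the binomial mean: summing (n - r) against a Binomial(n - 2, p) weight gives n - (n - 2)p = nq + 2p, so the probabilities sum to one.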
5. Proposed estimator
We have developed a set of estimators to estimate the population mean square of the stratum, where k may range from 1 to L. These estimators are based on two samples: , which is a common sample of size collected on previous occasions, and , which is a fresh sample of size collected on the current occasion. We refer to the estimators based on these samples as and , respectively.
| (1) |
| (2) |
where
The constants and may be established by minimizing the mean square errors of the estimators.
In order to estimate the population variance , we use the estimators and , which are defined in Equations (1) and (2), and we suggest the matched sample estimator and fresh sample estimator as follows:
| (3) |
and
| (4) |
Remark 1
The argument of the logarithmic function must be strictly positive; otherwise, the function is undefined. When assessing the practical applicability of an estimator, it is crucial to establish its domain of valid values. The presented estimators are of log(x) type, where x must be greater than zero, which limits their use to situations in which x is positive. In contrast, for the function, there are no such restrictions on the domain of x, allowing for a broader range of values and increased versatility in real-world applications.
Remark 2
We have observed that the structure remains consistent in large sample approximations. Specifically, if , , and → , then → , → , → , → , and → . Using the fact that , we may conclude that → , and also → . Therefore, the estimator is consistent in large-sample approximations, as it may be inferred that → and → .
6. Suggested calibration technique
The new calibrated weights for the estimator of the population variance () under stratified sampling are obtained by minimizing the chi-square distance function while adhering to specific calibration constraints. These constraints are as follows:
-
1.
-
2.
-
3.
Where represents the calibrated weights, are the initial weights, and = and =.
Next, we consider the estimation of the population variance on the current occasion () using another set of calibrated weights. These calibrated weights are derived through the minimization of the chi-square distance function while adhering to specific calibration constraints, which are as follows:
-
1.
-
2.
Where represents the calibrated weights, = and =.
It is important to note that are appropriately determined weights that will determine the estimator form.
In Appendix A, detailed derivations have been given.
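As a rough illustration of chi-square distance calibration, the sketch below minimizes sum_k (Ω_k - W_k)² / (W_k Q_k) under two linear constraints using Lagrange multipliers. The two constraints used here (calibrated weights sum to one, and the calibrated weighted sum of a stratum-level auxiliary quantity equals a known total) are assumptions chosen for illustration and are not the paper's constraint set; the function name `calibrate_weights`, the auxiliary values, and the target are hypothetical. The initial weights are the W_k of Table 2.

```python
def calibrate_weights(w, q, aux, aux_total):
    """Chi-square distance calibration under two illustrative constraints.

    Minimises sum((omega - w)**2 / (w * q)) subject to
    sum(omega) == 1 and sum(omega * aux) == aux_total.
    The stationarity condition gives omega_k = w_k + w_k*q_k*(lam1 + lam2*aux_k).
    """
    d = [wk * qk for wk, qk in zip(w, q)]
    a11 = sum(d)
    a12 = sum(dk * ak for dk, ak in zip(d, aux))
    a22 = sum(dk * ak * ak for dk, ak in zip(d, aux))
    b1 = 1.0 - sum(w)
    b2 = aux_total - sum(wk * ak for wk, ak in zip(w, aux))
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-12:
        raise ValueError("constraints are collinear; aux must vary across strata")
    # Solve the 2x2 system for the Lagrange multipliers by Cramer's rule.
    lam1 = (b1 * a22 - b2 * a12) / det
    lam2 = (a11 * b2 - a12 * b1) / det
    return [wk + dk * (lam1 + lam2 * ak) for wk, dk, ak in zip(w, d, aux)]


w = [0.2631579, 0.3947368, 0.1315789, 0.2105263]  # W_k from Table 2
q = [1.0, 1.0, 1.0, 1.0]                           # Case I: all Q_k = 1
aux = [9.8, 10.3, 10.1, 9.6]                       # hypothetical stratum-level values
omega = calibrate_weights(w, q, aux, aux_total=10.0)
```

Changing `q` changes how strongly each stratum's weight may move away from its initial value, which is the role the Q values play across Cases I to IV in the tables.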
We propose a way to estimate the population variance of a study variable on the current (second) occasion of a two-occasion successive sampling. Our proposed estimator is a combination of two estimators, and , using a scalar ϕ that minimizes the mean square error of the estimator T.
i.e.
If we want to estimate the population variance on each occasion, we choose ϕ=1 and use . If we want to estimate the change from one occasion to the next, we choose ϕ=0 and use . To deal with both problems at the same time, we need to determine an optimum value for ϕ.
7. Properties of the proposed estimator
The first-order bias and mean squared errors of proposed estimators and are derived, assuming large sample size and utilizing the stated transformations.
Based on the calculations, the bias of the suggested estimators and , as well as the mean squared error (MSE) of and , have been computed accurately up to the first order of approximation.
The bias of :
| (5) |
Mean Squared Error (MSE) of :
| (6) |
and bias of :
| (7) |
Mean Squared Error (MSE) of :
| (8) |
where
and
In Appendix B, detailed derivations have been given.
Thus, we may obtain the bias and MSE of estimator T by combining the biases and MSEs of two non-overlapping samples and , as shown below:
| (9) |
and
| (10) |
Where equations (5), (7), (6), and (8) provide the expressions for Bias(), Bias(), MSE(), and MSE(), respectively. As and are based on non-overlapping samples of sizes u and m, respectively, the covariance term may be ignored as it is of order , and c=0.
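For reference, the combination can be written out under assumed notation. Writing $T_u$ and $T_m$ for the two component estimators (labels chosen here for illustration), $\theta$ for the population variance being estimated, and taking $T = \phi T_u + (1-\phi) T_m$, linearity of expectation gives

$$
\mathrm{Bias}(T) = \phi\,\mathrm{Bias}(T_u) + (1-\phi)\,\mathrm{Bias}(T_m),
$$

$$
\mathrm{MSE}(T) = \phi^{2}\,\mathrm{MSE}(T_u) + (1-\phi)^{2}\,\mathrm{MSE}(T_m) + 2\phi(1-\phi)\,c,
\qquad c = E\big[(T_u-\theta)(T_m-\theta)\big].
$$

With $c = 0$, only the first two terms of the MSE remain.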
8. The estimator's minimum mean squared error (MSE)
MSEs of and depend on and . To find the best and , we minimize the MSEs in Equations (6) and (8). We get optimal and values.
| (11) |
and
| (12) |
Using from Eq. (11) in Eq. (6) and from Eq. (12) in Eq. (8) gives the minimum MSE of and .
| (13) |
and
| (14) |
MSE of T depends on ϕ. To find the best ϕ, we minimize the MSE. The optimal ϕ is
| (15) |
We may use this value to get the best MSE of T, which is
| (16) |
Equations (13) and (14) give the expressions for and , respectively.
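The optimization over ϕ can be checked numerically. The sketch below assumes the combined MSE has the form ϕ²·A + (1 - ϕ)²·B with zero covariance, where A and B stand for the minimum MSEs of the two component estimators and A is attached to the component selected by ϕ = 1; the function names and the numeric inputs are hypothetical. Setting the derivative 2ϕA - 2(1 - ϕ)B to zero gives ϕ = B/(A + B) and a minimum value of AB/(A + B).

```python
def combined_mse(phi, mse_a, mse_b):
    """MSE of phi*T_a + (1 - phi)*T_b when the covariance term is zero."""
    return phi**2 * mse_a + (1.0 - phi) ** 2 * mse_b


def optimal_phi(mse_a, mse_b):
    """Return (phi_opt, minimum MSE) for the zero-covariance combination."""
    if mse_a <= 0 or mse_b <= 0:
        raise ValueError("component MSEs must be positive")
    phi_opt = mse_b / (mse_a + mse_b)
    min_mse = mse_a * mse_b / (mse_a + mse_b)
    return phi_opt, min_mse


# Hypothetical component MSEs, for illustration only.
phi_opt, min_mse = optimal_phi(2.0, 6.0)

# A coarse grid search should not find a smaller value than the closed form.
grid_best = min(combined_mse(i / 1000.0, 2.0, 6.0) for i in range(1001))
```

The minimum AB/(A + B) is smaller than both A and B, which is why combining the two samples cannot do worse than using either one alone under these assumptions.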
9. Empirical study
Before employing an estimator in practical situations, it is crucial to assess its performance based on its inherent properties. In light of this, an empirical analysis has been carried out in this section utilizing both real and simulated data to evaluate the suggested estimator.
To accomplish this, we compare the suggested estimator T with an alternative estimator τ, which is defined in the same manner and is also designed to handle random non-response. The purpose of this comparison is to evaluate how well T performs under random non-response. Note that the comparison is made against the standard estimator because no other estimator has been proposed for stratified successive sampling under non-response.
The values of , and are unknown. The constant ψ needs to be determined by minimizing the Mean Squared Error (MSE) of estimator τ.
| (17) |
and
| (18) |
The minimum Mean Squared Error (MSE) of estimator τ by the combination of (17) and (18), up to the first order of approximations, may be expressed as:
| (19) |
The performance of the proposed estimator T may be evaluated through its Percentage Relative Efficiency (PRE) with respect to the estimator τ, calculated using the following formula:
| (20) |
where and are defined in Equations (16) and (19), respectively.
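As a small worked illustration, the sketch below assumes the usual definition of percentage relative efficiency, PRE = 100 × MSE(τ) / MSE(T), so that values above 100 favour T; the function name and the input values are hypothetical and are not taken from the tables.

```python
def percent_relative_efficiency(mse_reference, mse_proposed):
    """PRE of the proposed estimator with respect to a reference estimator."""
    if mse_reference <= 0 or mse_proposed <= 0:
        raise ValueError("MSE values must be positive")
    return 100.0 * mse_reference / mse_proposed


# Hypothetical optimum MSEs of tau and T, for illustration only.
pre = percent_relative_efficiency(mse_reference=1.5, mse_proposed=1.0)
```

Under this definition, a PRE of 150 means the reference estimator's MSE is one and a half times that of the proposed estimator.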
The following values have been considered:
-
Case A:
-
Case B:
-
Case C:
-
Case D:
The resulting calibrated stratum weights are shown in Tables 2, 9, and 16, respectively.
9.1. Simulation study
We generated data based on our theoretical findings using the statistical computing software R. To create data with specified parameters and correlation coefficients for the study and auxiliary variables, we used the mvrnorm function from the MASS package and the genCorGen function from the simstudy package. The population parameters for the data generated from a Poisson distribution are given in Table 1, and those for the data generated from a Normal distribution in Table 8. Simulations were conducted to analyze the effects of the controlling parameter $Q_k$, both with and without random non-response. The results of these simulations are presented in Tables 2 to 14.
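The data-generation step can be sketched in Python as a rough analogue; this is not the R code used for the study. The sketch assumes a one-factor construction to obtain three normal variables with a common pairwise correlation ρ, scaled-down sizes, and a simplified non-response mechanism in which each sampled unit is dropped independently with probability 0.10. All names, sizes, means, and standard deviations are hypothetical.

```python
import random
from statistics import mean, variance


def equicorrelated_normals(n_units, rho, mu=10.0, sigma=2.0, rng=None):
    """Generate (x, y, z) triples with common pairwise correlation rho (0 <= rho <= 1).

    Each variable is mu + sigma*(sqrt(rho)*f + sqrt(1 - rho)*e), with a shared
    standard normal factor f and independent standard normal noise e.
    """
    rng = rng or random.Random()
    a, b = rho**0.5, (1.0 - rho) ** 0.5
    rows = []
    for _ in range(n_units):
        f = rng.gauss(0.0, 1.0)
        rows.append(tuple(mu + sigma * (a * f + b * rng.gauss(0.0, 1.0)) for _ in range(3)))
    return rows


def correlation(u, v):
    """Sample Pearson correlation of two equal-length sequences."""
    mu_u, mu_v = mean(u), mean(v)
    cov = sum((p - mu_u) * (s - mu_v) for p, s in zip(u, v)) / (len(u) - 1)
    return cov / (variance(u) * variance(v)) ** 0.5


rng = random.Random(2024)
population = equicorrelated_normals(5000, 0.9, rng=rng)  # one stratum, rho = 0.9
x, y, z = zip(*population)

sample = rng.sample(population, 1000)                           # SRSWOR draw
respondents = [row for row in sample if rng.random() >= 0.10]   # simplified non-response
s2_y = variance([row[1] for row in respondents])                # sample mean square of Y
```

Repeating the draw many times and averaging the squared error of an estimator built from `respondents` would give a Monte Carlo estimate of its MSE.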
Table 1.
The statistical parameters correspond to the simulated data from a Poisson distribution.
| Stratum | Nk | nk | r1k | mk | r2k | uk | ρxy | ρyz | ρzx |
|---|---|---|---|---|---|---|---|---|---|
| Strata 1 | 10000 | 3000 | 500 | 2000 | 400 | 1000 | 0.9 | 0.9 | 0.9 |
| Strata 2 | 15000 | 3000 | 500 | 2000 | 400 | 1000 | 0.8 | 0.8 | 0.8 |
| Strata 3 | 5000 | 2000 | 400 | 1200 | 300 | 800 | 0.8 | 0.8 | 0.8 |
| Strata 4 | 8000 | 800 | 50 | 500 | 50 | 300 | 0.6 | 0.6 | 0.6 |
Table 8.
The statistical parameters correspond to the simulated data from a Normal distribution.
| Stratum | Nk | nk | r1k | mk | r2k | uk | ρxy | ρyz | ρzx |
|---|---|---|---|---|---|---|---|---|---|
| Strata 1 | 12000 | 3600 | 600 | 2400 | 480 | 1200 | 0.95 | 0.95 | 0.95 |
| Strata 2 | 17000 | 3600 | 600 | 2400 | 480 | 1200 | 0.85 | 0.85 | 0.85 |
| Strata 3 | 6000 | 2400 | 480 | 1440 | 360 | 960 | 0.85 | 0.85 | 0.85 |
| Strata 4 | 9000 | 900 | 57 | 563 | 57 | 337 | 0.65 | 0.65 | 0.65 |
Table 2.
Calibrated strata weights for simulated data following a Poisson distribution.
| Case | Stratum | Wk | Calibrated weight (first set) | Calibrated weight (second set) |
|---|---|---|---|---|
| I | 1 | 0.2631579 | 0.30121385 | 0.3129923 |
| | 2 | 0.3947368 | 0.55203331 | 0.2801177 |
| | 3 | 0.1315789 | 0.08592402 | 0.1564961 |
| | 4 | 0.2105263 | 0.06082881 | 0.2503938 |
| II | 1 | 0.2631579 | 0.33327331 | 0.3013643 |
| | 2 | 0.3947368 | 0.52933590 | 0.2801177 |
| | 3 | 0.1315789 | 0.07974788 | 0.1697853 |
| | 4 | 0.2105263 | 0.05764291 | 0.2487327 |
| III | 1 | 0.2631579 | 0.30140728 | 0.3132156 |
| | 2 | 0.3947368 | 0.55189637 | 0.2801177 |
| | 3 | 0.1315789 | 0.08588676 | 0.1564564 |
| | 4 | 0.2105263 | 0.06080959 | 0.2502102 |
| IV | 1 | 0.2631579 | 0.30107846 | 0.3124016 |
| | 2 | 0.3947368 | 0.55212917 | 0.2801177 |
| | 3 | 0.1315789 | 0.08595011 | 0.1565251 |
| | 4 | 0.2105263 | 0.06084227 | 0.2509556 |
Table 3.
Bias and PRE for Case I (Q1 = 1.0, Q2 = 1.0, Q3 = 1.0, and Q4 = 1.0), obtained from simulated data following a Poisson distribution. Here p1, p2, and p3 represent the probabilities of non-response among the (nk - 2), (mk - 2), and (uk - 2) possible values of non-responses, respectively.
| p1 | p2 | PRE, p3=0.05 | PRE, p3=0.10 | PRE, p3=0.15 | PRE, p3=0.20 | Bias, p3=0.05 | Bias, p3=0.10 | Bias, p3=0.15 | Bias, p3=0.20 |
|---|---|---|---|---|---|---|---|---|---|
| 0.05 | 0.05 | 142.9 | 141.9 | 141.1 | 140.4 | 0.0001505 | 0.0001462 | 0.0001416 | 0.0001367 |
| 0.05 | 0.10 | 146.1 | 145.3 | 144.6 | 144.0 | 0.0001505 | 0.0001462 | 0.0001416 | 0.0001367 |
| 0.05 | 0.15 | 149.5 | 148.7 | 148.1 | 147.7 | 0.0001505 | 0.0001462 | 0.0001416 | 0.0001367 |
| 0.05 | 0.20 | 152.9 | 152.3 | 151.8 | 151.6 | 0.0001505 | 0.0001462 | 0.0001416 | 0.0001367 |
| 0.10 | 0.05 | 139.3 | 138.2 | 137.3 | 136.5 | 0.0001505 | 0.0001462 | 0.0001416 | 0.0001367 |
| 0.10 | 0.10 | 142.5 | 141.5 | 140.6 | 140.0 | 0.0001505 | 0.0001462 | 0.0001416 | 0.0001367 |
| 0.10 | 0.15 | 145.8 | 144.9 | 144.1 | 143.6 | 0.0001505 | 0.0001462 | 0.0001416 | 0.0001367 |
| 0.10 | 0.20 | 149.1 | 148.3 | 147.7 | 147.3 | 0.0001505 | 0.0001462 | 0.0001416 | 0.0001367 |
| 0.15 | 0.05 | 136.1 | 134.9 | 133.8 | 132.9 | 0.0001505 | 0.0001462 | 0.0001416 | 0.0001367 |
| 0.15 | 0.10 | 139.2 | 138.1 | 137.1 | 136.3 | 0.0001505 | 0.0001462 | 0.0001416 | 0.0001367 |
| 0.15 | 0.15 | 142.4 | 141.4 | 140.5 | 139.8 | 0.0001505 | 0.0001462 | 0.0001416 | 0.0001367 |
| 0.15 | 0.20 | 145.7 | 144.8 | 144.0 | 143.4 | 0.0001505 | 0.0001462 | 0.0001416 | 0.0001367 |
| 0.20 | 0.05 | 133.2 | 131.9 | 130.7 | 129.7 | 0.0001505 | 0.0001462 | 0.0001416 | 0.0001367 |
| 0.20 | 0.10 | 136.3 | 135.0 | 133.9 | 133.0 | 0.0001505 | 0.0001462 | 0.0001416 | 0.0001367 |
| 0.20 | 0.15 | 139.4 | 138.2 | 137.3 | 136.4 | 0.0001505 | 0.0001462 | 0.0001416 | 0.0001367 |
| 0.20 | 0.20 | 142.6 | 141.6 | 140.7 | 140.0 | 0.0001505 | 0.0001462 | 0.0001416 | 0.0001367 |
Table 4.
Bias and PRE for Case II (Q1 = 3.80, Q2 = 2.53, Q3 = 7.60, and Q4 = 4.75), obtained from simulated data following a Poisson distribution.
| p1 | p2 | PRE, p3=0.05 | PRE, p3=0.10 | PRE, p3=0.15 | PRE, p3=0.20 | Bias, p3=0.05 | Bias, p3=0.10 | Bias, p3=0.15 | Bias, p3=0.20 |
|---|---|---|---|---|---|---|---|---|---|
| 0.05 | 0.05 | 143.0 | 142.1 | 141.4 | 140.8 | 0.000144 | 0.0001399 | 0.0001353 | 0.0001305 |
| 0.05 | 0.10 | 146.4 | 145.6 | 145.0 | 144.6 | 0.000144 | 0.0001399 | 0.0001353 | 0.0001305 |
| 0.05 | 0.15 | 149.9 | 149.3 | 148.8 | 148.5 | 0.000144 | 0.0001399 | 0.0001353 | 0.0001305 |
| 0.05 | 0.20 | 153.5 | 153.0 | 152.7 | 152.5 | 0.000144 | 0.0001399 | 0.0001353 | 0.0001305 |
| 0.10 | 0.05 | 139.2 | 138.3 | 137.4 | 136.7 | 0.000144 | 0.0001399 | 0.0001353 | 0.0001305 |
| 0.10 | 0.10 | 142.6 | 141.7 | 140.9 | 140.3 | 0.000144 | 0.0001399 | 0.0001353 | 0.0001305 |
| 0.10 | 0.15 | 146.0 | 145.2 | 144.6 | 144.1 | 0.000144 | 0.0001399 | 0.0001353 | 0.0001305 |
| 0.10 | 0.20 | 149.5 | 148.9 | 148.4 | 148.0 | 0.000144 | 0.0001399 | 0.0001353 | 0.0001305 |
| 0.15 | 0.05 | 135.9 | 134.8 | 133.8 | 133.0 | 0.000144 | 0.0001399 | 0.0001353 | 0.0001305 |
| 0.15 | 0.10 | 139.1 | 138.1 | 137.2 | 136.5 | 0.000144 | 0.0001399 | 0.0001353 | 0.0001305 |
| 0.15 | 0.15 | 142.5 | 141.6 | 140.8 | 140.2 | 0.000144 | 0.0001399 | 0.0001353 | 0.0001305 |
| 0.15 | 0.20 | 145.9 | 145.1 | 144.5 | 144.0 | 0.000144 | 0.0001399 | 0.0001353 | 0.0001305 |
| 0.20 | 0.05 | 132.9 | 131.7 | 130.6 | 129.6 | 0.000144 | 0.0001399 | 0.0001353 | 0.0001305 |
| 0.20 | 0.10 | 136.1 | 134.9 | 133.9 | 133.1 | 0.000144 | 0.0001399 | 0.0001353 | 0.0001305 |
| 0.20 | 0.15 | 139.3 | 138.3 | 137.4 | 136.6 | 0.000144 | 0.0001399 | 0.0001353 | 0.0001305 |
| 0.20 | 0.20 | 142.7 | 141.8 | 141.0 | 140.4 | 0.000144 | 0.0001399 | 0.0001353 | 0.0001305 |
Table 5.
Bias and PRE for Case III (Q1 = 10.141, Q2 = 10.07, Q3 = 10.08, and Q4 = 10.05), obtained from simulated data following a Poisson distribution.
| p1 | p2 | PRE, p3=0.05 | PRE, p3=0.10 | PRE, p3=0.15 | PRE, p3=0.20 | Bias, p3=0.05 | Bias, p3=0.10 | Bias, p3=0.15 | Bias, p3=0.20 |
|---|---|---|---|---|---|---|---|---|---|
| 0.05 | 0.05 | 142.8901 | 141.9 | 141.1 | 140.5 | 0.0001505 | 0.0001462 | 0.0001416 | 0.0001367 |
| 0.05 | 0.10 | 146.1507 | 145.3 | 144.6 | 144.1 | 0.0001505 | 0.0001462 | 0.0001416 | 0.0001367 |
| 0.05 | 0.15 | 149.5087 | 148.8 | 148.2 | 147.8 | 0.0001505 | 0.0001462 | 0.0001416 | 0.0001367 |
| 0.05 | 0.20 | 152.9686 | 152.3 | 151.9 | 151.6 | 0.0001505 | 0.0001462 | 0.0001416 | 0.0001367 |
| 0.10 | 0.05 | 139.3401 | 138.3 | 137.3 | 136.5 | 0.0001505 | 0.0001462 | 0.0001416 | 0.0001367 |
| 0.10 | 0.10 | 142.5191 | 141.5 | 140.7 | 140.0 | 0.0001505 | 0.0001462 | 0.0001416 | 0.0001367 |
| 0.10 | 0.15 | 145.7932 | 144.9 | 144.2 | 143.6 | 0.0001505 | 0.0001462 | 0.0001416 | 0.0001367 |
| 0.10 | 0.20 | 149.1664 | 148.4 | 147.8 | 147.3 | 0.0001505 | 0.0001462 | 0.0001416 | 0.0001367 |
| 0.15 | 0.05 | 136.1238 | 134.9 | 133.8 | 132.9 | 0.0001505 | 0.0001462 | 0.0001416 | 0.0001367 |
| 0.15 | 0.10 | 139.2287 | 138.1 | 137.1 | 136.3 | 0.0001505 | 0.0001462 | 0.0001416 | 0.0001367 |
| 0.15 | 0.15 | 142.4261 | 141.4 | 140.5 | 139.8 | 0.0001505 | 0.0001462 | 0.0001416 | 0.0001367 |
| 0.15 | 0.20 | 145.7203 | 144.8 | 144.0 | 143.5 | 0.0001505 | 0.0001462 | 0.0001416 | 0.0001367 |
| 0.20 | 0.05 | 133.2532 | 131.9 | 130.8 | 129.7 | 0.0001505 | 0.0001462 | 0.0001416 | 0.0001367 |
| 0.20 | 0.10 | 136.2912 | 135.1 | 134.0 | 133.0 | 0.0001505 | 0.0001462 | 0.0001416 | 0.0001367 |
| 0.20 | 0.15 | 139.4197 | 138.3 | 137.3 | 136.4 | 0.0001505 | 0.0001462 | 0.0001416 | 0.0001367 |
| 0.20 | 0.20 | 142.6426 | 141.6 | 140.7 | 140.0 | 0.0001505 | 0.0001462 | 0.0001416 | 0.0001367 |
Table 6.
Bias and PRE for Case IV (Q1 = 9.93, Q2 = 10.05, Q3 = 10.06, and Q4 = 10.19), obtained from simulated data following a Poisson distribution.
| p1 | p2 | PRE, p3=0.05 | PRE, p3=0.10 | PRE, p3=0.15 | PRE, p3=0.20 | Bias, p3=0.05 | Bias, p3=0.10 | Bias, p3=0.15 | Bias, p3=0.20 |
|---|---|---|---|---|---|---|---|---|---|
| 0.05 | 0.05 | 142.8 | 141.8 | 141.0 | 140.4 | 0.0001505 | 0.0001461 | 0.0001415 | 0.0001367 |
| 0.05 | 0.10 | 146.0 | 145.2 | 144.5 | 144.0 | 0.0001505 | 0.0001461 | 0.0001415 | 0.0001367 |
| 0.05 | 0.15 | 149.4 | 148.6 | 148.1 | 147.7 | 0.0001505 | 0.0001461 | 0.0001415 | 0.0001367 |
| 0.05 | 0.20 | 152.8 | 152.2 | 151.8 | 151.5 | 0.0001505 | 0.0001461 | 0.0001415 | 0.0001367 |
| 0.10 | 0.05 | 139.2 | 138.1 | 137.2 | 136.4 | 0.0001505 | 0.0001461 | 0.0001415 | 0.0001367 |
| 0.10 | 0.10 | 142.4 | 141.4 | 140.6 | 139.9 | 0.0001505 | 0.0001461 | 0.0001415 | 0.0001367 |
| 0.10 | 0.15 | 145.7 | 144.8 | 144.1 | 143.5 | 0.0001505 | 0.0001461 | 0.0001415 | 0.0001367 |
| 0.10 | 0.20 | 149.0 | 148.3 | 147.7 | 147.2 | 0.0001505 | 0.0001461 | 0.0001415 | 0.0001367 |
| 0.15 | 0.05 | 136.0 | 134.8 | 133.7 | 132.8 | 0.0001505 | 0.0001461 | 0.0001415 | 0.0001367 |
| 0.15 | 0.10 | 139.1 | 138.0 | 137.0 | 136.2 | 0.0001505 | 0.0001461 | 0.0001415 | 0.0001367 |
| 0.15 | 0.15 | 142.3 | 141.3 | 140.4 | 139.7 | 0.0001505 | 0.0001461 | 0.0001415 | 0.0001367 |
| 0.15 | 0.20 | 145.6 | 144.7 | 143.9 | 143.4 | 0.0001505 | 0.0001461 | 0.0001415 | 0.0001367 |
| 0.20 | 0.05 | 133.1 | 131.8 | 130.7 | 129.6 | 0.0001505 | 0.0001461 | 0.0001415 | 0.0001367 |
| 0.20 | 0.10 | 136.1 | 134.9 | 133.9 | 132.9 | 0.0001505 | 0.0001461 | 0.0001415 | 0.0001367 |
| 0.20 | 0.15 | 139.3 | 138.2 | 137.2 | 136.3 | 0.0001505 | 0.0001461 | 0.0001415 | 0.0001367 |
| 0.20 | 0.20 | 142.5 | 141.5 | 140.6 | 139.9 | 0.0001505 | 0.0001461 | 0.0001415 | 0.0001367 |
Table 7.
Bias and PRE in the absence of non-response (p1 = p2 = p3 = 0), obtained from simulated data following a Poisson distribution.
| Case | PRE | Bias |
|---|---|---|
| Case I | 144.5398 | 0.0001416434 |
| Case II | 144.5906 | 0.0001352958 |
| Case III | 144.5827 | 0.0001416345 |
| Case IV | 144.4316 | 0.0001415639 |
Table 9.
Calibrated strata weights for simulated data following a Normal distribution.
| Case | Stratum | Wk | Calibrated weight (first set) | Calibrated weight (second set) |
|---|---|---|---|---|
| I | 1 | 0.2727273 | 0.3808514 | 0.2609850 |
| | 2 | 0.3863636 | 0.1446900 | 0.3997344 |
| | 3 | 0.1363636 | 0.1269082 | 0.1304925 |
| | 4 | 0.2045455 | 0.3475505 | 0.2087880 |
| II | 1 | 0.2727273 | 0.3558239 | 0.2614920 |
| | 2 | 0.3863636 | 0.1629070 | 0.3997344 |
| | 3 | 0.1363636 | 0.1002457 | 0.1299131 |
| | 4 | 0.2045455 | 0.3810234 | 0.2088605 |
| III | 1 | 0.2727273 | 0.3825238 | 0.2607414 |
| | 2 | 0.3863636 | 0.1434726 | 0.3997344 |
| | 3 | 0.1363636 | 0.1286899 | 0.1306599 |
| | 4 | 0.2045455 | 0.3453136 | 0.2088643 |
| IV | 1 | 0.2727273 | 0.3815501 | 0.2609781 |
| | 2 | 0.3863636 | 0.1441814 | 0.3997344 |
| | 3 | 0.1363636 | 0.1276526 | 0.1304788 |
| | 4 | 0.2045455 | 0.3466160 | 0.2088087 |
Table 10.
Bias and PRE for Case I (Q1 = 1.0, Q2 = 1.0, Q3 = 1.0, and Q4 = 1.0), obtained from simulated data following a Normal distribution. Here p1, p2, and p3 represent the probabilities of non-response among the (nk - 2), (mk - 2), and (uk - 2) possible values of non-responses, respectively.
| p1 | p2 | PRE, p3=0.05 | PRE, p3=0.10 | PRE, p3=0.15 | PRE, p3=0.20 | Bias, p3=0.05 | Bias, p3=0.10 | Bias, p3=0.15 | Bias, p3=0.20 |
|---|---|---|---|---|---|---|---|---|---|
| 0.05 | 0.05 | 177.6 | 172.9 | 168.8 | 165.3 | 0.0004008 | 0.0003805 | 0.0003597 | 0.0003385 |
| 0.05 | 0.10 | 182.6 | 177.9 | 173.8 | 170.4 | 0.0004008 | 0.0003805 | 0.0003597 | 0.0003385 |
| 0.05 | 0.15 | 187.8 | 183.1 | 179.1 | 175.8 | 0.0004008 | 0.0003805 | 0.0003597 | 0.0003385 |
| 0.05 | 0.20 | 193.2 | 188.6 | 184.7 | 181.4 | 0.0004008 | 0.0003805 | 0.0003597 | 0.0003385 |
| 0.10 | 0.05 | 172.9 | 168.1 | 163.8 | 160.2 | 0.0004008 | 0.0003805 | 0.0003597 | 0.0003385 |
| 0.10 | 0.10 | 177.8 | 172.9 | 168.7 | 165.1 | 0.0004008 | 0.0003805 | 0.0003597 | 0.0003385 |
| 0.10 | 0.15 | 182.8 | 178.0 | 173.8 | 170.3 | 0.0004008 | 0.0003805 | 0.0003597 | 0.0003385 |
| 0.10 | 0.20 | 188.1 | 183.3 | 179.2 | 175.7 | 0.0004008 | 0.0003805 | 0.0003597 | 0.0003385 |
| 0.15 | 0.05 | 168.5 | 163.5 | 159.1 | 155.3 | 0.0004008 | 0.0003805 | 0.0003597 | 0.0003385 |
| 0.15 | 0.10 | 173.2 | 168.2 | 163.8 | 160.1 | 0.0004008 | 0.0003805 | 0.0003597 | 0.0003385 |
| 0.15 | 0.15 | 178.1 | 173.1 | 168.8 | 165.1 | 0.0004008 | 0.0003805 | 0.0003597 | 0.0003385 |
| 0.15 | 0.20 | 183.2 | 178.3 | 174.0 | 170.3 | 0.0004008 | 0.0003805 | 0.0003597 | 0.0003385 |
| 0.20 | 0.05 | 164.3 | 159.2 | 154.7 | 150.8 | 0.0004008 | 0.0003805 | 0.0003597 | 0.0003385 |
| 0.20 | 0.10 | 168.9 | 163.7 | 159.2 | 155.3 | 0.0004008 | 0.0003805 | 0.0003597 | 0.0003385 |
| 0.20 | 0.15 | 173.6 | 168.5 | 164.0 | 160.2 | 0.0004008 | 0.0003805 | 0.0003597 | 0.0003385 |
| 0.20 | 0.20 | 178.6 | 173.5 | 169.0 | 165.2 | 0.0004008 | 0.0003805 | 0.0003597 | 0.0003385 |
Table 11.
For Case II: Q1 = 3.80, Q2 = 2.53, Q3 = 7.60, and Q4 = 4.75. Bias and PRE were obtained from simulated data following a Normal distribution.
| p1 | p2 | PRE (p3=0.05) | PRE (p3=0.10) | PRE (p3=0.15) | PRE (p3=0.20) | Bias (p3=0.05) | Bias (p3=0.10) | Bias (p3=0.15) | Bias (p3=0.20) |
|---|---|---|---|---|---|---|---|---|---|
| 0.05 | 0.05 | 178.7 | 173.4 | 168.8 | 164.9 | 0.0004473 | 0.0004280 | 0.000408 | 0.0003874 |
| 0.05 | 0.10 | 183.0 | 177.7 | 173.2 | 169.3 | 0.0004473 | 0.0004280 | 0.000408 | 0.0003874 |
| 0.05 | 0.15 | 187.4 | 182.2 | 177.7 | 173.8 | 0.0004473 | 0.0004280 | 0.000408 | 0.0003874 |
| 0.05 | 0.20 | 192.0 | 186.8 | 182.4 | 178.6 | 0.0004473 | 0.0004280 | 0.000408 | 0.0003874 |
| 0.10 | 0.05 | 174.7 | 169.3 | 164.6 | 160.5 | 0.0004473 | 0.0004280 | 0.000408 | 0.0003874 |
| 0.10 | 0.10 | 178.9 | 173.5 | 168.8 | 164.7 | 0.0004473 | 0.0004280 | 0.000408 | 0.0003874 |
| 0.10 | 0.15 | 183.2 | 177.8 | 173.2 | 169.2 | 0.0004473 | 0.0004280 | 0.000408 | 0.0003874 |
| 0.10 | 0.20 | 187.7 | 182.3 | 177.7 | 173.8 | 0.0004473 | 0.0004280 | 0.000408 | 0.0003874 |
| 0.15 | 0.05 | 170.9 | 165.4 | 160.5 | 156.2 | 0.0004473 | 0.0004280 | 0.000408 | 0.0003874 |
| 0.15 | 0.10 | 175.0 | 169.4 | 164.6 | 160.4 | 0.0004473 | 0.0004280 | 0.000408 | 0.0003874 |
| 0.15 | 0.15 | 179.2 | 173.6 | 168.8 | 164.7 | 0.0004473 | 0.0004280 | 0.000408 | 0.0003874 |
| 0.15 | 0.20 | 183.6 | 178.1 | 173.3 | 169.2 | 0.0004473 | 0.0004280 | 0.000408 | 0.0003874 |
| 0.20 | 0.05 | 167.3 | 161.6 | 156.6 | 152.2 | 0.0004473 | 0.0004280 | 0.000408 | 0.0003874 |
| 0.20 | 0.10 | 171.2 | 165.6 | 160.6 | 156.2 | 0.0004473 | 0.0004280 | 0.000408 | 0.0003874 |
| 0.20 | 0.15 | 175.3 | 169.7 | 164.7 | 160.4 | 0.0004473 | 0.0004280 | 0.000408 | 0.0003874 |
| 0.20 | 0.20 | 179.6 | 173.9 | 169.0 | 164.7 | 0.0004473 | 0.0004280 | 0.000408 | 0.0003874 |
Table 12.
For Case III: Q1 = 11.764729, Q2 = 9.509398, Q3 = 8.948854, and Q4 = 10.113946. Bias and PRE were obtained from simulated data following a Normal distribution.
| p1 | p2 | PRE (p3=0.05) | PRE (p3=0.10) | PRE (p3=0.15) | PRE (p3=0.20) | Bias (p3=0.05) | Bias (p3=0.10) | Bias (p3=0.15) | Bias (p3=0.20) |
|---|---|---|---|---|---|---|---|---|---|
| 0.05 | 0.05 | 177.5 | 172.8 | 168.8 | 165.3 | 0.0003978 | 0.000377 | 0.0003566 | 0.0003354 |
| 0.05 | 0.10 | 182.5 | 177.8 | 173.9 | 170.5 | 0.0003978 | 0.000377 | 0.0003566 | 0.0003354 |
| 0.05 | 0.15 | 187.8 | 183.1 | 179.2 | 175.9 | 0.0003978 | 0.000377 | 0.0003566 | 0.0003354 |
| 0.05 | 0.20 | 193.3 | 188.7 | 184.8 | 181.6 | 0.0003978 | 0.000377 | 0.0003566 | 0.0003354 |
| 0.10 | 0.05 | 172.8 | 168.0 | 163.8 | 160.2 | 0.0003978 | 0.000377 | 0.0003566 | 0.0003354 |
| 0.10 | 0.10 | 177.7 | 172.8 | 168.7 | 165.1 | 0.0003978 | 0.000377 | 0.0003566 | 0.0003354 |
| 0.10 | 0.15 | 182.8 | 178.0 | 173.9 | 170.4 | 0.0003978 | 0.000377 | 0.0003566 | 0.0003354 |
| 0.10 | 0.20 | 188.1 | 183.3 | 179.3 | 175.9 | 0.0003978 | 0.000377 | 0.0003566 | 0.0003354 |
| 0.15 | 0.05 | 168.3 | 163.3 | 159.0 | 155.3 | 0.0003978 | 0.000377 | 0.0003566 | 0.0003354 |
| 0.15 | 0.10 | 173.0 | 168.1 | 163.8 | 160.0 | 0.0003978 | 0.000377 | 0.0003566 | 0.0003354 |
| 0.15 | 0.15 | 178.0 | 173.0 | 168.8 | 165.1 | 0.0003978 | 0.000377 | 0.0003566 | 0.0003354 |
| 0.15 | 0.20 | 183.2 | 178.2 | 174.0 | 170.4 | 0.0003978 | 0.000377 | 0.0003566 | 0.0003354 |
| 0.20 | 0.05 | 164.1 | 159.0 | 154.6 | 150.7 | 0.0003978 | 0.000377 | 0.0003566 | 0.0003354 |
| 0.20 | 0.10 | 168.7 | 163.6 | 159.1 | 155.3 | 0.0003978 | 0.000377 | 0.0003566 | 0.0003354 |
| 0.20 | 0.15 | 173.5 | 168.4 | 164.0 | 160.1 | 0.0003978 | 0.000377 | 0.0003566 | 0.0003354 |
| 0.20 | 0.20 | 178.5 | 173.4 | 169.0 | 165.2 | 0.0003978 | 0.000377 | 0.0003566 | 0.0003354 |
Table 13.
For Case IV: Q1 = 0.9961224, Q2 = 1.0018931, Q3 = 1.0054827, and Q4 = 0.9811516. Bias and PRE were obtained from simulated data following a Normal distribution.
| p1 | p2 | PRE (p3=0.05) | PRE (p3=0.10) | PRE (p3=0.15) | PRE (p3=0.20) | Bias (p3=0.05) | Bias (p3=0.10) | Bias (p3=0.15) | Bias (p3=0.20) |
|---|---|---|---|---|---|---|---|---|---|
| 0.05 | 0.05 | 177.6 | 172.9 | 168.8 | 165.3 | 0.0003995 | 0.0003792 | 0.0003584 | 0.0003372 |
| 0.05 | 0.10 | 182.6 | 177.9 | 173.8 | 170.4 | 0.0003995 | 0.0003792 | 0.0003584 | 0.0003372 |
| 0.05 | 0.15 | 187.8 | 183.1 | 179.2 | 175.8 | 0.0003995 | 0.0003792 | 0.0003584 | 0.0003372 |
| 0.05 | 0.20 | 193.3 | 188.6 | 184.7 | 181.5 | 0.0003995 | 0.0003792 | 0.0003584 | 0.0003372 |
| 0.10 | 0.05 | 172.9 | 168.0 | 163.8 | 160.2 | 0.0003995 | 0.0003792 | 0.0003584 | 0.0003372 |
| 0.10 | 0.10 | 177.7 | 172.9 | 168.7 | 165.1 | 0.0003995 | 0.0003792 | 0.0003584 | 0.0003372 |
| 0.10 | 0.15 | 182.8 | 178.0 | 173.8 | 170.3 | 0.0003995 | 0.0003792 | 0.0003584 | 0.0003372 |
| 0.10 | 0.20 | 188.1 | 183.3 | 179.3 | 175.8 | 0.0003995 | 0.0003792 | 0.0003584 | 0.0003372 |
| 0.15 | 0.05 | 168.4 | 163.4 | 159.1 | 155.3 | 0.0003995 | 0.0003792 | 0.0003584 | 0.0003372 |
| 0.15 | 0.10 | 173.1 | 168.1 | 163.8 | 160.1 | 0.0003995 | 0.0003792 | 0.0003584 | 0.0003372 |
| 0.15 | 0.15 | 178.0 | 173.1 | 168.8 | 165.1 | 0.0003995 | 0.0003792 | 0.0003584 | 0.0003372 |
| 0.15 | 0.20 | 183.2 | 178.3 | 174.0 | 170.4 | 0.0003995 | 0.0003792 | 0.0003584 | 0.0003372 |
| 0.20 | 0.05 | 164.2 | 159.1 | 154.7 | 150.7 | 0.0003995 | 0.0003792 | 0.0003584 | 0.0003372 |
| 0.20 | 0.10 | 168.8 | 163.7 | 159.2 | 155.3 | 0.0003995 | 0.0003792 | 0.0003584 | 0.0003372 |
| 0.20 | 0.15 | 173.6 | 168.4 | 164.0 | 160.1 | 0.0003995 | 0.0003792 | 0.0003584 | 0.0003372 |
| 0.20 | 0.20 | 178.6 | 173.5 | 169.0 | 165.2 | 0.0003995 | 0.0003792 | 0.0003584 | 0.0003372 |
Table 14.
In the absence of non-response (p1 = p2 = p3 = 0), Bias and PRE were obtained from simulated data following a Normal distribution.
| Case | PRE | Bias |
|---|---|---|
| Case I | 182.8948 | 0.000407048 |
| Case II | 184.5569 | 0.0004524072 |
| Case III | 182.7551 | 0.0004040932 |
| Case IV | 182.8089 | 0.0004057739 |
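As a rough illustration of how Bias and PRE columns of this kind can be produced, the following Python sketch runs a small Monte Carlo experiment. It assumes the usual definition PRE = 100 × MSE(standard estimator) / MSE(competing estimator). The competing estimator used here (the sum of squares divided by n + 1) is a hypothetical stand-in chosen only because it is easy to verify; it is not the logarithmic type estimator proposed in this paper. The function name, sample size, and seed are likewise illustrative assumptions.

```python
import random
import statistics


def simulate_bias_and_pre(n=20, sigma=1.0, reps=20000, seed=1):
    """Monte Carlo bias and PRE for two variance estimators on Normal data.

    Standard estimator: sample variance with divisor n - 1.
    Competitor (hypothetical stand-in, NOT the paper's estimator):
    the same sum of squares with divisor n + 1.
    """
    rng = random.Random(seed)
    true_var = sigma ** 2
    err_standard, err_competitor = [], []
    for _ in range(reps):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        mean = sum(sample) / n
        ss = sum((v - mean) ** 2 for v in sample)
        # Estimation errors of both estimators on the same sample.
        err_standard.append(ss / (n - 1) - true_var)
        err_competitor.append(ss / (n + 1) - true_var)
    mse_standard = statistics.fmean(e * e for e in err_standard)
    mse_competitor = statistics.fmean(e * e for e in err_competitor)
    return {
        "bias_standard": statistics.fmean(err_standard),
        "bias_competitor": statistics.fmean(err_competitor),
        # PRE above 100 means the competitor has the smaller MSE.
        "pre": 100.0 * mse_standard / mse_competitor,
    }


result = simulate_bias_and_pre()
print(result)
```

Both estimators are evaluated on the same replicated samples, so the comparison is not blurred by independent sampling noise in the two MSE estimates.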
9.2. Study based on real data
This section illustrates the application of the suggested class of estimators. The data are available in the UCI Machine Learning Repository as the “Gas Turbine CO and NOx Emission Data Set,” which consists of 36733 instances of 11 sensor measurements from a gas turbine in Turkey's northwestern region. These measurements were aggregated over one hour (by means of average or sum) to analyze CO and NOx (NO + NO2) flue gas emissions. For this analysis, the .csv file was chosen.
The analysis used the following study and auxiliary variables:
- Y: Gas turbine exhaust pressure (GTEP)
- X: Air filter difference pressure (AFDP)
- Z: Turbine inlet temperature (TIT)
The strata were formed from the ambient temperature (AT), divided into three categories:
- Stratum 1: from 2.1163 to 12.707 °C
- Stratum 2: from 12.708 to 21.759 °C
- Stratum 3: from 21.760 to 34.532 °C
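The stratification rule above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `assign_stratum` is hypothetical, the cut points are the upper bounds listed above, and a value falling in the small gap between two listed ranges (for example 12.7075) is assigned to the upper stratum.

```python
from bisect import bisect_left

# Upper bounds (in degrees Celsius) of the three ambient-temperature strata.
UPPER_BOUNDS = [12.707, 21.759, 34.532]
LOWER_LIMIT = 2.1163  # smallest ambient temperature covered by stratum 1


def assign_stratum(at_celsius):
    """Return the stratum number (1, 2, or 3) for an ambient temperature."""
    if not LOWER_LIMIT <= at_celsius <= UPPER_BOUNDS[-1]:
        raise ValueError(f"ambient temperature {at_celsius} is outside the listed ranges")
    # bisect_left returns the index of the first upper bound >= at_celsius,
    # so a value equal to an upper bound stays in the lower stratum.
    return bisect_left(UPPER_BOUNDS, at_celsius) + 1


for at in (2.1163, 12.707, 12.708, 21.759, 21.760, 34.532):
    print(at, "-> stratum", assign_stratum(at))
```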
In survey applications, such as estimating the population variance of income among college graduates in a region, non-response bias may be corrected using calibrated weights in stratified successive sampling. This process involves adjusting the sampling weights based on response probabilities. The population parameters for the real data are displayed in Table 15. Calibrated weights are presented in Figure 1 and Table 16, while the effects of the controlling parameter with random non-response are shown in Figures 2 to 5 and Tables 17 to 20. Table 21 and Figures 6 and 7 illustrate the scenario without random non-response.
Table 15.
The statistical parameters corresponding to the real data.
| Stratum | Nk | nk | r1k | mk | r2k | uk | ρxy | ρyz | ρzx |
|---|---|---|---|---|---|---|---|---|---|
| Stratum 1 | 2500 | 875 | 125 | 625 | 125 | 250 | 0.8999226 | 0.8283662 | 0.7473441 |
| Stratum 2 | 2600 | 910 | 130 | 650 | 130 | 260 | 0.8914957 | 0.9164524 | 0.8335361 |
| Stratum 3 | 2311 | 809 | 116 | 578 | 116 | 231 | 0.9689409 | 0.9337467 | 0.9209707 |
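As a quick consistency check on Table 15, the short Python sketch below recomputes the stratum weights as Wk = Nk / N from the stratum sizes. The values agree with the Wk column reported in Table 16. It also checks that nk = mk + uk in every stratum, reading mk and uk as the matched and fresh sample sizes, which is an interpretation assumed here from the usual successive-sampling notation. The variable names are illustrative.

```python
# Stratum sizes and sample sizes copied from Table 15.
N_k = [2500, 2600, 2311]
n_k = [875, 910, 809]
m_k = [625, 650, 578]
u_k = [250, 260, 231]

N = sum(N_k)                           # total population size
W_k = [size / N for size in N_k]       # stratum weights W_k = N_k / N
sampling_fraction = [n / size for n, size in zip(n_k, N_k)]

for k, (w, f) in enumerate(zip(W_k, sampling_fraction), start=1):
    print(f"Stratum {k}: W_k = {w:.7f}, n_k / N_k = {f:.4f}")

# Each stratum sample splits exactly into the m_k and u_k parts.
print(all(m + u == n for m, u, n in zip(m_k, u_k, n_k)))
```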
Figure 1.
Calibrated strata weights for real data.
Table 16.
Calibrated strata weights for real data.
| Case | Stratum | Wk | | |
|---|---|---|---|---|
| I | 1 | 0.3373364 | 0.4930872 | 0.3101796 |
| I | 2 | 0.3508298 | 0.3163187 | 0.4557245 |
| I | 3 | 0.3118338 | 0.1905941 | 0.2340959 |
| II | 1 | 0.3373364 | 0.4930873 | 0.3137647 |
| II | 2 | 0.3508298 | 0.3163187 | 0.4546208 |
| II | 3 | 0.3118338 | 0.1905940 | 0.2316144 |
| III | 1 | 0.3373364 | 0.4930873 | 0.3100504 |
| III | 2 | 0.3508298 | 0.3163187 | 0.4557643 |
| III | 3 | 0.3118338 | 0.1905940 | 0.2341853 |
| IV | 1 | 0.3373364 | 0.4930873 | 0.3195387 |
| IV | 2 | 0.3508298 | 0.3163187 | 0.4528433 |
| IV | 3 | 0.3118338 | 0.1905940 | 0.2276180 |
Figure 2.
For Case I: Q1 = 1.0, Q2 = 1.0, and Q3 = 1.0. Bias (shown by the red line) and PRE (blue line) were obtained from real data.
Figure 3.
For Case II: Q1 = 2.96, Q2 = 2.85, and Q3 = 3.21. Bias (shown by the red line) and PRE (blue line) were obtained from real data.
Figure 4.
For Case III: Q1 = 0.0009208365, Q2 = 0.0009248637, and Q3 = 0.0009196857. Bias (shown by the red line) and PRE (blue line) were obtained from real data.
Figure 5.
For Case IV: Q1 = 0.0045, Q2 = 0.0031, and Q3 = 0.0046. Bias (shown by the red line) and PRE (blue line) were obtained from real data.
Table 17.
For Case I: Q1 = 1.0, Q2 = 1.0, and Q3 = 1.0. Bias and PRE were obtained from real data.
| p1 | p2 | PRE (p3=0.05) | PRE (p3=0.10) | PRE (p3=0.15) | PRE (p3=0.20) | Bias (∀ p3) |
|---|---|---|---|---|---|---|
| 0.05 | 0.05 | 143.7177 | 146.5793 | 149.5242 | 152.5562 | 0.01287393 |
| 0.05 | 0.10 | 150.4350 | 153.5733 | 156.8091 | 160.1470 | 0.01287393 |
| 0.05 | 0.15 | 157.5665 | 161.0128 | 164.5733 | 168.2538 | 0.01287393 |
| 0.05 | 0.20 | 165.1517 | 168.9418 | 172.8659 | 176.9313 | 0.01287393 |
| 0.10 | 0.05 | 135.8471 | 138.5520 | 141.3357 | 144.2016 | 0.01287393 |
| 0.10 | 0.10 | 142.1950 | 145.1614 | 148.2200 | 151.3750 | 0.01287393 |
| 0.10 | 0.15 | 148.9341 | 152.1916 | 155.5571 | 159.0360 | 0.01287393 |
| 0.10 | 0.20 | 156.1018 | 159.6842 | 163.3933 | 167.2359 | 0.01287393 |
| 0.15 | 0.05 | 128.5160 | 131.0749 | 133.7084 | 136.4196 | 0.01287393 |
| 0.15 | 0.10 | 134.5185 | 137.3248 | 140.2182 | 143.2030 | 0.01287393 |
| 0.15 | 0.15 | 140.8907 | 143.9722 | 147.1559 | 150.4469 | 0.01287393 |
| 0.15 | 0.20 | 147.6676 | 151.0564 | 154.5651 | 158.2001 | 0.01287393 |
| 0.20 | 0.05 | 121.6989 | 124.1220 | 126.6158 | 129.1832 | 0.01287393 |
| 0.20 | 0.10 | 127.3784 | 130.0357 | 132.7756 | 135.6019 | 0.01287393 |
| 0.20 | 0.15 | 133.4072 | 136.3251 | 139.3397 | 142.4558 | 0.01287393 |
| 0.20 | 0.20 | 139.8183 | 143.0270 | 146.3492 | 149.7909 | 0.01287393 |
Table 18.
For Case II: Q1 = 2.964400, Q2 = 2.850385, and Q3 = 3.206837. Bias and PRE were obtained from real data.
| p1 | p2 | PRE (p3=0.05) | PRE (p3=0.10) | PRE (p3=0.15) | PRE (p3=0.20) | Bias (∀ p3) |
|---|---|---|---|---|---|---|
| 0.05 | 0.05 | 143.7036 | 146.5620 | 149.5034 | 152.5316 | 0.01288324 |
| 0.05 | 0.10 | 150.4240 | 153.5588 | 156.7909 | 160.1249 | 0.01288324 |
| 0.05 | 0.15 | 157.5591 | 161.0018 | 164.5585 | 168.2348 | 0.01288324 |
| 0.05 | 0.20 | 165.1485 | 168.9349 | 172.8550 | 176.9159 | 0.01288324 |
| 0.10 | 0.05 | 135.8289 | 138.5306 | 141.3108 | 144.1731 | 0.01288324 |
| 0.10 | 0.10 | 142.1795 | 145.1425 | 148.1974 | 151.3487 | 0.01288324 |
| 0.10 | 0.15 | 148.9218 | 152.1757 | 155.5374 | 159.0122 | 0.01288324 |
| 0.10 | 0.20 | 156.0932 | 159.6719 | 163.3770 | 167.2153 | 0.01288324 |
| 0.15 | 0.05 | 128.4939 | 131.0497 | 133.6798 | 136.3875 | 0.01288324 |
| 0.15 | 0.10 | 134.4987 | 137.3017 | 140.1916 | 143.1726 | 0.01288324 |
| 0.15 | 0.15 | 140.8736 | 143.9517 | 147.1317 | 150.4187 | 0.01288324 |
| 0.15 | 0.20 | 147.6538 | 151.0391 | 154.5439 | 158.1746 | 0.01288324 |
| 0.20 | 0.05 | 121.6731 | 124.0933 | 126.5838 | 129.1477 | 0.01288324 |
| 0.20 | 0.10 | 127.3546 | 130.0087 | 132.7451 | 135.5678 | 0.01288324 |
| 0.20 | 0.15 | 133.3858 | 136.3003 | 139.3112 | 142.4235 | 0.01288324 |
| 0.20 | 0.20 | 139.7997 | 143.0049 | 146.3233 | 149.7609 | 0.01288324 |
Table 19.
For Case III: Q1 = 0.0009208365, Q2 = 0.0009248637, and Q3 = 0.0009196857. Bias and PRE were obtained from real data.
| p1 | p2 | PRE (p3=0.05) | PRE (p3=0.10) | PRE (p3=0.15) | PRE (p3=0.20) | Bias (∀ p3) |
|---|---|---|---|---|---|---|
| 0.05 | 0.05 | 143.7183 | 146.5800 | 149.5251 | 152.5571 | 0.01287356 |
| 0.05 | 0.10 | 150.4355 | 153.5739 | 156.8098 | 160.1478 | 0.01287356 |
| 0.05 | 0.15 | 157.5669 | 161.0133 | 164.5739 | 168.2546 | 0.01287356 |
| 0.05 | 0.20 | 165.1519 | 168.9422 | 172.8664 | 176.9319 | 0.01287356 |
| 0.10 | 0.05 | 135.8478 | 138.5528 | 141.3366 | 144.2026 | 0.01287356 |
| 0.10 | 0.10 | 142.1957 | 145.1622 | 148.2208 | 151.3760 | 0.01287356 |
| 0.10 | 0.15 | 148.9347 | 152.1923 | 155.5579 | 159.0369 | 0.01287356 |
| 0.10 | 0.20 | 156.1022 | 159.6848 | 163.3940 | 167.2367 | 0.01287356 |
| 0.15 | 0.05 | 128.5169 | 131.0759 | 133.7094 | 136.4208 | 0.01287356 |
| 0.15 | 0.10 | 134.5193 | 137.3257 | 140.2192 | 143.2041 | 0.01287356 |
| 0.15 | 0.15 | 140.8914 | 143.9730 | 147.1569 | 150.4480 | 0.01287356 |
| 0.15 | 0.20 | 147.6682 | 151.0571 | 154.5660 | 158.2011 | 0.01287356 |
| 0.20 | 0.05 | 121.6998 | 124.1231 | 126.6169 | 129.1845 | 0.01287356 |
| 0.20 | 0.10 | 127.3793 | 130.0367 | 132.7767 | 135.6031 | 0.01287356 |
| 0.20 | 0.15 | 133.4080 | 136.3260 | 139.3407 | 142.4570 | 0.01287356 |
| 0.20 | 0.20 | 139.8190 | 143.0279 | 146.3502 | 149.7921 | 0.01287356 |
Table 20.
For Case IV: Q1 = 0.004542526, Q2 = 0.003128517, and Q3 = 0.004646731. Bias and PRE were obtained from real data.
| p1 | p2 | PRE (p3=0.05) | PRE (p3=0.10) | PRE (p3=0.15) | PRE (p3=0.20) | Bias (∀ p3) |
|---|---|---|---|---|---|---|
| 0.05 | 0.05 | 143.6914 | 146.5429 | 149.4771 | 152.4975 | 0.01289368 |
| 0.05 | 0.10 | 150.4193 | 153.5470 | 156.7715 | 160.0972 | 0.01289368 |
| 0.05 | 0.15 | 157.5632 | 160.9985 | 164.5471 | 168.2148 | 0.01289368 |
| 0.05 | 0.20 | 165.1629 | 168.9416 | 172.8533 | 176.9052 | 0.01289368 |
| 0.10 | 0.05 | 135.8075 | 138.5026 | 141.2758 | 144.1305 | 0.01289368 |
| 0.10 | 0.10 | 142.1648 | 145.1209 | 148.1684 | 151.3116 | 0.01289368 |
| 0.10 | 0.15 | 148.9149 | 152.1616 | 155.5155 | 158.9819 | 0.01289368 |
| 0.10 | 0.20 | 156.0955 | 159.6667 | 163.3637 | 167.1931 | 0.01289368 |
| 0.15 | 0.05 | 128.4640 | 131.0134 | 133.6366 | 136.3370 | 0.01289368 |
| 0.15 | 0.10 | 134.4747 | 137.2709 | 140.1536 | 143.1268 | 0.01289368 |
| 0.15 | 0.15 | 140.8566 | 143.9276 | 147.1000 | 150.3788 | 0.01289368 |
| 0.15 | 0.20 | 147.6450 | 151.0228 | 154.5196 | 158.1418 | 0.01289368 |
| 0.20 | 0.05 | 121.6354 | 124.0492 | 126.5330 | 129.0899 | 0.01289368 |
| 0.20 | 0.10 | 127.3220 | 129.9695 | 132.6988 | 135.5139 | 0.01289368 |
| 0.20 | 0.15 | 133.3592 | 136.2668 | 139.2703 | 142.3746 | 0.01289368 |
| 0.20 | 0.20 | 139.7804 | 142.9784 | 146.2889 | 149.7181 | 0.01289368 |
Table 21.
In the absence of non-response (p1 = p2 = p3 = 0), Bias and PRE were obtained from real data.
| Case | PRE | Bias |
|---|---|---|
| Case I | 145.1036 | 0.01079086 |
| Case II | 145.0877 | 0.01079694 |
| Case III | 145.1042 | 0.01079061 |
| Case IV | 145.0727 | 0.01080245 |
Figure 6.
In the absence of non-response, PRE is observed from real data when p1 = p2 = p3 = 0.
Figure 7.
In the absence of non-response, biases are observed in real data when p1 = p2 = p3 = 0.
10. Discussion and conclusions
Based on the empirical results presented above:
1. Tables 2, 9, and 16, along with Figure 1, demonstrate that the calibrated strata weights closely resemble the original weights, particularly for the fresh sample. The estimator effectively mitigates the adverse effects of non-response, leading to negligible bias across all choices of the controlling parameters Qk considered. This study underscores the improved estimation of population variance in stratified successive sampling using calibrated weights in the presence of non-response.
2. The results from Tables 3 to 6, based on simulated data, demonstrate the effectiveness of the proposed method in reducing non-response bias in practical scenarios. The negligible bias of the estimator for simulated data from a Poisson distribution, and similarly in Tables 10 to 13 for simulated data from a Normal distribution (values of the order of 10⁻⁴), indicates a favorable outcome of our research in addressing non-response. Additionally, the findings from Tables 17 to 20 and Figures 2 to 5, based on real data, confirm the consistency of our approach. The bias, with values of the order of 10⁻², emphasizes the significance of our approach in minimizing bias resulting from non-response.
3. Tables 3 to 6 (simulated data from the Poisson distribution) and Tables 10 to 13 (simulated data from the Normal distribution) demonstrate that the proposed estimator exhibits lower mean squared error (MSE) and higher percentage relative efficiency (PRE) than the standard estimator under random non-response. Tables 17 to 20 and Figures 2 to 5, based on real data, confirm the same observation, with the PRE exceeding 100 throughout. These findings indicate that the proposed method is more effective.
4. Tables 7, 14, and 21, together with Figures 6 and 7, demonstrate that the proposed estimator exhibits negligible bias and higher PRE than the standard estimator in the absence of non-response. This observation suggests that the proposed method is more effective even in scenarios without non-response.
5. In the simulation study, we observe that as the correlation coefficient increases, the PRE also increases, while the bias decreases. Conversely, when the correlation coefficient decreases, the PRE decreases and the bias increases.
6. The findings suggest that, for fixed values of p1 and p2, as the value of p3 increases, both the bias and the PRE decrease for simulated data. Conversely, for real data, the bias remains fixed while the PRE increases. Similarly, for fixed values of p1 and p3, increasing p2 results in a constant bias but an increasing PRE for both simulated and real data. Furthermore, increasing p1 while keeping p2 and p3 fixed leads to a constant bias but a decreasing PRE for both simulated and real data. Additionally, our research reveals that as the non-response rate p2 increases, the PRE also increases, as observed in Tables 17 to 20 and Figures 2 to 5 (real data), as well as in Tables 3 to 6 and 10 to 13 (simulated data). These findings highlight an important outcome of our study.
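The directions of change in PRE with respect to p1, p2, and p3 can be checked directly against the tabulated values. The Python sketch below does this for a few rows copied from Table 10 (simulated Normal data, Case I) and Table 17 (real data, Case I), varying one probability at a time while the other two are held at 0.05. The helper name `trend` and the dictionary layout are illustrative choices, not part of the original analysis.

```python
def trend(values):
    """Classify a numeric sequence as 'increasing', 'decreasing', or 'mixed'."""
    diffs = [b - a for a, b in zip(values, values[1:])]
    if all(d > 0 for d in diffs):
        return "increasing"
    if all(d < 0 for d in diffs):
        return "decreasing"
    return "mixed"


# PRE at 0.05, 0.10, 0.15, 0.20 of the varied probability; the other two are 0.05.
PRE_ROWS = {
    ("simulated", "p1"): [177.6, 172.9, 168.5, 164.3],
    ("simulated", "p2"): [177.6, 182.6, 187.8, 193.2],
    ("simulated", "p3"): [177.6, 172.9, 168.8, 165.3],
    ("real", "p1"): [143.7177, 135.8471, 128.5160, 121.6989],
    ("real", "p2"): [143.7177, 150.4350, 157.5665, 165.1517],
    ("real", "p3"): [143.7177, 146.5793, 149.5242, 152.5562],
}

trends = {key: trend(values) for key, values in PRE_ROWS.items()}
for (data, prob), direction in trends.items():
    print(f"{data:9s} PRE is {direction} in {prob}")
```

Only one row per probability is checked here; the remaining rows of the tables follow the same pattern on inspection.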
The results obtained from the analysis of the “Gas Turbine CO and NOx Emission Data Set” and from the simulation studies demonstrate the effectiveness of the proposed methodology in correcting non-response bias and obtaining more accurate estimates. By employing the suggested class of estimators with calibrated weights, the estimation of population variance may also be improved on other real-world data sets, for example when estimating the variance in income among college graduates. Survey statisticians are recommended to use the estimator in such practical applications.
CRediT authorship contribution statement
M.K. Pandey: Writing – review & editing, Writing – original draft, Resources, Methodology, Investigation, Formal analysis, Data curation, Conceptualization. G.N. Singh: Visualization, Validation, Supervision. Tolga Zaman: Writing – review & editing. Aned Al Mutairi: Software, Writing – review & editing. Manahil SidAhmed Mustafa: Writing – review & editing.
Declaration of Competing Interest
The authors have no conflicts of interest to disclose.
Acknowledgements
We are thankful to IIT (ISM) Dhanbad for providing the infrastructural and financial support to accomplish the present work. We are also thankful to the esteemed editor and reviewers for their encouraging and valuable suggestions, which greatly contributed to improving the manuscript to its current level.
Appendix A. Deriving calibrated strata weights
The calibrated weights are determined by minimizing the Lagrange function while satisfying the given constraints. The Lagrange function, considering the chi-square distance measure and the calibration constraints, is formulated as follows:
| (21) |
where , and are Lagrange multipliers.
Differentiating the Lagrange function with respect to and setting the derivative to zero yields the calibrated weight equation:
| (22) |
By substituting the derived values into the given calibration constraints, a matrix equation is obtained:
The solution of the matrix equation provides the Lagrange multipliers, which can be expressed as follows:
| (23) |
where the determinants are given by:
| (24) |
| (25) |
| (26) |
| (27) |
Now, let us define the terms totam, totbm, totcm, totdm, totem, totfm, totgm, tothm, and totim as follows:
| totam= | totbm= |
| totcm= | totdm=1- |
| totem= | totfm= |
| totgm= | tothm= |
| totim= |
Similarly, the calibrated weights for the estimator are obtained by minimizing the Lagrange function while satisfying the given calibration constraints. The Lagrange function, considering the chi-square distance measure and the calibration constraints, is expressed as follows:
| (28) |
where and are Lagrange multipliers.
Differentiating the Lagrange function with respect to and setting the derivative to zero gives the following calibrated weight equation:
| (29) |
By substituting the derived values into the given calibration constraints, a matrix equation is formed:
Solving this matrix equation provides the Lagrange multipliers, given by:
| (30) |
where the determinants are calculated as follows:
| (31) |
| (32) |
| (33) |
Now, let us define the terms totau, totbu, totcu, totdu, and toteu as follows:
| totau= | totbu= |
| totcu=1- | totdu= |
| toteu= |
The calibrated weights for the estimators of population variance, and , are determined using these Lagrange multipliers and the initial weights .
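The general mechanics of this kind of derivation can be sketched in Python for a simplified case. The example assumes a chi-square distance of the form Σ (Ωk − Wk)² / (Qk Wk) and two hypothetical calibration constraints, Σ Ωk = 1 and Σ Ωk xk = X, with the Lagrange multipliers obtained by Cramer's rule on the resulting 2 × 2 system. The constraints, the symbol names, and the auxiliary values below are illustrative assumptions and may differ from the constraints actually used for the estimators in this paper.

```python
def calibrate_weights(W, Q, x, x_total):
    """Chi-square calibration under two assumed constraints.

    Minimizes sum((omega_k - W_k)**2 / (Q_k * W_k)) subject to
    sum(omega_k) = 1 and sum(omega_k * x_k) = x_total.
    Stationarity gives omega_k = W_k + Q_k * W_k * (lam1 + lam2 * x_k).
    """
    a = sum(q * w for q, w in zip(Q, W))
    b = sum(q * w * xi for q, w, xi in zip(Q, W, x))
    c = sum(q * w * xi * xi for q, w, xi in zip(Q, W, x))
    r1 = 1.0 - sum(W)                                  # gap in the first constraint
    r2 = x_total - sum(w * xi for w, xi in zip(W, x))  # gap in the second constraint
    det = a * c - b * b
    if det == 0:
        raise ValueError("singular system: the x values must not all be equal")
    # Cramer's rule for [[a, b], [b, c]] @ [lam1, lam2] = [r1, r2].
    lam1 = (r1 * c - b * r2) / det
    lam2 = (a * r2 - b * r1) / det
    return [w + q * w * (lam1 + lam2 * xi) for w, q, xi in zip(W, Q, x)]


# Stratum weights W_k from Table 16; the x values and target are hypothetical.
W = [0.3373364, 0.3508298, 0.3118338]
Q = [1.0, 1.0, 1.0]
x = [10.0, 12.0, 15.0]
omega = calibrate_weights(W, Q, x, x_total=12.5)
print(omega, sum(omega))
```

Using Cramer's rule keeps the sketch close to the determinant-based expressions above; for more than a few constraints a general linear solver would be the more practical choice.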
Appendix B. Deriving bias and MSE of the proposed estimator
The expectations obtained after applying the transformations specified in section 7 are as follows:
E(ϵlk) = 0 and |ϵlk| < 1, ∀ l = 0, 1, ..., 7.
| E()=f3k(λ400k − 1) | E()=f8k(λ004k − 1) | E()=f10k(λ400k − 1) |
| E()=f9k(λ004k − 1) | E()=f3k(λ040k − 1) | E(ϵ0kϵ2k)=f3k(λ400k − 1) |
| E(ϵ0kϵ4k)=f3k(λ220k − 1) | E(ϵ2kϵ4k)=f3k(λ220k − 1) | E(ϵ3kϵ4k)=f3k(λ022k − 1) |
| E(ϵ1kϵ4k)=f8k(λ022k − 1) | E(ϵ0kϵ3k)=f3k(λ202k − 1) | E(ϵ2kϵ3k)=f2k(λ202k − 1) |
| E(ϵ0kϵ1k)=f8k(λ202k − 1) | E(ϵ1kϵ2k)=f8k(λ202k − 1) | E(ϵ1kϵ3k)=f8k(λ004k − 1) |
| E()=f6k(λ040k − 1) | E()=f7k(λ004k − 1) | E(ϵ5kϵ6k)=f7k(λ022k − 1) |
| E()=f10k(λ040k − 1) |
where the notations were previously defined in Section 7.
and
Remark
When the data are free of random non-response, or for =0 and =0, the above assumptions agree with the typical results.
By applying these transformations to and , we obtain:
| (34) |
| (35) |
By utilizing Equations (34) and (35), we may transform Equation (1) into:
| (36) |
Based on Equations (3) and (36), we obtain:
| (37) |
Given that the calibrated weight is very close to the stratum weight, it is reasonable to assume that...
so that Equation (37) is transformed into
| (38) |
In the same way, when we apply the transformations to Equation (2), we obtain:
| (39) |
By utilizing Equations (4) and (39), we obtain:
| (40) |
Given that the calibrated weight is very close to the stratum weight, it is reasonable to assume that...
so that Equation (40) is transformed into
| (41) |
The expressions for the bias and the mean squared error (MSE) of the proposed estimators are derived by taking expectations on both sides of Equations (38) and (41).
Data availability statement
Data will be made available on request.
References
- 1. Ahmad S., Hussain S., Shabbir J., Zahid E., Aamir M., Onyango R., et al. Improved estimation of finite population variance using dual supplementary information under stratified random sampling. Math. Probl. Eng. 2022;2022
- 2. Alam S., Shabbir J. Calibration estimation of mean by using double use of auxiliary information. Commun. Stat., Simul. Comput. 2022;51(8):4769–4787.
- 3. Alam S., Singh S., Shabbir J. Optimal calibrated weights while minimizing a variance function. Commun. Stat., Theory Methods. 2023;52(5):1634–1651.
- 4. Aslam I., Noor-ul Amin M., Hanif M., Sharma P. Memory type ratio and product estimators under ranked-based sampling schemes. Commun. Stat., Theory Methods. 2023;52(4):1155–1177.
- 5. Aslam I., Noor-ul Amin M., Yasmeen U., Hanif M. Memory type ratio and product estimators in stratified sampling. J. Reliab. Stat. Stud. 2020:1–20.
- 6. Basit Z., Bhatti M.I. Efficient classes of estimators of population variance in two-phase successive sampling under random non-response. Statistica. 2022;82(2):177–198.
- 7. Deming W.E., Stephan F.F. On a least squares adjustment of a sampled frequency table when the expected marginal totals are known. Ann. Math. Stat. 1940;11(4):427–444.
- 8. Deville J.C., Särndal C.E. Calibration estimators in survey sampling. J. Am. Stat. Assoc. 1992;87(418):376–382.
- 9. El-Sheikh A.A., El-Kossaly H.A. Calibration estimation for ratio estimators in stratified sampling for proportion allocation. J. Progressive Res. Math. 2020;16(4):3199–3205.
- 10. Farrell P.J., Singh S. Model-assisted higher-order calibration of estimators of variance. Aust. N. Z. J. Stat. 2005;47(3):375–383.
- 11. Kim J.K., Park M. Calibration estimation in survey sampling. Int. Stat. Rev. 2010;78(1):21–39.
- 12. Kim J.M., Sungur E.A., Heo T.Y. Calibration approach estimators in stratified sampling. Stat. Probab. Lett. 2007;77(1):99–103.
- 13. Koyuncu N., Kadilar C. Calibration weighting in stratified random sampling. Commun. Stat., Simul. Comput. 2016;45(7):2267–2275.
- 14. Krewski D., Hocking R.R. Estimation of population variance in stratified successive sampling. J. Am. Stat. Assoc. 1978;73(361):137–141.
- 15. Masry E.S., Hedayat A.S. Estimation of population variance in stratified successive sampling with equal probabilities. J. Stat. Plan. Inference. 1987;15(1):75–88.
- 16. Neyman J., Vereshchagin N. On the use of stratified sampling in the estimation of population parameters. Ann. Math. Stat. 1938;9(3):293–296.
- 17. Özgül N. New calibration estimator in stratified sampling. J. Stat. Comput. Simul. 2018;88(13):2561–2572.
- 18. Pandey M.K., Singh G.N., Zaman T., Mutairi A.A., Mustafa M.S.A. A general class of improved population variance estimators under non-sampling errors using calibrated weights in stratified sampling. Sci. Rep. 2024;14(1):2948. doi: 10.1038/s41598-023-47234-1.
- 19. Raghunathan T.E., Grizzle J.E. Estimation of population variance in stratified successive sampling with unequal probabilities. Biometrika. 1995;82(3):519–528.
- 20. Särndal C.E. The calibration approach in survey theory and practice. Surv. Methodol. 2007;33(2):99–119.
- 21. Shahzad U., Ahmad I., García-Luengo A.V., Zaman T., Al-Noor N.H., Kumar A. Estimation of coefficient of variation using calibrated estimators in double stratified random sampling. Mathematics. 2023;11(1)
- 22. Singh G.N., Bhattacharyya D., Bandyopadhyay A. Calibration estimation of population variance under stratified successive sampling in presence of random non response. Commun. Stat., Theory Methods. 2021;50(19):4487–4509.
- 23. Singh G.N., Sharma A.K., Bandyopadhyay A. Effectual variance estimation strategy in two-occasion successive sampling in presence of random non response. Commun. Stat., Theory Methods. 2017;46(14):7201–7224.
- 24. Singh S., Joarder A.H. Estimation of finite population variance using random non-response in survey sampling. Metrika. 1998;47:241–249.
- 25. Srivastava D.K. Estimation of population variance in stratified successive sampling with unequal probabilities. J. Am. Stat. Assoc. 1987;82(398):516–523.
- 26. Sud U.C., Chandra H., Gupta V.K. Calibration approach-based regression-type estimator for inverse relationship between study and auxiliary variable. J. Stat. Theory Pract. 2014;8(4):707–721.
- 27. Zaman T., Bulut H. An efficient family of robust-type estimators for the population variance in simple and stratified random sampling. Commun. Stat., Theory Methods. 2021:1–15.







