Abstract
This article suggests a new generalized class of estimators based on probability proportional to size (PPS) sampling using two auxiliary variables. The expressions for the bias and mean squared error (MSE) are derived up to the first order of approximation. Four real data sets are used to examine the performance of the new improved generalized class of estimators. The results from the real data sets show that the suggested estimator gives the minimum MSE and that its percentage relative efficiency (PRE) is higher than that of all existing estimators considered, which shows the importance of the new generalized class of estimators. To check the robustness and generalizability of the proposed class of estimators, a simulation study is also conducted. The results of the simulation study confirm the merit of the proposed class of estimators. Overall, we conclude that the proposed estimator outperforms all other estimators considered in this study.
Keywords: Simulation study, probability proportional to size (PPS), auxiliary variables, bias, MSE, PRE
Introduction
In survey sampling, estimating the finite population mean is a common problem, and many efforts have been made to improve the precision of the estimators. The literature describes a wide range of approaches for incorporating auxiliary variables through ratio, product, and regression-type estimators. In particular, when several auxiliary variables are available, many estimators have been presented that combine ratio, product, or regression estimators. Researchers have previously attempted to use the best statistical features to estimate population parameters, including the variance, coefficient of variation, and kurtosis. A representative sample of the population is required for this set-up. If the population of interest is homogeneous, units can be selected using simple random sampling with or without replacement. The population parameters of the auxiliary variable should also be known in advance when using the ratio, product, and regression methods of estimation. By suitably adapting the auxiliary variables, many authors have suggested several estimators; see, for example, Kadilar and Cingi 1, who recommended an improvement in estimating the population mean in simple random sampling. Al-Omari 2 suggested ratio estimation of the population mean using auxiliary information in simple random sampling and median ranked set sampling. Ozturk 3 proposed estimation of the population mean and total in a finite population setting using multiple auxiliary variables. Yadav et al. 4 recommended the use of auxiliary variables in searching for efficient estimators of a population mean. Bhushan and Pandey 5 discussed the optimality of ratio-type imputation methods for the estimation of the population mean using the higher order moment of an auxiliary variable. Zaman et al. 6 recommended robust regression-ratio-type estimators of the mean utilizing two auxiliary variables.
Kumar and Saini 7 discussed a predictive approach for the finite population mean when auxiliary variables are attributes. Singh and Nigam 8 recommended a generalized class of estimators for the finite population mean using two auxiliary variables in sample surveys. Bhushan et al. 9 proposed some improved classes of estimators in stratified sampling using bivariate auxiliary information. Shahzad et al. 10 discussed mean estimation using robust quantile regression with two auxiliary variables. Mahdizadeh and Zamanzade 11 proposed an interval estimation of the population mean in ranked set sampling. Ahmad et al. 12 recommended a new improved generalized class of estimators for the population distribution function using the auxiliary variable under simple random sampling. Muhammad et al. 13 suggested an enhanced ratio-type estimator for the finite population mean using the auxiliary variable in simple random sampling. Ahmad et al. 14 discussed an improved generalized class of estimators for estimating the finite population mean using two auxiliary variables under two-stage sampling. Shahzad et al. 15 proposed a three-fold utilization of supplementary information for mean estimation under the median ranked set sampling scheme. Shahzad et al. 16 discussed the estimation of the population mean by successive use of an auxiliary variable in median ranked set sampling. Yasmeen et al. 17 proposed generalized exponential estimators of the finite population mean using transformed auxiliary variables. Singh et al. 18 discussed an alternative efficient class of estimators for the finite population mean using information on an auxiliary attribute in sample surveys. Singh et al. 19 recommended the estimation of finite population variance using scrambled responses in the presence of auxiliary information.
In many situations, population units differ considerably in size. For example, in a medical study, the number of patients with a specific disease and the size of health units may differ. Likewise, in a household income survey, households may have different numbers of members, and in such circumstances the selection probabilities of the units may vary. To deal with such unequal probabilities, we use probability proportional to size (PPS) sampling. PPS is an unequal probability sampling scheme in which the chance of selecting each sampling unit in the population is proportional to an auxiliary (size) variable. Consider the case where we need to estimate the population of a province within a country; we take an auxiliary variable that is associated with the study variable. For example: (i) the population of all provinces within the country (correlation with the study variable = 0.95); (ii) the number of households in all communities within the province (correlation with the study variable = 0.99). Based on these facts, (ii) may be more useful at the estimation stage. Many researchers have suggested estimators that make efficient use of auxiliary variables under PPS. Akpanta 20 discussed the problems of PPS sampling in multicharacter surveys. Agarwal and Al Mannai 21 recommended a linear combination of estimators in PPS sampling to estimate the population mean and its robustness to optimum value. Abdulla et al. 22 suggested the selection of samples in PPS sampling using the cumulative relative frequency method. Andersen et al. 23 discussed optimal PPS sampling with vanishing auxiliary variables, with applications in microscopy. Alam et al. 24 discussed the selection of samples with PPS. Patel and Bhatt 25 recommended the estimation of the finite population total under PPS sampling in the presence of extra auxiliary information. Singh et al. 26 discussed an improved estimator of the population total in PPS sampling. Makela et al. 27 suggested Bayesian inference under cluster sampling with PPS. Ahmad and Shabbir 28 discussed the use of extreme values to estimate the finite population mean under the PPS sampling scheme. Ozturk 29 proposed post-stratified PPS sampling from stratified populations. Latpate et al. 30 discussed the scheme of PPS sampling. Sohil et al. 31 recommended optimum second call imputation in PPS sampling. Sinha and Khanna 32 discussed the estimation of the population mean under PPS sampling with and without measurement errors. Zangeneh and Little 33 discussed Bayesian inference for the finite population total from a heteroscedastic PPS sample. Hentschel et al. 34 recommended exact PPS sampling with a bounded sample size. Barbiero et al. 35 proposed bootstrapping PPS samples via a calibrated empirical population. Gupt and Ahamed 36 discussed optimum stratification for a generalized auxiliary variable proportional to allocation under a super-population model. Ponkaew and Lawson 37 recommended new estimators for estimating the population total, with an application to water demand in Thailand, under unequal probability sampling without replacement for missing data. Al-Jararha 38 discussed a class of estimators using two units with PPS. Al-Marzouki et al. 39 proposed an estimation of the finite population mean under PPS in the presence of maximum and minimum values. Zheng and Little 40 suggested penalized spline model-based estimation of the finite population total. Zheng and Little 41 recommended inference for the population total from probability-proportional-to-size samples based on predictions from a penalized spline nonparametric model. Amab 42 proposed the optimum estimation of a finite population total in PPS sampling with replacement for multicharacter surveys. Olayiwolla et al. 43 suggested the PPS method to enhance the efficiency of the estimator in two-stage sampling.
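The selection mechanism described above can be sketched in a few lines. This is only a minimal illustration, assuming sampling with replacement and selection probabilities P_i = z_i / Σ z_j; the function names and the size values are hypothetical and are not taken from the article.

```python
import random

def pps_probabilities(sizes):
    """Return selection probabilities P_i = z_i / sum(z) for positive size measures."""
    if any(z <= 0 for z in sizes):
        raise ValueError("size measures must be positive")
    total = sum(sizes)
    return [z / total for z in sizes]

def pps_sample_with_replacement(sizes, n, rng=None):
    """Draw n unit indices with replacement; each draw picks unit i with probability P_i."""
    rng = rng or random.Random()
    probs = pps_probabilities(sizes)
    return rng.choices(range(len(sizes)), weights=probs, k=n)

# Hypothetical size measures, e.g. number of households in five communities
sizes = [120, 80, 300, 50, 450]
probs = pps_probabilities(sizes)
sample = pps_sample_with_replacement(sizes, n=3, rng=random.Random(1))
```

Larger units are more likely to appear in `sample`, and the same unit may be drawn more than once because the draws are made with replacement.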
The primary aim of this article is to propose a new improved generalized class of estimators for the finite population mean using two auxiliary variables under PPS sampling.
The bias and mean squared error (MSE) of the proposed estimator are derived up to the first order of approximation.
The application of the proposed estimator is illustrated using real data sets from various domains and a simulation study.
All notations and symbols are given in the Appendix.
Review of existing estimators
In this section, we review some well-known existing estimators under PPS sampling.
- The usual estimator under PPS is given by:
  (1)
  Its variance is given by:
  (2)
- The ratio estimator under PPS is given by:
  (3)
  Its bias and MSE are given by:
  (4)
- Murthy 44 suggested a product estimator, given by:
  (5)
  Its bias and MSE are given by:
  (6)
- The regression estimator involves a constant, whose optimum value is given by:
  (7)
  The minimum variance of the regression estimator is given by:
  (8)
- Bai et al. 45 proposed an estimator involving two unknown constants, whose optimum values are given by:
  (9)
  The minimum MSE of this estimator is given by:
  (10)
- Bahl and Tuteja 46 suggested the following ratio and product exponential-type estimators:
  (11) (12)
  Their biases and MSEs are given by:
  (13) (14)
- Haq and Shabbir 47 suggested exponential-type estimators, each involving two constants. The bias and MSE of the first estimator are given by:
  (15)
  The minimum MSE of the first estimator, at the optimum values of its constants, is given by:
  (16)
  The second estimator is given by:
  (17)
  Its bias and the optimum values of its constants are given, and its minimum MSE at the optimum values is given by:
  (18)
- The bias of the estimator and the optimum values of its constants are given, and its minimum MSE at the optimal values is given by:
  (20)
- Singh et al. 49 suggested a class of estimators in which a and b are known constants:
  (21)
  Its bias and MSE, to the first order of approximation, are given by:
  (22) (23)
- Grover and Kaur 50 suggested an estimator involving two constants:
  (24)
  Its bias and MSE are given by:
  (25)
  The optimum values of the constants are given by:
  (26)
  The minimum MSE of the estimator is given by:
  (27)
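As a rough illustration of the classical estimators reviewed above, the following sketch computes the usual, ratio, product and regression estimators from a PPS with-replacement sample. It assumes the standard Hansen-Hurwitz transformation u_i = y_i / (N P_i) and v_i = x_i / (N P_i); this is the textbook form and its notation may differ from the expressions in this section. All function names and numbers are hypothetical.

```python
def transformed_mean(values, probs, N):
    """Sample mean of values[i] / (N * probs[i]) for a PPS with-replacement sample."""
    if len(values) != len(probs) or not values:
        raise ValueError("values and probs must be non-empty and of equal length")
    return sum(val / (N * p) for val, p in zip(values, probs)) / len(values)

def usual_estimator(y, probs, N):
    """Usual (Hansen-Hurwitz type) estimator of the population mean of y."""
    return transformed_mean(y, probs, N)

def ratio_estimator(y, x, probs, N, x_bar_pop):
    """Ratio estimator: usual estimator scaled by X-bar / v-bar."""
    return transformed_mean(y, probs, N) * x_bar_pop / transformed_mean(x, probs, N)

def product_estimator(y, x, probs, N, x_bar_pop):
    """Product estimator: usual estimator scaled by v-bar / X-bar."""
    return transformed_mean(y, probs, N) * transformed_mean(x, probs, N) / x_bar_pop

def regression_estimator(y, x, probs, N, x_bar_pop, b):
    """Regression estimator with a user-supplied constant b."""
    return transformed_mean(y, probs, N) + b * (x_bar_pop - transformed_mean(x, probs, N))

# Hypothetical sample of two units from a population of N = 4 units
N = 4
probs = [0.1, 0.4]      # selection probabilities of the sampled units
y = [2.0, 8.0]          # study variable
x = [4.0, 16.0]         # auxiliary variable used at the estimation stage
x_bar_pop = 12.0        # known population mean of x (assumed)

est_usual = usual_estimator(y, probs, N)
est_ratio = ratio_estimator(y, x, probs, N, x_bar_pop)
est_product = product_estimator(y, x, probs, N, x_bar_pop)
est_regression = regression_estimator(y, x, probs, N, x_bar_pop, b=0.5)
```

In this toy sample both transformed y values equal 5 and both transformed x values equal 10, so the ratio and regression estimators adjust the usual estimate upward because the sample mean of the transformed x falls below the assumed population mean of 12.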
Proposed estimator
An estimator's performance can be improved by appropriate use of the auxiliary variables at the design or estimation stage. Based on this idea, we use one auxiliary variable (Z) at the design stage under PPS and the other auxiliary variable (X) at the estimation stage. The proposed estimator is more robust than the ratio, product and regression estimators, as it can accommodate the types of data considered in the literature. Taking motivation from Ahmad et al.,51,52 we propose a new class of estimators using two auxiliary variables under PPS sampling.
(28)
where the two constants are unknown, and a and b are as described earlier. Some members of the family of estimators are given in Table 1.
Table 1.
a | b | |||
---|---|---|---|---|
1 | ||||
1 | ||||
1 | ||||
1 | N |
After simplification of the proposed estimator, we have
(29)
Expanding (29), we get
(30)
From (30), the bias of the proposed estimator is given by:
(31)
Squaring (31) and taking expectations, we obtain the MSE of the proposed estimator as given by:
(32)
Differentiating (32) with respect to the two unknown constants, we get their optimum values. Putting these optimum values in (32), we get the minimum MSE of the proposed estimator as given by:
(33)
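The optimization step above can be illustrated generically. The sketch below assumes, as is common for estimators with two tuning constants, that the first-order MSE is a quadratic of the form Ȳ²(1 + k1²A + k2²B + 2k1k2C − 2k1D − 2k2E). The coefficients A to E are placeholders chosen for illustration; they are not the coefficients of equation (32), and the function name is hypothetical.

```python
def optimal_constants(A, B, C, D, E):
    """Minimize Q(k1, k2) = 1 + k1^2*A + k2^2*B + 2*k1*k2*C - 2*k1*D - 2*k2*E.

    Setting both partial derivatives to zero gives the linear system
        A*k1 + C*k2 = D
        C*k1 + B*k2 = E
    which is solved here by Cramer's rule.
    """
    det = A * B - C * C
    if det <= 0 or A <= 0:
        raise ValueError("the quadratic form must be positive definite")
    k1 = (B * D - C * E) / det
    k2 = (A * E - C * D) / det
    q_min = 1.0 - (B * D * D + A * E * E - 2.0 * C * D * E) / det
    return k1, k2, q_min

def quadratic(k1, k2, A, B, C, D, E):
    """Evaluate the assumed quadratic at arbitrary constants."""
    return 1 + k1 * k1 * A + k2 * k2 * B + 2 * k1 * k2 * C - 2 * k1 * D - 2 * k2 * E

# Placeholder coefficients, for illustration only
k1_opt, k2_opt, q_min = optimal_constants(A=2.0, B=3.0, C=1.0, D=1.0, E=1.0)
```

Multiplying `q_min` by Ȳ² would give the minimum MSE under this assumed form; evaluating `quadratic` at any other pair of constants returns a larger value.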
Numerical study
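We carry out a numerical study to evaluate the performance of the estimators. The percentage relative efficiency (PRE) of each estimator is computed relative to the usual PPS estimator:
$$\mathrm{PRE} = \frac{\mathrm{Var}(\text{usual estimator})}{\mathrm{MSE}(\text{estimator under comparison})} \times 100,$$
where the estimator under comparison ranges over the existing and proposed estimators.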
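A short sketch of the PRE computation follows. It assumes that the reference value is the variance of the usual PPS estimator, which is consistent with the tabulated results (the usual estimator has PRE = 100); the function name is hypothetical, and the two input numbers are first-column entries of Table 3.

```python
def percentage_relative_efficiency(mse_reference, mse_estimator):
    """PRE of an estimator relative to a reference estimator, in percent."""
    if mse_reference <= 0 or mse_estimator <= 0:
        raise ValueError("MSE values must be positive")
    return 100.0 * mse_reference / mse_estimator

# Table 3, first column: first row (reference) and fourth row
pre_value = percentage_relative_efficiency(349398.9, 324666.8)
# pre_value is about 107.6177, matching the corresponding entry of Table 4
```

A PRE above 100 means the estimator has a smaller MSE than the reference; a PRE below 100 means it is less efficient.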
Population-I: (Source: Singh 53 )
Y = Expected total of fish in the year 1995,
X = Expected total of fish in the year 1994,
Z = Expected total of fish in the year 1993.
Population-II: (Source: Punjab Bureau of Statistics 54 )
Y = Total number of beds on the 30th June 2021,
X = Total allocated beds for COVID-19, 2021,
Z = Beds used by COVID-19, 2021.
Population-III: (Source: Punjab Bureau of Statistics 54 )
Y = Kids under age 5 whose childbirths are described listed with a public consultant,
X = Kids aged 5–17 years who are involved in child labor during the last week,
Z = Women aged 20–24 years who were first married before age 16.
Population-IV: (Source: Singh 53 )
Y = Expected total of fish in the year 1995,
X = Expected total of fish in the year 1994,
Z = Expected total of fish in the year 1992.
The summary statistics are given in Table 2, and the results based on the four populations are given in Tables 3–10. The simulation results are given in Tables 11–18.
Table 2.
Parameters | Population-I | Population-II | Population-III | Population-IV |
---|---|---|---|---|
N | 69 | 36 | 36 | 69 |
15 | 7 | 6 | 14 | |
4514.899 | 76.22889 | 660.1389 | 4514.899 | |
4954.435 | 14.77222 | 215.6389 | 4954.435 | |
0.4720461 | 1.568599 | 0.7089214 | 0.8523346 | |
0.5049075 | 0.7550356 | 0.7816491 | 0.8925777 | |
0.2660536 | 0.3004429 | 0.1852395 | 0.1542462 | |
0.06341113 | 0.3558289 | 0.1026463 | 0.1173466 | |
9.985055 | 2.567602 | 19.72718 | 9.985055 |
Table 3.
Estimators | ||||
---|---|---|---|---|
349398.9 | 349874.6 | 317890.6 | 306281.5 | |
550277.4 | 349701.8 | 317896.8 | 306291.6 | |
947998 | 349900.5 | 317889.6 | 306280 | |
324666.8 | 349761.7 | 317894.6 | 306288.1 | |
319576.8 | 349898 | 317889.7 | 306280.1 | |
349903.4 | 349899.6 | 317889.7 | 306280 | |
548763.7 | 349795.1 | 317893.4 | 306286.2 | |
309030.9 | 349902.9 | 317889.6 | 306279.8 | |
315997.70 | 349154 | 317916.6 | 306323.9 | |
317893.26 | 347998.8 | 319576.5 | 309104 |
MSE: mean squared error.
Table 10.
Estimators | ||||
---|---|---|---|---|
100 | 89.88564 | 109.2892 | 120.9829 | |
56.38245 | 89.93946 | 109.2831 | 120.9714 | |
41.32714 | 89.87756 | 109.2902 | 120.9846 | |
102.4372 | 89.92077 | 109.2852 | 120.9754 | |
107.6263 | 89.87763 | 109.2902 | 120.9846 | |
89.87666 | 89.87734 | 109.2902 | 120.9846 | |
69.65273 | 89.93477 | 109.2836 | 120.9724 | |
118.0844 | 89.87676 | 109.2903 | 120.9848 | |
100.5374 | 90.27774 | 109.2442 | 120.8993 | |
105.366739 | 100.2257 | 107.626627 | 117.8567 |
PRE: percentage relative efficiency.
Table 11.
Estimators | ||||
---|---|---|---|---|
0.4223217 | 0.1634819 | 0.08864383 | 0.05332498 | |
0.0899548 | 0.1634819 | 0.08864764 | 0.05332499 | |
1.49594 | 0.1636143 | 0.08864378 | 0.05332498 | |
0.0886746 | 0.1636143 | 0.08870236 | 0.05332507 | |
0.08896743 | 0.1634819 | 0.08864491 | 0.05332507 | |
0.1634819 | 0.1636143 | 0.08866297 | 0.05332498 | |
0.8664743 | 0.1636143 | 0.08864384 | 0.05332501 | |
0.08896705 | 0.1634819 | 0.08864414 | 0.05332498 | |
0.0889674 | 0.1636143 | 0.08864813 | 0.05332498 | |
0.4223205 | 0.4059884 | 0.08896743 | 0.05332547 |
MSE: mean squared error.
Table 18.
Estimators | ||||
---|---|---|---|---|
100 | 259.1356 | 353.8811 | 355.6423 | |
284.5381 | 259.1356 | 353.7001 | 355.6423 | |
25.21584 | 259.0187 | 353.8835 | 355.6423 | |
338.6248 | 259.0187 | 351.1601 | 355.6423 | |
338.6264 | 259.1356 | 353.8322 | 355.6423 | |
259.0187 | 259.0187 | 353.0294 | 355.6423 | |
45.59487 | 259.0187 | 353.8804 | 355.6423 | |
338.6438 | 259.1356 | 353.8671 | 355.6423 | |
338.6279 | 259.0187 | 353.6651 | 355.6423 | |
100.0029 | 255.1670 | 338.3200 | 354.6000 |
PRE: percentage relative efficiency.
Table 4.
Estimators | ||||
---|---|---|---|---|
100 | 99.86404 | 109.9117 | 114.0777 | |
63.49505 | 99.91337 | 109.9095 | 114.0739 | |
36.8565 | 99.85663 | 109.912 | 114.0783 | |
107.6177 | 99.89625 | 109.9103 | 114.0752 | |
109.3317 | 99.85734 | 109.912 | 114.0782 | |
99.8558 | 99.85688 | 109.912 | 114.0782 | |
63.67018 | 99.88673 | 109.9107 | 114.076 | |
113.0628 | 99.85596 | 109.912 | 114.0783 | |
110.568 | 100.0701 | 109.9027 | 114.0619 | |
109.91 | 100.4023 | 109.3318 | 113.036 |
PRE: percentage relative efficiency.
Table 5.
Estimators | ||||
---|---|---|---|---|
2042.513 | 1867.158 | 1380.42 | 1266.972 | |
1924.985 | 1876.736 | 1386.883 | 1273.903 | |
3106.509 | 1866.099 | 1379.451 | 1265.934 | |
1858.144 | 1892.744 | 1393.463 | 1280.965 | |
1407.928 | 1866.659 | 1379.973 | 1266.493 | |
1865.441 | 1868.404 | 1381.469 | 1268.096 | |
2456.203 | 1871.413 | 1383.692 | 1270.48 | |
1288.355 | 1865.91 | 1379.269 | 1265.739 | |
1357.061 | 1902.869 | 1396.35 | 1284.067 | |
1348.015 | 2034.616 | 1407.907 | 1296.493 |
MSE: mean squared error.
Table 6.
Estimators | ||||
---|---|---|---|---|
100 | 109.3916 | 147.9632 | 161.2122 | |
106.1054 | 108.8333 | 147.2736 | 160.3351 | |
65.74947 | 109.4536 | 148.0671 | 161.3444 | |
109.9222 | 107.9128 | 146.5782 | 159.4511 | |
109.9222 | 109.4208 | 148.0111 | 161.2731 | |
145.0723 | 109.3186 | 147.8508 | 161.0692 | |
109.4923 | 109.1428 | 147.6132 | 160.767 | |
83.15736 | 109.4647 | 148.0866 | 161.3692 | |
150.511 | 107.3386 | 146.2751 | 159.066 | |
151.520 | 100.3881 | 145.0744 | 157.5414 |
PRE: percentage relative efficiency.
Table 7.
Estimators | ||||
---|---|---|---|---|
31287.35 | 40038.97 | 31729.26 | 26349.13 | |
56543.03 | 38983.65 | 31868.64 | 26559.61 | |
82103.92 | 40135.31 | 31716.69 | 26330.22 | |
30213.76 | 39332.51 | 31822.19 | 26489.27 | |
28254.8 | 40127.88 | 31717.65 | 26331.68 | |
34406.16 | 40132.11 | 31717.1 | 26330.85 | |
47186.6 | 39620.37 | 31784.16 | 26431.83 | |
24431.36 | 40139.87 | 31716.09 | 26329.33 | |
26895.349 | 36482.52 | 32226.4 | 27108.34 | |
27802.337 | 36308.52 | 32610.92 | 27713.8 |
MSE: mean squared error.
Table 8.
Estimators | ||||
---|---|---|---|---|
100 | 91.16594 | 114.3605 | 138.5317 | |
55.3337 | 93.63389 | 114.5386 | 137.4339 | |
38.10701 | 90.94712 | 115.0874 | 138.6312 | |
103.5533 | 92.80341 | 114.7058 | 137.7988 | |
103.5533 | 90.96396 | 115.0839 | 138.6235 | |
111.9295 | 90.95436 | 115.0859 | 138.6279 | |
90.93532 | 92.12913 | 114.8431 | 138.0983 | |
66.30558 | 90.93677 | 115.0896 | 138.6359 | |
116.330 | 100.0531 | 113.2671 | 134.6519 | |
112.535 | 100.5326 | 111.9315 | 131.7102 |
PRE: percentage relative efficiency.
Table 9.
Estimators | ||||
---|---|---|---|---|
1057763 | 1176787 | 967856.1 | 874307.9 | |
1876050 | 1176083 | 967911 | 874391 | |
2559487 | 1176893 | 967847.9 | 874295.4 | |
1032596 | 1176327 | 967891.9 | 874362.1 | |
982810.8 | 1176892 | 967847.9 | 874295.5 | |
1176905 | 1176896 | 967847.6 | 874295.1 | |
1518623 | 1176144 | 967906.2 | 874383.7 | |
895768.3 | 1176903 | 967847 | 874294.2 | |
952013.35 | 1171676 | 968255 | 874911.8 | |
973909.40 | 1055381 | 982807.9 | 897499 |
MSE: mean squared error.
Table 12.
Estimators | ||||
---|---|---|---|---|
100 | 258.3294 | 476.4254 | 491.0944 | |
469.4821 | 258.3294 | 476.4049 | 491.0942 | |
28.2312 | 258.1202 | 476.4257 | 491.0948 | |
474.6923 | 258.1202 | 476.111 | 491.0936 | |
474.6925 | 258.3294 | 476.4196 | 491.0944 | |
258.3294 | 258.1202 | 476.3225 | 491.0942 | |
48.74025 | 258.1202 | 476.4253 | 491.0944 | |
474.6945 | 258.3294 | 476.4237 | 491.0945 | |
474.6927 | 258.1202 | 476.4023 | 491.0944 | |
100.0003 | 104.0231 | 474.6925 | 491.0899 |
PRE: percentage relative efficiency.
Table 13.
Estimators | ||||
---|---|---|---|---|
1.097883 | 0.351711 | 0.1064713 | 0.1060176 | |
0.1083106 | 0.351711 | 0.1064953 | 0.1060176 | |
4.098543 | 0.3517714 | 0.106471 | 0.1060517 | |
0.1082472 | 0.3517714 | 0.1068609 | 0.1060517 | |
0.1082472 | 0.351711 | 0.1064787 | 0.1060176 | |
0.351711 | 0.3517714 | 0.106609 | 0.1060517 | |
2.346827 | 0.3517714 | 0.1064713 | 0.1060517 | |
0.108245 | 0.351711 | 0.1064735 | 0.1060176 | |
0.1082468 | 0.3517714 | 0.1064966 | 0.1060517 | |
1.097874 | 1.061622 | 0.1081982 | 0.1064735 |
MSE: mean squared error.
Table 14.
Estimators | ||||
---|---|---|---|---|
100 | 312.1549 | 1031.154 | 1035.567 | |
1013.643 | 312.1549 | 1030.921 | 1035.567 | |
26.78716 | 312.1013 | 1031.157 | 1035.234 | |
1014.237 | 312.1013 | 1027.395 | 1035.234 | |
1014.237 | 312.1549 | 1031.083 | 1035.567 | |
312.1549 | 312.1013 | 1029.822 | 1035.234 | |
46.78159 | 312.1013 | 1031.154 | 1035.234 | |
1014.257 | 312.1549 | 1031.133 | 1035.567 | |
1014.241 | 312.1013 | 1030.697 | 1035.234 | |
100.0008 | 103.4115 | 1014.697 | 1031.133 |
PRE: percentage relative efficiency.
Table 15.
Estimators | ||||
---|---|---|---|---|
0.60094 | 0.51537 | 0.482104 | 0.478010 | |
0.9517461 | 0.51537 | 0.48214 | 0.478010 | |
2.33747 | 0.51542 | 0.4821036 | 0.478013 | |
0.4859471 | 0.51542 | 0.4827192 | 0.478013 | |
0.4859461 | 0.51537 | 0.4821088 | 0.478010 | |
0.515426 | 0.51542 | 0.4822075 | 0.478013 | |
1.208288 | 0.51542 | 0.4821048 | 0.478013 | |
0.4859419 | 0.51537 | 0.4821052 | 0.478010 | |
0.4859467 | 0.51542 | 0.4821863 | 0.478013 | |
0.6009381 | 0.58768 | 0.4858253 | 0.48186 |
MSE: mean squared error.
Table 16.
Estimators | ||||
---|---|---|---|---|
100 | 116.6020 | 124.6495 | 125.717 | |
63.14079 | 116.6020 | 124.6401 | 125.715 | |
25.70899 | 116.5909 | 124.6496 | 125.716 | |
123.6637 | 116.5909 | 124.4906 | 125.716 | |
123.6639 | 116.6020 | 124.6482 | 125.717 | |
116.5909 | 116.5909 | 124.6227 | 125.716 | |
49.73483 | 116.5909 | 124.6492 | 125.716 | |
123.665 | 116.6020 | 124.6491 | 125.717 | |
123.6638 | 116.5909 | 124.6282 | 125.716 | |
100.0003 | 102.2563 | 123.6947 | 124.710 |
PRE: percentage relative efficiency.
Table 17.
Estimators | ||||
---|---|---|---|---|
3.976607 | 1.535259 | 1.123713 | 1.118145 | |
1.397566 | 1.535259 | 1.124288 | 1.118145 | |
15.77027 | 1.535259 | 1.123705 | 1.118147 | |
1.17434 | 1.535259 | 1.13242 | 1.118147 | |
1.174335 | 1.535259 | 1.123868 | 1.118145 | |
1.535259 | 1.535259 | 1.126424 | 1.118147 | |
8.174274 | 1.535259 | 1.123715 | 1.118147 | |
1.174274 | 1.535259 | 1.123757 | 1.118145 | |
1.17433 | 1.535259 | 1.124399 | 1.118147 | |
3.976492 | 1.55843 | 1.174339 | 1.121434 |
MSE: mean squared error.
Simulation analysis
We generated four populations of size 5000 from a bivariate normal distribution with different covariance matrices. The population means and covariance matrices are given below:
Population-I:
and
Population-II:
and
Population-III:
and
Population-IV:
and
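The simulation design can be sketched as follows. This is a minimal illustration rather than the procedure used in the article: the population is built from hypothetical linear relations with normal errors (the means and covariance matrices of the four populations are not used here), samples are drawn by PPS with replacement on Z, and the empirical MSEs of the usual and ratio estimators are compared. All parameter values, names, the sample size and the number of replications are assumptions.

```python
import random

def simulate_population(N, rng):
    """Generate (y, x, z) triples that are jointly normal by construction."""
    pop = []
    for _ in range(N):
        z = 50.0 + 5.0 * rng.gauss(0.0, 1.0)
        if z <= 0:
            raise ValueError("size measure must be positive")
        x = 40.0 + 0.8 * (z - 50.0) + 3.0 * rng.gauss(0.0, 1.0)
        y = 100.0 + 1.5 * (x - 40.0) + 1.0 * (z - 50.0) + 4.0 * rng.gauss(0.0, 1.0)
        pop.append((y, x, z))
    return pop

def monte_carlo_mse(pop, n, reps, rng):
    """Empirical MSE of the usual and ratio estimators under PPS with replacement."""
    N = len(pop)
    z_total = sum(z for _, _, z in pop)
    y_bar = sum(y for y, _, _ in pop) / N
    x_bar = sum(x for _, x, _ in pop) / N
    # Cumulative size totals let each draw select unit i with probability z_i / z_total
    cum, running = [], 0.0
    for _, _, z in pop:
        running += z
        cum.append(running)
    se_usual = se_ratio = 0.0
    for _ in range(reps):
        idx = rng.choices(range(N), cum_weights=cum, k=n)
        # u_i = y_i / (N * P_i) with P_i = z_i / z_total, and likewise v_i for x
        u_bar = sum(pop[i][0] * z_total / (N * pop[i][2]) for i in idx) / n
        v_bar = sum(pop[i][1] * z_total / (N * pop[i][2]) for i in idx) / n
        ratio = u_bar * x_bar / v_bar
        se_usual += (u_bar - y_bar) ** 2
        se_ratio += (ratio - y_bar) ** 2
    return se_usual / reps, se_ratio / reps

rng = random.Random(2023)
population = simulate_population(5000, rng)
mse_usual, mse_ratio = monte_carlo_mse(population, n=20, reps=1000, rng=rng)
pre_ratio = 100.0 * mse_usual / mse_ratio
```

The same loop can be extended to any other estimator by adding one accumulator per estimator; the PRE of each is then its empirical MSE compared with that of the usual estimator.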
Discussion
To assess the performance of the proposed estimators in comparison with the existing estimators, four real data sets were used in an empirical study. We also performed a simulation study to check the reliability and generalizability of the new improved class of estimators. The findings demonstrated that the proposed estimators were more accurate and less biased than the conventional and other well-known existing estimators. Table 2 provides summary statistics for the data sets. Tables 3–10 contain the MSE and PRE results based on the real data sets. These numerical findings show that our suggested estimators are the best among all the existing estimators considered. Tables 11–18 contain the MSE and PRE results for the simulated data sets. The results of the simulation analysis also clearly show that the PRE of the proposed estimator is higher than that of the existing estimators considered in this study. Therefore, the numerical results indicate that our proposed estimators perform best among the existing counterparts.
From the numerical results presented in Tables 3–18, we note that the MSE and percentage relative efficiency of the proposed classes of estimators change according to the choice of a and b. Based on both the real data sets and the simulation analysis, if we use (a = 1 and b = ), ( = 1 and b = ), (a = and b = ), we get the largest values of percentage relative efficiency among the families in the different classes. In this way, choosing a and b as the coefficient of variation, kurtosis, and association coefficient in the families of estimators gives the best results. In contrast, the percentage relative efficiencies of our suggested family decline for the choice (a = 1 and b = N ). Greater improvements in efficiency are observed when using the proposed estimator rather than some existing estimators under probability proportional to size sampling. Therefore, the proposed estimator is recommended for use in practice.
Concluding remarks
In this article, we proposed an improved generalized class of estimators using two auxiliary variables based on probability proportional to size sampling. Ten new estimators are generated from the proposed class of estimators, which are presented in Table 1. The proposed generalized class of estimators is compared with several existing estimators to judge its superiority using four real data sets. Moreover, a simulation study is conducted to check the robustness and generalizability of the proposed estimator. The MSEs of the proposed and existing estimators are derived up to the first order of approximation. The proposed class of estimators performs well compared with the existing estimators, as shown by the results of the four real data sets and the simulation study. Empirical efficiency comparisons confirm that our proposed class of estimators performs more effectively than the traditional estimators. The current work can easily be extended to the estimation of the population mean using auxiliary variables under measurement error, non-response, and stratified random sampling.
Author biographies
Sohaib Ahmad is a PhD scholar at Abdul Wali Khan University Mardan. His research interests include survey sampling, randomized response, and data analysis. He has published several research articles in the same field.
Javid Shabbir is a professor in the Department of Statistics, University of Wah, Pakistan. His research direction is advanced survey sampling and randomized response.
Erum Zahid is working in the Department of Applied Mathematics and Statistics, Institute of Space Technology Islamabad, Pakistan. Her research direction includes survey sampling, spatial statistics and data analysis.
Muhammad Aamir is working as an assistant professor at Abdul Wali Khan University, Mardan, Pakistan. His research direction is survey sampling, time series analysis, and machine learning, and he has deep insights into the accuracy of forecasting models.
Mohammed Alqawba is working in the Department of Mathematics, College of Science and Arts, Qassim University, Ar Rass, Saudi Arabia. His research direction includes time series analysis, survey sampling, distribution theory and stochastic processes.
Appendix
Notations and symbols
Consider a finite population Ω = { , , … , }. Let and { , } be the characteristics of the study variable (Y) and the auxiliary variables (X, Z), respectively. We draw a sample of size n by PPS sampling with replacement, taking Z as the size measure of the unit to obtain the selection probabilities.
Define
, , be the sample means corresponding to population means and .
To obtain the properties of the estimators, i.e., the bias and MSE, we consider the following error terms:
, , be the population coefficient of variation such that
and
Let , be the correlation coefficient of u on v.
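To make the notation concrete, the sketch below computes population coefficients of variation and the correlation coefficient for the transformed variables, together with the standard first-order MSEs of the usual, ratio, product and regression estimators under PPS with replacement. It assumes the textbook definitions u_i = y_i / (N P_i), v_i = x_i / (N P_i), C_u² = Σ P_i (u_i − Ȳ)² / Ȳ² and the analogous weighted covariance; these may differ in detail from the article's notation, and all names and data are hypothetical.

```python
import math

def pps_population_moments(y, x, z):
    """Population quantities under PPS with P_i = z_i / sum(z)."""
    N = len(y)
    z_total = sum(z)
    p = [zi / z_total for zi in z]
    u = [yi / (N * pi) for yi, pi in zip(y, p)]
    v = [xi / (N * pi) for xi, pi in zip(x, p)]
    y_bar = sum(y) / N
    x_bar = sum(x) / N
    var_u = sum(pi * (ui - y_bar) ** 2 for pi, ui in zip(p, u))
    var_v = sum(pi * (vi - x_bar) ** 2 for pi, vi in zip(p, v))
    cov_uv = sum(pi * (ui - y_bar) * (vi - x_bar) for pi, ui, vi in zip(p, u, v))
    return {
        "y_bar": y_bar,
        "c_u": math.sqrt(var_u) / y_bar,
        "c_v": math.sqrt(var_v) / x_bar,
        "rho_uv": cov_uv / math.sqrt(var_u * var_v),
    }

def first_order_mse(m, n):
    """Standard first-order MSEs for a PPS with-replacement sample of size n."""
    scale = m["y_bar"] ** 2 / n
    cu2, cv2 = m["c_u"] ** 2, m["c_v"] ** 2
    cross = 2.0 * m["rho_uv"] * m["c_u"] * m["c_v"]
    return {
        "usual": scale * cu2,
        "ratio": scale * (cu2 + cv2 - cross),
        "product": scale * (cu2 + cv2 + cross),
        "regression": scale * cu2 * (1.0 - m["rho_uv"] ** 2),
    }

# Hypothetical population of four units with equal size measures
moments = pps_population_moments(
    [1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0], [1.0, 1.0, 1.0, 1.0]
)
mse = first_order_mse(moments, n=5)
```

In this toy population x is exactly twice y, so the correlation is 1 and the first-order MSEs of the ratio and regression estimators vanish, while the product estimator is the least efficient.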
Footnotes
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding: The authors received no financial support for the research, authorship, and/or publication of this article.
ORCID iD: Sohaib Ahmad https://orcid.org/0000-0003-2582-2265
References
- 1.Kadilar C, Cingi H. Improvement in estimating the population mean in simple random sampling. Appl Math Lett 2006; 19: 75–79. [Google Scholar]
- 2.Al-Omari AI. Ratio estimation of the population mean using auxiliary information in simple random sampling and median ranked set sampling. Stat Probab Lett 2012; 82: 1883–1890. [Google Scholar]
- 3.Ozturk O. Estimation of population mean and total in a finite population setting using multiple auxiliary variables. J Agric Biol Environ Stat 2014; 19: 161–184. [Google Scholar]
- 4.Yadav SK, Sharma DK, Mishra SSet al. et al. Use of auxiliary variables in searching efficient estimator of population mean. Int J Multivar Data Anal 2018; 1: 230–244. [Google Scholar]
- 5.Bhushan S, Pandey AP. Optimality of ratio-type imputation methods for estimation of population mean using higher order moment of an auxiliary variable. J Stat Theory Pract 2021; 15: 1–35. [Google Scholar]
- 6.Zaman T, Dünder E, Audu A, et al. Robust regression-ratio-type estimators of the mean utilizing two auxiliary variables: a simulation study. Math Probl Eng 2021; 2021: 1–9. [Google Scholar]
- 7.Kumar A, Saini M. A predictive approach for finite population mean when auxiliary variables are attributes. Thailand Stat 2022; 20: 575–584. [Google Scholar]
- 8.Singh HP, Nigam P. A generalized class of estimators for finite population mean using two auxiliary variables in sample surveys. J Reliab StatStud 2022; 15: 61–104. [Google Scholar]
- 9.Bhushan S, Kumar A, Onyango Ret al. et al. Some improved classes of estimators in stratified sampling using bivariate auxiliary information. J Probab Stat 2022; 2022: 1–23. [Google Scholar]
- 10.Shahzad U, Ahmad I, Almanjahie IM, et al. Mean estimation using robust quantile regression with two auxiliary variables. Sci Iran 2022; 30: 1245–1254. [Google Scholar]
- 11.Mahdizadeh M, Zamanzade E. On interval estimation of the population mean in ranked set sampling. Commun Stat - Simul Comput 2022; 51: 2747–2768. [Google Scholar]
- 12.Ahmad S, Ullah K, Zahid E, et al. A new improved generalized class of estimators for population distribution function using auxiliary variable under simple random sampling. Sci Rep 2023; 13: 5415. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 13.Muhammad I, Zakari Y, Abdu M, et al. Enhanced ratio-type estimator for finite population mean using auxiliary variable in simple random sampling. Ratio (Oxf) 2023; 5: 242–252. [Google Scholar]
- 14.Ahmad S, Hussain S, Shabbir J, et al. Improved generalized class of estimators in estimating the finite population mean using two auxiliary variables under two-stage sampling. AIMS Mathematics 2022; 7: 10609–10624. [Google Scholar]
- 15.Shahzad U, Ahmad I, Almanjahie IMet al. et al. Three-fold utilization of supplementary information for mean estimation under median ranked set sampling scheme. Plos One 2022; 17: e0276514. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 16.Shahzad U, Ahmad I, Oral E, et al. Estimation of the population mean by successive use of an auxiliary variable in median ranked set sampling. Math Popul Stud 2021; 28: 176–199.
- 17.Yasmeen U, Noor ul Amin M, Hanif M. Generalized exponential estimators of finite population mean using transformed auxiliary variables. Int J Appl Comput Math 2015; 1: 589–598.
- 18.Singh HP, Malviya P, Tailor R. An alternative efficient class of estimators for finite population mean using information on an auxiliary attribute in sample surveys. J Stat Theory Pract 2023; 17: 2.
- 19.Singh S, Sedory SA, Arnab R. Estimation of finite population variance using scrambled responses in the presence of auxiliary information. Commun Stat - Simul Comput 2015; 44: 1050–1065.
- 20.Akpanta AC. On the problems of PPS sampling in multi-character surveys. Global J Math Sci 2009; 8: 31–42.
- 21.Agarwal SK, Al Mannai M. Linear combination of estimators in probability proportional to sizes sampling to estimate the population mean and its robustness to optimum value. Statistica 2009; 69.
- 22.Abdulla F, Hossain M, Rahman M. On the selection of samples in probability proportional to size sampling: cumulative relative frequency method. Math Theor Model 2014; 4: 102–107.
- 23.Andersen IT, Hahn U, Vedel Jensen EB. Optimal PPS sampling with vanishing auxiliary variables–with applications in microscopy. Scand J Stat 2015; 42: 1136–1148.
- 24.Alam M, Sumy SA, Parh YA. Selection of the samples with probability proportional to size. Sci J Appl Math Stat 2015; 3: 230–233.
- 25.Patel PA, Bhatt S. Estimation of finite population total under PPS sampling in presence of extra auxiliary information. Int J Stat Anal 2016; 6: 9–16.
- 26.Singh HP, Mishra AC, Pal SK. Improved estimator of population total in PPS sampling. Commun Stat - Theory Methods 2018; 47: 912–934.
- 27.Makela S, Si Y, Gelman A. Bayesian inference under cluster sampling with probability proportional to size. Stat Med 2018; 37: 3849–3868.
- 28.Ahmad S, Shabbir J. Use of extreme values to estimate finite population mean under PPS sampling scheme. J Reliab Stat Stud 2018; 11: 99–112.
- 29.Ozturk O. Post-stratified probability-proportional-to-size sampling from stratified populations. J Agric Biol Environ Stat 2019; 24: 693–718.
- 30.Latpate R, Kshirsagar J, Kumar Gupta V, et al. Probability proportional to size sampling. In: Advanced sampling methods. Singapore: Springer, 2021, pp.85–98.
- 31.Sohil F, Sohail MU, Shabbir J. Optimum second call imputation in PPS sampling. PLoS One 2022; 17: e0261834.
- 32.Sinha RR, Khanna B. Estimation of population mean under probability proportional to size sampling with and without measurement errors. Concurrency Comput Pract Exper 2022; 34: e7023.
- 33.Zangeneh SZ, Little RJ. Bayesian inference for the finite population total from a heteroscedastic probability proportional to size sample. J Surv Stat Methodol 2015; 3: 162–192.
- 34.Hentschel B, Haas PJ, Tian Y. Exact PPS sampling with bounded sample size. Inf Process Lett 2023; 182: 106382.
- 35.Barbiero A, Manzi G, Mecatti F. Bootstrapping probability-proportional-to-size samples via calibrated empirical population. J Stat Comput Simul 2015; 85: 608–620.
- 36.Gupt BK, Ahamed MI. Optimum stratification for a generalized auxiliary variable proportional allocation under a superpopulation model. Commun Stat - Theory Methods 2022; 51: 3269–3284.
- 37.Ponkaew C, Lawson N. New estimators for estimating population total: an application to water demand in Thailand under unequal probability sampling without replacement for missing data. PeerJ 2022; 10: e14551.
- 38.Al-Jararha J. A class of sampling two units with probability proportional to size. Commun Stat - Simul Comput 2013; 42: 1906–1916.
- 39.Al-Marzouki S, Chesneau C, Akhtar S, et al. Estimation of finite population mean under PPS in presence of maximum and minimum values. AIMS Math 2021; 6: 5397–5409.
- 40.Zheng H, Little RJ. Penalized spline model-based estimation of the finite populations total from probability-proportional-to-size samples. J Off Stat 2003; 19: 99.
- 41.Zheng H, Little RJ. Inference for the population total from probability-proportional-to-size samples based on predictions from a penalized spline nonparametric model. J Off Stat 2005; 21: 1.
- 42.Arnab R. Optimum estimation of a finite population total in PPS sampling with replacement for multi-character surveys. J Ind Soc Agril Statist 2004; 58: 231–243.
- 43.Olayiwolla OM, Apantaku FS, Wale-Orojo OA, et al. Probability proportional to size (PPS) method to enhance efficiency of estimator in two stage sampling. Ann Comput Sci Ser 2019; 17: 311–315.
- 44.Murthy MN. Product method of estimation. Sankhya: Indian J Stat, Ser A 1964; 26: 69–74.
- 45.Bai ZD, Miao BQ, Rao CR. Estimation of direction of arrival of signals: Asymptotic results. In: Haykin S (ed) Advances in spectrum analysis and array processing, vol. II, Chapter 9. Englewood Cliffs, NJ: Prentice Hall, 1991.
- 46.Bahl S, Tuteja R. Ratio and product type exponential estimators. J Inf Optim Sci 1991; 12: 159–164.
- 47.Haq A, Shabbir J. Improved exponential type estimators of finite population mean under complete and partial auxiliary information. Hacettepe J Math Stat 2014; 43: 1079–1093.
- 48.Ekpenyong EJ, Enang EI. Efficient exponential ratio estimator for estimating the population mean in simple random sampling. Hacettepe J Math Stat 2015; 44: 689–705.
- 49.Singh R, Chauhan P, Sawan N, et al. Improvement in estimating the population mean using exponential estimator in simple random sampling. Int J Stat Econ 2009; 3: 13–18.
- 50.Grover LK, Kaur P. A generalized class of ratio type exponential estimators of population mean under linear transformation of auxiliary variable. Commun Stat - Simul Comput 2014; 43: 1552–1574.
- 51.Ahmad S, Hussain S, Zahid E, et al. A simulation study: Population distribution function estimation using dual auxiliary information under stratified sampling scheme. Math Probl Eng 2022; 2022: 1–13.
- 52.Ahmad S, Hussain S, Aamir M, et al. Dual use of auxiliary information for estimating the finite population mean under the stratified random sampling scheme. J Math 2021; 2021: 1–12.
- 53.Singh S. Advanced sampling theory with applications: How Michael “selected” Amy. Berlin, Germany: Springer Science & Business Media, 2003.
- 54.Punjab Bureau of Statistics, Punjab, Pakistan, 2021–2022. https://bos.punjab.gov.pk