Abstract
In biological and medical research, the cost of collecting and measuring a sample, and the collateral damage that measurement can cause, often force a compromise in which inference is made with a fixed and accepted approximation error. Ranked set sampling (RSS) performs better in such scenarios, and the use of auxiliary information further enhances the performance of the estimators. In this study, two generalized classes of estimators are proposed to estimate the population variance using RSS and information on an auxiliary variable. The bias and mean square errors of the proposed classes of estimators are derived up to the first order of approximation. Some special cases of one of the proposed classes are also considered when population parameters of the auxiliary variable are available. A simulation study was conducted to assess the performance of the members of the proposed family at various sample sizes. A real-life data application estimates the variance of the gestational age of fetuses using supplementary information. The results showed that the RSS design is more accurate than simple random sampling for determining the population variance of hard-to-measure or destructive sampling units.
Subject terms: Mathematics and computing, Applied mathematics, Statistics
Introduction
In many scientific fields, such as medicine, agriculture and environmental studies, various sampling methods are used to collect data for inference. During research studies, many environmental and biological constraints affect data collection, such as the attainable sample size, the cost per sample, and sample units of the study variable that are destroyed by measurement. These constraints strongly affect the statistical analysis and inference of the study. Ranked set sampling (RSS), however, can perform better in such scenarios. McIntyre introduced the RSS technique and applied it to estimate the average yield of pasture at reduced sampling cost1. Later, Stokes suggested a classical estimator for the population variance in RSS that allows for ranking error2. An unbiased estimator of the population variance under a ranked set sample was later developed and shown to be better than the Stokes estimator, even in small samples3. Another study evaluated the estimation of a population proportion under RSS and its variations4. The efficiency of an estimator increases when supplementary information is used alongside the study variable, because the study variable and the auxiliary variables are correlated5.
In the literature, extensive work has been done on ratio estimation of the population mean using RSS. One study developed ratio estimators and compared them under two designs (simple RSS and extreme RSS)6. RSS has received considerable attention from researchers; a recent study covers balanced and unbalanced RSS7. Comprehensive work on nonparametric RSS is also available8,9.
Detailed work is available in the literature on estimation of the population variance under simple random sampling (SRS), but there is a gap regarding estimators of the population variance under RSS. This study is an effort to address this deficiency. We propose a class of generalized estimators for the population variance under RSS. The mean square error and bias of the proposed class of estimators are derived up to the first degree of approximation. Several members of the proposed class are developed depending on the type of supplementary information available, such as the mean, median, tri-mean, coefficient of variation, coefficient of correlation, coefficient of skewness, kurtosis and quartile deviation. The mean square errors under both sampling designs (SRS and RSS) are compared on real-life data to evaluate the performance of these member estimators. Moreover, the relative efficiency of these estimators is calculated in a simulation study based on an artificial population and various sample sizes.
To estimate the population variance, consider a population of size N that is labelled as A sample of size is drawn from that has a bivariate normal distribution. The process of sampling consists of random samples, each of size that are drawn from the population and the elements of each nth set are ordered on the basis of auxiliary variable. The smallest observation is then measured from the first sample and the second smallest from the second sample. The process is continued in this manner until the largest observation has been measured from the nth set. This entire cycle is repeated mth time and the sample unit is drawn from the rth order of nth set, out of the ith cycle. Let and be the value of the study variable and the auxiliary variable respectively, where ‘ith’ value occurred in the ‘mth’ cycle, as and the ‘r’ is the ordered value ranked based on auxiliary variable in ‘nth’ sets, as . Both samples are drawn using RSS methodology where the study variable is ranked based on an auxiliary variable. The overall averages and the variances of the ranked set sample are , , and respectively. The ordered means and variances are , , and respectively. The other ordered measures used in this article are , , and . Suppose ,
and . The expectation of square error terms are
where , , , and .
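The ranked set sampling procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' R code: the function name `ranked_set_sample`, the argument names `set_size` and `cycles`, and the toy population are assumptions introduced here. Sets are drawn independently of one another, so a unit may reappear in more than one set.

```python
import random

def ranked_set_sample(population, set_size, cycles, rng):
    """Draw a ranked set sample of set_size * cycles units.

    population is a list of (x, y) pairs. Ranking uses only the
    auxiliary value x; y is "measured" only for the selected unit.
    """
    sample = []
    for _ in range(cycles):
        cycle = []
        for rank in range(set_size):
            # Draw one set of set_size units without replacement.
            candidates = rng.sample(population, set_size)
            # Rank the set on the auxiliary variable.
            candidates.sort(key=lambda unit: unit[0])
            # Keep the unit at position `rank` in this ranked set.
            cycle.append(candidates[rank])
        sample.append(cycle)
    return sample

# Hypothetical toy population: x ~ N(10, 2), y linear in x plus noise.
rng = random.Random(2023)
population = []
for _ in range(400):
    x = rng.gauss(10, 2)
    population.append((x, 5 + x + rng.gauss(0, 1)))

rss = ranked_set_sample(population, set_size=3, cycles=4, rng=rng)
print(len(rss), [len(cycle) for cycle in rss])
```

Only the auxiliary values are needed to decide which unit in each set is measured, which is why the scheme suits study variables that are costly or destructive to observe.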
Stokes considered the errors in judgment and suggested an estimator for ; which is asymptotically unbiased and more efficient than the usual SRS unbiased estimator for 2
| 1 |
where .
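For orientation, the Stokes estimator is usually written as the sum of squared deviations of all measured units from the ranked set sample mean, divided by one less than the total sample size. The sketch below assumes that form, since the display equation is not reproduced here; the function name and the small data set are illustrative only.

```python
def stokes_variance(sample):
    """Pooled variance estimate from a ranked set sample.

    sample is a list of cycles; each cycle is a list of measured
    values of the study variable, one per rank.
    """
    values = [y for cycle in sample for y in cycle]
    total = len(values)
    mean = sum(values) / total
    # Divide by (total sample size - 1), as in the assumed form.
    return sum((y - mean) ** 2 for y in values) / (total - 1)

# Two cycles with set size three (made-up numbers).
toy_sample = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0]]
estimate = stokes_variance(toy_sample)
print(estimate)
```

For the toy data the pooled mean is 2.5 and the squared deviations sum to 5.5, so the estimate is 5.5 / 5 = 1.1.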
The variance of is obtained by Stokes2 as
| 2 |
Al-Hadhrami proposed a ratio estimator for the population variance based on RSS10 as
| 3 |
where . The MSE and bias of the above estimator are
| 4 |
| 5 |
Materials and method
In this study, it is assumed that the study variable (Y) and the auxiliary variable (X) follow a bivariate normal distribution with high positive correlation, say . Ranking is done on the basis of the auxiliary variable, as it is easily and cheaply available. The variables X and Y are both sampled by the RSS method1. Here the estimator of population variance. The R language was used to conduct the simulation study of all forms of the estimators and to compute the relative efficiency.
Classical generalized ratio estimator
Motivated by the members of the class of estimators11, we have developed a generalized ratio estimator for the finite population variance under RSS scheme as:
| 6 |
where can be . If then we have the ratio estimator of population variance from and if then we have the product estimator of population variance and when then it is equal to the sample variance. After the simplification and taking expectations, we have following expressions for the bias and MSE of the proposed class of estimators
| 7 |
The bias is
| 8 |
Applying expectations in Eq. (7), the mean square error is:
| 9 |
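Because the display equations are not reproduced here, the following sketch assumes one common form of an exponent-type estimator: the sample variance of Y multiplied by the ratio of the known population variance of X to its sample variance, raised to a constant power. Under that assumed form, a power of 1 gives a ratio-type estimator, a power of -1 a product-type estimator, and a power of 0 the plain sample variance, mirroring the three cases described above. The function and argument names are hypothetical, the numbers are made up, and the exact expression in Eq. (6) may differ.

```python
import statistics

def generalized_ratio_variance(y_sample, x_sample, pop_var_x, power):
    """Assumed form: s_y^2 * (S_x^2 / s_x^2) ** power."""
    s2_y = statistics.variance(y_sample)   # sample variance of Y
    s2_x = statistics.variance(x_sample)   # sample variance of X
    return s2_y * (pop_var_x / s2_x) ** power

# Made-up measurements; pop_var_x is a hypothetical known value.
y_sample = [30.0, 32.0, 35.0, 31.0]
x_sample = [5.8, 6.1, 6.9, 6.0]
pop_var_x = 0.25

ratio_type = generalized_ratio_variance(y_sample, x_sample, pop_var_x, 1)
product_type = generalized_ratio_variance(y_sample, x_sample, pop_var_x, -1)
plain = generalized_ratio_variance(y_sample, x_sample, pop_var_x, 0)
print(ratio_type, product_type, plain)
```

Keeping the power as a parameter lets one function cover all three special cases, which is the point of a generalized class.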
Generalized class of estimators with auxiliary information
Motivated by Singh and Solanki12, we propose another generalized class of ratio estimators to estimate the finite population variance by utilizing a single auxiliary variable under the RSS technique. The proposed estimator is:
| 10 |
where and are the constants which take finite values and are function of known population parameters of auxiliary variable , such as and . When values of are suitably chosen then several existing estimators can be obtained from proposed generalized class of estimators T. In addition to existing estimators, some new estimators are generated from proposed class of estimators which are given in Table 1.
Table 1.
Some members of class of estimators.
| Estimator | Values of constant | |||||
|---|---|---|---|---|---|---|
| a | b | c | d | |||
| 1 | 1 | 1 | 1 | |||
| 1 | 1 | 1 | 1 | |||
| 1 | 1 | |||||
| 1 | 1 | |||||
| 1 | 1 | |||||
| 1 | 1 | |||||
Using error term notations in Eq. (10), we get
| 11 |
where , . Taking expectation and after simplification we have
| 12 |
The bias is obtained by using error notations terms from section-I (“Introduction” section) and is
| 13 |
Following expression of MSE is obtained after taking square and expectation of Eq. (12) and ignoring the higher order terms as
| 14 |
where . Using notation given in section-I (Introduction) the expression of mean square error is
| 15 |
where , .
, and .
Differentiating Eq. (15) with respect to and and equating to zero, the optimum values of and are, respectively, obtained as
| 16 |
and
| 17 |
Using the above optimum values, the minimum MSE of generalized class of estimators T is
| 18 |
Many forms can be derived from the above generalized class of ratio estimators, depending on the availability of population parameters of the supplementary information. Some members of this class are given in Table 1 above.
Results
In this section, the real-life data is used for empirical study to obtain mean square error for explaining the advantage of RSS estimators over simple random sampling (SRS). Next, the simulation study is presented with the percent relative efficiencies of various estimators.
Applications
RSS has an advantage in biostatistics because it provides greater efficiency in small samples when the variable of interest is difficult, destructive or costly to measure. A study on the “Assessment of gestational age and weight” was conducted by a student in 2014–2015, in which the accuracy of gestational age was assessed by ultrasound. A total of 400 ultrasounds were performed on pregnant women. From this study we have taken two highly correlated variables, X = femur length of the fetus and Y = gestational age. We then calculated the mean square errors of the estimators under the RSS and SRS procedures, shown in Table 2. From the population of size 400, we drew one sample of size 12 with set size m = 3 and number of cycles n = 4 using RSS, and another sample of size 12 using the SRS method. The measures obtained from the population are = 6.1488, = 31.990, = 0.831, = 17.025, = − 0.223, = − 0.649, = 0.997, = 6.175, = 0.75, = 6.200, = 6.9146, = 0.1985.
Table 2.
Mean square error of different estimators.
| Estimators | SRS MSE | RSS MSE |
|---|---|---|
| | 25,450.88 | 19,152.43 |
| | 25,818.66 | 19,412.12 |
| | 378.959 | 290.2143 |
| | 960.973 | 204.0161 |
| | 286.786 | 259.7252 |
| | 807.0286 | 259.5786 |
| | 287.4439 | 259.7266 |
| | 264.5108 | 259.6234 |
Table 2 shows that the RSS estimators have lower mean square errors than the corresponding SRS estimators. The estimator is a product estimator, and its mean square errors are close to each other in both sampling designs as both variables have negative correlation in the real-life data.
Simulation study
The performance of the proposed estimators is compared with that of the existing estimators in a simulation study. The simulation is performed by generating random observations from a normal distribution. We generated an artificial population of size N = 5000 on the auxiliary variable from a normal distribution with mean 10 and standard deviation 2. Using the auxiliary variable, the study variable was generated by using the following linear equation
where is . After generating the artificial population, both sampling techniques (RSS and SRS) are used to draw two independent samples, and all forms of the proposed generalized estimators are computed at different sample sizes for comparison. The procedure is repeated 10,000 times and, using the 10,000 values of each estimator, the variance of each estimator is calculated. The results are given in Table 3 below. The percent relative efficiency of each estimator is calculated from the simulated variances of the estimator under the RSS and SRS procedures. The behavior of the simulated variances under RSS and SRS is shown in Fig. 1, which plots the relative efficiency at different sample sizes.
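The simulation loop described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' R program: the intercept, slope and noise level of the linear equation are hypothetical (chosen so that the correlation is roughly 0.9), the estimator used in both designs is the plain sample variance rather than any member of the proposed class, and the percent relative efficiency is assumed to be 100 times the simulated SRS variance divided by the simulated RSS variance. The number of replications is reduced from 10,000 to 2,000 to keep the run short.

```python
import random
import statistics

def make_population(size, rng):
    # x ~ N(10, 2) as in the text; 5, 1 and 0.97 are hypothetical values.
    population = []
    for _ in range(size):
        x = rng.gauss(10, 2)
        population.append((x, 5 + x + rng.gauss(0, 0.97)))
    return population

def rss_y_values(population, set_size, cycles, rng):
    # One ranked set sample: rank each set on x, keep y of one unit.
    values = []
    for _ in range(cycles):
        for rank in range(set_size):
            ranked = sorted(rng.sample(population, set_size),
                            key=lambda unit: unit[0])
            values.append(ranked[rank][1])
    return values

def percent_relative_efficiency(population, set_size, cycles, reps, rng):
    n = set_size * cycles  # same total sample size in both designs
    srs_estimates, rss_estimates = [], []
    for _ in range(reps):
        srs_y = [y for _, y in rng.sample(population, n)]
        srs_estimates.append(statistics.variance(srs_y))
        rss_y = rss_y_values(population, set_size, cycles, rng)
        rss_estimates.append(statistics.variance(rss_y))
    # Assumed definition: PRE = 100 * Var(SRS estimates) / Var(RSS estimates).
    return (100 * statistics.variance(srs_estimates)
            / statistics.variance(rss_estimates))

rng = random.Random(42)
population = make_population(5000, rng)
pre = percent_relative_efficiency(population, set_size=3, cycles=3,
                                  reps=2000, rng=rng)
print(round(pre, 1))
```

A value above 100 would indicate that the RSS version of the estimator has the smaller simulated variance; the figure obtained here depends on the assumed population and should not be compared directly with Table 3.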
Table 3.
Percent relative efficiencies (PREs) of estimators at rho = 0.90.
| Percentage relative efficiency | |||||
|---|---|---|---|---|---|
| SRS sample size | 9 | 12 | 16 | 20 | 25 |
| RSS sample size | m = 3, n = 3 | m = 3, n = 4 | m = 4, n = 4 | m = 4, n = 5 | m = 5, n = 5 |
| T1 | 2790.1020 | 2239.1620 | 2397.1550 | 2044.3430 | 2262.4710 |
| T2 | 0.0323 | 0.3480 | 0.3819 | 0.4154 | 0.4776 |
| T3 | 231.1386 | 260.9198 | 231.1386 | 192.9843 | 105.1537 |
| T4 | 277.3733 | 181.7453 | 135.8969 | 130.7218 | 123.0522 |
| T5 | 29.2569 | 28.1691 | 28.8956 | 27.8468 | 31.4923 |
| T6 | 283.0801 | 293.5363 | 394.2184 | 422.9133 | 488.3382 |
| T7 | 18.5723 | 5.2971 | 6.0491 | 5.2340 | 6.0502 |
| T8 | 8.0509 | 5.3677 | 6.9015 | 5.5186 | 5.9868 |
Significant values are in bold.
As shown in Table 3, T1 (the proposed RSS estimator) performed more than 2000 percent better than the conventional SRS ratio estimator of Isaki13 at all sample sizes. T3, T4 and T6 (RSS estimators) also performed 200 percent better than the SRS estimators. Moreover, T6 and T5 (RSS estimators) showed better performance as the sample size increased, whereas the performance of T3 and T4 (RSS estimators) decreased as the sample size increased, and T7 and T8 behaved in the same way. T2 is essentially a product estimator and its performance depends on negative correlation, which is why the RSS estimator is not a better choice than the SRS estimator where the correlation is negative. Uniquely, T6 performed better when the set size and the number of cycles were equal.
Figure 1.

Above are the PRE of estimators with respect to different sample sizes, whereas T1, T3, T4, T5, T6, T7 and T8 are presented in red, yellow, green, aqua blue, light blue, purple and pink colors, respectively.
From this simulation study, we can conclude that all RSS estimators have greater percent relative efficiency than the SRS estimators. The red line shows that T1 (ratio estimator) has a greater relative efficiency. The product estimator cannot be simulated as the and are highly negative correlated variables taken from normal population (Fig. 1). To evaluate the properties of the suggested estimators further, the simulation study is repeated at a lower correlation, with the results given in Table 4 below:
Table 4.
Percent relative efficiencies (PREs) of estimators at rho = 0.51.
| PREs | |||||
|---|---|---|---|---|---|
| Sample size (SRS & RSS) | 9 (n = 3, m = 3) | 12 (n = 3, m = 4) | 16 (n = 4, m = 4) | 20 (n = 4, m = 5) | 25 (n = 5, m = 5) |
| T1 | 1.04432 | 1.47021 | 1.5327 | 1.5351 | 1.8300 |
| T2 | 0.61561 | 0.64721 | 0.7642 | 0.7336 | 0.7963 |
| T3 | 349.984 | 335.604 | 363.683 | 528.781 | 489.1021 |
| T4 | 1668.446 | 3682.987 | 1909.009 | 1543.30 | 862.6497 |
| T5 | 40,140 | 38,431 | 31,178 | 39,007 | 37,689 |
| T6 | 115.3035 | 110.164 | 107.438 | 126.624 | 134.3908 |
| T7 | 1347.62 | 1097.36 | 1364.610 | 1388.869 | 1341.826 |
| T8 | 0.3898 | 1.51498 | 7.30963 | 2.105681 | 3.09583 |
Significant values are in bold.
As shown in Table 4, T1 (the proposed RSS estimator) performed better than the conventional SRS ratio estimator. T2 is a product estimator and its performance depends on negative correlation, which is why the RSS estimator is not better than the SRS estimator when the correlation is negative. Overall, most of the RSS estimators outperformed the corresponding SRS estimators. In particular, T5 is the best estimator as it has the highest relative efficiency.
Discussion
In this study, a ratio estimator and a generalized class of estimators for the population variance are suggested under the RSS design utilizing one auxiliary variable. The mean square errors of the estimators of the population variance under RSS are obtained. We have considered a real population as well as simulated data to compare the proposed estimators with the SRS design. It can be clearly observed from Table 2 that the mean square errors under the RSS design are smaller than those under the simple random design in the real-life population, and the percent relative efficiencies of the estimators shown in Tables 3 and 4 are greater than those of the SRS design. The PREs of the estimators are calculated from simulated data using different sample sizes. The estimator , which is a ratio estimator, provides higher percent relative efficiencies for all sample sizes, as can be seen in Tables 3 and 4. The estimator is the second-best estimator with respect to percent relative efficiency, as can be seen from Fig. 1. The ratio estimator provides higher percent relative efficiency than the other estimators because, in the generalized class of estimators, when the constants take negative values the behavior of the ratio estimator changes to that of the product estimator. This also affects the efficiency of the estimator when the population is highly positively correlated. Overall, the results show that RSS design estimators are more efficient in small-sample studies.
Conclusion
The main purpose of this study is to propose a generalized class of estimators for the population variance in RSS utilizing one auxiliary variable and to compare its efficiency with that of the corresponding estimators in the SRS design. We have found that our RSS estimator is, in practice, the best estimator in situations where the study variable is costly, destructive or hard to measure. Using the estimators proposed in this study, greater efficiency can be achieved in small-sample studies in the biological sciences, medical experimental research, environmental sciences and engineering.
Author contributions
R.A.: Conception, acquisition of data, data analysis, writing (original draft), R-language programming. M.H.: Conception and design, research methodology, supervision. S.H.S.: Data analysis and interpretation, result compilation. M.Q.S.: Review and editing of draft, supervision of R-language programming, validation.
Data availability
The authors confirm that the data supporting the findings of this study are available within the article and that the programming files will be available on request. Additional information or queries related to this paper may be requested from the corresponding author: Rabail Alam (rabail.alam@yahoo.com, raabail.alam@imbb.uol.edu.pk).
Competing interests
The authors declare no competing interests.
Footnotes
Publisher's note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
- 1. McIntyre GA. A method for unbiased selective sampling, using ranked sets. Aust. J. Agric. Res. 1952;3:385–390. doi: 10.1071/AR9520385.
- 2. Stokes, S. L. Estimation of variance using judgment ordered ranked set samples. Biometrics, 35–42 (1980).
- 3. MacEachern SN, Öztürk Ö, Wolfe DA, Stark GV. A new ranked set sample estimator of variance. J. Roy. Stat. Soc. Ser. B (Stat. Methodol.) 2002;64:177–188. doi: 10.1111/1467-9868.00331.
- 4. Zamanzade E, Mahdizadeh M. Using ranked set sampling with extreme ranks in estimating the population proportion. Stat. Methods Med. Res. 2020;29:165–177. doi: 10.1177/0962280218823793.
- 5. Tillé Y. Sampling and Estimation from Finite Populations. Wiley; 2020.
- 6. Long C, Chen W, Yang R, Yao D. Ratio estimation of the population mean using auxiliary information under the optimal sampling design. Probab. Eng. Inf. Sci. 2022;36(2):449–460. doi: 10.1017/S0269964820000625.
- 7. Latpate, R., Kshirsagar, J., Gupta, V. K., & Chandra, G. Balanced and unbalanced ranked set sampling. In Advanced Sampling Methods 257–274. Springer (2021).
- 8. Mahdizadeh M, Zamanzade E. Reliability estimation in multistage ranked set sampling. REVSTAT Stat. J. 2017;15(4):565–581.
- 9. Mahdizadeh M, Arghami NR. Quantile estimation using ranked set samples from a population with known mean. Commun. Stat. Simul. Comput. 2012;41(10):1872–1881. doi: 10.1080/03610918.2011.624236.
- 10. Al-hadhrami SA. Estimation of the population variance using ranked set sampling with auxiliary variable. Int. J. Contemp. Math. Sci. 2010;52:2567–2576.
- 11. Das AK, Tripathi TP. Use of auxiliary information in estimating the finite population variance. Sankhya C. 1978;40:139–148.
- 12. Singh HP, Solanki RS. A new procedure for variance estimation in simple random sampling using auxiliary information. Stat. Pap. 2013;54:479–497. doi: 10.1007/s00362-012-0445-2.
- 13. Isaki CT. Variance estimation using auxiliary information. J. Am. Stat. Assoc. 1983;78:117–123. doi: 10.1080/01621459.1983.10477939.
