Scientific Reports. 2022 Dec 8;12:21203. doi: 10.1038/s41598-022-24296-1

Estimation of population variance under ranked set sampling method by using the ratio of supplementary information with study variable

Rabail Alam 1,3, Muhammad Hanif 1, Saman Hanif Shahbaz 2, Muhammad Qaiser Shahbaz 2
PMCID: PMC9732350  PMID: 36481847

Abstract

In biological and medical research, the cost of collecting and measuring a sample, and the collateral damage it causes, often force a compromise on inference with a fixed and accepted approximation error. Ranked set sampling (RSS) performs better in such scenarios, and the use of auxiliary information further enhances the performance of the estimators. In this study, two generalized classes of estimators are proposed to estimate the population variance using RSS and information on an auxiliary variable. The bias and mean square errors of the proposed classes of estimators are derived up to the first order of approximation. Some special cases of one of the proposed classes of estimators are also considered in the presence of available population parameters. A simulation study was conducted to assess the performance of the members of the proposed family at various sample sizes. A real-life data application estimates the variance of the gestational age of fetuses with supplementary information. The results showed that the RSS design is a more accurate method than simple random sampling for determining the population variance of hard-to-measure or destructive sampling units.

Subject terms: Mathematics and computing, Applied mathematics, Statistics

Introduction

In many scientific fields, such as medicine, agriculture and environmental studies, various sampling methods are used to collect data for inference. During research studies, environmental and biological constraints such as sample size, cost per sample, and destructible sample units of the study variable disturb the data collection procedure. These constraints strongly affect the statistical analysis and inference of the study. Ranked set sampling (RSS), however, can perform better in such scenarios. McIntyre introduced the RSS technique and applied it to the estimation of average pasture yield in order to reduce the sampling cost [1]. Later, Stokes suggested a classical estimator for the population variance in RSS that allows for ranking error [2]. An unbiased estimator of the population variance under ranked set sampling was subsequently developed and shown to outperform the Stokes estimator, even in small samples [3]. Another study evaluated the estimation of a population proportion under RSS and its variations [4]. The efficiency of an estimator increases when supplementary information is used alongside the study variable, because the study variable and the auxiliary variables are correlated [5].

Extensive work is available in the literature on ratio estimation of the population mean using RSS. In one study, ratio estimators were developed and compared under two different designs (simple RSS and extreme RSS) [6]. The RSS scheme has received considerable attention from researchers, and a recent study addresses balanced and unbalanced RSS [7]. Comprehensive work on nonparametric RSS is also available [8,9].

Detailed work is available in the literature on estimation of the population variance under simple random sampling (SRS), but there is a gap regarding estimators of the population variance under RSS. This study is a modest effort to address that deficiency. We propose a class of generalized estimators for the population variance under RSS. The mean square error and bias of the proposed class of estimators are derived up to the first degree of approximation. Several members of the proposed class are developed depending on the type of supplementary information available, such as the mean, median, tri-mean, coefficient of variation, coefficient of correlation, coefficient of skewness, kurtosis and quartile deviation. The mean square errors under both sampling designs (SRS and RSS) are compared on real-life data to evaluate the performance of these member estimators. Moreover, the relative efficiency of these estimators for estimating the population variance is calculated in a simulation study based on an artificial population and various sample sizes.

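To estimate the population variance, consider a population of size $N$ labelled $E = \{E_1, E_2, E_3, \ldots, E_N\}$. A sample of size $j = mn$ is drawn from $E(Y, X)$, which has a bivariate normal distribution. The sampling process consists of $n$ random samples (sets), each of size $n$, drawn from the population; the elements of each set are ordered on the basis of the auxiliary variable. The smallest observation is measured from the first set, the second smallest from the second set, and so on until the largest observation has been measured from the $n$th set. This entire cycle is repeated $m$ times, so that $x_{ri}$ denotes the unit of rank $r$ drawn from the $r$th set in the $i$th cycle. Let $y_{ri}$ and $x_{ri}$ be the values of the study variable $Y$ and the auxiliary variable $X$, respectively, where $i = 1, \ldots, m$ indexes the cycle and $r = 1, \ldots, n$ indexes the rank, based on the auxiliary variable, within the sets. Both variables are sampled by the RSS methodology, with the study variable ranked on the basis of the auxiliary variable.

The overall averages and variances of the ranked set sample are

$$\hat{\mu}_x = \frac{1}{mn}\sum_{i=1}^{m}\sum_{r=1}^{n} x_{ri}, \qquad \hat{\mu}_y = \frac{1}{mn}\sum_{i=1}^{m}\sum_{r=1}^{n} y_{ri},$$

$$\sigma_x^2 = \frac{1}{mn-1}\sum_{i=1}^{m}\sum_{r=1}^{n}\left(x_{ri}-\hat{\mu}_x\right)^2, \qquad \sigma_y^2 = \frac{1}{mn-1}\sum_{i=1}^{m}\sum_{r=1}^{n}\left(y_{ri}-\hat{\mu}_y\right)^2.$$

The ordered means and variances are

$$\mu_{ry} = \frac{1}{N}\sum_{r=1}^{n} y_r, \qquad \mu_{rx} = \frac{1}{n}\sum_{r=1}^{n} x_r, \qquad E\left(y_{ri}-\mu_{ry}\right)^2 = \sigma_{ry}^2, \qquad E\left(x_{ri}-\mu_{rx}\right)^2 = \sigma_{rx}^2.$$

The other ordered measures used in this article are

$$E\left[\left(y_{ri}-\mu_y\right)^2\left(x_{ri}-\mu_x\right)^2\right] = \sigma_{xy}^2, \qquad \sum_{r=1}^{n}\tau_{xr}^2 = \sum_{r=1}^{n}\left(\mu_{xr}-\mu_x\right)^2, \qquad \sum_{r=1}^{n}\tau_{yr}^2 = \sum_{r=1}^{n}\left(\mu_{yr}-\mu_y\right)^2,$$

$$\sum_{r=1}^{n}\tau_{yr}^2\tau_{xr}^2 = \sum_{r=1}^{n}\left(\mu_{yr}-\mu_y\right)^2\left(\mu_{xr}-\mu_x\right)^2.$$

Suppose

$$e_{s_y^2} = \frac{s_y^2 - S_y^2}{S_y^2}, \qquad e_{s_x^2} = \frac{s_x^2 - S_x^2}{S_x^2},$$

so that $s_y^2 = \left(1+e_{s_y^2}\right)S_y^2$ and $s_x^2 = \left(1+e_{s_x^2}\right)S_x^2$. The expectations of the squared error terms are

$$E\left(e_{s_y^2}^2\right) = (a+h)\sum_{r=1}^{n}\sigma_{yr}^4 + b\sum_{r=1}^{n}\tau_{yr}^2\sigma_{yr}^2 + c\sum_{r=1}^{n}\tau_{yr}\mu_{3yr} + d\sum_{r<s}\sigma_{yr}^2\sigma_{ys}^2 - \sigma_y^4 = V_y^2,$$

$$E\left(e_{s_x^2}^2\right) = (a+h)\sum_{r=1}^{n}\sigma_{xr}^4 + b\sum_{r=1}^{n}\tau_{xr}^2\sigma_{xr}^2 + c\sum_{r=1}^{n}\tau_{xr}\mu_{3xr} + d\sum_{r<s}\sigma_{xr}^2\sigma_{xs}^2 - \sigma_x^4 = U_x^2,$$

$$E\left(e_{s_y^2}e_{s_x^2}\right) = a\sum_{r=1}^{n}\sigma_{xy}^2 + a\sigma_x^2\sum_{r=1}^{n}\tau_{yr}^2 + 4a\sigma_{xy}\sum_{r=1}^{n}\tau_{xr}\tau_{yr} - a\sigma_y^2\sum_{r=1}^{n}\tau_{xr}^2 + a\sum_{r=1}^{n}\tau_{xr}^2\tau_{yr}^2 + b\sum_{r=1}^{n}\tau_{yr}^2\tau_{xr}^2\sigma_{xy(r)} - \sigma_x^2\sigma_y^2 = U_xV_y,$$

where

$$a = \frac{m}{(mn)^2}, \qquad b = \frac{4m}{(mn-1)^2}, \qquad c = \frac{4}{n(mn-1)}, \qquad h = \frac{m(n-1)}{mn-1}, \qquad d = \frac{2mn\left(m^2n^2-2mn+3\right)}{(mn-1)^2(mn)^2}.$$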
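The selection procedure described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function name `ranked_set_sample`, the toy population and the seed are assumptions introduced here, and each set is drawn independently of the others, so a unit may in principle appear in more than one set.

```python
import random

def ranked_set_sample(population, n, m, rng):
    """Draw a ranked set sample of size m*n from a list of (x, y) pairs.

    n is the set size and m the number of cycles; ranking uses x only.
    """
    sample = []
    for _cycle in range(m):
        for r in range(n):
            # Draw a fresh set of n units and order it by the auxiliary x.
            ranked = sorted(rng.sample(population, n), key=lambda u: u[0])
            # Keep only the unit holding rank r + 1 in this set.
            sample.append(ranked[r])
    return sample

rng = random.Random(1)
# Hypothetical population: y depends linearly on x plus noise.
population = []
for _ in range(400):
    x = rng.gauss(10, 2)
    population.append((x, 5 + 1.87 * x + rng.gauss(0, 1)))

rss = ranked_set_sample(population, n=4, m=3, rng=rng)
print(len(rss))  # 12 measured units
```

Only the $mn$ retained units need to be measured on $Y$; the remaining units in each set are used for ranking on $X$ alone, which is where the cost saving comes from.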

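Stokes considered errors in judgment ranking and suggested an estimator for $\sigma^2$ that is asymptotically unbiased and more efficient than the usual SRS unbiased estimator of $\sigma^2$ [2]:

$$t_o^2 = \frac{1}{mn-1}\sum_{i=1}^{m}\sum_{r=1}^{n}\left(X_{ri}-\hat{\mu}\right)^2 \qquad (1)$$

where $\hat{\mu} = \frac{1}{mn}\sum_{i}\sum_{r} X_{ri}$.

The variance of $t_o^2$ was obtained by Stokes [2] as

$$\mathrm{var}\left(t_o^2\right) = \frac{m}{(mn-1)^2}\left[\left(\frac{mn-1}{mn}\right)^2\sum_{r}\mu_{4r} + 4\sum_{r}\tau_r^2\sigma_r^2 + 4\left(\frac{mn-1}{mn}\right)\sum_{r}\tau_r\mu_{3r} + \frac{4m}{(mn)^2}\sum_{r<s}\sigma_r^2\sigma_s^2 + \frac{2(m-1)-(mn-1)^2}{(mn)^2}\sum_{r}\sigma_r^4\right] \qquad (2)$$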
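As an illustrative check of Eqs. (1) and (2), the sketch below codes both expressions and confirms that Eq. (2) collapses to the familiar SRS result when the set size is $n = 1$ (no ranking). The function names and the per-rank input lists are assumptions made for this example, and the coefficient of the cross-rank term is read as $4m/(mn)^2$.

```python
def stokes_estimator(values):
    """Eq. (1): squared deviations from the overall mean, divided by mn - 1."""
    k = len(values)
    mean = sum(values) / k
    return sum((v - mean) ** 2 for v in values) / (k - 1)

def stokes_variance(m, n, mu4, mu3, tau, sigma2):
    """Eq. (2), given per-rank moments as lists of length n.

    mu4, mu3: fourth and third central moments of each rank distribution
    tau:      rank mean minus overall mean
    sigma2:   variance of each rank distribution
    """
    N = m * n
    cross = sum(sigma2[r] * sigma2[s] for r in range(n) for s in range(r + 1, n))
    inner = (
        ((N - 1) / N) ** 2 * sum(mu4)
        + 4 * sum(t ** 2 * s2 for t, s2 in zip(tau, sigma2))
        + 4 * ((N - 1) / N) * sum(t * m3 for t, m3 in zip(tau, mu3))
        + 4 * m / N ** 2 * cross
        + (2 * (m - 1) - (N - 1) ** 2) / N ** 2 * sum(s2 ** 2 for s2 in sigma2)
    )
    return m / (N - 1) ** 2 * inner

# With n = 1 and a standard normal population (mu4 = 3, mu3 = 0, tau = 0),
# the variance of the sample variance from m = 10 units is 2 / (m - 1).
v = stokes_variance(m=10, n=1, mu4=[3.0], mu3=[0.0], tau=[0.0], sigma2=[1.0])
print(abs(v - 2 / 9) < 1e-12)  # True
```

The reduction to $2\sigma^4/(m-1)$ for normal data is a standard SRS result, so agreement here is a useful sanity check on the bracketed coefficients.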

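Al-Hadhrami proposed a ratio estimator for the population variance based on RSS [10]:

$$t_3 = s_{\mathrm{RSS}y}^2\,\frac{S_{\mathrm{RSS}x}^2}{s_{\mathrm{RSS}x}^2} \qquad (3)$$

where $s_{\mathrm{RSS}x}^2 = t_o^2$, the estimator of Eq. (1). The MSE and bias of this estimator are

$$\mathrm{MSE}(t_3) = \mathrm{var}\left(s_y^2\right) + \left(\frac{s_y^2}{s_x^2}\right)^2\mathrm{var}\left(s_x^2\right) - 2\,\frac{s_y^2}{s_x^2}\,\mathrm{cov}\left(s_x^2, s_y^2\right) \qquad (4)$$

$$\mathrm{Bias}(t_3) = \frac{s_y^2}{\left(s_x^2\right)^2}\,\mathrm{var}\left(s_x^2\right) - \frac{1}{s_x^2}\,\mathrm{cov}\left(s_x^2, s_y^2\right). \qquad (5)$$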

Materials and method

In this study, it is assumed that the study variable ($Y$) and the auxiliary variable ($X$) follow a bivariate normal distribution with high positive correlation, say $\rho \ge 0.70$. Ranking is done on the basis of the auxiliary variable because it is easily and cheaply available. Both $x$ and $y$ are sampled by the RSS method [1]. Here $T$ denotes an estimator of the population variance $S_y^2$. The R language was used to conduct the simulation study of all forms of the estimators and to compute their relative efficiency.

Classical generalized ratio estimator

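Motivated by the members of the class of estimators in [11], we have developed a generalized ratio estimator for the finite population variance under the RSS scheme:

$$T_1 = s_y^2\left(\frac{S_x^2}{s_x^2}\right)^{\alpha} \qquad (6)$$

where $\alpha$ can take the values $+1$ or $-1$. If $\alpha = 1$, $T_1$ is the ratio estimator of the population variance; if $\alpha = -1$, it becomes the product estimator $T_2$; and when $\alpha = 0$ it equals the sample variance. After simplification and taking expectations, we have the following expressions for the bias and MSE of the proposed class of estimators $T_1$:

$$E\left(T_1 - S_y^2\right) = S_y^2\left[1 + E\left(e_{s_y^2}\right) - \alpha E\left(e_{s_x^2}\right) + \frac{\alpha(\alpha-1)}{2}E\left(e_{s_x^2}^2\right) - \alpha E\left(e_{s_x^2}e_{s_y^2}\right) - 1\right]. \qquad (7)$$

The bias is

$$E\left(T_1 - S_y^2\right) = S_y^2\left[\frac{\alpha(\alpha-1)}{2}U_x^2 - \alpha U_xV_y\right]. \qquad (8)$$

Squaring the error expansion in Eq. (7) and taking expectations, the mean square error is:

$$\mathrm{MSE}\left(T_1\right) = S_y^4\left[V_y^2 + \alpha U_x^2 - 2\alpha V_yU_x\right]. \qquad (9)$$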
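To make the role of $\alpha$ in Eq. (6) concrete, the short sketch below evaluates $T_1$ for the three cases discussed above. The function name and the numerical inputs are hypothetical and chosen only for illustration.

```python
def generalized_ratio_estimator(s2_y, s2_x, S2_x, alpha):
    """Eq. (6): T1 = s_y^2 * (S_x^2 / s_x^2) ** alpha."""
    return s2_y * (S2_x / s2_x) ** alpha

# Hypothetical inputs: sample variances of y and x, known population variance of x.
s2_y, s2_x, S2_x = 16.0, 0.8, 1.0

ratio = generalized_ratio_estimator(s2_y, s2_x, S2_x, alpha=1)     # s_y^2 * S_x^2 / s_x^2
product = generalized_ratio_estimator(s2_y, s2_x, S2_x, alpha=-1)  # s_y^2 * s_x^2 / S_x^2
plain = generalized_ratio_estimator(s2_y, s2_x, S2_x, alpha=0)     # sample variance itself
print(ratio, product, plain)
```

When the sample underestimates the variance of $X$ (here $s_x^2 < S_x^2$), the ratio form scales $s_y^2$ upward and the product form scales it downward, which is why the two suit positive and negative correlation respectively.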

Generalized class of estimators with auxiliary information

Motivated by Singh [12], we have proposed another generalized class of ratio estimators to estimate the finite population variance by utilizing single auxiliary information under the RSS technique. The proposed estimator is:

$$T = \kappa_1 s_y^2\left[\frac{cS_x^2 - ds_x^2}{(c-d)S_x^2}\right]^{\lambda} + \kappa_2 s_y^2\left[\frac{(a+b)S_x^2}{aS_x^2 + bs_x^2}\right]^{\delta}, \qquad (10)$$

where $\kappa_1, \kappa_2$ and $\lambda, \delta$ are constants that take finite values, and $a, b, c, d$ are functions of known population parameters of the auxiliary variable $X$, such as $\bar{X}$, $C_x$, $\beta_{1x}$, $\beta_{2x}$ and $\rho_{xy}$. When the values of $\kappa_1, \kappa_2, a, b, c, d, \lambda, \delta$ are suitably chosen, several existing estimators can be obtained from the proposed generalized class of estimators $T$. In addition to existing estimators, some new estimators $T_i$, $i = 3, 4, 5, 6, 7, 8$, are generated from the proposed class; these are given in Table 1.

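Table 1. Some members of class of estimators.

| Estimator | $\lambda$ | $\delta$ | $c$ | $d$ | $a$ | $b$ |
|---|---|---|---|---|---|---|
| $T_3 = \kappa_1 s_y^2\left[\frac{S_x^2-\beta_{1x}s_x^2}{(1-\beta_{1x})S_x^2}\right] + \kappa_2 s_y^2\left[\frac{(1+T_m)S_x^2}{S_x^2+T_m s_x^2}\right]$ | 1 | 1 | 1 | $\beta_{1x}$ | 1 | $T_m$ |
| $T_4 = \kappa_1 s_y^2\left[\frac{S_x^2-\beta_{2x}s_x^2}{(1-\beta_{2x})S_x^2}\right] + \kappa_2 s_y^2\left[\frac{(1+C_x)S_x^2}{S_x^2+C_x s_x^2}\right]$ | 1 | 1 | 1 | $\beta_{2x}$ | 1 | $C_x$ |
| $T_5 = \kappa_1 s_y^2\left[\frac{\rho_{xy}S_x^2-\beta_{1x}s_x^2}{(\rho_{xy}-\beta_{1x})S_x^2}\right] + \kappa_2 s_y^2\left[\frac{(\bar{X}+\tilde{X})S_x^2}{\bar{X}S_x^2+\tilde{X}s_x^2}\right]$ | 1 | 1 | $\rho_{xy}$ | $\beta_{1x}$ | $\bar{X}$ | $\tilde{X}$ |
| $T_6 = \kappa_1 s_y^2\left[\frac{\rho_{xy}S_x^2-\beta_{2x}s_x^2}{(\rho_{xy}-\beta_{2x})S_x^2}\right] + \kappa_2 s_y^2\left[\frac{(\bar{X}+Q_d)S_x^2}{\bar{X}S_x^2+Q_d s_x^2}\right]$ | 1 | 1 | $\rho_{xy}$ | $\beta_{2x}$ | $\bar{X}$ | $Q_d$ |
| $T_7 = \kappa_1 s_y^2\left[\frac{\rho_{xy}S_x^2-\beta_{1x}s_x^2}{(\rho_{xy}-\beta_{1x})S_x^2}\right] + \kappa_2 s_y^2\left[\frac{(T_m+\tilde{X})S_x^2}{T_mS_x^2+\tilde{X}s_x^2}\right]$ | 1 | 1 | $\rho_{xy}$ | $\beta_{1x}$ | $T_m$ | $\tilde{X}$ |
| $T_8 = \kappa_1 s_y^2\left[\frac{\rho_{xy}S_x^2-\beta_{2x}s_x^2}{(\rho_{xy}-\beta_{2x})S_x^2}\right] + \kappa_2 s_y^2\left[\frac{(T_m+Q_d)S_x^2}{T_mS_x^2+Q_d s_x^2}\right]$ | 1 | 1 | $\rho_{xy}$ | $\beta_{2x}$ | $T_m$ | $Q_d$ |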
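The members in Table 1 are all instances of Eq. (10) with particular constants, which the following sketch makes explicit. The function name and the values of $\kappa_1$, $\kappa_2$, $s_y^2$ and $s_x^2$ are hypothetical; $\beta_{1x}$, $T_m$ and $S_x^2$ are taken from the population measures reported in the Applications section. The assignment $c = 1$, $d = \beta_{1x}$, $a = 1$, $b = T_m$ follows the printed formula for $T_3$.

```python
def generalized_class_estimator(s2_y, s2_x, S2_x, k1, k2, lam, delta, a, b, c, d):
    """Eq. (10): weighted sum of two ratio-type adjustments of s_y^2."""
    first = ((c * S2_x - d * s2_x) / ((c - d) * S2_x)) ** lam
    second = ((a + b) * S2_x / (a * S2_x + b * s2_x)) ** delta
    return k1 * s2_y * first + k2 * s2_y * second

# Member T3: lambda = delta = 1, c = 1, d = beta_1x, a = 1, b = T_m.
beta_1x, T_m, S2_x = -0.223, 6.175, 0.831   # values reported for the real data
k1, k2 = 0.4, 0.6                           # hypothetical weights
t3 = generalized_class_estimator(
    s2_y=16.0, s2_x=0.70, S2_x=S2_x,        # hypothetical sample variances
    k1=k1, k2=k2, lam=1, delta=1, a=1, b=T_m, c=1, d=beta_1x,
)
print(round(t3, 4))
```

If the sample variance of $X$ happens to equal $S_x^2$, both bracketed factors equal one and the estimator reduces to $(\kappa_1 + \kappa_2)s_y^2$, a quick way to test an implementation.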

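Using the error-term notation in Eq. (10), we get

$$T = \kappa_1 S_y^2\left(1+e_{s_y^2}\right)\left(1-\psi_2 e_{s_x^2}\right)^{\lambda} + \kappa_2 S_y^2\left(1+e_{s_y^2}\right)\left(1+\psi_1 e_{s_x^2}\right)^{-\delta} \qquad (11)$$

where $\psi_2 = \frac{d}{c-d}$ and $\psi_1 = \frac{b}{a+b}$. Taking expectations and simplifying, we have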
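The step from Eq. (10) to Eq. (11) can be spelled out as a short worked substitution. It uses only the definitions $s_x^2 = (1+e_{s_x^2})S_x^2$, $\psi_2 = d/(c-d)$ and $\psi_1 = b/(a+b)$ given in the text; the shorthand $e$ for $e_{s_x^2}$ is introduced here for brevity.

```latex
\begin{align*}
\frac{cS_x^2 - d\,s_x^2}{(c-d)S_x^2}
  &= \frac{cS_x^2 - d(1+e)S_x^2}{(c-d)S_x^2}
   = \frac{(c-d) - d\,e}{c-d}
   = 1 - \psi_2 e, \\[4pt]
\frac{(a+b)S_x^2}{aS_x^2 + b\,s_x^2}
  &= \frac{(a+b)S_x^2}{aS_x^2 + b(1+e)S_x^2}
   = \frac{a+b}{(a+b) + b\,e}
   = \left(1 + \psi_1 e\right)^{-1}.
\end{align*}
```

Both identities are exact, so the only approximation in the later bias and MSE expressions comes from truncating the binomial expansions of $(1-\psi_2 e)^{\lambda}$ and $(1+\psi_1 e)^{-\delta}$.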

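$$E\left(T - S_y^2\right) = S_y^2\left[\kappa_1\left\{1 + E\left(e_{s_y^2}\right) - \psi_2\lambda E\left(e_{s_x^2}\right) + \frac{\lambda(\lambda-1)}{2}\psi_2^2E\left(e_{s_x^2}^2\right) - \psi_2\lambda E\left(e_{s_x^2}e_{s_y^2}\right)\right\} + \kappa_2\left\{1 + E\left(e_{s_y^2}\right) - \psi_1\delta E\left(e_{s_x^2}\right) + \frac{\delta(\delta+1)}{2}\psi_1^2E\left(e_{s_x^2}^2\right) - \psi_1\delta E\left(e_{s_x^2}e_{s_y^2}\right)\right\} - 1\right] \qquad (12)$$

Using the expectations of the error terms given in the Introduction, the bias is

$$B(T) = S_y^2\left[\kappa_1\left\{1 + \frac{\lambda(\lambda-1)}{2}\psi_2^2U_x^2 - \psi_2\lambda U_xV_y\right\} + \kappa_2\left\{1 + \frac{\delta(\delta+1)}{2}\psi_1^2U_x^2 - \psi_1\delta U_xV_y\right\} - 1\right]. \qquad (13)$$

Squaring Eq. (12), taking expectations and ignoring higher-order terms gives the following expression for the MSE:

$$E\left(T - S_y^2\right)^2 = S_y^4 + S_y^4\Big[\kappa_1^2\left\{E\left(e_{s_y^2}^2\right) - \psi_2^2\lambda^2E\left(e_{s_x^2}^2\right) - \psi_2\lambda E\left(e_{s_x^2}e_{s_y^2}\right)\right\} + \kappa_2^2\left\{E\left(e_{s_y^2}^2\right) - \psi_1^2\delta^2E\left(e_{s_x^2}^2\right) - \psi_1\delta E\left(e_{s_x^2}e_{s_y^2}\right)\right\} + 2\kappa_1\kappa_2\left\{E\left(e_{s_y^2}^2\right) - \varphi\,e_{s_x^2} - 2\varphi E\left(e_{s_x^2}e_{s_y^2}\right) + \psi_1\psi_2\lambda\delta E\left(e_{s_x^2}^2\right)\right\} - 2\kappa_1\left\{E\left(e_{s_y^2}\right) - \psi_2^2\lambda^2E\left(e_{s_x^2}\right) - \psi_2\lambda E\left(e_{s_x^2}e_{s_y^2}\right)\right\} - 2\kappa_2\left\{E\left(e_{s_y^2}\right) - \psi_1^2\delta^2E\left(e_{s_x^2}\right) - \psi_1\delta E\left(e_{s_x^2}e_{s_y^2}\right)\right\}\Big] \qquad (14)$$

where $\varphi = \psi_2\lambda - \psi_1\delta$. Using the notation given in the Introduction, the expression for the mean square error is

$$\mathrm{MSE}(T) = S_y^4\left[1 + \kappa_1^2A + \kappa_2^2B + 2\kappa_1\kappa_2C - 2\left(\kappa_1D + \kappa_2E\right)\right], \qquad (15)$$

where

$$A = V_y^2 + \lambda^2\psi_2^2U_x^2 - 2\psi_2\lambda U_xV_y, \qquad B = V_y^2 + \delta^2\psi_1^2U_x^2 - 2\psi_1\delta U_xV_y, \qquad C = V_y^2 - 2\varphi U_xV_y + \psi_1\psi_2\lambda\delta U_x^2,$$

$$D = \psi_2\lambda U_xV_y, \qquad E = \psi_1\delta U_xV_y.$$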

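Differentiating Eq. (15) with respect to $\kappa_1$ and $\kappa_2$ and equating to zero, the optimum values of $\kappa_1$ and $\kappa_2$ are, respectively, obtained as

$$\kappa_1^{\mathrm{opt}} = \frac{BD - CE}{AB - C^2} \qquad (16)$$

and

$$\kappa_2^{\mathrm{opt}} = \frac{AE - CD}{AB - C^2}. \qquad (17)$$

Using the above optimum values, the minimum MSE of the generalized class of estimators $T$ is

$$\mathrm{MSE}_{\min}(T) = S_y^4\left[1 + \left(\kappa_1^{\mathrm{opt}}\right)^2A + \left(\kappa_2^{\mathrm{opt}}\right)^2B + 2\kappa_1^{\mathrm{opt}}\kappa_2^{\mathrm{opt}}C - 2\left(\kappa_1^{\mathrm{opt}}D + \kappa_2^{\mathrm{opt}}E\right)\right]. \qquad (18)$$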
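The optimisation in Eqs. (16) and (17) can be checked numerically with a small sketch. It reads the last term of Eq. (15) as $-2(\kappa_1D + \kappa_2E)$, which is the reading under which Eqs. (16) and (17) are the stationary point of the quadratic; the function names and the values of $A$ to $E$ are hypothetical, chosen so that $A > 0$ and $AB - C^2 > 0$ and the stationary point is a minimum.

```python
def optimum_kappas(A, B, C, D, E):
    """Eqs. (16)-(17): stationary point of the quadratic form in Eq. (15)."""
    det = A * B - C ** 2
    k1 = (B * D - C * E) / det
    k2 = (A * E - C * D) / det
    return k1, k2

def mse_over_sy4(k1, k2, A, B, C, D, E):
    """Bracketed factor of Eq. (15), i.e. MSE(T) / S_y^4."""
    return 1 + k1 ** 2 * A + k2 ** 2 * B + 2 * k1 * k2 * C - 2 * (k1 * D + k2 * E)

# Hypothetical coefficient values, for illustration only.
A, B, C, D, E = 2.0, 1.5, 0.5, 0.8, 0.6
k1, k2 = optimum_kappas(A, B, C, D, E)
best = mse_over_sy4(k1, k2, A, B, C, D, E)

# Any nearby pair of weights should give a larger value of the bracket.
for dk1, dk2 in [(0.05, 0.0), (-0.05, 0.0), (0.0, 0.05), (0.0, -0.05), (0.03, -0.04)]:
    assert mse_over_sy4(k1 + dk1, k2 + dk2, A, B, C, D, E) > best

print(round(k1, 4), round(k2, 4))
```

In practice $A$ to $E$ depend on $V_y$, $U_x$ and the chosen constants, so the optimum weights differ from one member of the class to another.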

Many estimators can be formed from the above generalized class of ratio estimators, depending on the availability of population parameters of the supplementary information. Some members of this class are given in Table 1 above.

Results

In this section, real-life data are used in an empirical study to obtain the mean square errors and illustrate the advantage of RSS estimators over simple random sampling (SRS). The simulation study is then presented, with the percent relative efficiencies of the various estimators.

Applications

RSS has an advantage in biostatistics because it provides greater efficiency in small samples when the variable of interest is difficult to obtain, destructive or costly to measure. A study on the “Assessment of gestational age and weight” was carried out by a student in 2014–2015, in which the accuracy of gestational age was assessed by ultrasound. A total of 400 ultrasounds were performed on pregnant women. From this study we have taken two highly correlated variables, X = femur length of the fetus and Y = gestational age. We then calculated the mean square errors of the estimators under the RSS and SRS procedures, shown in Table 2. From the population of size 400, we drew one sample of size 12 by RSS, with set size (m = 3) and number of cycles (n = 4), and another sample of size 12 by the SRS method. The measures obtained from the population are $\mu_x = 6.1488$, $\mu_y = 31.990$, $\sigma_x^2 = 0.831$, $\sigma_y^2 = 17.025$, $\beta_{1x} = -0.223$, $\beta_{2x} = -0.649$, $\rho_{xy} = 0.997$, $T_{mx} = 6.175$, $Q_{dx} = 0.75$, $M_{ex} = 6.200$, $\sigma_{ry}^2 = 6.9146$, $\sigma_{rx}^2 = 0.1985$.

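Table 2. Mean square error of different estimators.

| Estimators | SRS MSE | RSS MSE |
|---|---|---|
| $T_1 = s_y^2\,\frac{S_x^2}{s_x^2}$ | 25,450.88 | 19,152.43 |
| $T_2 = s_y^2\,\frac{s_x^2}{S_x^2}$ | 25,818.66 | 19,412.12 |
| $T_3 = \kappa_1 s_y^2\left[\frac{S_x^2-\beta_{1x}s_x^2}{(1-\beta_{1x})S_x^2}\right] + \kappa_2 s_y^2\left[\frac{(1+T_m)S_x^2}{S_x^2+T_m s_x^2}\right]$ | 378.959 | 290.2143 |
| $T_4 = \kappa_1 s_y^2\left[\frac{S_x^2-\beta_{2x}s_x^2}{(1-\beta_{2x})S_x^2}\right] + \kappa_2 s_y^2\left[\frac{(1+C_x)S_x^2}{S_x^2+C_x s_x^2}\right]$ | 960.973 | 204.0161 |
| $T_5 = \kappa_1 s_y^2\left[\frac{\rho_{xy}S_x^2-\beta_{1x}s_x^2}{(\rho_{xy}-\beta_{1x})S_x^2}\right] + \kappa_2 s_y^2\left[\frac{(\bar{X}+\tilde{X})S_x^2}{\bar{X}S_x^2+\tilde{X}s_x^2}\right]$ | 286.786 | 259.7252 |
| $T_6 = \kappa_1 s_y^2\left[\frac{\rho_{xy}S_x^2-\beta_{2x}s_x^2}{(\rho_{xy}-\beta_{2x})S_x^2}\right] + \kappa_2 s_y^2\left[\frac{(\bar{X}+Q_d)S_x^2}{\bar{X}S_x^2+Q_d s_x^2}\right]$ | 807.0286 | 259.5786 |
| $T_7 = \kappa_1 s_y^2\left[\frac{\rho_{xy}S_x^2-\beta_{1x}s_x^2}{(\rho_{xy}-\beta_{1x})S_x^2}\right] + \kappa_2 s_y^2\left[\frac{(T_m+\tilde{X})S_x^2}{T_mS_x^2+\tilde{X}s_x^2}\right]$ | 287.4439 | 259.7266 |
| $T_8 = \kappa_1 s_y^2\left[\frac{\rho_{xy}S_x^2-\beta_{2x}s_x^2}{(\rho_{xy}-\beta_{2x})S_x^2}\right] + \kappa_2 s_y^2\left[\frac{(T_m+Q_d)S_x^2}{T_mS_x^2+Q_d s_x^2}\right]$ | 264.5108 | 259.6234 |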

Table 2 shows that the mean square errors of the RSS estimators are lower than those of the SRS estimators. The estimator $T_2$ is a product estimator, whose performance depends on negative correlation between the variables; since X and Y are positively correlated in this data set ($\rho_{xy} = 0.997$), its mean square error remains large under both sampling designs.

Simulation study

The performance of the proposed estimators is compared with that of the existing estimators in a simulation study. The study is based on random observations generated from a normal distribution. We generated an artificial population of size N = 5000 on the auxiliary variable X from a normal distribution with mean 10 and standard deviation 2. Using the auxiliary variable, the study variable Y was generated from the linear equation

$$Y_i = 5 + 1.87X_i + e_i$$

where $e_i$ is $N(0,1)$. After generating the artificial population, both sampling techniques, RSS and SRS, were used to draw two independent samples, and all forms of the proposed generalized estimators were computed at different sample sizes for comparison. The procedure was repeated 10,000 times, and the variance of each estimator was calculated from its 10,000 values. The results are given in Table 3 below. The percent relative efficiency of each estimator is calculated from its simulated variances under the RSS and SRS procedures. The behavior of the simulated variances under RSS and SRS is shown in Fig. 1, which gives the relative efficiency at different sample sizes.
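The simulation design described above can be sketched as follows for the ratio estimator $T_1$ only. This is not the authors' R code: it is a reduced Python illustration using 500 replications instead of 10,000, an arbitrary seed, and the assumption that the percent relative efficiency is computed as $100 \times \mathrm{var}_{\mathrm{SRS}} / \mathrm{var}_{\mathrm{RSS}}$. Its output is therefore not expected to reproduce the entries of Table 3.

```python
import random
import statistics

def ratio_estimate(sample, S2_x):
    """T1 with alpha = 1: s_y^2 * S_x^2 / s_x^2 for a list of (x, y) pairs."""
    s2_x = statistics.variance([x for x, _ in sample])
    s2_y = statistics.variance([y for _, y in sample])
    return s2_y * S2_x / s2_x

def draw_rss(population, n, m, rng):
    """n sets of size n per cycle, ranked on x; rank r is kept from set r."""
    sample = []
    for _ in range(m):
        for r in range(n):
            ranked = sorted(rng.sample(population, n), key=lambda u: u[0])
            sample.append(ranked[r])
    return sample

def simulate_pre(n, m, reps=500, N=5000, seed=0):
    """Percent relative efficiency of T1 under RSS relative to SRS."""
    rng = random.Random(seed)
    population = []
    for _ in range(N):
        x = rng.gauss(10, 2)                                   # X ~ N(10, 2^2)
        population.append((x, 5 + 1.87 * x + rng.gauss(0, 1)))  # Y = 5 + 1.87 X + e
    S2_x = statistics.variance([x for x, _ in population])
    srs = [ratio_estimate(rng.sample(population, n * m), S2_x) for _ in range(reps)]
    rss = [ratio_estimate(draw_rss(population, n, m, rng), S2_x) for _ in range(reps)]
    return 100 * statistics.variance(srs) / statistics.variance(rss)

pre = simulate_pre(n=3, m=3)
print(f"PRE of T1 (RSS relative to SRS), m = n = 3: {pre:.1f}")
```

A value above 100 indicates that the RSS version of the estimator has the smaller simulated variance; increasing `reps` towards the 10,000 used in the study reduces the Monte Carlo noise in the ratio.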

Table 3. Percent relative efficiencies (PREs) of estimators at rho = 0.90.

| SRS sample size | 9 | 12 | 16 | 20 | 25 |
|---|---|---|---|---|---|
| RSS sample size | m = 3, n = 3 | m = 3, n = 4 | m = 4, n = 4 | m = 4, n = 5 | m = 5, n = 5 |
| T1 | 2790.1020 | 2239.1620 | 2397.1550 | 2044.3430 | 2262.4710 |
| T2 | 0.0323 | 0.3480 | 0.3819 | 0.4154 | 0.4776 |
| T3 | 231.1386 | 260.9198 | 231.1386 | 192.9843 | 105.1537 |
| T4 | 277.3733 | 181.7453 | 135.8969 | 130.7218 | 123.0522 |
| T5 | 29.2569 | 28.1691 | 28.8956 | 27.8468 | 31.4923 |
| T6 | 283.0801 | 293.5363 | 394.2184 | 422.9133 | 488.3382 |
| T7 | 18.5723 | 5.2971 | 6.0491 | 5.2340 | 6.0502 |
| T8 | 8.0509 | 5.3677 | 6.9015 | 5.5186 | 5.9868 |

Significant values are in bold.

As shown in Table 3, T1 (the proposed RSS estimator) had a PRE above 2000 relative to the conventional SRS ratio estimator of Isaki [13] at all sample sizes. T3, T4 and T6 (RSS estimators) also outperformed their SRS counterparts, with PREs above 100 at every sample size. Moreover, T6 and T5 (RSS estimators) performed better as the sample size increased, whereas the performance of T3 and T4 (RSS estimators) decreased as the sample size increased, as did that of T7 and T8. T2 is a product estimator and its performance depends on negative correlation, which is why the RSS estimator is not a better choice than the SRS estimator where the correlation is negative. Uniquely, T6 performed better when the set size and the number of cycles were equal.

Figure 1. PREs of the estimators at different sample sizes. T1, T3, T4, T5, T6, T7 and T8 are presented in red, yellow, green, aqua blue, light blue, purple and pink, respectively.

From this simulation study, we can conclude that the RSS estimators T1, T3, T4 and T6 have greater percent relative efficiency than the corresponding SRS estimators. The red line shows that T1 (the ratio estimator) has the greatest relative efficiency. The product estimator T2 is not shown, since it relies on negative correlation whereas X and Y are generated from a normal population with strong positive correlation (Fig. 1). To evaluate the properties of the suggested estimators further, the simulation study was repeated at a lower correlation; the results are given in Table 4 below:

Table 4. Percent relative efficiencies (PREs) of estimators at rho = 0.51.

| Sample size (SRS & RSS) | 9 (n = 3, m = 3) | 12 (n = 3, m = 4) | 16 (n = 4, m = 4) | 20 (n = 4, m = 5) | 25 (n = 5, m = 5) |
|---|---|---|---|---|---|
| T1 | 1.04432 | 1.47021 | 1.5327 | 1.5351 | 1.8300 |
| T2 | 0.61561 | 0.64721 | 0.7642 | 0.7336 | 0.7963 |
| T3 | 349.984 | 335.604 | 363.683 | 528.781 | 489.1021 |
| T4 | 1668.446 | 3682.987 | 1909.009 | 1543.30 | 862.6497 |
| T5 | 40,140 | 38,431 | 31,178 | 39,007 | 37,689 |
| T6 | 115.3035 | 110.164 | 107.438 | 126.624 | 134.3908 |
| T7 | 1347.62 | 1097.36 | 1364.610 | 1388.869 | 1341.826 |
| T8 | 0.3898 | 1.51498 | 7.30963 | 2.105681 | 3.09583 |

Significant values are in bold.

As shown in Table 4, T1 (the proposed RSS estimator) performed better than the conventional SRS ratio estimator. T2 is a product estimator and its performance depends on negative correlation, which is why the RSS estimator is not better than the SRS estimator when the correlation is negative. Overall, most of the RSS estimators outperformed the corresponding SRS estimators. In particular, T5 is the best estimator, as it has the highest relative efficiency.

Discussion

In this study, a ratio estimator and a generalized class of estimators for the population variance are suggested under the RSS design, utilizing one auxiliary variable. The mean square errors of the estimators of the population variance under RSS are obtained. We have considered a real population as well as simulated data to compare the proposed estimators with the SRS design. Table 2 shows that the mean square errors under the RSS design are smaller than those under simple random sampling in the real-life population, and the percent relative efficiencies of the estimators, shown in Tables 3 and 4, are greater than under the SRS design. The PREs of the estimators are calculated from simulated data using different sample sizes. The estimator T1, which is a ratio estimator, provides higher percent relative efficiencies at all sample sizes, as can be seen in Tables 3 and 4. The estimator T6 is the second-best estimator with respect to percent relative efficiency, as can be seen from Fig. 1. The ratio estimator provides a higher percent relative efficiency than the other estimators because, in the generalized class of estimators, when the constants $\kappa_1, \kappa_2$ take negative values the behavior of the ratio estimator changes to that of a product estimator. This also affects the efficiency of the estimator when the population is highly positively correlated. Overall, the RSS design estimators proved more efficient in small-sample studies.

Conclusion

The main purpose of this study is to propose a generalized class of estimators for the population variance in RSS utilizing one auxiliary variable and to compare its efficiency with the corresponding estimators in the SRS design. We have found that our RSS estimator is the best estimator in practice in situations where the study variable is costly, destructive or hard to measure. Using the estimators proposed in this study, greater efficiency can be achieved in small-sample studies in fields such as the biological sciences, medical experimental research, environmental sciences and engineering.

Author contributions

R.A.: Conception, acquisition of data, data analysis, writing (original draft), R-language programming. M.H.: Conception and design, research methodology, supervision. S.H.S.: Data analysis and interpretation, result compilation. M.Q.S.: Review and editing of the draft, supervision of R-language programming, validation.

Data availability

The authors confirm that the data supporting the findings of this study are available within the article, and the programming files will be available on request. Additional information or queries related to this paper may be requested from the corresponding author: Rabail Alam (rabail.alam@yahoo.com, raabail.alam@imbb.uol.edu.pk).

Competing interests

The authors declare no competing interests.

Footnotes

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

  • 1. McIntyre GA. A method for unbiased selective sampling, using ranked sets. Aust. J. Agric. Res. 1952;3:385–390. doi: 10.1071/AR9520385.
  • 2. Stokes SL. Estimation of variance using judgment ordered ranked set samples. Biometrics. 1980:35–42.
  • 3. MacEachern SN, Öztürk Ö, Wolfe DA, Stark GV. A new ranked set sample estimator of variance. J. Roy. Stat. Soc. Ser. B (Stat. Methodol.) 2002;64:177–188. doi: 10.1111/1467-9868.00331.
  • 4. Zamanzade E, Mahdizadeh M. Using ranked set sampling with extreme ranks in estimating the population proportion. Stat. Methods Med. Res. 2020;29:165–177. doi: 10.1177/0962280218823793.
  • 5. Tillé Y. Sampling and Estimation from Finite Populations. Wiley; 2020.
  • 6. Long C, Chen W, Yang R, Yao D. Ratio estimation of the population mean using auxiliary information under the optimal sampling design. Probab. Eng. Inf. Sci. 2022;36(2):449–460. doi: 10.1017/S0269964820000625.
  • 7. Latpate R, Kshirsagar J, Gupta VK, Chandra G. Balanced and unbalanced ranked set sampling. In: Advanced Sampling Methods. Springer; 2021. p. 257–274.
  • 8. Mahdizadeh M, Zamanzade E. Reliability estimation in multistage ranked set sampling. REVSTAT Stat. J. 2017;15(4):565–581.
  • 9. Mahdizadeh M, Arghami NR. Quantile estimation using ranked set samples from a population with known mean. Commun. Stat. Simul. Comput. 2012;41(10):1872–1881. doi: 10.1080/03610918.2011.624236.
  • 10. Al-hadhrami SA. Estimation of the population variance using ranked set sampling with auxiliary variable. Int. J. Contemp. Math. Sci. 2010;52:2567–2576.
  • 11. Das AK, Tripathi TP. Use of auxiliary information in estimating the finite population variance. Sankhya C. 1978;40:139–148.
  • 12. Singh HP, Solanki RS. A new procedure for variance estimation in simple random sampling using auxiliary information. Stat. Pap. 2013;54:479–497. doi: 10.1007/s00362-012-0445-2.
  • 13. Isaki CT. Variance estimation using auxiliary information. J. Am. Stat. Assoc. 1983;78:117–123. doi: 10.1080/01621459.1983.10477939.


