Estimation of population variance under ranked set sampling method by using the ratio of supplementary information with study variable

Rabail Alam; Muhammad Hanif; Saman Hanif Shahbaz; Muhammad Qaiser Shahbaz

doi:10.1038/s41598-022-24296-1

2022 Dec 8;12:21203.

PMCID: PMC9732350 PMID: 36481847

Abstract

In biological and medical research, the cost and collateral damage incurred while collecting and measuring a sample often force a compromise on inference, with a fixed and accepted approximation error. Ranked set sampling (RSS) performs better in such scenarios, and the use of auxiliary information further enhances the performance of the estimators. In this study, two generalized classes of estimators are proposed to estimate the population variance using RSS and information on an auxiliary variable. The bias and mean square errors of the proposed classes of estimators are derived up to the first order of approximation. Some special cases of one of the proposed classes of estimators are also considered when population parameters are available. A simulation study was conducted to assess the performance of the members of the proposed family for various sample sizes. A real-life data application estimates the variance of the gestational age of fetuses with supplementary information. The results showed that the RSS design is more accurate than simple random sampling for determining the population variance of hard-to-measure or destructive sampling units.

Subject terms: Mathematics and computing, Applied mathematics, Statistics

Introduction

In many scientific fields, such as medicine, agriculture and environmental studies, various sampling methods are used to collect data for inference. During research studies, many environmental and biological constraints affect the data collection procedure, such as sample size, cost per sample, and destructible sample units of the study variable. These constraints strongly affect the statistical analysis and inference of the study. However, the ranked set sampling (RSS) design can perform better in such scenarios. McIntyre introduced the RSS technique and applied it to estimate the average yield of pasture at reduced sampling cost¹. Later, Stokes suggested a classical estimator for the population variance in RSS that allows for ranking error². An unbiased estimator of the population variance under a ranked set sample was subsequently developed and shown to be better than the Stokes estimator, even in small samples³. Another study evaluated the estimation of a population proportion under RSS and its variations⁴. The efficiency of an estimator increases when supplementary information is used alongside the study variable, because the study variable and the auxiliary variable are correlated⁵.

In the literature, extensive work has been done on ratio estimation of the population mean using RSS. In one study, ratio estimators were developed and compared under two designs (simple RSS and extreme RSS)⁶. RSS has received great attention from researchers; a recent study addressed balanced and unbalanced RSS⁷. Comprehensive work on nonparametric RSS is also available^8,9.

In the literature, detailed work is available on the estimation of the population variance under simple random sampling (SRS). A gap exists, however, regarding estimators for the population variance under RSS. This study is an effort to address this deficiency. We propose a class of generalized estimators for the population variance under RSS. The mean square error and bias of the proposed class of estimators are derived up to the first degree of approximation. Several members of the proposed class are developed depending on the type of supplementary information available, such as the mean, median, tri-mean, coefficient of variation, coefficient of correlation, coefficient of skewness, kurtosis and quartile deviation. The mean square errors are compared on real-life data under both sampling designs (SRS and RSS) to evaluate the performance of these member estimators. Moreover, the relative efficiency of these estimators is calculated in a simulation study based on an artificial population and various sample sizes for estimation of the population variance.

To estimate the population variance, consider a population of size $N$ labelled $E = (E_{1}, E_{2}, E_{3}, \dots, E_{N})$. A sample of size $j = m n$ is drawn from $E \sim (Y, X)$, which has a bivariate normal distribution. In each cycle, $n$ random sets, each of size $n$, are drawn from the population and the elements of each set are ordered on the basis of the auxiliary variable. The smallest observation is then measured from the first set and the second smallest from the second set. The process continues in this manner until the largest observation has been measured from the $n$th set. This entire cycle is repeated $m$ times, and $x_{(r) i}$ denotes the unit of rank $r$ drawn from the $r$th set in the $i$th cycle. Let $y_{[r] i}$ and $x_{(r) i}$ be the values of the study variable and the auxiliary variable $(Y, X)$, respectively, where $i = 1, \dots, m$ indexes the cycle and $r = 1, \dots, n$ is the rank, based on the auxiliary variable, within the sets. Both variables are sampled using the RSS methodology, with the study variable ranked on the basis of the auxiliary variable. The overall averages and variances of the ranked set sample are ${\hat{μ}}_{x} = \frac{1}{mn} \sum_{i = 1}^{m} \sum_{r = 1}^{n} x_{(r) i}$, ${\hat{μ}}_{y} = \frac{1}{mn} \sum_{i = 1}^{m} \sum_{r = 1}^{n} y_{[r] i}$, $s_{x}^{2} = \frac{1}{m n - 1} \sum_{i = 1}^{m} \sum_{r = 1}^{n} {(x_{(r) i} - {\hat{μ}}_{x})}^{2}$ and $s_{y}^{2} = \frac{1}{m n - 1} \sum_{i = 1}^{m} \sum_{r = 1}^{n} {(y_{[r] i} - {\hat{μ}}_{y})}^{2}$, respectively. The ordered means and variances are $μ_{(r) y} = \frac{1}{n} \sum_{r = 1}^{n} y_{[r]}$, $μ_{(r) x} = \frac{1}{n} \sum_{r = 1}^{n} x_{(r)}$, $E {(y_{[r] i} - μ_{(r) y})}^{2} = σ_{(r) y}^{2}$ and $E {(x_{(r) i} - μ_{(r) x})}^{2} = σ_{(r) x}^{2}$, respectively.
The other ordered measures used in this article are $E {(y_{[r] i} - μ_{y})}^{2} {(x_{(r) i} - μ_{x})}^{2} = σ_{xy}^{2}$, $\sum_{r = 1}^{n} τ_{x (r)}^{2} = \sum_{r = 1}^{n} {(μ_{x (r)} - μ_{x})}^{2}$, $\sum_{r = 1}^{n} τ_{y (r)}^{2} = \sum_{r = 1}^{n} {(μ_{y (r)} - μ_{y})}^{2}$ and $\sum_{r = 1}^{n} τ_{y (r)}^{2} τ_{x (r)}^{2} = \sum_{r = 1}^{n} {(μ_{y (r)} - μ_{y})}^{2} {(μ_{x (r)} - μ_{x})}^{2}$. Suppose $e_{s_{y}^{2}} = (s_{y}^{2} - S_{y}^{2}) / S_{y}^{2}$ and $e_{s_{x}^{2}} = (s_{x}^{2} - S_{x}^{2}) / S_{x}^{2}$, so that $s_{y}^{2} = (1 + e_{s_{y}^{2}}) S_{y}^{2}$ and $s_{x}^{2} = (1 + e_{s_{x}^{2}}) S_{x}^{2}$. The expectations of the squared error terms are

E {(e_{s_{y}^{2}})}^{2} = \sum_{r = 1}^{n} σ_{y (r)}^{4} (a + h) + b \sum_{r = 1}^{n} τ_{y (r)}^{2} σ_{y (r)}^{2} + c \sum_{r = 1}^{n} τ_{y (r)} μ_{3 y (r)} + d \sum \sum_{r < s} σ_{y (r)}^{2} σ_{y (s)}^{2} - σ_{y}^{4} = V_{y}

E {(e_{s_{x}^{2}})}^{2} = \sum_{r = 1}^{n} σ_{x (r)}^{4} (a + h) + b \sum_{r = 1}^{n} τ_{x (r)}^{2} σ_{x (r)}^{2} + c \sum_{r = 1}^{n} τ_{x (r)} μ_{3 x (r)} + d \sum \sum_{r < s} σ_{x (r)}^{2} σ_{x (s)}^{2} - σ_{x}^{4} = U_{x}

E (e_{s_{y}^{2}} e_{s_{x}^{2}}) = a \sum_{r = 1}^{n} σ_{xy}^{2} + a σ_{x}^{2} \sum_{r = 1}^{n} τ_{y (r)}^{2} + 4 a σ_{xy} \sum_{r = 1}^{n} τ_{x} τ_{y} - a σ_{y}^{2} \sum_{r = 1}^{n} τ_{x (r)}^{2} + a \sum_{r = 1}^{n} τ_{x (r)}^{2} τ_{y (r)}^{2} + b \sum_{r = 1}^{n} τ_{y (r)}^{2} τ_{x (r)}^{2} σ_{x y (r)} - σ_{x}^{2} σ_{y}^{2} = U_{x} V_{y}

where $a = \frac{m}{{(mn)}^{2}}$, $b = \frac{4 m}{{(m n - 1)}^{2}}$, $c = \frac{4}{n (m n - 1)}$, $h = \frac{m (n - 1)}{m n - 1}$ and $d = 2 m n \frac{(m^{2} n^{2} - 2 m n + 3)}{{(m n - 1)}^{2} {(mn)}^{2}}$.
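The ranked set sampling procedure described above can be sketched in code. The following is a minimal Python illustration, not the authors' R code: the function name `ranked_set_sample` and its arguments are hypothetical, and it assumes that each set of size n is drawn without replacement from the population while different sets are drawn independently of one another.

```python
import numpy as np

def ranked_set_sample(x, y, n, m, rng):
    """Draw a balanced ranked set sample of size m*n, ranking on x.

    x, y : 1-D arrays with the auxiliary and study variable of the population
    n    : set size (number of ranks)
    m    : number of cycles
    rng  : a numpy random Generator
    Returns two (m, n) arrays; entry [i, r] is the unit holding rank r+1
    in the (r+1)-th set of cycle i.
    """
    N = len(x)
    xs = np.empty((m, n))
    ys = np.empty((m, n))
    for i in range(m):                      # cycles
        for r in range(n):                  # one fresh set per rank
            idx = rng.choice(N, size=n, replace=False)  # a set of n units
            ordered = idx[np.argsort(x[idx])]           # rank the set on x only
            pick = ordered[r]                           # keep the unit of rank r+1
            xs[i, r] = x[pick]
            ys[i, r] = y[pick]                          # y is measured only here
    return xs, ys
```

Only the m*n selected units have their study variable measured, which is the source of the cost saving; the remaining units in each set are used for ranking alone.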

Stokes² considered errors in judgment ranking and suggested an estimator for $σ^{2}$ that is asymptotically unbiased and more efficient than the usual SRS unbiased estimator for $σ^{2}$:

t_{o}^{2} = \frac{1}{m n - 1} \sum_{i = 1}^{m} \sum_{r = 1}^{n} {(X_{[r] i} - \hat{μ})}^{2}

where $\hat{μ} = \frac{1}{mn} \sum_{i} \sum_{r} X_{[r] i}$ .

The variance of $t_{o}^{2}$ is obtained by Stokes² as

var (t_{o}^{2}) = \frac{m}{{(m n - 1)}^{2}} \{\begin{matrix} {(\frac{m n - 1}{mn})}^{2} \sum_{r} μ_{4 [r]} + 4 \sum_{r} τ_{[r]}^{2} σ_{[r]}^{2} + 4 (\frac{m n - 1}{mn}) \sum_{r} τ_{[r]} μ_{3 [r]} \\ + \frac{4 m}{{(mn)}^{2}} \sum_{r < s} \sum σ_{[r]}^{2} σ_{[s]}^{2} + \frac{2 (m - 1) - {(m n - 1)}^{2}}{{(mn)}^{2}} \sum_{r} σ_{[r]}^{4} \end{matrix}\}

Al-Hadhrami¹⁰ proposed a ratio estimator for the population variance based on RSS as

t_{3} = s_{RSSy}^{2} \frac{S_{RSSx}^{2}}{s_{RSSx}^{2}}

where $s_{RSSx}^{2} = t_{o}^{2}$. The MSE and bias of the above estimator are

MSE (t_{3}) = var (s_{y}^{2}) + {(\frac{s_{y}^{2}}{s_{x}^{2}})}^{2} var (s_{x}^{2}) - 2 (\frac{s_{y}^{2}}{s_{x}^{2}}) cov (s_{x}^{2}, s_{y}^{2})

Bias (t_{3}) = \frac{s_{y}^{2}}{{(s_{x}^{2})}^{2}} var (s_{x}^{2}) - (\frac{1}{s_{x}^{2}}) cov (s_{x}^{2}, s_{y}^{2}) .
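As a rough illustration of the two estimators above, the sketch below computes the Stokes-type variance estimator and the ratio estimator $t_{3}$ from ranked-set observations. It is written in Python rather than the authors' R, the function names are hypothetical, and `S2x` stands for the population variance of the auxiliary variable, which is assumed to be known.

```python
import numpy as np

def stokes_variance(values):
    """Sum of squared deviations of all m*n ranked-set observations
    from their grand mean, divided by (m*n - 1)."""
    v = np.asarray(values, dtype=float).ravel()
    return np.sum((v - v.mean()) ** 2) / (v.size - 1)

def ratio_variance_estimator(ys, xs, S2x):
    """t_3 = s^2_RSSy * S^2_x / s^2_RSSx (S2x assumed known)."""
    return stokes_variance(ys) * S2x / stokes_variance(xs)

# Toy ranked-set data with m = 2 cycles and n = 2 ranks (made-up numbers).
ys = np.array([[1.0, 2.0], [3.0, 4.0]])
xs = 2.0 * ys
t3 = ratio_variance_estimator(ys, xs, S2x=4.0)
```

When the sample variance of the auxiliary variable overshoots its known population value, the ratio $S_{x}^{2} / s_{x}^{2}$ shrinks the estimate of the study-variable variance, and conversely.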

Materials and method

In this study, it is assumed that the study variable (Y) and the auxiliary variable (X) follow a bivariate normal distribution with high positive correlation, say $ρ \geq 0.70$. Ranking is done on the basis of the auxiliary variable, as it is easily and cheaply available. The variables $x$ and $y$ are both sampled by the RSS method¹. Here $T$ denotes an estimator of the population variance $S_{y}^{2}$. The R language was used to conduct the simulation study for all forms of the estimators and to compute the relative efficiencies.

Classical generalized ratio estimator

Motivated by the members of the class of estimators in ref. ¹¹, we have developed a generalized ratio estimator for the finite population variance under the RSS scheme as:

T_{1} = s_{y}^{2} {(\frac{S_{x}^{2}}{s_{x}^{2}})}^{α}

where $α$ is a constant such as $+1$ or $-1$. If $α = 1$, $T_{1}$ is the ratio estimator of the population variance; if $α = -1$, it is the product estimator of the population variance $(T_{2})$; and if $α = 0$, it equals the sample variance. After simplification and taking expectations, we obtain the following expressions for the bias and MSE of the proposed class of estimators $T_{1}$:

E (T_{1} - S_{y}^{2}) = S_{y}^{2} [(1 + E (e_{s_{y}^{2}}) - α E (e_{s_{x}^{2}}) + \frac{α (α + 1)}{2} E {(e_{s_{x}^{2}})}^{2} - α E (e_{s_{x}^{2}} e_{s_{y}^{2}})) - 1] .

The bias is

E (T_{1} - S_{y}^{2}) = S_{y}^{2} [\frac{α (α + 1)}{2} U_{x}^{2} - α U_{x} V_{y}] .

Applying expectations in Eq. (7), the mean square error is:

MSE (T_{1}) = S_{y}^{4} [V_{y}^{2} + α^{2} U_{x}^{2} - 2 α V_{y} U_{x}] .
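The role of $α$ in $T_{1}$ can be made concrete with a small sketch. This is an illustrative Python function (hypothetical name `t1_estimator`; the authors worked in R) that takes the ranked-set observations and the population variance `S2x` of the auxiliary variable, assumed known.

```python
import numpy as np

def t1_estimator(ys, xs, S2x, alpha=1.0):
    """T_1 = s_y^2 * (S_x^2 / s_x^2) ** alpha.

    alpha = 1  gives the ratio estimator,
    alpha = -1 gives the product estimator,
    alpha = 0  gives the plain sample variance s_y^2.
    """
    ys = np.asarray(ys, dtype=float).ravel()
    xs = np.asarray(xs, dtype=float).ravel()
    s2y = ys.var(ddof=1)   # divisor m*n - 1, as in the sample variances above
    s2x = xs.var(ddof=1)
    return s2y * (S2x / s2x) ** alpha

# Made-up observations, for illustration only.
ys = np.array([30.1, 31.4, 33.0, 29.5, 32.2, 34.8])
xs = np.array([5.8, 6.0, 6.4, 5.6, 6.2, 6.9])
ratio = t1_estimator(ys, xs, S2x=0.8, alpha=1.0)
product = t1_estimator(ys, xs, S2x=0.8, alpha=-1.0)
plain = t1_estimator(ys, xs, S2x=0.8, alpha=0.0)
```

A quick sanity check on the three cases is that the ratio and product estimates multiply to the square of the plain sample variance, since the correction factors cancel.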

Generalized class of estimators with auxiliary information

Motivated by Singh¹², we propose another generalized class of ratio estimators to estimate the finite population variance using a single auxiliary variable under the RSS technique. The proposed estimator is:

T = κ_{1} s_{y}^{2} {\{\frac{c S_{x}^{2} - d s_{x}^{2}}{(c - d) S_{x}^{2}}\}}^{λ} + κ_{2} s_{y}^{2} {\{\frac{(a + b) S_{x}^{2}}{a S_{x}^{2} + b s_{x}^{2}}\}}^{δ},

where $(κ_{1}, κ_{2})$ and $(λ, δ)$ are constants taking finite values and $(a, b, c, d)$ are functions of known population parameters of the auxiliary variable $X$, such as $\bar{X}, C_{x}, β_{1} (x), β_{2} (x)$ and $ρ_{xy}$. When the values of $(κ_{1}, κ_{2}, a, b, c, d, λ, δ)$ are suitably chosen, several existing estimators can be obtained from the proposed generalized class of estimators $T$. In addition to existing estimators, some new estimators $T_{i}$ $(i = 3, 4, 5, 6, 7, 8)$ are generated from the proposed class; they are given in Table 1.

Table 1.

Some members of class of estimators.

Estimator	$λ$	$δ$	c	d	a	b
$T_{3} = κ_{1} s_{y}^{2} \{\frac{S_{x}^{2} - β_{1 (x)} s_{x}^{2}}{(1 - β_{1 (x)}) S_{x}^{2}}\} + κ_{2} s_{y}^{2} \{\frac{(1 + T_{m}) S_{x}^{2}}{S_{x}^{2} + T_{m} s_{x}^{2}}\}$	1	1	1	$β_{1 (x)}$	1	$Tm$
$T_{4} = κ_{1} s_{y}^{2} \{\frac{S_{x}^{2} - β_{2 (x)} s_{x}^{2}}{(1 - β_{2 (x)}) S_{x}^{2}}\} + κ_{2} s_{y}^{2} \{\frac{(1 + C_{x}) S_{x}^{2}}{S_{x}^{2} + C_{x} s_{x}^{2}}\}$	1	1	1	$β_{2 (x)}$	1	$C_{x}$
$T_{5} = κ_{1} s_{y}^{2} \{\frac{ρ_{xy} S_{x}^{2} - β_{1 (x)} s_{x}^{2}}{(ρ_{xy} - β_{1 (x)}) S_{x}^{2}}\} + κ_{2} s_{y}^{2} \{\frac{(\bar{X} + \tilde{X}) S_{x}^{2}}{\bar{X} S_{x}^{2} + \tilde{X} s_{x}^{2}}\}$	1	1	$ρ_{xy}$	$β_{1 (x)}$	$\bar{X}$	$\tilde{X}$
$T_{6} = κ_{1} s_{y}^{2} \{\frac{ρ_{xy} S_{x}^{2} - β_{2 (x)} s_{x}^{2}}{(ρ_{xy} - β_{2 (x)}) S_{x}^{2}}\} + κ_{2} s_{y}^{2} \{\frac{(\bar{X} + Q d) S_{x}^{2}}{\bar{X} S_{x}^{2} + Q d s_{x}^{2}}\}$	1	1	$ρ_{xy}$	$β_{2 (x)}$	$\bar{X}$	$Qd$
$T_{7} = κ_{1} s_{y}^{2} \{\frac{ρ_{xy} S_{x}^{2} - β_{1 (x)} s_{x}^{2}}{(ρ_{xy} - β_{1 (x)}) S_{x}^{2}}\} + κ_{2} s_{y}^{2} \{\frac{(T m + \tilde{X}) S_{x}^{2}}{T m S_{x}^{2} + \tilde{X} s_{x}^{2}}\}$	1	1	$ρ_{xy}$	$β_{1 (x)}$	$Tm$	$\tilde{X}$
$T_{8} = κ_{1} s_{y}^{2} \{\frac{ρ_{xy} S_{x}^{2} - β_{2 (x)} s_{x}^{2}}{(ρ_{xy} - β_{2 (x)}) S_{x}^{2}}\} + κ_{2} s_{y}^{2} \{\frac{(T m + Q d) S_{x}^{2}}{T m S_{x}^{2} + Q d s_{x}^{2}}\}$	1	1	$ρ_{xy}$	$β_{2 (x)}$	$Tm$	$Qd$
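To show how the members in Table 1 arise from the general form of $T$, here is a minimal Python sketch (the authors used R; the function name and argument names are hypothetical). The constants `k1` and `k2` stand for $κ_{1}$ and $κ_{2}$, and all numeric inputs in the usage lines are illustrative rather than results from the paper, apart from $β_{2 (x)}$ = −0.649, which is the value reported for the real data.

```python
def generalized_T(s2y, s2x, S2x, k1, k2, a, b, c, d, lam=1.0, delta=1.0):
    """Generalized class T:

    T = k1 * s2y * {(c*S2x - d*s2x) / ((c - d)*S2x)} ** lam
      + k2 * s2y * {((a + b)*S2x) / (a*S2x + b*s2x)} ** delta

    s2y, s2x : sample variances of the study and auxiliary variable
    S2x      : known population variance of the auxiliary variable
    """
    first = ((c * S2x - d * s2x) / ((c - d) * S2x)) ** lam
    second = (((a + b) * S2x) / (a * S2x + b * s2x)) ** delta
    return k1 * s2y * first + k2 * s2y * second

# A T_4-type member: c = 1, d = beta_2(x), a = 1, b = C_x, lam = delta = 1.
# C_x = 0.15 and the kappa values are placeholders for illustration.
t4 = generalized_T(s2y=16.0, s2x=0.9, S2x=0.831,
                   k1=0.5, k2=0.5, a=1.0, b=0.15, c=1.0, d=-0.649)
```

Both braces reduce to 1 when the sample variance of the auxiliary variable equals its population value, so in that case $T$ collapses to $(κ_{1} + κ_{2}) s_{y}^{2}$.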


Using error term notations in Eq. (10), we get

T = κ_{1} S_{y}^{2} (1 + e_{s_{y}^{2}}) {(1 - ψ_{2} e_{s_{x}^{2}})}^{λ} + κ_{2} S_{y}^{2} (1 + e_{s_{y}^{2}}) {(1 + ψ_{1} e_{s_{x}^{2}})}^{- δ}

where $ψ_{2} = \frac{d}{(c - d)}$ , $ψ_{1} = \frac{b}{(a + b)}$ . Taking expectation and after simplification we have

E (T - S_{y}^{2}) = S_{y}^{2} [\begin{matrix} κ_{1} \{1 + E (e_{s_{y}^{2}}) - ψ_{2} λ E (e_{s_{x}^{2}}) + \frac{λ (λ - 1)}{2} ψ_{2}^{2} E {(e_{s_{x}^{2}})}^{2} - ψ_{2} λ E (e_{s_{x}^{2}} e_{s_{y}^{2}})\} \\ + κ_{2} (1 + E (e_{s_{y}^{2}}) - ψ_{1} δ E (e_{s_{x}^{2}}) + \frac{δ (δ + 1)}{2} ψ_{1}^{2} E {(e_{s_{x}^{2}})}^{2} - ψ_{1} δ E (e_{s_{x}^{2}} e_{s_{y}^{2}})) - 1 \end{matrix}]

The bias is obtained by using the error-term notation from the “Introduction” section and is

B (T) = S_{y}^{2} [κ_{1} \{1 + \frac{λ (λ - 1)}{2} ψ_{2}^{2} U_{x} - ψ_{2} λ U_{x} V_{y}\} + κ_{2} (1 + \frac{δ (δ + 1)}{2} ψ_{1}^{2} U_{x} - ψ_{1} δ U_{x} V_{y}) - 1] .

The following expression for the MSE is obtained by squaring Eq. (12), taking expectations and ignoring the higher-order terms:

E {(T - S_{y}^{2})}^{2} = S_{y}^{4} + S_{y}^{4} [\begin{matrix} κ_{1}^{2} \{E {(e_{s_{y}^{2}})}^{2} - ψ_{2}^{2} λ^{2} E {(e_{s_{x}^{2}})}^{2} - ψ_{2} λ E (e_{s_{x}^{2}} e_{s_{y}^{2}})\} \\ + κ_{2}^{2} \{E {(e_{s_{y}^{2}})}^{2} - ψ_{1}^{2} δ^{2} E {(e_{s_{x}^{2}})}^{2} - ψ_{1} δ E (e_{s_{x}^{2}} e_{s_{y}^{2}})\} \\ + 2 κ_{1} κ_{2} \{E {(e_{s_{y}^{2}})}^{2} - φ E (e_{s_{x}^{2}}) - 2 φ E (e_{s_{x}^{2}} e_{s_{y}^{2}}) + ψ_{1} ψ_{2} λ δ E {(e_{s_{x}^{2}})}^{2}\} \\ - 2 κ_{1} \{E (e_{s_{y}^{2}}) - ψ_{2}^{2} λ^{2} E (e_{s_{x}^{2}}) - ψ_{2} λ E (e_{s_{x}^{2}} e_{s_{y}^{2}})\} \\ - 2 κ_{2} \{E (e_{s_{y}^{2}}) - ψ_{1}^{2} δ^{2} E (e_{s_{x}^{2}}) - ψ_{1} δ E (e_{s_{x}^{2}} e_{s_{y}^{2}})\} \end{matrix}]

where $φ = (ψ_{2} λ - ψ_{1} δ)$. Using the notation given in the “Introduction” section, the expression for the mean square error is

MSE (T) = S_{y}^{4} [1 + (κ_{1}^{2} A + κ_{2}^{2} B + 2 κ_{1} κ_{2} C) - 2 (κ_{1} D + κ_{2} E)],

where $A = V_{y}^{2} + λ^{2} ψ_{2}^{2} U_{x}^{2} - 2 ψ_{2} λ U_{x} V_{y}$, $B = V_{y}^{2} + δ^{2} ψ_{1}^{2} U_{x}^{2} - 2 ψ_{1} δ U_{x} V_{y}$, $C = V_{y}^{2} - 2 φ U_{x} V_{y} + ψ_{1} ψ_{2} λ δ U_{x}^{2}$, $D = ψ_{2} λ U_{x} V_{y}$, and $E = ψ_{1} δ U_{x} V_{y}$.

Differentiating Eq. (15) with respect to $κ_{1}$ and $κ_{2}$ and equating to zero, the optimum values of $κ_{1}^{*}$ and $κ_{2}^{*}$ are, respectively, obtained as

κ_{1} = \frac{(B D - C E)}{(A B - C^{2})} = κ_{1}^{*}

and

κ_{2} = \frac{(A E - C D)}{(A B - C^{2})} = κ_{2}^{*} .

Using the above optimum values, the minimum MSE of the generalized class of estimators $T$ is

MSE_{min} (T) = S_{y}^{4} [1 + (κ_{1}^{* 2} A + κ_{2}^{* 2} B + 2 κ_{1}^{*} κ_{2}^{*} C) - 2 (κ_{1}^{*} D + κ_{2}^{*} E)] .
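The optimum constants follow from setting the two partial derivatives of the bracketed MSE expression to zero, which gives the linear system $κ_{1} A + κ_{2} C = D$ and $κ_{1} C + κ_{2} B = E$. The Python sketch below solves that system numerically and evaluates the bracketed term; the function names are hypothetical and the values of A, B, C, D and E are arbitrary placeholders, not quantities from the paper.

```python
import numpy as np

def optimal_kappas(A, B, C, D, E):
    """Solve  k1*A + k2*C = D  and  k1*C + k2*B = E  for (k1, k2).

    Equivalent to k1 = (B*D - C*E) / (A*B - C**2) and
                  k2 = (A*E - C*D) / (A*B - C**2), provided A*B != C**2.
    """
    lhs = np.array([[A, C], [C, B]], dtype=float)
    rhs = np.array([D, E], dtype=float)
    k1, k2 = np.linalg.solve(lhs, rhs)
    return float(k1), float(k2)

def mse_bracket(k1, k2, A, B, C, D, E):
    """The term in square brackets of MSE(T); multiply by S_y^4 for the MSE."""
    return 1 + (k1**2 * A + k2**2 * B + 2 * k1 * k2 * C) - 2 * (k1 * D + k2 * E)

# Placeholder constants chosen only so that A*B - C**2 > 0.
A, B, C, D, E = 2.0, 3.0, 1.0, 1.0, 1.0
k1_opt, k2_opt = optimal_kappas(A, B, C, D, E)
best = mse_bracket(k1_opt, k2_opt, A, B, C, D, E)
```

At the optimum the bracket simplifies to $1 - (κ_{1}^{*} D + κ_{2}^{*} E)$, which can serve as a quick check on any numerical implementation.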

From the above generalized class of ratio estimators, many forms can be obtained depending on the availability of population parameters of the supplementary information. Some members of this class are given in Table 1 above.

Results

In this section, real-life data are used in an empirical study to obtain mean square errors that illustrate the advantage of RSS estimators over simple random sampling (SRS). The simulation study is then presented with the percent relative efficiencies of the various estimators.

Applications

RSS has an advantage in biostatistics: it provides greater efficiency in small samples when the variable of interest is difficult, destructive or costly to obtain. A study on the “Assessment of gestational age and weight” was carried out by a student in 2014–2015, in which the accuracy of gestational age was assessed by ultrasound. A total of 400 ultrasounds were performed on pregnant women. From this study we took two highly correlated variables, X = femur length of the fetus and Y = gestational age. We then calculated the mean square errors of the estimators under the RSS and SRS procedures, shown in Table 2. From the population of size 400, we drew one sample of size 12 by RSS, with set size (m = 3) and number of cycles (n = 4), and another sample of size 12 by SRS. The measures obtained from the population are $μ_{x}$ = 6.1488, $μ_{y}$ = 31.990, $σ_{x}^{2}$ = 0.831, $σ_{y}^{2}$ = 17.025, $β_{1 (x)}$ = −0.223, $β_{2 (x)}$ = −0.649, $ρ_{xy}$ = 0.997, $T m_{x}$ = 6.175, $Q d_{x}$ = 0.75, $M e_{x}$ = 6.200, $σ_{(r) y}^{2}$ = 6.9146, $σ_{(r) x}^{2}$ = 0.1985.

Table 2.

Mean square error of different estimators.

Estimators	SRS MSE	RSS MSE
$T_{(1)} = s_{y}^{2} (\frac{S_{x}^{2}}{s_{x}^{2}})$	25,450.88	19,152.43
$T_{(2)} = s_{y}^{2} (\frac{s_{x}^{2}}{S_{x}^{2}})$	25,818.66	19,412.12
$T_{3} = κ_{1} s_{y}^{2} \{\frac{S_{x}^{2} - β_{1 (x)} s_{x}^{2}}{(1 - β_{1 (x)}) S_{x}^{2}}\} + κ_{2} s_{y}^{2} \{\frac{(1 + T_{m}) S_{x}^{2}}{S_{x}^{2} + T_{m} s_{x}^{2}}\}$	378.959	290.2143
$T_{4} = κ_{1} s_{y}^{2} \{\frac{S_{x}^{2} - β_{2 (x)} s_{x}^{2}}{(1 - β_{2 (x)}) S_{x}^{2}}\} + κ_{2} s_{y}^{2} \{\frac{(1 + C_{x}) S_{x}^{2}}{S_{x}^{2} + C_{x} s_{x}^{2}}\}$	960.973	204.0161
$T_{5} = κ_{1} s_{y}^{2} \{\frac{ρ_{xy} S_{x}^{2} - β_{1 (x)} s_{x}^{2}}{(ρ_{xy} - β_{1 (x)}) S_{x}^{2}}\} + κ_{2} s_{y}^{2} \{\frac{(\bar{X} + \tilde{X}) S_{x}^{2}}{\bar{X} S_{x}^{2} + \tilde{X} s_{x}^{2}}\}$	286.786	259.7252
$T_{6} = κ_{1} s_{y}^{2} \{\frac{ρ_{xy} S_{x}^{2} - β_{2 (x)} s_{x}^{2}}{(ρ_{xy} - β_{2 (x)}) S_{x}^{2}}\} + κ_{2} s_{y}^{2} \{\frac{(\bar{X} + Q d) S_{x}^{2}}{\bar{X} S_{x}^{2} + Q d s_{x}^{2}}\}$	807.0286	259.5786
$T_{7} = κ_{1} s_{y}^{2} \{\frac{ρ_{xy} S_{x}^{2} - β_{1 (x)} s_{x}^{2}}{(ρ_{xy} - β_{1 (x)}) S_{x}^{2}}\} + κ_{2} s_{y}^{2} \{\frac{(T m + \tilde{X}) S_{x}^{2}}{T m S_{x}^{2} + \tilde{X} s_{x}^{2}}\}$	287.4439	259.7266
$T_{8} = κ_{1} s_{y}^{2} \{\frac{ρ_{xy} S_{x}^{2} - β_{2 (x)} s_{x}^{2}}{(ρ_{xy} - β_{2 (x)}) S_{x}^{2}}\} + κ_{2} s_{y}^{2} \{\frac{(T m + Q d) S_{x}^{2}}{T m S_{x}^{2} + Q d s_{x}^{2}}\}$	264.5108	259.6234


Table 2 shows that the mean square errors of the RSS estimators are lower than those of the SRS estimators. The estimator $T_{2}$ is a product estimator, which relies on a negative correlation between the variables; because the two variables in these data are highly positively correlated ($ρ_{xy}$ = 0.997), its mean square error remains high and of similar magnitude under both sampling designs.

Simulation study

The performance of the proposed estimators is compared with that of the existing estimator in a simulation study. The simulation is performed by generating random observations from a normal distribution. We generated an artificial population of size N = 5000 on the auxiliary variable $X$ from a normal distribution with mean 10 and standard deviation 2. Using the auxiliary variable, the study variable $Y$ was generated from the following linear equation

Y_{i} = 5 + 1.87 X_{i} + e_{i}

where $e_{i}$ is $N (0, 1)$. After generating the artificial population, both sampling techniques (RSS and SRS) are used to draw two independent samples, and all forms of the proposed generalized estimators are computed for different sample sizes for comparison. The procedure is repeated 10,000 times and, using the 10,000 values of each estimator, the variance of each estimator is calculated. The percent relative efficiency of each estimator is calculated from the simulated variances of the estimator under the RSS and SRS procedures. The results are given in Table 3 below. The behavior of the simulated variances in RSS and SRS is shown in Fig. 1, which displays the relative efficiency at different sample sizes.
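The simulation design described above can be sketched as follows for the ratio estimator alone. This is a Python approximation, not the authors' R program, and it rests on several assumptions that are not stated in the text: the PRE is taken to be 100 × (simulated variance under SRS) / (simulated variance under RSS), sets are drawn without replacement within a set, and the helper names, the seed and the default of 1000 replications (instead of 10,000) are arbitrary choices.

```python
import numpy as np

def rss_indices(x, n, m, rng):
    """Population indices of a balanced ranked set sample
    (set size n, m cycles), ranked on the auxiliary variable x."""
    picks = []
    for _ in range(m):
        for r in range(n):
            idx = rng.choice(len(x), size=n, replace=False)
            picks.append(idx[np.argsort(x[idx])][r])  # unit of rank r+1
    return np.array(picks)

def ratio_estimate(y, x, S2x):
    """s_y^2 * S_x^2 / s_x^2, with sample variances using divisor size - 1."""
    return y.var(ddof=1) * S2x / x.var(ddof=1)

def simulate_pre(n, m, reps=1000, N=5000, seed=0):
    """Assumed PRE of the ratio estimator: 100 * var_SRS / var_RSS."""
    rng = np.random.default_rng(seed)
    X = rng.normal(10.0, 2.0, N)                  # auxiliary variable
    Y = 5.0 + 1.87 * X + rng.normal(0.0, 1.0, N)  # study variable
    S2x = X.var(ddof=1)                           # treated as known
    rss = np.empty(reps)
    srs = np.empty(reps)
    for k in range(reps):
        i_rss = rss_indices(X, n, m, rng)
        i_srs = rng.choice(N, size=n * m, replace=False)
        rss[k] = ratio_estimate(Y[i_rss], X[i_rss], S2x)
        srs[k] = ratio_estimate(Y[i_srs], X[i_srs], S2x)
    return 100.0 * srs.var(ddof=1) / rss.var(ddof=1)

pre_9 = simulate_pre(n=3, m=3, reps=200)
```

Because the definition of PRE and the sampling details here are assumptions, the numbers produced by this sketch are not expected to reproduce the entries of Table 3; it only illustrates the structure of the comparison.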

Table 3.

Percent relative efficiencies (PREs) of estimators at rho = 0.90.

Percentage relative efficiency
SRS sample size	9	12	16	20	25
RSS sample size	m = 3, n = 3	m = 3, n = 4	m = 4, n = 4	m = 4, n = 5	m = 5, n = 5
T1	2790.1020	2239.1620	2397.1550	2044.3430	2262.4710
T2	0.0323	0.3480	0.3819	0.4154	0.4776
T3	231.1386	260.9198	231.1386	192.9843	105.1537
T4	277.3733	181.7453	135.8969	130.7218	123.0522
T5	29.2569	28.1691	28.8956	27.8468	31.4923
T6	283.0801	293.5363	394.2184	422.9133	488.3382
T7	18.5723	5.2971	6.0491	5.2340	6.0502
T8	8.0509	5.3677	6.9015	5.5186	5.9868


Significant values are in bold.

As shown in Table 3, T1 (the proposed RSS estimator) attained a PRE above 2000 relative to the conventional SRS ratio estimator of Isaki¹³ at all sample sizes. T3, T4 and T6 (RSS estimators) also outperformed their SRS counterparts, with PREs above 100 at every sample size. Moreover, T6 and T5 (RSS estimators) performed better as the sample size increased, whereas the performance of T3 and T4 (RSS estimators) decreased as the sample size increased, as did that of T7 and T8. T2 is a product estimator and its performance depends on a negative correlation between the variables; this is why its RSS version is not a better choice than the SRS version here. T6 in particular performed better when the set size and the number of cycles were equal.

Figure 1.

PREs of the estimators at different sample sizes; T1, T3, T4, T5, T6, T7 and T8 are shown in red, yellow, green, aqua blue, light blue, purple and pink, respectively.

From this simulation study, we can conclude that the RSS estimators $T_{1}$, $T_{3}$, $T_{4}$ and $T_{6}$ have greater percent relative efficiency than the corresponding SRS estimators. The red line shows that $T_{1}$ (ratio estimator) has the greatest relative efficiency. The $T_{2}$ product estimator is not shown, as product estimators rely on negatively correlated variables, whereas $X$ and $Y$ here are highly positively correlated and drawn from a normal population (Fig. 1). To further evaluate the properties of the suggested estimators, the simulation study was repeated at a lower correlation; the results are given in Table 4 below:

Table 4.

Percent relative efficiencies (PREs) of estimators at rho = 0.51.

PREs
Sample size (SRS &RSS)	9 (n = 3, m = 3)	12 (n = 3, m = 4)	16 (n = 4, m = 4)	20 (n = 4, m = 5)	25 (n = 5, m = 5)
T1	1.04432	1.47021	1.5327	1.5351	1.8300
T2	0.61561	0.64721	0.7642	0.7336	0.7963
T3	349.984	335.604	363.683	528.781	489.1021
T4	1668.446	3682.987	1909.009	1543.30	862.6497
T5	40,140	38,431	31,178	39,007	37,689
T6	115.3035	110.164	107.438	126.624	134.3908
T7	1347.62	1097.36	1364.610	1388.869	1341.826
T8	0.3898	1.51498	7.30963	2.105681	3.09583


Significant values are in bold.

As shown in Table 4, T1 (the proposed RSS estimator) performed better than the conventional SRS ratio estimator. T2 is a product estimator and its performance depends on a negative correlation between the variables; this is why its RSS version is not better than the SRS version here. Overall, most of the RSS estimators outperformed the corresponding SRS estimators. In particular, T5 is the best estimator, as it has the highest relative efficiency.

Discussion

In this study, a ratio estimator and a generalized class of estimators for the population variance are suggested under the RSS design using one auxiliary variable. The mean square errors of the estimators of the population variance under RSS are obtained. We considered a real population as well as simulated data to compare the proposed estimators with the SRS design. It can be clearly observed from Table 2 that the mean square errors under the RSS design are smaller than those under the simple random design in the real-life population; the percent relative efficiencies of the estimators, shown in Tables 3 and 4, are greater than under the SRS design. The PREs of the estimators are calculated from simulated data using different sample sizes. The estimator $T_{1}$, which is a ratio estimator, provides higher percent relative efficiencies for all sample sizes, as can be seen in Tables 3 and 4. The estimator $T_{6}$ is the second-best estimator with respect to percent relative efficiency, as can be seen from Fig. 1. The ratio estimator provides higher percent relative efficiency than the other estimators because, in the generalized class of estimators, when the constants $(κ_{1}, κ_{2})$ take negative values, the behavior of the ratio estimator changes to that of the product estimator. This also affects the efficiency of the estimator when the population is highly positively correlated. Overall, the RSS design estimators proved more efficient in small-sample studies.

Conclusion

The main purpose of this study is to propose a generalized class of estimators for the population variance in RSS using one auxiliary variable and to compare its efficiency with that of the corresponding estimators in the SRS design. We have found that our RSS estimator is the best estimator in practice in situations where the study variable is costly, destructive or hard to obtain. Using the estimators proposed in this study, greater efficiency can be achieved in studies based on small samples, such as those in the biological sciences, medical experimental research, environmental sciences and engineering.

Author contributions

R.A.: Conception, acquisition of data, data analysis, writing (original draft), R-language programming. M.H.: Conception and design, research methodology, supervision. S.H.S.: Data analysis and interpretation, result compilation. M.Q.S.: Review and editing of the draft, supervision of R-language programming, validation.

Data availability

The authors confirm that the data supporting the findings of this study are available within the article and that the programming files will be available on request. Additional information or queries related to this paper may be requested from the corresponding author: Rabail Alam (rabail.alam@yahoo.com, raabail.alam@imbb.uol.edu.pk).

Competing interests

The authors declare no competing interests.

Footnotes

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

1. McIntyre GA. A method for unbiased selective sampling, using ranked sets. Aust. J. Agric. Res. 1952;3:385–390. doi: 10.1071/AR9520385.
2. Stokes SL. Estimation of variance using judgment ordered ranked set samples. Biometrics. 1980:35–42.
3. MacEachern SN, Öztürk Ö, Wolfe DA, Stark GV. A new ranked set sample estimator of variance. J. Roy. Stat. Soc. Ser. B (Stat. Methodol.) 2002;64:177–188. doi: 10.1111/1467-9868.00331.
4. Zamanzade E, Mahdizadeh M. Using ranked set sampling with extreme ranks in estimating the population proportion. Stat. Methods Med. Res. 2020;29:165–177. doi: 10.1177/0962280218823793.
5. Tillé Y. Sampling and Estimation from Finite Populations. Wiley; 2020.
6. Long C, Chen W, Yang R, Yao D. Ratio estimation of the population mean using auxiliary information under the optimal sampling design. Probab. Eng. Inf. Sci. 2022;36(2):449–460. doi: 10.1017/S0269964820000625.
7. Latpate R, Kshirsagar J, Gupta VK, Chandra G. Balanced and unbalanced ranked set sampling. In: Advanced Sampling Methods. Springer; 2021. p. 257–274.
8. Mahdizadeh M, Zamanzade E. Reliability estimation in multistage ranked set sampling. REVSTAT Stat. J. 2017;15(4):565–581.
9. Mahdizadeh M, Arghami NR. Quantile estimation using ranked set samples from a population with known mean. Commun. Stat. Simul. Comput. 2012;41(10):1872–1881. doi: 10.1080/03610918.2011.624236.
10. Al-hadhrami SA. Estimation of the population variance using ranked set sampling with auxiliary variable. Int. J. Contemp. Math. Sci. 2010;52:2567–2576.
11. Das AK, Tripathi TP. Use of auxiliary information in estimating the finite population variance. Sankhya C. 1978;40:139–148.
12. Singh HP, Solanki RS. A new procedure for variance estimation in simple random sampling using auxiliary information. Stat. Pap. 2013;54:479–497. doi: 10.1007/s00362-012-0445-2.
13. Isaki CT. Variance estimation using auxiliary information. J. Am. Stat. Assoc. 1983;78:117–123. doi: 10.1080/01621459.1983.10477939.
