The New Sub-regression Type Estimator in Ranked Set Sampling

Eda Gizem Koçyiğit; Khalid Ul Islam Rather

2023 Feb 28;17(2):27. doi: 10.1007/s42519-023-00324-9

PMCID: PMC9974047 PMID: 36875336

Abstract

In this study, a new sub-regression type estimator for ranked set sampling (RSS) is proposed, based on the idea of the sub-ratio estimator given in Koçyiğit and Kadılar (Commun Stat Theory Methods 1–23, 2022). The mean square error of the proposed unbiased estimator is derived and compared theoretically with those of other estimators. The theoretical results are supported by simulation studies and real-life data applications, which show that the proposed estimator is more efficient than the estimators in the literature. The number of repetitions in RSS is also seen to affect the efficiency of the sub-estimators.

Keywords: Statistical sampling, Ranked set sampling, Mean estimation, Regression type estimator, Sub-ratio estimator

Introduction

In practice, the sample size is often limited by the difficulty of measuring the study variable, and this has made the RSS method popular in recent years. The method was introduced by McIntyre [1] as an alternative to simple random sampling (SRS), has been studied extensively since then, and gives more efficient results than SRS in many fields [2–9].

Recently, Koçyiğit and Kadilar [10] proposed an alternative way to develop mean estimators specifically for the RSS method. In the general RSS procedure, m random sets, each of size m, are chosen from the population with equal probability and without replacement. The units in each set are ranked according to the less sensitive auxiliary variable (X). The smallest unit is chosen from the first set, the second-smallest unit from the second set, and so on, until the unit with the highest rank is chosen from the m-th set. As a result, a sample of m units is selected and the estimation is made from this sample. The procedure therefore yields a randomly selected sample of m² values of X and RSS samples of size m of Y and X.

Ratio, product, and regression type estimators that use auxiliary variable information need population parameters, and these parameters may not always be available. Sub-estimators make it possible to use the mathematical forms of these estimators without any population information: the m² observed values of the auxiliary variable X are used as extra information in the estimator. Their study showed that using the auxiliary variable in the estimator, whether it is ranked by measurement or visually, increases efficiency compared to the classical estimators. The most important advantage of sub-estimators is that they estimate more efficiently than simple estimators without requiring population parameters; a disadvantage is that they are not always more efficient than estimators that use the population parameters of the auxiliary variable.
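The RSS selection steps described above can be sketched in Python for a single cycle. This is a minimal illustration under stated assumptions, not the authors' code (their calculations were done in R): the helper name `rss_cycle`, the toy population, and the seed are all introduced here for illustration.

```python
import random

def rss_cycle(population, m, rng):
    """One RSS cycle of set size m from a list of (x, y) pairs.

    Returns the m selected (x, y) units and the m*m auxiliary values
    observed while ranking. Hypothetical helper, for illustration only.
    """
    # Draw m*m distinct units without replacement and split them into m sets.
    drawn = rng.sample(population, m * m)
    sets = [drawn[k * m:(k + 1) * m] for k in range(m)]
    selected = []
    for i, one_set in enumerate(sets):
        # Rank the set on the auxiliary variable x; keep the (i+1)-th smallest unit.
        ranked = sorted(one_set, key=lambda unit: unit[0])
        selected.append(ranked[i])
    x_pool = [unit[0] for unit in drawn]
    return selected, x_pool

rng = random.Random(1)
# Toy population (assumed): x ~ N(5, 1), y linearly related to x plus noise.
population = []
for _ in range(1000):
    x = rng.gauss(5.0, 1.0)
    population.append((x, 2.0 * x + rng.gauss(0.0, 1.0)))

selected, x_pool = rss_cycle(population, m=3, rng=rng)
print(len(selected), len(x_pool))  # 3 selected units, 9 auxiliary values
```

Only the m selected units need a measurement of the study variable; the m² auxiliary values in `x_pool` are the extra information that the sub-estimators reuse.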

In this study, we propose an efficient alternative sub-regression type estimator, motivated by the idea of sub-ratio estimators. We derive its bias (B) and mean square error (MSE), and then apply it to simulated data and real data sets. We also consider the number of repetitions. Since the set size (m) is generally recommended to be 3, 4, or 5, the required sample size cannot be reached with the set size alone. The cycle is therefore repeated r times, so that a total of mr units are measured. We consider different numbers of cycles and observe the effect of repetition on the estimators.

This study is structured as follows: Sect. 2 presents the estimators in the literature. In Sect. 3, we propose the sub-regression type estimator in RSS and compare the estimators mathematically. Section 4 calculates the relative efficiencies of the estimators and compares them on simulated and real data. Section 5 concludes and suggests future studies.

Estimators in Literature

The fundamental unbiased mean estimator of RSS is

{\hat{μ}}_{1} = \frac{1}{mr} \sum_{j = 1}^{r} \sum_{i = 1}^{m} y_{[i] ; j} \qquad (1)

where $y_{[i] ; j}$ is the observation from the i-th set in the j-th cycle. The MSE of the RSS mean estimator is given as:

MSE ({\hat{μ}}_{1}) = {\bar{Y}}^{2} (γ C_{y}^{2} - ω_{y}^{2}) \qquad (2)

where γ = 1/mr, $C_{y}^{2}$ is the squared coefficient of variation of the study variable, $ω_{y}^{2} = \frac{1}{m^{2} r {\bar{Y}}^{2}} \sum_{i = 1}^{m} τ_{y (i)}^{2}$ (also written $ω_{y (i)}^{2}$ below), $τ_{y (i)} = μ_{y (i)} - \bar{Y}$ is the deviation of the i-th rank mean from the population mean, and $μ_{y (i)}$ is the mean of the i-th order statistic of the study variable.

Samawi and Muttlak [11] defined the basic ratio estimator of RSS with auxiliary variable X as

{\hat{μ}}_{2} = \frac{{\bar{y}}_{[n]}}{{\bar{x}}_{[n]}} \bar{X} \qquad (3)

where ${\bar{y}}_{[n]} = \frac{1}{mn} \sum_{i = 1}^{m} \sum_{j = 1}^{n} y_{[n] i j}$ and ${\bar{x}}_{[n]} = \frac{1}{mn} \sum_{i = 1}^{m} \sum_{j = 1}^{n} x_{[n] i j}$ . The MSE of this ratio estimator is

MSE ({\hat{μ}}_{2}) = {\bar{Y}}^{2} [γ C_{x}^{2} - ω_{x (i)}^{2} + γ C_{y}^{2} - ω_{y (i)}^{2} - 2 (γ C_{yx} - ω_{y x (i)})] \qquad (4)

where $ω_{x (i)}^{2} = \frac{1}{m^{2} r {\bar{X}}^{2}} \sum_{i = 1}^{m} τ_{x (i)}^{2}$ , $ω_{y x (i)} = \frac{1}{m^{2} r \bar{Y} \bar{X}} \sum_{i = 1}^{m} τ_{y x (i)}$ , $τ_{x (i)} = μ_{x (i)} - \bar{X}$ , and $τ_{y x (i)} = [μ_{y (i)} - \bar{Y}] [μ_{x (i)} - \bar{X}]$ is the cross product of the deviations. Note that $μ_{x (i)}$ is the mean of the i-th order statistic of the auxiliary variable and, like $μ_{y (i)}$, depends on the order statistics of the underlying distribution.

Koçyiğit and Kadılar’s RSS sub-ratio estimator [10] is defined as:

{\hat{μ}}_{3} = \frac{{\bar{y}}_{[n]}}{{\bar{x}}_{[n]}} {\bar{X}}_{SUB} \qquad (5)

where ${\bar{X}}_{SUB} = \frac{1}{m^{2}} \sum_{i = 1}^{m} \sum_{j = 1}^{m} X_{i : j}$ , and the MSE of the estimator is

MSE ({\hat{μ}}_{3}) = {\bar{Y}}^{2} [γ C_{x_{SUB}}^{2} - ω_{x_{SUB} (i)}^{2} + γ C_{y}^{2} - ω_{y (i)}^{2} - 2 (γ C_{y x_{SUB}} - ω_{y x_{SUB} (i)})] \qquad (6)

where $C_{x_{SUB}}^{2} = \frac{S_{x_{SUB}}^{2}}{{\bar{X}}_{SUB}^{2}}$ is the squared coefficient of variation of X_SUB, $C_{y x_{SUB}} = ρ C_{y} C_{x_{SUB}}$ , $ω_{x_{SUB} (i)}^{2} = \frac{1}{m^{2} r {\bar{X}}_{SUB}^{2}} \sum_{i = 1}^{m} τ_{x_{SUB} (i)}^{2}$ , and $ω_{y x_{SUB} (i)} = \frac{1}{m^{2} r \bar{Y} {\bar{X}}_{SUB}} \sum_{i = 1}^{m} τ_{y x_{SUB} (i)}$ .

Koçyiğit and Kadılar’s RSS sub-exponential ratio type estimator is given as

{\hat{μ}}_{4} = {\bar{y}}_{[n]} \exp \left( \frac{{\bar{X}}_{SUB} - {\bar{x}}_{[n]}}{{\bar{X}}_{SUB} + {\bar{x}}_{[n]}} \right) \qquad (7)

The MSE of the sub-exponential ratio estimator is

MSE ({\hat{μ}}_{4}) = {\bar{Y}}^{2} [γ C_{x_{SUB}}^{2} - ω_{x_{SUB} (i)}^{2} + \frac{1}{4} (γ C_{y}^{2} - ω_{y (i)}^{2}) - (γ C_{y x_{SUB}} - ω_{y x_{SUB} (i)})] \qquad (8)
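As a small worked illustration of the point estimators in this section, the sketch below evaluates the RSS mean, the basic ratio estimator, the sub-ratio estimator, and the sub-exponential ratio estimator on toy numbers. The data, the function name `literature_estimators`, and the value 3.5 used for the population mean of X are assumptions made for the example; they do not come from the paper.

```python
import math

def mean(values):
    return sum(values) / len(values)

def literature_estimators(y_rss, x_rss, x_pool, x_pop_mean):
    """Point estimates of the four estimators reviewed in this section."""
    y_bar = mean(y_rss)    # RSS mean of the study variable
    x_bar = mean(x_rss)    # RSS mean of the auxiliary variable
    x_sub = mean(x_pool)   # X-bar_SUB: mean of all auxiliary values seen while ranking
    mu1 = y_bar                                                 # RSS mean estimator
    mu2 = y_bar / x_bar * x_pop_mean                            # needs the population mean of X
    mu3 = y_bar / x_bar * x_sub                                 # sub-ratio: no population parameter
    mu4 = y_bar * math.exp((x_sub - x_bar) / (x_sub + x_bar))   # sub-exponential ratio
    return mu1, mu2, mu3, mu4

# Toy data (assumed): three sets of size m = 3, one cycle.
x_sets = [[2.0, 4.0, 6.0], [1.0, 3.0, 5.0], [2.0, 3.0, 4.0]]
x_pool = [x for one_set in x_sets for x in one_set]
x_rss = [sorted(one_set)[i] for i, one_set in enumerate(x_sets)]  # 2.0, 3.0, 4.0
y_rss = [4.0, 6.0, 8.0]   # study variable, measured only on the selected units

mu1, mu2, mu3, mu4 = literature_estimators(y_rss, x_rss, x_pool, x_pop_mean=3.5)
print(mu1, mu2, mu3, mu4)
```

The only difference between the basic ratio estimator and the sub-ratio estimator in the code is the last factor: a population mean that must be known in advance versus the mean of the auxiliary values already observed during ranking.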

Proposed Estimator

Motivated by Koçyiğit and Kadılar [10] and the traditional regression type estimator, we propose the RSS sub-regression type estimator

{\hat{μ}}_{5} = {\bar{y}}_{[n]} + b ({\bar{X}}_{SUB} - {\bar{x}}_{[n]}) \qquad (9)

where b = ρ S_y / S_x is the slope coefficient, S_y and S_x are the standard deviations of Y and X, respectively, and ρ is the correlation coefficient between X and Y. Note that the slope coefficient should be estimated from the sample as $\hat{b} = \frac{ρ s_{y}}{s_{x_{SUB}}}$ , where s_y is the standard deviation of the RSS sample of y and $s_{x_{SUB}} = \sqrt{\frac{1}{m^{2} - 1} \sum_{i = 1}^{m} \sum_{j = 1}^{m} {(X_{i : j} - {\bar{X}}_{SUB})}^{2}}$ . The proposed estimator thus uses the mathematical form of the regular regression type estimator with only the information obtained from RSS, without the need for any population parameters.
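The proposed estimator can be sketched in a few lines of Python. The toy data and the function name `sub_regression_estimate` are assumptions made for illustration, and `rho` is passed in as a known or separately estimated correlation, since the text writes ρ in the sample slope. The standard deviation of the pooled auxiliary values is computed with the usual sum of squared deviations and an n − 1 denominator.

```python
import math
import statistics

def sub_regression_estimate(y_rss, x_rss, x_pool, rho):
    """Sub-regression type estimate: y_bar + b_hat * (x_sub - x_bar)."""
    y_bar = statistics.fmean(y_rss)
    x_bar = statistics.fmean(x_rss)
    x_sub = statistics.fmean(x_pool)
    # Sample standard deviation of the pooled auxiliary values (n - 1 denominator).
    s_x_sub = math.sqrt(sum((x - x_sub) ** 2 for x in x_pool) / (len(x_pool) - 1))
    s_y = statistics.stdev(y_rss)          # standard deviation of the RSS sample of y
    b_hat = rho * s_y / s_x_sub            # sample slope; rho is an assumed input
    return y_bar + b_hat * (x_sub - x_bar)

# Toy data (assumed): m = 3, one cycle; x_pool holds all nine ranked auxiliary values.
x_pool = [2.0, 4.0, 6.0, 1.0, 3.0, 5.0, 2.0, 3.0, 4.0]
x_rss = [2.0, 3.0, 4.0]
y_rss = [4.0, 6.0, 8.0]

mu5 = sub_regression_estimate(y_rss, x_rss, x_pool, rho=0.8)
print(mu5)
```

Because the pooled auxiliary mean (10/3) is larger than the RSS auxiliary mean (3), the estimate is adjusted upward from the plain RSS mean of 6.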

To obtain the bias and MSE of the proposed estimator, we write ${\bar{y}}_{[n]} = \bar{Y} (e_{0} + 1)$ and ${\bar{x}}_{[n]} = {\bar{X}}_{SUB} (e_{1} + 1)$ in Eq. (9), such that $E (e_{0}) = E (e_{1}) = 0$ , $E (e_{0}^{2}) = γ C_{y}^{2} - ω_{y}^{2}$ , $E (e_{1}^{2}) = γ C_{x_{SUB}}^{2} - ω_{x_{SUB}}^{2}$ , and $E (e_{0} e_{1}) = γ C_{y x_{SUB}} - ω_{y x_{SUB}}$ . We obtain the sub-regression type estimator in linear form as:

{\hat{μ}}_{5} = \bar{Y} + \bar{Y} e_{0} - b {\bar{X}}_{SUB} e_{1} \qquad (10)

Subtracting $\bar{Y}$ from both sides of Eq. (10) and taking expectations gives

E ({\hat{μ}}_{5} - \bar{Y}) = \bar{Y} E (e_{0}) - b {\bar{X}}_{SUB} E (e_{1}) = 0 \qquad (11)

which shows that the proposed sub-regression type estimator is unbiased. Squaring the deviation in Eq. (11) and taking the expectation, we get the MSE of the proposed estimator as

MSE ({\hat{μ}}_{5}) = R_{SUB}^{2} (γ C_{y}^{2} - ω_{y}^{2}) + b^{2} (γ C_{x_{SUB}}^{2} - ω_{x_{SUB}}^{2}) - 2 b R_{SUB} (γ C_{y x_{SUB}} - ω_{y x_{SUB}}) \qquad (12)

where $R_{SUB} = \frac{\bar{Y}}{{\bar{X}}_{SUB}}$ . To find the optimal value of the slope coefficient b, Eq. (12) is differentiated with respect to b and the derivative is set to zero, which gives

b^{*} = R_{SUB} \frac{γ C_{y x_{SUB}} - ω_{y x_{SUB}}}{γ C_{x_{SUB}}^{2} - ω_{x_{SUB}}^{2}} \qquad (13)

Substituting Eq. (13) into Eq. (12), we get the minimum MSE of the unbiased sub-regression type estimator as

{MSE}_{min} ({\hat{μ}}_{5}) = R_{SUB}^{2} [γ C_{y}^{2} - ω_{y}^{2} - \frac{{(γ C_{y x_{SUB}} - ω_{y x_{SUB}})}^{2}}{γ C_{x_{SUB}}^{2} - ω_{x_{SUB}}^{2}}] \qquad (14)
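The minimization step can be checked numerically. The sketch below takes the MSE expression of the proposed estimator exactly as written above, treats it as a quadratic in b, and confirms on a grid that the closed-form optimal slope attains the stated minimum. The numerical values and the names `mse_mu5`, `a`, `b_x`, and `c` are illustrative assumptions, not quantities from the paper.

```python
def mse_mu5(b, r_sub, a, b_x, c):
    """MSE of the sub-regression estimator as written in the text, where
    a   = gamma*C_y^2 - omega_y^2,
    b_x = gamma*C_xSUB^2 - omega_xSUB^2,
    c   = gamma*C_yxSUB - omega_yxSUB."""
    return r_sub ** 2 * a + b ** 2 * b_x - 2.0 * b * r_sub * c

# Illustrative values only (assumed); they satisfy c^2 <= a * b_x.
r_sub, a, b_x, c = 0.5, 0.04, 0.09, 0.05

b_star = r_sub * c / b_x                       # closed-form optimal slope
mse_min = r_sub ** 2 * (a - c ** 2 / b_x)      # closed-form minimum MSE

# Grid search around b_star: the quadratic should be smallest at b_star itself.
grid = [b_star + k * 0.001 for k in range(-200, 201)]
best_b = min(grid, key=lambda b: mse_mu5(b, r_sub, a, b_x, c))

print(best_b == b_star)
print(abs(mse_mu5(b_star, r_sub, a, b_x, c) - mse_min) < 1e-12)
```

The check works because the derivative of the quadratic with respect to b is 2 b (γ C²_xSUB − ω²_xSUB) − 2 R_SUB (γ C_yxSUB − ω_yxSUB), which vanishes at the closed-form slope.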

The minimum MSE of the proposed estimator is compared with the MSEs of the other estimators under the following conditions.

If $(γ C_{y}^{2} - ω_{y}^{2}) < \frac{{(γ C_{y x_{SUB}} - ω_{y x_{SUB}})}^{2}}{(1 - {\bar{X}}_{SUB}) (γ C_{x_{SUB}}^{2} - ω_{x_{SUB}}^{2})}$ , then ${MSE}_{min} ({\hat{μ}}_{5}) < MSE ({\hat{μ}}_{1})$ . (15)

If $(γ C_{y}^{2} - ω_{y (i)}^{2}) - (γ C_{x}^{2} - ω_{x (i)}^{2}) + 2 (γ C_{yx} - ω_{y x (i)}) < \frac{{(γ C_{y x_{SUB}} - ω_{y x_{SUB}})}^{2}}{1 - {\bar{X}}_{SUB} (γ C_{x_{SUB}}^{2} - ω_{x_{SUB}}^{2})}$ , then ${MSE}_{min} ({\hat{μ}}_{5}) < MSE ({\hat{μ}}_{2})$ . (16)

If $(1 - {\bar{X}}_{SUB}) (γ C_{x_{SUB}}^{2} - ω_{x_{SUB}}^{2}) (γ C_{y}^{2} - ω_{y}^{2}) - {(γ C_{y x_{SUB}} - ω_{y x_{SUB}})}^{2} < {\bar{X}}_{SUB} (γ C_{x_{SUB}}^{2} - ω_{x_{SUB}}^{2}) [(γ C_{x_{SUB}}^{2} - ω_{x_{SUB}}^{2}) - 2 (γ C_{y x_{SUB}} - ω_{y x_{SUB} (i)})]$ , then ${MSE}_{min} ({\hat{μ}}_{5}) < MSE ({\hat{μ}}_{3})$ . (17)

If $\frac{3}{4} {\bar{X}}_{SUB} (γ C_{x_{SUB}}^{2} - ω_{x_{SUB}}^{2}) (γ C_{y}^{2} - ω_{y}^{2}) - {(γ C_{y x_{SUB}} - ω_{y x_{SUB}})}^{2} < {\bar{X}}_{SUB} (γ C_{x_{SUB}}^{2} - ω_{x_{SUB}}^{2}) [(γ C_{x_{SUB}}^{2} - ω_{x_{SUB}}^{2}) - (γ C_{y x_{SUB}} - ω_{y x_{SUB} (i)})]$ , then ${MSE}_{min} ({\hat{μ}}_{5}) < MSE ({\hat{μ}}_{4})$ . (18)

If the conditions in Eqs. (15) to (18) are met, the proposed unbiased sub-regression type estimator is more efficient than the mean estimator, the basic ratio estimator, the sub-ratio estimator, and the sub-exponential ratio estimator of RSS, respectively.

Numerical Studies

In this section, the MSEs of the estimators are computed numerically, first from samples drawn from populations generated from various distributions, and then from samples drawn from real data sets treated as populations. All calculations in this section were performed in R.

Simulation Studies

To account for skewness, finite populations of size N = 1000 were generated in the simulation study from bivariate N(5,1) for a symmetric distribution, bivariate Exp(1) for a right-skewed distribution, and bivariate Beta(4,1) for a left-skewed distribution. The correlation coefficient between the study and auxiliary variables was set to ρ = 0.65, 0.75, 0.85, and 0.95. From these populations, samples were selected 10 000 times with set sizes m = 3, 4, and 5 and numbers of cycles r = 2, 3, 5, and 10, and the values of the estimators were calculated. The MSE values of the estimators were calculated via Eq. (19).

MSE ({\hat{μ}}_{i}) = \frac{1}{10000} \sum_{j = 1}^{10000} {({\hat{μ}}_{ij} - \bar{Y})}^{2}, \quad i = 1, 2, 3, 4, 5 \qquad (19)

where ${\hat{μ}}_{ij}$ is the value of the i-th estimator in the j-th replication. For the calculation of relative efficiency (RE), ${\hat{μ}}_{1}$ was taken as the reference estimator and RE was calculated using the formula in Eq. (20).

{RE}_{k} = \frac{MSE ({\hat{μ}}_{1})}{MSE ({\hat{μ}}_{k})} ; \quad k = 2, 3, 4, 5 \qquad (20)
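The Monte Carlo MSE and RE computations can be sketched as follows for the normal case. This is an illustration under stated assumptions rather than a reproduction of the authors' R code: the function names, the seed, the reduced number of replications, the independent drawing of each cycle, the pooling of all m²r auxiliary values into the sub-mean, and the use of the known ρ in the sample slope are all choices made here. Only the sub-ratio and sub-regression estimators are compared with the RSS mean.

```python
import math
import random
import statistics

def make_population(n_units, rho, rng):
    """Finite population of (x, y) pairs, both N(5, 1) with correlation rho."""
    pop = []
    for _ in range(n_units):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        pop.append((5.0 + z1, 5.0 + rho * z1 + math.sqrt(1.0 - rho ** 2) * z2))
    return pop

def rss_sample(pop, m, r, rng):
    """Return the m*r selected units and all m*m*r auxiliary values."""
    measured, x_pool = [], []
    for _ in range(r):  # each cycle is drawn independently (assumption)
        drawn = rng.sample(pop, m * m)
        x_pool.extend(unit[0] for unit in drawn)
        for i in range(m):
            ranked = sorted(drawn[i * m:(i + 1) * m], key=lambda unit: unit[0])
            measured.append(ranked[i])
    return measured, x_pool

def mc_mse(estimates, target):
    """Monte Carlo MSE: average squared deviation from the population mean."""
    return sum((e - target) ** 2 for e in estimates) / len(estimates)

def relative_efficiencies(m, r, rho, reps, seed):
    rng = random.Random(seed)
    pop = make_population(1000, rho, rng)
    y_true = statistics.fmean(unit[1] for unit in pop)
    est = {1: [], 3: [], 5: []}
    for _ in range(reps):
        measured, x_pool = rss_sample(pop, m, r, rng)
        ys = [unit[1] for unit in measured]
        y_bar = statistics.fmean(ys)
        x_bar = statistics.fmean(unit[0] for unit in measured)
        x_sub = statistics.fmean(x_pool)
        b_hat = rho * statistics.stdev(ys) / statistics.stdev(x_pool)
        est[1].append(y_bar)                             # RSS mean
        est[3].append(y_bar / x_bar * x_sub)             # sub-ratio
        est[5].append(y_bar + b_hat * (x_sub - x_bar))   # sub-regression
    mse = {k: mc_mse(v, y_true) for k, v in est.items()}
    return {k: mse[1] / mse[k] for k in (3, 5)}          # RE relative to the RSS mean

re_values = relative_efficiencies(m=3, r=2, rho=0.85, reps=2000, seed=7)
print(re_values)
```

An RE above 1 means the estimator has a smaller Monte Carlo MSE than the RSS mean. The printed values depend on the seed and on the assumptions listed above, so they should not be expected to match the tables digit for digit.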

The results obtained from the normal distribution are given in Table 1, those from the beta distribution in Table 2, and those from the exponential distribution in Table 3. The most efficient estimators are marked with “*” in the tables.

Table 1.

RE values for N(5,1)

N(5,1)	ρ	0.65			0.75			0.85			0.95
N(5,1)	m	3	4	5	3	4	5	3	4	5	3	4	5
r = 2	RE₂	1.2627	1.2339	1.1923	1.6283	1.5185	1.4447	2.3104	2.0832	1.9136	5.7903	4.8826	4.3668
	RE₃	1.1029	1.1072	1.0971	1.1702	1.1758	1.1639	1.2617*	1.2740*	1.2748*	1.4404*	1.5045*	1.5444*
	RE₄	1.1184	1.1192	1.1098	1.1616	1.1656	1.1571	1.2147	1.2242	1.2250	1.3107	1.3549	1.3756
	RE₅	1.1204*	1.1225*	1.1226*	1.1771*	1.1824*	1.1818*	1.2341	1.2601	1.2633	1.4147	1.4932	1.5423
r = 3	RE₂	1.2608	1.2220	1.2058	1.6515	1.5255	1.4574	2.3084	2.0473	1.9120	5.7317	4.9326	4.3474
	RE₃	1.0983	1.1017	1.1027	1.1705	1.1711	1.1718	1.2607*	1.2737	1.2742*	1.4436*	1.5147*	1.5528
	RE₄	1.1158*	1.1159	1.1126	1.1606	1.1628	1.1601	1.2136	1.2246	1.2249	1.3122	1.3572	1.3805
	RE₅	1.1251*	1.1248*	1.1223*	1.1767*	1.1892*	1.1891*	1.2579	1.2739*	1.2704	1.4257	1.4965	1.5555*
r = 5	RE₂	1.2682	1.2263	1.2137	1.6354	1.5286	1.4544	2.3227	2.0847	1.9304	5.7676	4.8976	4.3012
	RE₃	1.1048	1.1024	1.1088	1.1763	1.1783	1.1755	1.2604	1.2753	1.2687	1.4352*	1.5161	1.5400
	RE₄	1.1197	1.1164	1.1153	1.1643	1.1663	1.1625	1.2135	1.2245	1.2206	1.3104	1.3580	1.3759
	RE₅	1.1266*	1.1283*	1.1264*	1.1771*	1.1861*	1.1856*	1.2628*	1.2856*	1.2782*	1.4406*	1.5177*	1.5622*
r = 10	RE₂	1.2716	1.2121	1.2041	1.6603	1.5307	1.4721	2.3000	2.0705	1.9197	5.8508	4.9097	4.3159
	RE₃	1.1097	1.1024	1.1025	1.1711	1.1754	1.1693	1.2588	1.2765	1.2705	1.4448*	1.5006	1.5543
	RE₄	1.1208	1.1119	1.1118	1.1612	1.1652	1.1590	1.2133	1.2261	1.2232	1.3134	1.3533	1.3853
	RE₅	1.1286*	1.1272*	1.1252*	1.1881*	1.1919*	1.1887*	1.2635*	1.2863*	1.2825*	1.4464*	1.5110*	1.5556*

Table 2.

RE values for Beta(4,1)

Beta(4,1)	ρ	0.65			0.75			0.85			0.95
Beta(4,1)	m	3	4	5	3	4	5	3	4	5	3	4	5
r = 2	RE₂	1.1781	1.1597	1.1396	1.4683	1.4016	1.3325	2.2908	2.0794	1.9323	5.8830	5.0555	4.5298
	RE₃	1.0530	1.0494	1.0486	1.1213	1.1243	1.1219	1.2751*	1.2897*	1.2945*	1.4857*	1.5496*	1.5935*
	RE₄	1.1045*	1.1014*	1.0958	1.1482*	1.1522*	1.1471	1.2318	1.2453	1.2475	1.3514	1.3950	1.4247
	RE₅	1.0951	1.1005	1.0979*	1.1392	1.1501	1.1531*	1.2419	1.2667	1.2759	1.3878	1.4603	1.5111
r = 3	RE₂	1.1994	1.1558	1.1321	1.4851	1.4076	1.3529	2.3121	2.1056	1.9269	5.9013	5.1002	4.5065
	RE₃	1.0577	1.0450	1.0457	1.1264	1.1213	1.1247	1.2819*	1.2927*	1.2881*	1.4851*	1.5527*	1.6124*
	RE₄	1.1053*	1.0980	1.0940	1.1507	1.1492	1.1474	1.2351	1.2451	1.2435	1.3518	1.3972	1.4318
	RE₅	1.0978	1.0989*	1.0965*	1.1533*	1.1550*	1.1510*	1.2527	1.2750	1.2864	1.4125	1.4944	1.5486
r = 5	RE₂	1.2103	1.1784	1.1422	1.4898	1.4121	1.3599	2.3172	2.0814	1.9366	5.9152	5.0786	4.5167
	RE₃	1.0555	1.0597	1.0463	1.1287	1.1274	1.1215	1.2719	1.2905	1.2931	1.4772*	1.5559*	1.5971*
	RE₄	1.1044	1.1048	1.0935	1.1524	1.1528	1.1458	1.2313	1.2449	1.2469	1.3484	1.3984	1.4261
	RE₅	1.1052*	1.1051*	1.0947*	1.1555*	1.1624*	1.1551*	1.2732*	1.2955*	1.3007*	1.4417	1.5097	1.5762
r = 10	RE₂	1.2039	1.1697	1.146	1.4489	1.3756	1.3229	2.3378	2.0967	1.9444	6.0444	5.1824	4.5355
	RE₃	1.0557	1.0543	1.0477	1.1323	1.1337	1.1268	1.2824*	1.3006	1.2907	1.4855*	1.5736*	1.6071*
	RE₄	1.1051	1.1026	1.0943	1.1523	1.1536	1.1464	1.2362	1.2509	1.2457	1.3504	1.4077	1.4303
	RE₅	1.1079*	1.1060*	1.0982*	1.1657*	1.1651*	1.1564*	1.2802	1.3035*	1.3055*	1.4746	1.5393	1.5733

Table 3.

RE values for Exp(1)

Exp(1)	ρ	0.65			0.75			0.85			0.95
Exp(1)	m	3	4	5	3	4	5	3	4	5	3	4	5
r = 2	RE₂	1.0786	1.1094	1.1129	1.2959	1.3110	1.2885	2.2484	2.0902	1.9593	5.9081	5.4249	5.0056
	RE₃	1.0153	1.0395	1.0448	1.0904	1.1030	1.1143	1.3036	1.3443	1.3454	1.6272*	1.7647*	1.8624*
	RE₄	1.1242	1.1309	1.1292	1.1894	1.1952	1.1977	1.2821	1.3088	1.3128	1.4392	1.5168	1.5679
	RE₅	1.1696*	1.1588*	1.1489*	1.2175*	1.2209*	1.2024*	1.4066*	1.4434*	1.4426*	1.5779	1.6271	1.6933
r = 3	RE₂	1.1372	1.1417	1.1434	1.3832	1.3538	1.3214	2.2862	2.0971	1.9837	6.1777	5.5767	5.1022
	RE₃	1.0392	1.0554	1.0615	1.1085	1.12056	1.1254	1.3025	1.3452	1.3542	1.6394*	1.7726*	1.8694*
	RE₄	1.1279	1.1351	1.1336	1.1899	1.1992	1.1997	1.2762	1.3072	1.3141	1.4381	1.5162	1.5653
	RE₅	1.1548*	1.1479*	1.1419*	1.2110*	1.2156*	1.2050*	1.4070*	1.4290*	1.4437*	1.5841	1.6719	1.7119
r = 5	RE₂	1.1818	1.1821	1.1658	1.4297	1.3926	1.3445	2.3543	2.1488	1.9978	6.4302	5.6945	5.1263
	RE₃	1.0612	1.0740	1.0734	1.1243	1.1368	1.1403	1.3241	1.3479	1.3542	1.6466*	1.7763*	1.8658*
	RE₄	1.1329	1.1403	1.1369	1.1936	1.2064	1.2039	1.2822	1.3069	1.3105	1.4373	1.5131	1.5609
	RE₅	1.1516*	1.1476*	1.1393*	1.2031*	1.2160*	1.2079*	1.3946*	1.4103*	1.4423*	1.5865	1.6956	1.7738
r = 10	RE₂	1.2266	1.2099	1.1852	1.5469	1.4742	1.3962	2.3719	2.1293	2.1341	6.4872	5.7567	5.2545
	RE₃	1.0791	1.086	1.0883	1.1679	1.1746	1.1571	1.3184	1.3470	1.4312	1.6441*	1.7873*	1.8764*
	RE₄	1.1391	1.1440	1.1431	1.1878	1.1945	1.1862	1.2774	1.3013	1.3354	1.4342	1.5137	1.5608
	RE₅	1.1458*	1.1521*	1.1463*	1.2084*	1.2053*	1.2074*	1.3859*	1.4141*	1.4388*	1.6073	1.7255	1.8049

Under the normal distribution, the proposed estimator is more efficient than the other estimators in the literature when the correlation is 0.65 or 0.75. The sub-ratio type estimator is superior when the correlation is 0.85 or above and the number of repetitions is low. However, as the number of repetitions increases, the proposed estimator becomes the most efficient for each correlation and set size. A similar situation holds for the beta distribution: as the number of repetitions increases, the proposed estimator approaches its most efficient position. On the other hand, the sub-exponential type estimator gives good results at low correlation and low numbers of repetitions. Under the exponential distribution, the proposed estimator gives the best results under all conditions except when the correlation is very high. The changes in the RE of the proposed estimator with set size, correlation, distribution, and number of repetitions are shown in Figs. 1 and 2.

Fig. 1 — Change of RE of the proposed sub-regression type estimator with cycle under N(5,1), and ρ = 0.65, 0.75 and 0.85

Fig. 2 — Change of RE of the proposed sub-regression type estimator with cycle under skewed distributions for ρ = 0.65, 0.75 and 0.85

Real Data Studies

In this part, we consider two groups of real-life data sets to assess the efficiency of the proposed estimator. First, the European and Global COVID-19 data sets in Koçyiğit and Kadılar [10] were used. The study variable and the auxiliary variable were taken as the number of cases and the number of deaths from the disease, respectively. MSE and RE values were calculated by Eqs. (19) and (20), and the RE results are given in Table 4.

Table 4.

RE values for COVID-19 data sets

Set Size	European COVID-19 Data N = 54, ρ = 0.81				Global COVID-19 Data N = 212, ρ = 0.92
Set Size	Cycle	2	3	5	2	3	5	10
m = 3	RE₂	1.2599	1.4679	1.8492	NaN	2.1774	2.2526	2.0065
	RE₃	1.2775	1.4474	1.5884	NaN	0.6288	0.7411	0.848
	RE₄	1.7102	1.6436	1.5685	1.9076	1.8702	1.8616	1.8789
	RE₅	3.0708*	2.5582*	2.1070*	3.1718*	2.8152*	2.5554*	2.2369*
m = 4	RE₂	1.4732	1.7269	2.1486	2.1312	2.2597	2.1354	1.7975
	RE₃	1.4976	1.6433	1.8339	0.7047	0.8058	0.8688	0.9589
	RE₄	1.8108	1.7233	1.6432	2.0976	2.0865*	2.1013*	2.2038*
	RE₅	2.9486*	2.4806*	2.0860*	2.0999*	2.0341	1.9232	1.8363
m = 5	RE₂	1.7427	2.0084	2.3099	2.1775	2.1701	1.9844	1.6338
	RE₃	1.7233	1.8924	2.0369	0.8332	0.9289	0.9980	1.0295
	RE₄	1.8798	1.7794	1.6953	2.2440*	2.2858*	2.3714*	2.5067*
	RE₅	2.7978*	2.4391*	2.1021*	1.4165	1.4448	1.4526	1.5415

Since the set size is m = 3, 4, and 5, the numbers of cycles were chosen according to the sizes of the populations. In Koçyiğit and Kadılar [10], it was observed that the basic ratio estimator could not be calculated with a single repetition but could be calculated when the number of repetitions was increased. The proposed estimator is the best in the European COVID-19 data set, and it performs better in the Global COVID-19 data set when the set size is small. The fluctuations in the efficiency of the proposed estimator for these real data sets can be seen in the graphs in the top row of Fig. 3.

As the second group of real data sets, the apple production data sets of the Marmara and Aegean regions [12, 13] were used. Apple production in tons is taken as the study variable and the number of trees in the villages as the auxiliary variable. The population parameters of the data sets are given in Table 5. MSE and RE values were calculated with Eqs. (19) and (20), and the RE results are given in Table 6.

Table 5.

Population parameters of apple production data sets

Marmara Region (N = 106)
$\bar{Y}$ = 1536.77	$\bar{X}$ = 24,375.59	ρ = 0.81	S_x = 49,189.083	S_y = 6425.087	S_yx = 257,778,692
Aegean Region (N = 106)
$\bar{Y}$ = 2212.59	$\bar{X}$ = 27,421.698	ρ = 0.86	S_x = 57,460.615	S_y = 11,551.53	S_yx = 568,176,176
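As a quick consistency check on the tabulated population parameters, the correlation implied by the covariance and standard deviations can be recomputed, together with the population slope b = ρ S_y / S_x used by regression type estimators. The sketch assumes that S_yx denotes the population covariance of Y and X; the dictionary layout and variable names are introduced here for illustration.

```python
# Values copied from Table 5 (thousands separators removed).
table5 = {
    "Marmara": {"S_x": 49189.083, "S_y": 6425.087, "S_yx": 257778692.0, "rho": 0.81},
    "Aegean": {"S_x": 57460.615, "S_y": 11551.53, "S_yx": 568176176.0, "rho": 0.86},
}

derived = {}
for region, p in table5.items():
    # Correlation implied by the covariance (assumes S_yx is the covariance).
    rho_check = p["S_yx"] / (p["S_x"] * p["S_y"])
    # Population slope b = rho * S_y / S_x, using the tabulated rho.
    slope = p["rho"] * p["S_y"] / p["S_x"]
    derived[region] = (rho_check, slope)
    print(region, round(rho_check, 3), round(slope, 4))
```

Under that assumption, the implied correlations agree with the tabulated ρ values to within 0.01 for both regions.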

Table 6.

RE values for apple production data sets

Cycle	Data sets	Marmara Region			Aegean Region
Cycle	m	3	4	5	3	4	5
r = 2	RE₂	8.3269	6.1347	4.8965	11.149	8.0622	6.3021
	RE₃	2.9537	3.1268	3.0927	3.5092	3.8499	3.8791
	RE₄	1.7577	1.7980	1.7800	1.8841	1.9667	1.9683
	RE₅	4.4504*	4.1553*	3.5785*	7.3676*	6.0436*	4.3664*
r = 3	RE₂	5.8070	4.3739	3.6119	7.4175	5.5183	4.4874
	RE₃	2.5043	2.5811	2.5482	2.9559	3.1164	3.0869
	RE₄	1.6213	1.6379	1.6199	1.7383	1.7797	1.7683
	RE₅	3.4495*	3.2113*	2.9318*	5.2252*	4.5247*	3.7172*
r = 5	RE₂	3.9536	3.2055	2.7996	4.7610	3.7455	3.2052
	RE₃	2.1045	2.1569	2.1378	2.4133	2.4579	2.4247
	RE₄	1.4857	1.5011	1.4892	1.5751	1.5870	1.5723
	RE₅	2.6400*	2.4772*	2.3497*	3.7337*	3.3020*	3.0374*
r = 10	RE₂	2.7630	2.4695	2.3042	3.0521	2.6721	2.4232
	RE₃	1.7653	1.8394	1.8613	1.9140	1.9776	1.9736
	RE₄	1.3610	1.3859	1.3899	1.4056	1.4243	1.4186
	RE₅	1.9679*	1.9639*	1.9530*	2.5075*	2.3963*	2.3635*

The apple production data sets of the Marmara and Aegean regions are equal in population size (N_Marmara = N_Aegean = 106) and similar in the correlation coefficient between the study and auxiliary variables (ρ_Marmara = 0.81, ρ_Aegean = 0.86). The proposed estimator gave the most efficient results for both data sets. It can be seen from the bottom row of Fig. 3 that, as the number of repetitions increases, the efficiency of the estimator decreases for each set size.

Fig. 3 — Change of RE of the proposed sub-regression type estimator in real data sets

Discussion

In this study, a new unbiased sub-regression type estimator is proposed. After its MSE is obtained, it is compared theoretically with other estimators in the literature, and the conditions under which it is more efficient are specified.

It has been observed that when the number of cycles is increased to meet the required sample size, ratio estimators, which cannot be calculated for data containing zeros, become calculable.

The simulation study shows that, at high numbers of repetitions, the proposed estimator gives better results than the other sub-estimators in most settings; the main exception is a very high correlation under the skewed distributions. Results from the real data studies also support this finding.

For future studies, more efficient estimators can be developed with the sub-estimator idea, and the effect of the combinations of set size and number of cycles needed to meet the sample size on the efficiency of the estimators can be examined.

Declarations

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Footnotes

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

1. McIntyre GA. A method for unbiased selective sampling using ranked sets. Aust J Agric Res. 1952;3:385–390. doi: 10.1071/AR9520385.
2. Bhushan S, Kumar A. Novel log type class of estimators under ranked set sampling. Sankhya B. 2022;84(1):421–447. doi: 10.1007/s13571-021-00265-y.
3. Dell TR, Clutter JL. Ranked set sampling theory with order statistics background. Biometrics. 1972;28:545–555. doi: 10.2307/2556166.
4. Kadilar C, Unyazici Y, Cingi H. Ratio estimator for the population mean using ranked set sampling. Stat Pap. 2009;50:301–309. doi: 10.1007/s00362-007-0079-y.
5. Mahdizadeh M, Zamanzade E. On estimating the area under the ROC curve in ranked set sampling. Stat Methods Med Res. 2022. 09622802221097211.
6. Mahdizadeh M, Zamanzade E. On interval estimation of the population mean in ranked set sampling. Commun Stat Simul Comput. 2022;51(5):2747–2768. doi: 10.1080/03610918.2019.1700276.
7. Prasad B. Some improved ratio type estimators of population mean and ratio in finite population sample surveys. Commun Stat Theory Methods. 1989;18:379–392. doi: 10.1080/03610928908829905.
8. Takahasi K, Wakimoto K. On unbiased estimates of the population mean based on the stratified sampling by means of ordering. Ann Inst Stat Math. 1968;20:1–31. doi: 10.1007/BF02911622.
9. Vishwakarma GK, Zeeshan SM, Bouza CN. Ratio and product type exponential estimators for population mean using ranked set sampling. Revista Investigacion Operacional. 2017;38(3):266–271.
10. Koçyiğit EG, Kadilar C. Information theory approach to ranked set sampling and new sub-ratio estimators. Commun Stat Theory Methods. 2022. doi: 10.1080/03610926.2022.2100910.
11. Samawi HM, Muttlak HA. Estimation of ratio using rank set sampling. Biom J. 1996;38:753–764. doi: 10.1002/bimj.4710380616.
12. Kadilar C, Cingi H. Ratio estimators in simple random sampling. Appl Math Comput. 2004;151(3):893–902.
13. Kadilar C, Candan M, Cingi H. Ratio estimators using robust regression. Hacettepe J Math Stat. 2007;36(2):181–188.
