Heliyon. 2023 Oct 21;9(11):e21418. doi: 10.1016/j.heliyon.2023.e21418

Estimation of finite population mean using double sampling under probability proportional to size sampling in the presence of extreme values

Jing Wang, Sohaib Ahmad, Muhammad Arslan, Showkat Ahmad Lone, A.H. Abd Ellah, Maha A. Aldahlan, Mohammed Elgarhy
PMCID: PMC10598535  PMID: 37885711

Abstract

Many data sets contain values that are unusually large or unusually small. If such extreme values are selected in the sample, an estimator can yield misleading results. For populations in which extreme values occur, we propose improved estimators of the finite population mean using double sampling based on probability proportional to size (PPS) sampling. The properties of the estimators are obtained up to the first order of approximation. PPS sampling may be employed when the sizes of the units vary widely. To determine the selection probabilities $P_i$ under PPS, the population total of the auxiliary variable $X_i$ must be known. However, the designs and estimation techniques considered so far are less effective when this information is difficult to obtain or when other information is missing. In such circumstances the two-phase approach is preferable and more feasible. To demonstrate how the proposed estimators perform, we use three real data sets. We show theoretically and numerically that the suggested estimators outperform the alternative estimators.

Keywords: PPS, double sampling, Auxiliary variable, Bias, MSE, PRE

1. Introduction

The effective use of auxiliary variables in survey sampling can improve the precision of estimators of population parameters. Researchers frequently seek estimators with the best statistical properties for population quantities such as the mean, total, and median. For this, a representative sample of the population is needed. If the population of interest is homogeneous, the units can be selected by simple random sampling (SRS). The ratio, product, and regression methods of estimation require the population parameters of the auxiliary variable to be known. The ratio estimator is useful when the study variable and the auxiliary variable are strongly positively correlated, whereas the product estimator works well when they are negatively correlated. By suitably adapting the auxiliary information, numerous researchers have developed various ratio estimators. For related work, see the following. [1] discussed certain methods of improving ratio and regression estimators. [2] proposed an improved class of estimators for the population mean under PPS sampling. [3] suggested ratio-type exponential estimators of the population mean under a linear transformation of the auxiliary variable. [4] examined a class of estimators of the population mean under a linear transformation of the auxiliary variable. [5] discussed a class of exponential ratio estimators using two auxiliary variables. [6] studied mean estimation using quantile regression-ratio-type estimators under complete and partial auxiliary information. [7] suggested robust quantile regression with two auxiliary variables for mean estimation. [8] proposed methods of improving estimators. [9] discussed improved estimation of the mean using the skewness and kurtosis of the auxiliary character. [10] presented a class of product estimators of the population mean using auxiliary information. [11] suggested a generalized exponential-type estimator of the population mean using auxiliary attributes. [12] proposed robust ratio-type estimators of the population mean under simple random sampling. [13] proposed estimators of the population mean that use auxiliary information in two-occasion successive sampling. [14] presented several imputation methods for handling missing data in two-occasion successive sampling. [15,16] considered estimation of the population mean under PPS sampling with and without measurement errors.

In many settings, such as medical studies or surveys, the sizes of population units differ considerably, which leads to variation in the selection probabilities or outcomes of different units. For example, in a medical study of a specific disease, the number of affected individuals may be small relative to the overall population, which affects the probability of selecting individuals with the disease in a random sample. Researchers may need to account for this difference and adjust their sampling methods or statistical analyses to ensure accurate representation and valid conclusions. Similarly, in surveys of family income, the number of siblings varies widely between families. This variation in family size can influence the distribution of income levels in the survey population, so family size may need to be considered when analyzing the data or drawing conclusions about the relationship between family income and other variables. In such situations, techniques such as weighting or stratified random sampling can be used to account for the differing sizes and the varying selection probabilities of units, with the aim of providing accurate estimates and valid inference. We use PPS sampling to deal with such unequal probabilities. PPS sampling is an unequal-probability sampling design in which the probability of selecting each sampling unit is proportional to an auxiliary (size) variable. Consider a setting in which we must estimate the population of the districts within a province; we choose as the auxiliary variable the one that has the strongest relationship with the study variable.

For example:

  • (i) The aggregate of all districts within the province (correlation with the study variable = 0.85).

  • (ii) The number of families in all communities within the districts (correlation with the study variable = 0.98).

On the basis of these correlations, (ii) is the more useful auxiliary variable.

For related work, see the following. [17] discussed the use of extreme values to estimate the population mean under a PPS sampling design. [18,19] considered PPS sampling when extreme values are present. [20] discussed a combination of ratio and PPS estimators. [21] offered an improved estimator of the population total under PPS sampling. [22] discussed improved estimators in simple random sampling. [23] considered a combination of ratio and PPS estimators. [24] proposed alternative estimators in PPS sampling. [25] suggested using two auxiliary variables for improved estimation of the population mean under PPS.

Therefore, when such information is not readily accessible or the auxiliary variable is not available, the earlier designs and estimation procedures do not produce satisfactory results, and their efficiency decreases. Double-phase sampling is more useful and effective in this situation. The population mean of the auxiliary variable, which is needed at the estimation or selection stage, can be estimated from a sufficiently large initial sample.

For example:

With a single auxiliary variable X, we take a large preliminary sample to estimate its population mean and only a subsample to measure the study variable Y, because information on X is less expensive to obtain. This means allocating part of the budget to the large initial sample, which leaves a smaller sample for measuring the study variable. The technique is advantageous when the gain in precision is substantial relative to the additional cost of collecting the auxiliary information on the large sample. Estimating total buffalo milk production in a given region is a practical illustration. Here a village is the sampling unit and the number of milk buffalo in a village is the auxiliary variable. Because the number of milk buffalo in every village of the region may not be known, the investigator may select a large sample of villages and record the number of milk buffalo in each. These data are then used to estimate X, the total number of milk buffalo in the region. For work on double-phase sampling, see the following. [26] proposed the generalized regression estimator for two-phase samples of tax records. [27] considered estimation of a finite population mean using linear regression and ratio-product estimation. [24] presented modified exponential estimators of the finite population mean in double sampling. [28] proposed combining exponential functions for efficient estimation under two-phase sampling. [29] discussed exponential chain ratio estimators under stratified two-phase sampling. [30] presented a new improved calibration estimator based on two auxiliary variables in stratified two-phase sampling. [31] proposed a family of estimators of the population mean using auxiliary proportions under simple and two-phase sampling. [32] discussed a generalized approach for estimating a finite population mean in two-phase sampling. [33] estimated the mean of a finite population using a combined exponential-type estimator under two-phase sampling. [34] proposed enhanced estimation of the population mean under two-phase sampling. [35] proposed an efficient class of double-sampling estimators of the population mean. [36] proposed a chain-ratio-type exponential estimator of the finite population mean in double sampling.

Our primary objectives are as follows.

  • 1. To estimate the finite population mean using double sampling under PPS in the presence of extreme (minimum and maximum) values.

  • 2. To derive the properties of the recommended estimators, i.e., bias and MSE, up to the first order of approximation.

  • 3. To illustrate the application of the recommended estimators using real data sets from various domains.

2. Sampling methodology

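Consider a population $\Psi = \{\Psi_1, \Psi_2, \ldots, \Psi_N\}$ of $N$ distinct units. In the first phase, we draw a large initial sample of size $m$ ($m < N$) from $\Psi$ by simple random sampling without replacement (SRSWOR) and observe the auxiliary variable $x$. In the second phase, we draw a sub-sample of size $n$ ($n < m$) from the first-phase sample by SRSWOR, or directly from $\Psi$, and observe both the study and the auxiliary variables. Let $y_i$ denote the study variable and $x_i$ and $z_i$ the auxiliary variables.

Let $P_i = Z_i / \sum_{i=1}^{N} Z_i$ be the probability of selecting the $i$th unit under PPS sampling, where

$$\bar{u}_{11} = \frac{1}{n}\sum_{i=1}^{n} u_{i11} = \bar{y}_{pps}, \qquad \bar{v}_{11} = \frac{1}{n}\sum_{i=1}^{n} v_{i12} = \bar{x}_{pps},$$

$$\bar{Y} = \frac{1}{N}\sum_{i=1}^{N} y_i, \qquad \bar{X} = \frac{1}{N}\sum_{i=1}^{N} x_i,$$

$$u_{i11} = \frac{y_i}{N P_i}, \qquad v_{i12} = \frac{x_i}{N P_i}, \qquad s_{u11}^2 = \sum_{i=1}^{N} P_i\,(u_{i11} - \bar{Y})^2, \qquad s_{v12}^2 = \sum_{i=1}^{N} P_i\,(v_{i12} - \bar{X})^2,$$

$$\rho_{u11v12} = \frac{\sum_{i=1}^{N} (u_{i11} - \bar{Y})(v_{i12} - \bar{X})}{s_{u11}\, s_{v12}}, \qquad s_{u11} = \sqrt{\sum_{i=1}^{N} P_i\,(u_{i11} - \bar{Y})^2}, \qquad s_{v12} = \sqrt{\sum_{i=1}^{N} P_i\,(v_{i12} - \bar{X})^2}.$$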
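The population quantities defined above can be computed directly from unit-level data. The following is a minimal sketch, assuming the definitions $P_i = Z_i/\sum Z_i$, $u_{i11} = y_i/(NP_i)$ and $v_{i12} = x_i/(NP_i)$; the function name `pps_population_summary` and the toy values in the usage note are hypothetical and are not taken from the paper.

```python
def pps_population_summary(y, x, z):
    """Compute PPS selection probabilities and transformed-variable summaries.

    y, x, z are equal-length lists holding the study variable, the auxiliary
    variable, and the size measure for every unit of the population.
    """
    N = len(y)
    total_z = sum(z)
    P = [zi / total_z for zi in z]                  # P_i = Z_i / sum(Z)
    u = [yi / (N * pi) for yi, pi in zip(y, P)]     # u_i11 = y_i / (N P_i)
    v = [xi / (N * pi) for xi, pi in zip(x, P)]     # v_i12 = x_i / (N P_i)
    Y_bar = sum(y) / N
    X_bar = sum(x) / N
    # P-weighted squared deviations of u and v about the population means
    s2_u = sum(pi * (ui - Y_bar) ** 2 for pi, ui in zip(P, u))
    s2_v = sum(pi * (vi - X_bar) ** 2 for pi, vi in zip(P, v))
    return {"P": P, "u": u, "v": v, "Y_bar": Y_bar, "X_bar": X_bar,
            "s2_u": s2_u, "s2_v": s2_v}
```

When the study variable is exactly proportional to the size measure, every $u_{i11}$ equals $\bar{Y}$ and $s_{u11}^2$ is zero, which is the situation in which PPS sampling is most effective; for instance, `pps_population_summary([2, 4, 6], [1, 1, 4], [1, 2, 3])` gives a zero `s2_u` up to rounding.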

Some real data sets include extreme values. For example, when measuring intelligence quotient (IQ), the brightest students obtain the maximum marks and the weakest students the minimum marks. The sample mean is particularly sensitive to unexpectedly large or small units in the population: depending on whether the sample contains large or small values, the population mean will be overestimated or underestimated. Consequently, if any of the extreme values are selected in the sample, the estimator can produce misleading conclusions. To overcome this issue, [37] suggested the following unbiased estimator, given in equation (1).

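$$\hat{\bar{y}}_{ss} = \begin{cases} \bar{y} + c, & \text{if the sample contains the minimum but not the maximum value} \\ \bar{y} - c, & \text{if the sample contains the maximum but not the minimum value} \\ \bar{y}, & \text{for all other samples} \end{cases} \quad (1)$$

$$\operatorname{Var}(\hat{\bar{y}}_{ss}) = \varphi s_{u11}^2 - \frac{2\varphi n c}{N-1}\left[\sigma_{u11} - n c\right]$$

The minimum variance of $\hat{\bar{y}}_{ss}$, attained at the optimal value of $c$, is given in equation (2):

$$\operatorname{Var}(\hat{\bar{y}}_{ss})_{\min} = \operatorname{Var}(\bar{y}) - \frac{\varphi\sigma_{u11}^2}{2(N-1)}, \quad (2)$$

where

$$\operatorname{Var}(\bar{y}) = \varphi s_{u11}^2.$$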
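The step from the variance of the estimator in equation (1) to its minimum in equation (2) can be made explicit. The short derivation below is a sketch that assumes the variance has the form $\varphi s_{u11}^2 - \frac{2\varphi n c}{N-1}(\sigma_{u11} - nc)$, with $\sigma_{u11}$ the range of the transformed study variable.

```latex
\begin{aligned}
\frac{\partial}{\partial c}\operatorname{Var}(\hat{\bar{y}}_{ss})
  &= -\frac{2\varphi n}{N-1}\left(\sigma_{u11} - 2nc\right) = 0
  \;\Longrightarrow\; c_{\text{opt}} = \frac{\sigma_{u11}}{2n}, \\[4pt]
\operatorname{Var}(\hat{\bar{y}}_{ss})_{\min}
  &= \varphi s_{u11}^2 - \frac{2\varphi n}{N-1}\cdot\frac{\sigma_{u11}}{2n}\cdot\frac{\sigma_{u11}}{2}
   = \varphi s_{u11}^2 - \frac{\varphi\,\sigma_{u11}^2}{2(N-1)}.
\end{aligned}
```

The reduction term is non-negative, so under this form of the variance the adjusted estimator is never less precise than the unadjusted mean.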
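[11] recommended the following estimator of the population total under PPS, given in equation (3):

$$\hat{y}_{pps} = \frac{1}{n}\sum_{i=1}^{n}\frac{(y_i + \varphi)}{p_i} - N\varphi, \quad (3)$$

where $p_i = \dfrac{c x_i + n}{c\sum_{i=1}^{N} x_i + Nn}$ (in $p_i$, $n$ denotes a constant of the transformation rather than the sample size).

For estimation of the population mean, equation (3) can also be written as

$$\hat{\bar{y}}_{pps} = \frac{1}{n}\sum_{i=1}^{n}\frac{(y_i + \varphi)}{N p_i} - \varphi.$$

Substituting $p_i$ and setting $c = 1$, $n = 0$ and $\varphi = 0$ in the transformation gives

$$\hat{\bar{y}}_{pps} = \frac{1}{n}\sum_{i=1}^{n}\frac{y_i}{N p_i} = \frac{1}{n}\sum_{i=1}^{n} u_{i11} = \bar{u}_{11} = \bar{y}_{pps}.$$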
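To make the estimator $\bar{u}_{11}$ concrete, the sketch below draws a PPS sample and averages $y_i/(NP_i)$ over it. It assumes sampling with replacement via `random.choices`, which is an illustrative choice and not a design stated in the paper; the function name `pps_mean_estimate` and the example data are hypothetical.

```python
import random

def pps_mean_estimate(y, z, n, seed=0):
    """Estimate the population mean of y from a PPS with-replacement sample.

    Units are drawn with probability proportional to the size measure z,
    and the estimate is the sample mean of u_i = y_i / (N * P_i).
    """
    rng = random.Random(seed)                 # local generator for reproducibility
    N = len(y)
    total_z = sum(z)
    P = [zi / total_z for zi in z]            # selection probabilities
    sample = rng.choices(range(N), weights=P, k=n)
    u = [y[i] / (N * P[i]) for i in sample]   # transformed study values
    return sum(u) / n

# Usage: y is exactly proportional to z here, so every u_i equals the
# population mean and the estimate does not depend on the sample drawn.
estimate = pps_mean_estimate([2, 4, 6, 8], [1, 2, 3, 4], n=3)
```

In practice $y$ is only roughly proportional to the size measure, and the estimate varies from sample to sample; extreme values of $u_{i11}$ are what the estimators discussed in this paper adjust for.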

The variance of $\bar{y}_{pps}$ is given in equation (4):

$$\operatorname{Var}(\hat{\bar{y}}_{T,PPS}) = \varphi\,\bar{Y}^2 c_{u11}^2 \quad (4)$$

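The ratio and product estimators [38,39] are given in equations (5) and (6):

$$\bar{y}_{RT,PPS} = \bar{u}_{11}\left(\frac{\bar{X}^{*}}{\bar{v}_{11}}\right), \quad (5)$$

$$\bar{y}_{PT,PPS} = \bar{u}_{11}\left(\frac{\bar{v}_{11}}{\bar{X}^{*}}\right). \quad (6)$$

The MSEs of $\bar{y}_{RT,PPS}$ and $\bar{y}_{PT,PPS}$ are given in equations (7) and (8):

$$\operatorname{MSE}(\bar{y}_{RT,PPS}) = \bar{Y}^2\left\{\varphi c_{u11}^2 + \varphi_2 c_{v12}\left(c_{v12} - 2\rho_{u11v12}\, c_{u11}\right)\right\} \quad (7)$$

and

$$\operatorname{MSE}(\bar{y}_{PT,PPS}) = \bar{Y}^2\left\{\varphi c_{u11}^2 + \varphi_2 c_{v12}\left(c_{v12} + 2\rho_{u11v12}\, c_{u11}\right)\right\} \quad (8)$$

The regression estimator is given in equation (9):

$$\bar{y}_{RegT,PPS} = \bar{u}_{11} + \beta\left(\bar{X}^{*} - \bar{v}_{11}\right). \quad (9)$$

The variance of the regression estimator is given in equation (10):

$$\operatorname{Var}(\bar{y}_{RegT,PPS}) = \varphi s_{u11}^2 + \varphi_2 s_{u11}^2\left(1 - \rho_{u11v12}^2\right), \quad (10)$$

where

$$\varphi = \left(\frac{1}{n} - \frac{1}{N}\right), \qquad \varphi' = \left(\frac{1}{m} - \frac{1}{N}\right), \qquad \varphi_2 = \left(\frac{1}{n} - \frac{1}{m}\right).$$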
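The MSE expressions for the existing estimators are simple functions of summary statistics. The following sketch evaluates them as read from equations (7), (8) and (10), assuming $\varphi = 1/n - 1/N$, $\varphi' = 1/m - 1/N$ and $\varphi_2 = 1/n - 1/m$; the function names and the numbers in the usage lines are hypothetical and are not the paper's data.

```python
def sampling_fractions(N, m, n):
    """Return (phi, phi_prime, phi2) for population size N and phase sizes m, n."""
    phi = 1.0 / n - 1.0 / N
    phi_prime = 1.0 / m - 1.0 / N
    phi2 = 1.0 / n - 1.0 / m
    return phi, phi_prime, phi2

def mse_ratio(Y_bar, c_u, c_v, rho, phi, phi2):
    """MSE of the two-phase ratio estimator, following the form of equation (7)."""
    return Y_bar ** 2 * (phi * c_u ** 2 + phi2 * c_v * (c_v - 2.0 * rho * c_u))

def mse_product(Y_bar, c_u, c_v, rho, phi, phi2):
    """MSE of the two-phase product estimator, following the form of equation (8)."""
    return Y_bar ** 2 * (phi * c_u ** 2 + phi2 * c_v * (c_v + 2.0 * rho * c_u))

def var_regression(s2_u, rho, phi, phi2):
    """Variance of the regression estimator, following the form of equation (10)."""
    return phi * s2_u + phi2 * s2_u * (1.0 - rho ** 2)

# Usage with made-up summary statistics.
phi, phi_prime, phi2 = sampling_fractions(N=100, m=40, n=20)
ratio_mse = mse_ratio(10.0, 0.5, 0.4, 0.8, phi, phi2)
product_mse = mse_product(10.0, 0.5, 0.4, 0.8, phi, phi2)
```

With a positive correlation the ratio form gives the smaller value and the product form the larger one, which is the usual reason for choosing between them according to the sign of $\rho_{u11v12}$.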

3. Suggested estimators

Some real data sets include extreme values, either very large or very small, and the efficiency of estimators may suffer in their presence. For example, when measuring average exports of goods, China may produce a large volume of goods for the international market owing to new technology and a skilled workforce, whereas Pakistan produces a small volume owing to poor management and a lack of technology. Similarly, if we wish to know the average yearly wheat production in our country, wheat production in Punjab is extremely large compared with the other provinces. To deal with such extreme values, and motivated by Refs. [17,18], we suggest improved ratio, product, and regression type estimators for double-phase PPS sampling in the presence of extreme values. The recommended estimators are presented for three different situations.

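Situation-I: Mean per unit estimator, given in equation (11):

$$\hat{\bar{Y}}_{T,PPS} = \begin{cases} \bar{u}_{11} + c, & \text{if the selected sample includes the smallest value of } u_{i11} \\ \bar{u}_{11} - c, & \text{if the selected sample includes the largest value of } u_{i11} \\ \bar{u}_{11}, & \text{if the selected sample includes only other values} \end{cases} \quad (11)$$

The optimal value of $c$ is

$$c = \frac{\sigma_{u11}}{2n}.$$

The minimum variance at this value of $c$ is given in equation (12):

$$V(\hat{\bar{Y}}_{T,PPS})_{\min} = V(\hat{\bar{y}}_{T,PPS}) - \frac{\varphi\sigma_{u11}^2}{2(N-1)} \quad (12)$$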
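The piecewise rule in Situation-I is easy to express in code. The sketch below assumes the convention of the estimator of [37], in which the constant is added only when the sample contains the population minimum but not the maximum, and subtracted only in the reverse case; the function name `adjusted_mean` and the example values are hypothetical.

```python
def adjusted_mean(u_sample, u_min, u_max, c):
    """Mean-per-unit estimator adjusted for extreme values.

    u_sample: transformed values u_i11 observed in the sample
    u_min, u_max: population minimum and maximum of u_i11 (assumed known)
    c: adjustment constant
    """
    u_bar = sum(u_sample) / len(u_sample)
    has_min = u_min in u_sample
    has_max = u_max in u_sample
    if has_min and not has_max:
        return u_bar + c      # sample pulled down by the minimum: shift up
    if has_max and not has_min:
        return u_bar - c      # sample pulled up by the maximum: shift down
    return u_bar              # neither or both extremes present: no change

# Usage: the sample contains the population minimum (1) but not the maximum (10).
shifted = adjusted_mean([1, 5, 6], u_min=1, u_max=10, c=0.5)
```

The adjustment moves the estimate toward the centre of the population whenever exactly one extreme value has been drawn, which is what reduces the variance relative to the unadjusted mean.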

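Situation-II: When u and v are positively correlated.

When the correlation between u and v is positive, a sample that includes the minimum value of u is expected to include the minimum value of v as well, and a sample that includes the maximum value of v is expected to include the maximum value of u. In such a scenario, we suggest the following improved ratio type estimator, given in equation (13):

$$\hat{\bar{Y}}_{RT,PPS} = \bar{u}_{c_{11}}\left(\frac{\bar{X}^{*}_{c_{21}}}{\bar{v}_{c_{21}}}\right), \quad (13)$$

or

$$\hat{\bar{Y}}_{RT,PPS} = \begin{cases} (\bar{u}_{11} + c_1)\dfrac{(\bar{X}^{*} + c_2)}{(\bar{v}_{11} + c_2)}, & \text{if the sample includes the small values of } u_{i11} \text{ and } v_{i12} \\[8pt] (\bar{u}_{11} - c_1)\dfrac{(\bar{X}^{*} - c_2)}{(\bar{v}_{11} - c_2)}, & \text{if the sample includes the large values of } u_{i11} \text{ and } v_{i12} \\[8pt] \bar{u}_{11}\left(\dfrac{\bar{X}^{*}}{\bar{v}_{11}}\right), & \text{for all other samples,} \end{cases}$$

where $(\bar{u}_{c_{11}} = \bar{u}_{11} + c_1,\ \bar{X}^{*}_{c_{21}} = \bar{X}^{*} + c_2,\ \bar{v}_{c_{21}} = \bar{v}_{11} + c_2)$ if the sample contains the minimum values of u and v; $(\bar{u}_{c_{11}} = \bar{u}_{11} - c_1,\ \bar{X}^{*}_{c_{21}} = \bar{X}^{*} - c_2,\ \bar{v}_{c_{21}} = \bar{v}_{11} - c_2)$ if the sample contains the maximum values of u and v; and $(\bar{u}_{c_{11}} = \bar{u}_{11},\ \bar{X}^{*}_{c_{21}} = \bar{X}^{*},\ \bar{v}_{c_{21}} = \bar{v}_{11})$ for all other samples. Here $c_1$ and $c_2$ are constants whose values are to be determined from the optimality conditions.

The regression estimator is given in equation (14):

$$\bar{y}_{T,Reg1,PPS} = \bar{u}_{c_{11}} + b\left(\bar{X}^{*} - \bar{v}_{c_{21}}\right), \quad (14)$$

where $(\bar{u}_{c_{11}} = \bar{u}_{11} + c_1,\ \bar{v}_{c_{21}} = \bar{v}_{11} + c_2)$ if the sample contains the minimum values of u and v; $(\bar{u}_{c_{11}} = \bar{u}_{11} - c_1,\ \bar{v}_{c_{21}} = \bar{v}_{11} - c_2)$ if the sample contains the maximum values of u and v; and $(\bar{u}_{c_{11}} = \bar{u}_{11},\ \bar{v}_{c_{21}} = \bar{v}_{11})$ for all other samples.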
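The piecewise form of the ratio type estimator in Situation-II can be sketched as a single function in which the sign of the adjustment depends on which extremes the sample contains. The function name `adjusted_ratio_estimate`, the `case` labels and the example numbers are hypothetical; the constants $c_1$ and $c_2$ are supplied by the caller and are not derived here.

```python
def adjusted_ratio_estimate(u_bar, v_bar, x_star, case, c1, c2):
    """Ratio type estimator adjusted for extreme values (positive correlation).

    u_bar, v_bar: second-phase PPS means of the transformed study and
                  auxiliary variables
    x_star: first-phase mean of the auxiliary variable
    case: "min" if the sample holds the minimum values of u and v,
          "max" if it holds the maximum values, anything else otherwise
    """
    if case == "min":
        sign = 1.0       # add the constants
    elif case == "max":
        sign = -1.0      # subtract the constants
    else:
        sign = 0.0       # ordinary two-phase ratio estimator
    return (u_bar + sign * c1) * (x_star + sign * c2) / (v_bar + sign * c2)

# Usage: a sample containing the maximum values of u and v.
estimate_max = adjusted_ratio_estimate(10.0, 5.0, 6.0, "max", c1=1.0, c2=1.0)
```

Setting `case` to any other label recovers the unadjusted estimator $\bar{u}_{11}\bar{X}^{*}/\bar{v}_{11}$, so the adjustment only changes the estimate for samples that contain extreme values.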

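Situation-III: When u and v are negatively correlated.

When u and v are negatively correlated, the selection of a large value of v is expected to be accompanied by a small value of u. Similarly, when a small value of v is selected, a large value of u is expected to be selected. For this situation, we suggest the following improved product type estimator, given in equation (15):

$$\hat{\bar{Y}}_{PT,PPS} = \bar{u}_{c_{12}}\left(\frac{\bar{v}_{c_{22}}}{\bar{X}^{*}}\right), \quad (15)$$

or

$$\hat{\bar{Y}}_{PT,PPS} = \begin{cases} (\bar{u}_{11} + c_1)\dfrac{(\bar{v}_{11} - c_2)}{(\bar{X}^{*} - c_2)}, & \text{if the sample includes the small value of } u_{i11} \text{ and the large value of } v_{i12} \\[8pt] (\bar{u}_{11} - c_1)\dfrac{(\bar{v}_{11} + c_2)}{(\bar{X}^{*} + c_2)}, & \text{if the sample includes the large value of } u_{i11} \text{ and the small value of } v_{i12} \\[8pt] \bar{u}_{11}\left(\dfrac{\bar{v}_{11}}{\bar{X}^{*}}\right), & \text{for all other samples.} \end{cases}$$

The regression estimator is given in equation (16):

$$\bar{y}_{T,Reg2,PPS} = \bar{u}_{c_{12}} + b\left(\bar{X}^{*} - \bar{v}_{c_{22}}\right), \quad (16)$$

where $(\bar{u}_{c_{12}} = \bar{u}_{11} + c_1,\ \bar{v}_{c_{22}} = \bar{v}_{11} - c_2)$ if the sample contains the minimum value of u and the maximum value of v; $(\bar{u}_{c_{12}} = \bar{u}_{11} - c_1,\ \bar{v}_{c_{22}} = \bar{v}_{11} + c_2)$ if the sample contains the maximum value of u and the minimum value of v; and $(\bar{u}_{c_{12}} = \bar{u}_{11},\ \bar{v}_{c_{22}} = \bar{v}_{11})$ for all other samples.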

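To obtain the biases and MSEs, we define the following relative error terms and their expectations.

Let

$$\epsilon_0 = \frac{\bar{u}_{11} - \bar{U}}{\bar{U}}, \qquad \epsilon_1 = \frac{\bar{v}_{11} - \bar{V}}{\bar{V}}, \qquad \epsilon_1' = \frac{\bar{X}^{*} - \bar{V}}{\bar{V}},$$

$$E(\epsilon_0) = E(\epsilon_1) = E(\epsilon_1') = 0,$$

$$E(\epsilon_0^2) = \frac{\varphi}{\bar{Y}^2}\left(s_{u11}^2 - \frac{2nc_1}{N-1}\left[\sigma_{u11} - nc_1\right]\right), \qquad E(\epsilon_1^2) = \frac{\varphi}{\bar{X}^2}\left(s_{v12}^2 - \frac{2mc_2}{N-1}\left[\sigma_{v12} - mc_2\right]\right),$$

$$E(\epsilon_1'^2) = \frac{\varphi'}{\bar{X}^2}\left(s_{v12}^2 - \frac{2mc_2}{N-1}\left[\sigma_{v12} - mc_2\right]\right), \qquad E(\epsilon_1\epsilon_1') = \frac{\varphi'}{\bar{X}^2}\left(s_{v12}^2 - \frac{2mc_2}{N-1}\left[\sigma_{v12} - mc_2\right]\right),$$

$$E(\epsilon_0\epsilon_1) = \frac{\varphi}{\bar{X}\bar{Y}}\left(s_{u11v12} - \frac{n}{N-1}\left[c_2\sigma_{u11} + c_1\sigma_{v12} - 2nc_1c_2\right]\right),$$

$$E(\epsilon_0\epsilon_1') = \frac{\varphi'}{\bar{X}\bar{Y}}\left(s_{u11v12} - \frac{m}{N-1}\left[c_2\sigma_{u11} + c_1\sigma_{v12} - 2mc_1c_2\right]\right),$$

where

$$\sigma_{u11} = u_{\max} - u_{\min} \quad \text{and} \quad \sigma_{v12} = v_{\max} - v_{\min}.$$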

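Expressing (13) in terms of the $\epsilon$'s, we have

$$\hat{\bar{Y}}_{RT,PPS} = \bar{Y}(1 + \epsilon_0)(1 + \epsilon_1)^{-1},$$

or, to the first order of approximation,

$$\hat{\bar{Y}}_{RT,PPS} - \bar{Y} \cong \bar{Y}\left(\epsilon_0 - \epsilon_1 - \epsilon_0\epsilon_1 + \epsilon_1^2\right).$$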
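The expansion used here follows from a geometric series in $\epsilon_1$, truncated after second-order terms. As a brief worked step, assuming $|\epsilon_1| < 1$ so that the series converges:

```latex
\begin{aligned}
(1+\epsilon_1)^{-1} &= 1 - \epsilon_1 + \epsilon_1^{2} - \cdots, \\
(1+\epsilon_0)(1+\epsilon_1)^{-1}
  &= 1 + \epsilon_0 - \epsilon_1 - \epsilon_0\epsilon_1 + \epsilon_1^{2}
     + \text{terms of order three and higher}, \\
\bar{Y}(1+\epsilon_0)(1+\epsilon_1)^{-1} - \bar{Y}
  &\cong \bar{Y}\left(\epsilon_0 - \epsilon_1 - \epsilon_0\epsilon_1 + \epsilon_1^{2}\right).
\end{aligned}
```

Taking expectations of the right-hand side, the first-order terms vanish because $E(\epsilon_0) = E(\epsilon_1) = 0$, so the bias depends only on $E(\epsilon_1^2)$ and $E(\epsilon_0\epsilon_1)$.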

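Taking expectations on both sides, we have

$$\operatorname{Bias}(\hat{\bar{Y}}_{RT,PPS}) = \bar{Y}\left(\varphi_2 c_{v12}^2 - \varphi_2 c_{u11v12}\right) - \frac{R}{(N-1)}\left\{\frac{2c_2}{\bar{X}}\left\{(n\varphi - m\varphi')\sigma_{v12} - c_2(n^2\varphi - m^2\varphi')\right\} - (n\varphi - m\varphi')(c_2\sigma_{u11} + c_1\sigma_{v12}) + 2c_1c_2(n^2\varphi - m^2\varphi')\right\},$$

where $R = \bar{Y}/\bar{X}$.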

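Unique values of $c_1$ and $c_2$ cannot be obtained, because there is one equation with two unknowns.

$$c_{1(\text{optimal})} = \frac{\sigma_{u11}}{\sigma_{v12}},$$

$$c_{2(\text{optimal})} = \left\{\frac{(N-1)\sigma_{u11} - mR\sigma_{v12}}{2mR(N-m-n)}\right\}.$$

Substituting the optimal values $c_{1(\text{optimal})}$ and $c_{2(\text{optimal})}$, the minimum MSE of $\hat{\bar{Y}}_{RT,PPS}$ is given in equation (17):

$$\operatorname{MSE}(\hat{\bar{Y}}_{RT,PPS})_{\min} = \operatorname{MSE}(\bar{y}_{RT,PPS}) - \frac{1}{2mN(N-1)}\left[\sigma_{v12}\left\{(N-n)\sigma_{u11} + 2R(n-m)\sigma_{v12}\right\} + \frac{(n-m)\left\{(N-n)\sigma_{u11} - mR\sigma_{v12}\right\}^{2}}{m(N-m-n)}\right] \quad (17)$$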

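Similarly, the bias of the product type estimator is

$$B(\hat{\bar{Y}}_{PT,PPS}) = \frac{\varphi}{\bar{X}\bar{Y}}\left\{s_{u11v12} - \frac{n}{N-1}\left[c_2\sigma_{u11} + c_1\sigma_{v12} - 2nc_1c_2\right]\right\}.$$

The MSE of the product type estimator is given in equation (18):

$$\operatorname{MSE}(\hat{\bar{Y}}_{PT,PPS})_{\min} = \operatorname{MSE}(\bar{y}_{PT,PPS}) - \frac{1}{2mN(N-1)}\left[\sigma_{v12}\left\{(N-n)\sigma_{u11} + 2R(n-m)\sigma_{v12}\right\} + \frac{(n-m)\left\{(N-n)\sigma_{u11} - mR\sigma_{v12}\right\}^{2}}{m(N-m-n)}\right] \quad (18)$$

In the case of positive correlation, the variance of $\bar{y}_{T,Reg1,PPS}$ is given in equation (19):

$$\operatorname{Var}(\bar{y}_{T,Reg1,PPS})_{\min} = \operatorname{MSE}(\bar{y}_{RegT,PPS}) - \frac{1}{2mN(N-1)}\left[\sigma_{v12}\left\{(N-n)\sigma_{u11} - 2\beta(n-m)\sigma_{v12}\right\} + \frac{(n-m)\left\{(N-n)\sigma_{u11} - mR\sigma_{v12}\right\}^{2}}{m(N-m-n)}\right] \quad (19)$$

In the case of negative correlation, the variance of $\bar{y}_{T,Reg2,PPS}$ is given in equation (20):

$$\operatorname{Var}(\bar{y}_{T,Reg2,PPS})_{\min} = \operatorname{MSE}(\bar{y}_{RegT,PPS}) - \frac{1}{2mN(N-1)}\left[\sigma_{v12}\left\{(N-n)\sigma_{u11} + 2\beta(n-m)\sigma_{v12}\right\} + \frac{(n-m)\left\{(N-n)\sigma_{u11} - mR\sigma_{v12}\right\}^{2}}{m(N-m-n)}\right] \quad (20)$$

In general, the variance of the regression estimator can be written as in equation (21):

$$\operatorname{Var}(\hat{\bar{Y}}_{RegGT,PPS})_{\min} = \operatorname{MSE}(\bar{y}_{RegT,PPS}) - \frac{1}{2mN(N-1)}\left[\sigma_{v12}\left\{(N-n)\sigma_{u11} + 2|\beta|(n-m)\sigma_{v12}\right\} + \frac{(n-m)\left\{(N-n)\sigma_{u11} - mR\sigma_{v12}\right\}^{2}}{m(N-m-n)}\right] \quad (21)$$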

4. Efficiency comparison

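In this section, we compare the suggested estimators theoretically with their existing counterparts.

  • (i) By taking (4) and (12):

$\operatorname{Var}(\hat{\bar{Y}}_{T,PPS}) < \operatorname{Var}(\hat{\bar{y}}_{T,PPS})$, or

$\operatorname{Var}(\hat{\bar{y}}_{T,PPS}) - \operatorname{Var}(\hat{\bar{Y}}_{T,PPS}) > 0$, that is,

$$\frac{\varphi\sigma_{u11}^2}{2(N-1)} > 0.$$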

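  • (ii) By taking (7) and (17):

$\operatorname{MSE}(\hat{\bar{Y}}_{RT,PPS}) < \operatorname{MSE}(\bar{y}_{RT,PPS})$, or

$\operatorname{MSE}(\bar{y}_{RT,PPS}) - \operatorname{MSE}(\hat{\bar{Y}}_{RT,PPS}) > 0$, that is,

$$\frac{1}{2mN(N-1)}\left[\sigma_{v12}\left\{(N-n)\sigma_{u11} + 2R(n-m)\sigma_{v12}\right\} + \frac{(n-m)\left\{(N-n)\sigma_{u11} - mR\sigma_{v12}\right\}^{2}}{m(N-m-n)}\right] > 0.$$

  • (iii) By taking (8) and (18):

$\operatorname{MSE}(\hat{\bar{Y}}_{PT,PPS}) < \operatorname{MSE}(\bar{y}_{PT,PPS})$, or

$\operatorname{MSE}(\bar{y}_{PT,PPS}) - \operatorname{MSE}(\hat{\bar{Y}}_{PT,PPS}) > 0$, that is,

$$\frac{1}{2mN(N-1)}\left[\sigma_{v12}\left\{(N-n)\sigma_{u11} + 2R(n-m)\sigma_{v12}\right\} + \frac{(n-m)\left\{(N-n)\sigma_{u11} - mR\sigma_{v12}\right\}^{2}}{m(N-m-n)}\right] > 0.$$

  • (iv) By taking (10) and (19):

$\operatorname{Var}(\bar{y}_{T,Reg1,PPS}) < \operatorname{Var}(\bar{y}_{RegT,PPS})$, or

$\operatorname{Var}(\bar{y}_{RegT,PPS}) - \operatorname{Var}(\bar{y}_{T,Reg1,PPS}) > 0$, that is,

$$\frac{1}{2mN(N-1)}\left[\sigma_{v12}\left\{(N-n)\sigma_{u11} - 2\beta(n-m)\sigma_{v12}\right\} + \frac{(n-m)\left\{(N-n)\sigma_{u11} - mR\sigma_{v12}\right\}^{2}}{m(N-m-n)}\right] > 0.$$

  • (v) By taking (10) and (20):

$\operatorname{Var}(\bar{y}_{T,Reg2,PPS}) < \operatorname{Var}(\bar{y}_{RegT,PPS})$, or

$\operatorname{Var}(\bar{y}_{RegT,PPS}) - \operatorname{Var}(\bar{y}_{T,Reg2,PPS}) > 0$, that is,

$$\frac{1}{2mN(N-1)}\left[\sigma_{v12}\left\{(N-n)\sigma_{u11} + 2\beta(n-m)\sigma_{v12}\right\} + \frac{(n-m)\left\{(N-n)\sigma_{u11} - mR\sigma_{v12}\right\}^{2}}{m(N-m-n)}\right] > 0.$$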

5. Numerical investigation

We used three data sets to compare the efficiency of the suggested estimators with that of their existing counterparts. The variables in these data sets are described below, and their summary statistics are given in Table 1, Table 2 and Table 3.

Data-I [Source: [24]]:

Y = estimated number of fish caught during 1995,

X = estimated number of fish caught during 1994,

Z = estimated number of fish caught during 1993.

Data-II [Source: [24]]:

Y = estimated number of fish caught during 1995,

X = estimated number of fish caught during 1993,

Z = estimated number of fish caught during 1992.

Data-III [Source: [40]]:

Y = output for 80 yards,

X = fixed capital in a region,

Z = number of workers.

6. Discussion

As noted above, we evaluated the performance of the suggested estimators using three real data sets. The proposed estimators are compared numerically and theoretically with their existing counterparts. Summary statistics of the data are given in Table 1, Table 2 and Table 3. The MSE and PRE of the proposed and existing estimators are displayed in Table 4 and Table 5. In terms of MSE and PRE, the suggested estimators are observed to be more efficient than their existing counterparts. The gain for Data-II is greater than for Data-I and Data-III. Fig. 1 compares the estimators in terms of MSE, with the estimators on the X-axis and the MSE values on the Y-axis. A smaller MSE indicates a more efficient estimator, so a downward trend in the line graph indicates improving efficiency. Fig. 2 compares the estimators in terms of percentage relative efficiency (PRE), with the estimators on the X-axis and the PRE values on the Y-axis. The higher the PRE, the better the estimator, and the trend line rises with the PRE values. Compared with their counterparts, our proposed estimators achieve the highest PRE.

Table 1.

Summary statistics for Data-I.

$N = 69$ | $\bar{X} = 4954.435$ | $c_{u11} = 0.4720461$ | $c_{v12} = 0.5049075$ | $u_{\max} = 11873.89$ | $R = 0.9112843$
$m = 36$ | $\bar{Z} = 4591.072$ | $c_{u11}^2 = 0.2228275$ | $c_{v12}^2 = 0.2549316$ | $u_{\min} = 467.0002$ | $c_{u11v12} = 0.063411$
$m_1 = 20$ | $s_{u11}^2 = 2420387$ | $s_{v12}^2 = 2007238$ | $\rho_{u11v12} = 0.2660536$ | $v_{\max} = 18850.12$ | $\beta = 0.292154$
$\bar{Y} = 4514.899$ | $s_{u11} = 1555.759$ | $s_{v12} = 1416.77$ | $s_{u11v12} = 1418429$ | $v_{\min} = 894.5356$ |

Table 2.

Summary statistics for Data-II.

$N = 69$ | $\bar{X} = 4591.072$ | $c_{u11} = 0.8523346$ | $c_{v12} = 1.509517$ | $u_{\max} = 20949.43$ | $R = 0.983408$
$m = 36$ | $\bar{Z} = 4230.174$ | $c_{u11}^2 = 0.7264743$ | $c_{v12}^2 = 2.278641$ | $u_{\min} = 1002.527$ | $c_{u11v12} = 0.05634$
$m_1 = 20$ | $s_{u11}^2 = 3361469$ | $s_{v12}^2 = 2245140$ | $\rho_{u11v12} = 0.04379289$ | $v_{\max} = 54678.91$ | $\beta = 0.05358$
$\bar{Y} = 4514.899$ | $s_{u11} = 1833.488$ | $s_{v12} = 1498.379$ | $s_{u11v12} = 1167922$ | $v_{\min} = 1634.557$ |

Table 3.

Summary statistics for Data-III.

$N = 80$ | $\bar{X} = 1126.463$ | $c_{u11} = 0.775310$ | $c_{v12} = 0.281887$ | $u_{\max} = 15586.8$ | $R = 4.600808$
$m = 45$ | $\bar{Z} = 285.125$ | $c_{u11}^2 = 0.6011057$ | $c_{v12}^2 = 0.07946076$ | $u_{\min} = 2408.59$ | $c_{u11v12} = 0.12676$
$m_1 = 25$ | $s_{u11}^2 = 10568817$ | $s_{v12}^2 = 63257.12$ | $\rho_{u11v12} = 0.5800092$ | $v_{\max} = 1944.034$ | $\beta = 7.497101$
$\bar{Y} = 5182.637$ | $s_{u11} = 3250.972$ | $s_{v12} = 251.5097$ | $s_{u11v12} = 740038.3$ | $v_{\min} = 592.6127$ |

Table 4.

MSE of the existing and suggested estimators.

Estimator | Data-I MSE | Data-II MSE | Data-III MSE
$\bar{y}_{T,PPS}$ | 2079853 | 8604841 | 12067836
$\bar{y}_{RT,PPS}$ | 219312.1 | 1506958 | 360886.7
$\hat{\bar{Y}}_{RT,PPS}$ | 218119 | 1547809 | 349856.2
$\bar{y}_{PT,PPS}$ | 334209.1 | 1609051 | 603003.9
$\hat{\bar{Y}}_{PT,PPS}$ | 333016.1 | 1649902 | 591973.5
$\bar{y}_{RegT,PPS}$ | 115820.1 | 163927.6 | 358827.8
$\hat{\bar{Y}}_{RegGT,PPS}$ | 102615.2 | 111587.4 | 351424.7

Table 5.

PRE of the existing and suggested estimators.

Estimator | Data-I | Data-II | Data-III
$\bar{y}_{T,PPS}$ | 100 | 100 | 100
$\bar{y}_{RT,PPS}$ | 948.3534 | 571.0073 | 3343.941
$\hat{\bar{Y}}_{RT,PPS}$ | 953.5406 | 555.9368 | 3449.37
$\bar{y}_{PT,PPS}$ | 622.3209 | 534.7774 | 2001.286
$\hat{\bar{Y}}_{PT,PPS}$ | 624.5504 | 521.5365 | 2038.577
$\bar{y}_{RegT,PPS}$ | 1795.762 | 5249.171 | 3363.127
$\hat{\bar{Y}}_{RegGT,PPS}$ | 2026.846 | 7711.299 | 3433.975
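The PRE values in Table 5 can be reproduced from the MSE values in Table 4. The sketch below assumes the usual definition, PRE = 100 × MSE(baseline) / MSE(estimator), with the mean-per-unit PPS estimator as the baseline (consistent with its PRE of 100 in every column); the dictionary keys and the function name `pre` are hypothetical labels, and only the Data-I column is shown.

```python
def pre(mse_baseline, mse_estimator):
    """Percentage relative efficiency of an estimator against a baseline."""
    return 100.0 * mse_baseline / mse_estimator

# MSE values for Data-I, copied from Table 4.
mse_data1 = {
    "mean": 2079853.0,       # mean-per-unit PPS estimator (baseline)
    "ratio": 219312.1,       # existing ratio estimator
    "ratio_adj": 218119.0,   # suggested ratio type estimator
    "reg": 115820.1,         # existing regression estimator
    "reg_adj": 102615.2,     # suggested regression type estimator
}

# PRE of every estimator relative to the baseline.
pre_data1 = {name: pre(mse_data1["mean"], value) for name, value in mse_data1.items()}

for name, value in pre_data1.items():
    print(f"{name}: {value:.4f}")
```

The same function applied to the Data-II and Data-III columns gives the remaining entries of Table 5, which makes it straightforward to check where a suggested estimator has a larger or smaller PRE than its existing counterpart.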

Fig. 1. MSE of the suggested and existing estimators. The Y-axis shows the values of the mean square error and the X-axis shows the estimators.

Fig. 2. PRE of the suggested and existing estimators. The Y-axis shows the values of PRE and the X-axis shows the estimators.

7. Conclusion

In this paper, we have proposed improved ratio, product, and regression type estimators of the finite population mean for double-phase PPS sampling in the presence of extreme values. Expressions for their properties are derived up to the first order of approximation. The aim of the proposal is to improve the accuracy and precision of mean estimation relative to existing estimators. To evaluate the efficiency of the recommended estimators, we compared them with several existing counterparts, using three real data sets to obtain the MSEs and PREs. The numerical results show that the recommended estimators perform well in terms of smaller mean square error and higher PRE, and the empirical efficiency comparisons support the conclusion that the proposed estimators are more effective than the traditional estimators. The recommended estimators are therefore expected to perform well in applied surveys. The current work can be extended to obtain an improved family of estimators under stratified random sampling and measurement error, using auxiliary variables or attributes, for estimating the population mean and variance. It would also be interesting to examine the efficiency of the recommended estimators in more complex survey settings, such as clustered and stratified sampling.

Data availability

Data will be made available on request.

Funding

There is no funding.

CRediT authorship contribution statement

Jing Wang: Formal analysis. Sohaib Ahmad: Conceptualization, Investigation, Writing – original draft. Muhammad Arslan: Formal analysis. Showkat Ahmad Lone: Resources, Validation. A.H. Abd Ellah: Methodology, Validation. Maha A. Aldahlan: Conceptualization, Data curation. Mohammed Elgarhy: Data curation, Resources.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Contributor Information

Jing Wang, Email: wangjingjjx@tynu.edu.cn.

Sohaib Ahmad, Email: sohaib_ahmad@awkum.edu.pk.

Muhammad Arslan, Email: arslanma76172@gmail.com.

Showkat Ahmad Lone, Email: s.lone@seu.edu.sa.

A.H. Abd Ellah, Email: aabdulmotajali@bu.edu.sa.

Maha A. Aldahlan, Email: maal-dahlan@uj.edu.sa.

Mohammed Elgarhy, Email: m_elgarhy85@sva.edu.eg.

References

  • 1. Singh S. Advanced Sampling Theory with Applications: How Michael "Selected" Amy. vol. 2. Kluwer Academic Publishers; 2003.
  • 2. Ahmad S., Shabbir J., Zahid E., Aamir M. Improved family of estimators for the population mean using supplementary variables under PPS sampling. Sci. Prog. 2023;106(2). doi: 10.1177/00368504231180085.
  • 3. Grover L.K., Kaur P. Ratio type exponential estimators of population mean under linear transformation of auxiliary variable: theory and methods. S. Afr. Stat. J. 2011;45(2):205–230.
  • 4. Grover L.K., Kaur P. A generalized class of ratio type exponential estimators of population mean under linear transformation of auxiliary variable. Commun. Stat. Simulat. Comput. 2014;43(7):1552–1574.
  • 5. Singh R., Sharma P. A class of exponential ratio estimators of finite population mean using two auxiliary variables. Pak. J. Statistics Oper. Res. 2015:221–229.
  • 6. Shahzad U., Hanif M., Sajjad I., Anas M.M. Quantile regression-ratio-type estimators for mean estimation under complete and partial auxiliary information. Sci. Iran. 2022;29(3):1705–1715.
  • 7. Shahzad U., Ahmad I., Almanjahie I.M., Al-Noor N.H., Hanif M. Mean estimation using robust quantile regression with two auxiliary variables. Sci. Iran. 2022;30:1245–1254.
  • 8. Rao T.J. On certain methods of improving ratio and regression estimators. Commun. Stat. Theor. Methods. 1991;20(10):3325–3340.
  • 9. Sinha R.R., Bharti. Ameliorate estimation of mean using skewness and kurtosis of auxiliary character. J. Stat. Manag. Syst. 2022;25(4):927–944.
  • 10. Yadav S.K., Zaman T., Khokhar A., Saha S. Questing elevated family of product estimators of population mean using auxiliary variables. J. Sci. Arts. 2022;22(2):343–350.
  • 11. Ahmad S., Arslan M., Khan A., Shabbir J. A generalized exponential-type estimator for population mean using auxiliary attributes. PLoS One. 2021;16(5). doi: 10.1371/journal.pone.0246947.
  • 12. Zaman T., Bulut H., Yadav S.K. Robust ratio‐type estimators for finite population mean in simple random sampling: a simulation study. Concurrency Comput. Pract. Ex. 2022;34(25).
  • 13. Singh G.N., Khalid M. Efficient class of estimators for finite population mean using auxiliary information in two-occasion successive sampling. J. Mod. Appl. Stat. Methods. 2019;17(2):14.
  • 14. Singh G.N., Khalid M., Kim J.M. Some imputation methods to deal with the problems of missing data in two-occasion successive sampling. Commun. Stat. Simulat. Comput. 2021;50(2):557–580.
  • 15. Sinha R.R. Families of estimators for estimating mean using information of auxiliary variate under response and non-response. Journal of Reliability and Statistical Studies. 2020:21–60.
  • 16. Sinha R.R., Khanna B. Estimation of population mean under probability proportional to size sampling with and without measurement errors. Concurrency Comput. Pract. Ex. 2022;34(18).
  • 17. Ahmad S., Shabbir J. Use of extreme values to estimate finite population mean under pps sampling scheme. Journal of Reliability and Statistical Studies. 2018:99–112.
  • 18. Al-Marzouki S., Chesneau C., Akhtar S., Nasir J.A., Ahmad S., Hussain S.…El-Morshedy M. Estimation of finite population mean under PPS in presence of maximum and minimum values. AIMS Mathematics. 2021;6(5):5397–5409.
  • 19. Ahmad S., Zahid E., Shabbir J., Aamir M., Onyango R. Enhanced estimation of the population mean using two auxiliary variables under probability proportional to size sampling. Math. Probl Eng. 2023:2023. doi: 10.1177/00368504231208537.
  • 20. Agarwal S., Kumar P. Combination of ratio and pps estimators. J. Indian Soc. Agric. Stat. 1980;32:81–86.
  • 21. Singh H.P., Mishra A.C., Pal S.K. Improved estimator of population total in PPS sampling. Commun. Stat. Theor. Methods. 2018;47(4):912–934.
  • 22. Sharma P., Singh R. Improved estimators in simple random sampling when study variable is an attribute. J. Stat. Appl. Pro. Lett. 2015;2(1):51–58.
  • 23. Pandey S., Singh R.K. On combination of ratio and PPS estimators. Biom. J. 1984;26(3):333–336.
  • 24. Rao J. Alternative estimators in PPS sampling for multiple characteristics. Sankhya: The Indian Journal of Statistics, Series A. 1966;28:47–60.
  • 25. Srivenkataramana T., Tracy D.S. Transforming the study variate after PPS Sampling. Metron. 1979;37(1):175–181.
  • 26. Armstrong J., St-Jean H. Generalized regression estimator for two-phase sample of tax records. Surv. Methodol. 1983;20:91–105.
  • 27. Singh H.P., Espejo M.R. On linear regression and ratio-product estimation of a finite population mean. The Statistician. 2003;1:59–67.
  • 28. Hassan Y., Ismail M., Murray W., Shahbaz M.Q. Efficient estimation combining exponential and functions under two phase sampling. AIMS Mathematics. 2020;5(6):7605–7623.
  • 29. Sanaullah A., Ali H.A., ul Amin M.N., Hanif M. Generalized exponential chain ratio estimators under stratified two-phase random sampling. Appl. Math. Comput. 2014;226:541–547.
  • 30. Ozgul N. New improved calibration estimator based on two auxiliary variables in stratified two-phase sampling. J. Stat. Comput. Simulat. 2021;91(6):1243–1256.
  • 31. Oyeyemi G.M., Muhammad I., Kareem A.O. Combined exponential-type estimators for finite population mean in two-phase sampling. Asian Journal of Probability and Statistics. 2023;21(2):44–58.
  • 32. Tiwari A., Kumar M., Dubey S.K. A generalized approach for estimation of a finite population mean in two-phase sampling. Asian Journal of Probability and Statistics. 2023;21(3):45–58.
  • 33. Liu X., Arslan M. A general class of estimators on estimating population mean using the auxiliary proportions under simple and two phase sampling. AIMS Mathematics. 2021;6(12):13592–13607.
  • 34. Bhushan S., Kumar A. Enhanced estimation of population mean under two-phase sampling. Int. J. Math. Model. Numer. Optim. 2023;13(1):34–48.
  • 35. Bhushan S.B., Kumar A.K., Kumar S. Efficient class of estimators of population mean under double sampling. Thailand Statistician. 2023;21(3):498–509.
  • 36. Janbandhu R., Garg N., Tailor R. Chain ratio type exponential estimator for finite population mean in double sampling. Thailand Statistician. 2023;21(3):675–690.
  • 37. Sarndal C.E. Sample Survey Theory vs General Statistical Theory: Estimation of the Population Mean. vol. 40. International Statistical Institute; 1972. pp. 1–12.
  • 38. Murthy M.N. Sampling Theory and Methods. second ed. Statistical Publishing Society; Calcutta: 1967.
  • 39. Murthy M.N. Sampling Theory and Methods. Statistical Pub. Society; Calcutta: 1967.
  • 40. Upadhyaya L.N., Singh H.P. Use of transformed auxiliary variable in estimating the finite population mean. Biom. J. 1999;41(5):627–636.
