SpringerPlus. 2016 Jun 14;5(1):723. doi: 10.1186/s40064-016-2368-1

A note on a difference-type estimator for population mean under two-phase sampling design

Mursala Khan 1, Abdullah Yahia Al-Hossain 2
PMCID: PMC4908086  PMID: 27375992

Abstract

In this manuscript, we propose a difference-type estimator for the population mean under a two-phase sampling scheme using two auxiliary variables. The properties and the mean square error of the proposed estimator are derived up to the first order of approximation, and we obtain the conditions under which the proposed estimator is more efficient than the relevant existing estimators. A numerical example shows that the proposed estimator is more efficient than the other available estimators under the two-phase sampling scheme for this one data set; however, further study is needed to establish its superiority for other populations.

Keywords: Study variable, Auxiliary variable, Bias, Mean squared-error, Two phase sampling, Exponential chain-type estimator, Efficiency

Background

In survey sampling, auxiliary information is widely used at the estimation stage to improve the design and the precision of estimators of unknown population parameters. When knowledge of the auxiliary variable is available at the estimation stage, the ratio, product and regression methods of estimation are commonly employed.

Estimation of the population mean of the study variable is among the most widely discussed topics across probability sampling schemes. A large number of authors have formulated new or modified estimators of the population mean; see, for instance, Hansen and Hurwitz (1943), Sukhatme (1962), Srivastava (1970), Chand (1975), Cochran (1977), Kiregyera (1980, 1984), Srivastava et al. (1990), Bahl and Tuteja (1991), Singh et al. (2006, 2007, 2011), Singh and Choudhury (2012), Khare et al. (2013), Singh and Majhi (2014) and Khan (2015, 2016).

Symbols and notations

Consider a finite population $U = \{U_1, U_2, U_3, \ldots, U_N\}$ of $N$ distinct units. Let $y$ and $x$ be the study and the auxiliary variable, taking values $y_i$ and $x_i$ on the $i$-th unit $(i = 1, 2, 3, \ldots, N)$, with population means $\bar{Y} = \frac{1}{N}\sum_{i=1}^{N} y_i$ and $\bar{X} = \frac{1}{N}\sum_{i=1}^{N} x_i$ respectively.

Let $S_y^2 = \frac{1}{N-1}\sum_{i=1}^{N}(y_i - \bar{Y})^2$ and $S_x^2 = \frac{1}{N-1}\sum_{i=1}^{N}(x_i - \bar{X})^2$ be the population variances of the study and the auxiliary variable, let $C_y$ and $C_x$ be the corresponding coefficients of variation, and let $\rho_{yx}$ be the correlation coefficient between $y$ and $x$. In a sample of size $n$, the values $y_i$ and $x_i$ $(i = 1, 2, 3, \ldots, n)$ give the unbiased sample means $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$ and $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$.

Let $\hat{S}_y^2 = \frac{1}{n-1}\sum_{i=1}^{n}(y_i - \bar{y})^2$ and $\hat{S}_x^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$ be the corresponding sample variances. Let $S_{yx} = \frac{1}{N-1}\sum_{i=1}^{N}(y_i - \bar{Y})(x_i - \bar{X})$, $S_{yz} = \frac{1}{N-1}\sum_{i=1}^{N}(y_i - \bar{Y})(z_i - \bar{Z})$ and $S_{xz} = \frac{1}{N-1}\sum_{i=1}^{N}(x_i - \bar{X})(z_i - \bar{Z})$ be the population covariances indicated by their subscripts. Similarly, $b_{yx(n)} = \hat{S}_{yx}/\hat{S}_x^2$ is the sample regression coefficient of $y$ on $x$ based on a sample of size $n$. The coefficients of variation of the study and auxiliary variables are $C_y = S_y/\bar{Y}$, $C_x = S_x/\bar{X}$ and $C_z = S_z/\bar{Z}$.

Also, $\theta = \frac{1}{n} - \frac{1}{N}$, $\theta_1 = \frac{1}{n'} - \frac{1}{N}$ and $\theta_2 = \frac{1}{n} - \frac{1}{n'}$, where $n'$ and $n$ are the first-phase and second-phase sample sizes.
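As a small numerical illustration of these three factors, the sketch below evaluates them for the sample sizes used in the paper's numerical comparison (population size 34, first-phase size 15, second-phase size 10). The helper name `sampling_factors` is hypothetical and is not part of the paper.

```python
from fractions import Fraction


def sampling_factors(N, n_first, n_second):
    """Return (theta, theta1, theta2) for population size N,
    first-phase size n_first (n') and second-phase size n_second (n)."""
    if not (0 < n_second < n_first < N):
        raise ValueError("expected 0 < n < n' < N")
    theta = Fraction(1, n_second) - Fraction(1, N)
    theta1 = Fraction(1, n_first) - Fraction(1, N)
    theta2 = Fraction(1, n_second) - Fraction(1, n_first)
    return theta, theta1, theta2


theta, theta1, theta2 = sampling_factors(34, 15, 10)
# The three factors are linked by theta = theta1 + theta2.
print(float(theta), float(theta1), float(theta2))
```

Exact fractions are used only to make the identity between the three factors visible without rounding error.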

Some existing estimators

Consider a finite population $U = \{U_1, U_2, U_3, \ldots, U_N\}$ of $N$ units. We wish to estimate the population mean $\bar{Y}$ of the variable of interest $y$, taking values $y_i$, in the presence of two auxiliary variables $x$ and $z$ taking values $x_i$ and $z_i$ on the $i$-th unit $U_i$. We assume that $y$ is more highly correlated with $x$ than with $z$ (i.e. $\rho_{yx} > \rho_{yz} > 0$). Suppose the population mean $\bar{X}$ of the auxiliary variable $x$ is unknown, but information on another, cheaply obtained auxiliary variable $z$, closely related to $x$ but only remotely related to $y$, is available for all units of the population. In such a situation we use two-phase sampling. In the first phase, a large initial sample of size $n'$ $(n' < N)$ is drawn from the population $U$ by simple random sampling without replacement (SRSWOR), and $x$ and $z$ are measured on it to estimate $\bar{X}$. In the second phase, a sample of size $n$ $(n < n')$ is drawn by SRSWOR, either as a subsample of the first-phase sample or directly from the population $U$, and the study variable $y$ is observed. Throughout, $\bar{x}'$ and $\bar{z}'$ denote first-phase sample means, while $\bar{y}$, $\bar{x}$ and $\bar{z}$ denote second-phase sample means.
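The two-phase design described above can be sketched as follows, for the case where the second-phase sample is a subsample of the first-phase sample. This is a minimal illustration only: the population is synthetic, and the function name `two_phase_sample` and all generated values are assumptions introduced here, not data from the paper.

```python
import random


def two_phase_sample(pop_size, n_first, n_second, seed=None):
    """Draw unit indices: an SRSWOR first-phase sample of size n_first,
    then an SRSWOR subsample of size n_second from it."""
    rng = random.Random(seed)
    first_phase = rng.sample(range(pop_size), n_first)
    second_phase = rng.sample(first_phase, n_second)
    return first_phase, second_phase


def mean(values):
    return sum(values) / len(values)


# Synthetic population of (y, x, z) triples; values are illustrative only.
gen = random.Random(0)
population = []
for _ in range(34):
    z = gen.uniform(1.0, 5.0)
    x = z + gen.gauss(0.0, 0.5)
    y = 2.0 * x + gen.gauss(0.0, 1.0)
    population.append((y, x, z))

first, second = two_phase_sample(len(population), n_first=15, n_second=10, seed=1)

x_bar_first = mean([population[i][1] for i in first])   # first-phase mean of x
z_bar_first = mean([population[i][2] for i in first])   # first-phase mean of z
y_bar = mean([population[i][0] for i in second])        # second-phase mean of y
x_bar = mean([population[i][1] for i in second])        # second-phase mean of x
Z_bar = mean([unit[2] for unit in population])          # known population mean of z

print(len(first), len(second), round(y_bar, 3), round(x_bar_first, 3))
```

Only $x$ and $z$ are read from the first-phase units and $y$ only from the second-phase units, which mirrors the cost argument for two-phase sampling.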

The variance of the usual simple estimator $t_0 = \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$, up to the first order of approximation, is given by

$$V(t_0) = \theta S_y^2 \tag{1}$$

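The classical ratio and regression estimators in two-phase sampling, and their mean square errors up to the first order of approximation, are given by

$$t_1 = \bar{y}\,\frac{\bar{x}'}{\bar{x}} \tag{2}$$

$$\mathrm{MSE}(t_1) = \bar{Y}^2\left[\theta C_y^2 + \theta_2\left(C_x^2 - 2\rho_{yx}C_yC_x\right)\right] \tag{3}$$

$$t_2 = \bar{y} + b_{yx(n)}\left(\bar{x}' - \bar{x}\right) \tag{4}$$

$$\mathrm{MSE}(t_2) = S_y^2\left[\theta\left(1 - \rho_{yx}^2\right) + \theta_1\rho_{yx}^2\right] \tag{5}$$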
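To make Eqs. (1), (3) and (5) concrete, the sketch below evaluates them with the summary statistics reported in the paper's numerical comparison section. It assumes $C_{yx} = \rho_{yx}C_yC_x$ and $S_y^2 = \bar{Y}^2C_y^2$; the variable names are chosen here for illustration. The results should land close to the corresponding MSE entries of Table 1, with small differences attributable to rounding of the published inputs.

```python
# Summary statistics from the paper's numerical comparison (Cochran 1977 data).
N, n_first, n = 34, 15, 10
Y_bar = 4.92
Cy2, Cx2 = 1.0248, 1.5175      # squared coefficients of variation
rho_yx = 0.7326
C_yx = 0.9136                  # assumed equal to rho_yx * Cy * Cx

theta = 1 / n - 1 / N
theta1 = 1 / n_first - 1 / N
theta2 = 1 / n - 1 / n_first
Sy2 = Y_bar**2 * Cy2           # follows from Cy = Sy / Y_bar

var_t0 = theta * Sy2                                              # Eq. (1)
mse_t1 = Y_bar**2 * (theta * Cy2 + theta2 * (Cx2 - 2 * C_yx))     # Eq. (3)
mse_t2 = Sy2 * (theta * (1 - rho_yx**2) + theta1 * rho_yx**2)     # Eq. (5)

print(round(var_t0, 4), round(mse_t1, 4), round(mse_t2, 4))
```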

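Chand (1975) suggested the following chain ratio-type estimator:

$$t_3 = \bar{y}\,\frac{\bar{x}'}{\bar{x}}\,\frac{\bar{Z}}{\bar{z}'} \tag{6}$$

The mean square error of this estimator is given by

$$\mathrm{MSE}(t_3) = \bar{Y}^2\left[\theta C_y^2 + \theta_2\left(C_x^2 - 2\rho_{yx}C_yC_x\right) + \theta_1\left(C_z^2 - 2\rho_{yz}C_yC_z\right)\right] \tag{7}$$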

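Khare et al. (2013) proposed a generalized chain ratio in regression estimator for the population mean, given by

$$t_4 = \bar{y} + b_{yx}\left[\bar{x}'\left(\frac{\bar{Z}}{\bar{z}'}\right)^{\alpha} - \bar{x}\right] \tag{8}$$

where $\alpha$ is an unknown constant. The minimum mean square error, attained at the optimum value $\alpha = \frac{\rho_{yz}C_x}{\rho_{yx}C_z}$, is given by

$$\mathrm{MSE}(t_4) = \bar{Y}^2C_y^2\left[\theta - \theta_2\rho_{yx}^2 - \theta_1\rho_{yz}^2\right] \tag{9}$$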

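Recently, Singh and Majhi (2014) suggested the following chain-type exponential estimators for $\bar{Y}$:

$$t_5 = \bar{y}\,\frac{\bar{x}'}{\bar{x}}\exp\left(\frac{\bar{Z} - \bar{z}'}{\bar{Z} + \bar{z}'}\right) \tag{10}$$

$$t_6 = \bar{y} + b_{yx(n)}\left[\bar{x}'\exp\left(\frac{\bar{Z} - \bar{z}'}{\bar{Z} + \bar{z}'}\right) - \bar{x}\right] \tag{11}$$

$$t_7 = \bar{y}\exp\left(\frac{\bar{x}' - \bar{x}}{\bar{x}' + \bar{x}}\right)\frac{\bar{Z}}{\bar{z}'} \tag{12}$$

The mean square errors of these estimators, up to the first order of approximation, are given by

$$\mathrm{MSE}(t_5) = \bar{Y}^2\left[\theta C_y^2 + \theta_2\left(C_x^2 - 2\rho_{yx}C_yC_x\right) + \frac{\theta_1}{4}\left(C_z^2 - 4\rho_{yz}C_yC_z\right)\right] \tag{13}$$

$$\mathrm{MSE}(t_6) = \bar{Y}^2C_y^2\left[\theta_2\left(1 - \rho_{yx}^2\right) + \theta_1\left(1 + \frac{\rho_{yx}^2}{4}\frac{C_z^2}{C_x^2} - \rho_{yx}\rho_{yz}\frac{C_z}{C_x}\right)\right] \tag{14}$$

$$\mathrm{MSE}(t_7) = \bar{Y}^2\left[\theta C_y^2 + \frac{\theta_2}{4}\left(C_x^2 - 4\rho_{yx}C_yC_x\right) + \theta_1\left(C_z^2 - 2\rho_{yz}C_yC_z\right)\right] \tag{15}$$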
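The chain-type expressions (7), (9), (13), (14) and (15) can be evaluated in the same way. The sketch below again uses the summary statistics from the paper's numerical comparison section and assumes $C_{yx} = \rho_{yx}C_yC_x$ and $C_{yz} = \rho_{yz}C_yC_z$; the dictionary layout is an illustrative choice. Small deviations from the tabulated MSE values are to be expected because the published inputs are rounded.

```python
import math

# Summary statistics from the paper's numerical comparison (Cochran 1977 data).
N, n_first, n = 34, 15, 10
Y_bar = 4.92
Cy2, Cx2, Cz2 = 1.0248, 1.5175, 1.1492
C_yx, C_yz = 0.9136, 0.6978    # assumed rho * C * C products
rho_yx, rho_yz = 0.7326, 0.6430

theta = 1 / n - 1 / N
theta1 = 1 / n_first - 1 / N
theta2 = 1 / n - 1 / n_first
Y2 = Y_bar**2
cz_over_cx = math.sqrt(Cz2 / Cx2)

mse = {
    # Eq. (7): Chand's chain ratio estimator
    "t3": Y2 * (theta * Cy2 + theta2 * (Cx2 - 2 * C_yx) + theta1 * (Cz2 - 2 * C_yz)),
    # Eq. (9): Khare et al. at the optimum alpha
    "t4": Y2 * Cy2 * (theta - theta2 * rho_yx**2 - theta1 * rho_yz**2),
    # Eqs. (13)-(15): Singh and Majhi exponential chain estimators
    "t5": Y2 * (theta * Cy2 + theta2 * (Cx2 - 2 * C_yx)
                + (theta1 / 4) * (Cz2 - 4 * C_yz)),
    "t6": Y2 * Cy2 * (theta2 * (1 - rho_yx**2)
                      + theta1 * (1 + (rho_yx**2 / 4) * (Cz2 / Cx2)
                                  - rho_yx * rho_yz * cz_over_cx)),
    "t7": Y2 * (theta * Cy2 + (theta2 / 4) * (Cx2 - 4 * C_yx)
                + theta1 * (Cz2 - 2 * C_yz)),
}

for name, value in mse.items():
    print(name, round(value, 4))
```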

The proposed estimator

Along the lines of Khare et al. (2013), we propose a difference-type estimator for the population mean under the two-phase sampling scheme using two auxiliary variables, given by

$$t_m = \bar{y} + k_1\left(\bar{x}'\,\frac{\bar{Z}}{\bar{z}'} - \bar{x}\right) + k_2\left(\bar{Z}\,\frac{\bar{x}'}{\bar{x}} - \bar{z}\right) \tag{16}$$

where $k_1$ and $k_2$ are unknown constants.

To obtain the properties of the proposed estimator we define the following relative error terms and their expectations.

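Let $e_0 = \frac{\bar{y} - \bar{Y}}{\bar{Y}}$, $e_1 = \frac{\bar{x} - \bar{X}}{\bar{X}}$, $e_1' = \frac{\bar{x}' - \bar{X}}{\bar{X}}$, $e_2 = \frac{\bar{z} - \bar{Z}}{\bar{Z}}$ and $e_2' = \frac{\bar{z}' - \bar{Z}}{\bar{Z}}$, such that

$$E(e_0) = E(e_i) = E(e_i') = 0 \quad \text{for } i = 1, 2,$$

$$E(e_0^2) = \theta C_y^2,\quad E(e_1^2) = \theta C_x^2,\quad E(e_1'^2) = E(e_1e_1') = \theta_1C_x^2,\quad E(e_2^2) = \theta C_z^2,\quad E(e_2'^2) = E(e_2e_2') = \theta_1C_z^2,$$

$$E(e_0e_1) = \theta C_{yx},\quad E(e_0e_1') = \theta_1C_{yx},\quad E(e_0e_2) = \theta C_{yz},\quad E(e_0e_2') = \theta_1C_{yz},$$

$$E(e_1e_2) = \theta C_{xz},\quad E(e_1e_2') = E(e_1'e_2) = E(e_1'e_2') = \theta_1C_{xz},$$

where $C_{yx} = \rho_{yx}C_yC_x$, $C_{yz} = \rho_{yz}C_yC_z$ and $C_{xz} = \rho_{xz}C_xC_z$.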

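Rewriting (16) in terms of the $e$'s, we have

$$t_m = \bar{Y}(1 + e_0) + k_1\bar{X}\left[(1 + e_1')(1 + e_2')^{-1} - (1 + e_1)\right] + k_2\bar{Z}\left[(1 + e_1')(1 + e_1)^{-1} - (1 + e_2)\right]$$

Expanding the right-hand side of the above equation and neglecting terms in the $e$'s of power greater than two, we have

$$t_m - \bar{Y} = \bar{Y}e_0 - k_1\bar{X}\left(e_1 - e_1' + e_2' - e_2'^2 + e_1'e_2'\right) - k_2\bar{Z}\left(e_1 - e_1' + e_2 - e_1^2 + e_1e_1'\right) \tag{17}$$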

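On squaring both sides of Eq. (17), taking expectations and keeping terms up to the second order, we have

$$\begin{aligned}\mathrm{MSE}(t_m) = E\Big[&\bar{Y}^2e_0^2 + k_1^2\bar{X}^2\left(e_1^2 + e_1'^2 + e_2'^2 - 2e_1e_1' + 2e_1e_2' - 2e_1'e_2'\right)\\ &+ k_2^2\bar{Z}^2\left(e_1^2 + e_1'^2 + e_2^2 - 2e_1e_1' + 2e_1e_2 - 2e_1'e_2\right)\\ &+ 2k_1k_2\bar{X}\bar{Z}\left(e_1^2 + e_1'^2 - 2e_1e_1' + e_1e_2 - e_1'e_2 + e_2'e_1 - e_1'e_2' + e_2e_2'\right)\\ &- 2k_1\bar{Y}\bar{X}\left(e_0e_1 - e_0e_1' + e_0e_2'\right) - 2k_2\bar{Y}\bar{Z}\left(e_0e_1 - e_0e_1' + e_0e_2\right)\Big]\end{aligned}$$

Further simplifying, we get

$$\begin{aligned}\mathrm{MSE}(t_m) = {}&\bar{Y}^2\theta C_y^2 + k_1^2\bar{X}^2\left(\theta_1C_z^2 + \theta_2C_x^2\right) + k_2^2\bar{Z}^2\left(\theta C_z^2 + \theta_2C_x^2 + 2\theta_2C_{xz}\right)\\ &+ 2k_1k_2\bar{X}\bar{Z}\left(\theta_2C_x^2 + \theta_1C_z^2 + \theta_2C_{xz}\right) - 2k_1\bar{Y}\bar{X}\left(\theta_2C_{yx} + \theta_1C_{yz}\right) - 2k_2\bar{Y}\bar{Z}\left(\theta_2C_{yx} + \theta C_{yz}\right)\end{aligned} \tag{18}$$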

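To find the minimum mean square error of $t_m$, we differentiate Eq. (18) with respect to $k_1$ and $k_2$ and set the derivatives equal to zero, that is,

$$\frac{\partial\,\mathrm{MSE}(t_m)}{\partial k_1} = 0 \quad\text{and}\quad \frac{\partial\,\mathrm{MSE}(t_m)}{\partial k_2} = 0,$$

which gives

$$k_{1(\mathrm{opt})} = \frac{\bar{Y}(BC - DE)}{\bar{X}(AB - E^2)} \quad\text{and}\quad k_{2(\mathrm{opt})} = \frac{\bar{Y}(AD - CE)}{\bar{Z}(AB - E^2)},$$

where $A = \theta_1C_z^2 + \theta_2C_x^2$, $B = \theta C_z^2 + \theta_2C_x^2 + 2\theta_2C_{xz}$, $C = \theta_2C_{yx} + \theta_1C_{yz}$, $D = \theta_2C_{yx} + \theta C_{yz}$ and $E = \theta_1C_z^2 + \theta_2C_x^2 + \theta_2C_{xz}$.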
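In the shorthand $A$ to $E$, Eq. (18) is a quadratic in $(k_1, k_2)$, so the two stationarity conditions form a $2 \times 2$ linear system. The worked step below is a sketch of that intermediate algebra, which the paper does not display; it uses only symbols already defined.

```latex
\begin{align*}
\mathrm{MSE}(t_m) &= \bar{Y}^2\theta C_y^2 + k_1^2\bar{X}^2 A + k_2^2\bar{Z}^2 B
  + 2k_1k_2\bar{X}\bar{Z}E - 2k_1\bar{Y}\bar{X}C - 2k_2\bar{Y}\bar{Z}D,\\[4pt]
\frac{\partial\,\mathrm{MSE}(t_m)}{\partial k_1} = 0
  &\;\Longrightarrow\; k_1\bar{X}A + k_2\bar{Z}E = \bar{Y}C,\\
\frac{\partial\,\mathrm{MSE}(t_m)}{\partial k_2} = 0
  &\;\Longrightarrow\; k_1\bar{X}E + k_2\bar{Z}B = \bar{Y}D.
\end{align*}
```

Solving this system by Cramer's rule, provided $AB - E^2 \neq 0$, yields the optimum values of $k_1$ and $k_2$ stated above.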

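On substituting the optimum values of $k_1$ and $k_2$ in Eq. (18), we obtain the minimum mean square error (MSE) of the proposed estimator $t_m$, up to the first order of approximation, as

$$\mathrm{MSE}(t_m)_{\min} = \bar{Y}^2\left[\theta C_y^2 - \frac{AD^2 + BC^2 - 2CDE}{AB - E^2}\right] \tag{19}$$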
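The following sketch evaluates the optimum constants and Eq. (19) for the summary statistics in the paper's numerical comparison section, and checks that plugging the optimum constants into Eq. (18) reproduces Eq. (19). It assumes $C_{xz} = \rho_{xz}C_xC_z$, which the paper does not state explicitly, and the function name `mse_tm` is hypothetical. Because $AB - E^2$ is a small difference of nearly equal products, the result is sensitive to rounding in the published inputs, so the value obtained here need not coincide with the tabulated MSE of $t_m$.

```python
import math

# Summary statistics from the paper's numerical comparison (Cochran 1977 data).
N, n_first, n = 34, 15, 10
Y_bar, X_bar, Z_bar = 4.92, 2.59, 2.91
Cy2, Cx2, Cz2 = 1.0248, 1.5175, 1.1492
C_yx, C_yz = 0.9136, 0.6978
rho_xz = 0.6837
# Assumption: C_xz = rho_xz * Cx * Cz, by analogy with C_yx and C_yz.
C_xz = rho_xz * math.sqrt(Cx2 * Cz2)

theta = 1 / n - 1 / N
theta1 = 1 / n_first - 1 / N
theta2 = 1 / n - 1 / n_first

A = theta1 * Cz2 + theta2 * Cx2
B = theta * Cz2 + theta2 * Cx2 + 2 * theta2 * C_xz
C = theta2 * C_yx + theta1 * C_yz
D = theta2 * C_yx + theta * C_yz
E = theta1 * Cz2 + theta2 * Cx2 + theta2 * C_xz
det = A * B - E**2

k1_opt = Y_bar * (B * C - D * E) / (X_bar * det)
k2_opt = Y_bar * (A * D - C * E) / (Z_bar * det)


def mse_tm(k1, k2):
    """First-order MSE of t_m, Eq. (18), written in the A to E shorthand."""
    return (Y_bar**2 * theta * Cy2
            + k1**2 * X_bar**2 * A + k2**2 * Z_bar**2 * B
            + 2 * k1 * k2 * X_bar * Z_bar * E
            - 2 * k1 * Y_bar * X_bar * C - 2 * k2 * Y_bar * Z_bar * D)


# Closed-form minimum, Eq. (19).
mse_min = Y_bar**2 * (theta * Cy2 - (A * D**2 + B * C**2 - 2 * C * D * E) / det)

print(k1_opt, k2_opt, mse_min, mse_tm(k1_opt, k2_opt))
```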

Efficiency comparison

In this section, we compare the proposed estimator with the other existing estimators.

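  1. By Eqs. (1) and (19),

    $\mathrm{MSE}(t_m)_{\min} < \mathrm{MSE}(t_0)$ if $\frac{AD^2 + BC^2 - 2CDE}{AB - E^2} > 0$.
  2. By Eqs. (3) and (19),

    $\mathrm{MSE}(t_m)_{\min} < \mathrm{MSE}(t_1)$ if $\frac{AD^2 + BC^2 - 2CDE}{AB - E^2} + \theta_2\left(C_x^2 - 2\rho_{yx}C_yC_x\right) > 0$.
  3. By Eqs. (5) and (19),

    $\mathrm{MSE}(t_m)_{\min} < \mathrm{MSE}(t_2)$ if $\frac{AD^2 + BC^2 - 2CDE}{AB - E^2} - \theta_2C_y^2\rho_{yx}^2 > 0$.
  4. By Eqs. (7) and (19),

    $\mathrm{MSE}(t_m)_{\min} < \mathrm{MSE}(t_3)$ if $\theta_2C_x\left(C_x - 2\rho_{yx}C_y\right) + \theta_1C_z\left(C_z - 2\rho_{yz}C_y\right) + \frac{AD^2 + BC^2 - 2CDE}{AB - E^2} > 0$.
  5. By Eqs. (9) and (19),

    $\mathrm{MSE}(t_m)_{\min} < \mathrm{MSE}(t_4)$ if $\frac{AD^2 + BC^2 - 2CDE}{AB - E^2} - \left(\theta_2\rho_{yx}^2 + \theta_1\rho_{yz}^2\right)C_y^2 > 0$.
  6. By Eqs. (13) and (19),

    $\mathrm{MSE}(t_m)_{\min} < \mathrm{MSE}(t_5)$ if $\theta_2\left(C_x^2 - 2C_{yx}\right) + \frac{\theta_1}{4}\left(C_z^2 - 4C_{yz}\right) + \frac{AD^2 + BC^2 - 2CDE}{AB - E^2} > 0$.
  7. By Eqs. (14) and (19),

    $\mathrm{MSE}(t_m)_{\min} < \mathrm{MSE}(t_6)$ if $\theta_1C_y^2\left(\frac{\rho_{yx}^2}{4}\frac{C_z^2}{C_x^2} - \rho_{yx}\rho_{yz}\frac{C_z}{C_x}\right) - \theta_2\rho_{yx}^2C_y^2 + \frac{AD^2 + BC^2 - 2CDE}{AB - E^2} > 0$.
  8. By Eqs. (15) and (19),

    $\mathrm{MSE}(t_m)_{\min} < \mathrm{MSE}(t_7)$ if $\frac{AD^2 + BC^2 - 2CDE}{AB - E^2} + \frac{\theta_2}{4}\left(C_x^2 - 4C_{yx}\right) + \theta_1\left(C_z^2 - 2C_{yz}\right) > 0$.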
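The fraction common to all eight conditions can be read as a quadratic form, which may make condition 1 easier to interpret. The following is a sketch under the assumption that $AB - E^2 > 0$; the matrix $M$ and vector $c$ are notation introduced here for illustration and do not appear in the paper.

```latex
\[
M = \begin{pmatrix} A & E \\ E & B \end{pmatrix},\qquad
c = \begin{pmatrix} C \\ D \end{pmatrix},\qquad
c^{\top} M^{-1} c
= \frac{1}{AB - E^2}
\begin{pmatrix} C & D \end{pmatrix}
\begin{pmatrix} B & -E \\ -E & A \end{pmatrix}
\begin{pmatrix} C \\ D \end{pmatrix}
= \frac{AD^2 + BC^2 - 2CDE}{AB - E^2}.
\]
```

When $A > 0$ and $AB - E^2 > 0$, the matrix $M$ is positive definite, so this fraction is non-negative and condition 1 holds except in the degenerate case $C = D = 0$. The remaining conditions compare this gain with the reductions already achieved by the competing estimators.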

Numerical comparison

To examine the performance of the proposed estimator relative to the existing estimators, we consider a real data set from the literature. The population is described as follows.

Population source: Cochran (1977).

y: Number of placebo children;

x: Number of paralytic polio cases in the placebo group;

z: Number of paralytic polio cases in the not inoculated group.

$N = 34$, $n' = 15$, $n = 10$, $\bar{Y} = 4.92$, $\bar{X} = 2.59$, $\bar{Z} = 2.91$, $C_y^2 = 1.0248$, $C_x^2 = 1.5175$, $C_z^2 = 1.1492$, $C_{yx} = 0.9136$, $C_{yz} = 0.6978$, $\rho_{yx} = 0.7326$, $\rho_{yz} = 0.6430$, $\rho_{xz} = 0.6837$. We use the following expression for the percent relative efficiency (PRE); the results are given in Table 1.

$$\mathrm{PRE}(t_0, t_j) = \frac{V(t_0)}{\mathrm{MSE}(t_j)\ \text{or}\ V(t_j)} \times 100, \quad \text{for } j = 0, 1, 2, 3, 4, 5, 6, 7 \text{ and } m.$$

Table 1. The mean square errors (MSEs) and the percent relative efficiencies (PREs) of the estimators with respect to t_0

Estimator   MSE      PRE(t_0, t_j)
t_0         1.7525   100.00
t_1         1.5032   116.59
t_2         1.3073   134.06
t_3         1.2793   137.00
t_4         0.9247   189.52
t_5         1.1312   154.92
t_6         1.0227   171.36
t_7         1.0982   159.58
t_m         0.8206   213.56
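As a check on how the PRE column follows from the MSE column, the sketch below recomputes the PRE from the MSE values printed in Table 1. The function name `pre` is an illustrative choice. Because the tabulated MSEs are themselves rounded to four decimals, a recomputed PRE may differ from the tabulated one in the last digit.

```python
# MSE values as printed in Table 1 of the paper.
mse = {
    "t0": 1.7525, "t1": 1.5032, "t2": 1.3073, "t3": 1.2793, "t4": 0.9247,
    "t5": 1.1312, "t6": 1.0227, "t7": 1.0982, "tm": 0.8206,
}


def pre(var_t0, mse_tj):
    """Percent relative efficiency of an estimator with respect to t0."""
    return 100.0 * var_t0 / mse_tj


pre_table = {name: round(pre(mse["t0"], value), 2) for name, value in mse.items()}
best = min(mse, key=mse.get)   # estimator with the smallest tabulated MSE

for name, value in pre_table.items():
    print(name, value)
print("smallest MSE:", best)
```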

Conclusion

From Table 1, we observe that the proposed estimator has a smaller mean square error and a higher percent relative efficiency than the other existing estimators. However, although the proposed estimator has the highest percent relative efficiency among the estimators considered for this one example, it could have lower relative efficiency for other populations. Further work is needed before it can be recommended for general use in practical surveys.

Authors’ contributions

The authors contributed equally and significantly in writing this article. Both authors read and approved the final manuscript.

Acknowledgements

The authors are very thankful to the editor and the anonymous learned referees for their valuable suggestions regarding the improvement of the paper.

Competing interests

The authors declare that they have no competing interests.

Contributor Information

Mursala Khan, Email: mursala.khan@yahoo.com.

Abdullah Yahia Al-Hossain, Email: aalhossain@jazanu.edu.sa.

References

  1. Bahl S, Tuteja RK. Ratio and product type exponential estimator. Inf Optim Sci. 1991;12:159–163.
  2. Chand L. Some ratio type estimator based on two or more auxiliary variables. Unpublished Ph.D. dissertation, Iowa State University, Ames, Iowa; 1975.
  3. Cochran WG. Sampling techniques. New York: Wiley; 1977.
  4. Hansen MH, Hurwitz WN. On the theory of sampling from finite populations. Ann Math Stat. 1943;14:333–362. doi: 10.1214/aoms/1177731356.
  5. Khan M. Improvement in estimating the finite population mean under maximum and minimum values in double sampling scheme. J Stat Appl Probab Lett. 2015;2(2):1–7. doi: 10.1155/2015/248374.
  6. Khan M. A ratio chain-type exponential estimator for finite population mean using double sampling. SpringerPlus. 2016;5:1–9. doi: 10.1186/s40064-015-1659-2.
  7. Khare BB, Srivastava U, Kumar K. A generalized chain ratio in regression estimator for population mean using two auxiliary characters in sample survey. J Sci Res Banaras Hindu Univ Varanasi. 2013;57:147–153.
  8. Kiregyera B. A chain ratio-type estimator in finite population mean in double sampling using two auxiliary variables. Metrika. 1980;27:217–223. doi: 10.1007/BF01893599.
  9. Kiregyera B. Regression-type estimator using two auxiliary variables and model of double sampling from finite populations. Metrika. 1984;31:215–223. doi: 10.1007/BF01915203.
  10. Singh BK, Choudhury S. Exponential chain ratio and product-type estimators for finite population mean under double sampling scheme. Glob J Sci Front Res Math Decis Sci. 2012;12(6):2249–4626.
  11. Singh GN, Majhi D. Some chain-type exponential estimators of population mean in two-phase sampling. Stat Trans. 2014;15(2):221–230.
  12. Singh HP, Singh S, Kim JM. General families of chain ratio type estimators of the population mean with known coefficient of variation of the second auxiliary variable in two phase sampling. J Korean Stat Soc. 2006;35(4):377–395.
  13. Singh R, Chuhan P, Swan N. Families of estimators for estimating population mean using known correlation coefficient in two phase sampling. Stat Trans. 2007;8(1):89–96.
  14. Singh R, Chuhan P, Swan N, Smarandache F. Improved exponential estimator for population variance using two auxiliary variables. Ital J Pure Appl Math. 2011;28:101–108.
  15. Srivastava SK. A two phase estimator in sampling surveys. Austr J Stat. 1970;12:23–27. doi: 10.1111/j.1467-842X.1970.tb00109.x.
  16. Srivastava SR, Khare BB, Srivastava SR. A generalized chain ratio estimator for mean of finite population. J Indian Soc Agric Stat. 1990;42(1):108–117.
  17. Sukhatme BV. Some ratio type estimators in two-phase sampling. J Am Stat Assoc. 1962;57:628–632. doi: 10.1080/01621459.1962.10500551.

Articles from SpringerPlus are provided here courtesy of Springer-Verlag