Abstract
In the modern age of information technology, data are available everywhere in huge amounts, and every sector generates a large volume of data every day. Investigating every unit of data is not feasible because resources such as time, labor, and cost are limited. In such situations, survey sampling is recommended for drawing information about population parameters. The main objective of the present study is therefore to develop an estimation method for obtaining information about a population parameter. We propose an optimum estimator for enhanced estimation of the population mean in simple random sampling that utilizes information on two auxiliary attributes. Expressions for the bias, mean squared error (MSE), and minimum mean squared error of the proposed estimator are derived up to the first order of approximation, and it is shown theoretically that, under the derived conditions, the proposed estimator performs better than the existing estimators. Four populations are used to assess the performance and applicability of the proposed estimator. The percentage relative efficiency (PRE) of the proposed estimator for the four populations is 209.533, 163.852, 210.398, and 340.578, respectively. The numerical illustrations confirm that the proposed estimator dominates the existing estimators.
- The main objective of the present study is to propose a new estimator for estimating the population mean using two auxiliary attributes under simple random sampling.
- The bias and mean squared error of the proposed estimator are derived and compared with those of the existing estimators to assess efficiency theoretically.
- Applications of the proposed estimator are illustrated using real data sets from various sectors.
Keywords: Finite population, Auxiliary attribute, Simple random sampling, Bias, Mean squared error
Specifications table
Subject Area: | Mathematical Statistics |
More specific subject area: | Sample Surveys |
Method name: | Optimum estimator for estimating population mean using two auxiliary attributes |
Name and references of original method: | The proposed method is motivated by Singh and Espejo [14] and Lu [6]. |
Resource availability: | The data utilized in the analysis are available in the public domain. |
Background
In sample surveys, it is well documented (Cochran [20]) that supplementary information provided by auxiliary variables or attributes is frequently used to increase the precision of estimators by exploiting the correlation between the study variable and the auxiliary variable. Regression, ratio, and product estimators are good examples. Cochran [1] showed that the ratio estimator is best suited when the study and auxiliary variables are highly positively correlated, whereas the product method of estimation is better when they are highly negatively correlated. In many real-life situations, the study variable is not always quantitative in nature: the responses recorded from respondents are qualitative, and the recorded information is then called an attribute. Several studies, such as Shabbir and Gupta [22] and Abd-Elfattah et al. [23], have sought to improve the precision of estimators by utilizing auxiliary attributes, with applications in agriculture, health science, fisheries, power engineering, etc. It has also been found that using more than one auxiliary attribute enhances the efficiency of the estimator. Singh et al. [2] used auxiliary attribute information to construct ratio estimators in simple random sampling (SRS). The bias and mean squared error of the estimator were computed for an existing data set available in the literature. It was developed as a modification of the Koyuncu and Kadilar [21] estimator and was shown to outperform the existing estimators. Singh and Kumar [4] used auxiliary information to construct an improved regression estimator for SRS; there, the auxiliary information is qualitative and the concept of non-response is incorporated in the estimation. Malik and Singh [3] initiated the use of two auxiliary variables in estimating the population mean and proposed enhanced estimators. There, the auxiliary information is available in qualitative form, and the estimator performed better than the simple regression estimator.
Ekpenyong and Enang [5] suggested improved exponential estimators in SRS for estimating the population mean; simple random sampling without replacement was used in developing the estimator. Lu [6] explored applications of estimators developed under auxiliary information in the agriculture and power engineering sectors; the estimator was compared with the regression estimator and other existing estimators and was found to be more efficient. Zaman and Kadilar [7] utilized auxiliary information to develop a novel family of exponential estimators. Ahmad et al. [8] generalized exponential ratio estimators using auxiliary attributes. Their estimator is exponential-based, whereas the estimator proposed in the present study is a mixture of simple, ratio, and product estimators. Mahajan et al. [9] explored applications of estimation and sample surveys in agriculture and health sciences. Kumar and Saini [10] suggested a predictive approach to estimating the population mean under auxiliary attributes. Yunusa et al. [11] utilized auxiliary variables to develop regression-type estimators of the population mean. Rather et al. [24] used auxiliary information to develop a mixed exponential ratio-type estimator of the population mean, with simple random sampling and double sampling techniques used to select the sample. Zaman et al. [25] proposed exponential-type estimators for assessing and estimating COVID-19 risk in various countries; two multivariate families of exponential-type estimators were proposed by utilizing information on two auxiliary variables. Many authors, such as Wayangkau et al. [18], Waheeb et al. [15], Rajak [16], and Jabal et al. [17], have discussed other data analysis techniques for agricultural information.
The literature cited above motivates us to explore the applicability of two auxiliary attributes in estimating the mean of various populations associated with the agriculture, fisheries, and education sectors. The primary goal of this paper is to propose a novel optimum estimator of the finite population mean using auxiliary attributes. Expressions for the bias and mean squared error (MSE) of the proposed estimator are derived up to the first order of approximation. On the basis of theoretical and numerical comparisons, we demonstrate that the proposed estimator is more efficient than the existing estimators.
Material and methods
Consider a finite population $U = \{U_1, U_2, \ldots, U_N\}$ of size $N$. We draw a sample of $n$ units (with $n < N$) from $U$ using simple random sampling without replacement (SRSWOR). Let $y_i$ be the value of the study variable and $\delta_i$ the indicator of an auxiliary attribute for the $i$th unit, i.e. $\delta_i = 1$ if the $i$th unit possesses the attribute and $\delta_i = 0$ otherwise. Let $A = \sum_{i=1}^{N} \delta_i$ be the total number of units in the population possessing the attribute and $a = \sum_{i=1}^{n} \delta_i$ the total number of units in the sample possessing it. Let $P = A/N$ be the proportion of such units in the population and $p = a/n$ the proportion of such units in the sample. Let $\bar{Y}$ and $\bar{y}$ be the population and sample means of the study variable. Let $D_1$ and $D_2$ be the population proportions of the two auxiliary attributes, and let $d_1$ and $d_2$ denote the corresponding sample proportions. Let $S_y^2$ be the population variance of the study variable $y$, and let $S_{d_1}^2$ and $S_{d_2}^2$ be the population variances of the two auxiliary attributes. Let $C_y$ be the coefficient of variation of the study variable $y$, and let $C_{d_1}$ and $C_{d_2}$ be the coefficients of variation of the auxiliary attributes. Let $S_{yd_j}$ be the population covariance between the study variable $y$ and the $j$th auxiliary attribute ($j = 1, 2$), and let $S_{d_1 d_2}$ be the population covariance between the two auxiliary attributes. Let $\rho_{yd_j}$ be the population point bi-serial correlation coefficient between the study variable $y$ and the $j$th auxiliary attribute. Let $e_0 = (\bar{y} - \bar{Y})/\bar{Y}$, $e_1 = (d_1 - D_1)/D_1$, and $e_2 = (d_2 - D_2)/D_2$ be the error terms, such that $E(e_0) = E(e_1) = E(e_2) = 0$, $E(e_0^2) = \lambda C_y^2$, $E(e_j^2) = \lambda C_{d_j}^2$, $E(e_0 e_j) = \lambda \rho_{yd_j} C_y C_{d_j}$, and $E(e_1 e_2) = \lambda \rho_{d_1 d_2} C_{d_1} C_{d_2}$, where $\lambda = \frac{1}{n} - \frac{1}{N}$ and $\rho_{d_1 d_2} = S_{d_1 d_2}/(S_{d_1} S_{d_2})$.
The stepwise framework of the proposed method is as follows:
Step 1: Consider a finite population of size N.
Step 2: Select a random sample of size n from the population using simple random sampling without replacement.
Step 3: Observe $y_i$ and $\delta_i$ on the sampled units.
Step 4: Define the expressions for population and sample characteristics.
Step 5: Propose the estimator for estimating population mean using two auxiliary attributes and derive its properties.
Step 6: Compare the proposed estimator with the existing estimators theoretically and numerically.
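Steps 1 to 4 can be illustrated with a minimal Python sketch. The toy population, the helper names `draw_srswor` and `sample_summaries`, and the fixed seed are hypothetical and serve only to show how a sample is drawn without replacement and how the sample mean and the two sample proportions are obtained; the sketch does not implement the proposed estimator.

```python
import random


def draw_srswor(population, n, seed=None):
    """Draw a simple random sample of n units without replacement (SRSWOR)."""
    if not 0 < n < len(population):
        raise ValueError("sample size must satisfy 0 < n < N")
    rng = random.Random(seed)
    return rng.sample(population, n)


def sample_summaries(sample):
    """Return the sample mean of y and the sample proportions of the two attributes."""
    n = len(sample)
    y_bar = sum(y for y, _, _ in sample) / n
    d1 = sum(a1 for _, a1, _ in sample) / n
    d2 = sum(a2 for _, _, a2 in sample) / n
    return y_bar, d1, d2


# Hypothetical toy population of (y, delta1, delta2) records, for illustration only.
toy_population = [
    (50, 0, 0), (149, 1, 1), (284, 1, 1), (381, 1, 1),
    (278, 1, 1), (111, 1, 1), (112, 0, 1), (99, 1, 1),
]

sample = draw_srswor(toy_population, n=4, seed=1)
y_bar, d1, d2 = sample_summaries(sample)
print(f"sample mean = {y_bar:.2f}, d1 = {d1:.2f}, d2 = {d2:.2f}")
```

Fixing the seed only makes the illustration reproducible; in a real survey the sample would be drawn once from the sampling frame.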
Existing estimators
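Unbiased estimator
The most widely used estimator of the population mean of the study variable, the sample mean discussed by Cochran [1], is given by
$$\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i \tag{1}$$
The sample mean is an unbiased estimator of the population mean and, up to the first order of approximation, its variance (equivalently, its MSE) is given by
$$\operatorname{Var}(\bar{y}) = \lambda \bar{Y}^2 C_y^2 \tag{2}$$
where $\lambda = \frac{1}{n} - \frac{1}{N}$, $\bar{Y}$ is the population mean, and $C_y$ is the coefficient of variation of $y$.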
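As a small numerical check of the variance of the sample mean under SRSWOR, the sketch below compares the textbook expression $(1/n - 1/N) S_y^2$, which equals $\lambda \bar{Y}^2 C_y^2$, with the variance obtained by enumerating every possible sample from a tiny population. The population `toy_y` and both function names are hypothetical and chosen only for illustration.

```python
from itertools import combinations
from statistics import mean, variance


def srswor_variance_of_mean(y_population, n):
    """Variance of the sample mean under SRSWOR: (1/n - 1/N) * S_y^2."""
    N = len(y_population)
    lam = 1.0 / n - 1.0 / N
    # statistics.variance uses the divisor N - 1, matching S_y^2.
    return lam * variance(y_population)


def exact_variance_by_enumeration(y_population, n):
    """Variance of the sample mean over all equally likely SRSWOR samples."""
    means = [mean(s) for s in combinations(y_population, n)]
    centre = mean(means)
    return sum((m - centre) ** 2 for m in means) / len(means)


toy_y = [1.0, 2.0, 3.0, 4.0]  # hypothetical population values
formula_value = srswor_variance_of_mean(toy_y, 2)
enumerated_value = exact_variance_by_enumeration(toy_y, 2)
print(formula_value, enumerated_value)
```

Enumeration is feasible only for very small populations; it is used here purely to confirm the closed-form expression.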
Naik and Gupta ratio estimator
Naik and Gupta [12] proposed the following ratio estimator of the population mean when the population proportion of the auxiliary attribute is known:
(3) |
The bias and MSE of this estimator, to the first order of approximation, are given by
and
(4) |
Naik and Gupta product estimator
Naik and Gupta [12] proposed the following product estimator of the population mean when the population proportion of the auxiliary attribute is known:
(5) |
The bias and MSE of this estimator, to the first order of approximation, are given by
and
(6) |
Singh et al. exponential ratio estimator
Singh et al. [2] suggested the following exponential-type ratio estimator of the population mean when the population proportion of the auxiliary attribute is known:
(7) |
The bias and MSE of this estimator, to the first order of approximation, are given by
and
(8) |
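The single-attribute estimators reviewed so far can be sketched in Python as follows. The sketch assumes the commonly quoted forms of these estimators, namely $\bar{y}\,D/d$ for the ratio estimator and $\bar{y}\,d/D$ for the product estimator of Naik and Gupta [12], and $\bar{y}\exp\{(D-d)/(D+d)\}$ for the exponential ratio estimator of Singh et al. [2], where $d$ and $D$ are the sample and population proportions of the attribute. The function names and the numerical inputs are hypothetical.

```python
import math


def naik_gupta_ratio(y_bar, d, D):
    """Ratio estimator: scales the sample mean by D / d (assumed usual form)."""
    return y_bar * D / d


def naik_gupta_product(y_bar, d, D):
    """Product estimator: scales the sample mean by d / D (assumed usual form)."""
    return y_bar * d / D


def singh_exponential_ratio(y_bar, d, D):
    """Exponential ratio estimator: y_bar * exp((D - d) / (D + d)) (assumed usual form)."""
    return y_bar * math.exp((D - d) / (D + d))


# Hypothetical inputs: sample mean 10, sample proportion 0.4, population proportion 0.5.
y_bar, d, D = 10.0, 0.4, 0.5
print(naik_gupta_ratio(y_bar, d, D))
print(naik_gupta_product(y_bar, d, D))
print(singh_exponential_ratio(y_bar, d, D))
```

When the sample proportion equals the population proportion, all three functions return the sample mean unchanged, which is the expected behaviour of ratio-type and product-type adjustments.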
Singh et al. product estimator
Singh et al. [2] suggested the following product estimator of the population mean when the population proportion of the auxiliary attribute is known:
(9) |
Similarly, the bias and MSE of this estimator are given by
and
(10) |
Kumar and Bhougal ratio-product estimator
Kumar and Bhougal [13] proposed the following exponential-type ratio-product estimator of the population mean when the population proportion of the auxiliary attribute is known:
(11) |
where the constant appearing in Eq. (11) is unknown.
The bias and MSE of this estimator, to the first order of approximation, are given by
and
(12) |
The optimum value of the constant is given by
Singh and Kumar ratio estimator
Singh and Kumar [4] suggested the following ratio estimator of the population mean when the population proportions $D_1$ and $D_2$ of the auxiliary attributes are known:
(13) |
The bias and MSE of this estimator, to the first order of approximation, are given by
and
(14) |
Singh and Kumar product estimator
Singh and Kumar [4] suggested the following product estimator of the population mean when the population proportions $D_1$ and $D_2$ of the auxiliary attributes are known:
(15) |
The bias and MSE of this estimator, to the first order of approximation, are given by
(16) |
Ahmad et al. generalized class of estimators
Ahmad et al. [8] proposed the following generalized class of factor-type estimators of the population mean when the population proportions $D_1$ and $D_2$ of the auxiliary attributes are known:
(17) |
where,
, , , , , , and .
The bias and MSE of this class of estimators, to the first order of approximation, are given by
where and .
The minimum MSE of this class of estimators is given by
(18) |
where
,
The optimum values of the constants are
Proposed estimator & its properties
Motivated by Singh and Espejo [14] and Lu [6] and adopting the same procedure, we propose the following estimator of the population mean utilizing two auxiliary attributes:
(19) |
To obtain the bias and MSE of the proposed estimator, we apply the error approximation to Eq. (19); in terms of the error terms, the proposed estimator can be expressed as
(20) |
Expanding the right-hand side of Eq. (20), we get
(21) |
Neglecting terms in Eq. (21) that involve powers of the error terms greater than two, we get
(22) |
Taking expectations on both sides of Eq. (22) gives the bias of the proposed estimator up to the first order of approximation (Fig. 1).
Fig. 1. Graphical representation of the estimators with respect to their MSE.
The bias of the proposed estimator is given by
To derive the MSE of the proposed estimator, we square both sides of Eq. (22), neglect terms involving powers of the error terms greater than two, and take expectations on both sides; after simplification we get
(23) |
To obtain the optimum values of the constants, we partially differentiate Eq. (23) with respect to each constant and set the derivatives equal to zero. The optimum values of the constants are given by
The minimum MSE of the proposed estimator is then
(24) |
where and .
Theoretical comparison of estimators
In this section, we compare the proposed estimator with the existing estimators discussed in the "Existing estimators" section and obtain the following conditions under which the proposed estimator is more efficient:
From Eqs. (18) and (24)
Results and discussion
To examine the dominance and applicability of the proposed estimator in simple random sampling, we considered four real data sets from the agriculture, fisheries, and education sectors, available in Ahmad et al. [8]. A similar population is given for reference in Appendix A.
Population 1. (Source: education sector)
Let y represent the number of instructors, the first auxiliary attribute indicate whether the total number of primary and secondary school students in Turkey in 2007 was larger than 11,440.5, and the second auxiliary attribute indicate whether the total number of primary and secondary school students in Turkey in 2008 was greater than 333.1647. The population information for the data set is given as:
Population 2. (Source: fisheries sector)
Let y represent the expected catch by recreational marine fishermen in 1995, the first auxiliary attribute represent the percentage of fish captured larger than 1000 in 1993, and the second auxiliary attribute represent the percentage of fish caught greater than 2000 in 1994. The population information for the data set is given as:
Population 3. (Source: agricultural sector)
For the 47 districts of Pakistan, let y represent the tobacco area production in hectares for the year 2009, the first auxiliary attribute represent the percentage of farms with tobacco cultivation areas greater than 500 ha for the year 2007, and the second auxiliary attribute represent the percentage of farms with tobacco cultivation areas greater than 800 ha for the year 2008. The population information for the data set is given as:
Population 4. (Source: agricultural sector)
For 52 districts of Pakistan, let y represent the amount of cotton produced in hectares in 2009, the first auxiliary attribute represent the percentage of farms with cotton cultivation areas greater than 37 ha in 2007, and the second auxiliary attribute represent the percentage of farms with cotton cultivation areas greater than 35 ha in 2008. The population information for the data set is given as:
We compute the MSE values of the existing estimators and the proposed estimator using Eqs. (2), (4), (6), (8), (10), (12), (14), (16), (18), and (24); these values are shown in Table 1.
Table 1.
MSE values of the estimators.
Estimator | Population 1 | Population 2 | Population 3 | Population 4 |
---|---|---|---|---|
Unbiased estimator (sample mean) | 2515.170 | 2,115,180.565 | 435,363.127 | 362.664 |
Naik and Gupta [12] ratio | 1665.169 | 1,787,076.238 | 352,906.039 | 175.164 |
Naik and Gupta [12] product | 9050.198 | 3,274,574.672 | 736,929.821 | 1135.734 |
Singh et al. [2] exponential ratio | 1379.541 | 1,847,217.179 | 366,745.882 | 195.717 |
Singh et al. [2] product | 2536.028 | 1,295,483.198 | 279,378.887 | 338.001 |
Kumar and Bhougal [13] | 1315.997 | 1,782,466.925 | 351,230.425 | 165.699 |
Singh and Kumar [4] ratio | 5151.870 | 2,045,097.266 | 480,329.895 | 466.088 |
Singh and Kumar [4] product | 10,154.721 | 4,937,999.787 | 870,994.957 | 1306.264 |
Ahmad et al. [8] | 1275.439 | 1,496,047.750 | 340,313.804 | 163.048 |
Proposed | **1200.370** | **1,290,907.767** | **206,924.055** | **106.485** |
Note: Bold numbers indicate the lowest MSE.
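The internal consistency of Table 1 can be checked with a short Python sketch. It rests on assumptions that are not stated explicitly in the table: that the first three rows are the MSEs of the sample mean, the ratio estimator, and the product estimator (following the order of Eqs. (2), (4), and (6)), and that these take the usual first-order forms $V_0$, $V_0 + V_d - 2C$, and $V_0 + V_d + 2C$, where $V_0 = \lambda\bar{Y}^2 C_y^2$, $V_d = \lambda\bar{Y}^2 C_d^2$, and $C = \lambda\bar{Y}^2 \rho\, C_y C_d$. Under those assumptions, the exponential ratio MSE is $V_0 + V_d/4 - C$ and the regression-type lower bound is $V_0 - C^2/V_d$. The function name `implied_mse` is hypothetical.

```python
def implied_mse(mse_mean, mse_ratio, mse_product):
    """Recover V_d and C from three MSEs, then derive two further first-order MSEs."""
    var_attr = (mse_ratio + mse_product) / 2.0 - mse_mean  # V_d
    cov_term = (mse_product - mse_ratio) / 4.0             # C
    exp_ratio = mse_mean + var_attr / 4.0 - cov_term       # exponential ratio MSE
    regression_bound = mse_mean - cov_term ** 2 / var_attr  # V_0 * (1 - rho^2)
    return exp_ratio, regression_bound


# First three rows of Table 1 for Populations 1 to 4 (sample mean, ratio, product).
TABLE1_ROWS_1_TO_3 = {
    1: (2515.170, 1665.169, 9050.198),
    2: (2115180.565, 1787076.238, 3274574.672),
    3: (435363.127, 352906.039, 736929.821),
    4: (362.664, 175.164, 1135.734),
}

implied = {pop: implied_mse(*rows) for pop, rows in TABLE1_ROWS_1_TO_3.items()}
for pop, (exp_ratio, bound) in implied.items():
    print(f"Population {pop}: exponential ratio = {exp_ratio:.3f}, bound = {bound:.3f}")
```

Under these assumptions the implied values closely reproduce the fourth and sixth rows of Table 1, which suggests that those rows follow the standard first-order expressions; the check says nothing about the remaining rows.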
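To measure the percentage relative efficiency (PRE), we apply the following formula:
$$\mathrm{PRE}(t, \bar{y}) = \frac{\mathrm{MSE}(\bar{y})}{\mathrm{MSE}(t)} \times 100$$
where $t$ denotes any of the existing estimators or the proposed estimator and $\bar{y}$ is the usual unbiased estimator.
The numerical comparison of the PRE of the existing and proposed estimators is shown in Table 2.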
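Table 2.
Percentage relative efficiency (PRE) of the estimators.
Estimator | Population 1 | Population 2 | Population 3 | Population 4 |
---|---|---|---|---|
Unbiased estimator (sample mean) | 100 | 100 | 100 | 100 |
Naik and Gupta [12] ratio | 151.046 | 118.360 | 123.365 | 207.043 |
Naik and Gupta [12] product | 27.791 | 64.594 | 59.078 | 31.932 |
Singh et al. [2] exponential ratio | 182.319 | 114.506 | 118.710 | 185.300 |
Singh et al. [2] product | 99.178 | 163.273 | 155.833 | 107.297 |
Kumar and Bhougal [13] | 191.123 | 118.666 | 123.954 | 218.870 |
Singh and Kumar [4] ratio | 48.821 | 103.427 | 90.638 | 77.810 |
Singh and Kumar [4] product | 24.768 | 42.835 | 49.985 | 27.763 |
Ahmad et al. [8] | 197.200 | 141.384 | 127.929 | 222.428 |
Proposed | **209.533** | **163.852** | **210.398** | **340.578** |
Note: Bold numbers indicate the highest PRE.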
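The PRE values of the proposed estimator can be reproduced from the MSE values in Table 1 with a few lines of Python. The sketch below assumes that the PRE is the ratio of the MSE of the usual unbiased estimator to the MSE of the estimator under study, multiplied by 100; the function and variable names are hypothetical.

```python
def percentage_relative_efficiency(mse_reference, mse_estimator):
    """PRE of an estimator relative to a reference estimator, in percent."""
    return 100.0 * mse_reference / mse_estimator


# MSE values from Table 1 for Populations 1 to 4.
MSE_SAMPLE_MEAN = [2515.170, 2115180.565, 435363.127, 362.664]
MSE_PROPOSED = [1200.370, 1290907.767, 206924.055, 106.485]

pre_proposed = [
    percentage_relative_efficiency(m0, mp)
    for m0, mp in zip(MSE_SAMPLE_MEAN, MSE_PROPOSED)
]
for population, pre in enumerate(pre_proposed, start=1):
    print(f"Population {population}: PRE = {pre:.3f}")
```

A PRE above 100 means the estimator has a smaller MSE than the sample mean; the same function can be applied to any other row of Table 1.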
The proposed estimator is compared with the existing estimators described in the "Existing estimators" section, and the results are presented numerically and graphically. Table 1 shows that the proposed estimator attains the minimum mean squared error, 1200.370, 1,290,907.767, 206,924.055, and 106.485, for Populations 1 to 4, respectively. Table 2 shows that the PRE of the proposed estimator for the four populations is 209.533, 163.852, 210.398, and 340.578, respectively. The PRE of the proposed estimator is the highest among the estimators considered, including those given in Ahmad et al. [8]. Figs. 1 and 2 show that the proposed estimator is more efficient for estimating the population mean than the existing estimators. In the theory of sample surveys, the major objective of any statistician or investigator is to minimize the MSE and maximize the PRE so as to draw sound inferences about the study population. The proposed estimator is validated on both criteria: it attains the minimum MSE and the maximum PRE among the estimators compared, as shown graphically for all four populations.
Fig. 2. Graphical representation of the PRE of the estimators with respect to the usual unbiased estimator.
Conclusion
In this article, we proposed an optimum estimator of the population mean in simple random sampling that utilizes information on two auxiliary attributes. Mathematical expressions for the bias, mean squared error (MSE), and minimum mean squared error of the proposed estimator are derived. The mean squared error of the proposed estimator is given in the "Proposed estimator & its properties" section, and the proposed estimator is efficient under the conditions derived in the "Theoretical comparison of estimators" section. Using the proposed estimator helps to obtain a precise estimate of the population mean, on the basis of which effective decisions can be made. We recommend that the proposed estimator be used in sample surveys in areas such as education, agriculture, fisheries, and health sciences. Further, the proposed estimator can be extended to use multi-auxiliary information under various sampling designs.
CRediT authorship contribution statement
Monika Saini: Conceptualization, Methodology, Validation, Writing – review & editing. Bhatt Ravi Jitendrakumar: Investigation, Resources, Writing – original draft. Ashish Kumar: Formal analysis, Methodology, Writing – review & editing.
Declaration of Competing Interest
There is no conflict of interest among the authors.
Submission Type: Direct Submission
Appendix A
Hypothetical Population: (Source: Singh and Chaudhary [19], p. 177)
Let y be the area under wheat crop (in acres) during the year 1974, x1 be the area under wheat (in acres) in 1971, and x2 be the area under wheat (in acres) in 1973.
S.N. | Area under wheat (in acres), 1971 (x1) | Area under wheat (in acres), 1973 (x2) | Area under wheat (in acres), 1974 (y) |
---|---|---|---|
1 | 401 | 70 | 50 |
2 | 630 | 163 | 149 |
3 | 1194 | 320 | 284 |
4 | 1170 | 440 | 381 |
5 | 1065 | 250 | 278 |
6 | 827 | 125 | 111 |
7 | 1737 | 558 | 634 |
8 | 1060 | 254 | 278 |
9 | 360 | 101 | 112 |
10 | 946 | 359 | 355 |
11 | 4170 | 109 | 99 |
12 | 1625 | 481 | 498 |
13 | 827 | 125 | 111 |
14 | 96 | 5 | 6 |
15 | 1304 | 427 | 339 |
16 | 377 | 78 | 80 |
17 | 259 | 75 | 105 |
18 | 186 | 45 | 27 |
19 | 1767 | 564 | 515 |
20 | 604 | 238 | 249 |
21 | 700 | 92 | 85 |
22 | 524 | 247 | 221 |
23 | 571 | 134 | 133 |
24 | 962 | 131 | 144 |
25 | 407 | 129 | 103 |
26 | 715 | 190 | 175 |
27 | 845 | 363 | 335 |
28 | 1016 | 235 | 219 |
29 | 184 | 73 | 62 |
30 | 282 | 62 | 79 |
31 | 194 | 71 | 60 |
32 | 439 | 137 | 100 |
33 | 854 | 196 | 141 |
34 | 820 | 255 | 263 |
The population information for the data set is given as:
$\delta_1 = 1$ if the farm had more than 500 acres of land under the wheat crop during the year 1971, and $\delta_1 = 0$ otherwise.
$\delta_2 = 1$ if the farm had more than 100 acres of land under the wheat crop during the year 1973, and $\delta_2 = 0$ otherwise.
$D_1$ is the proportion of farms with more than 500 acres of land under the wheat crop during the year 1971, and $D_2$ is the proportion of farms with more than 100 acres of land under the wheat crop during the year 1973.
Descriptive Statistics:
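The descriptive statistics of this hypothetical population can be recomputed directly from the table. The following Python sketch is one way to do so, assuming the thresholds stated above (more than 500 acres in 1971 and more than 100 acres in 1973) and using the divisor N - 1 for variances and covariances; the helper names and the dictionary keys are hypothetical labels, not notation taken from the article.

```python
from statistics import mean, variance

# Columns of the Appendix A table: x1 (1971), x2 (1973), y (1974).
X1 = [401, 630, 1194, 1170, 1065, 827, 1737, 1060, 360, 946, 4170, 1625,
      827, 96, 1304, 377, 259, 186, 1767, 604, 700, 524, 571, 962, 407,
      715, 845, 1016, 184, 282, 194, 439, 854, 820]
X2 = [70, 163, 320, 440, 250, 125, 558, 254, 101, 359, 109, 481, 125, 5,
      427, 78, 75, 45, 564, 238, 92, 247, 134, 131, 129, 190, 363, 235,
      73, 62, 71, 137, 196, 255]
Y = [50, 149, 284, 381, 278, 111, 634, 278, 112, 355, 99, 498, 111, 6,
     339, 80, 105, 27, 515, 249, 85, 221, 133, 144, 103, 175, 335, 219,
     62, 79, 60, 100, 141, 263]

# Auxiliary attributes derived from the stated thresholds.
delta1 = [1 if x > 500 else 0 for x in X1]
delta2 = [1 if x > 100 else 0 for x in X2]


def coefficient_of_variation(values):
    """Standard deviation (divisor N - 1) divided by the mean."""
    return variance(values) ** 0.5 / mean(values)


def correlation(u, v):
    """Pearson correlation; with a 0/1 attribute this is the point bi-serial correlation."""
    mu, mv = mean(u), mean(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)
    return cov / (variance(u) * variance(v)) ** 0.5


N = len(Y)
summary = {
    "N": N,
    "Y_bar": mean(Y),
    "D1": sum(delta1) / N,
    "D2": sum(delta2) / N,
    "C_y": coefficient_of_variation(Y),
    "C_d1": coefficient_of_variation(delta1),
    "C_d2": coefficient_of_variation(delta2),
    "rho_y_d1": correlation(Y, delta1),
    "rho_y_d2": correlation(Y, delta2),
    "rho_d1_d2": correlation(delta1, delta2),
}
for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```

These quantities are the inputs required by the first-order bias and MSE expressions; a different variance convention (divisor N rather than N - 1) would change the values slightly.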
Data availability
The data are available in the public domain.
References
- 1. Cochran W.G. The estimation of the yields of cereal experiments by sampling for the ratio of grain to total produce. J. Agric. Sci. 1940;30(2):262–275.
- 2. Singh R., Chauhan P., Sawan N., Smarandache F. Ratio estimators in simple random sampling using information on auxiliary attribute. Auxiliary Information and a priori Values in Construction of Improved Estimators. Infinite Study. 2007;1:7–17.
- 3. Malik S., Singh R. An improved estimator using two auxiliary attributes. Appl. Math. Comput. 2013;219(23):10983–10986.
- 4. Singh H.P., Kumar S. A regression approach to the estimation of the finite population mean in the presence of non-response. Aust. N. Z. J. Stat. 2008;50(4):395–408.
- 5. Ekpenyong E.J., Enang E.I. Efficient exponential ratio estimator for estimating the population mean in simple random sampling. Hacet. J. Math. Stat. 2015;44(3):689–705.
- 6. Lu J. Efficient estimator of a finite population mean using two auxiliary variables and numerical application in agricultural, biomedical, and power engineering. Math. Probl. Eng. 2017:1–7. doi: 10.1155/2017/8704734.
- 7. Zaman T., Kadilar C. Novel family of exponential estimators using information of auxiliary attribute. J. Stat. Manag. Syst. 2019;22(8):1499–1509.
- 8. Ahmad S., Arslan M., Khan A., Shabbir J. A generalized exponential-type estimator for population mean using auxiliary attributes. PLoS ONE. 2021;16(5). doi: 10.1371/journal.pone.0246947.
- 9. Mahajan J., Banal K., Mahajan S. Estimation of crop production using machine learning techniques: a case study of J&K. Int. J. Inf. Technol. 2021;13(4):1441–1448.
- 10. Kumar A., Saini M. A predictive approach for finite population mean when auxiliary variables are attributes. Thail. Stat. 2022;20(3):575–584.
- 11. Yunusa M.A., Gidado A., Ahijjo Y.M., Danbaba A., Abdulazeez S.A., Audu A. Modified classes of regression-type estimators of population mean in the presences of auxiliary attribute. Asian Res. J. Math. 2022;18(1):65–89.
- 12. Naik V.D., Gupta P.C. A note on estimation of mean with known population proportion of an auxiliary character. J. Ind. Soc. Agric. Stat. 1996;48(2):151–158.
- 13. Kumar S., Bhougal S. Estimation of the population mean in presence of non-response. Commun. Stat. Appl. Methods. 2011;18(4):537–548.
- 14. Singh H.P., Espejo M.R. On linear regression and ratio–product estimation of a finite population mean. J. R. Stat. Soc. Ser. D (The Statistician). 2003;52(1):59–67.
- 15. Waheeb R., Wheib K., Andersen B., Alsuhili R. The prospective of Artificial Neural Network (ANN's) model application to ameliorate management of post disaster engineering projects. Available at SSRN 4180813. 2022. doi: 10.2139/ssrn.4180813.
- 16. Rajak A.A. Emerging technological methods for effective farming by cloud computing and IoT. Emerg. Sci. J. 2022;6(5):1017–1031.
- 17. Jabal Z.K., Khayyun T.S., Alwan I.A. Impact of climate change on crops productivity using MODIS-NDVI time series. Civ. Eng. J. 2022;8(06):1136–1155.
- 18. Wayangkau I.H., Mekiuw Y., Rachmat R., Suwarjono S., Hariyanto H. Utilization of IoT for soil moisture and temperature monitoring system for onion growth. Emerg. Sci. J. 2020;4:102–115.
- 19. Singh D., Chaudhary F.S. Theory and Analysis of Sample Survey Designs. John Wiley & Sons; 1986.
- 20. Cochran W.G. Sampling Techniques. John Wiley & Sons; 2007.
- 21. Koyuncu N., Kadilar C. Family of estimators of population mean using two auxiliary variables in stratified random sampling. Commun. Stat. Theory Methods. 2009;38(14):2398–2417.
- 22. Shabbir J., Gupta S. Estimation of the finite population mean in two phase sampling when auxiliary variables are attributes. Hacet. J. Math. Stat. 2010;39(1):121–129.
- 23. Abd-Elfattah A.M., El-Sherpieny E.A., Mohamed S.M., Abdou O.F. Improvement in estimating the population mean in simple random sampling using information on auxiliary attribute. Appl. Math. Comput. 2010;215(12):4198–4202.
- 24. Rather K.U.I., Jeelani M.I., Rizvi S.E.H., Sharma M. A new exponential ratio type estimator for the population means using information on auxiliary attribute. J. Stat. Appl. Pro. Lett. 2022;9(1):31–42.
- 25. Zaman T., Sagir M., Şahin M. A new exponential estimators for analysis of COVID-19 risk. Concurr. Comput. Pract. Exp. 2022;34(10):e6806.