Abstract
Water shortages could become a critical problem in the future as water demand outpaces water supply. Inadequate water could damage human life and other aspects related to living. This serious issue can be mitigated by estimating the demand for water so that the gap between water demand and supply can be bridged. Some water consumption data recorded daily may be missing, which could affect the estimated value of water demand. In this article, new ratio estimators for estimating the population total are proposed under unequal probability sampling without replacement when data are missing. Two situations are considered: the mean of an auxiliary variable is known or unknown, and data are missing at random for both the study and auxiliary variables. The variance and associated estimators of the proposed estimators are investigated under a reverse framework. The proposed estimators are applied to data from simulation studies and to empirical data on water demand in Thailand that contain some missing values, to assess the efficacy of the estimators.
Keywords: Water demand, Population total, Unequal probability sampling without replacement, Taylor linearization approach, Nonlinear estimator, Logistic regression
Mathematical Subject Classification: 62D05, 62D10
Introduction
Increasing demand for water is highly concerning because water supplies are shrinking. Many factors drive the increase in water demand, such as the rapid growth of the human population and climate change. The world's water resources consist mostly of seawater rather than available fresh water or rainwater, and the amount of clean water is further reduced by pollution. Many developing countries face water scarcity and flooding due to climate change, which can affect their economic sustainability and lead to unsafe conditions and poor health in the population. Freshwater is used for many purposes, such as household use, business and industry, and agriculture. Thailand is one of the developing countries that mainly uses freshwater in agriculture, a sector that accounts for the majority of the usage of the world’s available freshwater. Metropolitan waterworks and provincial waterworks are the organizations responsible for producing, delivering, and distributing the water supply to all provinces in Thailand while also providing resources for water. The former is responsible for Bangkok, Nonthaburi, and Samut Prakan and the latter is responsible for the rest of the country. Some of the water consumption data are missing from the database system, which could lead to wrong interpretations. The missing data, or nonresponse, should be taken into consideration before further analysis so that the resulting interpretation is more reliable.
If this issue is not addressed, water shortage could lead to repercussions in the future and it would be harmful for human life because of a lack of clean water to use. The management of water resources to avoid facing water scarcity needs to be taken into consideration. Knowledge of the gap between the demand and supply of water could accommodate the strategies and policy planning for the world to be prepared for sustainable water management in order to provide sufficient water according to the demand. Estimating the water demand can benefit future planning to avoid water shortage. Bakker et al. (2014) investigated three models to forecast water demand in both cases with the model using weather input and not using it. The simulation results found that the model using weather input gave a maximum 11 percent of the errors which is essential in water supply system control and detecting irregularity. Huang & Lin (2017) proposed a system dynamics model for studying demand and supply for water resources to avoid water shortage in China. The model had been used to estimate demand and supply for Shandong in China for the next 15 years. Rainwater is also one of the sources of water usage. Boretti & Rosa (2019) examined the correlation between water demand, population growth, and economic growth to estimate water scarcity in the world by 2050. They found that the demand for water is growing even more than the growth of the population and economy along with a low quality of resources and water to use. Kaewprasert, Niwitpong & Niwitpong (2022) proposed confidence intervals for estimation of the mean of delta-gamma distribution using the Bayesian method and applied it to rainfall data in Chiang Mai Thailand.
The biased estimator, namely the ratio estimator is a popular method for estimating population total ( ) or population mean ( ) of a study variable ( ) when the information of an auxiliary variable ( ) exists and is highly positively correlated with . The ratio estimator was introduced by Cochran (1940) under simple random sampling without replacement (SRSWOR). The mean square error and bias of the ratio estimator are investigated by using the first order approximation of the Taylor linearization approach to transform the ratio estimator to a linear estimator. Then, the properties of the ratio estimator can be approximated from the linear estimator. Sisodia & Dwivedi (1981) proposed a ratio estimator when the population coefficient of variation of is known. The ratio estimator when the kurtosis is known was proposed by H. P. Singh & M. S. Kakran (1993, unpublished data). Upadhyaya & Singh (1999) suggested the ratio type estimators for estimating population mean when the and are known. Bhushan & Kumar (2022) suggested some classes of population mean estimators based on the optimum value of the constant to improve the efficiency of the estimators under ranked set sampling. The ratio estimators of Cochran (1940), Sisodia & Dwivedi (1981), H. P. Singh & M. S. Kakran (1993, unpublished data) and Upadhyaya & Singh (1999) require the population mean of , in order to estimate . Therefore, Perri (2004) proposed an alternative ratio estimator namely regression-in-ratio estimator for estimating . The estimator of Perri (2004) does not require by using the regression estimator to estimate this value. In other words, if the auxiliary variable is correlated with another auxiliary variable namely then can be estimated from by using a regression estimator to estimate this value. The Perri (2004) estimator is a function of two estimators consisting of estimators of and . 
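As a minimal illustration of the classical ratio idea described above, the sketch below computes a ratio estimate of a population total under simple random sampling: the sample ratio of means is scaled by the known population total of the auxiliary variable. The function name, argument names, and toy data are assumptions made for illustration and do not come from the cited papers.

```python
from statistics import mean


def ratio_estimate_total(y_sample, x_sample, x_pop_mean, pop_size):
    """Classical ratio estimate of the population total of y.

    The sample ratio ybar / xbar is scaled by the known population
    total of the auxiliary variable (pop_size * x_pop_mean).
    """
    if not y_sample or len(y_sample) != len(x_sample):
        raise ValueError("samples must be non-empty and of equal length")
    r_hat = mean(y_sample) / mean(x_sample)
    return r_hat * x_pop_mean * pop_size


# Toy data, made up for illustration: y is exactly twice x.
y = [12.0, 15.0, 9.0, 20.0]
x = [6.0, 7.5, 4.5, 10.0]
total_hat = ratio_estimate_total(y, x, x_pop_mean=8.0, pop_size=100)
print(total_hat)
```

The estimator only helps when the study and auxiliary variables are strongly positively correlated, which is why the toy data are constructed with a fixed ratio.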
In the context of unequal probability sampling without replacement (UPWOR), Bacanli & Kadilar (2008) modified the ratio estimators under SRSWOR by estimating the population mean of and under SRSWOR using the Horvitz & Thompson (1952) type estimators. The variance and associated estimators of Bacanli & Kadilar’s (2008) estimator can be obtained by using a Taylor linearization approach and method from Horvitz & Thompson (1952). Lawson (2021) suggested a general class of ratio estimators for population mean in the form of a combined estimator making use of known auxiliary variables such as the coefficient of variation, coefficient of skewness, coefficient of kurtosis and so on. The Lawson (2021) estimator performed well giving a smaller mean square error especially for a small sample size.
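To make the unequal probability setting concrete, the following sketch shows a Horvitz-Thompson total and a ratio-type estimator built from two such totals. It is a simplified illustration under assumed names (`horvitz_thompson_total`, `ht_ratio_total`) and made-up inclusion probabilities, not the exact estimator of Bacanli & Kadilar (2008).

```python
def horvitz_thompson_total(values, incl_probs):
    """Horvitz-Thompson estimate of a population total.

    Each sampled value is weighted by the inverse of its first order
    inclusion probability.
    """
    if len(values) != len(incl_probs):
        raise ValueError("values and incl_probs must have equal length")
    return sum(v / p for v, p in zip(values, incl_probs))


def ht_ratio_total(y, x, incl_probs, x_pop_total):
    """Ratio-type estimate: HT total of y over HT total of x, scaled by
    the known population total of x (assumed available)."""
    return (horvitz_thompson_total(y, incl_probs)
            / horvitz_thompson_total(x, incl_probs)
            * x_pop_total)


# Toy sample of three units with hypothetical inclusion probabilities.
y = [10.0, 20.0, 30.0]
x = [5.0, 10.0, 15.0]
pi = [0.1, 0.2, 0.5]
ht_y = horvitz_thompson_total(y, pi)
ratio_y = ht_ratio_total(y, x, pi, x_pop_total=1000.0)
print(ht_y, ratio_y)
```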
The ratio estimators in the full response case cannot be used to estimate population mean or population total of when some elements in the sample units are unresponsive. Cochran (1977) considered the ratio estimator under SRSWOR to estimate in which information on is available for all sample units and is known but some elements of in the sample units are missing.
Later, ratio estimators and their properties when nonresponse occurs in both and but is known under SRSWOR were proposed by Rao (1986, 1987), Khare & Srivastava (1997), Okafor & Lee (2000), and Särndal & Lundström (2005). Kumar (2015) and Lawson (2017) introduced estimators of the population total and population mean, together with their variance estimators, under probability proportional to size with replacement sampling when nonresponse is present in the study. The Lawson estimators are approximately unbiased and do not require the response propensity when the nonresponse mechanism is uniform and the sampling fraction is small. Under UPWOR, when information on is available for all sample units and is known, Ponkaew & Lawson (2018) proposed a ratio estimator for the population total of with uniform nonresponse. The variance and associated estimators are also discussed under a reverse framework when the sampling fraction is ignored. In the same year, Ponkaew (2018) proposed a linear generalized regression (GREG) estimator for the population total when information about calibration variables exists. The estimator of Ponkaew (2018) is a nonlinear estimator, so an automated linearization approach was used to transform it to a linear form. Consequently, the variance and its estimators can be approximated from the linear estimator. The ratio estimators in the presence of nonresponse require the value of in both situations: where nonresponse occurs with variables and , and where nonresponse occurs only with the variable . Ponkaew (2018) considered the missing completely at random (MCAR) mechanism, which is unlikely to occur in practice. Lawson & Ponkaew (2019) suggested a new GREG estimator using the idea of Lawson (2017) under unequal probability sampling without replacement when nonresponse is missing completely at random and the sampling fraction is small and can therefore be omitted.
However, their estimator requires joint inclusion probability which sometimes can be difficult to find. Lawson & Siripanich (2020) improved a new GREG estimator based on the idea of Lawson & Ponkaew (2019) for more flexible situations with the non-uniform nonresponse mechanism or missing at random (MAR) and where the sampling fractions are both large and small. Ponkaew & Lawson (2023) proposed a new approximately unbiased GREG estimator in the form of a ratio estimator following Ponkaew & Lawson (2018) and Lawson & Ponkaew (2019) under the same situation where nonresponse occurs under MCAR but extended it when the sampling fractions are both large and small which is in a general form.
Some researchers have suggested estimating the missing values before further analysis. For example, Shahzad et al. (2020) proposed population mean estimators for studies with missing observations, using robust regression for the regression coefficient estimator under SRSWOR when outliers are present. They considered nonresponse in the study variable only, and in both the study and auxiliary variables, when the population mean of the auxiliary variable is known and unknown. Anas et al. (2022) also suggested ratio type regression estimators when nonresponse is present in three situations similar to those of Shahzad et al. (2020), but used quantile regression in the mean estimator when outliers are present. Chodjuntug & Lawson (2022a) suggested a new imputation method to create a population mean estimator when missing data appear in the study variable and applied it to estimate fine particulate matter in Bangkok, Thailand. They suggested applying two constants to minimize the mean square error of the population mean estimator. Chodjuntug & Lawson (2022b) developed a new estimator by adjusting the Chodjuntug & Lawson (2022a) estimator using the response rate and the constant that minimizes the mean square error (MSE) of their proposed estimator. Their estimator using the MSE-minimizing constant performed the best. Bhushan et al. (2022) proposed logarithmic imputation methods for estimating the population mean under SRSWOR with missing data.
In this article, we propose new ratio estimators by extending the Ponkaew & Lawson (2018) estimator to situations where is known or unknown and nonresponse occurs with both variables and . In the situation where is unknown, we use the concept from Perri (2004) to estimate its value from the calibration variables using the linear GREG estimator of Ponkaew (2018). The variance and associated estimators of the proposed estimators are investigated under the reverse framework. Furthermore, the proposed ratio estimators are considered under both the missing at random (MAR) mechanism, which is more likely to occur in practice, and the MCAR nonresponse mechanism. Finally, we compare the efficiency of the proposed estimators and their variance estimators between the MAR and MCAR mechanisms through a simulation study and an application to water demand data in Thailand.
Materials and Methods
Basic setup
In this section, we introduce notations and basic notions about the population total estimator and their variance estimators under the reverse framework. Let be a study variable and the population total of is where and is the population size. Suppose the auxiliary variables , and the size variable are available and highly positively correlated with the study variable. The calibration variables where are also available and they are correlated with the auxiliary variable . Let, and be the matrix values of . We are using the GREG estimator model from Särndal, Swensson & Wretman (1992) and Särndal (2007) in which the linear assisting model , and . The linear assisting model is a model describing the relationship between the study variable and auxiliary variable. Let be determined by the linear assisting model that is . Usually, the standard choice of is and it is determined by the linear assisting model : and .
Let, be the set of all possible subsets of and the sample of size was selected from the population under UPWOR. A sampling design is a probability distribution on , i.e., for all and . Let, be the first order inclusion probability and be the second order inclusion probability. We also define and as the expectation and variance operators with respect to the UPWOR sampling design.
In the presence of nonresponse, let subscript and be the nonresponse mechanism and nonresponse indicator variable of which if unit responds to item otherwise . Let, be the vector of the response indicator and be the response probability under MAR nonresponse. Let, and be the expectation and variance operators with respect to the nonresponse mechanism. Three assumptions are defined; the response mechanism is uniform response. and as where or . We also consider three more conditions for investigating the estimator of as follows. nonresponse occurs only on , the information on is available for all and is known. nonresponse occurs on both and and is known and nonresponse occurs both with and and is unknown but information on are available for all and , are known.
Throughout this article, we consider variance estimation of the population total estimator in the presence of nonresponse under the reverse framework. Therefore, we discuss three steps to investigate the variance and its nonlinear estimator such as the ratio estimator when nonresponse occurs in the study variable. Assume that we have variables consisting of , the study variable and , auxiliary variables. Let be a nonlinear estimator and be defined by,
(1) |
where is a known smooth function, , if the variable exhibits nonresponse otherwise it can be obtained by Under the reverse framework, variance of
(2) |
where and . The formula of consists of three steps as below.
Step 1: Investigate a formula of . Since is in a form of a nonlinear estimator then can be approximated by,
(3) |
where is a linear estimator of under the Taylor linearization approach.
Step 2: Investigate the formula of . The formula of can be approximated by,
(4) |
where .
Step 3: Approximate the value of and its estimator. The value of can be obtained by,
(5) |
The estimator of can be obtained by substituting estimators for the unknown parameter in (5). Then, the estimator of is defined by,
(6) |
where , are the estimators of , respectively.
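For orientation, the reverse framework regards the response indicators as generated first and the sample as drawn afterwards, so the total variance splits into two components. The display below is a sketch of that standard decomposition in assumed notation ($\hat{\theta}$ for a nonlinear estimator, $\mathbf{R}$ for the vector of response indicators, and subscripts $p$ and $R$ for the sampling design and the nonresponse mechanism). It is not a reconstruction of the numbered equations in this section.

```latex
\begin{align}
V(\hat{\theta}) &= V_1 + V_2, \\
V_1 &= E_R\!\left[ V_p\!\left( \hat{\theta} \mid \mathbf{R} \right) \right], \\
V_2 &= V_R\!\left[ E_p\!\left( \hat{\theta} \mid \mathbf{R} \right) \right].
\end{align}
```

Under this reading, the first component is handled by linearizing the estimator and applying a design-based variance formula conditional on the response indicators, while the second component is commonly treated as negligible when the sampling fraction is small.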
Existing estimators under uniform nonresponse
Uniform nonresponse, or missing completely at random (MCAR), is a nonresponse mechanism in which the probability of response of the study variable depends neither on itself nor on another variable such as or . In this section, we discuss two estimators for estimating the population total in the presence of uniform nonresponse, namely the ratio and GREG estimators proposed by Ponkaew & Lawson (2018) and Ponkaew (2018), respectively. The variance estimation of both the ratio and GREG estimators is considered under the reverse framework when the sampling fraction is negligible with the UPWOR sampling design.
The ratio estimator
When nonresponse occurs only with but the population mean and its estimator of are available, Ponkaew & Lawson (2018) proposed ratio estimators to estimate population mean and the total of under unequal probability sampling without replacement and the nonresponse mechanism is MCAR. The Ponkaew & Lawson (2018) estimator for population mean is
(7) |
where , and . Ponkaew & Lawson’s (2018) estimator for population total is
(8) |
We note that, if is unknown the estimator of is equal to . The variance and associated estimators of the estimator in (8) is defined in (9),
(9) |
where . The estimator of is given in (10),
(10) |
where , , and .
The GREG estimator
The GREG estimators for estimating population mean or population total of the study variable is a powerful method when the calibration variables are present where are also available. In full response, Särndal, Swensson & Wretman (1992) and Särndal (2007) proposed a GREG estimator under the linear assisting model ,
(11) |
Let and be determined by the linear assisting model in (5.1) i.e., . In the presence of nonresponse, Särndal & Lundström (2005) proposed a linear GREG estimator to estimate population total. They investigated variance and associated estimators under the two-phase framework. Ponkaew (2018) proposed linear GREG estimators for estimating the population mean of under the MCAR mechanism which is defined by,
(12) |
where , and .
Then, the GREG estimator to estimate the population total of is
(13) |
where .
Under the reverse framework and when sampling fraction is negligible the variance of is
(14) |
where , and .
The estimator of is equal to.
(15) |
where , , , if is unknown otherwise .
Results and discussion
The proposed new ratio estimators
In the previous section, we introduced two estimators of the population total: ratio and GREG estimators in the presence of uniform nonresponse. The variance estimation for both ratio and GREG estimators are considered under the UPWOR sampling design and when the sampling fraction is negligible. However, the ratio estimators in (7) and (8) are considered under a situation where nonresponse occurs in only and they require the value of the population mean of . Then, in this section we aim to propose new ratio estimators when nonresponse occurs in both variables and . We also consider two distinct situations of that are known or unknown. In the situation where is unknown we estimate it from the calibration variables using the GREG estimator. In the context of nonresponse, we investigate the proposed ratio estimator under the MAR mechanism because it has weak assumptions and tends to occur in real life more often than the MCAR mechanism. However, we still consider new ratio estimators under the MCAR mechanism for comparing the efficiency of the proposed estimators. First of all, we extended the Ponkaew & Lawson (2018) estimators to the MAR mechanism. The ratio estimator of Ponkaew & Lawson (2018) for estimating population mean under the MAR mechanism is equal to,
(16) |
where , and .
Then, the ratio estimator for estimating population total under the MAR mechanism is
(17) |
Under the MAR mechanism, if is unknown then it is estimated using the probit or logistic regression models. The variance and associated estimators of are discussed in Theorem 1.
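The propensity estimation step can be sketched as follows. This is a minimal pure-Python logistic regression with one covariate, fitted by Newton-Raphson; the covariate `z`, the response indicators `r`, and the function names are hypothetical and chosen for illustration, and a probit model could be substituted in the same role.

```python
import math


def fit_logistic(z, r, iterations=25):
    """Fit P(r = 1 | z) = 1 / (1 + exp(-(b0 + b1 * z))) by Newton-Raphson."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for zi, ri in zip(z, r):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * zi)))
            w = p * (1.0 - p)
            g0 += ri - p             # score for the intercept
            g1 += (ri - p) * zi      # score for the slope
            h00 += w                 # entries of the information matrix
            h01 += w * zi
            h11 += w * zi * zi
        det = h00 * h11 - h01 * h01
        if abs(det) < 1e-12:
            break
        # Newton step: beta <- beta + H^{-1} g for the 2 x 2 system.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def response_propensities(z, r):
    """Estimated response probability for every sampled unit."""
    b0, b1 = fit_logistic(z, r)
    return [1.0 / (1.0 + math.exp(-(b0 + b1 * zi))) for zi in z]


# Toy data: units with larger z respond more often (not separable).
z = [1, 2, 3, 4, 5, 6, 7, 8]
r = [0, 0, 1, 0, 1, 1, 0, 1]
p_hat = response_propensities(z, r)
print([round(p, 3) for p in p_hat])
```

Because the covariate must be observed for respondents and nonrespondents alike, it has to be a fully observed variable.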
Theorem 1. Under condition with the reverse framework and the nonresponse mechanism is MAR.
(1) The variance of is
where , and .
(2) The estimator of is
where , , , and .
Proof. Let be defined in (17). Therefore, variance of is
(18) |
Furthermore, the estimator of can be obtained by,
(19) |
Since is a nonlinear estimator then the variance of this estimator is equal to,
(20) |
where , .
Step 1: Investigate the formula of .
By using the Taylor linearization approach the linear estimator of is
(21) |
where . Then can be approximated by,
where and .
Therefore,
(22) |
Step 2: Investigate the formula of .
The formula of can be approximated by,
where .
Then,
(23) |
Step 3: Approximate the value of and its estimators.
The value of can be approximated by,
(24) |
The estimator of is
(25) |
Replace (25) into (18) then the variance of is
(26) |
Furthermore, the estimator of can be obtained by substituting (26) in (19) then,
(27) |
In (16) and (17), we extend the ratio estimators of Ponkaew & Lawson (2018) to the MAR mechanism and discussed the variance and its estimators in Theorem 1. However, the ratio estimator for population mean in (16) and for population total in (17) can be used under the condition that is, when nonresponse occurs only with the variable but information on for all and needs to be known. Next, we proposed new ratio estimators under condition where nonresponse occurs on both and but is known and condition nonresponse occurs both and and is unknown but information of are available for all and the population mean of are also known.
The new ratio estimator when is known
Assume that the condition is satisfied when nonresponse occurs with both variables and but is known. The new ratio estimator for estimating population mean is given below,
(28) |
where , , . Furthermore, the estimator of can be obtained by using the probit or logistic regression models under the MAR mechanism. Then, the new ratio estimator for the population total is
(29) |
The variance and associated estimators of are discussed in Theorem 2.
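Since the equation bodies are not reproduced here, the following sketch shows only one plausible form of a ratio estimator of the total when both variables are subject to nonresponse: respondents are weighted by the inverse of the product of the inclusion probability and the estimated response probability, and the ratio of the weighted totals is scaled by the known auxiliary total. The exact form of the proposed estimator may differ; all names and data are assumptions.

```python
def adjusted_ratio_total(y, x, responded, incl_probs, resp_probs,
                         x_pop_mean, pop_size):
    """Ratio estimate of the total of y using respondents only.

    Assumed weight for respondent i: 1 / (incl_probs[i] * resp_probs[i]).
    Values of y and x for nonrespondents are ignored (may be None).
    """
    num = den = 0.0
    for yi, xi, ri, pi_i, p_i in zip(y, x, responded, incl_probs, resp_probs):
        if ri:
            w = 1.0 / (pi_i * p_i)
            num += w * yi
            den += w * xi
    if den == 0.0:
        raise ValueError("no respondents with a nonzero auxiliary value")
    return pop_size * x_pop_mean * num / den


# Toy sample: the second unit did not respond on either variable.
y = [10.0, None, 30.0, 8.0]
x = [5.0, None, 15.0, 4.0]
responded = [1, 0, 1, 1]
incl = [0.2, 0.3, 0.5, 0.4]
resp = [0.8, 0.7, 0.9, 0.8]
est = adjusted_ratio_total(y, x, responded, incl, resp,
                           x_pop_mean=6.0, pop_size=50)
print(est)
```

Setting every entry of `resp_probs` to a common value gives a uniform (MCAR-type) version of the same sketch, in which the response probability cancels from the ratio.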
Theorem 2. Under condition with reverse framework and where the nonresponse mechanism is MAR.
(1) The variance of is
(30) |
where , and .
(2) The estimator of is
(31) |
where , , , , . The value of , if is known otherwise . is the estimator of from the probit or logistic regression models.
The proof in Theorem 2 is similar to the proof in Theorem 1.
In Theorem 2 we investigated the variance and its estimators of . We note that the variance formulas and are the same but the variance estimators of and are slightly different because the estimators of are different.
In (28) and (29) we proposed new ratio estimators for the population mean and population total of the study variable when nonresponse occurs on both and variables but is known under the MAR mechanism. Furthermore, the variance and its estimator are also discussed in Theorem 2. Next, we propose the special case of when the response probability is considered under the MCAR mechanism ( for all ). Under the MCAR mechanism the population mean estimator is equal to
(32) |
where , , . Then, the population total estimator is
(33) |
Finally, the variance and associated estimators of y are discussed in Lemma 3.
Lemma 3. Under condition with reverse framework and where the nonresponse mechanism is MCAR.
(1) The variance of is
(34) |
where , and .
(2) The estimator of is
(35) |
where , , , , . The value of , if is known otherwise . is the estimator of under the MCAR mechanism that is .
The new ratio estimator when is unknown
Assume that the condition is satisfied, is unknown and nonresponse occurs on both and variables. However, the information of variable is available for all and is known. Furthermore, variables are highly correlated with . Then, we extended the GREG estimator of Ponkaew’s (2018) to the MAR mechanism and it is defined by
(36) |
where , , .
The new ratio estimator for population mean is
(37) |
Then, the new ratio estimator for population total is
(38) |
The variance and associated estimators of are discussed in Theorem 4.
Theorem 4. Under condition with reverse framework and nonresponse mechanism is MAR.
(1) The variance of is
(39) |
where , , and .
(2) The estimator of is
(40) |
where , , and , if is known otherwise . is the estimator of from the probit or logistic regression models.
Proof. Let be defined in (38). Since the new ratio estimator is a function of the GREG estimator, we use the modified automated linearization approach to transform to a simple form, which is defined by
(41) |
where . Then, the new ratio estimator can be approximated by,
(42) |
Therefore, variance of can be approximated from,
(43) |
Furthermore, the estimator of can be obtained by,
(44) |
We note that is a nonlinear estimator then we use steps (1) to (5) for investigating the value of and it is defined by,
(45) |
where . Substitute (45) into (43) then,
(46) |
Furthermore, the estimator of is
(47) |
where , , and .
Next, we consider under the MCAR mechanism as follows. The new ratio estimator for population mean when is unknown and nonresponse occurs on both and variables under the MCAR mechanism is
(48) |
where , , , .
Then, the new ratio estimator for population total is
(49) |
The variance and associated estimators of are discussed in Lemma 5.
Lemma 5. Under condition with a reverse framework and where the nonresponse mechanism is MCAR.
(1) The variance of is
(50) |
where , , and .
(2) The estimator of is
(51) |
where , , and . The value of is defined in (35).
Simulation studies
In this section, the performance of the proposed new ratio estimators and their variance estimators under the MAR mechanism is compared with that under the MCAR mechanism via simulation studies. We generated a study variable from the auxiliary variables , , size variable and calibration variable following the model from Sichera (2020) and it is defined by where , , , , , . Four sample sizes, n = 100, 200, 600 and 1,200, are drawn from a population of size N = 3,000, and four sample sizes, n = 10, 20, 60 and 120, are drawn from a population of size N = 300 (the sizes reported in Tables 1 and 2), using Midzuno’s (1952) scheme. We consider the MAR response mechanism with two levels of response rate, 60% and 80%, and repeated the simulation 10,000 times ( ) using Program R (R Core Team, 2021). We consider the case where the response probability is unknown and is estimated by the logistic regression model for the MAR mechanism and by the function for the MCAR mechanism. The relative root mean square error (RRMSE) was used to compare the efficiency of the proposed ratio estimators and their variance estimators and the formula is
where is the proposed estimators or variance estimators and is expectation of or . The results are shown in Tables 1 and 2.
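The efficiency measure can be sketched as a small helper that takes the estimates from all simulation replicates and a reference value. Whether the reference is the true parameter or the Monte Carlo expectation is left to the caller here, since the formula itself is not reproduced above; the function name and toy numbers are assumptions.

```python
import math


def relative_rmse(estimates, reference):
    """Relative root mean square error of replicated estimates.

    Computes the square root of the mean squared deviation from the
    reference value, divided by the reference value.
    """
    if not estimates:
        raise ValueError("estimates must be non-empty")
    mse = sum((e - reference) ** 2 for e in estimates) / len(estimates)
    return math.sqrt(mse) / reference


# Toy replicates around a reference value of 100.
rrmse = relative_rmse([90.0, 110.0, 100.0, 100.0], reference=100.0)
print(round(rrmse, 4))
```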
Table 1. The relative root mean square error of the new ratio estimators and associated variance estimators for N = 3,000. "Known" and "unknown" refer to the population mean of the auxiliary variable.
Response rate (%) | n | Estimator, known, MAR | Estimator, known, MCAR | Estimator, unknown, MAR | Estimator, unknown, MCAR | Variance estimator, known, MAR | Variance estimator, known, MCAR | Variance estimator, unknown, MAR | Variance estimator, unknown, MCAR |
---|---|---|---|---|---|---|---|---|---|
60 | 100 | 0.0470 | 0.0472 | 0.0465 | 0.0467 | 0.1702 | 0.1703 | 0.2763 | 0.3201 |
60 | 200 | 0.0350 | 0.0351 | 0.0354 | 0.0361 | 0.1321 | 0.1316 | 0.1761 | 0.2069 |
60 | 600 | 0.0322 | 0.0324 | 0.0330 | 0.0340 | 0.1150 | 0.1158 | 0.1408 | 0.1658 |
60 | 1,200 | 0.0104 | 0.0105 | 0.0105 | 0.0107 | 0.0427 | 0.0697 | 0.0512 | 0.0804 |
80 | 100 | 0.0364 | 0.0373 | 0.0366 | 0.0375 | 0.1453 | 0.1490 | 0.1930 | 0.2526 |
80 | 200 | 0.0258 | 0.0261 | 0.0266 | 0.0269 | 0.1127 | 0.1141 | 0.1855 | 0.2048 |
80 | 600 | 0.0134 | 0.0139 | 0.0146 | 0.0150 | 0.0588 | 0.0614 | 0.1108 | 0.1149 |
80 | 1,200 | 0.0086 | 0.0089 | 0.0088 | 0.0090 | 0.0433 | 0.0552 | 0.0660 | 0.0875 |
Table 2. The relative root mean square error of the new ratio estimators and associated variance estimators with population size N = 300. "Known" and "unknown" refer to the population mean of the auxiliary variable.
Response rate (%) | n | Estimator, known, MAR | Estimator, known, MCAR | Estimator, unknown, MAR | Estimator, unknown, MCAR | Variance estimator, known, MAR | Variance estimator, known, MCAR | Variance estimator, unknown, MAR | Variance estimator, unknown, MCAR |
---|---|---|---|---|---|---|---|---|---|
60 | 10 | 0.1166 | 0.1196 | 0.1169 | 0.1198 | 0.6605 | 0.6902 | 2.2661 | 2.7669 |
60 | 20 | 0.1038 | 0.1056 | 0.1049 | 0.1067 | 0.5423 | 0.5452 | 1.3307 | 1.3510 |
60 | 60 | 0.0624 | 0.0626 | 0.0631 | 0.0634 | 0.2420 | 0.2489 | 0.3666 | 0.4646 |
60 | 120 | 0.0478 | 0.0479 | 0.0483 | 0.0485 | 0.1548 | 0.1829 | 0.2500 | 0.2634 |
80 | 10 | 0.0949 | 0.0962 | 0.0957 | 0.0968 | 0.4347 | 0.4533 | 2.2106 | 2.5923 |
80 | 20 | 0.0863 | 0.0869 | 0.0869 | 0.0871 | 0.3722 | 0.3736 | 1.3123 | 1.3319 |
80 | 60 | 0.0480 | 0.0484 | 0.0481 | 0.0485 | 0.2084 | 0.2150 | 0.3628 | 0.4144 |
80 | 120 | 0.0298 | 0.0301 | 0.0298 | 0.0303 | 0.1392 | 0.1514 | 0.2164 | 0.2325 |
The simulation results in Table 1 for N = 3,000 show that the new population total estimator under missing at random performed better than the estimators under missing completely at random in both situations, where is either known or unknown. The relative root mean square errors of all estimators decreased as the response rate increased and as the sample size increased. When is unknown and needs to be estimated, the relative root mean square errors increase because of the additional estimation step. Similar results were found for the variance estimators. Similar results are found in Table 2 for the smaller population size N = 300.
An application to water demand in Thailand
The new estimators are applied to estimate the water demand in Thailand. The data are from the provincial waterworks for July and August 2022. Midzuno’s (1952) scheme is used to select a sample of 40 provinces from the total of 74 provinces. The demand for water in August 2022 is considered as study variable . Two auxiliary variables and are the water supply in August and the water demand in July 2022, respectively. The variable is used to construct the new ratio estimators and the variable is used to estimate the response probabilities with the logistic regression model under the MAR mechanism. The calibration variable is the water supply in July 2022 and the size variable is the number of water users in August 2022. The nonresponse rate is 7.5% in this study.
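The sample selection step can be illustrated with a short sketch of Midzuno's scheme as it is usually described: the first unit is drawn with probability proportional to size and the remaining units by simple random sampling without replacement. The function names and the toy size values are assumptions; in the application the size variable would be the number of water users, with 40 of 74 provinces selected.

```python
import random


def midzuno_sample(sizes, n, rng=random):
    """Draw n distinct unit indices using Midzuno's scheme."""
    N = len(sizes)
    if not 1 <= n <= N:
        raise ValueError("n must be between 1 and the population size")
    # First draw: probability proportional to size.
    first = rng.choices(range(N), weights=sizes, k=1)[0]
    # Remaining n - 1 draws: simple random sampling without replacement.
    rest = [i for i in range(N) if i != first]
    return [first] + rng.sample(rest, n - 1)


def midzuno_inclusion_probs(sizes, n):
    """First order inclusion probabilities under Midzuno's scheme:
    pi_i = p_i * (N - n) / (N - 1) + (n - 1) / (N - 1), p_i = size share."""
    N = len(sizes)
    if N < 2:
        raise ValueError("population must contain at least two units")
    total = sum(sizes)
    return [(s / total) * (N - n) / (N - 1) + (n - 1) / (N - 1)
            for s in sizes]


sizes = [10.0, 20.0, 30.0, 40.0]      # hypothetical size measures
probs = midzuno_inclusion_probs(sizes, n=2)
sample = midzuno_sample(sizes, n=2, rng=random.Random(0))
print(probs, sorted(sample))
```

The inclusion probabilities sum to the sample size, which is a quick sanity check before they are used as weights.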
Table 3 shows the estimated total water demand in Thailand in August 2022. The estimated water demand when is known is higher than when is unknown under both the MAR and MCAR nonresponse mechanisms. In contrast, the variance estimates when is unknown are roughly nine times those when is known, owing to the estimation of the unknown population mean of the auxiliary variable. The new estimators can be useful in real-world applications when nonresponse occurs in the study and must be handled before estimation and further analysis.
Table 3. The total estimates of water demand in August 2022.
Nonresponse mechanism | Population mean of the auxiliary variable | Estimated water demand | Variance estimate |
---|---|---|---|
MAR | Known | 122,763,533 | 5,621,837,076,813 |
MAR | Unknown | 112,079,391 | 49,945,263,902,570 |
MCAR | Known | 122,752,240 | 5,958,276,721,564 |
MCAR | Unknown | 111,926,154 | 52,711,445,906,451 |
Figure 1 summarizes the conclusions for all cases of the simulation studies and the empirical study.
Conclusions
New ratio estimators for estimating the population total and population mean are proposed under UPWOR when data are missing at random in both the study and auxiliary variables, for the cases where the population mean of an auxiliary variable is known and unknown. In the latter case, we suggest estimating it from other variables using the GREG estimator. The efficacies of the new ratio estimators are compared under the MAR and MCAR nonresponse mechanisms through simulation studies and an empirical study using water demand data in Thailand. The results show that the new ratio estimators under the MAR mechanism are more efficient than the ratio estimators under the MCAR mechanism for all response rates and sample sizes. The proposed estimators are applied to estimate the demand for water, and this information can be used to plan policies and strategies for preventing water shortages that may occur in the future. The proposed estimators are more useful in practice than the estimators of Ponkaew & Lawson (2018), which were considered only under MCAR with missing values in the study variable alone and which require the population mean of the auxiliary variable to be known, a quantity that is difficult to obtain. The proposed estimators are also more flexible in real-life applications because they can be used whether the nonresponse mechanism is uniform or non-uniform, the latter being more likely to occur in real-world problems. If the population mean of the auxiliary variable is unknown, it can be estimated using the GREG estimator, which exploits related variables in the estimation process to improve the efficiency of the population total estimators. The new estimators can be extended to complex survey designs such as stratified cluster sampling and to the not missing at random (NMAR) nonresponse mechanism.
Supplemental Information
Acknowledgments
We thank all the referees for their valuable comments, which helped to improve the article.
Funding Statement
This research was funded by the National Science, Research and Innovation Fund (NSRF), and King Mongkut’s University of Technology North Bangkok with contract no. KMUTNB-FF-65-28. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Additional Information and Declarations
Competing Interests
The authors declare that they have no competing interests.
Author Contributions
Chugiat Ponkaew performed the experiments, analyzed the data, prepared figures and/or tables, and approved the final draft.
Nuanpan Lawson conceived and designed the experiments, performed the experiments, authored or reviewed drafts of the article, and approved the final draft.
Data Availability
The following information was supplied regarding data availability:
The data and code for the simulation studies and the application to water data are available in the Supplemental Files.
References
- Anas et al. (2022). Anas MM, Huang Z, Shahzad U, Zaman T, Shahzadi S. Compromised imputation based mean estimators using robust quantile regression. Communications in Statistics – Simulation and Computation. 2022. doi: 10.1080/03610926.2022.2108057.
- Bacanli & Kadilar (2008). Bacanli S, Kadilar C. Ratio estimators with unequal probability designs. Pakistan Journal of Statistics. 2008;24(3):167–172.
- Bakker et al. (2014). Bakker M, van Duist H, van Schagen K, Vreeburg J, Rietveld L. Improving the performance of water demand forecasting models by using weather input. Procedia Engineering. 2014;70:93–102. doi: 10.1016/j.proeng.2014.02.012.
- Bhushan & Kumar (2022). Bhushan S, Kumar A. On optimal classes of estimators under ranked set sampling. Communications in Statistics – Theory and Methods. 2022;51(8):2610–2639. doi: 10.1080/03610926.2020.1777431.
- Bhushan et al. (2022). Bhushan S, Kumar A, Pandey AP, Singh S. Estimation of population mean in presence of missing data under simple random sampling. Communications in Statistics – Simulation and Computation. 2022. doi: 10.1080/03610918.2021.2006713.
- Boretti & Rosa (2019). Boretti A, Rosa L. Reassessing the projections of the world water development report. npj Clean Water. 2019;2(15):1–6. doi: 10.1038/s41545-019-0039-9.
- Chodjuntug & Lawson (2022a). Chodjuntug K, Lawson N. Imputation for estimating the population mean in the presence of nonresponse, with application to fine particle density in Bangkok. Mathematical Population Studies. 2022a;29(4):204–225. doi: 10.1080/08898480.2021.1997466.
- Chodjuntug & Lawson (2022b). Chodjuntug K, Lawson N. The chain regression exponential type imputation method for mean estimation in the presence of missing data. Songklanakarin Journal of Science and Technology. 2022b;44(4):1109–1118.
- Cochran (1940). Cochran WG. The estimation of the yields of cereal experiments by sampling for the ratio of grain to total produce. The Journal of Agricultural Science. 1940;30(2):262–275. doi: 10.1017/S0021859600048012.
- Cochran (1977). Cochran WG. Sampling Techniques (3rd ed.). New York: John Wiley & Sons; 1977.
- Horvitz & Thompson (1952). Horvitz DF, Thompson DJ. A generalization of sampling without replacement from a finite universe. Journal of the American Statistical Association. 1952;47(260):663–685. doi: 10.1080/01621459.1952.10483446.
- Huang & Lin (2017). Huang L, Lin L. Supply and demand analysis of water resources based on system dynamics model. Journal of Engineering and Technological Sciences. 2017;49(6):705–720. doi: 10.5614/j.eng.technol.sci.2017.49.6.1.
- Kaewprasert, Niwitpong & Niwitpong (2022). Kaewprasert T, Niwitpong S-A, Niwitpong S. Bayesian estimation for the mean of delta-gamma distributions with application to rainfall data in Thailand. PeerJ. 2022;10(1):e13465. doi: 10.7717/peerj.13465.
- Khare & Srivastava (1997). Khare BB, Srivastava S. Transformed ratio type estimators for the population mean in the presence of non-response. Communications in Statistics – Theory and Methods. 1997;26(7):1779–1791. doi: 10.1080/03610929708832012.
- Kumar (2015). Kumar S. Efficient use of auxiliary information in estimating the population ratio, product and mean in presence of non-response. Journal of Advanced Computing. 2015;4:68–87.
- Lawson (2017). Lawson N. Variance estimation in the presence of nonresponse under probability proportional to size sampling. Proceeding of the 6th Annual International Conference on Computational Mathematics, Computational Geometry and Statistics 2017 (CMCGS 2017) 6th-7th March 2017; Singapore: 2017.
- Lawson (2021). Lawson N. An alternative family of combined estimators for estimating population mean in finite populations. Lobachevskii Journal of Mathematics. 2021;42(13):3150–3157. doi: 10.1134/S1995080222010115.
- Lawson & Ponkaew (2019). Lawson N, Ponkaew C. New generalized regression estimator in the presence of nonresponse under unequal probability sampling. Communications in Statistics – Theory and Methods. 2019;48(10):2483–2498. doi: 10.1080/03610926.2018.1465091.
- Lawson & Siripanich (2020). Lawson N, Siripanich P. A new generalized regression estimator and variance estimation for unequal probability sampling without replacement for missing data. Communications in Statistics – Theory and Methods. 2020;51(18):6296–6318. doi: 10.1080/03610926.2020.1860224.
- Midzuno (1952). Midzuno H. On sampling system with probability proportional to sum of sizes. Annals of the Institute of Statistical Mathematics. 1952;3(1):99–107.
- Okafor & Lee (2000). Okafor FC, Lee H. Double sampling for ratio and regression estimation with sub sampling the non-respondents. Survey Methodology. 2000;26:183–188.
- Perri (2004). Perri P. On the efficient use of regression-in-ratio estimator in simple random sampling. Atti della XLIII Riunione Scientifica della Società Italiana di Statistica; 2004. pp. 537–540.
- Ponkaew (2018). Ponkaew C. Estimation for population total in the presence of nonresponse. 2018. PhD Dissertation, King Mongkut’s University of Technology North Bangkok, Thailand.
- Ponkaew & Lawson (2018). Ponkaew C, Lawson N. A new ratio estimator for population total in the presence of nonresponse under unequal probability sampling without replacement. Thai Journal of Mathematics; Special Issue (2018): Asian Conference on Fixed Point Theory and Optimization. 2018;2018(417):429.
- Ponkaew & Lawson (2023). Ponkaew C, Lawson N. New generalized regression estimators using a ratio method and its variance estimation for unequal probability sampling without replacement in the presence of nonresponse. Current Applied Science and Technology. 2023;23(2):1–7. doi: 10.55003/cast.2022.02.23.007.
- R Core Team (2021). R Core Team. R: a language and environment for statistical computing. 2021. https://www.R-project.org/ R Foundation for Statistical Computing, Vienna, Austria.
- Rao (1986). Rao PSRS. Ratio estimation with subsampling the nonrespondents. Survey Methodology. 1986;12:217–230.
- Rao (1987). Rao PSRS. Paper Presented at a Special Contributed Session of the International Statistical Association Meeting, Sept., 216. 1987. Ratio and regression estimates with sub-sampling the non-respondents. Tokyo, Japan.
- Shahzad et al. (2020). Shahzad U, Hanif M, Sajjad I, Anas MM. Quantile regression-ratio-type estimators for mean estimation under complete and partial auxiliary information. Scientia Iranica. 2020;29(3):1705–1715. doi: 10.24200/sci.2020.54423.3744.
- Sichera (2020). Sichera R. Approximate inclusion probabilities for survey sampling. 2020. https://CRAN.R-project.org/package=jipApprox R package version 0.1.2.
- Sisodia & Dwivedi (1981). Sisodia B, Dwivedi V. Modified ratio estimator using coefficient of variation of auxiliary variable. Journal-Indian Society of Agricultural Statistics. 1981;33:13–18.
- Särndal (2007). Särndal CE. The calibration approach in survey theory and practice. Survey Methodology. 2007;33(2):99–119.
- Särndal & Lundström (2005). Särndal CE, Lundström S. Estimation in surveys with nonresponse. New York: John Wiley & Sons; 2005.
- Särndal, Swensson & Wretman (1992). Särndal CE, Swensson B, Wretman J. Model assisted survey sampling. New York: Springer-Verlag; 1992.
- Upadhyaya & Singh (1999). Upadhyaya LN, Singh HP. Use of transformed auxiliary variable in estimating the finite population mean. Biometrical Journal. 1999;41(5):627–636. doi: 10.1002/(SICI)1521-4036(199909)41:5<627::AID-BIMJ627>3.0.CO;2-W.