Review Highlights
- We present MAGWGPRS, a new model that combines and extends count-response MARS and spatial regression.
- The optimal bandwidth is selected with an adaptive Gaussian kernel function based on the CV method.
- Districts/cities in Java, Indonesia, are grouped based on inter-regional disaggregating variables in modeling DHF cases.
Keywords: DHF, GCV, Generalized Poisson, GWGPR, MARS, MAGPRS, MAGWGPRS, MSE, Weighted-MLE
Method name: Multivariate Adaptive Geographically Weighted Generalized Poisson Regression Splines (MAGWGPRS)
Abstract
This article constructs a new model based on multivariate adaptive generalized Poisson regression splines (MAGPRS) and geographically weighted generalized Poisson regression (GWGPR), called multivariate adaptive geographically weighted generalized Poisson regression splines (MAGWGPRS). The article elaborates the steps of weighted maximum likelihood estimation (weighted-MLE) used to estimate its parameters. To compare their performance, MAGWGPRS and MAGPRS were applied to the number of dengue hemorrhagic fever (DHF) cases in 119 districts or cities in Java, Indonesia, in 2020. Goodness of fit is assessed with a plot of fitted values against actual data and with the mean square error (MSE) of each model. MAGWGPRS yields a best model for each location, whereas MAGPRS yields a single best model for all locations. Based on the plot of fitted values against actual data and on the MSE values, MAGWGPRS is superior to MAGPRS.
Graphical abstract
Specifications table
| Subject area: | Mathematics and Statistics |
| More specific subject area: | Statistics: Nonparametric Regression, Spatial Regression |
| Name of your method: | Multivariate Adaptive Geographically Weighted Generalized Poisson Regression Splines (MAGWGPRS) |
| Name and reference of original method: | Original methods: Multivariate adaptive regression splines (MARS); geographically weighted generalized Poisson regression (GWGPR). References: A.P. Ampulembang, B.W. Otok, A.T. Rumiati, Budiasih, Bi-responses nonparametric regression model using MARS and its properties, Appl. Math. Sci. 9 (2015) 1417–1427. https://doi.org/10.12988/ams.2015.5127. S. Hidayati, B.W. Otok, Purhadi, Parameter Estimation and Statistical Test in Multivariate Adaptive Generalized Poisson Regression Splines, IOP Conf. Ser. Mater. Sci. Eng. 546 (2019) 1–11. https://doi.org/10.1088/1757-899X/546/5/052051. S.W. Tyas, L.A. Puspitasari, Geographically weighted generalized Poisson regression model with the best kernel function in the case of the number of postpartum maternal mortality in East Java, MethodsX. 10 (2023) 102002. https://doi.org/10.1016/j.mex.2023.102002. |
| Resource availability: | Dengue hemorrhagic fever (DHF) cases (y) and the predictors (x) from Badan Pusat Statistik (BPS) in each province in Java, Indonesia |
Method details
Introduction
Multivariate adaptive regression splines (MARS) was first introduced by Friedman in 1991 [1]. Studies that demonstrate the advantages of MARS over other regression models include [2], [3], [4], [5]. Most MARS models have been developed for continuous [6], [7], [8], [9] and categorical responses [10], [11], [12], [13]; MARS models for count responses are still scarce. MARS with a count response combines MARS and Poisson regression and was first discussed by Yasmirullah et al. (2021) as MAPRS [14], whose parameters are estimated with the weighted least squares (WLS) method. Hidayati et al. (2019) and Otok et al. (2019) combined MAPRS and generalized Poisson regression (GPR) into MAGPRS to handle violations of the equidispersion assumption [15,16].
Previous research has not considered the spatial variation precipitated by geographical location in the study area. In fact, there are many problems associated with spatial variation, such as a relationship between the response and predictor that varies depending on geographic location, and parameters that are not constant across the study area [17]. The spatial variation occurs because each location has different characteristics such as geographical, cultural, and socioeconomic differences. As a result, the same predictor variable can have different effects at different locations. One of the statistical techniques used to overcome spatial variation is geographically weighted Poisson regression (GWPR) by Collins (2010) and Nakaya et al. (2005) [17,18]. To overcome cases of overdispersion or underdispersion, Adryanta et al. (2019), Sabtika et al. (2021), and Tyas et al. (2023) developed the GWPR model into GWGPR [19], [20], [21].
The similarity between MAGPRS and GWGPR is that both are localized regressions. The basis functions of MAGPRS are local series that are employed to model complex (non-linear) relationships. As a result, the global model of MAGPRS is a weighted sum of the local models [1,22]. The locality of GWGPR is due to differences in characteristics between observation areas (spatial non-stationarity). As a result, each location has a spatial weight, and the parameter estimates of the resulting regression model differ depending on geographical location (latitude and longitude coordinates). This article extends the MAGPRS model by considering spatial variation between observation areas, modifying MAGPRS [15,16] and GWGPR [19], [20], [21]. The structure of the resulting model differs substantially from that of the previous models: the basis functions and their parameters are localized at the same time. As a result, the model form, algorithm, and analysis are more complex than in the previous models.
The proposed model was applied to the number of DHF cases in Java, Indonesia, in 2020. The research units are districts or cities in Java, Indonesia. DHF is an infectious disease caused by the dengue virus and transmitted by the bite of Aedes aegypti or Aedes albopictus mosquitoes. The host (human), the agent (virus), and the environment are all essential factors in the growth and spread of DHF, and they differ by region. As a result, DHF data are spatial data related to geographical location [23], [24], [25], [26], [27], [28]. The spatial weight matrix is generated using an adaptive Gaussian kernel function, and the optimal bandwidth is determined using the cross-validation (CV) method. Finally, the best MAGWGPRS model is selected using the generalized cross-validation (GCV) method.
Model specifications and estimation procedures
MARS model
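In summary, the MARS model can be described as follows. Given a response variable $y_i$ and predictor variables $\mathbf{x}_i = (x_{1i}, x_{2i}, \ldots, x_{pi})$ for $i = 1, 2, \ldots, n$, where $n$ is the number of observations, suppose that the estimated function $f$ satisfies the regression model

$$y_i = f(\mathbf{x}_i) + \varepsilon_i \tag{1}$$

where $\varepsilon_i$ is a random error with mean 0 and variance $\sigma^2$.

Assume the function $f$ in Eq. (1) is a linear combination of the basis functions $B_m(\mathbf{x}_i)$:

$$f(\mathbf{x}_i) = a_0 + \sum_{m=1}^{M} a_m B_m(\mathbf{x}_i) \tag{2}$$

where $a_0$ is the coefficient of the parent basis function and $a_m$ is the coefficient of the $m$-th non-constant basis function. Each basis function is defined as a truncated spline function:

$$B_m(\mathbf{x}_i) = \prod_{k=1}^{K_m} \left[ s_{km}\left(x_{v(k,m)i} - t_{km}\right) \right]_+ \tag{3}$$

where $k = 1, 2, \ldots, K_m$, $m = 1, 2, \ldots, M$, and the value of $s_{km}$ is $\pm 1$. If $s_{km}(x_{v(k,m)i} - t_{km}) > 0$, then $[s_{km}(x_{v(k,m)i} - t_{km})]_+ = s_{km}(x_{v(k,m)i} - t_{km})$, and if $s_{km}(x_{v(k,m)i} - t_{km}) \le 0$, then $[s_{km}(x_{v(k,m)i} - t_{km})]_+ = 0$.

The model of Eq. (1), where $f$ is the function in Eq. (2), is known as the MARS model [1,9,29]. Here, $K_m$ is the degree of interaction, $s_{km}$ is the sign of the basis function at the $k$-th interaction and the $m$-th basis function, $x_{v(k,m)i}$ is the predictor variable, in which $v$ is the index of the predictor variable associated with the $k$-th interaction and the $m$-th basis function at the $i$-th observation, and $t_{km}$ is the knot value at the $k$-th interaction and the $m$-th basis function of the predictor variable $x_{v(k,m)}$.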
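To make the truncated spline basis functions concrete, the following minimal sketch evaluates a MARS-type function under the assumption that each factor has the usual hinge form $\max(0, \pm(x - t))$. The function names, the toy data, the knots, and the coefficients are all hypothetical and are not taken from the article.

```python
import numpy as np

def hinge(x, knot, sign):
    """Truncated linear spline [sign * (x - knot)]_+ = max(0, sign * (x - knot))."""
    return np.maximum(sign * (np.asarray(x, dtype=float) - knot), 0.0)

def basis_function(X, factors):
    """Product of hinge factors; each factor is (predictor index, knot, sign)."""
    X = np.asarray(X, dtype=float)
    value = np.ones(X.shape[0])
    for v, knot, sign in factors:
        value = value * hinge(X[:, v], knot, sign)
    return value

def mars_predict(X, a0, coefs, bases):
    """Evaluate a0 + sum_m a_m * B_m(x) for every row x of X."""
    f = np.full(np.asarray(X).shape[0], float(a0))
    for a_m, factors in zip(coefs, bases):
        f += a_m * basis_function(X, factors)
    return f

# Hypothetical toy data: three observations, two predictors.
X = np.array([[1.0, 5.0], [3.0, 2.0], [4.0, 0.0]])
bases = [
    [(0, 2.0, +1)],                # main effect: [+(x_1 - 2)]_+
    [(0, 2.0, +1), (1, 1.0, -1)],  # interaction: [+(x_1 - 2)]_+ * [-(x_2 - 1)]_+
]
fitted = mars_predict(X, a0=1.0, coefs=[0.5, 2.0], bases=bases)
```

The second basis function is non-zero only for the third observation, which illustrates how products of hinge factors restrict an interaction term to a sub-region of the predictor space.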
The MARS algorithm comprises forward and backward stepwise procedures [1,9,29]. The forward stepwise procedure constructs the MARS model by adding truncated spline basis functions (knots and interactions) until the model reaches the maximum number of basis functions. The backward stepwise procedure then determines which basis functions to retain: the basis function that contributes the least to the estimated response, as judged by the GCV value, is eliminated at each step.

1. Forward stepwise:
   a. Determine the parent basis function, i.e., $B_0(\mathbf{x}) = 1$.
   b. Suppose there are $M + 1$ basis functions, i.e., $B_0(\mathbf{x}), B_1(\mathbf{x}), \ldots, B_M(\mathbf{x})$; then add a new pair of basis functions:

   $$B_{M+1}(\mathbf{x}) = B_l(\mathbf{x})\,[+(x_v - t)]_+ \tag{4}$$

   $$B_{M+2}(\mathbf{x}) = B_l(\mathbf{x})\,[-(x_v - t)]_+ \tag{5}$$

   The basis function $B_l(\mathbf{x})$ in Eq. (4) and Eq. (5) is the parent basis function, a member of the set of basis functions existing before the new pair is added; $x_v$ is a predictor variable that does not appear in the parent basis function, and $t$ is the knot. New pairs are added until the maximum number of basis functions is reached, and each pair is selected based on the minimum MSE.
2. Backward stepwise. It is conducted after the maximum number of basis functions has been obtained by the forward stepwise procedure.
   a. Select the basis functions from the forward stepwise procedure one by one, with the exception of the parent basis function, and remove a basis function if the GCV value decreases when it is deleted.
   b. Repeat the deletion until removing any remaining basis function no longer decreases the GCV value. The basis functions that remain after this backward stepwise procedure are the optimal basis functions.
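Friedman (1991) employed the GCV method in the backward stepwise procedure to select the optimal basis functions in the MARS algorithm. The GCV criterion, developed from [30,31], is

$$\mathrm{GCV}(M) = \frac{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{f}_M(\mathbf{x}_i)\right)^2}{\left[1 - \tilde{C}(M)/n\right]^2} \tag{6}$$

with

$$\tilde{C}(M) = C(M) + d \cdot M \tag{7}$$

where $\tilde{C}(M)$ is the complexity function, $C(M)$ is the number of parameters to be estimated, $d$ is the degree of interaction, with the optimum value within the interval $2 \le d \le 4$, and $\hat{f}_M(\mathbf{x}_i)$ is the estimated value of the response variable with $M$ basis functions at the $i$-th observation.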
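As an illustration of how a GCV criterion of this kind penalizes model size, the sketch below computes it for a least-squares fit on a given basis matrix. It assumes Friedman's form, in which the complexity is the trace of the hat matrix plus one, plus a cost `d` per non-constant basis function; the default `d = 3.0`, the function name, and the toy data are assumptions made for the example only.

```python
import numpy as np

def gcv(y, B, d=3.0):
    """GCV of a least-squares fit on basis matrix B (first column is the constant)."""
    y = np.asarray(y, dtype=float)
    B = np.asarray(B, dtype=float)
    n, n_cols = B.shape
    M = n_cols - 1                                  # number of non-constant basis functions
    coef, *_ = np.linalg.lstsq(B, y, rcond=None)
    mse = np.mean((y - B @ coef) ** 2)
    c_m = np.linalg.matrix_rank(B) + 1              # trace of the hat matrix, plus 1
    c_tilde = c_m + d * M                           # penalized complexity
    return mse / (1.0 - c_tilde / n) ** 2

# Hypothetical toy data: one predictor with a knot at 0.5.
x = np.linspace(0.0, 1.0, 20)
y = np.where(x > 0.5, 2.0 * (x - 0.5), 0.0) + 0.1 * np.sin(25 * x)
B_small = np.column_stack([np.ones_like(x), np.maximum(x - 0.5, 0.0)])
B_large = np.column_stack([B_small, np.maximum(0.5 - x, 0.0)])
gcv_small, gcv_large = gcv(y, B_small), gcv(y, B_large)
```

In a backward pass, `gcv_small` and `gcv_large` would be compared, and the extra basis function would be kept only if it lowers the criterion.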
MAGPRS model
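Given the generalized Poisson (GP) probability function from [32], [33], [34]:

$$P(y_i; \mu_i, \theta) = \left(\frac{\mu_i}{1+\theta\mu_i}\right)^{y_i} \frac{(1+\theta y_i)^{y_i-1}}{y_i!} \exp\left(-\frac{\mu_i(1+\theta y_i)}{1+\theta\mu_i}\right), \quad y_i = 0, 1, 2, \ldots \tag{8}$$

where $\mu_i$ is the mean of an event and $\theta$ is the dispersion parameter.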
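If the response variable in Eq. (1) is GP distributed with the probability function given in Eq. (8), then the MAGPRS model is

$$y_i \sim \mathrm{GP}(\mu_i, \theta) \tag{9}$$

with

$$\mu_i = \exp\left(a_0 + \sum_{m=1}^{M} a_m B_m(\mathbf{x}_i)\right) \tag{10}$$

where $a_0$, $a_m$, and $B_m(\mathbf{x}_i)$ are the coefficients and basis functions of Eq. (2).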
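The following sketch evaluates a generalized Poisson log-probability and a log-link mean built from basis functions. It assumes the common mean-parameterized GP form, in which the variance is $\mu(1+\theta\mu)^2$ and $\theta = 0$ recovers the Poisson distribution; the function names and the numerical values are hypothetical.

```python
import numpy as np
from math import lgamma

def gp_logpmf(y, mu, theta):
    """Log of the generalized Poisson probability with mean mu and dispersion theta."""
    y = np.asarray(y, dtype=float)
    mu = np.asarray(mu, dtype=float)
    log_factorial = np.vectorize(lgamma)(y + 1.0)   # ln(y!)
    return (y * (np.log(mu) - np.log1p(theta * mu))
            + (y - 1.0) * np.log1p(theta * y)
            - log_factorial
            - mu * (1.0 + theta * y) / (1.0 + theta * mu))

def magprs_mean(B, a):
    """Mean exp(a_0 + sum_m a_m B_m(x_i)); B has a leading column of ones."""
    return np.exp(np.asarray(B, dtype=float) @ np.asarray(a, dtype=float))

# Sanity checks on hypothetical values: probabilities sum to one, mean equals mu.
support = np.arange(0, 400)
probs = np.exp(gp_logpmf(support, mu=2.0, theta=0.1))
total = probs.sum()
mean = (support * probs).sum()
variance = ((support - mean) ** 2 * probs).sum()
```

With `mu = 2.0` and `theta = 0.1` the variance exceeds the mean, which is the overdispersed case that motivates replacing Poisson regression with GP regression.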
GWGPR model
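If $y_i$ follows the GP distribution in Eq. (8), $i = 1, 2, \ldots, n$, and $(u_i, v_i)$ is the coordinate point, with $u_i$ the latitude and $v_i$ the longitude of the $i$-th location, then the GWGPR model can be written as follows [19,20]:

$$y_i \sim \mathrm{GP}\left(\mu_i(u_i, v_i), \theta(u_i, v_i)\right) \tag{11}$$

with

$$\mu_i(u_i, v_i) = \exp\left(\beta_0(u_i, v_i) + \sum_{j=1}^{p} \beta_j(u_i, v_i)\, x_{ji}\right)$$

where $\beta_0(u_i, v_i)$ is the intercept parameter at the $i$-th location and $\beta_j(u_i, v_i)$, $j = 1, 2, \ldots, p$, are the model parameters for each predictor variable at the $i$-th location.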
MAGWGPRS model
The model in Eq. (10) has global parameters but local basis functions, resulting in a single regression model for all observations (locations). We therefore extend Eq. (10) to a location-dependent model, MAGWGPRS, so that the new model has different parameters for each location. The geographical coordinates of the $i$-th location, denoted $(u_i, v_i)$, are defined as in the GWGPR section.

Given that $y_i$ follows the GP distribution in Eq. (8), $i = 1, 2, \ldots, n$, the MAGWGPRS model can be formed as follows:

$$y_i \sim \mathrm{GP}\left(\mu_i(u_i, v_i), \theta(u_i, v_i)\right) \tag{12}$$

with

$$\mu_i(u_i, v_i) = \exp\left(a_0(u_i, v_i) + \sum_{m=1}^{M} a_m(u_i, v_i)\, B_m(\mathbf{x}_i)\right)$$

where $a_0(u_i, v_i)$ is the parameter of the parent basis function at the $i$-th location and $a_m(u_i, v_i)$ is the parameter of the $m$-th non-constant basis function at the $i$-th location.
Parameter estimation of the MAGWGPRS model
The weighted MLE method estimates the MAGWGPRS model parameters at each location by assigning geographic weights. This method maximizes the log-likelihood function by solving a gradient function equal to zero [35]. The following theorem is provided to estimate the parameters of the MAGWGPRS model.
Theorem 1
Given the MAGWGPRS model in Eq. (12). When the weighted MLE method is used to estimate the model parameters, the resulting estimating equations have no closed form, i.e.,
(13)
and
| (14) |
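Proof of Theorem 1. Based on the probability function in Eq. (8), the likelihood function for the MAGWGPRS model is

$$L = \prod_{i=1}^{n} \left(\frac{\mu_i}{1+\theta\mu_i}\right)^{y_i} \frac{(1+\theta y_i)^{y_i-1}}{y_i!} \exp\left(-\frac{\mu_i(1+\theta y_i)}{1+\theta\mu_i}\right) \tag{15}$$

where $\mu_i$ is the mean specified by the MAGWGPRS model in Eq. (12) and $\theta$ is the dispersion parameter. The natural logarithm of Eq. (15) is

$$\ln L = \sum_{i=1}^{n} \left[ y_i \ln\mu_i - y_i \ln(1+\theta\mu_i) + (y_i - 1)\ln(1+\theta y_i) - \ln y_i! - \frac{\mu_i(1+\theta y_i)}{1+\theta\mu_i} \right] \tag{16}$$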
Estimating the parameters of the MAGWGPRS model at the $i$-th location requires spatial weighting, which uses distance information between locations. Let $i$ index the location at which the local parameter estimates are generated (the regression point) and $j$ index the location at which the data are observed (the observation point). The spatial weight $w_{ij}$ is the weight assigned to the $j$-th observation in the calibration of the MAGWGPRS model for the $i$-th location. This research employs the adaptive Gaussian kernel weighting function in [36], which is defined as $w_{ij} = \exp\left(-\tfrac{1}{2}\left(d_{ij}/h_i\right)^2\right)$, where $d_{ij} = \sqrt{(u_i - u_j)^2 + (v_i - v_j)^2}$ is the Euclidean distance between the $j$-th observation and the $i$-th regression point and $h_i$ is the bandwidth at the $i$-th regression point.

According to Nakaya et al. (2005), the bandwidth governs the rate at which the weight of a datum decreases as the distance between the observation location and the regression point increases. If the bandwidth is very small, few observations fall within radius $h_i$, so the variance of the estimates increases. If the bandwidth is very large (approaching infinity), the variance decreases and the weights between locations approach 1; hence the estimated parameters become homogeneous, and the spatial model resembles the global regression model [18]. The CV method is employed in [36] to obtain the optimum bandwidth:

$$\mathrm{CV}(h) = \sum_{i=1}^{n} \left(y_i - \hat{y}_{\neq i}(h)\right)^2$$

where $\hat{y}_{\neq i}(h)$ is the estimated value of $y_i$ for bandwidth $h$ when the observation at the $i$-th location is excluded from the estimation.
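A minimal sketch of Gaussian kernel weights with one bandwidth per regression point, together with a leave-one-out CV score, is given below. Two simplifications are assumptions of the example and not the article's procedure: each adaptive bandwidth is taken as the distance to the `k`-th nearest neighbour, and a locally weighted mean stands in for the fitted MAGWGPRS model when computing the leave-one-out predictions. Coordinates and responses are hypothetical.

```python
import numpy as np

def pairwise_distances(coords):
    """Euclidean distance between every pair of coordinate rows."""
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def adaptive_gaussian_weights(coords, h):
    """w[i, j] = exp(-0.5 * (d[i, j] / h[i])**2), one bandwidth h[i] per regression point."""
    d = pairwise_distances(coords)
    h = np.asarray(h, dtype=float).reshape(-1, 1)
    return np.exp(-0.5 * (d / h) ** 2)

def knn_bandwidths(coords, k):
    """Assumed adaptive rule: h[i] is the distance from point i to its k-th nearest neighbour."""
    d = np.sort(pairwise_distances(coords), axis=1)
    return d[:, k]                      # column 0 is the zero distance of a point to itself

def cv_score(y, W):
    """Leave-one-out CV with a locally weighted mean standing in for the fitted model."""
    y = np.asarray(y, dtype=float)
    W = np.array(W, dtype=float)        # copy, so the caller's matrix is untouched
    np.fill_diagonal(W, 0.0)            # exclude observation i when predicting y[i]
    y_loo = (W @ y) / W.sum(axis=1)
    return float(np.sum((y - y_loo) ** 2))

coords = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0], [0.0, 5.0]])
y = np.array([10.0, 12.0, 20.0, 11.0])
scores = {k: cv_score(y, adaptive_gaussian_weights(coords, knn_bandwidths(coords, k)))
          for k in (1, 2, 3)}
best_k = min(scores, key=scores.get)
```

The candidate with the smallest CV score is retained; in a full analysis the stand-in predictor would be replaced by the model refitted without the $i$-th observation.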
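Next, the spatial weights $w_{ij}$ are inserted into Eq. (16), based on [37]. For the regression point $i$, write $\mathbf{a} = \left(a_0(u_i, v_i), \ldots, a_M(u_i, v_i)\right)^T$ and $\theta = \theta(u_i, v_i)$ for the local parameters, $\mathbf{B}_j = \left(1, B_1(\mathbf{x}_j), \ldots, B_M(\mathbf{x}_j)\right)^T$ for the basis vector of the $j$-th observation, and $\mu_j = \exp(\mathbf{B}_j^T \mathbf{a})$. Then

$$\ln L^{*} = \sum_{j=1}^{n} w_{ij} \left[ y_j \ln\mu_j - y_j \ln(1+\theta\mu_j) + (y_j - 1)\ln(1+\theta y_j) - \ln y_j! - \frac{\mu_j(1+\theta y_j)}{1+\theta\mu_j} \right] \tag{17}$$

Under the weighted MLE method, the first partial derivative of Eq. (17) with respect to each parameter is set equal to zero, as follows.

The first partial derivative of Eq. (17) with respect to $\mathbf{a}$ gives

$$\frac{\partial \ln L^{*}}{\partial \mathbf{a}} = \sum_{j=1}^{n} w_{ij} \left[ \frac{y_j}{1+\theta\mu_j} - \frac{\mu_j(1+\theta y_j)}{(1+\theta\mu_j)^2} \right] \mathbf{B}_j$$

hence

$$\sum_{j=1}^{n} w_{ij}\, \frac{y_j - \mu_j}{(1+\theta\mu_j)^2}\, \mathbf{B}_j = \mathbf{0} \tag{18}$$

Similarly, the first partial derivative of Eq. (17) with respect to $\theta$ gives

$$\sum_{j=1}^{n} w_{ij} \left[ -\frac{y_j \mu_j}{1+\theta\mu_j} + \frac{y_j (y_j - 1)}{1+\theta y_j} - \frac{\mu_j (y_j - \mu_j)}{(1+\theta\mu_j)^2} \right] = 0 \tag{19}$$

Eq. (18) and Eq. (19) have no closed-form solution, which proves Theorem 1. □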
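The sketch below computes a geographically weighted GP log-likelihood for one regression point, together with its gradient, and sets up a finite-difference comparison. It assumes the mean-parameterized GP probability function with a log link, a basis matrix `B` whose first column is one, and a weight vector `w` holding the spatial weights of all observations for that regression point. All names and numbers are hypothetical.

```python
import numpy as np
from math import lgamma

def weighted_gp_loglik(params, y, B, w):
    """Weighted GP log-likelihood; params = (a_0, ..., a_M, theta)."""
    params = np.asarray(params, dtype=float)
    a, theta = params[:-1], params[-1]
    mu = np.exp(B @ a)
    log_factorial = np.array([lgamma(v + 1.0) for v in y])
    ll = (y * np.log(mu / (1.0 + theta * mu))
          + (y - 1.0) * np.log1p(theta * y)
          - log_factorial
          - mu * (1.0 + theta * y) / (1.0 + theta * mu))
    return float(np.sum(w * ll))

def weighted_gp_score(params, y, B, w):
    """Gradient of the weighted log-likelihood with respect to (a, theta)."""
    params = np.asarray(params, dtype=float)
    a, theta = params[:-1], params[-1]
    mu = np.exp(B @ a)
    denom = 1.0 + theta * mu
    grad_a = B.T @ (w * (y - mu) / denom ** 2)
    grad_theta = np.sum(w * (-y * mu / denom
                             + y * (y - 1.0) / (1.0 + theta * y)
                             - mu * (y - mu) / denom ** 2))
    return np.append(grad_a, grad_theta)

# Hypothetical toy data for a single regression point.
y = np.array([0.0, 2.0, 5.0, 1.0, 7.0])
B = np.column_stack([np.ones(5), [0.0, 1.0, 2.0, 0.5, 3.0]])
w = np.array([1.0, 0.8, 0.5, 0.9, 0.3])
start = np.array([0.5, 0.2, 0.1])
score = weighted_gp_score(start, y, B, w)

# Central finite differences as an independent check of the analytic gradient.
eps = 1e-6
numeric = np.array([(weighted_gp_loglik(start + e, y, B, w)
                     - weighted_gp_loglik(start - e, y, B, w)) / (2 * eps)
                    for e in eps * np.eye(3)])
```

Because setting this gradient to zero gives equations that are non-linear in both the coefficients and the dispersion parameter, an iterative solver is needed.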
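Since Eq. (18) and Eq. (19) are not closed-form, they are solved numerically with the Berndt-Hall-Hall-Hausman (BHHH) method. The advantages of the BHHH method compared to other numerical methods for parameter estimation include its robustness to misspecification of the underlying distribution of the data, its simplicity, its convergence properties, and its computational efficiency. Suppose the parameters in Eq. (18) and Eq. (19) are collected in the vector $\boldsymbol{\gamma} = (\mathbf{a}^T, \theta)^T$; then the BHHH iteration process stops when $\left\|\boldsymbol{\gamma}^{(r+1)} - \boldsymbol{\gamma}^{(r)}\right\| \le \varepsilon$ for a small tolerance $\varepsilon > 0$. The BHHH iteration equation is

$$\boldsymbol{\gamma}^{(r+1)} = \boldsymbol{\gamma}^{(r)} - \mathbf{H}^{-1}\left(\boldsymbol{\gamma}^{(r)}\right) \mathbf{g}\left(\boldsymbol{\gamma}^{(r)}\right)$$

where $\mathbf{g}(\boldsymbol{\gamma})$ is the gradient vector of Eq. (17) and $\mathbf{H}(\boldsymbol{\gamma}) = -\sum_{j=1}^{n} \mathbf{g}_j(\boldsymbol{\gamma})\, \mathbf{g}_j(\boldsymbol{\gamma})^T$ is the BHHH approximation of the Hessian, built from the contributions $\mathbf{g}_j$ of the individual observations to the gradient.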
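The BHHH idea, replacing the Hessian by the negative sum of outer products of per-observation gradients, can be sketched generically as follows. The step-halving safeguard, the tolerance, and the toy problem (a weighted Poisson model with only an intercept, whose solution is the weighted mean of the responses) are assumptions chosen to keep the example small; they are not the article's implementation.

```python
import numpy as np

def bhhh(loglik, scores, start, tol=1e-8, max_iter=500):
    """Maximize loglik with BHHH updates built from per-observation scores."""
    est = np.asarray(start, dtype=float)
    for _ in range(max_iter):
        G = scores(est)                               # n x p, one score row per observation
        step = np.linalg.solve(G.T @ G, G.sum(axis=0))
        lam = 1.0
        while loglik(est + lam * step) < loglik(est) and lam > 1e-10:
            lam /= 2.0                                # step halving guards against overshooting
        new = est + lam * step
        if np.linalg.norm(new - est) <= tol:          # stop when the parameters barely change
            return new
        est = new
    return est

# Hypothetical toy problem: weighted Poisson model with an intercept only.
y = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
w = np.array([1.0, 0.5, 0.5, 1.0, 1.0])

def loglik(b):
    return float(np.sum(w * (y * b[0] - np.exp(b[0]))))

def scores(b):
    return (w * (y - np.exp(b[0]))).reshape(-1, 1)

b_hat = bhhh(loglik, scores, start=[0.0])
fitted_mean = float(np.exp(b_hat[0]))
```

For this toy problem the weighted likelihood equation gives `sum(w * y) / sum(w)` as the exact solution, which provides a direct check on `fitted_mean`. Only first derivatives are required, which is the practical appeal of BHHH when second derivatives are tedious to derive.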
Next, calculate the modified GCV MARS for the combination of basis function (BF), maximum interaction (MI), and minimum observation (MO). Finally, select the best MAGWGPRS model based on the minimum value of the modified GCV MARS.
Steps in the research
Stages of research analysis:
1. Data exploration.
2. Equidispersion test of response variables.
3. Spatial heterogeneity test with Breusch-Pagan method.
4. Calculate the Euclidean distance between locations.
5. Determine the optimum bandwidth with the CV method.
6. Calculate the spatial weight matrix of the adaptive Gaussian kernel function.
7. Calculate the GCV of MAGWGPRS (# BF, MI, MO).
8. Select the minimum GCV value.
9. Obtain the best MAGWGPRS model.
Model application
The MAGWGPRS model was applied to data on DHF cases in 119 districts or cities in Java, Indonesia. The data for each district or city in 2020 were obtained from Badan Pusat Statistik (BPS) [38], [39], [40], [41], [42], [43]. The research variables consist of one response variable and six predictor variables. The response variable $y$ is the number of DHF cases in the 119 districts/cities. The predictor variables are as follows: $x_1$ is population density (ha/person), $x_2$ is the percentage of households with access to proper sanitation, $x_3$ is the percentage of households with access to proper drinking water sources, $x_4$ is the percentage of poor population, $x_5$ is the ratio of medical personnel, and $x_6$ is the ratio of health centers.
The map below depicts the distribution of DHF cases in Java, Indonesia. According to Fig. 1, the highest number of DHF cases were discovered in West Java Province (18 out of 27 districts/cities) and DKI Jakarta (4 out of 6 districts/cities).
Fig. 1.
The distribution of DHF cases in districts/cities, Java Indonesia.
Before fitting MAGWGPRS, two preliminary tests are performed. First, an equidispersion test is conducted. According to [44], if the ratio $\phi$ of the Pearson chi-square or deviance to its degrees of freedom equals one, the data are equidispersed ($\phi = 1$); the data are overdispersed if $\phi > 1$ and underdispersed if $\phi < 1$. For these data, the deviance is 34,807.71 with 112 degrees of freedom, so the dispersion value is 310.783, which indicates overdispersion. Second, a spatial heterogeneity test is performed to assess whether characteristics differ between locations, using the Breusch-Pagan (BP) test statistic. The BP value and p-value in this study are 19.996 and 0.002773, respectively. Because the p-value is less than 0.05, it can be concluded that spatial heterogeneity exists. As a result, the MAGWGPRS model is appropriate for modeling DHF cases in the 119 districts/cities in Java, Indonesia.
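The two diagnostics above can be reproduced with a few lines of arithmetic. The sketch assumes that the BP statistic is referred to a chi-square distribution with 6 degrees of freedom (one per predictor), for which the upper-tail probability has a closed form; this is an assumption about the test setup, although it approximately reproduces the reported p-value. Function names are hypothetical.

```python
from math import exp

def dispersion_ratio(deviance, df):
    """Deviance divided by its degrees of freedom."""
    return deviance / df

def classify_dispersion(phi, tol=1e-8):
    """Label the data by comparing the dispersion ratio with one."""
    if abs(phi - 1.0) <= tol:
        return "equidispersion"
    return "overdispersion" if phi > 1.0 else "underdispersion"

def chi2_sf_df6(x):
    """Upper-tail probability of a chi-square variable with 6 degrees of freedom."""
    half = x / 2.0
    return exp(-half) * (1.0 + half + half ** 2 / 2.0)

phi = dispersion_ratio(34807.71, 112)   # deviance and degrees of freedom reported above
label = classify_dispersion(phi)
p_value = chi2_sf_df6(19.996)           # BP statistic reported above
```

Here `phi` rounds to 310.783 and `p_value` is about 0.00277, in line with the reported values.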
Next, the parameters of the MAGWGPRS model are estimated for each location. The first step is to determine the optimum bandwidth, BF, MI, and MO. Based on the spatial weights corresponding to the location and the combinations of BF, MI, and MO, 36 MAGWGPRS models are generated for each location. Then, for each location, the best of the 36 candidate models is selected based on the smallest GCV or the largest R2 value. This yields 119 best MAGWGPRS models. Finally, the predictor variables that affect each model are identified as inter-regional disaggregating variables, as summarized in Table 1.
Table 1.
Grouping of districts/cities based on inter-regional disaggregating variables in the MAGWGPRS model.
| Variables | Code of districts/cities¹ | Total of districts/cities |
|---|---|---|
| $x_1, x_2, x_3, x_4, x_5, x_6$ | 1–26, 28–36, 40, 44–47, 49, 52, 59–62, 67, 68, 74–77, 86, 92–95, 98, 100, 103, 109, 112–119 | 69 |
| $x_1, x_2, x_3, x_5, x_6$ | 27, 37–39, 57, 58, 78, 81, 87–90, 96, 97, 99, 101, 104–108, 110, 111 | 23 |
| | 41, 48, 51, 53–56, 63, 65, 66, 69–73, 79, 80, 83–85, 102 | 21 |
| | 42, 43, 64 | 3 |
| $x_1, x_2, x_4, x_5, x_6$ | 50, 82 | 2 |
| $x_2, x_4, x_5, x_6$ | 91 | 1 |

¹ The district/city codes are listed in Appendix A.
Table 1 shows six groups of districts/cities based on the inter-regional disaggregating variables. District/city locations are identified by positive integer codes, which are listed in Appendix A. For example, the first group in Table 1 comprises districts/cities 1–26, 28–36, 40, 44–47, 49, 52, 59–62, 67, 68, 74–77, 86, 92–95, 98, 100, 103, 109, 112–119. Districts in the same group therefore have the same set of inter-regional disaggregating variables in their MAGWGPRS models.
Fig. 2 depicts the visual grouping of districts/cities based on the separating variables between regions.
Fig. 2.
Grouping of districts/cities based on inter-regional disaggregating variables in the MAGWGPRS model.
Fig. 2 depicts six groups of districts, in which adjacent areas tend to have similar characteristics. One group consists of a single district, Nganjuk (code 91), where population density and the percentage of households with access to safe drinking water have no effect on the number of DHF cases. In two districts (codes 50 and 82), the percentage of households with access to a safe drinking water source has no effect on the number of DHF cases.
As an illustration, to obtain the MAGWGPRS model, we provide examples of Central Jakarta and Surabaya cities. The possible MAGWGPRS models for Central Jakarta and Surabaya cities in accordance with a combination of BF, MI, and MO are presented in Table 2. The best model parameter estimates are presented in Table 3 and Table 4, respectively.
Table 2.
Combinations of BF, MI, and MO for the MAGWGPRS models in Central Jakarta and Surabaya cities.
| Central Jakarta City | | | | | Surabaya City | | | | |
|---|---|---|---|---|---|---|---|---|---|
| BF | MI | MO | GCV | R2 | BF | MI | MO | GCV | R2 |
| 12 | 1 | 0 | 2.040 | 0.425 | 12 | 1 | 0 | 2.514 | 0.191 |
| 12 | 1 | 1 | 2.201 | 0.380 | 12 | 1 | 1 | 2.418 | 0.222 |
| 12 | 1 | 2 | 2.207 | 0.378 | 12 | 1 | 2 | 2.152 | 0.308 |
| 12 | 1 | 3 | 2.073 | 0.416 | 12 | 1 | 3 | 2.214 | 0.288 |
| 12 | 2 | 0 | 1.608 | 0.547 | 12 | 2 | 0 | 2.073 | 0.333 |
| 12 | 2 | 1 | 1.909 | 0.462 | 12 | 2 | 1 | 1.598 | 0.486 |
| 12 | 2 | 2 | 1.941 | 0.453 | 12 | 2 | 2 | 1.861 | 0.401 |
| 12 | 2 | 3 | 1.281 | 0.639 | 12 | 2 | 3 | 2.276 | 0.268 |
| 12 | 3 | 0 | 1.608 | 0.547 | 12 | 3 | 0 | 1.847 | 0.406 |
| 12 | 3 | 1 | 1.909 | 0.462 | 12 | 3 | 1 | 1.328 | 0.573 |
| 12 | 3 | 2 | 1.878 | 0.471 | 12 | 3 | 2 | 1.720 | 0.447 |
| 12 | 3 | 3 | 1.281 | 0.639 | 12 | 3 | 3 | 1.720 | 0.447 |
| 18 | 1 | 0 | 1.612 | 0.546 | 18 | 1 | 0 | 2.432 | 0.218 |
| 18 | 1 | 1 | 1.265 | 0.644 | 18 | 1 | 1 | 1.928 | 0.380 |
| 18 | 1 | 2 | 1.535 | 0.568 | 18 | 1 | 2 | 2.087 | 0.329 |
| 18 | 1 | 3 | 1.622 | 0.543 | 18 | 1 | 3 | 2.104 | 0.323 |
| 18 | 2 | 0 | 1.214 | 0.658 | 18 | 2 | 0 | 1.995 | 0.358 |
| 18 | 2 | 1 | 1.500 | 0.577 | 18 | 2 | 1 | 1.352 | 0.565 |
| 18 | 2 | 2 | 1.521 | 0.572 | 18 | 2 | 2 | 1.705 | 0.452 |
| 18 | 2 | 3 | 1.130 | 0.682 | 18 | 2 | 3 | 2.024 | 0.349 |
| 18 | 3 | 0 | 1.015 | 0.714 | 18 | 3 | 0 | 1.728 | 0.444 |
| 18 | 3 | 1 | 1.500 | 0.577 | 18 | 3 | 1 | 0.842 | 0.729 |
| 18 | 3 | 2 | 1.551 | 0.563 | 18 | 3 | 2 | 1.198 | 0.615 |
| 18 | 3 | 3 | 1.074 | 0.697 | 18 | 3 | 3 | 1.392 | 0.552 |
| 24 | 1 | 0 | 1.506 | 0.576 | 24 | 1 | 0 | 2.318 | 0.254 |
| 24 | 1 | 1 | 1.080 | 0.696 | 24 | 1 | 1 | 1.557 | 0.499 |
| 24 | 1 | 2 | 1.229 | 0.654 | 24 | 1 | 2 | 1.824 | 0.413 |
| 24 | 1 | 3 | 1.361 | 0.617 | 24 | 1 | 3 | 1.970 | 0.366 |
| 24 | 2 | 0 | 0.842 | 0.763 | 24 | 2 | 0 | 1.903 | 0.388 |
| 24 | 2 | 1 | 1.391 | 0.608 | 24 | 2 | 1 | 1.187 | 0.618 |
| 24 | 2 | 2 | 1.348 | 0.620 | 24 | 2 | 2 | 1.577 | 0.493 |
| 24 | 2 | 3 | 1.016 | 0.714 | 24 | 2 | 3 | 1.712 | 0.449 |
| 24 | 3 | 0 | 0.883 | 0.751 | 24 | 3 | 0 | 1.466 | 0.528 |
| 24 | 3 | 1 | 1.218 | 0.657 | 24* | 3* | 1* | 0.751* | 0.759* |
| 24 | 3 | 2 | 1.302 | 0.633 | 24 | 3 | 2 | 0.984 | 0.684 |
| 24* | 3* | 3* | 0.823* | 0.768* | 24 | 3 | 3 | 1.121 | 0.640 |
* The best model.
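The 36 candidate models per location correspond to the grid of BF, MI, and MO values listed in Table 2. The sketch below enumerates that grid in the same row order and picks the minimum-GCV candidate among the BF = 24 rows for Surabaya City, with the GCV values copied from the table; the variable names are hypothetical.

```python
from itertools import product

bf_values, mi_values, mo_values = (12, 18, 24), (1, 2, 3), (0, 1, 2, 3)
grid = list(product(bf_values, mi_values, mo_values))   # same row order as Table 2

# GCV values copied from the BF = 24 rows of Table 2 for Surabaya City.
gcv_surabaya = {
    (24, 1, 0): 2.318, (24, 1, 1): 1.557, (24, 1, 2): 1.824, (24, 1, 3): 1.970,
    (24, 2, 0): 1.903, (24, 2, 1): 1.187, (24, 2, 2): 1.577, (24, 2, 3): 1.712,
    (24, 3, 0): 1.466, (24, 3, 1): 0.751, (24, 3, 2): 0.984, (24, 3, 3): 1.121,
}
best = min(gcv_surabaya, key=gcv_surabaya.get)
best_position = grid.index(best) + 1    # 1-based position among the 36 candidates
```

The selected combination is `(24, 3, 1)` at position 34 of the grid, and `(24, 3, 3)` sits at position 36, matching the starred rows of Table 2.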
Table 3.
Parameter estimation for the best model of Central Jakarta City.
| Coefficient | Estimation | Std. Error | T value | Pr(>|t|) |
|---|---|---|---|---|
| Intercept | ⁎⁎ | |||
| bx_magwgprs[,−1] | ⁎⁎ | |||
| bx_magwgprs[,−1] | ⁎⁎ | |||
| bx_magwgprs[,−1] | ⁎⁎ | |||
| bx_magwgprs[,−1] | ⁎⁎ | |||
| bx_magwgprs[,−1] | ⁎⁎ | |||
| bx_magwgprs[,−1] | ⁎⁎ | |||
| bx_magwgprs[,−1] | ⁎⁎ | |||
| bx_magwgprs[,−1] | ⁎⁎ | |||
| bx_magwgprs[,−1] | ⁎⁎ | |||
| bx_magwgprs[,−1] | ⁎⁎ | |||
| bx_magwgprs[,−1] | ⁎⁎ | |||
| bx_magwgprs[,−1] | ⁎⁎ | |||
| bx_magwgprs[,−1] | ⁎⁎ | |||
| bx_magwgprs[,−1] | ⁎⁎ | |||
| bx_magwgprs[,−1] | ⁎⁎ | |||
| bx_magwgprs[,−1] | ⁎⁎ | |||
| bx_magwgprs[,−1] | ⁎⁎ | |||
| bx_magwgprs[,−1] | ⁎⁎ | |||
| bx_magwgprs[,−1] | ⁎⁎ |
⁎⁎ Significant at 0.05.
Table 4.
Parameter estimation for the best model of Surabaya City.
| Coefficient | Estimation | Std. Error | T value | Pr(>|t|) |
|---|---|---|---|---|
| Intercept | ⁎⁎ | |||
| bx_magwgprs[,−1] | ⁎⁎ | |||
| bx_magwgprs[,−1] | ⁎⁎ | |||
| bx_magwgprs[,−1] | ⁎⁎ | |||
| bx_magwgprs[,−1] | ⁎⁎ | |||
| bx_magwgprs[,−1] | ⁎⁎ | |||
| bx_magwgprs[,−1] | ⁎⁎ | |||
| bx_magwgprs[,−1] | ⁎⁎ | |||
| bx_magwgprs[,−1] | ⁎⁎ | |||
| bx_magwgprs[,−1] | ⁎⁎ | |||
| bx_magwgprs[,−1] | ⁎⁎ | |||
| bx_magwgprs[,−1] | ⁎⁎ | |||
| bx_magwgprs[,−1] | ⁎⁎ | |||
| bx_magwgprs[,−1] | ⁎⁎ | |||
| bx_magwgprs[,−1] | ⁎⁎ | |||
| bx_magwgprs[,−1] | ⁎⁎ | |||
| bx_magwgprs[,−1] | ⁎⁎ | |||
| bx_magwgprs[,−1] | ⁎⁎ | |||
| bx_magwgprs[,−1] | ⁎⁎ | |||
| bx_magwgprs[,−1] | ⁎⁎ |
⁎⁎ Significant at 0.05.
Based on Table 2, the best model for Central Jakarta City is the 36th model, with BF, MI, MO, GCV, and R2 equal to 24, 3, 3, 0.823, and 0.768, respectively. Table 3 presents the parameter estimates of the best model for Central Jakarta, in which the inter-regional disaggregating variables are x1, x2, x3, x4, x5, and x6. For Surabaya City, the best model is the 34th model, with BF, MI, MO, GCV, and R2 equal to 24, 3, 1, 0.751, and 0.759, respectively. From Table 4, the inter-regional disaggregating variables that have an effect are x1, x2, x3, x5, and x6, while x4 has no effect. Thus, all six predictor variables affect the number of DHF cases in Central Jakarta, whereas in Surabaya the percentage of poor population (x4) does not. The best MAGWGPRS models for Central Jakarta and Surabaya, respectively, are as follows.
| (20) |
where
and
| (21) |
where
| (22) |
Furthermore, the performance of the best MAGWGPRS models for Central Jakarta City, Surabaya City, and the 117 other districts/cities was compared with that of the MAGPRS model. The actual number of DHF cases was compared with the fitted values of the best MAGWGPRS and MAGPRS models, and the MSE of each model was used as a second measure of goodness of fit. Fig. 3 plots the actual number of DHF cases, the fitted values of the MAGWGPRS model, and the fitted values of the MAGPRS model. In general, the MAGWGPRS fitted values are closer to the actual values than the MAGPRS fitted values, and the MAGWGPRS curve follows the pattern of the actual curve more closely than the MAGPRS curve does. The MSE of the MAGWGPRS model is also smaller than that of the MAGPRS model, as shown in Table 5.
Fig. 3.
Comparison of the actual value and the fitted value of the response variable in the MAGPRS and MAGWGPRS models.
Table 5.
Comparison of MSE values for the MAGPRS and MAGWGPRS.
| Model | MSE |
|---|---|
| MAGPRS | 190,817 |
| MAGWGPRS | 62,157 |
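To put the two MSE values in proportion, the short sketch below defines the MSE and computes the relative reduction achieved by MAGWGPRS, reading the Table 5 entries as 190817 and 62157. The function name and the small check arrays are hypothetical.

```python
import numpy as np

def mse(actual, fitted):
    """Mean of squared differences between actual and fitted values."""
    actual = np.asarray(actual, dtype=float)
    fitted = np.asarray(fitted, dtype=float)
    return float(np.mean((actual - fitted) ** 2))

mse_magprs, mse_magwgprs = 190817.0, 62157.0        # values reported in Table 5
relative_reduction = 1.0 - mse_magwgprs / mse_magprs
toy_mse = mse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])     # hypothetical check values
```

Under this reading, the MSE of MAGWGPRS is roughly one third of the MSE of MAGPRS, a reduction of about 67%.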
Conclusion
The MAGWGPRS model, a modification of the MAGPRS model and GWGPR spatial regression, was proposed in this study. We demonstrated a step-by-step procedure for obtaining the estimated parameters of the MAGWGPRS model. MAGWGPRS has the benefit of providing regression coefficients for each location, but it is a complex model that requires more involved programming. As a result, determining the best model form is more complicated than for the MAGPRS model.
Furthermore, the MAGWGPRS model was applied to dengue case data in 119 districts or cities on the Indonesian island of Java. The best model was obtained for each location based on the optimal bandwidth, spatial weighting with an adaptive Gaussian kernel, and the combination of BF, MI, and MO, with Central Jakarta and Surabaya cities given as examples. MAGWGPRS yields 119 best models, one per location, whereas MAGPRS yields a single best model for all locations. Based on the graphical comparison of actual values with the MAGPRS and MAGWGPRS fitted values and on the MSE values, the MAGWGPRS model is better than the MAGPRS model in this case.
A limitation of this article is that hypothesis testing for the proposed model is not addressed; future research can develop such tests. Furthermore, for various case studies, this model can be extended with other distributions or kernel functions.
Ethics statements
The data used in this research are secondary data derived from the official website of BPS Provinces in Java, Indonesia.
CRediT author statement
Riry Sriningsih: Conceptualization, methodology, and writing-preparation of the first draft. Bambang Widjanarko Otok: Conceptualization, methodology, writing-reviewing and editing, and supervision. Sutikno: Conceptualization, methodology, writing-reviewing and editing, and supervision.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.
Contributor Information
Riry Sriningsih, Email: rirysriningsih@fmipa.unp.ac.id.
Bambang Widjanarko Otok, Email: bambang_wo@statistika.its.ac.id.
Sutikno, Email: sutikno@statistika.its.ac.id.
Appendix A
Table A1 shows the codes and names of districts/cities used and analyzed in the study.
Table A1.
Codes and names of districts/cities on the island of Java, Indonesia.
| Code and Name | Code and Name | Code and Name | Code and Name |
|---|---|---|---|
| 1. The Thousand Islands | 31. Cimahi City | 61. Tegal | 91. Nganjuk |
| 2. South Jakarta City | 32. Tasikmalaya City | 62. Brebes | 92. Madiun |
| 3. East Jakarta City | 33. Banjar City | 63. Magelang City | 93. Magetan |
| 4. Central Jakarta City | 34. Cilacap | 64. Surakarta City | 94. Ngawi |
| 5. West Jakarta City | 35. Banyumas | 65. Salatiga City | 95. Bojonegoro |
| 6. North Jakarta City | 36. Purbalingga | 66. Semarang City | 96. Tuban |
| 7. Bogor | 37. Banjarnegara | 67. Pekalongan City | 97. Lamongan |
| 8. Sukabumi | 38. Kebumen | 68. Tegal City | 98. Gresik |
| 9. Cianjur | 39. Purworejo | 69. Kulonprogo | 99. Bangkalan |
| 10. Bandung | 40. Wonosobo | 70. Bantul | 100. Sampang |
| 11. Garut | 41. Magelang | 71. Gunung Kidul | 101. Pamekasan |
| 12. Tasikmalaya | 42. Boyolali | 72. Sleman | 102. Sumenep |
| 13. Ciamis | 43. Klaten | 73. Yogyakarta City | 103. Kediri City |
| 14. Kuningan | 44. Sukoharjo | 74. Pacitan | 104. Blitar City |
| 15. Cirebon | 45. Wonogiri | 75. Ponorogo | 105. Malang City |
| 16. Majalengka | 46. Karanganyar | 76. Trenggalek | 106. Probolinggo City |
| 17. Sumedang | 47. Sragen | 77. Tulungagung | 107. Pasuruan City |
| 18. Indramayu | 48. Grobogan | 78. Blitar | 108. Mojokerto City |
| 19. Subang | 49. Blora | 79. Kediri | 109. Madiun City |
| 20. Purwakarta | 50. Rembang | 80. Malang | 110. Surabaya City |
| 21. Karawang | 51. Pati | 81. Lumajang | 111. Batu City |
| 22. Bekasi | 52. Kudus | 82. Jember | 112. Pandeglang |
| 23. West Bandung | 53. Jepara | 83. Banyuwangi | 113. Lebak |
| 24. Pangandaran | 54. Demak | 84. Bondowoso | 114. Tangerang |
| 25. Bogor City | 55. Semarang | 85. Situbondo | 115. Serang |
| 26. Sukabumi City | 56. Temanggung | 86. Probolinggo | 116. Tangerang City |
| 27. Bandung City | 57. Kendal | 87. Pasuruan | 117. Cilegon City |
| 28. Cirebon City | 58. Batang | 88. Sidoarjo | 118. Serang City |
| 29. Bekasi City | 59. Pekalongan | 89. Mojokerto | 119. South Tangerang City |
| 30. Depok City | 60. Pemalang | 90. Jombang | |
Data availability
The authors do not have permission to share data.
References
- 1.Friedman J.H. Invited paper: multivariate adaptive regression splines. Ann. Stat. 1991;19:1–141. [Google Scholar]
- 2.Cai M., Koopialipoor M., Armaghani D.J., Pham B.T. Evaluating slope deformation of earth dams due to earthquake shaking using MARS and GMDH techniques. Appl. Sci. 2020;10:1–23. doi: 10.3390/app10041486. [DOI] [Google Scholar]
- 3.García L.A.M., Lasheras F.S., Nieto P.J.G., de Prado L.Á., Sánchez A.B. Predicting benzene concentration using machine learning and time series algorithms. Mathematics. 2020;8:1–22. doi: 10.3390/math8122205. [DOI] [Google Scholar]
- 4.Park S., Hamm S.Y., Jeon H.T., Kim J. Evaluation of logistic regression and multivariate adaptive regression spline models for groundwater potential mapping using R and GIS. Sustain. 2017;9:1–20. doi: 10.3390/su9071157. [DOI] [Google Scholar]
- 5.Zhang W., Goh A.T.C., Zhang Y. Multivariate adaptive regression splines application for multivariate geotechnical problems with big data. Geotech. Geol. Eng. 2016;34:193–204. doi: 10.1007/s10706-015-9938-9. [DOI] [Google Scholar]
- 6.Ampulembang A.P., Otok B.W., Rumiati A.T. Budiasih, Bi-responses nonparametric regression model using MARS and its properties. Appl. Math. Sci. 2015;9:1417–1427. doi: 10.12988/ams.2015.5127. [DOI] [Google Scholar]
- 7.B.W. Otok, Pemilihan Model Terbaik pada MARS Respon Kontinu, 8 (2008) 19–29.
- 8.B.W. Otok, M.S. Akbar, Raupong, Estimasi Spline dan MARS Menggunakan Kuadrat Terkecil, 4 (2007) 1–11.
- 9.Sakamoto W. MARS: selecting basis functions and knots with an empirical Bayes method. Comput. Stat. 2007;22:583–597. doi: 10.1007/s00180-007-0075-7.
- 10.Zurimi S. Perbandingan metode generalized least square dan ordinary least square pada model multivariate adapative regression spline dengan respon biner. Pros. SEMNAS Mat. Pendidik. Mat. IAIN Ambon. 2018:21–28.
- 11.Adityaningrum A., Otok B.W., Fitriasari K. Institut Teknologi Sepuluh Nopember; 2017. Estimasi Propensity Score Matching Berdasarkan Pendekatan Multivariate Adaptive Regression Splines.
- 12.B.W. Otok, Konsistensi dan Asimtotik Normalitas Model Multivariate Adaptive Regression Spline (MARS) Respon Biner [Consistency and asymptotic normality of maximum likelihood estimator in MARS binary response model], 10 (2009) 133–140.
- 13.J.H. Friedman, Estimating functions of mixed ordinal and categorical variables using adaptive splines, in: Stanford, California, 1991: pp. 1–51.
- 14.Yasmirullah S.D.P., Otok B.W., Purnomo J.D.T., Prastyo D.D. Modification of multivariate adaptive regression spline (MARS). J. Phys. Conf. Ser. 2021;1863:1–11. doi: 10.1088/1742-6596/1863/1/012078.
- 15.Hidayati S., Otok B.W., Purhadi. Parameter estimation and statistical test in multivariate adaptive generalized Poisson regression splines. IOP Conf. Ser. Mater. Sci. Eng. 2019;546:1–11. doi: 10.1088/1757-899X/546/5/052051.
- 16.Otok B.W., Hidayati S., Purhadi. Multivariate adaptive generalized Poisson regression spline (MAGPRS) on the number of acute respiratory infection infants. J. Phys. Conf. Ser. 2019;1397. doi: 10.1088/1742-6596/1397/1/012062.
- 17.S.M. Collins, An Application of Geographically Weighted Poisson Regression, in: Canada, 2010: pp. 1–99.
- 18.Nakaya T., Fotheringham A.S., Brunsdon C., Charlton M. Geographically weighted Poisson regression for disease association mapping. Stat. Med. 2005;24:2695–2717. doi: 10.1002/sim.2129.
- 19.Adryanta M., Purhadi P. Analisis Metode Geographically Weighted Generalized Poisson Regression untuk Pemodelan Faktor yang Mempengaruhi Jumlah Kematian Anak di Provinsi Jawa Timur. J. Sains Dan Seni ITS. 2019;8:D252–D259. doi: 10.12962/j23373520.v8i2.43562.
- 20.Sabtika W., Prahutama A., Yasin H. Pemodelan Geographically Weighted Generalized Poisson Regression (GWGPR) pada Kasus Kematian Ibu Nifas di Jawa Tengah. J. Gaussian. 2021;10:259–268. doi: 10.14710/j.gauss.v10i2.30946.
- 21.Tyas S.W., Puspitasari L.A. Geographically weighted generalized Poisson regression model with the best kernel function in the case of the number of postpartum maternal mortality in East Java. MethodsX. 2023;10:102002. doi: 10.1016/j.mex.2023.102002.
- 22.Put R., Xu Q.S., Massart D.L., Vander Heyden Y. Multivariate adaptive regression splines (MARS) in chromatographic quantitative structure-retention relationship studies. J. Chromatogr. A. 2004;1055:11–19. doi: 10.1016/j.chroma.2004.07.112.
- 23.Mukhsar, Agusrawati, Indiyanti, Deteksi Overdispersi Data Spasial Kasus DBD Kota Kendari, in: Semin. Nas. Ris. Kuantitatif Terap., 2017: pp. 189–193.
- 24.Obenauer J. The increasing risk of vector-borne diseases: mapping the effects of climate change and human population density on future Aedes aegypti habitats. ProQuest Diss. Theses. 2017;123
- 25.Ginting E. Universitas Sumatera Utara; 2018. Analisis Faktor yang Mempengaruhi Penyakit Demam Berdarah Dengue dengan Menggunakan Regresi Poisson dan Regresi Binomial Negatif.
- 26.Lestanto F. Analisis spasial faktor - faktor yang berhubungan dengan kejadian demam berdarah dengue di puskesmas wilayah kerja di Bantul. J. Ilm. Rekam Medis Dan Inform. Kesehat. 2018;8:66–78.
- 27.Taryono A.P.N., Ispriyanti D., Prahutama A. Analisis Faktor-Faktor yang Mempengaruhi Penyebaran Penyakit Demam Berdarah Dengue (DBD) di Provinsi Jawa Tengah dengan Metode Spatial Autoregressive Model dan Spatial Durbin Model. Indones. J. Appl. Stat. I. 2018:1–13.
- 28.E.I. Zulheri, Y. Asdi, H. Yozza, Model Regresi Spasial Lag pada Kasus Penyakit Demam Berdarah Dengue (DBD) di Sumatra Utara Tahun 2016, VIII (2019) 59–66.
- 29.Friedman J.H., Roosen C.B. An introduction to multivariate adaptive regression splines. Stat. Methods Med. Res. 1995;4:197–217. doi: 10.1177/096228029500400303.
- 30.Craven P., Wahba G. Smoothing noisy data with spline functions. Numer. Math. 1979:377–403.
- 31.Friedman J.H., Silverman B.W. Flexible parsimonious smoothing and additive modeling. Technometrics. 1989;31:3–21. doi: 10.1080/00401706.1989.10488470.
- 32.Wang W., Famoye F. Modeling household fertility decisions with generalized Poisson regression. J. Popul. Econ. 1997;10:273–283. doi: 10.1007/s001480050043.
- 33.Famoye F., Wang W. Censored generalized Poisson regression model. Comput. Stat. Data Anal. 2004;46:547–560. doi: 10.1016/j.csda.2003.08.007.
- 34.Famoye F. Restricted generalized Poisson regression model. Commun. Stat. - Theory Methods. 1993;22:1335–1354. doi: 10.1080/03610929308831089.
- 35.Y. Pawitan, In all likelihood: statistical modelling and inferences using likelihood, 2001.
- 36.Fotheringham A.S., Brunsdon C., Charlton M. John Wiley & Sons Ltd; 2002. Geographically Weighted Regression: The Analysis of Spatially Varying Relationships.
- 37.Akbarov A., Wu S. Warranty claim forecasting based on weighted maximum likelihood estimation. Qual. Reliab. Eng. Int. 2012;28:663–669. doi: 10.1002/qre.1399.
- 38.BPS Provinsi Banten dalam Angka 2021, BPS Provinsi Banten, Indonesia, 2021.
- 39.BPS Provinsi Daerah Istimewa Yogyakarta dalam Angka 2021, BPS Provinsi D.I Yogyakarta, Indonesia, 2021.
- 40.BPS Provinsi DKI Jakarta dalam Angka 2021, BPS Provinsi DKI Jakarta, Indonesia, 2021.
- 41.BPS Provinsi Jawa Barat dalam Angka 2021, BPS Jawa Barat, Indonesia, 2021.
- 42.BPS Provinsi Jawa Timur dalam Angka 2021, BPS Provinsi Jawa Timur, Indonesia, 2021.
- 43.BPS Provinsi Jawa Tengah dalam Angka 2021, BPS Jawa Tengah, Indonesia, 2021.
- 44.Famoye F., Wulu J.T., Singh K.P. On the generalized Poisson regression model with an application to accident data. J. Data Sci. 2004;2:287–295. doi: 10.6339/JDS.2004.02(3).167.