Abstract
Spatial dependence plays a critical role in modeling multivariate response variables, particularly in fields such as epidemiology and environmental studies. However, existing spatial regression models, such as the Spatial Autoregressive (SAR) model, are designed for univariate responses and are insufficient when multiple response variables are influenced by spatial location. To address this gap, we introduce a Multivariate Spatial Autoregressive (MSAR) model. While previous research has focused primarily on parameter estimation for the proposed model, limited attention has been given to the statistical significance of these parameters. Moreover, existing estimation methods often rely on pseudo-distributions, which may not accurately reflect the underlying data characteristics. This study employs Maximum Likelihood Estimation (MLE), optimized using a concentrated log-likelihood approach, under the assumption of normally distributed data. To assess parameter significance, we apply both the Maximum Likelihood Ratio Test (MLRT) for joint hypotheses and the Wald Test for individual parameters. The findings confirm that the proposed model yields unbiased and consistent parameter estimates. Furthermore, the significance tests reveal key predictor variables associated with pneumonia and diarrhea cases among toddlers. The proposed model achieves a Root Mean Square Error of 5 and an R-squared value of 60 %, demonstrating its effectiveness in capturing spatial dependence in multivariate settings. The main contributions of this study include:
• Development of an MSAR model estimated using MLE to capture spatial dependencies among multiple response variables.
• Implementation of formal hypothesis testing procedures for model parameters using the Likelihood Ratio and Wald tests.
• Application of the proposed model to spatial health data at the village level in Tuban District, East Java, Indonesia, focusing on health problems among children under five.
Keywords: Multivariate spatial autoregressive, Maximum likelihood estimation, Maximum likelihood ratio test, Wald test
Method name: Multivariate Spatial Autoregressive Model
Specifications table
Subject area: | Mathematics and Statistics |
More specific subject area: | Spatial Statistics, Spatial dependency, Multivariate Spatial Linear Models |
Name of your method: | Multivariate Spatial Autoregressive Model |
Name and reference of original method: | Original method: Multivariate Linear Regression. References:
|
Resource availability: | None |
Background
Spatial analysis has become a fundamental methodology in scientific research, particularly when the data exhibit a strong geographical component [1,2]. One widely used technique for addressing spatial dependence is the SAR model [3–5], which incorporates spatially lagged dependent variables to account for spatial autocorrelation. SAR models have been extensively studied with regard to parameter estimation and hypothesis testing. Among the available techniques, MLE is the most commonly used and yields consistent estimates [6–8]. However, a key challenge in estimating SAR model parameters is that the spatial effect parameters do not have closed-form solutions, so numerical iteration is required. In addition to estimation, hypothesis testing is typically conducted using the Likelihood Ratio Test (LRT) for joint significance and the Wald Test for individual parameters [9–11].
The SAR model has evolved into the MSAR model in econometric applications, with estimation methods including Quasi-Maximum Likelihood Estimation (QMLE) [12,13], Two-Stage Least Squares (2SLS) [14,15], and Three-Stage Least Squares (3SLS) [16,17]. However, when identification conditions are not met in complex spatial models, parameter estimates may become invalid, affecting the reliability of QMLE. Additionally, QMLE is generally less efficient than fully specified MLE when the model is correctly specified [18]. More recent work has incorporated MSAR within simultaneous equation models, using the FGLS-3SLS estimation approach and numerical approximation via the average concentrated log-likelihood [16]. While that approach allows spatial effects to be estimated, it is still limited to univariate optimization. In this study, we extend that approach by applying multivariate optimization to the concentrated log-likelihood using the L-BFGS-B algorithm [19–21]. MSAR models have also been extended to network data. For example, Zhu and Huang (2020) compared QMLE and Least Squares Estimation (LSE) [22,23], but LSE remains less effective for parameter estimation in complex models, which highlights the need for MLE-based methods that provide accurate and consistent estimates. In addition, most existing studies have focused on parameter estimation without addressing hypothesis testing, which is essential for improving the accuracy and reliability of model predictions and for evaluating regression parameters that can vary geographically.
This study proposes an area-based MSAR model for multivariate responses, designed to capture the spatial interaction among two or more correlated response variables while accounting for spatial dependence across regions. Parameter estimation is carried out using MLE for the regression coefficients and the covariance matrix, with spatial parameters estimated via the concentrated log-likelihood function optimized using the L-BFGS-B algorithm [24]. In addition to estimation, hypothesis testing is performed using both the LRT and the Wald Test to evaluate spatially varying regression parameters.
Method details
The MSAR model is an extension of the SAR model that incorporates spatial dependencies into the analysis. In this model, the global structure is modeled using multivariate normal linear regression. Therefore, before discussing the MSAR model in detail, this section introduces the foundational concept of multivariate linear regression.
Multivariate normal linear regression
A multivariate linear regression model describes the relationship among multiple response variables and their corresponding predictors. This model is used to determine the relationship between the response variables and the predictor variables . Given a sample of n observations and suppose , the multivariate normal linear regression model for the i-th observation, , is represented in Eq. (1).
(1) |
The multivariate linear regression model can be expressed in matrix form as illustrated in Eq. (2).
(2) |
With
where ; and . Furthermore, the multivariate linear regression model can be expressed in the form of a Vec operator and Kronecker product, as demonstrated in Eq. (3).
(3) |
In Eq. (3), an assumption was made that withand . Based on this assumption, has a distribution of . The probability density function of is therefore given by the following expression.
Further parameter estimation can be carried out using the MLE method, resulting in the following parameter estimators [25].
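As a minimal sketch of this estimation step, the snippet below computes the usual maximum likelihood estimators for a multivariate normal linear regression: the coefficient matrix coincides with the least-squares solution, and the covariance estimator divides the residual cross-product by n. The function name, the variable names, and the simulated data are assumptions made for illustration and do not come from the article.

```python
import numpy as np

def fit_multivariate_regression(X, Y):
    """ML estimates for Y = X B + E, with rows of E assumed i.i.d. N(0, Sigma)."""
    n = Y.shape[0]
    # Least-squares solution; under normal errors this is also the ML estimator of B.
    B_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ B_hat
    # The ML covariance estimator divides by n (not by n minus the number of coefficients).
    Sigma_hat = resid.T @ resid / n
    return B_hat, Sigma_hat

# Illustrative data (hypothetical): two responses, an intercept and two predictors.
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
B_true = np.array([[1.0, -2.0], [0.5, 0.0], [0.0, 1.5]])
Y = X @ B_true + rng.normal(scale=0.1, size=(n, 2))

B_hat, Sigma_hat = fit_multivariate_regression(X, Y)
print(np.round(B_hat, 2))
```

The same two estimators reappear in the spatial model, where they are evaluated on spatially filtered responses for a given value of ρ.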
Multivariate spatial autoregressive
The MSAR model is a further development of the SAR model. Consequently, the analogy of the MSAR model can be traced back to the univariate SAR model. The MSAR model is used to determine the relationship between the response variables and the predictor variables by considering the spatial effect of the lag of the response variables symbolized by ρ and the spatial weight symbolized by W. The MSAR model is mathematically illustrated by Eq. (4):
(4) |
Eq. (4) can be decomposed into the following equation:
The MSAR model in matrix form can be written as Eq. (5), where ρ is a diagonal matrix whose elements represent the spatial effects on each of the response variables.
(5) |
If the MSAR model is written in the form of a Vec operator and using a Kronecker product, the resulting equation is given by Eq. (6).
(6) |
The MSAR model assumes that the error term follows a bivariate normal distribution with mean and covariance matrix [16]. The expectation and variance of are shown in the following equation:
Once the expectation and variance of are established, the distribution of is given by:
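For reference, the matrix form, the vectorized form, and the implied distribution of the response can be sketched as follows. The notation here is an assumption rather than the article's own: Y is the n × q response matrix, X the n × (p + 1) design matrix, B the (p + 1) × q coefficient matrix, W the n × n spatial weight matrix, ρ = diag(ρ₁, …, ρ_q), and the rows of E are taken to be independent N_q(0, Σ).

```latex
\begin{aligned}
\mathbf{Y} &= \mathbf{W}\mathbf{Y}\boldsymbol{\rho} + \mathbf{X}\mathbf{B} + \mathbf{E},\\
\operatorname{vec}(\mathbf{Y}) &= (\boldsymbol{\rho}\otimes\mathbf{W})\operatorname{vec}(\mathbf{Y})
  + (\mathbf{I}_q\otimes\mathbf{X})\operatorname{vec}(\mathbf{B}) + \operatorname{vec}(\mathbf{E}),
  \qquad \operatorname{vec}(\mathbf{E})\sim N_{nq}\!\left(\mathbf{0},\ \boldsymbol{\Sigma}\otimes\mathbf{I}_n\right),\\
\operatorname{vec}(\mathbf{Y}) &\sim N_{nq}\!\left(\mathbf{S}^{-1}(\mathbf{I}_q\otimes\mathbf{X})\operatorname{vec}(\mathbf{B}),\
  \mathbf{S}^{-1}(\boldsymbol{\Sigma}\otimes\mathbf{I}_n)\mathbf{S}^{-\top}\right),
  \qquad \mathbf{S} = \mathbf{I}_{nq} - \boldsymbol{\rho}\otimes\mathbf{W}.
\end{aligned}
```

Because ρ is diagonal, vec(WYρ) = (ρ ⊗ W) vec(Y), and S is block diagonal with blocks I_n − ρ_j W, so its determinant is the product of the determinants of those blocks.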
Parameter estimation of MSAR model
The MSAR parameters are estimated by MLE combined with numerical maximization of the concentrated log-likelihood. The MLE method is applied to estimate the regression coefficients and the variance-covariance matrix, while the spatial effects are estimated by maximizing the concentrated log-likelihood function using the L-BFGS-B optimization method. The first step is to determine the likelihood function of the model, which is presented in Eq. (7).
(7) |
Furthermore, the likelihood function is formulated in the form of a natural logarithm likelihood, as illustrated in Eq. (8).
(8) |
The subsequent stage in parameter estimation is to differentiate Eq. (8) with respect to the regression coefficients and set the derivative equal to zero, thereby obtaining an estimator for these coefficients.
Setting the first derivative of the ln-likelihood to zero and simplifying yields the estimator of the regression coefficients, which is shown in Eq. (9).
(9) |
Once the estimator of the regression coefficients has been obtained, the estimator of the covariance matrix can be calculated. The steps involved are identical to those used for the regression coefficients. The initial step is to substitute the estimator from Eq. (9) into the ln-likelihood function in Eq. (8), which gives Eq. (10).
(10) |
If we assume that , then Eq. (10) can be rewritten as Eq. (11).
(11) |
In Eq. (11), the final element in the quadratic form is a real number (a scalar), which can be regarded as a 1 × 1 matrix [24]. Consequently, the trace of this element is the element itself. Hence, Eq. (11) can be rewritten as Eq. (12) and simplified using the cyclic property of the trace.
(12) |
Subsequently, the ln-likelihood function in Eq. (12) is derived from , which is illustrated in the following equation where is a symmetrical matrix comprising element 1 in positions and , and element 0 in all other row and column positions.
The partial derivative equal to zero is then solved by equalizing the form of the left equation with that of the right equation, thereby obtaining the estimator for the sigma parameter .
(13) |
If is the following equation.
Given that is a matrix, the subsequent step is to transform it into a matrix, which is represented by a symbolized matrix, and then to correlate it with an identity matrix through the use of the Kronecker product.
(14) |
The equation below is obtained from substituting Eq. (14) into Eq. (13)
Based on the previous evidence, theparameter estimator can be approximated by Eq. (14). The matrix can be formed from block elements of . Suppose has the following block structure where each is a matrix.
The matrix can be taken from the main diagonal elements of the block. Thus, the parameter estimator is given by Eq. (15).
(15) |
Eqs. (9) and (15) are not in closed form, so a numerical approach is required to solve them. The numerical approach used to estimate ρ is the concentrated log-likelihood with the L-BFGS-B optimization method. The concentrated log-likelihood function for ρ is obtained by substituting the estimators of the regression coefficients and the covariance matrix into the log-likelihood, as shown in Eq. (16).
(16) |
where with
Eq. (16) represents the concentrated log-likelihood function. This equation cannot be maximized analytically, so a numerical approach based on the L-BFGS-B optimization method is needed. The following steps outline the numerical procedure for maximizing the concentrated log-likelihood, thereby obtaining the estimate of ρ:
a. Generated a sequence of values for where seq(start value, end value, increasing) and substituted each into the rho matrix where
b. Performed bivariate regression of with and obtained
c. Regressed WY with A and obtained which is a matrix.
d. Substituted and into concentrated log-likelihood function.
e. Identified the value of that gave the maximum and then became .
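The procedure above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: it assumes the model Y = WYρ + XB + E with diagonal ρ, a row-standardized W, and bounds of ±0.99 on each ρ. Instead of scanning a sequence of candidate values, it passes the negative concentrated log-likelihood directly to SciPy's L-BFGS-B routine. All function names and the ring-shaped weight matrix used for the simulated data are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize

def neg_concentrated_loglik(rho, Y, X, W):
    """Negative concentrated log-likelihood of Y = W Y diag(rho) + X B + E."""
    n, q = Y.shape
    # Spatially filtered responses: column j is (I - rho_j W) y_j.
    Y_tilde = Y - (W @ Y) * rho
    # For fixed rho, B and Sigma have closed-form ML estimates.
    B_hat, *_ = np.linalg.lstsq(X, Y_tilde, rcond=None)
    resid = Y_tilde - X @ B_hat
    Sigma_hat = resid.T @ resid / n
    _, logdet_sigma = np.linalg.slogdet(Sigma_hat)
    # Log-Jacobian: sum over responses of log|I - rho_j W|.
    eye = np.eye(n)
    log_jacobian = sum(np.linalg.slogdet(eye - r * W)[1] for r in rho)
    loglik = (-0.5 * n * q * (np.log(2.0 * np.pi) + 1.0)
              - 0.5 * n * logdet_sigma + log_jacobian)
    return -loglik

def fit_msar(Y, X, W, bound=0.99):
    """Estimate rho by L-BFGS-B, then recover B and Sigma at the optimum."""
    n, q = Y.shape
    res = minimize(neg_concentrated_loglik, x0=np.zeros(q), args=(Y, X, W),
                   method="L-BFGS-B", bounds=[(-bound, bound)] * q)
    rho_hat = res.x
    Y_tilde = Y - (W @ Y) * rho_hat
    B_hat, *_ = np.linalg.lstsq(X, Y_tilde, rcond=None)
    resid = Y_tilde - X @ B_hat
    return rho_hat, B_hat, resid.T @ resid / n, -res.fun

def simulate_msar(n, rho, B, Sigma, rng):
    """Simulate data on a ring where each unit has two neighbours (toy layout)."""
    idx = np.arange(n)
    W = np.zeros((n, n))
    W[idx, (idx + 1) % n] = 0.5
    W[idx, (idx - 1) % n] = 0.5
    X = np.column_stack([np.ones(n), rng.normal(size=(n, B.shape[0] - 1))])
    E = rng.multivariate_normal(np.zeros(len(rho)), Sigma, size=n)
    M = X @ B + E
    Y = np.column_stack([np.linalg.solve(np.eye(n) - r * W, M[:, j])
                         for j, r in enumerate(rho)])
    return Y, X, W

B = np.array([[1.0, 2.0], [1.5, -1.0]])
Sigma = np.array([[1.0, 0.5], [0.5, 1.0]])
Y, X, W = simulate_msar(200, [0.4, 0.3], B, Sigma, np.random.default_rng(42))
rho_hat, B_hat, Sigma_hat, loglik = fit_msar(Y, X, W)
print(np.round(rho_hat, 2))
```

Optimizing all elements of ρ jointly is what distinguishes this from a one-dimensional search. The bounds keep every I − ρ_j W nonsingular when W is row-standardized with real eigenvalues, as in the ring used here.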
Properties of estimator
The regression coefficients in the MSAR model are estimated using Eq. (9). The resulting estimator is shown to be both unbiased and consistent. An estimator is considered unbiased if its expected value equals the true parameter, and consistent if it converges to the true parameter as the sample size increases. The proof is presented as follows.
Since the expectation of equals , it follows that is an unbiased estimator of .
Next, consistency is shown below.
It can be concluded that is an unbiased and consistent estimator.
Hypothesis testing of MSAR model
Hypothesis testing of the MSAR model parameters is conducted both simultaneously and partially. The MLRT is applied for simultaneous testing, while the Wald test is used for partial parameter testing [26,27]. The hypothesis for simultaneous testing of the model parameters is formulated as follows:
The set of parameters under population, denoted by , is given by ,while the set of parameters under H0, denoted by , is given by . The parameter estimators for the two sets, and , are obtained from parameter estimation using the MLE method described in the previous section. The LRT is calculated in consideration of the formula presented in Eq. (17).
(17) |
Where is the likelihood value of MSAR model using the estimated parameters under H0 and is the likelihood value of MSAR model using the estimated parameters under population. Consequently, the test statistics for testing the parameters simultaneously using the MLRT is presented in Eq. (18).
(18) |
The critical regions for hypothesis testing are as follows:
is distributed according to the chi-square distribution for , whereby the H0 rejection region is or with degree of freedom (df), which is the number of parameters under the population minus the number of parameters under H0.
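A minimal sketch of this decision rule is given below, assuming the usual form of the statistic, minus two times the difference between the maximized log-likelihoods under H0 and under the full model. The function name and the example log-likelihood values are hypothetical.

```python
from scipy.stats import chi2

def likelihood_ratio_test(loglik_null, loglik_full, df, alpha=0.05):
    """Return the LRT statistic, chi-square critical value, p-value and decision."""
    stat = -2.0 * (loglik_null - loglik_full)
    critical = chi2.ppf(1.0 - alpha, df)   # upper-alpha quantile of chi-square(df)
    p_value = chi2.sf(stat, df)            # P(chi-square(df) > stat)
    return stat, critical, p_value, bool(stat > critical)

# Hypothetical log-likelihoods; df = parameters in full model minus parameters under H0.
stat, critical, p_value, reject = likelihood_ratio_test(-100.0, -90.0, df=3)
print(stat, round(critical, 3), reject)
```

Rejecting when the statistic exceeds the critical value is equivalent to rejecting when the p-value is below alpha.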
Once the null hypothesis (H₀) is rejected in the simultaneous test, partial hypothesis testing is conducted to identify which predictor variables exert a statistically significant influence on the response variable. The first partial test focuses on the spatial autoregressive parameter ρ, formulated under the following hypothesis framework:
The test statistics used for testing the above hypothesis with the Wald test is shown in Eq. (19).
(19) |
where is obtained from the root. The value represents the main diagonal element of the Hessian matrix which is represented by and corresponds to . The Wald test statistics in Eq. (19) is deemed to be statistically significant if , thereby rejecting the null hypothesis (H0).
Moreover, the partial testing of parameters is conducted with the objective of identifying the parameters that exert a significant influence on the model. The following hypothesis is employed to test the partial parameters:
The test statistics used for testing the partial parameters with the Wald test is shown in Eq. (20).
(20) |
In this context, the term represents the standard error of obtained from . is the main diagonal element of the variance-covariance matrix . The null hypothesis (H₀) is rejected if .
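The Wald statistics in Table 3 are consistent with the squared ratio of an estimate to its standard error, compared against a chi-square distribution with one degree of freedom. The sketch below assumes that form; the function name is hypothetical, and the example uses the rounded estimate 62.92 and standard error 21.71 reported in Table 3.

```python
from scipy.stats import chi2

def wald_test(estimate, std_error, alpha=0.05):
    """Wald statistic (estimate / SE)^2 with a chi-square(1) reference distribution."""
    stat = (estimate / std_error) ** 2
    p_value = chi2.sf(stat, df=1)
    return stat, p_value, bool(p_value < alpha)

# Rounded values from Table 3 (intercept of the diarrhea equation).
stat, p_value, reject = wald_test(62.92, 21.71)
print(round(stat, 2), reject)
```

Statistics recomputed from the rounded table entries can differ slightly from the reported values, since the article's figures were presumably computed from unrounded estimates.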
Measures of model fit
To select the most suitable regression model, two commonly used evaluation metrics are the Root Mean Square Error (RMSE) and the coefficient of determination (R²). RMSE represents the average prediction error of the model, expressed in the same unit as the response variable. Models with lower RMSE values are preferred, as they indicate predictions that are closer to the observed values, reflecting a better model fit. Meanwhile, R² measures the proportion of variance in the response variable that can be explained by the predictor variables [28–30]. A higher R² value signifies greater explanatory power and stronger predictive performance of the model. The formulas for RMSE and R² are provided in Eqs. (21) and (22) [31–33].
(21) |
(22) |
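The two measures can be sketched as follows. How the article pools the errors across the two response variables is not recoverable from the text, so this sketch assumes that squared errors are pooled over all observations and all responses, and that the total sum of squares is taken around each response's own mean. The function names are hypothetical.

```python
import numpy as np

def rmse(y, y_hat):
    """Root mean square error, pooled over all entries of y."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

def r_squared(y, y_hat):
    """1 - SSE/SST, with SST computed around the mean of each response column."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    sse = np.sum((y - y_hat) ** 2)
    sst = np.sum((y - y.mean(axis=0)) ** 2)
    return float(1.0 - sse / sst)

y_obs = [1.0, 2.0, 3.0]
y_fit = [1.0, 2.0, 4.0]
print(rmse(y_obs, y_fit), r_squared(y_obs, y_fit))
```

Both functions accept either a single response vector or an n × q matrix of responses.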
Data analysis procedure
The analysis was conducted through the following steps:
1. Check the correlation between response variables.
2. Test for multivariate normal distribution.
3. Model the data using multivariate normal linear regression.
4. Perform spatial weighting.
5. Perform spatial dependency testing.
6. Estimate the parameters of MSAR model.
7. Conduct simultaneous hypothesis testing using the test statistic in Eq. (18).
8. Conduct partial hypothesis testing using the test statistics in Eqs. (19) and (20).
9.
10. Interpret the results and draw conclusions.
Method validation
To validate the application of the MSAR method, we used a real-world dataset on health issues in children under five years old.
Data set
The dataset used in this study was obtained from the Center for the Study of Regional Resources and Community Empowerment at Institut Teknologi Sepuluh Nopember, Surabaya, Indonesia. The data are secondary in nature and pertain to the year 2023. Observations were collected from 54 villages located across four sub-districts in Tuban Regency, East Java—namely Singgahan, Kerek, Montong, and Senori—as illustrated in Fig. 1.
Fig. 1.
Administrative map of 54 villages (note: colored pink) in Tuban District.
The response variables selected for analysis are the percentage of cases of pneumonia and diarrhea in children under five years old. These two response variables were found to have a correlation coefficient of 0.585. The predictor variables used in this study include the percentage of infants who received exclusive breastfeeding, the percentage of children under five who received complete basic immunization, the percentage who received vitamin A supplementation, the percentage of pregnant women who attended government-sponsored prenatal classes, and the percentage of households with access to clean water. A summary of the research data is presented in Table 1.
Table 1.
Descriptive statistics of research data.
Variable | Description | Mean | SD | Min | Max |
---|---|---|---|---|---|
Response | Pneumonia in toddler (Y1) (%) | 4.82 | 4.91 | 0.00 | 16.68 |
Response | Diarrhea in toddler (Y2) (%) | 12.99 | 8.10 | 0.83 | 30.61 |
Predictor | Exclusive breastfeeding (X1) (10 %) | 2.53 | 2.40 | 0.00 | 12.22 |
Predictor | Complete basic immunization (X2) (%) | 22.56 | 6.70 | 6.12 | 50.00 |
Predictor | Toddlers who received vit. A (X3) (10 %) | 13.04 | 10.80 | 1.40 | 81.30 |
Predictor | Pregnant women who attended pregnancy classes (X4) (10 %) | 5.09 | 5.99 | 0.00 | 33.33 |
Predictor | Households with clean water coverage (X5) (%) | 98.35 | 4.22 | 79.94 | 100.00 |
Modelling child health problems using multivariate normal linear regression
Before conducting multivariate linear regression analysis, the distribution of the response variables was assessed for multivariate normality using a quantile-quantile (Q-Q) plot. The results indicated that the proportion based on the Mahalanobis distance was 53.70 %, which exceeds 50 %, suggesting that the two response variables follow a bivariate normal distribution. Subsequently, the parameters of the multivariate normal linear regression model were estimated, and the results are shown in Table 2. The table reveals that the predictor variables significantly influencing the prevalence of pneumonia (Y₁) in children under five are the percentage of infants who were exclusively breastfed (X₁) and the percentage of children who received complete basic immunization (X₂). Meanwhile, the variables influencing the prevalence of diarrhea (Y₂) include exclusive breastfeeding (X₁), complete basic immunization (X₂), and access to clean water (X₅).
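One common version of the normality check described above computes the squared Mahalanobis distance of each observation and reports the share that falls at or below the median of the chi-square distribution with q degrees of freedom; a share near 50 % is taken as support for multivariate normality. The article does not state its exact criterion, so the cutoff used in this sketch is an assumption, and the function name and simulated data are hypothetical.

```python
import numpy as np
from scipy.stats import chi2

def mahalanobis_proportion(Y):
    """Squared Mahalanobis distances and the share at or below the chi-square median."""
    Y = np.asarray(Y, dtype=float)
    n, q = Y.shape
    centered = Y - Y.mean(axis=0)
    S_inv = np.linalg.inv(np.cov(Y, rowvar=False))
    # d2[i] = centered[i] @ S_inv @ centered[i]
    d2 = np.einsum("ij,jk,ik->i", centered, S_inv, centered)
    cutoff = chi2.ppf(0.5, df=q)   # assumed cutoff: 50th percentile of chi-square(q)
    return d2, float(np.mean(d2 <= cutoff))

# Simulated bivariate normal data, for illustration only.
rng = np.random.default_rng(3)
Y_sim = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.6], [0.6, 1.0]], size=500)
d2, proportion = mahalanobis_proportion(Y_sim)
print(round(proportion, 3))
```

Plotting the sorted distances against chi-square quantiles gives the corresponding Q-Q plot.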
Table 2.
Estimated values of multivariate normal linear regression parameters.
Parameters | Estimated Value | Standard Error | t | p-value |
---|---|---|---|---|
Intercept (Y1) | 16.6738 | 19.5660 | 0.8522 | 0.3961 |
X1 (Y1) | 0.7882 | 0.3516 | 2.2416 | 0.0272* |
X2 (Y1) | 0.3373 | 0.1099 | 3.0685 | 0.0027* |
X3 (Y1) | −0.0013 | 0.0669 | −0.0206 | 0.9835 |
X4 (Y1) | −0.1088 | 0.1251 | −0.8700 | 0.3863 |
X5 (Y1) | −0.2123 | 0.1954 | −1.0867 | 0.2797 |
Intercept (Y2) | 67.2771 | 19.5660 | 3.4383 | 0.0008* |
X1 (Y2) | 0.7053 | 0.3516 | 2.0061 | 0.0475* |
X2 (Y2) | 0.5214 | 0.1099 | 4.7431 | 6.85 × 10⁻⁶* |
X3 (Y2) | −0.0748 | 0.0669 | −1.1187 | 0.2658 |
X4 (Y2) | 0.0664 | 0.1251 | 0.5311 | 0.5965 |
X5 (Y2) | −0.6832 | 0.1953 | −3.4966 | 0.0007* |
*: significant at 5 % alpha.
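The fitted multivariate normal linear regression model, with coefficients taken from Table 2, can be written as follows.
$$\hat{Y}_1 = 16.6738 + 0.7882X_1 + 0.3373X_2 - 0.0013X_3 - 0.1088X_4 - 0.2123X_5$$
$$\hat{Y}_2 = 67.2771 + 0.7053X_1 + 0.5214X_2 - 0.0748X_3 + 0.0664X_4 - 0.6832X_5$$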
Spatial weighting and testing for spatial dependence
The MSAR model was used to estimate the prevalence of pneumonia and diarrhea among children under five in southwestern Tuban Regency. This analysis employed a queen contiguity spatial weighting matrix, which accounts for the asymmetrical geographical layout of the region. The matrix was constructed based on shared boundaries between villages.
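As an illustration of how such a matrix can be assembled from neighbor lists like those in Table A1, the sketch below builds a binary contiguity matrix and row-standardizes it. Row standardization is an assumption here, and the four-region layout, the function name, and the variable names are hypothetical.

```python
import numpy as np

def row_standardized_weights(neighbors):
    """Build a row-standardized contiguity matrix from {region code: [neighbor codes]}."""
    codes = sorted(neighbors)
    index = {code: i for i, code in enumerate(codes)}
    W = np.zeros((len(codes), len(codes)))
    for code, nbrs in neighbors.items():
        for nb in nbrs:
            W[index[code], index[nb]] = 1.0   # 1 if the two regions share a border
    row_sums = W.sum(axis=1, keepdims=True)
    # Divide each row by its number of neighbors; rows without neighbors stay zero.
    W = np.divide(W, row_sums, out=np.zeros_like(W), where=row_sums > 0)
    return codes, W

# Toy layout with four regions (not taken from the study area).
toy_neighbors = {1: [2, 3], 2: [1, 3, 4], 3: [1, 2], 4: [2]}
codes, W = row_standardized_weights(toy_neighbors)
print(W)
```

After row standardization each row sums to one, so the spatial lag of a region is the average of its neighbors' values.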
Following the construction of the spatial weighting matrix, spatial dependence was assessed using the residuals from the multivariate normal linear regression model. The spatial dependence test was conducted in R using the Bivariate Moran's I statistic [34–36], which yielded a Moran's I value of 0.1101, with an expected value of –0.0073 and a variance of 0.0051. The resulting Z-score was 1.6513, which exceeds the critical value of Z₀.₀₅ = 1.64. Therefore, the null hypothesis (H₀) of no spatial dependence is rejected. This result indicates the presence of bivariate spatial dependence in the regression residuals, justifying further spatial analysis.
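As a rough check of the reported Z-score, the standardization can be reproduced from the rounded figures above, assuming the usual form of the statistic (observed value minus expected value, divided by the standard deviation):

```latex
Z = \frac{I - \mathrm{E}(I)}{\sqrt{\mathrm{Var}(I)}}
  = \frac{0.1101 - (-0.0073)}{\sqrt{0.0051}}
  \approx \frac{0.1174}{0.0714}
  \approx 1.64
```

The small gap between this value and the reported 1.6513 is presumably due to rounding of the variance in the text.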
Modelling child health data using the MSAR model
In MSAR modelling, the model parameters include a spatial effect parameter, denoted as ρ. Therefore, estimating this parameter is the first step, conducted using a numerical approximation method based on the concentrated log-likelihood function. Once the estimate of ρ has been obtained, the regression coefficients and the covariance matrix can be estimated. The estimation results are presented in Table 3, while the estimated covariance matrix is as follows.
Table 3.
Estimated values of multivariate spatial autoregressive parameters.
Parameters | Estimated Value | Standard Error | Wald Statistic | p-value |
---|---|---|---|---|
ρ (Y1) | 0.42 | 0.01 | 2895.30 | 0.00* |
Intercept (Y1) | 15.76 | 12.90 | 1.49 | 0.22 |
X1 (Y1) | 0.59 | 0.23 | 6.59 | 0.01* |
X2 (Y1) | 0.25 | 0.07 | 11.98 | 0.00* |
X3 (Y1) | 0.02 | 0.04 | 0.21 | 0.65 |
X4 (Y1) | −0.10 | 0.08 | 1.46 | 0.23 |
X5 (Y1) | −0.20 | 0.13 | 2.46 | 0.12 |
ρ (Y2) | 0.38 | 0.01 | 14,786.09 | 0.00* |
Intercept (Y2) | 62.92 | 21.71 | 8.40 | 0.00* |
X1 (Y2) | 0.41 | 0.39 | 1.13 | 0.29 |
X2 (Y2) | 0.40 | 0.12 | 10.95 | 0.00* |
X3 (Y2) | −0.03 | 0.07 | 0.17 | 0.68 |
X4 (Y2) | 0.05 | 0.14 | 0.14 | 0.71 |
X5 (Y2) | −0.66 | 0.22 | 9.25 | 0.00* |
*: significant at 5 % alpha.
The initial step involves simultaneous hypothesis testing of all model parameters to determine whether they collectively have a significant influence. The value of the test statistic was 43,240.59, which is greater than the chi-square critical value. Accordingly, the null hypothesis (H₀) is rejected, indicating that at least one parameter significantly contributes to the model. This justifies proceeding with partial (individual) hypothesis tests to identify which specific parameters are influential in the MSAR model.
Table 3 indicates that the spatial autoregressive parameters (ρ) for both response variables are significant, which suggests that spatial dependencies in the rates of pneumonia and diarrhea must be considered in the model. The MSAR model for the percentage of pneumonia cases (Y₁) identifies two significant predictor variables: the percentage of infants exclusively breastfed (X₁) and the percentage of children under five who received complete basic immunization (X₂). Meanwhile, for the percentage of diarrhea cases (Y₂), the significant predictors are X₂ (complete basic immunization) and X₅ (households with access to clean water).
As shown in Table 4, the MSAR model better captures the relationship between predictor variables and child health outcomes than the standard multivariate normal linear regression model. This is demonstrated by its lower Root Mean Square Error (RMSE) of 4.97 and a higher R-squared value of approximately 60 %. These findings support the conclusion that, when multivariate data exhibit spatial autocorrelation, the MSAR model provides a more accurate and reliable estimation framework.
Table 4.
Model comparison.
Model | RMSE | R-square |
---|---|---|
Multivariate Normal Linear Regression | 5.22 | 55.21 % |
MSAR | 4.97 | 59.98 % |
In total, 54 distinct MSAR models were developed—one for each village. The model estimates for both pneumonia (Y₁) and diarrhea (Y₂) are summarized as where:
Taking Gemulung village as an example, the MSAR model for Gemulung village (code number 5) is where:
The above MSAR model of Gemulung Village can be interpreted as follows:
1. For every 100 children under five, approximately 10 to 11 are affected by pneumonia, and 9 to 10 by diarrhea in Gemulung Village. Similar patterns are likely present in the neighboring villages (Mulyoagung, Sidonganti, Trantang, and Wolutengah) due to spatial dependence.
2. A 1 % increase in the proportion of exclusively breastfed infants is associated with a rise in pneumonia cases, which contradicts theoretical expectations. This may be due to the lagging effect of exclusive breastfeeding on pneumonia incidence. Additionally, pneumonia cases in Gemulung appear to influence similar increases in the four neighboring villages. No significant relationship was found between exclusive breastfeeding and diarrhea prevalence.
3. A 1 % increase in the percentage of children receiving complete basic immunization is linked to higher pneumonia and diarrhea rates. This finding contradicts existing theory, likely due to temporal lag in the variable's impact. Increases in pneumonia and diarrhea in Gemulung are associated with corresponding rises (10–11 and 9–10 cases per 100 children, respectively) in neighboring villages.
4. The percentage of children under five who received vitamin A supplementation showed no significant effect on pneumonia or diarrhea incidence.
5. The proportion of pregnant women attending pregnancy classes did not significantly influence pneumonia or diarrhea rates among children under five.
6. A 1 % increase in household clean water coverage is associated with a decrease of approximately one diarrhea case per 100 children under five, but has no significant effect on pneumonia rates. Diarrhea cases in Gemulung also appear to influence similar increases (9–10 per 100) in the neighboring villages.
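The figures of 10 to 11 and 9 to 10 per 100 quoted in the interpretation are consistent with dividing each estimated ρ from Table 3 by the number of neighbors of Gemulung (four, according to Table A1). That reading assumes a row-standardized weight matrix, in which each neighbor receives a weight of one over the neighbor count. A small sketch of the arithmetic, with a hypothetical function name:

```python
def per_neighbor_effect(rho, n_neighbors):
    """Spatial-lag coefficient attached to each neighbor under row standardization."""
    return rho / n_neighbors

# Estimated rho values from Table 3; Gemulung has four neighbors in Table A1.
pneumonia_effect = per_neighbor_effect(0.42, 4)
diarrhea_effect = per_neighbor_effect(0.38, 4)
print(round(100 * pneumonia_effect, 1), round(100 * diarrhea_effect, 1))
```

For a village with more neighbors the per-neighbor weight is smaller, so the same ρ spreads over more villages.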
Conclusions
This study focused on area-based spatial modeling in the context of multivariate response regression, introducing the MSAR model as an extension of the conventional SAR model. The MSAR approach incorporates geographic weighting to account for spatial dependencies between neighboring regions. Parameter estimation was carried out using MLE via concentrated log-likelihood, which resulted in unbiased and consistent estimates. The significance of model parameters was tested both simultaneously using the LRT and partially using the Wald Test, which enabled the identification of influential predictor variables. The application of the MSAR model to data on pneumonia and diarrhea cases among children under five in Tuban Regency, East Java, demonstrated its effectiveness in handling spatial autocorrelation. Compared to the standard multivariate normal linear regression, the MSAR model showed better accuracy. The variables that affect the incidence of pneumonia and diarrhea were the percentage of infants who receive exclusive breastfeeding, the percentage of toddlers who receive complete basic immunization, and the percentage of households with access to clean water. However, the current model is limited to multivariate normal data distributions. Future research should explore extensions of the MSAR framework that can accommodate non-normal data.
Limitations
The model assumes that the errors follow a normal distribution.
Ethics statements
The data used in this research has been approved by the Center for the Study of Regional Resources and Community Empowerment Institut Teknologi Sepuluh Nopember Surabaya, Indonesia.
CRediT authorship contribution statement
Sutikno: Conceptualization, Methodology, Writing – original draft, Writing – review & editing, Validation. Purhadi: Methodology, Conceptualization. Fachrunisah: Visualization, Writing – review & editing, Software. Fajar Dwi Cahyoko: Writing – review & editing.
Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
The first author would like to gratefully acknowledge the Government of Tuban Regency for providing funding for this research.
Appendix A
Table A1.
Village and neighbor codes.
Village Codes | Village | Sub-district | Count | Neighbor | |||||||
---|---|---|---|---|---|---|---|---|---|---|---|
1 | Banyuurip | Senori | 3 | 19 | 51 | 54 | |||||
2 | Binangun | Singgahan | 6 | 34 | 37 | 45 | 50 | 51 | 52 | ||
3 | Bringin | Montong | 3 | 20 | 32 | 40 | |||||
4 | Gaji | Kerek | 7 | 7 | 8 | 16 | 23 | 26 | 47 | 53 | |
5 | Gemulung | Kerek | 4 | 28 | 38 | 49 | 53 | ||||
6 | Guwoterus | Montong | 6 | 28 | 30 | 38 | 41 | 47 | 48 | ||
7 | Hargoretno | Kerek | 8 | 4 | 8 | 27 | 31 | 33 | 41 | 46 | 47 |
8 | Jarorejo | Kerek | 5 | 4 | 7 | 22 | 23 | 46 | |||
9 | Jatisari | Senori | 5 | 11 | 19 | 24 | 36 | 51 | |||
10 | Jetakss | Montong | 4 | 20 | 33 | 40 | 42 | ||||
11 | Kaligede | Senori | 2 | 9 | 19 | ||||||
12 | Karanglo | Kerek | 3 | 22 | 31 | 39 | |||||
13 | Kasiman | Kerek | 4 | 16 | 23 | 26 | 39 | ||||
14 | Katerban | Senori | 1 | 34 | |||||||
15 | Kedungjambe | Singgahan | 4 | 29 | 35 | 44 | 50 | ||||
16 | Kedungrejo | Kerek | 4 | 4 | 13 | 23 | 26 | ||||
17 | Lajo Kidul | Singgahan | 4 | 18 | 36 | 43 | 45 | ||||
18 | Lajo Lor | Singgahan | 3 | 17 | 28 | 43 | |||||
19 | Leran | Senori | 4 | 1 | 9 | 11 | 51 | ||||
20 | Maindu | Montong | 3 | 3 | 10 | 40 | |||||
21 | Manjung | Montong | 1 | 44 | |||||||
22 | Margomulyo | Kerek | 6 | 8 | 12 | 23 | 31 | 39 | 46 | ||
23 | Margorejo | Kerek | 6 | 4 | 8 | 13 | 16 | 22 | 39 | ||
24 | Medalem | Senori | 2 | 9 | 36 | ||||||
25 | Mergosari | Singgahan | 5 | 28 | 29 | 43 | 45 | 50 | |||
26 | Mliwang | Kerek | 3 | 4 | 13 | 16 | |||||
27 | Montongsekar | Montong | 4 | 7 | 32 | 33 | 41 | ||||
28 | Mulyoagung | Singgahan | 8 | 5 | 6 | 18 | 25 | 29 | 38 | 43 | 48 |
29 | Mulyorejo | Singgahan | 7 | 15 | 25 | 28 | 30 | 44 | 48 | 50 | |
30 | Nguluhan | Montong | 5 | 6 | 29 | 41 | 44 | 48 | |||
31 | Padasan | Kerek | 5 | 7 | 12 | 22 | 33 | 46 | |||
32 | Pakel | Montong | 6 | 3 | 27 | 33 | 40 | 41 | 44 | ||
33 | Pucangan | Montong | 7 | 7 | 10 | 27 | 31 | 32 | 40 | 42 | |
34 | Rayung | Senori | 6 | 2 | 14 | 35 | 37 | 50 | 54 | ||
35 | Saringembat | Singgahan | 3 | 15 | 34 | 50 | |||||
36 | Sendang | Senori | 5 | 9 | 17 | 24 | 45 | 51 | |||
37 | Sidoharjo | Senori | 4 | 2 | 34 | 52 | 54 | ||||
38 | Sidonganti | Kerek | 5 | 5 | 6 | 28 | 47 | 49 | |||
39 | Sumberarum | Kerek | 4 | 12 | 13 | 22 | 23 | ||||
40 | Sumurgung | Montong | 5 | 3 | 10 | 20 | 32 | 33 | |||
41 | Talangkembar | Montong | 7 | 6 | 7 | 27 | 30 | 32 | 44 | 47 | |
42 | Talun | Montong | 2 | 10 | 33 | ||||||
43 | Tanggir | Singgahan | 5 | 17 | 18 | 25 | 28 | 45 | |||
44 | Tanggulangin | Montong | 6 | 15 | 21 | 29 | 30 | 32 | 41 | ||
45 | Tanjungrejo | Singgahan | 7 | 2 | 17 | 25 | 36 | 43 | 50 | 51 | |
46 | Temayang | Kerek | 4 | 7 | 8 | 22 | 31 | ||||
47 | Tengger Wetan | Kerek | 7 | 4 | 6 | 7 | 38 | 41 | 49 | 53 | |
48 | Tingkis | Singgahan | 4 | 6 | 28 | 29 | 30 | ||||
49 | Trantang | Kerek | 4 | 5 | 38 | 47 | 53 | ||||
50 | Tunggulrejo | Singgahan | 7 | 2 | 15 | 25 | 29 | 34 | 35 | 45 | |
51 | Wanglu Kulon | Senori | 8 | 1 | 2 | 9 | 19 | 36 | 45 | 52 | 54 |
52 | Wanglu Wetan | Senori | 4 | 2 | 37 | 51 | 54 | ||||
53 | Wolutengah | Kerek | 4 | 4 | 5 | 47 | 49 | ||||
54 | Wonosari | Senori | 5 | 1 | 34 | 37 | 51 | 52 |
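A neighbor table of this kind is typically turned into a spatial weight matrix before a spatial autoregressive model is fitted. The sketch below is illustrative only and is not taken from the study: it assumes a row-standardized contiguity scheme (each neighbor of village i receives weight 1/n_i, where n_i is the number of neighbors), and the function names `row_standardized_weights` and `is_reciprocal` are hypothetical. The neighbor lists are copied from four rows of the table above; the full matrix would use all 54 rows.

```python
# Neighbor lists copied from four rows of the table above
# (village code -> codes of adjacent villages). Subset only.
neighbors = {
    21: [44],                      # Manjung
    24: [9, 36],                   # Medalem
    42: [10, 33],                  # Talun
    44: [15, 21, 29, 30, 32, 41],  # Tanggulangin
}


def row_standardized_weights(neighbors):
    """Build sparse rows of W under an ASSUMED row-standardized scheme.

    Each neighbor j of village i gets weight 1 / n_i, so every
    non-empty row of W sums to one.
    """
    weights = {}
    for code, adjacent in neighbors.items():
        if not adjacent:
            # A village without neighbors keeps an all-zero row.
            weights[code] = {}
            continue
        share = 1.0 / len(adjacent)
        weights[code] = {j: share for j in adjacent}
    return weights


def is_reciprocal(neighbors, i, j):
    """Check that i lists j and j lists i (contiguity should be symmetric)."""
    return j in neighbors.get(i, []) and i in neighbors.get(j, [])


W = row_standardized_weights(neighbors)

for code, row in W.items():
    # Report the row sum as a quick sanity check on the standardization.
    print(code, row, "row sum =", round(sum(row.values()), 12))

print("21 and 44 list each other:", is_reciprocal(neighbors, 21, 44))
```

Storing each row as a dictionary keeps the matrix sparse, which suits contiguity data where every village has at most eight neighbors. Other weighting schemes (binary, distance-based) would only change the `share` computation.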
Fig. A1.
Map of Tuban Regency village codes.
Data availability
Data will be made available on request.