Estimation of parameters and hypothesis testing of multivariate spatial autoregressive model

Sutikno; Purhadi; Fachrunisah; Fajar Dwi Cahyoko

doi:10.1016/j.mex.2025.103294

2025 Mar 28;14:103294. doi: 10.1016/j.mex.2025.103294

PMCID: PMC12001134 PMID: 40241707

Abstract

Spatial dependence plays a critical role in modeling multivariate response variables, particularly in fields such as epidemiology and environmental studies. However, existing spatial regression models, such as the Spatial Autoregressive (SAR) model, are designed for univariate responses and are insufficient when multiple response variables are influenced by spatial location. To address this gap, we introduce a Multivariate Spatial Autoregressive (MSAR) model. While previous research has focused primarily on parameter estimation for the proposed model, limited attention has been given to the statistical significance of these parameters. Moreover, existing estimation methods often rely on pseudo-distributions, which may not accurately reflect the underlying data characteristics. This study employs Maximum Likelihood Estimation (MLE), optimized using a concentrated log-likelihood approach, under the assumption of normally distributed data. To assess parameter significance, we apply both the Maximum Likelihood Ratio Test (MLRT) for joint hypotheses and the Wald Test for individual parameters. The findings confirm that the proposed model yields unbiased and consistent parameter estimates. Furthermore, the significance tests reveal key predictor variables associated with pneumonia and diarrhea cases among toddlers. The proposed model achieves a Root Mean Square Error of 5 and an R-squared value of 60 %, demonstrating its effectiveness in capturing spatial dependence in multivariate settings. The main contributions of this study include:

• Development of an MSAR model estimated using MLE to capture spatial dependencies among multiple response variables.
• Implementation of formal hypothesis testing procedures for model parameters using the Likelihood Ratio and Wald tests.
• Application of the proposed model to spatial health data at the village level in Tuban District, East Java, Indonesia, focusing on health problems among children under five.

Keywords: Multivariate spatial autoregressive, Maximum likelihood estimation, Maximum likelihood ratio test, Wald test

Method name: Multivariate Spatial Autoregressive Model

Specifications table

Subject area:	Mathematics and Statistics
More specific subject area:	Spatial Statistics, Spatial dependency, Multivariate Spatial Linear Models
Name of your method:	Multivariate Spatial Autoregressive Model
Name and reference of original method:	Original method: Multivariate Linear Regression. References: • Christensen, R.: Linear Models for Multivariate, Time Series, and Spatial Data. New York: Springer Science+Business Media (1991). • Johnson, R.A., & Wichern, D.W.: Applied Multivariate Statistical Analysis. New Jersey: Pearson Education, Inc (2007).
Resource availability:	None

Background

Spatial analysis has become a fundamental methodology in scientific research, particularly when the data exhibits a strong geographical component [1,2]. One widely used technique for addressing spatial dependence is the SAR model [[3], [4], [5]], which incorporates spatially lagged dependent variables to account for spatial autocorrelation. SAR models have been extensively studied with regard to parameter estimation and hypothesis testing. Among the available techniques, MLE is the most commonly used and yields consistent estimates [[6], [7], [8]]. However, a key challenge in estimating SAR model parameters is that the spatial effect parameters do not have closed-form solutions, so numerical iteration is required. In addition to estimation, hypothesis testing is typically conducted using the Likelihood Ratio Test (LRT) for joint significance and the Wald Test for individual parameters [[9], [10], [11]].

The SAR model has been extended to the MSAR model in econometric applications, with estimation methods including Quasi-Maximum Likelihood Estimation (QMLE) [12,13], Two-Stage Least Squares (2SLS) [14,15], and Three-Stage Least Squares (3SLS) [16,17]. However, when identification conditions are not met in complex spatial models, parameter estimates may become invalid, which affects the reliability of QMLE. Additionally, QMLE generally provides less efficient estimates than fully specified MLE when the model is correctly specified [18]. More recent work has incorporated MSAR within simultaneous equation models, using the FGLS-3SLS estimation approach and numerical approximation via the average concentrated log-likelihood [16]. While that approach allows spatial effects to be estimated, it is still limited to univariate optimization. In this study, we extend it by applying multivariate optimization to the concentrated log-likelihood using the L-BFGS-B algorithm [[19], [20], [21]]. MSAR models have also been extended to network data. For example, Zhu and Huang (2020) compared QMLE and Least Squares Estimation (LSE) methods [22,23], but LSE remains less effective for parameter estimation in complex models. This highlights the need for an MLE-based methodology that provides accurate and consistent estimates. In addition, most existing studies have focused on parameter estimation without addressing hypothesis testing, which is essential for improving the accuracy and reliability of model predictions and for evaluating regression parameters that can vary geographically.

This study proposes an area-based MSAR model for multivariate responses, designed to capture the spatial interaction among two or more correlated response variables while accounting for spatial dependence across regions. Parameter estimation is carried out using MLE for the regression coefficients and the covariance matrix, with spatial parameters estimated via the concentrated log-likelihood function optimized using the L-BFGS-B algorithm [24]. In addition to estimation, hypothesis testing is performed using both the LRT and the Wald Test to evaluate spatially varying regression parameters.

Method details

The MSAR model is an extension of the SAR model that incorporates spatial dependencies into the analysis. In this model, the global structure is modeled using multivariate normal linear regression. Therefore, before discussing the MSAR model in detail, this section introduces the foundational concept of multivariate linear regression.

Multivariate normal linear regression

A multivariate linear regression model describes the relationship between multiple response variables $Y_{1}, Y_{2}, \dots, Y_{p}$ and the predictor variables $A_{1}, A_{2}, \dots, A_{q}$ . Given a sample of n observations, the multivariate normal linear regression model for the j-th response, $j = 1, 2, \dots, p$ , and the i-th observation, $i = 1, 2, \dots, n$ , is represented in Eq. (1).

\begin{matrix} Y_{1 i} & = & β_{01} + β_{11} A_{1 i} + . . . + β_{q 1} A_{q i} + ε_{1 i} \\ Y_{2 i} & = & β_{02} + β_{12} A_{1 i} + . . . + β_{q 2} A_{q i} + ε_{2 i} \\ ⋮ \\ Y_{j i} & = & β_{0 j} + β_{1 j} A_{1 i} + . . . + β_{q j} A_{q i} + ε_{j i} \end{matrix}

(1)

The multivariate linear regression model can be expressed in matrix form as illustrated in Eq. (2).

Y_{(n \times p)} = A_{n \times (q + 1)} B_{(q + 1) \times p} + Ξ_{(n \times p)}

(2)

With

Y = [\begin{matrix} y_{1} & y_{2} \dots y_{p} \end{matrix}], A = [\begin{matrix} 1 & A_{11} & A_{21} & \dots & A_{q 1} \\ 1 & A_{12} & A_{22} & \dots & A_{q 2} \\ ⋮ & ⋮ & ⋮ & ⋱ & ⋮ \\ 1 & A_{1 n} & A_{2 n} & \dots & A_{q n} \end{matrix}], B = [\begin{matrix} β_{1} & β_{2} \dots β_{p} \end{matrix}], Ξ = [\begin{matrix} ɛ_{1} & ɛ_{2} \dots ɛ_{p} \end{matrix}]

where $y_{j} = {[Y_{j 1} Y_{j 2} \dots Y_{j n}]}^{T}; j = 1, 2, . . ., p$ ; $β_{j} = {[β_{0 j} β_{1 j} \dots β_{q j}]}^{T}; j = 1, 2, . . ., p$ and $ɛ_{j} = {[ε_{j 1} ε_{j 2} \dots ε_{j n}]}^{T}; j = 1, 2, . . ., p$ . Furthermore, the multivariate linear regression model can be expressed in the form of a Vec operator and Kronecker product, as demonstrated in Eq. (3).

V e c {(Y)}_{p n \times 1} = {(I_{p} \otimes A)}_{p n \times p (q + 1)} V e c {(B)}_{p (q + 1) \times 1} + V e c {(Ξ)}_{p n \times 1}

(3)

In Eq. (3), an assumption was made that $V e c (Ξ) \sim N (0, Σ_{p \times p} \otimes I_{n \times n})$ with $E (V e c (Ξ)) = 0$ and $C o v (V e c (Ξ)) = Σ_{p \times p} \otimes I_{n \times n}$ . Based on this assumption, $V e c (Y)$ has a distribution of $V e c (Y) \sim N_{p n} ((I_{p} \otimes A) V e c (B), Σ \otimes I_{n})$ . The probability density function of $V e c (Y)$ is therefore given by the following expression.

f (V e c (Y)) = {(2 π)}^{- p n / 2} {| Σ |}^{- n / 2} \exp (- \frac{1}{2} {(V e c (Y) - ((I_{p} \otimes A) V e c (B)))}^{T} {(Σ \otimes I_{n})}^{- 1} (V e c (Y) - ((I_{p} \otimes A) V e c (B))))

Further parameter estimation can be carried out using the MLE method, resulting in the following parameter estimators [25].

V e c (\hat{B}) = (I \otimes {(A^{T} A)}^{- 1} A^{T}) (V e c (Y))

\hat{Σ} = \frac{(Y^{T} (I - A {(A^{T} A)}^{- 1} A^{T}) Y)}{n}
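The two estimators above can be checked numerically. The following is a minimal sketch, not the authors' code: it assumes NumPy, simulated data, and a column-stacking Vec operator; the names `mvreg_mle`, `vec`, and all data values are hypothetical.

```python
import numpy as np

def mvreg_mle(Y, A):
    """MLE for Y = A B + Xi, where the rows of Xi are N(0, Sigma)."""
    n = Y.shape[0]
    # B_hat = (A^T A)^{-1} A^T Y, computed by least squares for stability
    B_hat, *_ = np.linalg.lstsq(A, Y, rcond=None)
    resid = Y - A @ B_hat
    # Sigma_hat = Y^T (I - A (A^T A)^{-1} A^T) Y / n = resid^T resid / n
    Sigma_hat = resid.T @ resid / n
    return B_hat, Sigma_hat

def vec(X):
    """Column-stacking Vec operator."""
    return X.reshape(-1, order="F")

# Simulated data (assumption: illustrative values only)
rng = np.random.default_rng(0)
n, q, p = 200, 3, 2
A = np.column_stack([np.ones(n), rng.normal(size=(n, q))])
B_true = rng.normal(size=(q + 1, p))
Y = A @ B_true + rng.normal(size=(n, p))

B_hat, Sigma_hat = mvreg_mle(Y, A)

# Kronecker form: Vec(B_hat) = (I_p ⊗ (A^T A)^{-1} A^T) Vec(Y)
P = np.linalg.solve(A.T @ A, A.T)
vecB_kron = np.kron(np.eye(p), P) @ vec(Y)
```

The matrix form and the Kronecker form give the same coefficients; the matrix form avoids building the $p n \times p n$ Kronecker product and is preferable in practice.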

Multivariate spatial autoregressive

The MSAR model is a further development of the SAR model, so its structure follows by analogy from the univariate SAR model. It relates the response variables to the predictor variables while accounting for the spatial effect of the lagged response variables, measured by ρ, through the spatial weight matrix W. The MSAR model is given in Eq. (4):

\begin{matrix} Y_{1 i} & = & ρ_{1} w_{i}^{T} y_{1} + a_{i}^{T} β_{1} + ε_{1 i} \\ Y_{2 i} & = & ρ_{2} w_{i}^{T} y_{2} + a_{i}^{T} β_{2} + ε_{2 i} \\ ⋮ \\ Y_{p i} & = & ρ_{p} w_{i}^{T} y_{p} + a_{i}^{T} β_{p} + ε_{p i} \end{matrix}

(4)

Eq. (4) can be decomposed into the following equation:

[\begin{matrix} Y_{1 i} \\ \begin{matrix} Y_{2 i} \\ ⋮ \\ Y_{p i} \end{matrix} \end{matrix}] = [\begin{matrix} β_{01} + ρ_{1} \sum_{i * = 1, i \neq i *}^{n} w_{i i *} y_{1 i *} + a^{T} β_{1} + ε_{1 i} \\ \begin{matrix} β_{02} + ρ_{2} \sum_{i * = 1, i \neq i *}^{n} w_{i i *} y_{2 i *} + a^{T} β_{2} + ε_{2 i} \\ ⋮ \\ β_{0 p} + ρ_{p} \sum_{i * = 1, i \neq i *}^{n} w_{i i *} y_{p i *} + a^{T} β_{p} + ε_{p i} \end{matrix} \end{matrix}]

The MSAR model can be written in matrix form as Eq. (5), where ρ is a diagonal matrix whose elements $ρ_{1}, ρ_{2}, \dots, ρ_{p}$ represent the spatial effects on each of the response variables.

Y_{n \times p} = W_{n \times n} Y_{n \times p} ρ_{p \times p} + A_{n \times (q + 1)} B_{(q + 1) \times p} + Ξ_{n \times p}

(5)

If the MSAR model is written in the form of a Vec operator and using a Kronecker product, the resulting equation is given by Eq. (6).

V e c (Y) = (ρ^{T} \otimes W) V e c (Y) + (I_{p} \otimes A) V e c (B) + V e c (Ξ)

V e c (Y) = {(I - (ρ^{T} \otimes W))}^{- 1} (I_{p} \otimes A) V e c (B) + {(I - (ρ^{T} \otimes W))}^{- 1} V e c (Ξ)

(6)
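The equivalence between the matrix form in Eq. (5) and the Vec form in Eq. (6) can be verified on simulated data. The sketch below is illustrative only: the ring-shaped weight matrix, the parameter values, and the helper names `ring_weights` and `vec` are assumptions, not taken from the study.

```python
import numpy as np

def ring_weights(n):
    """Hypothetical row-standardized W: each unit has two neighbours on a ring."""
    W = np.zeros((n, n))
    idx = np.arange(n)
    W[idx, (idx - 1) % n] = 0.5
    W[idx, (idx + 1) % n] = 0.5
    return W

def vec(X):
    """Column-stacking Vec operator."""
    return X.reshape(-1, order="F")

rng = np.random.default_rng(1)
n, q, p = 30, 2, 2
W = ring_weights(n)
A = np.column_stack([np.ones(n), rng.normal(size=(n, q))])
B = rng.normal(size=(q + 1, p))
rho = np.diag([0.4, -0.2])          # diagonal matrix of spatial effects
Xi = rng.normal(size=(n, p))

# Reduced form, Eq. (6): Vec(Y) = (I - rho^T ⊗ W)^{-1} [(I_p ⊗ A) Vec(B) + Vec(Xi)]
S = np.eye(n * p) - np.kron(rho.T, W)
vecY = np.linalg.solve(S, np.kron(np.eye(p), A) @ vec(B) + vec(Xi))
Y = vecY.reshape((n, p), order="F")

# Structural form, Eq. (5): Y = W Y rho + A B + Xi
rhs = W @ Y @ rho + A @ B + Xi

# Kronecker identity behind Eq. (6): Vec(W Y rho) = (rho^T ⊗ W) Vec(Y)
lag_kron = np.kron(rho.T, W) @ vec(Y)
lag_matrix = vec(W @ Y @ rho)
```

A response matrix generated from the reduced form satisfies the structural equation exactly, which is the sense in which the two lines of Eq. (6) are equivalent.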

The MSAR model assumes that the error term follows a multivariate normal distribution with mean $E (V e c (Ξ)) = 0$ and covariance matrix $V a r (V e c (Ξ)) = Σ \otimes I_{n}$ [16]. The expectation and variance of $V e c (Y)$ are shown in the following equations:

E (V e c (Y)) = {(I_{p n} - ρ^{T} \otimes W)}^{- 1} (I_{p} \otimes A) V e c (B)

V a r (V e c (Y)) = {(I_{p n} - (ρ^{T} \otimes W))}^{- 1} (Σ \otimes I_{n}) {(I_{p n} - (ρ \otimes W^{T}))}^{- 1}

Once the expectation and variance of $V e c (Y)$ are established, the distribution of $V e c (Y)$ is given by:

V e c (Y) \sim N_{p n} ({(I_{p n} - ρ^{T} \otimes W)}^{- 1} (I_{p} \otimes A) V e c (B), {(I_{p n} - ρ^{T} \otimes W)}^{- 1} (Σ \otimes I_{n}) {(I_{p n} - ρ \otimes W^{T})}^{- 1}) .

Parameter estimation of MSAR model

The MSAR parameters were estimated using the MLE estimation method combined with the numerical approximation of the concentrated log-likelihood using the L-BFGS-B optimization method. The MLE method is applied to estimate the regression coefficients and the variance-covariance matrix. The spatial effects were estimated by maximizing the concentrated log-likelihood function using the L-BFGS-B optimization method. The first step is to determine the likelihood function of the model in question. The likelihood function of the MSAR model is presented in Eq. (7).

\begin{matrix} L (V e c (B), Σ, ρ) & = & \prod_{i = 1}^{n} f (Y_{1 i}, Y_{2 i}, . . ., Y_{p i}) = f (V e c (Y)) \\ = & {(2 π)}^{- p n / 2} {| {(I_{p n} - ρ^{T} \otimes W)}^{- 1} (Σ \otimes I_{n}) {(I_{p n} - ρ \otimes W^{T})}^{- 1} |}^{- 1 / 2} \\ \exp (- \frac{1}{2} {(V e c (Y) - ({(I_{p n} - ρ^{T} \otimes W)}^{- 1} (I_{p} \otimes A) V e c (B)))}^{T} \\ {({(I_{p n} - ρ^{T} \otimes W)}^{- 1} (Σ \otimes I_{n}) {(I_{p n} - ρ \otimes W^{T})}^{- 1})}^{- 1} \\ (V e c (Y) - ({(I_{p n} - ρ^{T} \otimes W)}^{- 1} (I_{p} \otimes A) V e c (B)))) \end{matrix}

(7)

Furthermore, the likelihood function is formulated in the form of a natural logarithm likelihood, as illustrated in Eq. (8).

\begin{matrix} \ln L (V e c (B), Σ, ρ) & = & \ln (\prod_{i = 1}^{n} f (Y_{1 i}, Y_{2 i}, . . ., Y_{p i})) = \ln f (V e c (Y)) \\ = & - \frac{p n}{2} \ln (2 π) - \frac{1}{2} \ln (| {(I_{p n} - ρ^{T} \otimes W)}^{- 1} (Σ \otimes I_{n}) {(I_{p n} - ρ \otimes W^{T})}^{- 1} |) \\ - \frac{1}{2} {(V e c (Y) - ({(I_{p n} - ρ^{T} \otimes W)}^{- 1} (I_{p} \otimes A) V e c (B)))}^{T} \\ ((I_{p n} - ρ \otimes W^{T}) {(Σ \otimes I_{n})}^{- 1} (I_{p n} - ρ^{T} \otimes W)) \\ (V e c (Y) - ({(I_{p n} - ρ^{T} \otimes W)}^{- 1} (I_{p} \otimes A) V e c (B))) \end{matrix}

(8)

The subsequent stage in parameter estimation is to differentiate Eq. (8) with respect to the $V e c (B)$ parameter and set it equal to zero, thereby obtaining an estimator for the $V e c (B)$ parameter.

\begin{matrix} \frac{\partial \ln L (V e c (B), Σ, ρ)}{\partial V e c (B)} & = & - 2 ((I_{p} \otimes A^{T}) {(I_{p n} - ρ \otimes W^{T})}^{- 1} (I_{p n} - ρ \otimes W^{T}) (Σ^{- 1} \otimes I_{n}) \\ \times (I_{p n} - ρ^{T} \otimes W) (V e c (Y))) + 2 (I_{p} \otimes A^{T}) {(I_{p n} - ρ \otimes W^{T})}^{- 1} \\ \times (I_{p n} - ρ \otimes W^{T}) (Σ^{- 1} \otimes I_{n}) ((I_{p n} - ρ^{T} \otimes W)) \\ ({(I_{p n} - ρ^{T} \otimes W)}^{- 1} (I_{p} \otimes A) V e c (B)) = 0 \end{matrix}

Setting this first derivative to zero and simplifying yields the estimator of $V e c (B)$ , denoted by $V e c {(\hat{B})}_{i n i t i a l}$ , which is shown in Eq. (9).

V e c {(\hat{B})}_{i n i t i a l} = (I_{p} \otimes {(A^{T} A)}^{- 1} A^{T}) (I_{p n} - ρ^{T} \otimes W) (V e c (Y))

(9)

Once the $V e c (B)$ parameter estimator has been obtained, the $Σ$ parameter estimator can be calculated. The steps involved are the same as those used for the $V e c (B)$ parameter estimator. The first step is to substitute $V e c {(\hat{B})}_{i n i t i a l}$ for $V e c (B)$ in the ln-likelihood function in Eq. (8), which gives Eq. (10).

\begin{matrix} \ln L (V e c {(\hat{B})}_{i n i t i a l}, Σ, ρ) & = & - \frac{p n}{2} \ln (2 π) - \frac{1}{2} \ln (| {(I_{p n} - ρ^{T} \otimes W)}^{- 1} (Σ \otimes I_{n}) {(I_{p n} - ρ \otimes W^{T})}^{- 1} |) \\ - \frac{1}{2} (V e c (Y) - {(I_{p n} - ρ^{T} \otimes W)}^{- 1} (I_{p} \otimes A {(A^{T} A)}^{- 1} A^{T}) \times \\ {(I_{p n} - ρ^{T} \otimes W) (V e c (Y)))}^{T} ((I_{p n} - ρ \otimes W^{T}) {(Σ \otimes I_{n})}^{- 1} \times \\ (I_{p n} - ρ^{T} \otimes W)) (V e c (Y) - {(I_{p n} - ρ^{T} \otimes W)}^{- 1} \times \\ (I_{p} \otimes A {(A^{T} A)}^{- 1} A^{T}) (I_{p n} - ρ^{T} \otimes W) (V e c (Y))) . \end{matrix}

(10)

Letting $M = A {(A^{T} A)}^{- 1} A^{T}$ , Eq. (10) can be rewritten as Eq. (11).

\begin{matrix} \ln L (V e c {(\hat{B})}_{i n i t i a l}, Σ, ρ) & = & - \frac{p n}{2} \ln (2 π) - \frac{1}{2} \ln (| {(I_{p n} - ρ^{T} \otimes W)}^{- 1} (Σ \otimes I_{n}) {(I_{p n} - ρ \otimes W^{T})}^{- 1} |) \\ - \frac{1}{2} {(V e c (Y) - {(I_{p n} - ρ^{T} \otimes W)}^{- 1} (I_{p} \otimes M) (I_{p n} - ρ^{T} \otimes W) (V e c (Y)))}^{T} \\ ((I_{p n} - ρ \otimes W^{T}) {(Σ \otimes I_{n})}^{- 1} (I_{p n} - ρ^{T} \otimes W)) \\ (V e c (Y) - {(I_{p n} - ρ^{T} \otimes W)}^{- 1} (I_{p} \otimes M) (I_{p n} - ρ^{T} \otimes W) (V e c (Y))) . \end{matrix}

(11)

In Eq. (11), the quadratic form in the final term is a scalar, which can be regarded as a $(1 \times 1)$ matrix [24]. Consequently, its trace equals the scalar itself. Hence, Eq. (11) can be rewritten as Eq. (12) and simplified using the cyclic property of the trace.

\begin{matrix} \ln L (Σ, ρ) & = & - \frac{p n}{2} \ln (2 π) - \frac{1}{2} \ln (| {(I_{p n} - ρ^{T} \otimes W)}^{- 1} (Σ \otimes I_{n}) {(I_{p n} - ρ \otimes W^{T})}^{- 1} |) \\ - \frac{1}{2} t r [((I_{p n} - ρ \otimes W^{T}) {(Σ \otimes I_{n})}^{- 1} (I_{p n} - ρ^{T} \otimes W)) \times \\ (V e c (Y) - {(I_{p n} - ρ^{T} \otimes W)}^{- 1} (I_{p} \otimes M) (I_{p n} - ρ^{T} \otimes W) (V e c (Y))) \\ \times {(V e c (Y) - {(I_{p n} - ρ^{T} \otimes W)}^{- 1} (I_{p} \otimes M) (I_{p n} - ρ^{T} \otimes W) (V e c (Y)))}^{T}] \end{matrix}

(12)

Subsequently, the ln-likelihood function in Eq. (12) is differentiated with respect to $σ_{j j *}$ , as shown in the following equation, where $T_{j j *}$ is a $(p \times p)$ symmetric matrix with 1 in positions $(j, j^{*})$ and $(j^{*}, j)$ and 0 in all other positions.

\begin{matrix} \frac{\partial \ln (L (Σ, ρ))}{\partial σ_{j j *}} & = & - \frac{1}{2} t r [(Σ^{- 1} T_{j j *} \otimes I_{n})] \\ + \frac{1}{2} t r [(I_{p n} - ρ \otimes W^{T}) (Σ^{- 1} T_{j j *} Σ^{- 1} \otimes I_{n}) (I_{p n} - ρ^{T} \otimes W) \times \\ (V e c (Y) - {(I_{p n} - ρ^{T} \otimes W)}^{- 1} (I_{p} \otimes M) (I_{p n} - ρ^{T} \otimes W) (V e c (Y))) \\ \times {(V e c (Y) - {(I_{p n} - ρ^{T} \otimes W)}^{- 1} (I_{p} \otimes M) (I_{p n} - ρ^{T} \otimes W) (V e c (Y)))}^{T}] = 0 \end{matrix}

Setting the partial derivative to zero and equating the two trace terms gives Eq. (13), from which the estimator of the parameter $Σ$ is obtained.

\begin{matrix} \frac{1}{2} t r [(Σ^{- 1} T_{j j *} \otimes I_{n})] & = & \frac{1}{2} t r [(Σ^{- 1} T_{j j *} Σ^{- 1} \otimes I_{n}) (I_{p n} - ρ^{T} \otimes W) (V e c (Y) - {(I_{p n} - ρ^{T} \otimes W)}^{- 1} \\ \times (I_{p} \otimes M) (I_{p n} - ρ^{T} \otimes W) (V e c (Y))) (V e c (Y) - {(I_{p n} - ρ^{T} \otimes W)}^{- 1} \\ \times {(I_{p} \otimes M) (I_{p n} - ρ^{T} \otimes W) (V e c (Y)))}^{T} (I_{p n} - ρ \otimes W^{T})] \end{matrix}

(13)

Let $Φ_{i n i t i a l}$ be defined as follows.

\begin{matrix} Φ_{i n i t i a l} & = & (I_{p n} - ρ^{T} \otimes W) (V e c (Y) - {(I_{p n} - ρ^{T} \otimes W)}^{- 1} (I_{p} \otimes M) \\ \times (I_{p n} - ρ^{T} \otimes W) (V e c (Y))) (V e c (Y) - {(I_{p n} - ρ^{T} \otimes W)}^{- 1} (I_{p} \otimes M) \\ \times {(I_{p n} - ρ^{T} \otimes W) (V e c (Y)))}^{T} (I_{p n} - ρ \otimes W^{T}) \end{matrix}

Given that $Φ_{i n i t i a l}$ is a $(p n \times p n)$ matrix, the subsequent step is to reduce it to a $(p \times p)$ matrix, denoted by ${\hat{Σ}}_{i n i t i a l}$ , and to relate the two through the Kronecker product with an $(n \times n)$ identity matrix.

\underset{p n \times p n}{{\hat{Φ}}_{i n i t i a l}} = \underset{p \times p}{{\hat{Σ}}_{i n i t i a l}} \otimes \underset{n \times n}{I_{n}}

(14)

Substituting Eq. (14) into Eq. (13) gives the following equation.

\begin{matrix} \frac{1}{2} t r [({\hat{Σ}}^{- 1} T_{j j *} \otimes I_{n})] = \frac{1}{2} t r [({\hat{Σ}}^{- 1} T_{j j *} {\hat{Σ}}^{- 1} \otimes I_{n}) ({\hat{Σ}}_{i n i t i a l} \otimes I_{n})] \\ t r [({\hat{Σ}}^{- 1} T_{j j *} \otimes I_{n})] = t r [({\hat{Σ}}^{- 1} T_{j j *} \otimes I_{n})] \end{matrix}

Based on this result, the $Σ$ parameter estimator can be approximated by Eq. (14). The $\hat{Σ}$ matrix is formed from the $(n \times n)$ blocks of ${\hat{Φ}}_{i n i t i a l}$ . Suppose ${\hat{Φ}}_{i n i t i a l}$ has the following block structure, where each block ${\hat{Φ}}_{i n i t i a l j j *}$ is an $(n \times n)$ matrix.

{\hat{Φ}}_{i n i t i a l} = [\begin{matrix} {\hat{Φ}}_{i n i t i a l 11} & {\hat{Φ}}_{i n i t i a l 12} & \dots & {\hat{Φ}}_{i n i t i a l 1 p} \\ {\hat{Φ}}_{i n i t i a l 21} & {\hat{Φ}}_{i n i t i a l 22} & \dots & {\hat{Φ}}_{i n i t i a l 2 p} \\ ⋮ & ⋮ & ⋱ & ⋮ \\ {\hat{Φ}}_{i n i t i a l p 1} & {\hat{Φ}}_{i n i t i a l p 2} & \dots & {\hat{Φ}}_{i n i t i a l p p} \end{matrix}]

Each element of the $\hat{Σ}$ matrix is obtained from the main diagonal elements of the corresponding block ${\hat{Φ}}_{i n i t i a l j j *}$ , namely the trace of the block divided by n. Thus, the parameter estimator is given by Eq. (15).

{\hat{Σ}}_{i n i t i a l} = [\begin{matrix} t r ({\hat{Φ}}_{i n i t i a l 11}) / n & t r ({\hat{Φ}}_{i n i t i a l 12}) / n & \dots & t r ({\hat{Φ}}_{i n i t i a l 1 p}) / n \\ t r ({\hat{Φ}}_{i n i t i a l 21}) / n & t r ({\hat{Φ}}_{i n i t i a l 22}) / n & \dots & t r ({\hat{Φ}}_{i n i t i a l 2 p}) / n \\ ⋮ & ⋮ & ⋱ & ⋮ \\ t r ({\hat{Φ}}_{i n i t i a l p 1}) / n & t r ({\hat{Φ}}_{i n i t i a l p 2}) / n & \dots & t r ({\hat{Φ}}_{i n i t i a l p p}) / n \end{matrix}]

(15)
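For a fixed value of ρ, Eqs. (9) and (15) reduce to ordinary matrix operations on the spatially filtered responses $Y - W Y ρ$. The sketch below illustrates this under stated assumptions: it uses NumPy, simulated inputs, and a hypothetical function name `profile_estimates`. It relies on the observation that ${\hat{Φ}}_{i n i t i a l}$ is the outer product of the stacked residual vector with itself, so the trace of block $(j, j^{*})$ is the inner product of residual columns j and j*.

```python
import numpy as np

def profile_estimates(rho_diag, Y, A, W):
    """B_hat(rho) as in Eq. (9) and Sigma_hat(rho) as in Eq. (15) for fixed rho."""
    n = Y.shape[0]
    Z = Y - W @ Y @ np.diag(rho_diag)      # spatially filtered responses
    B_hat, *_ = np.linalg.lstsq(A, Z, rcond=None)
    E = Z - A @ B_hat                       # residual matrix (I - M) Z
    return B_hat, E.T @ E / n, E

# Simulated inputs (assumption: illustrative values only)
rng = np.random.default_rng(2)
n, p = 12, 2
W = rng.random((n, n))
np.fill_diagonal(W, 0.0)
W = W / W.sum(axis=1, keepdims=True)        # row-standardized weights
A = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
Y = rng.normal(size=(n, p))
rho_diag = np.array([0.3, 0.1])

B_hat, Sigma_hat, E = profile_estimates(rho_diag, Y, A, W)

# Block construction of Eq. (15): Phi = Vec(E) Vec(E)^T, Sigma_jk = tr(Phi_jk) / n
e = E.reshape(-1, order="F")
Phi = np.outer(e, e)
Sigma_from_blocks = np.array(
    [[np.trace(Phi[j * n:(j + 1) * n, k * n:(k + 1) * n]) / n for k in range(p)]
     for j in range(p)]
)
```

Both routes give the same $\hat{Σ}$; computing `E.T @ E / n` directly avoids forming the $(p n \times p n)$ matrix.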

Eqs. (9) and (15) show that the estimators of $V e c (B)$ and $Σ$ depend on $ρ$ , which has no closed-form solution, so a numerical approach is required. The numerical approach to estimating $ρ$ maximizes the concentrated log-likelihood with the L-BFGS-B optimization method. The concentrated log-likelihood function for $ρ$ is obtained by substituting the $V e c {(\hat{B})}_{i n i t i a l}$ and ${\hat{Σ}}_{i n i t i a l}$ estimates into the log-likelihood function, as shown in Eq. (16).

\begin{matrix} \ln L^{c o n} (ρ) & = & \ln L (V e c {(\hat{B})}_{M Re g}, {\hat{B}}_{W Y}, {\hat{Σ}}_{c o n}, ρ) \\ = & - \frac{p n}{2} \ln (2 π) - \frac{1}{2} \ln (| {(I_{p n} - ρ^{T} \otimes W)}^{- 1} ({\hat{Σ}}_{c o n} \otimes I_{n}) {(I_{p n} - ρ \otimes W^{T})}^{- 1} |) \\ - \frac{1}{2} {(V e c (Y) - ({(I_{p n} - ρ^{T} \otimes W)}^{- 1} (I_{p} \otimes A) (V e c {(\hat{B})}_{M Re g} - (I_{p} \otimes {\hat{B}}_{W Y}) V e c (ρ))))}^{T} \\ ((I_{p n} - ρ \otimes W^{T}) {({\hat{Σ}}_{c o n} \otimes I_{n})}^{- 1} (I_{p n} - ρ^{T} \otimes W)) \\ (V e c (Y) - ({(I_{p n} - ρ^{T} \otimes W)}^{- 1} (I_{p} \otimes A) (V e c {(\hat{B})}_{M Re g} - (I_{p} \otimes {\hat{B}}_{W Y}) V e c (ρ)))) \end{matrix}

(16)

where ${\hat{Σ}}_{c o n} = [\begin{matrix} t r ({\hat{Φ}}_{c o n 11}) / n & t r ({\hat{Φ}}_{c o n 12}) / n & \dots & t r ({\hat{Φ}}_{c o n 1 p}) / n \\ t r ({\hat{Φ}}_{c o n 21}) / n & t r ({\hat{Φ}}_{c o n 22}) / n & \dots & t r ({\hat{Φ}}_{c o n 2 p}) / n \\ ⋮ & ⋮ & ⋱ & ⋮ \\ t r ({\hat{Φ}}_{c o n p 1}) / n & t r ({\hat{Φ}}_{c o n p 2}) / n & \dots & t r ({\hat{Φ}}_{c o n p p}) / n \end{matrix}]$ with

\begin{matrix} {\hat{Φ}}_{c o n} & = & (I_{p n} - ρ^{T} \otimes W) (V e c (Y) - {(I_{p n} - ρ^{T} \otimes W)}^{- 1} (I_{p} \otimes A) (V e c {(\hat{B})}_{M Re g} - (I_{p} \otimes {\hat{B}}_{W Y}) V e c (ρ))) \\ {(V e c (Y) - {(I_{p n} - ρ^{T} \otimes W)}^{- 1} (I_{p} \otimes A) (V e c {(\hat{B})}_{M Re g} - (I_{p} \otimes {\hat{B}}_{W Y}) V e c (ρ)))}^{T} (I_{p n} - ρ \otimes W^{T}) . \end{matrix}

Eq. (16) represents the concentrated log-likelihood function. This function cannot be maximized analytically, so a numerical approach with the L-BFGS-B optimization method is needed. The following steps outline the numerical procedure for maximizing the concentrated log-likelihood, thereby obtaining the value of $\hat{ρ}$ :

a. Generate a sequence of candidate values for $ρ_{j}; j = 1, 2, \dots, p$ , where $ρ_{j} =$ seq(start value, end value, increment), and substitute each into the rho matrix $ρ = d i a g (ρ_{1 k}, ρ_{2 k}, \dots, ρ_{p k}); k = 1, 2, \dots, n (s e q (ρ_{j}))$ , where $n (s e q (ρ_{j}))$ is the number of values in the sequence.
b. Perform the multivariate regression of $V e c (Y)$ on $(I_{p} \otimes A)$ to obtain $V e c {(\hat{B})}_{M Re g}$ .
c. Regress WY on A to obtain ${\hat{B}}_{W Y}$ , which is a $(q + 1) \times p$ matrix.
d. Substitute $V e c {(\hat{B})}_{M Re g}$ and ${\hat{B}}_{W Y}$ into the concentrated log-likelihood function.
e. Identify the value of $ρ$ that gives the maximum $\ln L^{c o n}$ ; this value is $\hat{ρ}$ .
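The procedure above can be sketched in Python. This is a hedged illustration rather than the authors' implementation: it assumes SciPy's `minimize` with `method="L-BFGS-B"`, a hypothetical ring-shaped row-standardized W, simulated data, and the function names `concentrated_loglik` and `fit_rho`. It also uses two simplifications that follow from the formulas in this section: because ρ is diagonal, $\ln | I_{p n} - ρ^{T} \otimes W | = \sum_{j} \ln | I_{n} - ρ_{j} W |$, and at ${\hat{Σ}}_{c o n}$ the quadratic form equals $p n$.

```python
import numpy as np
from scipy.optimize import minimize

def concentrated_loglik(rho_diag, Y, A, W):
    """ln L^con(rho), with B and Sigma replaced by their estimates at rho."""
    n, p = Y.shape
    rho_diag = np.asarray(rho_diag, dtype=float)
    # Steps b and c: B_hat(rho) = B_MReg - B_WY diag(rho)
    B_mreg, *_ = np.linalg.lstsq(A, Y, rcond=None)
    B_wy, *_ = np.linalg.lstsq(A, W @ Y, rcond=None)
    B_hat = B_mreg - B_wy * rho_diag          # scales column j by rho_j
    E = (Y - (W @ Y) * rho_diag) - A @ B_hat  # residuals of filtered responses
    Sigma_hat = E.T @ E / n
    # Jacobian term; assumes I_n - rho_j W stays nonsingular inside the bounds
    log_jac = sum(np.linalg.slogdet(np.eye(n) - r * W)[1] for r in rho_diag)
    logdet_sigma = np.linalg.slogdet(Sigma_hat)[1]
    return (-0.5 * n * p * (np.log(2.0 * np.pi) + 1.0)
            - 0.5 * n * logdet_sigma + log_jac)

def fit_rho(Y, A, W, bounds=(-0.99, 0.99)):
    """Step e: maximize ln L^con over rho with box constraints (L-BFGS-B)."""
    p = Y.shape[1]
    res = minimize(lambda r: -concentrated_loglik(r, Y, A, W),
                   x0=np.zeros(p), method="L-BFGS-B", bounds=[bounds] * p)
    return res.x, -res.fun

# Simulated example (assumption: all values below are illustrative)
rng = np.random.default_rng(3)
n, p = 80, 2
idx = np.arange(n)
W = np.zeros((n, n))
W[idx, (idx - 1) % n] = 0.5
W[idx, (idx + 1) % n] = 0.5
A = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
B = np.array([[1.0, -1.0], [0.5, 0.3], [-0.4, 0.8]])
rho_true = np.array([0.5, 0.2])
Xi = rng.normal(size=(n, p))
Y = np.column_stack([
    np.linalg.solve(np.eye(n) - rho_true[j] * W, A @ B[:, j] + Xi[:, j])
    for j in range(p)
])

rho_hat, loglik_hat = fit_rho(Y, A, W)
```

The optimizer replaces the explicit grid of step a with a gradient-based search inside the box constraints; a coarse grid can still be useful for choosing starting values when the likelihood surface is suspected to have several local maxima.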

Properties of estimator

The regression coefficients of the MSAR model are estimated using Eq. (9). $V e c (\hat{B})$ is shown to be both unbiased and consistent. An estimator is considered unbiased if its expected value equals the true parameter, and consistent if it converges to the true parameter as the sample size increases. The proof is presented as follows.

\begin{matrix} E (V e c (\hat{B})) = E ((I_{p} \otimes {(A^{T} A)}^{- 1} A^{T}) (I_{p n} - ρ^{T} \otimes W) (V e c (Y))) \\ = (I_{p} \otimes {(A^{T} A)}^{- 1} A^{T}) (I_{p n} - ρ^{T} \otimes W) E (V e c (Y)) \\ = (I_{p} \otimes {(A^{T} A)}^{- 1} A^{T}) (I_{p n} - ρ^{T} \otimes W) {(I_{p n} - ρ^{T} \otimes W)}^{- 1} (I_{p} \otimes A) V e c (B) \\ = (I_{p} \otimes {(A^{T} A)}^{- 1} A^{T}) (I_{p} \otimes A) V e c (B) \\ = (I_{p} \otimes {(A^{T} A)}^{- 1} A^{T} A) V e c (B) \\ = V e c (B) \end{matrix}

Since the expectation of $V e c (\hat{B})$ equals $V e c (B)$ , it follows that $V e c (\hat{B})$ is an unbiased estimator of $V e c (B)$ .

Next, the consistency of $V e c (\hat{B})$ is shown below.

\begin{matrix} V a r (V e c (\hat{B})) & = & V a r ((I_{p} \otimes {(A^{T} A)}^{- 1} A^{T}) (I_{p n} - ρ^{T} \otimes W) (V e c (Y))) \\ = & [\begin{matrix} t r ({\hat{Φ}}_{11}) / n & t r ({\hat{Φ}}_{12}) / n & \dots & t r ({\hat{Φ}}_{1 p}) / n \\ t r ({\hat{Φ}}_{21}) / n & t r ({\hat{Φ}}_{22}) / n & \dots & t r ({\hat{Φ}}_{2 p}) / n \\ ⋮ & ⋮ & ⋱ & ⋮ \\ t r ({\hat{Φ}}_{p 1}) / n & t r ({\hat{Φ}}_{p 2}) / n & \dots & t r ({\hat{Φ}}_{p p}) / n \end{matrix}] \otimes {(A^{T} A)}^{- 1} \end{matrix}

\begin{matrix} lim_{n \to \infty} V a r (V e c (\hat{B})) & = & lim_{n \to \infty} [\begin{matrix} t r ({\hat{Φ}}_{11}) / n & t r ({\hat{Φ}}_{12}) / n & \dots & t r ({\hat{Φ}}_{1 p}) / n \\ t r ({\hat{Φ}}_{21}) / n & t r ({\hat{Φ}}_{22}) / n & \dots & t r ({\hat{Φ}}_{2 p}) / n \\ ⋮ & ⋮ & ⋱ & ⋮ \\ t r ({\hat{Φ}}_{p 1}) / n & t r ({\hat{Φ}}_{p 2}) / n & \dots & t r ({\hat{Φ}}_{p p}) / n \end{matrix}] \otimes lim_{n \to \infty} {(A^{T} A)}^{- 1} \\ = & 0 \otimes {(A^{T} A)}^{- 1} \\ = & 0 \end{matrix}

It can be concluded that $V e c (\hat{B})$ is an unbiased and consistent estimator.
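The unbiasedness result can be illustrated with a small Monte Carlo experiment. The sketch below is an assumption-laden illustration: it treats ρ as known, as the proof does, and uses a hypothetical ring-shaped W, made-up coefficients, and 400 replications.

```python
import numpy as np

rng = np.random.default_rng(4)
n, p, reps = 60, 2, 400

# Hypothetical row-standardized ring weights
idx = np.arange(n)
W = np.zeros((n, n))
W[idx, (idx - 1) % n] = 0.5
W[idx, (idx + 1) % n] = 0.5

A = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
B = np.array([[1.0, -1.0], [0.5, 0.3], [-0.4, 0.8]])   # illustrative true values
rho = np.array([0.5, 0.2])                              # treated as known

S_inv = [np.linalg.inv(np.eye(n) - r * W) for r in rho]
P = np.linalg.solve(A.T @ A, A.T)                       # (A^T A)^{-1} A^T

B_sum = np.zeros_like(B)
for _ in range(reps):
    Xi = rng.normal(size=(n, p))
    # Generate Y from the reduced form, one response column at a time
    Y = np.column_stack([S_inv[j] @ (A @ B[:, j] + Xi[:, j]) for j in range(p)])
    # Eq. (9) in matrix form: B_hat = (A^T A)^{-1} A^T (Y - W Y rho)
    B_sum += P @ (Y - (W @ Y) * rho)

B_mean = B_sum / reps
max_bias = np.abs(B_mean - B).max()
```

With ρ known, the average of the estimates is close to the true coefficient matrix, in line with the derivation. When ρ is replaced by its estimate, the finite-sample behaviour may differ, and this sketch does not address that case.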

Hypothesis testing of MSAR model

Hypothesis testing of the MSAR model parameters is conducted both simultaneously and partially. The MLRT is applied for simultaneous testing, while the Wald test is used for partial parameter testing [26,27]. The hypothesis for simultaneous testing of the model parameters is formulated as follows:

\begin{matrix} H_{0} : β_{1 j} = β_{2 j} = \dots = β_{q j} = 0, j = 1, 2, \dots, p \\ H_{1} : \text{at least one } β_{k j} \neq 0, k = 1, 2, \dots, q, j = 1, 2, \dots, p \end{matrix}

The set of parameters under the population, denoted by $Ω_{M S A R}$ , is given by $Ω_{M S A R} = \{V e c (B), V e c (Σ), ρ\}$ , while the set of parameters under H₀, denoted by $ω_{M S A R}$ , is given by $ω_{M S A R} = \{β_{0 ω}, V e c (Σ_{ω}), ρ_{0 ω}\}$ . The parameter estimators for the two sets, ${\hat{Ω}}_{M S A R}$ and ${\hat{ω}}_{M S A R}$ , are obtained using the MLE method described in the previous section. The LRT is based on the likelihood ratio in Eq. (17).

L R = \frac{L ({\hat{ω}}_{M S A R})}{L ({\hat{Ω}}_{M S A R})} < L R_{0}

(17)

where $L ({\hat{ω}}_{M S A R})$ is the likelihood of the MSAR model evaluated at the parameters estimated under H₀ and $L ({\hat{Ω}}_{M S A R})$ is the likelihood of the MSAR model evaluated at the parameters estimated under the population. Consequently, the test statistic for testing the parameters simultaneously using the MLRT is presented in Eq. (18).

G_{M S A R}^{2} = - 2 (\ln (L ({\hat{ω}}_{M S A R})) - \ln (L ({\hat{Ω}}_{M S A R})))

(18)

The critical regions for hypothesis testing are as follows:

\begin{matrix} α & = & P (L R < L R_{0}), 0 < L R_{0} \leq 1 \\ = & P (\ln L R < \ln L R_{0}) \\ = & P (- 2 \ln L R > - 2 \ln L R_{0}) \\ = & P (G_{M S A R}^{2} > χ_{(α, d f)}^{2}) \end{matrix}

$G_{M S A R}^{2}$ follows a chi-square distribution as $n \to \infty$ , so the H₀ rejection region is $G_{M S A R}^{2} > χ_{(α, d f)}^{2}$ or p-value $< α$ , where the degrees of freedom (df) equal the number of parameters under the population minus the number of parameters under H₀.

\begin{matrix} d f & = & n ({\hat{Ω}}_{M S A R}) - n ({\hat{ω}}_{M S A R}) \\ = & (p (q + 1) + 2 p + p) - (p + 2 p + p) = p q \end{matrix}
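The decision rule for the simultaneous test can be written as a short function. This is a minimal sketch assuming SciPy's `chi2` distribution; the function name `lrt` and the log-likelihood values in the example are hypothetical and are not results from this study.

```python
from scipy.stats import chi2

def lrt(loglik_null, loglik_full, p, q, alpha=0.05):
    """Likelihood ratio test of Eq. (18) with df = p * q."""
    g2 = -2.0 * (loglik_null - loglik_full)
    df = p * q
    critical = chi2.ppf(1.0 - alpha, df)    # upper-alpha chi-square quantile
    p_value = chi2.sf(g2, df)               # P(chi2_df > G^2)
    return {"G2": g2, "df": df, "critical": critical,
            "p_value": p_value, "reject_H0": bool(g2 > critical)}

# Hypothetical log-likelihoods for a model with p = 2 responses, q = 5 predictors
out = lrt(loglik_null=-250.0, loglik_full=-230.0, p=2, q=5)
```

Comparing $G_{M S A R}^{2}$ with the critical value and comparing the p-value with α are equivalent ways of stating the same rejection rule.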

Once the null hypothesis (H₀) is rejected in the simultaneous test, partial hypothesis testing is conducted to identify which predictor variables exert a statistically significant influence on the response variable. The first partial test focuses on the spatial autoregressive parameter ρ, formulated under the following hypothesis framework:

\begin{matrix} H_{0} : ρ_{j} = 0 \\ H_{1} : ρ_{j} \neq 0; j = 1, 2, . . ., p \end{matrix}

The test statistic for this hypothesis under the Wald test is shown in Eq. (19).

W a l d_{ρ_{j}} = {(\frac{{\hat{ρ}}_{j}}{\hat{s e} ({\hat{ρ}}_{j})})}^{2} \sim χ_{1}^{2}

(19)

where $\hat{s e} (\hat{ρ_{j}})$ is the square root of $\hat{V a r} (\hat{ρ_{j}})$ . The $\hat{V a r} (\hat{ρ_{j}})$ value is the main diagonal element, corresponding to $\hat{ρ_{j}}$ , of the negative inverse Hessian matrix $- {(H (\hat{ρ}))}^{- 1}$ . The Wald test statistic in Eq. (19) is deemed statistically significant if $W a l d_{ρ_{j}} > χ_{α, 1}^{2}$ , in which case the null hypothesis (H₀) is rejected.

Moreover, the partial testing of $β_{k j}$ parameters is conducted with the objective of identifying the parameters that exert a significant influence on the model. The following hypothesis is employed to test the partial $β_{k j}$ parameters:

\begin{matrix} H_{0} : β_{k j} = 0 \\ H_{1} : β_{k j} \neq 0, k = 1, 2, \dots, q, j = 1, 2, . . ., p \end{matrix}

The test statistic for the partial test of the $β_{k j}$ parameters under the Wald test is shown in Eq. (20).

W a l d_{β_{k j}} = {(\frac{{\hat{β}}_{k j}}{\hat{s e} ({\hat{β}}_{k j})})}^{2} \sim χ_{1}^{2}

(20)

In this context, the term $\hat{s e} (\hat{β_{k j}})$ represents the standard error of ${\hat{β}}_{k j}$ , obtained as the square root of $\hat{V a r} (\hat{β_{k j}})$ . $\hat{V a r} (\hat{β_{k j}})$ is the main diagonal element of the variance-covariance matrix $\hat{V a r} (V e c (\hat{B}))$ that corresponds to ${\hat{β}}_{k j}$ . The null hypothesis (H₀) is rejected if $W a l d_{β_{k j}} > χ_{α, 1}^{2}$ .
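Both Wald tests reduce to the same computation once a standard error is available. The sketch below is illustrative and rests on assumptions: the Hessian is approximated by central finite differences (the source does not state how it is computed), the coefficient covariance uses $\hat{Σ} \otimes {(A^{T} A)}^{- 1}$ as in the consistency derivation, and all function names and numeric inputs are hypothetical.

```python
import numpy as np
from scipy.stats import chi2

def wald_test(estimate, std_error, alpha=0.05):
    """Wald statistic (estimate / se)^2, compared with chi-square, 1 df."""
    stat = (estimate / std_error) ** 2
    p_value = chi2.sf(stat, df=1)
    return stat, p_value, bool(stat > chi2.ppf(1.0 - alpha, df=1))

def numerical_hessian(f, x, h=1e-4):
    """Central finite-difference Hessian of a scalar function f at x."""
    x = np.asarray(x, dtype=float)
    k = x.size
    steps = np.eye(k) * h
    H = np.zeros((k, k))
    for i in range(k):
        for j in range(k):
            H[i, j] = (f(x + steps[i] + steps[j]) - f(x + steps[i] - steps[j])
                       - f(x - steps[i] + steps[j]) + f(x - steps[i] - steps[j])
                       ) / (4.0 * h * h)
    return H

def rho_std_errors(loglik, rho_hat):
    """se(rho_j): square roots of the diagonal of -H(rho_hat)^{-1}."""
    H = numerical_hessian(loglik, rho_hat)
    return np.sqrt(np.diag(-np.linalg.inv(H)))

def beta_std_errors(Sigma_hat, A):
    """se(beta_kj) from Sigma_hat ⊗ (A^T A)^{-1}; returns a (q+1) x p array."""
    cov = np.kron(Sigma_hat, np.linalg.inv(A.T @ A))
    p = Sigma_hat.shape[0]
    return np.sqrt(np.diag(cov)).reshape((-1, p), order="F")

# Hypothetical estimate and standard error, for illustration only
stat, p_value, reject = wald_test(estimate=0.42, std_error=0.15)
```

In use, `loglik` would be the concentrated log-likelihood as a function of ρ, evaluated at its maximizer, where the Hessian is negative definite and the square roots are well defined.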

Measures of model fits

To select the most suitable regression model, two commonly used evaluation metrics are the Root Mean Square Error (RMSE) and the coefficient of determination (R²). RMSE represents the average prediction error of the model, expressed in the same unit as the response variable. Models with lower RMSE values are preferred, as they indicate predictions that are closer to the observed values, reflecting a better model fit. Meanwhile, R² measures the proportion of variance in the response variable that can be explained by the predictor variables [[28], [29], [30]]. A higher R² value signifies greater explanatory power and stronger predictive performance of the model. The formulas for RMSE and R² are provided in Eqs. (21) and (22) [[31], [32], [33]].

R M S E = R M S E_{1} + R M S E_{2} = \sqrt{\frac{1}{n} \sum_{i = 1}^{n} {(Y_{1 i} - {\hat{Y}}_{1 i})}^{2}} + \sqrt{\frac{1}{n} \sum_{i = 1}^{n} {(Y_{2 i} - {\hat{Y}}_{2 i})}^{2}}

(21)

R^{2} = 1 - \frac{S S E}{S S T} = 1 - \frac{\sum_{i = 1}^{n} {(Y_{1 i} - {\hat{Y}}_{1 i})}^{2} + \sum_{i = 1}^{n} {(Y_{2 i} - {\hat{Y}}_{2 i})}^{2}}{\sum_{i = 1}^{n} {(Y_{1 i} - {\bar{Y}}_{1})}^{2} + \sum_{i = 1}^{n} {(Y_{2 i} - {\bar{Y}}_{2})}^{2}}

(22)
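As a minimal sketch of Eqs. (21) and (22), the two measures can be computed as follows. The function names `rmse_sum` and `pooled_r2` and the toy numbers are hypothetical and serve only to show how the per-response errors are combined; they are not the study data.

```python
import math


def rmse_sum(observed, fitted):
    """Eq. (21): sum of the per-response RMSE values."""
    total = 0.0
    for y, y_hat in zip(observed, fitted):
        squared_errors = [(a - b) ** 2 for a, b in zip(y, y_hat)]
        total += math.sqrt(sum(squared_errors) / len(y))
    return total


def pooled_r2(observed, fitted):
    """Eq. (22): 1 - SSE/SST with both sums pooled over the responses."""
    sse = 0.0
    sst = 0.0
    for y, y_hat in zip(observed, fitted):
        mean_y = sum(y) / len(y)
        sse += sum((a - b) ** 2 for a, b in zip(y, y_hat))
        sst += sum((a - mean_y) ** 2 for a in y)
    return 1.0 - sse / sst


# Hypothetical toy data: two responses observed on three units.
observed = [[1.0, 2.0, 3.0], [0.0, 2.0, 4.0]]
fitted = [[1.5, 2.0, 2.5], [0.5, 2.0, 3.5]]
print(round(rmse_sum(observed, fitted), 4), round(pooled_r2(observed, fitted), 4))
```

Because Eq. (21) adds the two RMSE values, the result is on the combined scale of both responses rather than an average over them.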

Data analysis procedure

The analysis was conducted through the following steps:

1. Check the correlation between the response variables.
2. Test for multivariate normal distribution.
3. Model the data using multivariate normal linear regression.
4. Perform spatial weighting.
5. Perform spatial dependency testing.
6. Estimate the parameters of the MSAR model.
7. Conduct simultaneous hypothesis testing using the test statistic in Eq. (18).
8. Conduct partial hypothesis testing using the test statistics in Eqs. (19) and (20).
9. Evaluate model fit using Eqs. (21) and (22).
10. Interpret the results and draw conclusions.

Method validation

To validate the application of the MSAR method, we used a real-world dataset on health issues in children under five years old.

Data set

The dataset used in this study was obtained from the Center for the Study of Regional Resources and Community Empowerment at Institut Teknologi Sepuluh Nopember, Surabaya, Indonesia. The data are secondary in nature and pertain to the year 2023. Observations were collected from 54 villages located across four sub-districts in Tuban Regency, East Java—namely Singgahan, Kerek, Montong, and Senori—as illustrated in Fig. 1.

Fig. 1. Administrative map of the 54 villages (shaded pink) in Tuban Regency.

The response variables selected for analysis are the percentage of cases of pneumonia and diarrhea in children under five years old. These two response variables were found to have a correlation coefficient of 0.585. The predictor variables used in this study include the percentage of infants who received exclusive breastfeeding, the percentage of children under five who received complete basic immunization, the percentage who received vitamin A supplementation, the percentage of pregnant women who attended government-sponsored prenatal classes, and the percentage of households with access to clean water. A summary of the research data is presented in Table 1.

Table 1.

Descriptive statistics of research data.

Variable	Description	Mean	SD	Min	Max
Response	Pneumonia in toddlers (Y₁) (%)	4.82	4.91	0.00	16.68
Response	Diarrhea in toddlers (Y₂) (%)	12.99	8.10	0.83	30.61
Predictor	Exclusive breastfeeding (X₁) (10 %)	2.53	2.40	0.00	12.22
Predictor	Complete basic immunization (X₂) (%)	22.56	6.70	6.12	50.00
Predictor	Toddlers who received vit. A (X₃) (10 %)	13.04	10.80	1.40	81.30
Predictor	Pregnant women who attended pregnancy classes (X₄) (10 %)	5.09	5.99	0.00	33.33
Predictor	Households with clean water coverage (X₅) (%)	98.35	4.22	79.94	100.00


Modelling child health problems using multivariate normal linear regression

Before conducting multivariate linear regression analysis, the response variables were assessed for multivariate normality using a quantile-quantile (Q-Q) plot based on Mahalanobis distances. The proportion of observations whose squared Mahalanobis distance did not exceed the chi-square reference quantile was 53.70 %, which is above 50 %, suggesting that the two response variables follow a bivariate normal distribution. Subsequently, the parameters of the multivariate normal linear regression model were estimated, and the results are shown in Table 2. The table reveals that the predictor variables significantly influencing the prevalence of pneumonia (Y₁) in children under five are the percentage of infants who were exclusively breastfed (X₁) and the percentage of children who received complete basic immunization (X₂). Meanwhile, the variables significantly influencing the prevalence of diarrhea (Y₂) are exclusive breastfeeding (X₁), complete basic immunization (X₂), and access to clean water (X₅).
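The normality check described above can be sketched as follows. The sketch assumes one common version of the check (as described, for example, in [31]): squared Mahalanobis distances are compared with the median of the chi-square distribution with 2 degrees of freedom, which equals 2 ln 2, and roughly half of the observations are expected to fall at or below it. The exact cutoff used in this study is not stated, and the function names and toy data are hypothetical.

```python
import math


def squared_mahalanobis_bivariate(y1, y2):
    """Squared Mahalanobis distance of each (y1, y2) pair from the sample mean."""
    n = len(y1)
    m1, m2 = sum(y1) / n, sum(y2) / n
    # Sample variances and covariance with the usual n - 1 divisor.
    s11 = sum((a - m1) ** 2 for a in y1) / (n - 1)
    s22 = sum((b - m2) ** 2 for b in y2) / (n - 1)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(y1, y2)) / (n - 1)
    det = s11 * s22 - s12 ** 2
    distances = []
    for a, b in zip(y1, y2):
        u, v = a - m1, b - m2
        # Quadratic form with the inverse of the 2 x 2 covariance matrix.
        distances.append((s22 * u * u - 2.0 * s12 * u * v + s11 * v * v) / det)
    return distances


def proportion_within_chi2_median(distances):
    """Share of distances at or below the chi-square(2 df) median, 2 ln 2."""
    cutoff = 2.0 * math.log(2.0)
    return sum(d <= cutoff for d in distances) / len(distances)


# Hypothetical toy data, not the study data.
y1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
d2 = squared_mahalanobis_bivariate(y1, y2)
print(proportion_within_chi2_median(d2))
```

A proportion close to 50 % is read as being consistent with bivariate normality under this convention.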

Table 2.

Estimated values of multivariate normal linear regression parameters.

Parameters	Estimated Value	Standard Error	T	p-value
$β_{01}$	16.6738	19.5660	0.8522	0.3961
$β_{11}$	0.7882	0.3516	2.2416	0.0272^*
$β_{21}$	0.3373	0.1099	3.0685	0.0027^*
$β_{31}$	−0.0013	0.0669	−0.0206	0.9835
$β_{41}$	−0.1088	0.1251	−0.8700	0.3863
$β_{51}$	−0.2123	0.1954	−1.0867	0.2797
$β_{02}$	67.2771	19.5660	3.4383	0.0008^*
$β_{12}$	0.7053	0.3516	2.0061	0.0475^*
$β_{22}$	0.5214	0.1099	4.7431	6.85 × 10^–6^*
$β_{32}$	−0.0748	0.0669	−1.1187	0.2658
$β_{42}$	0.0664	0.1251	0.5311	0.5965
$β_{52}$	−0.6832	0.1953	−3.4966	0.0007^*


^*: significant at the 5 % level (α = 0.05).

The fitted multivariate normal linear regression model is given by the following equation.

\begin{bmatrix} \hat{Y}_{1} \\ \hat{Y}_{2} \end{bmatrix} = \begin{bmatrix} 16.6738 + 0.7882 X_{1} + 0.3373 X_{2} - 0.0013 X_{3} - 0.1088 X_{4} - 0.2123 X_{5} \\ 67.2771 + 0.7053 X_{1} + 0.5214 X_{2} - 0.0748 X_{3} + 0.0664 X_{4} - 0.6832 X_{5} \end{bmatrix}

Spatial weighting and testing for spatial dependence

The MSAR model was used to estimate the prevalence of pneumonia and diarrhea among children under five in southwestern Tuban Regency. This analysis employed a queen contiguity spatial weighting matrix, which suits the irregular geographical layout of the region. Under this scheme, two villages are treated as neighbors when they share any part of their boundary, whether an edge or a single point. The resulting neighbor lists are given in Table A1.
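A minimal sketch of how such a weighting matrix can be assembled from neighbor lists is shown below. It assumes row standardization (each village's weights sum to 1), which is not stated explicitly here but is consistent with the per-neighbor coefficients in the Gemulung equations later in this section. The function name is hypothetical, and the neighbor codes are copied from Table A1 for Gemulung (code 5) and its four neighbors only.

```python
def row_standardized_weights(neighbors):
    """Return {i: {j: w_ij}} where every row of weights sums to 1."""
    weights = {}
    for village, adjacent in neighbors.items():
        share = 1.0 / len(adjacent)  # equal weight for each contiguous village
        weights[village] = {other: share for other in adjacent}
    return weights


# Subset of Table A1: Gemulung (5) and its neighbors.
NEIGHBORS = {
    5: [28, 38, 49, 53],
    28: [5, 6, 18, 25, 29, 38, 43, 48],
    38: [5, 6, 28, 47, 49],
    49: [5, 38, 47, 53],
    53: [4, 5, 47, 49],
}

W = row_standardized_weights(NEIGHBORS)
print(W[5])  # each of the four neighbors of village 5 gets weight 0.25
```

Contiguity itself is mutual (village 5 appears in the lists of 28, 38, 49, and 53), but the row-standardized weights are generally not symmetric because villages have different numbers of neighbors.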

Following the construction of the spatial weighting matrix, spatial dependence was assessed using the residuals from the multivariate normal linear regression model. The spatial dependence test was conducted in R using the Bivariate Moran's I statistic [[34], [35], [36]], which yielded a Moran's I value of 0.1101, with an expected value of –0.0073 and a variance of 0.0051. The resulting Z-score was 1.6513, which exceeds the critical value of Z₀.₀₅ = 1.64. Therefore, the null hypothesis (H₀) of no spatial dependence is rejected. This result indicates the presence of bivariate spatial dependence in the regression residuals, justifying further spatial analysis.
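The standardization behind the reported Z-score can be reproduced from the rounded figures above. This is a sketch only: the function name `moran_z` is hypothetical, and because the inputs are rounded to four decimals the result is close to, but not exactly, the reported 1.6513.

```python
import math


def moran_z(observed, expected, variance):
    """Standardize Moran's I: (I - E[I]) / sqrt(Var[I])."""
    return (observed - expected) / math.sqrt(variance)


# Rounded values reported in the text.
z = moran_z(0.1101, -0.0073, 0.0051)
print(f"z = {z:.3f}")
```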

Modelling child health data using the MSAR model

In MSAR modelling, the regression coefficients include a spatial effect parameter, denoted as ρ. Estimating this parameter is therefore the first step, carried out with a numerical approximation method based on the concentrated log-likelihood function. Once the estimate $\hat{ρ}$ has been obtained, $Vec(\hat{B})$ and $\hat{Σ}$ can be estimated. The results of the $Vec(\hat{B})$ estimation are presented in Table 3, while the $\hat{Σ}$ value is as follows.

\hat{Σ} = \begin{bmatrix} 11.82 & 6.36 \\ 5.30 & 33.48 \end{bmatrix}

Table 3.

Estimated values of multivariate spatial autoregressive parameters.

Parameters	Estimated Value	Standard Error	Wald Statistic	P-value
$ρ_{1}$	0.42	0.01	2895.30	0.00^*
$β_{01}$	15.76	12.90	1.49	0.22
$β_{11}$	0.59	0.23	6.59	0.01^*
$β_{21}$	0.25	0.07	11.98	0.00^*
$β_{31}$	0.02	0.04	0.21	0.65
$β_{41}$	−0.10	0.08	1.46	0.23
$β_{51}$	−0.20	0.13	2.46	0.12
$ρ_{2}$	0.38	0.01	14,786.09	0.00^*
$β_{02}$	62.92	21.71	8.40	0.00^*
$β_{12}$	0.41	0.39	1.13	0.29
$β_{22}$	0.40	0.12	10.95	0.00^*
$β_{32}$	−0.03	0.07	0.17	0.68
$β_{42}$	0.05	0.14	0.14	0.71
$β_{52}$	−0.66	0.22	9.25	0.00^*


^*: significant at the 5 % level (α = 0.05).

The first step is a simultaneous hypothesis test of all model parameters to determine whether they collectively have a significant influence. The value of the $G^{2}$ test statistic was 43,240.59, which is greater than $χ_{(0.05; 10)}^{2} = 18.307$. Accordingly, the null hypothesis (H₀) is rejected, indicating that at least one parameter significantly contributes to the model. This justifies proceeding with partial (individual) hypothesis tests to identify which specific parameters are influential in the MSAR model.
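The critical value and the decision above can be checked with a short sketch. It relies on the closed-form upper-tail probability of the chi-square distribution, which holds only for an even number of degrees of freedom (10 here). The function name is hypothetical.

```python
import math


def chi2_upper_tail_even_df(x, df):
    """P(chi-square_df > x) using the closed form valid for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive even df")
    half = x / 2.0
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))


# The tabulated 5 % critical value for 10 df should give a tail area near 0.05.
print(round(chi2_upper_tail_even_df(18.307, 10), 4))

# The reported G^2 statistic lies far beyond that critical value.
g2 = 43240.59
print(g2 > 18.307, chi2_upper_tail_even_df(g2, 10) < 0.05)
```

For a statistic this large the tail probability underflows to zero in floating point, which is why the comparison with the critical value is the more informative check.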

Table 3 indicates that the parameters $ρ_{1}$ and $ρ_{2}$ are significant to the model, thereby suggesting that spatial dependencies in the rates of pneumonia and diarrhea must be considered in the model. The MSAR model for the percentage of pneumonia cases (Y₁) identifies two significant predictor variables: the percentage of infants exclusively breastfed (X₁) and the percentage of children under five who received complete basic immunization (X₂). Meanwhile, for the percentage of diarrhea cases (Y₂), the significant predictors are X₂ (complete basic immunization) and X₅ (households with access to clean water).

As shown in Table 4, the MSAR model better captures the relationship between predictor variables and child health outcomes than the standard multivariate normal linear regression model. This is demonstrated by its lower Root Mean Square Error (RMSE) of 4.97 and a higher R-squared value of approximately 60 %. These findings support the conclusion that, when multivariate data exhibit spatial autocorrelation, the MSAR model provides a more accurate and reliable estimation framework.

Table 4.

Model comparison.

Model	RMSE	R²
Multivariate normal linear regression	5.22	55.21 %
MSAR	4.97	59.98 %


In total, 54 distinct MSAR models were developed, one for each village. The model estimates for both pneumonia (Y₁) and diarrhea (Y₂) are summarized as ${\begin{bmatrix} \hat{Y}_{1i} & \hat{Y}_{2i} \end{bmatrix}}^{T}$, where:

\hat{Y}_{1i} = 15.76 + 0.42 \sum_{i^{*} = 1, i^{*} \neq i}^{54} w_{i i^{*}} y_{1 i^{*}} + 0.59 X_{1i} + 0.25 X_{2i} + 0.02 X_{3i} - 0.10 X_{4i} - 0.20 X_{5i}

\hat{Y}_{2i} = 62.92 + 0.38 \sum_{i^{*} = 1, i^{*} \neq i}^{54} w_{i i^{*}} y_{2 i^{*}} + 0.41 X_{1i} + 0.40 X_{2i} - 0.03 X_{3i} + 0.05 X_{4i} - 0.66 X_{5i}

Taking Gemulung village (code number 5) as an example, the MSAR model is ${\begin{bmatrix} \hat{Y}_{1,5} & \hat{Y}_{2,5} \end{bmatrix}}^{T}$, where the first subscript denotes the variable and the second the village code:

\begin{matrix} \hat{Y}_{1,5} & = & 15.76 + 0.11 Y_{1,28} + 0.11 Y_{1,38} + 0.11 Y_{1,49} + 0.11 Y_{1,53} \\ & & + 0.59 X_{1,5} + 0.25 X_{2,5} + 0.02 X_{3,5} - 0.10 X_{4,5} - 0.20 X_{5,5} \end{matrix}

\begin{matrix} \hat{Y}_{2,5} & = & 62.92 + 0.09 Y_{2,28} + 0.09 Y_{2,38} + 0.09 Y_{2,49} + 0.09 Y_{2,53} \\ & & + 0.41 X_{1,5} + 0.40 X_{2,5} - 0.03 X_{3,5} + 0.05 X_{4,5} - 0.66 X_{5,5} \end{matrix}
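The village-level equations can be evaluated with a short sketch. It assumes row-standardized weights, so that the spatial lag is the mean of the neighbors' observed values; with four neighbors this gives per-neighbor coefficients of 0.42 / 4 ≈ 0.105 and 0.38 / 4 ≈ 0.095, in line with the 0.11 and 0.09 shown above (small differences would come from rounding of ρ). Coefficients are the rounded values from Table 3; the function name and all input values are hypothetical and are not the Gemulung data.

```python
# Rounded estimates from Table 3: spatial parameter and [intercept, X1..X5].
RHO = {"pneumonia": 0.42, "diarrhea": 0.38}
BETA = {
    "pneumonia": [15.76, 0.59, 0.25, 0.02, -0.10, -0.20],
    "diarrhea": [62.92, 0.41, 0.40, -0.03, 0.05, -0.66],
}


def fitted_value(response, neighbor_y, x):
    """Intercept + rho * (mean of neighbors' y) + sum of beta_k * x_k."""
    coefficients = BETA[response]
    spatial_lag = sum(neighbor_y) / len(neighbor_y)  # row-standardized weights
    linear_part = sum(b * v for b, v in zip(coefficients[1:], x))
    return coefficients[0] + RHO[response] * spatial_lag + linear_part


# Hypothetical inputs for a village with four neighbors.
x = [2.0, 20.0, 10.0, 5.0, 98.0]
pneumonia_hat = fitted_value("pneumonia", [4.0, 6.0, 5.0, 5.0], x)
diarrhea_hat = fitted_value("diarrhea", [10.0, 14.0, 12.0, 12.0], x)
print(round(pneumonia_hat, 2), round(diarrhea_hat, 2))
```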

The above MSAR model of Gemulung Village can be interpreted as follows:

1. For every 100 children under five in Gemulung Village, approximately 10 to 11 are affected by pneumonia and 9 to 10 by diarrhea. Similar patterns are likely present in the neighboring villages (Mulyoagung, Sidonganti, Trantang, and Wolutengah) due to spatial dependence.
2. A 1 % increase in the proportion of exclusively breastfed infants is associated with a rise in pneumonia cases, which contradicts theoretical expectations. This may be due to the lagging effect of exclusive breastfeeding on pneumonia incidence. Additionally, pneumonia cases in Gemulung appear to influence similar increases in the four neighboring villages. No significant relationship was found between exclusive breastfeeding and diarrhea prevalence.
3. A 1 % increase in the percentage of children receiving complete basic immunization is linked to higher pneumonia and diarrhea rates. This finding contradicts existing theory, likely due to a temporal lag in the variable's impact. Increases in pneumonia and diarrhea in Gemulung are associated with corresponding rises (10–11 and 9–10 cases per 100 children, respectively) in neighboring villages.
4. The percentage of children under five who received vitamin A supplementation showed no significant effect on pneumonia or diarrhea incidence.
5. The proportion of pregnant women attending pregnancy classes did not significantly influence pneumonia or diarrhea rates among children under five.
6. A 1 % increase in household clean water coverage is associated with a decrease of approximately one diarrhea case per 100 children under five, but has no significant effect on pneumonia rates. Diarrhea cases in Gemulung also appear to influence similar increases (9–10 per 100) in the neighboring villages.

Conclusions

This study focused on area-based spatial modeling in the context of multivariate response regression, introducing the MSAR model as an extension of the conventional SAR model. The MSAR approach incorporates geographic weighting to account for spatial dependencies between neighboring regions. Parameter estimation was carried out using MLE via the concentrated log-likelihood, which resulted in unbiased and consistent estimates. The significance of model parameters was tested simultaneously using the MLRT and partially using the Wald test, which enabled the identification of influential predictor variables. The application of the MSAR model to data on pneumonia and diarrhea cases among children under five in Tuban Regency, East Java, demonstrated its effectiveness in handling spatial autocorrelation. Compared with the standard multivariate normal linear regression, the MSAR model showed better accuracy, with a lower RMSE (4.97 versus 5.22) and a higher R² (59.98 % versus 55.21 %). The variables that affect the incidence of pneumonia and diarrhea were the percentage of infants who receive exclusive breastfeeding, the percentage of toddlers who receive complete basic immunization, and the percentage of households with access to clean water. However, the current model is limited to multivariate normal data distributions. Future research should explore extensions of the MSAR framework that can accommodate non-normal data.

Limitations

The model assumes that the errors follow a normal distribution.

Ethics statements

The use of the data in this research was approved by the Center for the Study of Regional Resources and Community Empowerment, Institut Teknologi Sepuluh Nopember, Surabaya, Indonesia.

Supplementary material and/or additional information [Optional]

None

CRediT authorship contribution statement

Sutikno: Conceptualization, Methodology, Writing – original draft, Writing – review & editing, Validation. Purhadi: Methodology, Conceptualization. Fachrunisah: Visualization, Writing – review & editing, Software. Fajar Dwi Cahyoko: Writing – review & editing.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

The first author would like to gratefully acknowledge the Government of Tuban Regency for providing funding for this research.

Footnotes

Related research article: None

For a published article: None

Appendix A

Fig. A1 and Table A1.

Table A1.

Village and neighbor codes.

Village Codes	Village	Sub-district	Count	Neighbor
1	Banyuurip	Senori	3	19	51	54
2	Binangun	Singgahan	6	34	37	45	50	51	52
3	Bringin	Montong	3	20	32	40
4	Gaji	Kerek	7	7	8	16	23	26	47	53
5	Gemulung	Kerek	4	28	38	49	53
6	Guwoterus	Montong	6	28	30	38	41	47	48
7	Hargoretno	Kerek	8	4	8	27	31	33	41	46	47
8	Jarorejo	Kerek	5	4	7	22	23	46
9	Jatisari	Senori	5	11	19	24	36	51
10	Jetakss	Montong	4	20	33	40	42
11	Kaligede	Senori	2	9	19
12	Karanglo	Kerek	3	22	31	39
13	Kasiman	Kerek	4	16	23	26	39
14	Katerban	Senori	1	34
15	Kedungjambe	Singgahan	4	29	35	44	50
16	Kedungrejo	Kerek	4	4	13	23	26
17	Lajo Kidul	Singgahan	4	18	36	43	45
18	Lajo Lor	Singgahan	3	17	28	43
19	Leran	Senori	4	1	9	11	51
20	Maindu	Montong	3	3	10	40
21	Manjung	Montong	1	44
22	Margomulyo	Kerek	6	8	12	23	31	39	46
23	Margorejo	Kerek	6	4	8	13	16	22	39
24	Medalem	Senori	2	9	36
25	Mergosari	Singgahan	5	28	29	43	45	50
26	Mliwang	Kerek	3	4	13	16
27	Montongsekar	Montong	4	7	32	33	41
28	Mulyoagung	Singgahan	8	5	6	18	25	29	38	43	48
29	Mulyorejo	Singgahan	7	15	25	28	30	44	48	50
30	Nguluhan	Montong	5	6	29	41	44	48
31	Padasan	Kerek	5	7	12	22	33	46
32	Pakel	Montong	6	3	27	33	40	41	44
33	Pucangan	Montong	7	7	10	27	31	32	40	42
34	Rayung	Senori	6	2	14	35	37	50	54
35	Saringembat	Singgahan	3	15	34	50
36	Sendang	Senori	5	9	17	24	45	51
37	Sidoharjo	Senori	4	2	34	52	54
38	Sidonganti	Kerek	5	5	6	28	47	49
39	Sumberarum	Kerek	4	12	13	22	23
40	Sumurgung	Montong	5	3	10	20	32	33
41	Talangkembar	Montong	7	6	7	27	30	32	44	47
42	Talun	Montong	2	10	33
43	Tanggir	Singgahan	5	17	18	25	28	45
44	Tanggulangin	Montong	6	15	21	29	30	32	41
45	Tanjungrejo	Singgahan	7	2	17	25	36	43	50	51
46	Temayang	Kerek	4	7	8	22	31
47	Tengger Wetan	Kerek	7	4	6	7	38	41	49	53
48	Tingkis	Singgahan	4	6	28	29	30
49	Trantang	Kerek	4	5	38	47	53
50	Tunggulrejo	Singgahan	7	2	15	25	29	34	35	45
51	Wanglu Kulon	Senori	8	1	2	9	19	36	45	52	54
52	Wanglu Wetan	Senori	4	2	37	51	54
53	Wolutengah	Kerek	4	4	5	47	49
54	Wonosari	Senori	5	1	34	37	51	52


Fig. A1. Map of Tuban Regency village codes.

Data availability

Data will be made available on request.

References

1. Mennis J., Guo D. Spatial data mining and geographic knowledge discovery-an introduction. Comput. Environ. Urban Syst. 2009;33:403–408. doi: 10.1016/j.compenvurbsys.2009.11.001.
2. Charles A.C., Armstrong A., Nnamdi O.C., Innocent M.T., Obiageri N.J., Begianpuye A.F., Timothy E.E. Review of spatial analysis as a geographic information management tool. Am. J. Eng. Technol. Manag. 2024. doi: 10.11648/j.ajetm.20240901.12.
3. Krisztin T., Piribauer P. A Bayesian approach for the estimation of weight matrices in spatial autoregressive models. Spat. Econ. Anal. 2023;18:44–63. doi: 10.1080/17421772.2022.2095426.
4. Koley M., Bera A.K. Springer International Publishing; 2022. Testing For Spatial Dependence in a Spatial Autoregressive (SAR) Model in the Presence of Endogenous Regressors.
5. Liu X., Chen J. Variable selection for the spatial autoregressive model with autoregressive disturbances. Mathematics. 2021;9. https://www.mdpi.com/2227-7390/9/12/1448
6. LeSage J., Pace R.K. Chapman and Hall/CRC; New York: 2009. Introduction to Spatial Econometrics.
7. Yokoi T. 50th Congr. Eur. Reg. Sci. Assoc. "Sustainable Reg. Growth Dev. Creat. Knowl. Econ." 2010. Efficient maximum likelihood estimation of spatial autoregressive models with normal but heteroskedastic disturbances.
8. Jeong H., Lee L.-f. Maximum likelihood estimation of a spatial autoregressive model for origin–destination flow variables. J. Econom. 2024;242. doi: 10.1016/j.jeconom.2024.105790.
9. Anselin L. Springer Netherlands; Dordrecht: 1988. Spatial Econometrics: Methods and Models.
10. Yang H., Huang W., Ma X., Xu Y., Huang M. Proc. 2022 3rd Int. Conf. Big Data Soc. Sci. (ICBDSS 2022). Atlantis Press International BV; 2022. Research on the time-space impact paths of economic convergence-empirical evidence from 30 provinces in China; pp. 110–122.
11. Liu T., Lee L. A likelihood ratio test for spatial model selection. J. Econom. 2019;213:434–458. doi: 10.1016/j.jeconom.2019.07.001.
12. Yang K., Lee L.-f. Identification and QML estimation of multivariate and simultaneous equations spatial autoregressive models. J. Econom. 2017;196:196–214. doi: 10.1016/j.jeconom.2016.04.019.
13. Su L., Jin S. Profile quasi-maximum likelihood estimation of partially linear spatial autoregressive models. J. Econom. 2010;157:18–33. doi: 10.1016/j.jeconom.2009.10.033.
14. Liu X., Lee L.F. Two-stage least squares estimation of spatial autoregressive models with endogenous regressors and many instruments. Econom. Rev. 2013;32:734–753. doi: 10.1080/07474938.2013.741018.
15. Kelejian H.H., Prucha I.R. A generalized spatial two-stage least squares procedure for estimating a spatial autoregressive model with autoregressive disturbances. J. Real Estate Financ. Econ. 1998;17:99–121. doi: 10.1023/A:1007707430416.
16. Sirait T. Multivariate general spatial three-stage least squares fixed effect panel simultaneous models and estimation of their parameters. WSEAS Trans. Math. 2020;19:373–383. doi: 10.37394/23206.2020.19.38.
17. Luo G., Wu M., Pang Z. Estimation of spatial autoregressive models with covariate measurement errors. J. Multivar. Anal. 2022. https://www.sciencedirect.com/science/article/pii/S0047259X22000872
18. White H. Maximum likelihood estimation of misspecified models. Econometrica. 1982;50:1–25. doi: 10.4337/9781035334926.00009.
19. Nocedal J., Liu D.C. On the limited memory BFGS method for large scale optimization. Math. Program. 1989;45:503–528.
20. Gerber F., Furrer R. OptimParallel: an R package providing a parallel version of the l-BFGS-B optimization method. R J. 2019;11. doi: 10.32614/rj-2019-030.
21. Xiao Y., Wei Z., Wang Z. A limited memory BFGS-type method for large-scale unconstrained optimization. Comput. Math. with Appl. 2008;56:1001–1009. doi: 10.1016/j.camwa.2008.01.028.
22. Hu W., Jing B., Zhang B., Huang D. Crawling subsampling for multivariate spatial autoregression model in large-scale networks. Electron. J. Stat. 2021;15:3678–3707. doi: 10.1214/21-EJS1872.
23. Zhu X., Huang D., Pan R., Wang H. Multivariate spatial autoregressive model for large scale social networks. J. Econom. 2020;215:591–606. doi: 10.1016/j.jeconom.2018.11.018.
24. Byrd R., Lu P., Nocedal J., Zhu C. A limited memory algorithm for bound constrained optimization. J. Sci. Comput. 1995;16:1190–1208.
25. Christensen R. Springer; New York: 1991. Linear Models for Multivariate, Time Series, and Spatial Data.
26. Yasin H., Purhadi, Choiruddin A. Spatial clustering based on geographically weighted multivariate generalized gamma regression. MethodsX. 2024;13. doi: 10.1016/j.mex.2024.102903.
27. Fadmi F.R., Otok B.W., Kuntoro, Melaniani S., Sriningsih R. Segmentation of stunting, wasting, and underweight in Southeast Sulawesi using geographically weighted multivariate Poisson regression. MethodsX. 2024;12. doi: 10.1016/j.mex.2024.102736.
28. Gujarati D.N., Porter D.C. McGraw-Hill Education; 2008. Basic Econometrics. 5 ed.
29. Ozili P.K. The acceptable R-square in empirical modelling for social science research. Soc. Res. Methodol. Publ. Results. 2022.
30. Chicco D., Warrens M.J., Jurman G. The coefficient of determination R-squared is more informative than SMAPE, MAE, MAPE, MSE and RMSE in regression analysis evaluation. PeerJ Comput. Sci. 2021;7:1–24. doi: 10.7717/PEERJ-CS.623.
31. Johnson R.A., Wichern D.W. Pearson Prentice Hall; 2007. Applied Multivariate Statistical Analysis. 6 ed.
32. Kasuya E. On the use of r and r squared in correlation and regression. 2018. doi: 10.1111/1440-1703.1011.
33. Keer M., Lohiya H., Chouhan S. Goodness of Fit for Linear Regression using R squared and Adjusted R-Squared. Int. J. Res. Publ. Rev. 2023;4:2431–2439.
34. Yamada H. Moran's I for Multivariate Spatial Data. Mathematics. 2024;12:2746. doi: 10.3390/math12172746.
35. Bivand R.S., Wong D.W.S. Comparing implementations of global and local indicators of spatial association. TEST An Off. J. Spanish Soc. Stat. Oper. Res. 2018;27:716–748. doi: 10.1007/s11749-018-0599-x.
36. Cheng Z. The spatial correlation and interaction between manufacturing agglomeration and environmental pollution. Ecol. Indic. 2016;61:1024–1032. doi: 10.1016/j.ecolind.2015.10.060.
