Skip to main content
MethodsX logoLink to MethodsX
. 2023 Jan 5;10:102002. doi: 10.1016/j.mex.2023.102002

Geographically weighted generalized poisson regression model with the best kernel function in the case of the number of postpartum maternal mortality in east java

Sischa Wahyuning Tyas 1,, Gunardi 1, Lila Aryani Puspitasari 1
PMCID: PMC9874063  PMID: 36713305

Abstract

This study proposes a Geographically Weighted Generalized Poisson Regression (GWGPR) model with the best kernel function to obtain a model of the number of postpartum maternal mortality in East Java Province in 2020 and determine the factors that affect the number of maternal postpartum mortality in East Java in 2020. The kernel functions used in this study are fixed bisquare kernel, fixed tricube kernel, and adaptive bisquare kernel. Optimum bandwidth selection using the Cross-Validation (CV) method. The results obtained the best model is the GWGPR model with a fixed bisquare kernel because it produces the smallest AIC value of 194.92. Variables significantly affecting the number of maternal postpartum mortality in East Java in 2020 vary in each district/city where there are three regional groups. The percentage of pregnant women who had a pregnancy visit K1, the percentage of pregnant women who had a pregnancy visit K4, the percentage of households receiving cash assistance, and the ratio of hospitals and health centres have a significant effect on Kabupaten Blitar, Mojokerto, Gresik, Bangkalan, Blitar City, Mojokerto City, Surabaya City. While the five predictor variables together significantly affect districts/cities included in group 3, such as Ponorogo, Trenggalek, Tulungagung, Kediri, Malang, Lumajang, Jember, Banyuwangi, Bondowoso and so on. Some of the highlights of the proposed approach are:

  • Generalized Poisson regression model using the maximum likelihood estimation (MLE) method.

  • The kernel functions used in the Geographically Weighted Generalized Poisson Regression (GWGPR) model to determine bandwidth are fixed bisquare, fixed tricube, and adaptive bisquare kernel functions selected using the Cross Validation (CV) method.

  • The computation procedure is easy to implement.

Keywords: Poisson distribution, GPR, GWGPR, Bandwidth, Kernel, Cross-validation (CV), MMR

Method name: Geographically Weighted Generalized Poisson Regression (GWGPR)

Graphical abstract

Image, graphical abstract


Specifications table

Subject area: Mathematics and Statistics
More specific subject area: Statistics: Regression, Spatial Regression.
Name of your method: Geographically Weighted Generalized Poisson Regression (GWGPR)
Name and reference of the original method: Original Method
Geographically weighted Poisson regression models with different kernels
Reference
Murakami D., Tsutsumida N., Yoshida T., “Stable Geographically Weighted Poisson Regression for Count Data, GIScience 2021 Short Paper Proceedings. (2021).
Resource availability: Several maternal postpartum mortalities (Y) from East Java Province Health Profile 2020 and the predictors (X) from the Central Statistics Agency in East Java.

Method details

Introduction

Regression analysis is a statistical method used to determine the relationship between variables. In a study, it is often found that the object of research is in the form of count data influenced by several explanatory variables. As the case of the number of maternal deaths is a rare event with discrete data, the appropriate regression analysis model to explain the relationship between the dependent variable in the form of discrete and rare data with independent variables in the form of discrete, continuous, categorical, or mixed data is the Poisson regression model. The Poisson regression model is a non-linear regression model used to model count or discrete data, with the main requirement being that the response variable has a Poisson distribution [1]. In the Poisson regression model, there is a specific assumption that the mean and variance of the response variable are equal (equidispersion).

However, there are times when the equidispersion assumption is only sometimes met. Namely, here can also be cases of overdispersion (the value of the data variance is greater than the value of the mean) or underdispersion (the value of the data variance is smaller than the value of the mean) in data modeling with a Poisson distribution [2]. One method that can be used to overcome these cases is Generalized Poisson Regression (GPR). The generalized regression model is one of the alternative models counting data with overdispersion or underdispersion cases in Poisson regression. Generalized Poisson Regression modeling produces a global regression model for all observation locations. In some instances, each observation location has a set of data that is different from one location to another, so to overcome this diversity, spatial data analysis can be used. Spatial regression modeling is performed on response variables in the form of count data that experience underdispersion or overdispersion and depend on the characteristics of the observed location using Geographically Weighted Generalized Poisson Regression (GWGPR). The GWGPR model develops generalized Poisson Regression by considering weights in the form of latitude and longitude of the observation location points. The GWGPR model's parameter estimator is local for each point or observation location. Some research on the GWGPR model is research by Ghanim, Al-Hasani, et al. (2021) entitled "Geographically weighted Poisson regression models with different kernels: application to road traffic accidents," analysing traffic accident data in the country of Oman using the Geographically Weighted Generalized Poisson Regression method with optimum weighting using adaptive kernel functions, namely box-car, bi-square, tricube, exponential and Gaussian [3].

Murakami, Daisuke et al. (2021) used the Geographically Weighted Generalized Poisson Regression method to analyze the covid-19 case in Tokyo, Japan. In this study, the bandwidth used was an adaptive and fixed Gaussian kernel [4]. Maternal Mortality Rate (MMR) is the number of women who die from a cause of death related to pregnancy disorders or their handling during pregnancy, childbirth, or in the maternal postpartum period (42 days after childbirth) but not caused by accidents or injuries [5]. MMR is an essential indicator in determining the degree of community welfare, especially in the health sector. MMR in developing countries such as Indonesia is still relatively high. Based on data from the Ministry of Health in 2020, maternal mortality in Indonesia reached 4627. This figure increased by 10.25% compared to 2019 [6]. East Java Province ranks second in the region with the highest MMR in Indonesia, with 565 maternal mortalities, most of which are caused by maternal postpartum mortality, amounting to 284 people [7]. In their research, Fronczak et al. (2005) stated that 75% of complications occurred postpartum, resulting in high maternal mortality [8]. This is an essential concern for the Indonesian government, especially in East Java Province, so MMR is one of the Sustainable Development Goals (SDGs) targets from 2015 to 2030, namely 70 per 100,000 live births in 2030 [9]. To reduce MMR, problems related to pregnancy, childbirth, and especially the postpartum period (42 days after delivery) cannot be separated from the various factors that influence it. This research will examine case data on maternal postpartum mortality in East Java in 2020 using the GWGPR method. In the GWGPR model, the optimum weight selection uses the fixed kernel bisquare, fixed kernel tricube and adaptive kernel bisquare functions, which are selected using the Cross-Validation (CV) method. Researchers use this method because there has been no previous research using the three kernels mentioned and the indicators considered to affect the dependent variable are also new.

Model specifications and estimation procedures

Multicollinearity

Multicollinearity in the regression model can be determined using the Variance Inflation Factor (VIF), which is expressed in the following equation:

VIFj=11Rj2 (1)

where Rj2=SSRSST=i=1n(X^ijX¯ij)2i=1n(XijX¯ij)2, SSR (Sum of Squares Regression) is the variation caused by the relationship between predictor variables, SST (Sum of Squares Total) is a measure of the variation of the value of X from the mean of X itself, Xij is the value of Xj at the ith observation, X¯j is average value of Xj and X^j is the predicted value of Xj at ith observation.

Poisson distribution

In some research, it is often found that the object of research is in the form of count data influenced by several explanatory variables. A regression model based on the Poisson distribution can determine the relationship pattern between variables. The Poisson distribution is a discrete distribution that measures the probability of a certain number of events or events occurring within a certain period. The probability of a Poisson distribution with a random variable Y and parameter μ>0, which represents the number of successes that occur in a given time interval or region, is [10]

f(Y;μ)=eμμyy!y=1,2,3, (2)

where μ is the average number of ’successes’ that occur in a given time interval or region and e=2.7183. The Poisson distribution has the same mean value and variance as μ.

Poisson regression

The Poisson regression model is a non-linear regression model used to model count or discrete data, with the main requirement being that the response variable has a Poisson distribution. Poisson regression modeling the expected value of the response variable, with E(Y) as a linear function of the predictor variables (Xi) is defined as follows [11].

E(Y|Xi)=μ(Xi,β) (3)

where

XiT=[1xi1xi2xik]
β=[β0β1β2βk]T

Since the mean (μ) of Poisson distribution is always positive, the function μ(Xi,β) is chosen such that the linear predictor is:

ηi=β0+β1xi1+β2xi2+...+βkxik (4)

which can represent any real value to a positive real value. The logarithmic function is a suitable link function to model Poisson regression in the Generalized Linear Model approach, is:

lnE(Y|Xi)=lnμi=β0+β1xi1+β2xi2+...+βkxik=ηi (5)

or

μi=exp(β0+β1xi1+β2xi2+...+βkxik)=exp(XiTβ) (6)

Thus, the Poisson regression model for the response variable Y, which is Poisson distributed with parameter μi is:

μi=exp(XiTβ) (7)

Parameter estimation of the Poisson regression model can be done using the Maximum Likelihood Estimation (MLE) method. The maximum likelihood estimator for the parameter (β) is expressed by (β^), which is obtained by maximizing the likelihood function. Assuming that y1,y2,…, yn are random variables that are mutually independent and ynPoisson(μi),then the likelihood function for Poisson regression model is:

L(β)=i=1nf(yi;β,μi)=[[i=1nμiyi]exp[i=1nμi]i=1nyi!] (8)

The log-likelihood function of the Poisson regression model is:

lnL(β)=[[i=1nμiyi]exp[i=1nμi]i=1nyi!]=ln[i=1nμiyi]lnexp[i=1nμi]lni=1nyi!=i=1nyilnμii=1nμii=1nyi!

Because of μi=exp(XiTβ) so that

lnL(β)=i=1nyiln[exp(XiTβ)]i=1nexp(XiTβ)i=1nlnyi!=i=1nyi(XiTβ)i=1n(XiTβ)i=1nlnyi! (9)

Furthermore, to obtain the estimator (β^), the log-likelihood function in Eq. (9) ismaximized by lowering it concerning the parameter β and equating it to zero. Parameter estimation using the MLE method obtained an equation that is not closed form, so we can estimate the parameter β^ by using the Newton-Raphson iteration method as follows:

equalβ^(m+1)=β^(m)H1(βm)g(βm) (10)

where H1(βm) is Hessian matrix and g(βm) is gradient vector.

Testing the parameters of the Poisson regression model is carried out to test whether the regression model parameters have a significant effect on the response variable (Y). Model parameter testing using the partial and simultaneous tests is as follows.

Hypothesis:

  • H0:β1=β2==βk=0

  • H1: at least one βp ≠ 0, p = 1, 2, …, k

The test statistic used:

D(β^)=2ln(L(w^)L(Ω^))=2(lnL(w^)L(Ω^)) (11)

The L(w^) function is the maximum likelihood value for a simple model with no predictors involved, and L(Ω^) is the maximum likelihood value for a complete model with predictor variables involved. The rejection region of H0 is to reject H0 if the value of D(β^)>x(a,k)2, this means that there is at least one variable that has a significant influence on the response variable (Y). The smaller the value of D(β^) indicates, the smaller the error rate in the model [12].

After testing the parameters simultaneously, we will continue with partial parameter testing. The partial parameter testing hypothesis is as follows:

Hypothesis:

  • H0:βp=0,p=1,2,,k

  • H1:βp ≠ 0,

The test statistic used:

Z=β^pseβ^p (12)

The rejection region of H0 is to reject H0 at the significance level α if the value of |Z|>Z(α/2) this means that the variable p has a significant effect on the response variable.

Overdispersion or underdispersion in poisson regression models

The presence of overdispersion or underdispersion in Poisson regression can be detected using the devians, and Pearson Chi-Square values divided by the degrees of freedom. A value or quotient more significant than 1 indicates the presence of overdispersion (var(Yi|XiT)>E(Yi|XiT)), whereas if the result of dividing the two values is smaller than 1 indicating the presence of under dispersion var(Yi|XiT)>E(Yi|XiT), then one method that can be used to overcome this case is Generalized Poisson Regression [13].

Generalized poisson regression

The Generalized Poisson Regression (GPR) model is used for counting data that experience equidispersion violations. The Generalized Poisson Regression model has the same form as Poisson Regression as follows:

μi=exp(XiTβ) (13)

In the GPR model, the values of the parameters ϕ,β0,β1,β2,,βk will be estimated using the Maximum Likelihood Estimation (MLE) method with the likelihood

The function of the GPR model are as follows:

L(μi,ϕ)=i=1n(μi1+ϕμi)yi(1+ϕyi)yi1yi!exp(μi(1+ϕyi)1+ϕμi) (14)

Next, Newton Raphson iteration is performed to maximize the log-likelihood function formulated as follows:

lnL(ϕ,β)=i=1n[yi(XiTβln(1+ϕexp(XiTβ))]+(yi1)ln(1+ϕyi)lnyi!exp(XiTβ)(1+ϕyi)1+ϕexp(XiTβ)

Perform Newton Raphson iteration based on equation:

β^(m+1)=β^(m)H1(βm)g(βm) (15)

The iteration process will stop if it has found an estimated value that converges to a value as follows:

β^(m+1)β^(m)β^(m+1)β^(m)ε,ε0 (16)

Furthermore, the estimation of the parameter ϕ using the Newton-Raphson iteration method is:

ϕ(m+1)=ϕ(m)(2lnL(μi,ϕ(m))ϕ2)1(lnL(μi,ϕ(m))ϕ) (17)

A more accessible approach to estimating the parameter phi is to use the estimation of moments, equating the Pearson Chi-Square statistic with the degrees of freedom [14]. To determine ϕ with the method of moments can be expressed by the equation:

i=1nyiμivar(Yi)(nk)=0 (18)

Thus, the following iteration is obtained:

ϕ(m+1)=ϕ(m)(2i1n(yiμi)2(1+ϕμi)3)1(i=1n(yiμi)2μi(1+ϕμi)2)(nk) (19)

where μi=exp(XiTβ)>0

GPR parameter testing is done with the Maximum Likelihood Ratio Test (MLRT) with the following hypothesis:

  • H0:β1=β2==βk=0

  • H1: at least one βp ≠ 0, p = 1, 2, …, k

The test statistic used:

D(β^)=2ln(L(w^)L(Ω^))=2(lnL(w^)L(Ω))

The H0 rejection region is to reject H0 if the value ofD(β^)>x(a,k)2, this means that there is at least one variable that has a significant effect on the model. If the simultaneous test decision is to reject H0, then the next step is to test the parameters partially to find out which parameters have a significant effect on the model. The hypothesis used is as follows:

  • H0:βp=0,p=1,2,,k

  • H1:βp ≠ 0,

The test statistic used:

W=β^pseβ^p

The rejection region of H0 is to reject H0 at the significance level α if the value of |W|>Z(α/2) this means that the variable p has a significant effect on the response variable.

Spatial heterogeneity test

Spatial heterogeneity occurs when the same predictor variable has an unequal effect on different locations within a study area. Spatial heterogeneity can be tested using the Breusch-Pagan (BP) test. Hypotheses used in the Breusch- Pagan test is as follows.

  • H0:σ12=σ22==σn2=σ2 (Variance between locations is equal)

  • H1: at least there is one σ12σ2 (Variance between locations is different)
    BP=12fTZ(ZTZ)1ZTf (20)

Where the vector element f=(f1,f2,,fn)T is f1=ei2σ^2 with ei=yiyi^, Z is a matrix of size n×(k+1) containing the vector of normalized X for each other σ^2 is the residual variance (ei). Reject H0at significance level α if the value of BP>X(α,k)2or pvalue<α, this means that the variance between locations is different.

Geographically weighted regression

Geographically weighted regression is used to analyze spatial heterogeneity, where each parameter is calculated at each observation location, so each region or observation location has different regression parameter values. Systematically the Geographically Weighted Regression (GWR) model with response variable y and predictor variables x1,x2,,xkat the ith location can be written as follows:

yi=β0(ui,vi)+p=1kβp(ui,vi)xip+εi,i=1,2,,n (21)

In spatial analysis, the role of bandwidth for GWR models is essential because the weight value represents the location of observations between one data and another. There are three types of bandwidth functions used in this study:

  • i.

    Fixed Bisquare

The form of fixed bisquare is expressed by the following formula:

wij={[1(dijh)2]2fordij<h0,forothers (22)
  • ii.

    Fixed Tricube

The following formula expresses the form of the fixed tricube:

wij={[1(dijh)3]3fordij<h0,forothers (23)
  • iii.
    Adaptive Bisquare
    wij={[1(dijhi)2]2fordij<h0,forothers (24)

with hi is the bandwidth at the th observation location.

One of the methods used to determine the optimum bandwidth size is the Cross-Validation (CV) method. CV method is defined as follows:

CV=i=1n(YiY^i(h))2 (25)

where Y^i(h) is the value of the estimator Yi for observations at locations ui,vi not included in the calculation. The optimum bandwidth value (ℎ) is obtained when ℎ produces the minimum CV value.

Geographically weighted generalized poisson regression

Spatial regression modeling is performed on response variables in the form of count data that experience underdispersion or overdispersion and depend on the characteristics of the observed location using Geographically Weighted Generalized Poisson Regression. The difference between global Poisson regression and GWGPR is in parameter estimation. The global Poisson regression model has the same parameter estimator value for each observation location, so it is global. The GWGPR model produces a different parameter estimator value for each observation location area, so it is local [15].

The GWGPR model is the development of Generalized Pois-son Regression by considering the weights in the form of latitude and longitude of the observation location points. The GWGPR model follows the distribution of Generalized Poisson Regression so that the probability function for each ith location is as follows:

fi(yi,ui,ϕ)=(ui1+ϕui)yi(1+ϕyi)yi1yi!exp(ui(1+ϕyi)1+ϕui) (26)

GWGPR modeling uses a generalized linear model with a “g” function (link function) that connects the mean of the response variable with the linear predictor (η). The link function used in the GWGPR model is the log link. The GWGPR model with ui as the latitude coordinate and as the longitude coordinate used as the parameter weight is

ηi=g(ui)=β0(ui,vi)+β1(ui,vi)xi1++βk(ui,vi)xikηi=g(μi)=ln(μi)=XiTβ(ui,vi)μi=g1(ηi)=g1XiTβ(ui,vi)μi=exp(β0(ui,vi)+β1(ui,vi)xi1++βk(ui,vi)xik) (27)

Parameter estimation of GWGPR model

In the GWGPR model, the method used to estimate the model parameters is the Maximum Likelihood Estimation (MLE) method, with the GWGPR model likelihood function as follows:

L(β(ui,vi))=i=1nf(yi)=i=1n(ui1+ϕui)yi(1+ϕui)yi1yi!exp(ui(1+ϕyi)1+ϕui) (28)

Furthermore, the likelihood function in Eq. (28) is con-verted into natural logarithm form as follows:

lnL(β(ui,vi),ϕ)=i=1nln[(ui1+ϕui)yi(1+ϕyi)yi1yi!exp(ui(1+ϕyi)1+ϕui)]=i=1n[yi(lnuiln(1+ϕui))+(yi1)ln(1+ϕyi)lnyi!ui(1+ϕyi)1+ϕui] (29)

By substituting the value μ=expXiTβ(ui,vi) the Eq. (30) is obtained.

L(β(ui,vi)ϕ)=i=1n[yi(lneXiTβ(ui,vi)ln(1+ϕeXiTβ(ui,vi)))]+(yi1)ln(1+ϕyi)lnyi!eXiTβ(ui,vi)(1+ϕyi)1+ϕeXiTβ(ui,vi) (30)

The process of obtaining parameter estimators of the GWGPR model is by deriving the log-likelihood function for each parameter and then equating it to zero. Because the results are not close form, it is necessary to do a Newton-Raphson iteration with the following algorithm.

  • 1.
    Determine the initial estimated values of the parameters
    β^(ui,vi)(0)=[ϕ0(ui,vi)β0(ui,vi)β10(ui,vi)βk0(ui,vi)]
  • 2.
    Form a gradient vector (g) with k estimated parameters
    gT(β(ui,vi)(m))(k+1)×1=(lnL(β(ui,vi))ϕ(ui,vi),lnL(β(ui,vi))β0(ui,vi),lnL(β(ui,vi))β1(ui,vi),,lnL(β(ui,vi))βk(ui,vi))β=β(m)
  • 3.

    Form the Hessian matrix (H) which is the second derivative of Eq. (30)

  • 4.

    Substitute the values of β^(ui,vi)(0) into the elements of the g vector and H matrix to obtain the gradient vector g(β^(ui,vi)(0)) and Hessian matrix H(β^(ui,vi)(0)).

  • 5.
    Perform iteration using equation:
    β^(ui,vi)(m+1)=β^(ui,vi)(m)H1(β^(ui,vi)(m))g(β^(ui,vi)(m))
  • 6.

    The iteration process starts from m = 0, and the value of β^(ui,vi)(m) is a set of parameter estimators that converge at the mth iteration for the ith location.

  • 7.
    If you have not gotten a convergent parameter estimation, then the iteration continues back to step 5 until iteration m=m+1. The iteration process is stopped if
    |β^(ui,vi)(m+1)β^(ui,vi)(m)|ε

where ε is a very small number.

Simultaneous parameter significance test

The method used for simultaneous parameter testing in the GWGPR model is the Maximum Likelihood Ratio Test (MLRT) method. Testing the parameters of the GWGPR model is carried out to determine the significance of the β parameter with the following hypothesis.

  • H0:β1(ui,vi)=β2(ui,vi)==βk(ui,vi)

  • H1: at least oneβp(ui,vi)0,p=1,2,,k

Test Statistic:

D(β^)=2ln(L(w^)L(Ω^))=2(lnL(w^)L(Ω))

D(β^) is the devians value of GWGPR model, w={β0} is the set of parameters under H0andΩ=β0,β1,β2,,βk is the set of parameters under population.L(w^) is the maximum likelihood value for a simple model without involving predictors, andL(Ω) is the maximum likelihood value for the complete model involving variables Furthermore, the value of D(β^) is obtained by solving equation as follows:

D(β^)=2(lnL(w^)L(Ω))

D(β^) is x2 distributed with k independent degrees. The decision to reject H0 if the value of D(β^)>x(a,k)2means that there is at least one variable that has a significant effect on the response variable.

Partial parameter significance test

Partial GWGPR model parameter testing is conducted to determine the effect of individual predictor variables on the response variable at each location. The partial parameter testing hypothesis is as follows.

  • H0:βp(ui,vi)=0,p=1,2,,k

  • H1:βp(ui,vi)0

The Statistics:

Z=β^p(ui,vi)se(β^p(ui,vi))

where se(β^p(ui,vi)) is the standard error of β^p(ui,vi). The H0 rejection area is to reject H0attheα significance level if the value |Z|>Z(α/2) which means that variable p has a significant effect on the response variable at each location in the GWGPR model.

Akaike information criterion

Akaike Information Criterion (AIC) is one of the criteria to determine the best model. A small AIC value indicates that the model is getting better. The best model selection using the AIC value criterion is as follows:

AIC=2lnL(β)+2k

is maximum log-likelihood model and k is number of parameters estimated in the model.

Data

Maternal postpartum mortality rate

Maternal Mortality Rate (MMR) is the number of female mortalities that occur during pregnancy or its handling during pregnancy, childbirth, or within 42 days after the end of preg nancy (postpartum period) but not mortality caused by accidents or injuries. MMR is an essential indicator in determining the welfare of society, especially in the health sector. The Ministry of Health in 2020 revealed that the number of maternal mortalities in Indonesia is still relatively high, reaching 4627 people. East Java Province ranks second with the number of maternal postpartum mortality at 284. The number of maternal postpartum mortality in East Java Province over five years is presented in the following figure.

Several studies have found that the factor that has a significant effect on the number of maternal postpartum mortality is the percentage of pregnant women who had a pregnancy visit K4 [16]. Furthermore, there is research on the number of maternal postpartum mortality, with one of the factors affecting maternal mortality is the percentage of handling obstetric complications in each district/city [17]. Another study on the number of maternal postpartum mortality found that one of the factors affecting maternal postpartum mortality was the percentage of pregnant women who had a pregnancy visit K1 [18]. There is also research on factors that are thought to affect maternal mortality such as the percentage of households receiving cash assistance and the ratio of health centers and hospitals [19].

Based on the description above, in this study, several variables are used that are thought to affect the number of maternal postpartum mortality in East Java in 2020. Y is the number of maternal postpartum mortality, x1 is the percentage of pregnant women who had a pregnancy visit K1, x2 is the percentage of obstetric complications, x3 is the percentage of pregnant women who had a pregnancy visit K4, x4 is percentage of households receiving cash assistance, and x5 is ratio of health centers and hospital.

Characteristics of maternal postpartum mortality rate in east java

Descriptive analysis was carried out to obtain the characteristics of the number of maternal postpartum mortality (Y) and the factors that influence it. In 2020 the number of maternal postpartum mortality in East Java was 284. Descriptive statistics of all variables used and presented in Table 1 below.

Table 1.

Overview of Maternal Postpartum Mortality Rate in East Java and Factors Suspected To Affect It.

Variables Min Max Mean Variance
Y 1 30 7.47 31.553
X1 86.60 105.90 98.2184 16.479
X2 0 25.80 7.6579 73.464
X3 78.50 99.30 90.2763 25.641
X4 4.14 28.21 13.2650 30.216
X5 4.48 17.98 7.1982 6.717

Table 1. shows that the factors that are thought to affect the number of maternal postpartum mortality in East Java have a large enough variance and a large enough range that data heterogeneity is suspected. The predictor variable with the highest variance is the percentage of obstetric complications (x2) of 73.464%. The percentage of pregnant women who had a preg nancy visit K1 (x1) in East Java had an average of 98.22% with the highest percentage in Kabupaten Bondowoso at 105.9% and the lowest percentage in Nganjuk District at 86.6%. The handling of obstetric complications (x2) in East Java is quite good. This can be seen from the average value of the percentage of obstetric complications of 7.657% with the lowest percentage of complications in Kabupaten Trenggalek, Blitar, Lumajang, Jember, Bondowoso, Situbondo, Probolinggo, Mo jokerto, Jombang, Magetan, Ngawi, Bojonegoro, Kediri City, Probolinggo City, while the highest percentage was in Pasuruan city at 25.8%. The average percentage of pregnant women who had a pregnancy visit K4 was 90.28%, with the highest rate in Madiun City at 99.3% and the lowest percentage in Kabupaten Situbondo at 78.5%. The percentage of households receiving cash assistance was lowest in Surabaya City at 4.14%. The ratio of health centers and hospitals has an average of 7.2, with the lowest ratio being Kabupaten Malang at 4.48 while the highest percentage is Mojokerto City at 17.98.

An overview of the distribution of cases of the number of maternal postpartum mortality in East Java in 2020 is presented in Fig. 2.

Fig. 2.

Fig 2

Maternal Postpartum Mortality Rate in East Java 2020.

Results and analysis

Multicollinearity test

One of the requirements that must be met in Poisson regression analysis is multicollinearity detection. The presence of multicollinearity can be known from each predictor variable's VIF (Variance Inflation Factor) value. A VIF value of more than 10 indicates a case of multicollinearity.

Table 2 shows that the VIF value for all predictor variables is less than ten, so there are no cases of multicollinearity in the data.

Table 2.

Independent variable VIF value.

Variable VIF
X1 1.68
X2 1.16
X3 1.33
X4 1.53
X5 1.20

Distribution fit test

The distribution suitability test is used to determine whether the data is Poisson distributed or not. To conduct the test, you can use the Kolmogorov Smirnov Test with the following hypothesis:

  • H0 : Data follows Poisson distribution

  • H1 : Data does not follow Poisson distribution Significance level: α=0.05

Test Statistic

D=max1<i<N(F(Yi)i1N,i1NF(Yi))

where F (Yi) is the cumulative probability distribution. Test criteria: accept H0 if the value of D<DN,α in the Kolmogorov Smirnov table or the >α Based on the output, the pvalue=0.087 is obtained, so with a significance level of α=0.05, then the pvalue=0.087>0.05, this means that the data on the number of maternal postpartum mortality is Poisson distributed.

Spatial heterogeneity test and the best bandwidth

Spatial heterogeneity testing is conducted to determine whether there are characteristics between observation location points. Spatial heterogeneity can be tested using the Breusch- Pagan (BP) test with the following hypothesis test:

  • H0:σ12=σ22==σ382=σ2 (Variance between locations is the same)

  • H1: there is at least one σi2σ2;i=1,2,,38 (Variance between locations is the different)

The test statistic value is 17.802 and the pvalue is 0.003206. The significance level used is 5%, so x0.05;52 is obtained at 11.07. Therefore, it is concluded that reject H0, or the variance between locations is different. This means that there is spatial diversity between regions, or the characteristics between location points are different.

The existence of spatial effects causes spatial heterogeneity, so spatial weighting is needed. The best spatial weighting is obtained from the minimum Cross-Validation (CV) value's bandwidth value. Table 3 shows the optimum bandwidth selection value using the fixed bisquare, fixed tricube and adaptive bi-square kernel functions.

Table 3.

Optimum bandwidth selection.

Kernel Function CV
Fixed Bisquare 1367.024
Fixed Tricube 1377.01
Adaptive Bisquare 1401.472

Table 3 shows that the kernel function that produces the optimum bandwidth is the fixed bisquare Kernel function with a CV value of 1367.024. After obtaining the optimum bandwidth value, the spatial weight matrix for each observation location will be obtained by substituting the bandwidth value and Euclid distance.

Model testing

Poisson regression

Based on the result, obtaining the estimated value of the Poisson regression model parameters can be done using the Maximum Likelihood Estimation (MLE) method with Newton- Raphson iterations. The estimated values of the Poisson regression model parameters are presented in Table 4 below.

Table 4.

Parameter estimation of poisson regression model.

Parameter Estimated value SE Z P-value
β0 5.527718 1.563 3.535 0.000408
β1 0.012778 0.017647 0.724 0.469008
β2 −0.02061 0.007963 −2.588 0.00966
β3 −0.0253 0.013211 −1.915 0.055458
β4 −0.04565 0.013695 −3.333 0.000859
β5 −0.25964 0.040695 −6.38 0.000000000177
Devians 72,026
AIC 225,19

Table 4 shows that the AIC value of the Poisson regression model is 225.19. The predictor variables that have a significant effect on the Poisson regression model are the percentage of handling obstetric complications (x1), the percentage of households receiving cash assistance (x4) and the ratio of health centers and hospitals (x5). So that the effect of predictor variables x1,x4,x5 is significant but the effect of predictor variables x2,x3 is not significant on the response variable. The Poisson regression model formed in the case of the number of maternal postpartum mortality in East Java Province in 2020 is

μ^=exp(5.5277+0.01278x10.0206x20.025x30.04565x40.25964x5)

Generalized poisson regression

From the Poisson regression modeling, it is known that the data on the number of maternal postpartum mortality experience overdispersion cases because D(β^) value of the Poisson regression model in Table 4 is 72.026 with an independent degree of 32, resulting in a value of 2.25. This value is greater than 1, so the method used to overcome these cases is Generalized Poisson Regression (GPR). The Maximum Likelihood Estimation (MLE) method is used with Newton-Raphson iterations to obtain the estimated value of the GPR model parameters. Table 4 shows the estimated values of the GPR model parameters.

Based on Table 5, it is known that the values of devians D (β^) is 202.4872. The significance level used is 0.05, so the value of x0.05;52 is 11.07. The devians value D (β^) is more than x0.05;52which means reject H0. So it can be concluded that at least one predictor variable has a significant effect on the model.

Table 5.

Parameter estimation of generalized poisson regression model.

Parameter Estimated value SE Z P-value
β0 5.460 2.154 2.535 0.0113
β1 0.009827 0.024482 0.401 0.6881
β2 −0.01925 0.010911 −1.764 0.0777
β3 −0.02257 0.018373 −1.229 0.2192
β4 −0.04172 0.018785 −2.221 0,0264
β5 −0.25167 0.054851 −4.588 0.00000447
Devians 202.4872
AIC 216.4872

The partial parameter test results show that the variables that have a significant effect are the percentage of households receiving cash assistance (x4) and the ratio of health centers and hospitals (x5) so that the GPR model for the case of the number of maternal mortalities in East Java Province in 2020 is as follows.

μ^=exp(5.4603+0.0098x10.019247x20.02257x30.04172x40.2516x5)

Geographically weighted generalized poisson regression

Testing the parameters of the GWGPR model is carried out to determine the significance of the parameters β with the following hypothesis:

  • H0:β1(ui,vi)=β2(ui,vi)==β2(ui,vi)=0

  • H1: there is at least one βp(ui,vi)0,p=1,2,,5

Based on the result, the devians value D (β^) is 180.9181. The significance level used is 5%, so x0.05;52 is 11.07. The devians value of D(β^) is greater than the value of x0.05;52 so the test decision is to reject H0. This means that at least one predictor variable has a significant effect on the GWGPR model. Furthermore, the significance of the GWGPR model parameters is partially carried out to determine which predictor variables or factors have a significant effect in each region or location. The partial parameter testing hypothesis is as follows:

  • H0:βp(ui,vi)=0,i=1,2,,38,p=1,2,,5

  • H1:βp(ui,vi)0

The predictor variable is said to have a significant effect on the response variable if the value of

|Z|>Z(α/2). The significance level used is 5%, so the Z(α/2) value is 1.96. Based on the result, the percentage of pregnant women who had a pregnancy visit K1 (x1), the percentage of pregnant women who had a pregnancy visit K4 (x3) and the percentage of households receiving cash assistance (x4) had a significant effect on 38 districts/cities. While the percentage of obstetric complications (x2)has a significant effect on 31 districts/cities and the ratio of health centers and hospitals (x5) has a significant effect on 37 districts/cities. The significant variables in each district/city in East Java are presented as follows:

  • Variable x1,x2,x3,x4,x5 significant to 1 district (Kabupaten Pacitan).

  • Variable x1,x2,x3,x4,x5 significant to 7 district/city (Kabupaten Blitar, Mojokerto, Gresik, Bangkalan, Blitar city, Mojokerto city, Surabaya city).

  • Variable x1,x2,x3,x4,x5 significant to 30 district/city (Kabupaten Ponorogo, Trenggalek, Tulungagung, Kediri, Malang, Lumajang, Jember, Banyuwangi, Bondowoso, Situbondo, Probolinggo, Pasuruan, Sidoarjo, Jombang, Nganjuk, Madiun, Magetan, Ngawi, Bojonegoro, Tuban, Lamongan, Sam-pang, Pamekasan, Sumenep, Kediri city, Malang city, Probolinggo city, Pasuruan city, Madiun city, Batu city).

The grouping of districts/cities in East Java based on significant variables is presented in Fig. 3.

Fig. 3.

Fig 3

Mapping Based on Significant Variables.

Based partial parameter testing, as an example, we will present parameter testing at the 3rd research location (u3,v3)=, namely Kabupaten Trenggalek, with parameter estimates shown in Table 6.

Table 6.

Parameter testing of GWGPR model in kabupaten trenggalek with fixed bisquare kernel.

Parameter Estimated Value Z count
β0 4.02379 −0.00618
β1 −0.00183 −2503.51
β2 −0.01013 7.150146
β3 −0.00583 1.146733
β4 −0.01475 6.524979
β5 −0.14298 3.770794
Devians 180.92
AIC 194.92

Table 6 shows that all predictor variables have a significant effect on the GWGPR model in Trenggalek District because the Zcount value is greater than Z0.05/2. The GWGPR model formed in the case of the number of maternal postpartum mortality in the Kabupaten Trenggalek is as follows.

μ^=exp(4.0230.0018x10.01013x20.0058x30.0417x40.1429x5)

Based on the GWGPR model in Kabupaten Trenggalek, every 5% increase in the percentage of pregnant women who had a pregnancy visit K1 will reduce the average number of maternal postpartum mortality in East Java by exp0.00183=0.998 times, assuming other variables are constant. Suppose the percentage of pregnant women who had a pregnancy visit K4 increases by 5%. In that case, it will reduce the aver- age number of maternal postpartum mortality in East Java by exp(0.00583)=0.994 times, assuming other variables are constant. The same interpretation applies to the predictor variables of the percentage of obstetric complications, the percent- age of households receiving cash assistance also variable the ratio of health centers and hospitals.

Determination of the best model

Based on Table 4, it is known that for the Poisson regression model, three variables have a significant effect on the model, namely the percentage of obstetric complications (x2), the per- centage of households receiving cash assistance (x4) also the ratio of hospitals and health centers (x5).

Table 5 shows that the variables that significantly affect the GPR model are the percentage of households receiving cash assistance (x4) and the ratio of hospitals and health, centers (x5). Meanwhile, in the GWGPR model, the variables of the percentage of pregnant women who had a pregnancy visit K1(x1), the percentage of pregnant women who had a pregnancy visit k4 (x3), and the percentage of households receiving cash assistance (x4) together influence 38 districts/cities. Further- more, the variables of obstetric complications and the ratio of hospitals and health centers have different influences for each district/city.

One of the criteria for determining the best model is to look at the Akaike Information Criterion (AIC) value obtained. The smaller the AIC value, the better the model used. The following are the AIC values for the Poisson, GPR and GWGPR regression models.

The AIC value of the Poisson regression model is 225.19, the AIC value of the GPR model is 216.4872, and the GWGPR model with the best kernel (fixed bisquare) has an AIC value of 194.92. This means that the GWGPR model is the best because it has the smallest AIC value compared to other models (Algorithm 1, Fig. 1, Table 7).

Algorithm 1.

Algorithm 1 is the procedure for estimating the parameters of the GWGPR model in the case of the number of maternal deaths in East Java in 2020.

  • 1.

    Researchers are looking for case data on the number of maternal postpartum mortality in East Java Province in 2020 and longitude also latitude coordinate data.

  • 2.

    Analyzing the Poisson regression model

  • 3.

    Performing generalized Poisson regression model analysis

  • 4.

    Analyze the Geographically Weighted Generalized Pois- son Regression (GWGPR) model with the following steps:

  • a.

    Determine (ui,vi) based on longitude and latitude for each district.

  • b.

    Breusch-Pagan test to see the presence of spatial heterogeneity in the data.

  • c.

    Calculating the Euclidean distance between observation locations based on geographical position.

  • d.

    Determining the optimum bandwidth using the Cross- Validation (CV) method.

  • e.

    Calculate the best weight matrix using fixed bisquare kernel, tricube kernel and adaptive bisquare kernel.

  • f.

    Parameter estimation of the GWGPR model using the Maximum Likelihood Estimation (MLE) method followed by Newton-Raphson iteration.

  • g.

    Simultaneous and partial significance testing of model parameters

  • h.

    Interpreting the GWGPR model obtained.

  • i.

    Determining the best model using the AIC value, the best model is the model with the smallest AIC value.

  • 5.

    The final result of this research is to get the best model to represent the number of maternal postpartum mortality and know the mapping of the number of maternal postpartum mortality in East Java Province.

Fig. 1.

Fig 1

Maternal Postpartum Mortality Rate in East Java 2016–2020.

Table 7.

Best model with AIC value.

Regression Model AIC
Poisson Regression 225.19
Generalized Poisson Regression 216.4872
Geographically Weighted Generalized Poisson Regression 194.92

Conclusion

The average number of maternal postpartum mortality in each district/city in East Java Province was 7.47 in 2020, with the highest death of 30 cases in Kabupaten Jember and the lowest case in Sampang, Kediri city, Madiun City and Batu city. The high variance value of 31.553 indicates a reasonably high difference in maternal postpartum mortality in each district/city. The best kernel function is the fixed bisqure kernel function with a CV value of 1367.024. Based on the results obtained, the GWGPR model is the best model to model the case of the number of maternal postpartum mortality in East Java in 2020 because it produces the smallest AIC value of 194.92 with three groupings obtained based on the results of partial testing for GWGPR modeling.

Ethics statements

The data used in this study are secondary data derived from the 2020 East Java Provincial Health Profile published by the East Java Provincial Health Office and some secondary data derived from the official website of the Badan Pusat Statistik (BPS) East Java.

CRediT authorship contribution statement

Ghanim Al-Hasani: Methodology, GWPR Methods; Murakami, Daisuke: Spatial, Validity tests; Gunardi: Conceptualization, Conceptual.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

The authors express our gratitude to the Lembaga Pengelola Dana Pendidikan (LPDP), Ministry of Finance, Republic of Indonesia for their financing support, which funded this research as a part of the first author's thesis.

Data Availability

  • Data will be made available on request.

References

  • 1.Hogg R., Craig A. 3rd ed.) McMillan Publishing Company; 1970. Introduction to Mathematical Statistics; pp. 91–231. [Google Scholar]
  • 2.Khoshgoftaar T.M., Gao K., Szabo R.M. Comparing software fault predictions of pure and zero-inflated Poisson regression models. Int. J. Syst. Sci. 2004;36(11):705–715. [Google Scholar]
  • 3.Al-Hasani G., Md Asaduzzaman, Soliman A.H. Geographically weighted poisson regression models with different kernels: application to road traffic accident data, communications in statistics: case studies. Data Anal. Appl. 2021;7(2):166–181. [Google Scholar]
  • 4.Murakami D., Tsutsumida N., Yoshida T. Proceedings of the GIScience 2021 Short Paper Proceedings. 2021. Stable geographically weighted poisson regression for count data. [Google Scholar]
  • 5.Badan Pusat Statistik, Angka kematian ibu menurut pulau [maternal mortality rate by island], BPS, http://www.bps.go.id (accessed Jan. 2, 2022).
  • 6.Kementerian Kesehatan Republik Indonesia, Profil kesehatan Indonesia 2021 [Indonesia Health Profile 2021], Pusat Data dan Teknologi Informasi, https://pusdatin.kemkes.go.id (accessed Jan. 10, 2022).
  • 7.Dinkes Jatim . Dokumen Publikasi; 2020. Profil Kesehatan Provinsi Jawa Timur Tahun.https://dinkes.jatimprov.go.id [East Java Province Health Profile 2020] accessed Jan. 8, 2022. [Google Scholar]
  • 8.Fronczak N., Antelman G., Moran A.C., Caulfield L.E., Baqui A.H. Delivery-related complications and early postpartum morbidity in Dhaka, Bangladesh. Int. J. Gynecol. Obstet. 2005;91:271–278. doi: 10.1016/j.ijgo.2005.09.006. [DOI] [PubMed] [Google Scholar]
  • 9.Badan Perencanaan Pembangunan Nasional . 1st ed. Vol. 1. Badan Perencanaan Pembangu nan Nasional; 2014. Rencana pembangunan jangka menengah nasional 2015-2019 [national medium term development plan 2015-2019] pp. 6–75. (Buku I Agenda Pembangunan Nasional). [Google Scholar]
  • 10.Walpole R.E. Vol. 3. PT Gramedia; 1995. (Pengantar Statistik). [Google Scholar]
  • 11.Kleinbaum D.G., Kupper L.L., Muller K.E. Vol. 2. PWS Kent Publishing Company; 1988. Variable reduction and factor analysis; pp. 595–640. (Applied Regression Analysis and Other Multivariable Methods). [Google Scholar]
  • 12.Caraka R.E., Yasin H. Geographically weighted regression (GWR), GWR sebuah pendekatan regresi geografis [a geographic regression approach] Mobius. 2017;1:80–92. [Google Scholar]
  • 13.Allison P.D. SAS institute Inc; Cary, NC: 1999. Logistic Regression Using SAS System: Theory and Application. [Google Scholar]
  • 14.Ismail A.I., Sohn W., Tellez M., Amaya A., Sen A., Hasson H., Pitts N.B. The international caries detection and assessment system (ICDAS): an integrated system for measuring dental caries. Community Dent. Oral Epidemiol. 2007;35(3):170–178. doi: 10.1111/j.1600-0528.2007.00347.x. [DOI] [PubMed] [Google Scholar]
  • 15.Nakaya T., Fotheringham A.S., Brunsdon C., Charlton M. Geographically weighted poisson regression for disease association mapping. Stat. Med. 2005;24(17):2695–2717. doi: 10.1002/sim.2129. [DOI] [PubMed] [Google Scholar]
  • 16.Herni A.R.R. Purhadi, pemodelan jumlah kematian ibu nifas di karesidenan pekalongan provinsi jawa tengah tahun 2017 menggunakan regresi zero-inflated poisson inverse gaussian [modeling the number of maternal deaths in pekalongan prefecture, central java province in 2017 using zero-inflated poisson inverse gaussian regression] Inferensi. 2020;3(2):2721–3862. [Google Scholar]
  • 17.Ilalang A.P., Kholisatin N., Taibatunniswah N., Choiruddin A. Sutikno, pemodelan faktor-faktor yang mempengaruhi angka kematian ibu di jawa timur menggunakan geographi cally weighted regression [modeling factors affecting mater- nal mortality rate in east java using geographically weighted regression] Inferensi. 2021;4(1):2721–3862. [Google Scholar]
  • 18.Sabtika W., Prahutama A., Yasin H. Pemodelan geographi- cally weighted generalized poisson regression (Gwgpr) pada kasus kematian ibu nifas di jawa tengah [modeling geographically weighted generalized poisson regression (Gwgpr) on postpartum maternal mortality cases in central java] J. Gaussian. 2021;10(2):259–268. [Google Scholar]
  • 19.I.A. Mufti, Pemodelan faktor-faktor yang mempengaruhi jumlah kematian ibu di jawa timur menggunakan metode geo-graphically generalized weighted poisson regression [model ing factors affecting the number of maternal deaths in east java using geographically generalized weighted poisson regression method], Thesis. Department of Statistics ITS. (2018).

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Data Availability Statement

  • Data will be made available on request.


Articles from MethodsX are provided here courtesy of Elsevier

RESOURCES