Abstract
This study develops a nonparametric regression model for data with spatial heterogeneity, with local parameter estimates for each observation location. The proposed Geographically Weighted Truncated Spline Nonparametric Regression (GWTSNR) combines Truncated Spline Nonparametric Regression (TSNR) and Geographically Weighted Regression (GWR). It therefore requires the optimum knot points from TSNR and the best geographic weighting (bandwidth) from GWR, both of which are selected using Generalized Cross Validation (GCV). The case study analyzes the morbidity rate in North Sumatra in 2020. Models are estimated with one, two, and three knot points and with four kernel functions as geographic weights: Gaussian, Bisquare, Tricube, and Exponential. Based on the minimum GCV value, the best model for the 2020 North Sumatra morbidity rate data uses one knot point and the Bisquare kernel function. Based on the GWTSNR model, the districts/cities fall into eight groups according to their significant predictors. Furthermore, the GWTSNR models the morbidity rate in North Sumatra in 2020 better than the TSNR, with an adjusted R-squared of 96.235 compared with 70.159. Some of the highlights of the proposed approach are:
• The method combines nonparametric and spatial regression to model the morbidity rate.
• Three knot-point settings were tested in the truncated spline nonparametric regression and four geographic weightings in the spatial regression; the best knot and bandwidth were then determined using Generalized Cross Validation.
• This paper determines regional groupings in North Sumatra in 2020 based on the significant predictors in the morbidity rate model.
Keywords: Spatial regression, Nonparametric regression, Morbidity rate, Kernel function
Method name: Geographically Weighted Truncated Spline Nonparametric Regression (GWTSNR)
Graphical abstract
Specifications Table
Subject area | Mathematics and Statistics |
More specific subject area | Statistics: Nonparametric Regression, Spatial Regression. |
Name of your method | Geographically Weighted Truncated Spline Nonparametric Regression (GWTSNR) |
Name and reference of original method | Original method: Geographically Weighted Regression with Spline Approach. Reference: Sifriyani, S.H. Kartiko, I.N. Budiantara, and Gunardi, Geographically Weighted Regression with Spline Approach. Far East Journal of Mathematical Sciences, 101 (6) (2017) 1183-1196. DOI: 10.17654/MS101061183 |
Resource availability | Morbidity rate data (Y) from The Health Office in North Sumatra and The predictors (X) from the Central Statistics Agency in North Sumatra. |
Method details
Introduction
Health is a state of complete physical, mental, and social well-being and not merely the absence of disease or infirmity. Therefore, the health of an area can be measured by the number of people who experience illness or contract a disease. Illness or health complaints in Indonesia are referred to as morbidity. Morbidity rates have many uses in a country. Morbidity statistics measure a country's level of health and its provision of health facilities. These data can be used to measure the extent to which medical facilities are utilized and can assist in investigating the pattern of disease occurrence [1].
According to the Performance Report of the Government Agencies of the Health Office of North Sumatra Province in 2020, the morbidity rate in North Sumatra followed the national condition in Indonesia: it was 11.84% in 2015, decreased to 11.15% in 2016, rose slightly to 11.17% in 2017, decreased again to 11.03% in 2018, increased to 11.97% in 2019, and increased again to 12.24% in 2020 [2]. The morbidity rate plays a more critical role than the mortality rate, because a high morbidity rate leads to more deaths and hence a higher mortality rate, so that life expectancy in the area will be low. In general, the realization of optimal public health is one element of welfare among the national goals, namely the ability of every resident to live healthily. The morbidity rate is influenced by spatial effects. Spatial data are dependent data, in which measurements at other locations affect the data at a given location [3]. As a result, spatial data are unsuitable for linear regression analysis, which would produce an inaccurate model. Linear regression analysis assumes that the error variance is constant (homoscedasticity) and that there is no dependence between errors (no autocorrelation) across observation locations. If the results of the regression analysis show heteroscedasticity and autocorrelation, this indicates that the parameters of the regression model are influenced by other factors, namely geographical factors. Therefore, in spatial data analysis, geographical factors are essential in determining the weights to be used [4]. One spatial regression method that can model spatial data is Geographically Weighted Regression (GWR). GWR was first introduced by Fotheringham in 1967 [5].
In GWR, the parameters are calculated at each location point, so the resulting parameter estimates are valid only for the location at which the data are observed. Research using GWR theory was carried out by Brunsdon et al. (1996), Crespo et al. (2007), Leung et al. (2000a), and Leung et al. (2000b) [6], [7], [8], [9]. GWR has been developed further by statisticians such as Yu (2010), Wrenn and Sam (2014), and Zuhdi (2017) [10], [11], [12]. These developments are still linear in form. In reality, however, not all data have a known relationship pattern, and the shape of the regression curve may be unknown [13]. Nonparametric regression is an alternative approach for such cases. Nonparametric regression models that are often discussed include Spline (Budiantara, 2002; Budiantara et al., 1997; Green and Silverman, 1994; Wahba, 1990), Kernel (Hardle, 1990), Fourier series (Antoniadis, 1994), and Wavelets (Antoniadis, 2001) [14], [15], [16], [17], [18], [19], [20]. A spline is a segmented polynomial, which makes it flexible, and it depends strongly on its knot points. The truncated spline is a segmented polynomial model that adapts effectively to the local characteristics of the data. Thus, Sifriyani et al. (2017) developed the Geographically Weighted Truncated Spline Nonparametric Regression (GWTSNR) model to solve spatial analysis problems in which the regression curve is unknown [21].
In this research, the GWTSNR model is developed further by using more geographic weighting functions, namely the Fixed Gaussian, Fixed Bisquare, Fixed Tricube, and Fixed Exponential kernel functions. Furthermore, the best weighting is selected using the Generalized Cross Validation (GCV) method, which is a development of the Cross Validation (CV) method.
The main limitations, applicability, and findings
This study explains the formation of the GWTSNR model and the resulting parameter estimates. Previous research explained how to build the GWTSNR model, determine parameter estimates, and test hypotheses, with an application to the open unemployment rate. The GWTSNR model requires a bandwidth and knot points. The previous research used only two kernel functions as geographic weights, namely Gaussian and Bisquare, and determined the best weighting using Cross Validation (CV). It then varied the number of knots in the model, namely one, two, and three knots for each predictor, and determined the best knot using Generalized Cross Validation (GCV).
In this research, we extend the use of the GWTSNR model for our case study by using four kernel functions, namely Gaussian, Bisquare, Tricube, and Exponential. The best weighting is determined using Generalized Cross Validation (GCV), a development of Cross Validation (CV). We also provide an algorithm for applying the model to the case study.
Model specifications and estimation procedures
TSNR model
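Nonparametric regression is a regression model used to determine the relationship between the response variable and the predictor variables when the shape of the regression curve is unknown. It is very flexible in modeling data patterns [22]. In general, the nonparametric regression model can be presented as follows:
$$y_i = f(x_{1i}, x_{2i}, \ldots, x_{li}) + \varepsilon_i, \quad i = 1, 2, \ldots, n \tag{1}$$
where $y_i$ is the response variable, $x_{1i}, x_{2i}, \ldots, x_{li}$ are the predictor variables, $f$ is an unknown regression curve that does not follow a particular pattern, and $\varepsilon_i \sim N(0, \sigma^2)$. If the regression curve is an additive model and is approximated by a spline function, the regression model is obtained as follows:
$$y_i = \beta_0 + \sum_{p=1}^{l}\sum_{k=1}^{m} \beta_{pk}\, x_{pi}^{k} + \sum_{p=1}^{l}\sum_{h=1}^{r} \delta_{p,m+h}\,(x_{pi} - K_{ph})_+^{m} + \varepsilon_i \tag{2}$$
where $\beta_{pk}$ and $\delta_{p,m+h}$ are real constants with $p = 1, \ldots, l$, $k = 1, \ldots, m$, and $h = 1, \ldots, r$, and the truncated function is as follows:
$$(x_{pi} - K_{ph})_+^{m} = \begin{cases} (x_{pi} - K_{ph})^{m}, & x_{pi} \ge K_{ph} \\ 0, & x_{pi} < K_{ph} \end{cases} \tag{3}$$
where $K_{ph}$ is a knot point that marks a change in the behaviour of the function between sub-intervals. The parameters of the TSNR model are estimated using the maximum likelihood method as follows:
$$\hat{\boldsymbol{\beta}} = (\mathbf{X}^{T}\mathbf{X})^{-1}\mathbf{X}^{T}\mathbf{y} \tag{4}$$
where
$\hat{\boldsymbol{\beta}}$: Parameter estimates of the TSNR model.
$\mathbf{X}$: Matrix of predictor variables.
$\mathbf{y}$: Vector of the response variable.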
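The truncated function described above can be illustrated with a short sketch. The function names, the knot value 4.0, and the input values are hypothetical and chosen only for illustration; they are not taken from the study.

```python
def truncated_power(x, knot, degree=1):
    """Return (x - knot)_+^degree: zero below the knot, a power above it."""
    return (x - knot) ** degree if x >= knot else 0.0


def spline_basis_row(x, knots, degree=1):
    """Basis values for one predictor: x, ..., x^degree, then one truncated term per knot."""
    polynomial = [x ** k for k in range(1, degree + 1)]
    truncated = [truncated_power(x, knot, degree) for knot in knots]
    return polynomial + truncated


# Hypothetical predictor values and a single hypothetical knot at 4.0.
for value in (2.0, 4.0, 6.5):
    print(value, spline_basis_row(value, knots=[4.0], degree=1))
```

Stacking such rows for all observations and all predictors, together with an intercept column, gives one possible way to build the design matrix used in the estimator.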
GWR model
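Fotheringham first introduced GWR in 1967. It is a development of multiple linear regression. The multiple linear regression model has the same parameters at every observation location, while GWR has local parameters at each observation location. In the GWR model, the relationship between the response variable $y_i$ and the predictor variables $x_{1i}, \ldots, x_{li}$ at location $(u_i, v_i)$ is as follows:
$$y_i = \beta_0(u_i, v_i) + \sum_{p=1}^{l} \beta_p(u_i, v_i)\, x_{pi} + \varepsilon_i \tag{5}$$
The parameters of the GWR model are estimated using the maximum likelihood method as follows:
$$\hat{\boldsymbol{\beta}}(u_i, v_i) = \left(\mathbf{X}^{T}\mathbf{W}(u_i, v_i)\mathbf{X}\right)^{-1}\mathbf{X}^{T}\mathbf{W}(u_i, v_i)\mathbf{y} \tag{6}$$
$\hat{\boldsymbol{\beta}}(u_i, v_i)$: Parameter estimates of the GWR model.
$\mathbf{X}$: Matrix of predictor variables.
$\mathbf{y}$: Vector of the response variable.
$\mathbf{W}(u_i, v_i)$: Matrix of geographic weights.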
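As a minimal sketch of local estimation, the following assumes a single predictor, in which case the weighted least-squares solution reduces to a weighted intercept and slope. The function name and the data are hypothetical; in practice the weights would come from a kernel function evaluated at the focal location.

```python
def local_wls_line(x, y, w):
    """Weighted least-squares intercept and slope for one predictor.

    x, y: observed predictor and response values at all locations.
    w: geographic weights of those locations relative to the focal location.
    """
    total = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / total
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / total
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical data lying exactly on y = 2 + 3x, with made-up weights.
x = [1.0, 2.0, 3.0, 4.0]
y = [5.0, 8.0, 11.0, 14.0]
w = [1.0, 0.5, 0.25, 0.1]
print(local_wls_line(x, y, w))
```

Repeating the fit with a different weight vector for each focal location yields one set of local parameters per location, which is the defining feature of GWR.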
GWTSNR model
GWTSNR is a development of nonparametric regression for spatial data with local parameter estimators for each observation location. The method was developed by Sifriyani, Gunardi, S.H. Kartiko, and I.N. Budiantara (2017). GWTSNR is a nonparametric regression approach used to solve spatial analysis problems where the regression curve is unknown. The GWTSNR model assumes normally distributed errors with zero mean and variance $\sigma^2$ at each location $(u_i, v_i)$. Mathematically, the relationship between the response variable $y_i$ and the predictor variables $(x_{1i}, x_{2i}, \ldots, x_{li})$ at the $i$-th location can be expressed as follows (Sifriyani et al., 2017):
$$y_i = \beta_0(u_i, v_i) + \sum_{p=1}^{l}\sum_{k=1}^{m} \beta_{pk}(u_i, v_i)\, x_{pi}^{k} + \sum_{p=1}^{l}\sum_{h=1}^{r} \delta_{p,m+h}(u_i, v_i)\,(x_{pi} - K_{ph})_+^{m} + \varepsilon_i \tag{7}$$
Eq. (7) is the GWTSNR model of degree $m$ with $n$ areas. The components are described as follows: $y_i$ is the response variable at the $i$-th location, where $i = 1, 2, \ldots, n$; $x_{pi}$ is the $p$-th predictor variable in the $i$-th area, with $p = 1, 2, \ldots, l$; $K_{ph}$ is the $h$-th knot point of the $p$-th predictor variable component, with $h = 1, 2, \ldots, r$; $\beta_{pk}(u_i, v_i)$ is a parameter of the polynomial component of the GWTSNR, namely the $k$-th parameter of the $p$-th predictor variable in the $i$-th area; and $\delta_{p,m+h}(u_i, v_i)$ is a parameter of the truncated component of the GWTSNR, namely the $(m+h)$-th parameter, at the $h$-th knot point of the $p$-th predictor variable in the $i$-th area.
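To make the structure of Eq. (7) concrete, the following sketch evaluates the model at one location, given local coefficients for a polynomial part and a truncated part. The function name, argument layout, and all numbers are hypothetical illustrations, not estimates from this study.

```python
def gwtsnr_predict(x, beta0, beta, delta, knots, degree=1):
    """Evaluate the local model at one location.

    x[p]: value of predictor p at this location.
    beta[p][k - 1]: local coefficient of x[p] ** k.
    delta[p][h]: local coefficient of the truncated term at knots[p][h].
    """
    y_hat = beta0
    for p, xp in enumerate(x):
        for k in range(1, degree + 1):
            y_hat += beta[p][k - 1] * xp ** k
        for h, knot in enumerate(knots[p]):
            if xp >= knot:  # the truncated term is zero below its knot
                y_hat += delta[p][h] * (xp - knot) ** degree
    return y_hat


# Two hypothetical predictors, one knot each, linear degree.
print(gwtsnr_predict(
    x=[3.0, 10.0],
    beta0=1.0,
    beta=[[2.0], [0.5]],
    delta=[[-1.0], [4.0]],
    knots=[[2.0], [12.0]],
))
```

In this example the first predictor lies above its knot, so its truncated term contributes, while the second lies below its knot and contributes only through the polynomial term.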
GWTSNR estimation procedures
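Next, the parameter estimates $\hat{\boldsymbol{\beta}}(u_i, v_i)$ and $\hat{\boldsymbol{\delta}}(u_i, v_i)$ of the GWTSNR model are determined using Maximum Likelihood Estimation (MLE). Write $\mu_j(u_i, v_i) = \beta_0(u_i, v_i) + \sum_{p=1}^{l}\sum_{k=1}^{m} \beta_{pk}(u_i, v_i)\, x_{pj}^{k} + \sum_{p=1}^{l}\sum_{h=1}^{r} \delta_{p,m+h}(u_i, v_i)\,(x_{pj} - K_{ph})_+^{m}$ for the mean of the $j$-th observation under the parameters of location $(u_i, v_i)$, and $\mu_i = \mu_i(u_i, v_i)$. The steps are as follows:
a. Determine the probability density function of $y_i$.
$$f(y_i) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{-\frac{1}{2\sigma^2}\left(y_i - \mu_i\right)^2\right\} \tag{8}$$
b. Establish the likelihood function.
$$L = \prod_{i=1}^{n} f(y_i) = (2\pi\sigma^2)^{-n/2} \exp\left\{-\frac{1}{2\sigma^2}\sum_{i=1}^{n}\left(y_i - \mu_i\right)^2\right\} \tag{9}$$
c. Establish the weighted likelihood function at the $i$-th location.
$$L^{*}(u_i, v_i) = (2\pi\sigma^2)^{-n/2} \exp\left\{-\frac{1}{2\sigma^2}\sum_{j=1}^{n} w_j(u_i, v_i)\left(y_j - \mu_j(u_i, v_i)\right)^2\right\} \tag{10}$$
$w_j(u_i, v_i)$: the weight of the $j$-th location relative to the $i$-th location.
d. Calculate the natural logarithm of the weighted likelihood function.
$$\ln L^{*}(u_i, v_i) = -\frac{n}{2}\ln(2\pi\sigma^2) - \frac{1}{2\sigma^2}\left(\mathbf{y} - \mathbf{X}\boldsymbol{\beta} - \mathbf{P}\boldsymbol{\delta}\right)^{T}\mathbf{W}\left(\mathbf{y} - \mathbf{X}\boldsymbol{\beta} - \mathbf{P}\boldsymbol{\delta}\right) \tag{11}$$
where $\mathbf{W} = \mathbf{W}(u_i, v_i) = \mathrm{diag}\left(w_1(u_i, v_i), \ldots, w_n(u_i, v_i)\right)$, $\mathbf{X}$ is the matrix of the polynomial components, $\mathbf{P}$ is the matrix of the truncated components, and $\boldsymbol{\beta} = \boldsymbol{\beta}(u_i, v_i)$ and $\boldsymbol{\delta} = \boldsymbol{\delta}(u_i, v_i)$ are the corresponding parameter vectors.
e. Maximize $\ln L^{*}$.
Let
$$Q = \left(\mathbf{y} - \mathbf{X}\boldsymbol{\beta} - \mathbf{P}\boldsymbol{\delta}\right)^{T}\mathbf{W}\left(\mathbf{y} - \mathbf{X}\boldsymbol{\beta} - \mathbf{P}\boldsymbol{\delta}\right) \tag{12}$$
The estimators of $\boldsymbol{\beta}$ and $\boldsymbol{\delta}$ are then obtained by solving the following optimization:
$$\max_{\boldsymbol{\beta}, \boldsymbol{\delta}} \ln L^{*}(u_i, v_i) = \max_{\boldsymbol{\beta}, \boldsymbol{\delta}} \left\{-\frac{n}{2}\ln(2\pi\sigma^2) - \frac{1}{2\sigma^2} Q\right\} \tag{13}$$
In other words, the estimators $\hat{\boldsymbol{\beta}}$ and $\hat{\boldsymbol{\delta}}$ can be obtained with MLE by solving the following optimization:
$$\min_{\boldsymbol{\beta}, \boldsymbol{\delta}} Q = \min_{\boldsymbol{\beta}, \boldsymbol{\delta}} \left(\mathbf{y} - \mathbf{X}\boldsymbol{\beta} - \mathbf{P}\boldsymbol{\delta}\right)^{T}\mathbf{W}\left(\mathbf{y} - \mathbf{X}\boldsymbol{\beta} - \mathbf{P}\boldsymbol{\delta}\right) \tag{14}$$
Parameter estimation of $\boldsymbol{\beta}$:
$$\hat{\boldsymbol{\beta}} = \left(\mathbf{X}^{T}\mathbf{W}\mathbf{X}\right)^{-1}\mathbf{X}^{T}\mathbf{W}\left(\mathbf{y} - \mathbf{P}\hat{\boldsymbol{\delta}}\right) \tag{15}$$
Parameter estimation of $\boldsymbol{\delta}$:
$$\hat{\boldsymbol{\delta}} = \left(\mathbf{P}^{T}\mathbf{W}\mathbf{P}\right)^{-1}\mathbf{P}^{T}\mathbf{W}\left(\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}}\right) \tag{16}$$
To obtain an estimator $\hat{\boldsymbol{\beta}}$ that does not depend on $\hat{\boldsymbol{\delta}}$, substitute Eq. 16 into Eq. 15 as follows.
$$\hat{\boldsymbol{\beta}} = \left(\mathbf{I} - \mathbf{A}\mathbf{P}\mathbf{B}\mathbf{X}\right)^{-1}\mathbf{A}\left(\mathbf{I} - \mathbf{P}\mathbf{B}\right)\mathbf{y} \tag{17}$$
where $\mathbf{A} = \left(\mathbf{X}^{T}\mathbf{W}\mathbf{X}\right)^{-1}\mathbf{X}^{T}\mathbf{W}$ and $\mathbf{B} = \left(\mathbf{P}^{T}\mathbf{W}\mathbf{P}\right)^{-1}\mathbf{P}^{T}\mathbf{W}$.
To obtain an estimator $\hat{\boldsymbol{\delta}}$ that does not depend on $\hat{\boldsymbol{\beta}}$, substitute Eq. 15 into Eq. 16 as follows.
$$\hat{\boldsymbol{\delta}} = \left(\mathbf{I} - \mathbf{B}\mathbf{X}\mathbf{A}\mathbf{P}\right)^{-1}\mathbf{B}\left(\mathbf{I} - \mathbf{X}\mathbf{A}\right)\mathbf{y} \tag{18}$$
where $\mathbf{A}$ and $\mathbf{B}$ are as defined for Eq. 17.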
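The substitution of Eq. 16 into Eq. 15 (and the reverse) can be checked numerically in the simplest possible case. The sketch below assumes one polynomial column `x` and one truncated column `p` with no intercept, so that every matrix product collapses to a weighted sum; it also assumes the estimators take the usual weighted least-squares form. All names and numbers are hypothetical.

```python
def weighted_sums(x, p, y, w):
    """Weighted cross-products needed by the two normal equations."""
    sxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    spp = sum(wi * pi * pi for wi, pi in zip(w, p))
    sxp = sum(wi * xi * pi for wi, xi, pi in zip(w, x, p))
    sxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    spy = sum(wi * pi * yi for wi, pi, yi in zip(w, p, y))
    return sxx, spp, sxp, sxy, spy


def solve_by_substitution(x, p, y, w):
    """Closed form obtained by eliminating one estimator from the other (scalar case)."""
    sxx, spp, sxp, sxy, spy = weighted_sums(x, p, y, w)
    det = sxx * spp - sxp * sxp
    beta = (sxy * spp - sxp * spy) / det
    delta = (spy * sxx - sxp * sxy) / det
    return beta, delta


# Hypothetical data: knot at 2.0, true beta = 3, true delta = -2, no noise.
x = [1.0, 2.0, 3.0, 4.0]
p = [max(xi - 2.0, 0.0) for xi in x]
y = [3.0 * xi - 2.0 * pi for xi, pi in zip(x, p)]
w = [0.2, 1.0, 0.7, 0.4]

beta, delta = solve_by_substitution(x, p, y, w)
sxx, spp, sxp, sxy, spy = weighted_sums(x, p, y, w)
print(beta, delta)
# Each estimate also satisfies the equation that expresses it through the other.
print((sxy - sxp * delta) / sxx, (spy - sxp * beta) / spp)
```

The same elimination applies with full matrices; the scalar case is used here only because it keeps the check free of any linear algebra library.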
Optimum knot point determination
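The knot point is a joint point at which the behavior pattern of the function or curve changes. The number of knot points also affects the complexity of the model, because more knots mean more parameters, so a proper method is needed to determine the optimal knot points. Optimal knot points can be obtained using Generalized Cross Validation (GCV). The GCV method is generally defined as follows [22,23].
$$GCV(K) = \frac{MSE(K)}{\left[n^{-1}\,\mathrm{trace}\left(\mathbf{I} - \mathbf{A}(K)\right)\right]^{2}}, \qquad MSE(K) = n^{-1}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2 \tag{19}$$
where
$\mathbf{I}$: the identity matrix
$n$: the number of observations
$K$: knot points
$\mathbf{A}(K)$: the hat matrix of the fit with knot points $K$, so that $\hat{\mathbf{y}} = \mathbf{A}(K)\mathbf{y}$
$MSE(K)$: Mean Square Error of the TSNR model.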
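A minimal sketch of the GCV criterion follows, assuming the common form in which the mean square error is divided by the squared average of the diagonal of the identity minus the hat matrix. The residuals and trace in the first call are hypothetical; the dictionary uses the GCV values reported in this study for one, two, and three knot points (Table 5).

```python
def gcv(residuals, trace_hat):
    """GCV = MSE / (1 - trace(A) / n)^2, with A the hat matrix of the fit."""
    n = len(residuals)
    mse = sum(r * r for r in residuals) / n
    return mse / (1.0 - trace_hat / n) ** 2


# Hypothetical residuals and hat-matrix trace, just to exercise the formula.
print(gcv([1.0, -1.0, 1.0, -1.0], trace_hat=2.0))

# GCV values reported in Table 5; the candidate with the minimum GCV is chosen.
gcv_by_knots = {1: 8.227800831, 2: 11.12530693, 3: 10.02368543}
best_knots = min(gcv_by_knots, key=gcv_by_knots.get)
print(best_knots)
```

The selection step is simply an argmin over the candidates, which is why the same criterion can be reused for choosing the bandwidth.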
Optimum bandwidth and geographic weights matrics determination
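The role of weights in the GWR model is critical because the weighting value represents the position of each observation location relative to the others. Lesage (2001) introduced several weighting methods using kernel functions, including the Gaussian kernel, the Exponential kernel, the Bisquare kernel, and the Tricube kernel [24].
i. Gaussian
$$w_j(u_i, v_i) = \phi\left(\frac{d_{ij}}{\sigma h}\right) \tag{20}$$
where $\phi$ is the standard normal density function and $\sigma$ denotes the standard deviation of the distance vector $d_i$.
ii. Exponential
(21)
$d_{ij}$: the distance from the $i$-th location to the $j$-th location, and $h$ is the bandwidth value, a smoothing parameter that is always positive.
iii. Bisquare
$$w_j(u_i, v_i) = \begin{cases} \left(1 - \left(d_{ij}/h\right)^2\right)^2, & d_{ij} \le h \\ 0, & d_{ij} > h \end{cases} \tag{22}$$
iv. Tricube
$$w_j(u_i, v_i) = \begin{cases} \left(1 - \left(d_{ij}/h\right)^3\right)^3, & d_{ij} \le h \\ 0, & d_{ij} > h \end{cases} \tag{23}$$
where $d_{ij} = \sqrt{(u_i - u_j)^2 + (v_i - v_j)^2}$ is the Euclidean distance between location $(u_i, v_i)$ and location $(u_j, v_j)$, and $h$ is a known non-negative parameter called the bandwidth or smoothing parameter. The optimum bandwidth can be determined using GCV, which is as follows.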
(24) |
Where
: GCV value on bandwidth
: the sum of the main diagonal elements of the weight matrix
In this study, the optimum bandwidth is selected using Generalized Cross Validation (GCV). The optimum bandwidth is the one with the smallest GCV value, which corresponds to the model with the smallest error.
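The weighting and selection steps can be sketched as follows. Only three kernels are shown, in forms commonly used in the GWR literature; these forms are an assumption and not a restatement of Eqs. 20 to 23, and the Gaussian scaling in particular may differ from the one used in the study. The coordinates are hypothetical, while the GCV values are those reported in Table 4.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two (u, v) coordinate pairs."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def bisquare(d, h):
    """Bisquare weight: decays to zero at distance h and stays zero beyond it."""
    return (1.0 - (d / h) ** 2) ** 2 if d <= h else 0.0


def tricube(d, h):
    """Tricube weight: decays to zero at distance h and stays zero beyond it."""
    return (1.0 - (d / h) ** 3) ** 3 if d <= h else 0.0


def gaussian(d, h):
    """Assumed Gaussian form exp(-0.5 * (d / h)^2); never exactly zero."""
    return math.exp(-0.5 * (d / h) ** 2)


# Hypothetical coordinates of a focal location and one neighbour.
d = euclidean((0.0, 0.0), (0.6, 0.8))
print(d, bisquare(d, 1.5), tricube(d, 1.5), gaussian(d, 1.5))

# GCV values reported in Table 4; the kernel with the minimum GCV is chosen.
gcv_by_kernel = {
    "Gaussian": 156.2569201,
    "Bisquare": 109.1468086,
    "Tricube": 110.1381101,
    "Exponential": 156.2569174,
}
best_kernel = min(gcv_by_kernel, key=gcv_by_kernel.get)
print(best_kernel)
```

The compact-support kernels (Bisquare and Tricube) give zero weight to locations farther than the bandwidth, whereas the Gaussian form gives every location some weight.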
Spatial heterogeneity test
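Differences in characteristics between observation points cause spatial heterogeneity. Spatial heterogeneity can be identified using the Breusch-Pagan test. The hypotheses used in the Breusch-Pagan test are [25]:
$H_0$: $\sigma_1^2 = \sigma_2^2 = \cdots = \sigma_n^2 = \sigma^2$ (homoscedasticity)
$H_1$: At least one $\sigma_i^2 \neq \sigma^2$ (heteroscedasticity)
Test statistics
$$BP = \frac{1}{2}\,\mathbf{f}^{T}\mathbf{Z}\left(\mathbf{Z}^{T}\mathbf{Z}\right)^{-1}\mathbf{Z}^{T}\mathbf{f} \tag{25}$$
where
$\mathbf{f} = (f_1, f_2, \ldots, f_n)^{T}$; $f_i = \dfrac{e_i^2}{\hat{\sigma}^2} - 1$, $e_i = y_i - \hat{y}_i$; $\mathbf{Z}$ is a matrix containing vectors that have been standardized for each observation. Reject $H_0$ if $BP > \chi^2_{(\alpha, p)}$ or $p$-value $< \alpha$, where $p$ is the number of predictors.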
Model fit significance test
The model fit significance test determines whether the GWTSNR model is better than the global model. The following hypothesis is used [26].
: and
: At least one or
The test statistics used
Where,
: The degrees of freedom
(26) |
Rejection Criteria, is rejected if
(27) |
Simultaneous parameter significance test
A simultaneous test was conducted to determine the significance of the regression model parameters together. The form of the accompanying test hypothesis is as follows [27].
- : At least there is one or
The test statistics used:
(28) |
Where,
- : The degrees of freedom
: The degrees of freedom
Rejection Criteria, is rejected if
(29) |
Partial parameter significance test
Individual testing is carried out to determine whether the individual parameters have a significant effect on the response variable, with the following hypothesis:
and with
: At least there is one or
The test statistics used:
(30) |
Where,is the diagonal element of the matrix . The test statistic for the GWTSNR model in Eq. 30 will follow a distribution with degrees of freedom and a significance level of . The rejection area will reject if the value or which means that the parameter has a significant effect on the model (Tables 2 and 5).
Table 2.
VIF Value of Independent Variable
Variables | VIF |
---|---|
$X_1$ | 2.852 |
$X_2$ | 2.617 |
$X_3$ | 3.814 |
$X_4$ | 1.774 |
$X_5$ | 2.478 |
$X_6$ | 2.081 |
$X_7$ | 3.698 |
The VIF value is below 10 for all independent variables, so it can be concluded that there is no multicollinearity between the predictor variables and that all of them can be used in forming the regression model.
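The VIF screening can be reproduced with a few lines, assuming the standard definition VIF = 1 / (1 - R²), where R² comes from regressing one predictor on the others. The function names are hypothetical; the VIF values are those listed in Table 2.

```python
def vif(r_squared):
    """Variance inflation factor for a given coefficient of determination."""
    return 1.0 / (1.0 - r_squared)


def implied_r_squared(vif_value):
    """Invert the VIF definition to recover the underlying R-squared."""
    return 1.0 - 1.0 / vif_value


reported_vif = [2.852, 2.617, 3.814, 1.774, 2.478, 2.081, 3.698]

# The usual rule of thumb flags multicollinearity when VIF exceeds 10,
# which corresponds to R-squared above 0.9.
no_multicollinearity = all(value < 10 for value in reported_vif)
print(no_multicollinearity)
print(round(implied_r_squared(max(reported_vif)), 4))
```

Inverting the largest reported VIF shows how far the most collinear predictor is from the R² = 0.9 threshold implied by the cutoff of 10.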
Table 5.
Best Knot Point
Knot Point | GCV |
---|---|
1 | 8.227800831 |
2 | 11.12530693 |
3 | 10.02368543 |
Based on the minimum GCV value, the best model is the GWTSNR model with one knot point, with a GCV value of 8.227800831.
The research steps
The steps of analysis in the research are as follows:
a. Describe the morbidity rate in North Sumatra and its predictors.
b. Make a scatterplot between the morbidity rate and each predictor to determine the relationship pattern.
c. Test for spatial heterogeneity using the Breusch-Pagan method.
d. Calculate the Euclidean distance between the $i$-th location and the $j$-th location.
e. Determine the best weighting among the kernel functions, namely Gaussian, Bisquare, Tricube, and Exponential, based on the minimum GCV value.
f. Choose the optimum knot point based on the minimum GCV value.
g. Obtain the best GWTSNR model.
h. Test the model fit hypothesis between the GWTSNR model and the TSNR model.
i. Carry out the simultaneous and partial parameter significance tests.
j. Interpret the GWTSNR model.
k. Map the 33 districts/cities in North Sumatra based on the significant predictor variables.
Data
Morbidity rate
Morbidity is a condition in which a person is considered sick when the perceived health complaints disrupt daily activities, that is, when the person is unable to work, take care of the household, or carry out normal activities as usual. The morbidity rate is calculated as follows [28],
$$\text{Morbidity rate} = \frac{\text{number of people who experience health complaints and disruption of activities}}{\text{total population}} \times 100\% \tag{31}$$
The morbidity rate in an area is affected by several factors. The determinants of morbidity are social, economic, and cultural factors [29]. Wulandari (2017) found that population density, the average length of schooling, poverty percentage, regional minimum wage, percentage of open defecation households, and percentage of households with a distance from drinking water sources to sewage storage > 10 meters significantly affected the morbidity rate in East Java [30]. Hanum (2013), using the Multivariate Geographically Weighted Regression model, found that life expectancy, illiteracy rate, percentage of the population with protected water sources from wells, percentage of the population seeking outpatient treatment from health workers, percentage of the population with a distance from drinking water sources to sewage storage > 10 meters, and the percentage of the population with a monthly per capita expenditure of 200,000 to 299,999 for nutritious food significantly affected morbidity rates [31]. According to Gordon (1954), the morbidity rate is influenced by environmental factors consisting of the biological environment, the physical environment, the socio-economic environment, maternal education level, and health services [32].
Based on the description above, this research uses several predictor variables that are thought to affect the morbidity rate in North Sumatra in 2020. The variables are as follows:
$Y$: Morbidity Rate
$X_1$: Poverty Percentage
$X_2$: Percentage of Households with Access to Proper Sanitation
$X_3$: Population Density
$X_4$: Open Unemployment Rate
$X_5$: General Hospitals
$X_6$: Percentage of Households with Access to Adequate Drinking Water Sources
$X_7$: Average Length of School
The data used in this research is secondary data. Morbidity rate data is accessed from the official website of the North Sumatra Provincial Health Office in a publication with the title Government Agency Performance Report of the North Sumatra Provincial Health Office 2020. And all predictors that are considered to affect morbidity rates are accessed from the website of the Central Bureau of Statistics (BPS) North Sumatra or contained in the BPS North Sumatra publication with the title North Sumatra Province in Figures 2021. The research units used are 33 districts/cities in North Sumatra province.
Characteristics of morbidity rates in North Sumatra
North Sumatra is the second-largest province on Sumatra Island. The population of North Sumatra in 2020 reached 14,799,361 people, living in an area of 72,981.23 km², so the average population density was 202.78 people per square kilometer. In 2020 the morbidity rate in North Sumatra reached 12.24%, which means that about 12 out of every 100 residents experienced illness complaints (Fig. 1, Fig. 2, Fig. 3).
Fig. 1.
Morbidity Rates in Indonesia and North Sumatra 2015 – 2020. Over the six years from 2015 to 2020, the morbidity rates in North Sumatra Province were consistently below the national figure. For all variables, from the response variable to the seven predictor variables thought to affect it, the mean, variance, minimum, and maximum values are calculated.
Fig. 2.
Descriptive information based on mapping of the variable data. The lowest $Y$ is in Humbang Hasundutan Regency, and the highest $Y$ is in Batubara Regency. The lowest $X_1$ is in Deli Serdang Regency, and the highest is in West Nias Regency. The lowest $X_2$ is in South Nias Regency, and the highest is in Binjai City. The lowest $X_3$ is in Pakpak Bharat Regency, and the highest is in Medan City. The lowest $X_4$ is in Humbang Hasundutan Regency, and the highest is in Gunungsitoli City. The lowest $X_5$ is in West Nias Regency, and the highest is in Medan City. The lowest $X_6$ is in Padang Sidimpuan Regency, and the highest is in Pematangsiantar City. The lowest $X_7$ is in Nias, and the highest is in Medan City.
Fig. 3.
Scatterplot between Morbidity Rate and Predictors. The plots of the morbidity rate against each predictor do not form or follow a certain pattern, so all predictors are treated as nonparametric components.
According to the Performance Report of the Government Agencies of the Health Office of North Sumatra Province in 2020, the trend is similar to the national condition. The morbidity rate in North Sumatra was 11.84% in 2015, decreased to 11.15% in 2016, rose to 11.17% in 2017, decreased to 11.03% in 2018, increased to 11.97% in 2019, and increased again to 12.24% in 2020, as shown in Fig. 1.
The results of the calculation of descriptive statistics can be presented in Table 1 below.
Table 3.
Spatial Heterogeneity Test
Breusch Pagan | Df | p-value | Decision |
---|---|---|---|
34.647 | 14 | 0.00166 | Reject H0 |
Table 3 shows that the p-value (0.00166) is smaller than the significance level $\alpha$, so $H_0$ is rejected. In other words, the variance differs between locations (heterogeneous), that is, there are differences in characteristics between one observation point and another.
Table 4.
Bandwidth Value and GCV
Kernel Function | Bandwidth Value (h) | GCV |
---|---|---|
Gaussian | 3.296679331 | 156.2569201 |
Bisquare | 1.499684794 | 109.1468086 |
Tricube | 1.499965502 | 110.1381101 |
Exponential | 3.296690063 | 156.2569174 |
Table 4 shows that, based on the minimum GCV value (109.1468086), the best weighting is the Bisquare kernel function, with an optimum bandwidth of 1.499684794.
Table 6.
Results of ANOVA Model Fit
Variation | Sum of Squares | df | Mean Squares | F | p-value |
---|---|---|---|---|---|
Regression | 94.2863 | 25 | 3.7714 | 9.868 | 1.7653e-08 |
Error | 11.0831 | 29 | 0.3822 | ||
Total | 105.3694 | 54 |
Table 6 shows that, at the chosen significance level $\alpha$, $F = 9.868$ is obtained with a p-value of 1.7653e-08. Because the p-value is smaller than $\alpha$, $H_0$ is rejected.
Table 7.
Results of ANOVA Simultaneous Parameter Significance Test
Variation | Sum of Squares | df | Mean Squares | F | p-value |
---|---|---|---|---|---|
Regression | 302.3977 | 29 | 10.4275 | 27.2845 | 2.2427e-14 |
Error | 11.0831 | 29 | 0.3822 | ||
Total | 313.4808 | 58 |
Table 7 shows that, at the chosen significance level $\alpha$, $F = 27.2845$ is obtained with a p-value of 2.2427e-14. Because the p-value is smaller than $\alpha$, $H_0$ is rejected.
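The F statistics in the two ANOVA tables can be checked from the reported sums of squares and degrees of freedom, since each F value is the ratio of the regression mean square to the error mean square. The function name is hypothetical; the inputs are the values in Tables 6 and 7.

```python
def f_statistic(ss_regression, df_regression, ss_error, df_error):
    """F = (SS_regression / df_regression) / (SS_error / df_error)."""
    ms_regression = ss_regression / df_regression
    ms_error = ss_error / df_error
    return ms_regression / ms_error


# Model fit test (Table 6) and simultaneous parameter test (Table 7).
f_model_fit = f_statistic(94.2863, 25, 11.0831, 29)
f_simultaneous = f_statistic(302.3977, 29, 11.0831, 29)
print(round(f_model_fit, 3), round(f_simultaneous, 3))
```

Both ratios agree with the tabulated F values to within rounding of the reported mean squares.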
Table 1.
Factors that affect morbidity rates in North Sumatra 2020.
Variables | Mean | Minimum | Maximum | Variance |
---|---|---|---|---|
$Y$ | 12.56 | 6.22 | 20.13 | 11.74972973 |
$X_1$ | 10.8 | 3.88 | 25.69 | 22.16703409 |
$X_2$ | 70.87 | 11.48 | 96.2 | 685.6326 |
$X_3$ | 1065.80 | 42.97 | 73.55 | 4329029.026 |
$X_4$ | 5.77 | 0.84 | 16.41 | 11.8475 |
$X_5$ | 5.94 | 0 | 67 | 136.6837 |
$X_6$ | 81.37 | 42.39 | 99.71 | 275.9497 |
$X_7$ | 9.1 | 5.36 | 11.39 | 2.0435 |
Table 1 provides the descriptive statistics of the variables used in the research.
Furthermore, a mapping of the data information used will be given as follows.
Results and analysis
Data patterns between morbidity rates and predictors
Next, a scatterplot of the morbidity rate against each factor thought to influence it is presented to show the pattern of the relationship between the dependent variable and each independent variable. If the plot forms a certain pattern, parametric regression is appropriate. If it does not follow a certain pattern, nonparametric regression is appropriate.
Multicollinearity test
Ragnar Frisch first coined the term multicollinearity. The multicollinearity test is a requirement for all causality (regression) hypothesis tests. Multicollinearity is detected if the value of $VIF > 10$. The value of $VIF$ is stated as follows:
$$VIF_j = \frac{1}{1 - R_j^2} \tag{32}$$
where $R_j^2$ is the coefficient of determination of the $j$-th predictor variable. The $VIF$ values of the seven independent variables used in this study are given in Table 2.
Spatial heterogeneity test and the best weights matrix
The existence of differences in characteristics between location points causes spatial heterogeneity, so spatial weighting is needed. The best spatial weighting is obtained from the bandwidth value, which has the minimum Generalized Cross Validation (GCV) value. The following are the results of spatial heterogeneity testing and the selection of the best bandwidth.
Best knot point selection
The next step in determining the best model is determining the knot point. The knot point is the point where the data pattern changes. The following table shows the GCV value at each knot point.
Parameter estimation of morbidity rate model in North Sumatra in 2020
Based on the selected optimum knot point, the parameter estimators of the GWTSNR model with one knot point are as follows.
(33) |
The following is the GWTSNR model, which is written as an example of the 30th location, namely Medan City.
(34) |
Model fit significance test
The model fit hypothesis is tested by comparing the GWTSNR model with the TSNR model. The following is the ANOVA table of the model fit test.
Thus, it can be concluded that there is a significant difference between GWTSNR model and TSNR model.
Simultaneous parameter significance test
Simultaneous testing is carried out to test the estimation of model parameters simultaneously. The following are the results of the ANOVA simultaneous parameter test.
Thus it can be concluded that there is at least one parameter in the GWTSNR model that is significant to the response variable or, in other words, poverty percentage, the percentage of households who have access to proper sanitation, population density, open unemployment rate, and general hospital, percentage of households with access to resources adequate drinking water, and the average length of school have a simultaneous effect on the morbidity rate in North Sumatra 2020.
Partial parameter significance test
The results of the partial parameter significance test show that the predictor variables that have an effect differ between areas. This results in eight groups of districts/cities based on the influential predictor variables. The grouping of districts/cities based on the variables significant to the 2020 morbidity rate in North Sumatra is given as follows.
-
1.
The morbidity rates in Nias, Mandailing Natal, Tapanuli Utara, Labuhanbatu, Simalungun, Dairi, Deli Serdang, Langkat, Nias Selatan, Humbang Hasundutan, Samosir, Batu Bara, Padang Lawas Utara, Padang Lawas, Labuhanbatu Selatan, Nias Utara, Nias Barat, Medan City, Padang Sidimpuan, and Gunungsitoli City are affected by and .
-
2.
The morbidity rates in Karo, Pematangsiantar City, Tebing Tinggi City, and Binjai City are affected by and .
-
3.
The morbidity rates in Tapanuli Selatan, Labuhanbatu Utara, and Tanjung Balai City are affected by and .
-
4.
The morbidity rate in Serdang Bedagai is affected by and .
-
5.
The morbidity rates in Toba Samosir and Pakpak Bharat are affected by and .
-
6.
The morbidity rate in Sibolga City is affected by and .
-
7.
The morbidity rate in Asahan is affected by and .
-
8.
The morbidity rate in Tapanuli Tengah is affected by .
The mapping of Morbidity Rates can be presented in Fig. 4.
Fig. 4.
Mapping Morbidity Rates in North Sumatra 2020 based on Significant Variables.
Model interpretation
After the partial significance test, which yielded eight regional groups of significant parameters, the best model, namely the GWTSNR model with one knot point, is interpreted at the 30th location, Medan City, as given in Eq. 34.
The interpretation of the above model is explained as follows.
-
a.Assuming the predictors (, , , , , ) are constant, then the effect of the poverty percentage variable on the morbidity rate in 2020 in Medan City can be written as follows.
(35)
Based on the model obtained, it can be interpreted that if the poverty percentage is less than 4.3162%, then every 1% increase in the poverty percentage will reduce the morbidity rate by 19.077%. Meanwhile, if the poverty percentage is more than or equal to 4.3162 %, then every 1% increase in the poverty percentage will increase the morbidity rate by 0.155 %. Because the poverty percentage in Medan City in 2020 is 8.01, if the poverty percentage increases, the morbidity rate will also increase.
-
b.Assuming the predictors (, , , , , ) are constant, then the effect of the percentage of households that have access to proper sanitation on the morbidity rate in 2020 in Medan City can be written as follows.
(36)
Based on the model obtained, it can be interpreted that if the percentage of households with access to proper sanitation is less than 13.1744%, it increases by 1%. The morbidity rate will increase by 0.450%. Meanwhile, if the percentage of households with access to proper sanitation is more than or equal to 13.1744%, it increases by 1%. The morbidity rate will decrease by 0.302%. Because the percentage of households that have access to proper sanitation in Medan City in 2020 is 93.16, if the percentage of households that have access to proper sanitation increases, the morbidity rate will decrease.
-
c.Assuming the predictors (, , , , , ) are constant, then the effect of the population density on morbidity rates in 2020 in Medan City can be written as follows.
(37)
Based on the model obtained, it can be interpreted that if the population density is less than 225,903 then it increases by 1 unit, and the morbidity rate will decrease by 0.0075%. Meanwhile, if the population density is more than or equal to 225.903 there is an increase of 1 unit, then the morbidity rate will increase by 0.0016%. Because the population density in Medan City in 2020 is 9189.63, if the population density increases, the morbidity rate will also increase.
-
d.Assuming the predictors (, , , , , ) are constant, then the effect of the open unemployment rates on the morbidity rate in 2020 in Medan City can be written as follows.
(38)
Based on the model obtained, it can be interpreted that if the open unemployment rate is less than 1.1514, it increases by 1%, and then the morbidity rate increases by 26.111%. Meanwhile, if the open unemployment rate is more than or equal to 1.1514, it increases by 1%, and the morbidity rate will decrease by 0.352%. Because the open unemployment rate in Medan City in 2020 is 10.74, if the open unemployment rate increases, the morbidity rate will decrease.
-
e. Assuming the other six predictors are held constant, the effect of the number of public hospitals on the morbidity rate in 2020 (y) in Medan City can be written as follows.
(39)
Based on the model obtained, when the number of public hospitals is below 1.34, an increase of 1 unit raises the morbidity rate by 8.749%. When the number of public hospitals is greater than or equal to 1.34, an increase of 1 unit lowers the morbidity rate by 0.119%. Because the number of public hospitals in Medan City in 2020 was 67, which lies above the knot, an increase in the number of public hospitals is associated with a decrease in the morbidity rate.
-
f. Assuming the other six predictors are held constant, the effect of the percentage of households with access to safe drinking water on the morbidity rate in 2020 (y) in Medan City can be written as follows.
(40)
Based on the model obtained, when the percentage of households with access to safe drinking water is below 44.4184%, a 1% increase raises the morbidity rate by 2.092%. When the percentage is greater than or equal to 44.4184%, a 1% increase raises the morbidity rate by 0.128%. Because the percentage of households with access to safe drinking water in Medan City in 2020 was 98.79%, which lies above the knot, an increase in this percentage is associated with an increase in the morbidity rate.
-
g. Assuming the other six predictors are held constant, the effect of the average length of schooling on the morbidity rate in 2020 (y) in Medan City can be written as follows.
(41)
Based on the model obtained, when the average length of schooling is below 5.4806 years, an increase of one year lowers the morbidity rate by 1.853%. When the average length of schooling is greater than or equal to 5.4806 years, an increase of one year lowers the morbidity rate by 4.354%. Because the average length of schooling in Medan City in 2020 was 11.39 years, which lies above the knot, an increase in the average length of schooling is associated with a decrease in the morbidity rate.
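The interpretations above all follow the same rule: in a one-knot truncated linear spline, the slope below the knot is the linear coefficient, and the slope at or above the knot is the linear coefficient plus the truncated coefficient. The following Python sketch illustrates that rule using the slopes, knots, and Medan City values quoted in the text. The class and attribute names are hypothetical, the intercepts are omitted because they are not quoted in the interpretations, and values printed with a decimal comma in the text (for example 8,749) are assumed to be decimals (8.749).

```python
from dataclasses import dataclass


@dataclass
class OneKnotEffect:
    """Partial effect f(x) = b1*x + b2*max(x - knot, 0) of one predictor."""
    name: str
    knot: float
    slope_below: float  # reported change in morbidity rate per unit, x < knot
    slope_above: float  # reported change in morbidity rate per unit, x >= knot
    observed: float     # 2020 value for Medan City as quoted in the text

    @property
    def truncated_coefficient(self) -> float:
        # Above the knot the slope is b1 + b2, so b2 = slope_above - slope_below.
        return self.slope_above - self.slope_below

    def local_slope(self, x: float) -> float:
        # Slope that applies at the predictor value x.
        return self.slope_above if x >= self.knot else self.slope_below

    def partial_effect(self, x: float) -> float:
        # Contribution of this predictor, excluding the intercept.
        return (self.slope_below * x
                + self.truncated_coefficient * max(x - self.knot, 0.0))


# Values quoted in items b to g for Medan City.
effects = [
    OneKnotEffect("proper sanitation (%)", 13.1744, 0.450, -0.302, 93.16),
    OneKnotEffect("population density", 225.903, -0.0075, 0.0016, 9189.63),
    OneKnotEffect("open unemployment rate", 1.1514, 26.111, -0.352, 10.74),
    OneKnotEffect("public hospitals", 1.34, 8.749, -0.119, 67.0),
    OneKnotEffect("safe drinking water (%)", 44.4184, 2.092, 0.128, 98.79),
    OneKnotEffect("average length of schooling", 5.4806, -1.853, -4.354, 11.39),
]

for e in effects:
    slope = e.local_slope(e.observed)
    direction = "increase" if slope > 0 else "decrease"
    print(f"{e.name}: slope at observed value = {slope:+.4f} ({direction})")
```

Every observed value for Medan City lies above its knot, so the slope that applies in each case is the one reported for the upper segment, which is why the direction stated at the end of each item matches the sign of that second slope.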
Model comparison
The goodness of fit of the GWTSNR model is compared with that of the TSNR model, which serves as the global regression, using the adjusted coefficient of determination (adjusted R-square). The GWTSNR model with one knot point has an adjusted R-square of 96.235%, which is greater than the adjusted R-square of 70.159% obtained from the TSNR model with one knot point. This indicates that the GWTSNR model with one knot point is the better of the two models, with the predictor variables explaining 96.235% of the variation in the morbidity rate.
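As a reminder of what the comparison measures, the following Python sketch computes the adjusted R-square from observed and fitted values using the textbook formula. It is only an illustration: the function names are hypothetical, the data are made up, and the way the number of parameters is counted for the GWTSNR model (for example, through an effective number of parameters) is not stated here, so `n_params` is an assumption supplied by the user.

```python
def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SSE / SST."""
    mean_y = sum(y) / len(y)
    sse = sum((obs - fit) ** 2 for obs, fit in zip(y, y_hat))
    sst = sum((obs - mean_y) ** 2 for obs in y)
    return 1.0 - sse / sst


def adjusted_r_squared(y, y_hat, n_params):
    """Textbook adjustment for n observations and n_params slope parameters."""
    n = len(y)
    if n - n_params - 1 <= 0:
        raise ValueError("need more observations than parameters")
    r2 = r_squared(y, y_hat)
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_params - 1)


# Made-up observations and fitted values, not data from the study.
y_obs = [1.0, 2.0, 3.0, 4.0, 5.0]
y_fit = [1.1, 1.9, 3.2, 3.8, 5.0]
print(f"R-square          = {r_squared(y_obs, y_fit):.4f}")
print(f"adjusted R-square = {adjusted_r_squared(y_obs, y_fit, n_params=1):.4f}")
```

The adjustment penalizes additional parameters, which matters here because a locally estimated model such as GWTSNR uses many more parameters than the global TSNR model.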
Conclusion
In the North Sumatra 2020 morbidity rate data, the regression curves between the predictor variables and the response variable do not follow a particular pattern, and the morbidity rate also exhibits a spatial effect. The districts/cities fall into eight regional groups based on their significant predictors, with different effects in each group. The GWTSNR model with one knot point has an adjusted R-square of 96.235%, which is greater than the adjusted R-square of 70.159% for the TSNR model with one knot point. This indicates that the GWTSNR model with one knot point is the better model, with the predictor variables explaining 96.235% of the variation in the morbidity rate.
Ethics statements
The data used in this study are the morbidity rate data for North Sumatra in 2020. They are secondary data accessed from the official website of the North Sumatra Provincial Health Office (https://bit.ly/3a3g2Xw), in the publication North Sumatra Provincial Health Office Performance Report 2020. The predictor variables used in this study are secondary data accessed from the website of the Central Statistics Agency of North Sumatra (https://sumut.bps.go.id/) and are also available in the BPS North Sumatra publication North Sumatra Province in Figures 2021.
CRediT author statement
Sifriyani, Gunardi, S.H. Kartiko, and I. N. Budiantara: Methodology
Gunardi: Conceptual
Gunardi, Herni Utami, Zulaela, Sumardi: Writing-Reviewing
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
Funding: This work was supported by Lembaga Pengelola Dana Pendidikan (LPDP) Republik Indonesia.
Data Availability
Data will be made available on request.
References
- 1. Ayeni O. The importance of morbidity statistics in the evaluation of public health in Africa. Jimlar Mutane. 1976;1(2):193–197.
- 2. The health office in North Sumatra. North Sumatra provincial health office performance report, 2020 online at http://dinkes.sumutprov.go.id/.
- 3. Cressie N.A.C. John Wiley and Sons; New York: 1991. Statistics for Spatial Data; pp. 803–872.
- 4. LeSage J.P. Regression analysis of spatial data. J. Reg. Policy. 1997;27(2):83–94.
- 5. Fotheringham A.S., Brunsdon C., Charlton M. John Wiley and Sons Ltd; England: 2002. Geographically Weighted Regression: The Analysis of Spatially Varying Relationships; pp. 65–102.
- 6. Brunsdon C., Fotheringham A.S., Charlton M.E. Geographically weighted regression: a method for exploring spatial nonstationarity. Geogr. Anal. 1996;28(4):281–298.
- 7. Mennis J.L., Jordan L. The distribution of environmental equity: exploring spatial nonstationarity in multivariate models of air toxic releases. J. Ann. Assoc. Am. Geogr. 2005;95(2):249–268.
- 8. Leung Y., Mei C., Zhang W.X. Statistic test for spatial non stationarity based on the geographically weighted regression model. J. Environ. Plan. A. 2000;32(1):9–32.
- 9. Leung Y., Mei C., Zhang W.X. Testing for spatial autocorrelation among the residuals of the geographically weighted regression. J. Environ. Plan. A. 2000;32(5):871–890.
- 10. Yu D. Exploring spatiotemporally varying regressed relationships: the geographically weighted panel regression analysis, J. Int. Arch. Photogramm., Rem. Sens. Spat. Inf. Sci. - ISPRS Arch. 2010;38(6):134–139.
- 11. Wrenn D.H., Sam A.G. Geographically and temporally weighted likelihood regression: exploring the spatiotemporal determinants of land use change. J. Reg. Sci. Urban Econ. 2014;44(15):60–74.
- 12. Zuhdi S., Saputro D.R.S., Widyaningsih P. Parameters estimation of geographically weighted ordinal logistic regression (GWOLR) model. J. Phys. Conf. Ser. 2017;855(1):1–5.
- 13. Afifah N., Budiantara I.N., Latra I.N. Mixed estimator of Kernel and Fourier series in semiparametric regression. J. Phys. Conf. Ser. 2017;855(1):1–8.
- 14. Pane R., Budiantara I.N., Zain I., Otok B.W. Parametric and nonparametric estimators in fourier series semiparametric regression and their characteristics. J. Appl. Math. Sci. 2014;8(101):5053–5064.
- 15. Mariati N.P.A.M., Budiantara I.N., Ratnasari V. Modeling poverty percentages in the papua islands using fourier series in nonparametric regression multivariable. J. Phys. Conf. Ser. 2019;1397(1):1–7.
- 16. Green P.J., Silverman B.W. Chapman and Hall; London: 2002. Nonparametric Regression and Generalized Linear Model.
- 17. Wahba G. SIAM; Philadelphia, Pennsylvania: 1990. Spline Models for Observational Data.
- 18. Hardle G. Cambridge University Press; New York: 1990. Applied Nonparametric Regression.
- 19. Antoniadis A., Gregorire G., Mackeagu W. Wavelet methods for curve estimation. J. Am. Statist. Assoc. 1994;89(428):1340–1353.
- 20. Antoniadis A., Bigot J., Spatinas T. Wavelet estimators in nonparametric regression: a comparative simulation study. J. Stat. Softw. 2001;6(6):1–83.
- 21. Sifriyani S.H., Kartiko I.N., Budiantara Gunardi. Geographically weighted regression with spline approach. Far East J. Math. Sci. 2017;101(6):1183–1196.
- 22. Eubank R.L. Marcel Dekker; New York: 1988. Spline Smoothing and Nonparametric Regression.
- 23. Green P.J., Silverman B.W. Chapman and Hall; London: 1994. Nonparametric Regression and Generalized Linear Model.
- 24. LeSage J.P. Regression analysis of spatial data. J. Regional and Policy. 2001;27(2):83–84.
- 25. Anselin L. Kluwer Academic Publisher; Dordrecht: 1988. Spatial Econometrics: Methods and Models.
- 26. Sifriyani I.N., Budiantara S.H., Kartiko Gunardi. A new method of hypothesis test for truncated spline nonparametric regression influenced by spatial heterogeneity and application. J. Hindawi Abstract Appl. Anal. 2018;2018:909–920.
- 27. Sifriyani Simultaneous hypothesis testing of multivariable nonparametric spline regression in the GWR model. Int. J. Stat. Probabil. 2019;8(4):32–46.
- 28. Central Bureau of Statistics of the Republic of Indonesia (BPS), Republic of Indonesia Health Statistics Indonesia, 2021 online at https://sirusa.bps.go.id/.
- 29. Demographic Institute UI. (2010). Demographic basics. Jakarta: Salemba Empat.
- 30. Wulandari K., Budiantara I.N., Ratna M. Modeling factors affecting morbidity rates in east java using spline nonparametric regression. J. Sci. Art ITS. 2017;6(1):108–114.
- 31. Hanum D. Factors affecting morbidity of the east java population using multivariate geographically weighted regression (MGWR) J. Sci. Art ITS. 2013;2(2):189–194.
- 32. Gordon J.E. Epidemiology in modern perspective. Proc. R. Soc. Med. 1954;47(7):564–570. doi: 10.1177/003591575404700712.