Skip to main content
International Journal of Environmental Research and Public Health logoLink to International Journal of Environmental Research and Public Health
. 2021 Jan 27;18(3):1109. doi: 10.3390/ijerph18031109

A Spatio-Temporal Analysis of the Health Situation in Poland Based on Functional Discriminant Coordinates

Mirosław Krzyśko 1, Waldemar Wołyńki 2,*, Marcin Szymkowiak 3,4, Andrzej Wojtyła 5
Editor: Paul B Tchounwou
PMCID: PMC7908150  PMID: 33513775

Abstract

The aim of this study was to investigate if the provinces of Poland are homogeneous in terms of the observed spatio-temporal data characterizing the health situation of their inhabitants. The health situation is understood as a set of selected factors influencing inhabitants’ health and the healthcare system in their area of residence. So far, studies concerning the health situation of selected territorial units have been based on data relating to a specific year rather than longer periods. The task of assessing province homogeneity was carried out in two stages. In stage one, the original spatio-temporal data space (space of multivariate time series) was transformed into a functional discriminant coordinates space. The resulting functional discriminant coordinates are synthetic measures of the health situation of inhabitants of particular provinces. These measures contain complete information regarding 8 diagnostic variables examined over a period of 6 years. In the second stage, the Ward method, commonly used in cluster analysis, was applied in order to identify groups of homogeneous provinces in the space of functional discriminant coordinates. Sixteen provinces were divided into four clusters. The homogeneity of the clusters was confirmed by the multivariate functional coefficient of variation.

Keywords: health policy, health inequalities, healthcare access, spatial distribution of the health situation, cluster analysis, functional discriminant coordinates, multivariate functional coefficient of variation, spatio-temporal data

1. Introduction

Health is universally regarded as one of the most highly appreciated values. Good health is the main factor contributing to people’s well-being, which enhances their opportunities to participate in social life and to benefit from economic and employment growth. Better health is also consistently associated with greater life satisfaction (Ngamaba et al. [1]). A good health situation is instrumental in achieving good labor market outcomes. By reducing the individual’s capacity to work long hours, a deteriorated health status decreases their chance of getting employed and being productive at work and has a strong impact on the labor market situation (James et al. [2], OECD [3]).

For the purpose of this study, the term “health situation” is understood as inhabitants’ health described by the set of selected indicators and healthcare access in their area of residence. According to Penchansky and Thomas [4], healthcare access can be defined as a multi-faceted concept expressing the “degree of fit” between clients (patients) and the healthcare system according to five important dimensions: availability, accessibility, accommodation, affordability, and acceptability. In our article, we focus mainly on the first dimension, i.e., availability, which represents the “spatial” component of healthcare access. This choice is motivated by the fact that our understanding of availability is the same as that presented by Penchansky and Thomas [4], i.e., as the relationship between the number and type of existing services (and resources) and the number of patients and types of their needs. In other words, it represents the adequacy of the supply of physicians, dentists, and other providers; of facilities, such as clinics and hospitals; and of specialized programs and services, such as mental health and emergency care. The second dimension highlighted by Penchansky and Thomas [4], i.e., accessibility, also represents the “spatial” component of healthcare access and is understood as the relationship between the location of supply and the location of clients, while accounting for clients’ transportation resources and travel time, distance and cost. In our approach, however, we do not take this spatial component into account, mainly due to limited data availability. However, it should be emphasized that the literature on accessibility in the context of healthcare access is wide (see, for instance, Wang [5] and Neutens [6]). There is also a rich literature devoted to applications of the access concept to various health care services, e.g., regarding health care accessibility analyzed in connection with availability (Barbarisi et al. [7], Bruno et al. [8], Lu et al. [9], Okuyama et al. [10], Pu et al. [11]).

The importance attached to health by different organizations is reflected in the way health protection is implemented in public policies, particularly in health policy, which aims to have a positive influence on population health (De Leeuw et al. [12]). This policy can be understood as government decisions and plans of action to make progress towards achieving the goals of the health system: improved health status of the population, better financial risk protection, and better client satisfaction; or intermediate outcomes for health systems, which include: quality, access, and efficiency (Campos and Reich [13]). In many countries, the main objective of public health policy is to create the conditions for good and equitable health for the entire population and within specific groups and to eliminate avoidable health inequalities. It should also be underlined that decisions made in sectors outside of public health and health care, such as education, transportation, and criminal justice, strongly affect health and well-being (Pollack et al. [14]).

As mentioned above, the main objective of activities in the area of health policy is to improve the health of the population, and this improvement is nowadays understood in two ways (Łyszczarz [15]). First, in terms of improving the average health status, e.g., measured in terms of life expectancy or premature mortality. Second, increasing the importance attached to the issue of inequalities in health. The term ‘health inequality’ refers to differences in the health of individuals or specific subgroups; any measurable aspect of health that varies across individuals or according to socially relevant groupings can be called a health inequality (Boyle [16], Kawachi et al. [17], Arcaya et al. [18]). The aim of actions in the field of health policy is to reduce such inequalities. Large inequalities in health status exist across population groups, countries and specific regions within countries (Wojtyła-Buciora et al. [19], Wojtyła et al. [20]). These health inequalities are linked to many factors, including differential exposure to health risk factors and access to health care (Samet [21]). Inequalities in health are mainly manifested as differences in the health status between socioeconomic groups, but they can also described in terms of employment status, sex or geographic location (Crombie et al. [22]). The prospect of reducing inequalities seems to be increasingly important in contemporary research trends on the implementation of health policy in the world (see, e.g., Spinakis et al. [23]). It should also be emphasized that health inequalities have to be considered as a global problem, which not only affects populations of the poorest countries and regions but also those of the richest ones; persistent health inequalities are among the most serious and challenging health problems worldwide (Barreto [24]). How policies can reduce the main factors of health inequality and promote health equality will be a key challenge for public health in the future.

Monitoring population health and eliminating health inequalities are essential activities aimed at maintaining and improving public health. The main goals of health monitoring involve measuring the extent of health problems, their trends, and the degree of variation between different population groups, including spatial distribution, as well as identifying priority areas for public health (Pizot et al. [25]). Another objective of monitoring is to track the current health situation at the national and local level. This is especially important today, in the era of the Covid-19 pandemic, when up-to-date information is required at lower levels of spatial aggregation.

The unequal spatial distribution of resources, such as clinics, hospitals, nurses, pharmacies, or doctors, could make entire communities more vulnerable and less resilient to adverse health effects. That is why the health situation needs to be investigated by accounting for spatial differences to gain a deeper understanding of why and how some geographical areas experience different health than others (Ozdenerol [26]). Understanding the role played by location in shaping the geographic distribution of the health situation within countries is critical for informing appropriate public health policy regarding prevention and treatment (Casper et al. [27]).

There are numerous articles about the spatial variation in the health situation, health inequalities or health conditions at the local or national level (see, for instance, Gilliland et al. [28], Wang and Nie [29], Chen et al. [30]). In the case of Poland, various analyses have been conducted to investigate regional inequalities in the health status of the population (Wierzbicka [31], Bem et al. [32]).

Interestingly, all studies mentioned above were based on data for a specific year or, in some cases where comparative analysis was involved, for two years (e.g., Shi et al. [33], Hübelová et al. [34]). If the authors of these articles chose, say, p variables describing the health situation of a given territorial unit, then the obtained data were p-dimensional vectors or points in a p-dimensional Euclidean space.

This article presents a more general approach to investigating the health situation across territorial units, which is based on spatio-temporal data. This kind of data is more general than static vector data as it takes into account changes that happen over time. The statistical methodology involving the use of functional discriminant coordinates and cluster analysis is applied to available data for Poland. However, this approach can be used to investigate the health situation or other phenomena at lower levels of spatial aggregation in other countries. For this reason, its results may be useful for policy-makers in the field of public health. The data to measure the health situation in Poland come from the Local Data Bank (LDB). Several important variables related to the health situation observed at the level of districts (LAU—also called poviats) located within provinces (approximately equivalent to NUTS2 (regions) level and also called voivodships) (see Section 2) in the period 2013–2018 were taken into account in the analysis. More specifically, each district is described by 8 variables representing the situation over 6 years. The data for 380 districts were arranged in the form of a matrix with 6 rows and 8 columns, containing a total of 18,240 numerical values.

The main aim of this article is to determine whether Polish provinces are homogeneous in terms of spatio-temporal data characterizing their health situation. In order to answer this question, three multivariate statistical methods were used: multivariate functional discriminant coordinates analysis (MFDCA), functional cluster analysis (FCA), and the multivariate functional coefficient of variation (MFCV).

In the first step, spatio-temporal data were transformed into functional data by applying a continuous function of time t (see, e.g., Górecki and Krzyśko [35]). Functional data can be regarded as realizations of the random process X(t). Then, functional discriminant coordinates were constructed in the functional data space, and further calculations were performed in the functional discriminant coordinate space.

At this point, an important question arises: do the functional data recorded as continuous functions really exist and can these multivariate functions actually be derived? This question is critical because, in practice, values of an observed random process are always recorded in discrete moments in time, sparsely or densely distributed in the interval of variability over time. Thus, in this case, we encounter a time series or, in other words, a highly-dimensional vector of observations. However, there are numerous reasons why it is useful to model a time series as a continuous function (elements of a certain functional space); one of them is that functional data have many advantages in comparison to other representations of time series. In particular, the MFDCA derived in the present study has the following statistical advantages:

  • Firstly, functional data are normally used to cope with the problem of missing observations, which is inevitable in many areas of applied research. Unfortunately, most methods concerning data analysis require complete time series. The removal of a time series with missing observations from a data set is one of popular solutions, but this can lead, and in most cases does lead, to serious data loss. Another possibility is to use one of the many methods of missing data prediction, but, in that case, the results will depend on the interpolation method. Contrary to these approaches, in the case of functional data, the problem of missing observations is resolved by expressing a given time series in the form of a continuous function set.

  • Secondly, in the statistical development of MFDCA, the structure of observations is naturally retained when using functional data, i.e., the temporal link is maintained and the information regarding any measurement is taken into account. Consequently, results are assumed to be robust.

  • Thirdly, moments of observation do not have to be equally spaced in a particular time series, which can be a major advantage in online applications.

  • Fourthly, when using functional data, one avoids the problem of dimensionality. When the total number of time points in which observations are made exceeds the number of time series under analysis, most statistical methods do not provide satisfactory results because of misleading false estimates. In the case of functional data, this problem can be avoided because the time series are replaced by a set of continuous representative functions, which are independent of the time points in which observations are made.

The construction of functional discriminant coordinates is described in Górecki et al. [36], and their application to fruit data can be found in Hanusz et al. [37]. Two other proposals are: kernel discriminant coordinates (Krzyśko et al. [38]) and discriminant coordinates with the additional condition imposed on the covariance matrix (Krzyśko et al. [39]).

In the second step, cluster analysis was used to distinguish between groups of homogeneous provinces. Ward’s hierarchical clustering method was chosen as a commonly used technique in cluster analysis. Moreover, to determine whether obtained clusters are homogeneous, a functional multivariate coefficient of variation was applied.

The main value of this article, according to its authors, is the proposed statistical methodology. Despite the use of country-specific data for the purpose of spatial analysis of the health situation, the presented methods are universal and can be successfully applied to any territorial unit and spatio-temporal dynamic data connected to other phenomena (e.g., poverty or the labor market situation at lower levels of spatial aggregation).

This article is organized as follows. Section 2 contains a short description of data used to analyze differences in the health situation across Polish provinces. The section also provides a description of the administrative division of Poland and details of the procedure of data standardization, as well as their transformation into functional data. Section 3 presents the statistical methodology involving the use of functional discriminant coordinates, cluster analysis, and the functional multivariate coefficient of variation. How this approach was applied to real data describing the health situation in Poland is described in Section 4. Finally, concluding remarks and further steps to be taken in the future are provided in Section 5.

2. The Data

The original data set contains values of p=8 variables characterizing the health situation of the population (see Table 1). All variables come from the LDB, which is Poland’s largest database of information relating to the economy, society and the environment. Data and statistical indicators in the LDB describe entire country, as well as units representing three NUTS levels: macroregions (NUTS1), regions (NUTS2), and subregions (NUTS3).

Table 1.

List of variables used in analysis.

Variable Description Type of Variable
1 Nurses and midwives per 10,000 population S
2 Doctors per 10,000 population S
3 Population per generally available pharmacy D
4 Deaths of people due to cardiovascular disease per 100,000 population D
5 Total deaths due to cancer per 100,000 population D
6 Health out-patient departments per 10,000 population S
7 Number of doctors consultations per 10,000 population S
8 Infant deaths per 1000 live births D

Table 1 also contains information about variable type, with S denoting the so-called stimulant, where a higher value means a better situation (in terms of health), and D denoting the so-called destimulant, where lower values represent a better situation (Walesiak and Dudek [40]).

The variables were selected with a view to obtaining a relatively comprehensive description of the health situation of the population and taking into account their availability and completeness. The data cover the period 2013–2018, i.e., T=6 years and describe n=380 districts located within 16 provinces (see Table 2).

Table 2.

The composition of Polish provinces.

Number Province Name Number of Districts
1 dolnośląskie 30
2 kujawsko-pomorskie 23
3 lubelskie 24
4 lubuskie 14
5 łódzkie 24
6 małopolskie 22
7 mazowieckie 42
8 opolskie 12
9 podkarpackie 25
10 podlaskie 17
11 pomorskie 20
12 śląskie 36
13 świętokrzyskie 14
14 warmińsko-mazurskie 21
15 wielkopolskie 35
16 zachodniopomorskie 21
Total 380

Provinces are essentially equivalent to NUTS2 units, while districts are the upper level of local administrative units, which are currently not part of the NUTS system. The NUTS classification (Nomenclature of territorial units for statistics) is a geographical standard used for a statistical division of the EU Member States economic territories into three regional levels of specified classes of the population. It was established in order to enable the collection, compilation, and dissemination of harmonized regional statistics in the European Union. More information about the administrative division of Poland can be found at https://stat.gov.pl/en/regional-statistics/classification-of-territorial-units/administrative-division-of-poland/. Figure 1 shows the administrative division of Poland into provinces and districts (the left panel) and the division of opolskie (as an example) into districts (the right panel).

Figure 1.

Figure 1

Administrative division of Poland—provinces and districts.

The values of the selected variables, expressed in different measurement units and having different ranges of variation, were standardized using the method of zero unitization (see, for example, Jajuga and Walesiak [41]).

Subsequently, the unitized data were transformed into functional data using the least squares method (see, e.g., Górecki and Krzyśko [35]).

Now, let us assume that the d-th component of the Z process can be represented by a finite number of orthonormal basis functions {φb}:

Zd(t)=b=0Bdαdbφb(t),tI,d=1,2,,p,

where αdb are random variables such that Var(αdb)< for d=1,2,,p and b=0,1,,Bd.

Let

α=(α10,,α1B1,,αp0,,αpBp)

and

Φ(t)=φB1(t)000φB2(t)000φBp(t),

where φBd=(φ0,,φBd), d=1,,p, αRK+p,ΦRp×(K+p),K=B1++Bp. Then,

Z(t)=Φ(t)α,tI. (1)

Individual years (time points) were assigned the following values: t1=0.5(2013),t2=1.5(2014),,t6=5.5(2018). The ϕ functions are considered on the interval I=[0,T]=[0,6]. The Fourier base of the form

ϕ0(t)=1/T,ϕ2k1(t)=2/Tsin(2πkt/T),ϕ2k(t)=2/Tcos(2πkt/T),

where t[0,T], k=1,2,, was adopted as the orthonormal basis. Górecki and Krzyśko [35] showed that the Fourier base leads to a minimal number of terms in the expansion of a given function into a series, which is a desirable feature because expansion coefficients play the role of new variables in the functional approach. Given the small number of time points, for each of the 8 variables, the number of expansion terms was the same and equal to 5. Hence, B1==B8=4, K=B1++B8=32, K+p=40. Thus, αR40 and ΦR8×40.

Figure 2 shows the functional data (average values) for 8 variables and 16 provinces. One can see how the values of individual variables vary over time and between provinces.

Figure 2.

Figure 2

Average values of 8 variables calculated from functional data for districts included in each of the 16 provinces. Note: The ordinate axis shows the unitized values of a given variable.

3. Statistical Methodology

3.1. Functional Discriminant Coordinates

Our purpose is to construct a discriminant coordinate based on multivariate functional data, i.e., to construct

U=<u,Z>=Iu(t)Z(t)dt

such that their between-class variance is maximal compared with the within-class variance, where

u(t)=Φ(t)γ.

The construction of functional discriminant coordinates is described in Górecki et al. [36] and Hanusz et al. [37].

The construction of discriminant coordinates for the random process Z essentially consists in constructing classical discriminant coordinates for a random vector α because the discriminant component Uk has the form Uk=γkα, where α is the random vector in the representation Z(t)=Φ(t)α of the random process Z, and γk is an eigenvector in the generalized eigenproblem (BλkW)γk=0, where B and W are the between-class and within-class matrices, respectively.

Remark 1.

The examination of the elements of the vector weight function for the original processes in each discriminant coordinate (elements of the vectors uk) helps to interpret the principal axes of between-class variation.

At a given time point t, the greater the absolute value of a component of the vector weight function, the greater the contribution in the structure of the given functional discriminant coordinate, from the process Z corresponding to that component. The total contribution of a particular original process Zi in the structure of a particular functional discriminant coordinate is equal to the area under the module weighting function corresponding to this process.

In practice, vector α is unknown and must be estimated based on the sample. Let zi1,zi2,,zini be a sample belonging to the i-th class, where i=1,2,,L. The function zij has the form

zij(t)=Φ(t)aij,

where aij=(a10(ij),,a1K1(ij),,ap0(ij),,apKp(ij)), i=1,2,,L,j=1,2,,ni.

Let

a¯i=1nij=1niaij,a¯=1ni=1Lnia¯i,i=1,,L,n=n1++nL.

Then,

B^=1L1i=1Lni(a¯ia¯)(a¯ia¯),
W^=1nLi=1Lj=1ni(aija¯i)(aija¯i).

Next, we find the non-zero eigenvalues λ^1🟉λ^2🟉λ^s🟉 and the corresponding eigenvectors γ^1,γ^2,,γ^s of the matrix W^1B^, where s=min(K+p,L1). Hence,

u^k(t)=Φ(t)γ^k,

and the coefficients of the projection of the j-th realization zij of the process Z belonging to the i-th class on the k-th functional discriminant coordinate are equal to:

U^ijk=<u^k,zij>=γ^kaij,

for i=1,2,,L,j=1,2,,ni,k=1,2,,s.

The plots of the pairs (U^ij1,U^ij2) provide a visual representation of the relative position of groups in the two-dimensional space. Since the configuration obtained is deemed to be optimal in terms of the ability to discriminate between the groups, wide overlaps are to be considered as a sign of no or small differences between the groups involved.

3.2. Cluster Analysis

Provinces that are homogeneous in terms of the considered variables were identified using cluster analysis. More precisely, we applied Ward’s hierarchical clustering method (see, for example, Seber [42], Chapter 7; Mirkin [43]; Krzyśko et al. [44], Chapter 12). The clustering procedure is based on the Mahalanobis distance between the provinces.

Let U^ij=(U^ij1,,U^ijs). This distance is defined by the following formula:

dij2=(U¯iU¯j)S1(U¯iU¯j),

where

U¯i=1nij=1niU^ij,i=1,2,,L,

and

S=1nLi=1Lj=1ni(U^ijU¯i)(U^ijU¯i),n=n1++nL.

The Mahalanobis distance takes into account not only the difference between the mean vectors of two provinces; the difference is also weighted by the variances and covariances of the examined variables estimated for all provinces (the differentiation of districts around the mean provinces was taken into account).

3.3. Functional Multivariate Coefficient of Variation

Let Z=(Z1,,Zp), be a p-dimensional random process with mean function μ=(μ1,,μp)0. We assume that the process Z belongs to the Hilbert space L2p(I) of p-dimensional vectors of square integrable functions on I.

The functional multivariate coefficient of variation (MFCV) for the random process Z is defined as follows (Krzyśko and Smaga [45])

MFCV=Var(<μ*,Z>μ,

where μ*(t)=μ(t)/μ, tI.

If process Z has the form (1), then

MFCV=aJΦΣαJΦa(aJΦa)2,

where JΦ=diag(Jϕ1,,Jϕp), Jϕk=Iϕk(t)ϕk(t)dt, and Σα=Cov(α) is the Bk×Bk cross product matrix corresponding to the basis {ϕkl}l=1, k=1,,p. For the orthonormal basis, for instance the Fourier basis, the cross product matrix is equal to the identity matrix. Then (Albert and Zhang [46]),

MFCV=aΣαa(aa)2.

4. Results

To construct functional discriminant coordinates, we calculated the estimates ai of the vectors αi, i=1,,L.

The vectors ai were then used to construct the estimator B^ of the matrix of between-class variability and the estimator W^ of the matrix of within-class variability. Next, the non-zero eigenvalues of λ^k🟉 of the matrix W^1B^ and the corresponding eigenvectors γ^k, k=1,,15, were calculated.

Multivariate functional discriminant coordinates have the form:

U^k=<u^k,Z>,

where

u^k(t)=Φ(t)γ^k,k=1,,s,
s=min(K+p,L1)=min(40,15)=15,

are vectors of weight functions.

We treat the resulting multivariate functional discriminant coordinates as indicators (synthetic measures) of the health situation of inhabitants of Polish provinces. These indicators contain full information on the values of 8 diagnostic variables measured over 6 years. They are, therefore, composite indicators of the health situation.

These 15 composite indicators have a different power of differentiating between the provinces (these indicators have different variances (eigenvalues); see Table 3).

Table 3.

Eigenvalues and related statistics.

Number Eigenvalue % Total Variance % Cumulative Variance
1 48.2682 27.6745 27.6745
2 28.6024 16.3991 44.0735
3 26.3461 15.1055 59.1790
4 15.1611 8.6926 67.8716
5 11.6214 6.6631 74.5347
6 9.4012 5.3901 79.9248
7 8.5436 4.8984 84.8232
8 5.6155 3.2197 88.0429
9 4.9859 2.8587 90.9016
10 4.8982 2.8083 93.7099
11 3.4169 1.9591 95.6690
12 2.7111 1.5544 97.2234
13 2.0385 1.1687 98.3921
14 1.4576 0.8357 99.2279
15 1.3467 0.7721 100.0000

The first indicator is the strongest and the fifteenth is the least powerful. It is not possible to see the mutual position of provinces in the 15-dimensional space of these indicators, but it is possible in the space of the first two composite indicators that differentiate between the provinces most clearly.

It can be noticed that 44.1% of total variability is attributed to the first two multivariate functional coordinates.

The mean values of the 16 provinces in the system of the first two functional discriminant coordinates are presented in Table 4.

Table 4.

The mean values of the 16 provinces (first two).

Number Variable 1 Variable 2
1 −1.0418 −0.1272
2 −0.9538 −1.8010
3 2.1856 0.8277
4 −0.2247 −0.0520
5 −0.3005 0.4795
6 1.0408 0.2084
7 0.4584 −1.0042
8 0.9223 1.5262
9 2.5934 −1.5963
10 0.5307 2.2876
11 −1.0424 0.8810
12 −0.7883 0.7896
13 3.0622 0.1767
14 −1.6026 0.6644
15 −1.5847 −1.2004
16 −0.9763 0.6872

The location of the 16 provinces in the system of the first two functional discriminant coordinates is shown in Figure 3.

Figure 3.

Figure 3

Plotted values of the first two functional discriminant coordinates.

The total contribution of the individual variables to the structure of the particular functional discriminant coordinates can be estimated using the area under the absolute value of the weight functions corresponding to a given variable. The graphs of the eight components of the vector weight function for the first and second functional discriminant coordinates are shown in Figure 4.

Figure 4.

Figure 4

Weight functions for the first (left) and the second (right) functional discriminant coordinate.

These contributions, for the first and second functional discriminant coordinates for 8 variables are also given in Table 5. Table 5 shows that the largest share in the construction of the first functional discriminant coordinate is played by variable No. 2 (Doctors per 10,000 population)—32.0%—and variable No. 7 (Number of doctor consultations per 10,000 population)—14.7%. On the other hand, variable No. 4 (Deaths of people due to cardiovascular disease per 100,000 population)—21.0%—and variable No. 2 (Doctors per 10,000 population)—20.2%—have the greatest share in the construction of the second functional discriminant coordinate. Values of coefficients of the vector weight functions are also presented in Table 5.

Table 5.

Values of coefficients of the vector weight functions.

First functional discriminant coordinate
Variable γ^10 γ^11 γ^12 γ^13 γ^14 Area Area (%)
1 2.5407 −3.2389 2.0978 -0.3165 −0.9303 9.0039 8.2381
2 0.5792 7.8019 −0.7947 −7.3340 12.8083 34.9609 31.9876
3 0.2994 −0.5582 2.4586 6.3065 0.1062 14.4837 13.2519
4 −3.0143 0.3007 6.7926 −1.9114 −1.2470 15.1264 13.8399
5 3.8320 0.7027 0.1091 −0.9202 −0.7656 9.3864 8.5882
6 −0.1795 −3.5460 −0.4691 1.3195 −1.3298 8.7501 8.0059
7 0.9525 0.6116 −6.6553 4.7278 1.7190 16.0264 14.6634
8 −0.1802 0.6553 0.0063 −0.1974 0.3950 1.5574 1.4249
Second functional discriminant coordinate
Variable γ^20 γ^21 γ^22 γ^23 γ^24 Area Area (%)
1 −0.8113 2.1021 1.9010 7.7922 6.7172 23.1732 17.3879
2 1.3009 −9.6903 −5.7988 6.8425 3.5944 26.8581 20.1529
3 0.2159 5.0384 0.8509 −7.6787 0.3243 18.8570 14.1493
4 −1.4582 10.6698 0.1764 −1.8355 8.9551 28.0178 21.0231
5 −0.0163 −2.8675 −0.7596 −0.2515 −0.4281 6.5687 4.9288
6 0.7540 2.9261 −1.3371 0.7459 −2.1031 7.4906 5.6206
7 1.2655 −0.1701 4.4019 −4.0271 −7.7531 20.7457 15.5665
8 −0.5597 −0.0570 −0.2735 −0.0484 0.3602 1.5604 1.1709

In the next step, cluster analysis was used to select groups of homogeneous provinces in a fifteen-dimensional space of functional discriminant coordinates. The Ward method was selected as a commonly used technique. The Mahalanobis distance was chosen as a measure of the distance between the mean vectors of individual provinces. The obtained dendrogram is presented in Figure 5.

Figure 5.

Figure 5

Dendrogram for 16 Polish provinces (the Ward method).

We obtained four homogeneous clusters. Which cluster individual provinces belong to is shown in Table 6 and in Figure 6 (spatial distribution).

Table 6.

Membership of provinces in the four clusters.

Number Province Cluster
1 dolnośląskie I
2 kujawsko-pomorskie II
3 lubelskie III
4 lubuskie II
5 łódzkie I
6 małopolskie III
7 mazowieckie I
8 opolskie IV
9 podkarpackie III
10 podlaskie IV
11 pomorskie II
12 śląskie II
13 świętokrzyskie III
14 warmińsko-mazurskie II
15 wielkopolskie II
16 zachodniopomorskie II

Figure 6.

Figure 6

Spatial variation in the health situation across the provinces of Poland.

Taking into account the spatial variation in the health situation of the provinces, it is possible to distinguish four spatial clusters. The first one, denoted as II, consists of six provinces located in the north-western part of Poland. At the opposite end of the country, there are four provinces that make up cluster III. Finally, in the approximate middle belt, one can see cluster I, consisting of three provinces. Cluster IV consists of two non-contiguous provinces (podlaskie and opolskie), located in different parts of Poland.

Figure 6 shows that Poland can be divided into two part: Western Poland (provinces belonging to clusters I and II) and Eastern Poland (provinces belonging to clusters III and IV). The health of inhabitants living in the provinces belonging to cluster I is the best, while that of people living in the provinces belonging to cluster IV is the worst. All the previous studies conducted in Poland show that Western Poland is better developed in socio-economic terms than Eastern Poland (see, for instance, Szymkowiak et al. [47], Marchetti et al. [48], Roszka [49]). Current research shows that this division is also valid as regards the health situation.

Decision-makers at national and local government levels should be advised to redirect more funds to improve the health situation of inhabitants of Easter Poland.

To verify that the obtained four clusters are really homogeneous, a multivariate functional coefficient of variation (MFCV) was calculated for all provinces together and for each cluster separately (see Table 7).

Table 7.

Values of the multivariate functional coefficient of variation (MFCV).

Provinces MFCV
All 0.3705
Cluster I 0.2975
Cluster II 0.2890
Cluster III 0.1728
Cluster IV 0.2047

As can be seen, the coefficients of variation for individual clusters are lower than that for all indeed homogeneous.

5. Conclusions

The above statistical analysis provides evidence for the conclusion that the provinces are not homogeneous in terms of the selected variables characterizing the health situation of their inhabitants. The analysis consisted of multiple steps. Values of the selected variables, which are expressed in different measurement units and have different ranges of variation, were standardized using the method of zero unitization. Then, the unitized data were transformed into functional data in order to enable the construction of discriminant coordinates in the functional data space. The multivariate functional discriminant coordinates were treated as composite indicators (synthetic measures) of the health situation of inhabitants of Polish provinces. These indicators contain full information on the values of 8 diagnostic variables measured over 6 years.

In the next step, cluster analysis was applied to select groups of homogeneous provinces in the space of functional discriminant coordinates using the Ward method. The Mahalanobis distance was chosen as a measure of distance between the mean vectors of individual provinces. The homogeneity of the resulting four clusters was analyzed using a multivariate functional coefficient of variation, which was calculated for all provinces together and for each cluster separately. It turned out that the coefficients of variation for individual clusters are smaller than the corresponding value for combined provinces, which confirms that the clusters are indeed homogeneous.

The obtained clusters illustrate changes in the situation of the provinces (over a period of six analyzed years). In previous studies, data for each year are analyzed separately using classical statistical methods. However, one must not forget that one deals with spatio-temporal data that change over time.

The authors realize that the choice of diagnostic variables may be a weakness of this study. These particular diagnostic variables were selected with a view to obtaining relatively comprehensive description of the health situation of the population, given their availability and completeness. Therefore, the selection should be treated mainly as an illustration of the proposed statistical methodology for processing spatio-temporal data.

The statistical methods used for multivariate functional data were suggested earlier by the authors of this paper.

Author Contributions

Conceptualization, M.K.; Methodology, M.K. and W.W.; Software, W.W. and M.S.; Validation, A.W.; Formal analysis, M.K., W.W., and M.S.; Investigation, M.K., W.W., M.S., and A.W.; Data curation, M.S.; Writing—original draft preparation, M.K., W.W., M.S., and A.W.; Writing—review and editing, M.K., W.W., M.S., and A.W.; Visualization, W.W. and M.S.; Funding acquisition, A.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are openly available in the Local Data Bank, which is Poland’s largest database of information relating to the economy, society and the environment.

Conflicts of Interest

The authors declare no conflict of interest.

Footnotes

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

  • 1.Ngamaba K.H., Panagioti M., Armitage C.J. How strongly related are health status and subjective well-being? Systematic review and meta-analysis. Eur. J. Public Health. 2010;27:879–885. doi: 10.1093/eurpub/ckx081. [DOI] [PubMed] [Google Scholar]
  • 2.James C., Devaux M., Sassi F. Inclusive Growth and Health. OECD Publishing; Paris, France: 2017. OECD Health Working Papers, No. 103. [Google Scholar]
  • 3.OECD . Health for Everyone? Social Inequalities in Health and Health Systems. OECD Publishing; Paris, France: 2019. OECD Health Policy Studies. [DOI] [Google Scholar]
  • 4.Penchansky R., Thomas J.W. The concept of access: Definition and relationship to consumer satisfaction. Med. Care. 1981;19:127–140. doi: 10.1097/00005650-198102000-00001. [DOI] [PubMed] [Google Scholar]
  • 5.Wang F. Measurement. optimization. and impact of health care accessibility: A methodological review. Ann. Assoc. Am. Geogr. 2012;102:1104–1112. doi: 10.1080/00045608.2012.657146. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6.Neutens T. Accessibility, equity and health care: Review and research directions for transport geographers. J. Transp. Geogr. 2015;43:14–27. doi: 10.1016/j.jtrangeo.2014.12.006. [DOI] [Google Scholar]
  • 7.Barbarisi I., Bruno G., Diglio A., Elizalde J., Piccolo C. A spatial analysis to evaluate the impact of deregulation policies in the pharmacy sector: Evidence from the case of Navarre. Health Policy. 2019;123:1108–1115. doi: 10.1016/j.healthpol.2019.08.010. [DOI] [PubMed] [Google Scholar]
  • 8.Bruno G., Cavola M., Diglio A., Piccolo C. Improving spatial accessibility to regional health systems through facility capacity management. Socio-Econ. Plan. Sci. 2020;71:100881. doi: 10.1016/j.seps.2020.100881. [DOI] [Google Scholar]
  • 9.Lu C., Zhang Z., Lan X. Impact of China’s referral reform on the equity and spatial accessibility of healthcare resources: A case study of Beijing. Soc. Sci. Med. 2019;235:112386. doi: 10.1016/j.socscimed.2019.112386. [DOI] [PubMed] [Google Scholar]
  • 10.Okuyama K., Akai K., Kijima T., Abe T., Isomura M., Nabika T. Effect of geographic accessibility to primary care on treatment status of hypertension. PLoS ONE. 2019;14:e0213098. doi: 10.1371/journal.pone.0213098. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 11.Pu Q., Yoo E.H., Rothstein D.H., Cairo S., Malemo L. Improving the spatial accessibility of healthcare in North Kivu, Democratic Republic of Congo. Appl. Geogr. 2020;121:102262. doi: 10.1016/j.apgeog.2020.102262. [DOI] [Google Scholar]
  • 12.De Leeuw E., Clavier C., Breton E. Health policy–why research it and how: Health political science. Health Res. Policy Syst. 2014;12:55. doi: 10.1186/1478-4505-12-55. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13.Campos P.A., Reich M.R. Political analysis for health policy implementation. Health Syst. Reform. 2019;5:224–235. doi: 10.1080/23288604.2019.1625251. [DOI] [PubMed] [Google Scholar]
  • 14.Pollack Porter K.M., Rutkow L., McGinty E.E. The importance of policy change for addressing public health problems. Public Health Rep. 2018;133:9S–14S. doi: 10.1177/0033354918788880. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15.Łyszczarz B. Dynamika regionalnych nierówności w zdrowiu w Polsce (Dynamics of Regional Health Inequalities in Poland) Nierówności Społeczne Wzrost Gospod. 2014;38:191–200. [Google Scholar]
  • 16.Boyle P. Global public health—Challenges and leadership. J. Health Inequalities. 2018;4:55–61. doi: 10.5114/jhi.2018.83574. [DOI] [Google Scholar]
  • 17.Kawachi I., Subramanian S.V., Almeida-Filho N. A glossary for health inequalities. J. Epidemiol. Community Health. 2002;56:647–652. doi: 10.1136/jech.56.9.647. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18.Arcaya M.C., Arcaya A.L., Subramanian S.V. Inequalities in health: Definitions, concepts, and theories. Glob. Health Action. 2015;8:1–12. doi: 10.3402/gha.v8.27106. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 19.Wojtyła-Buciora P., Stawińska-Witoszyńska B., Wojtyła K., Klimberg A., Wojtyła C., Wojtyła A., Samolczyk-Wanyura D., Marcinkowski J.T. Assessing physical activity and sedentary lifestyle behaviours for children and adolescents living in a district of Poland. What are the key determinants for improving health? Ann. Agric. Environ. Med. 2014;21:606–612. doi: 10.5604/12321966.1120611. [DOI] [PubMed] [Google Scholar]
  • 20.Wojtyła C., Biliński P., Paprzycki P., Warzocha K. Haematological parameters in postpartum women and their babies in Poland—Comparison of urban and rural areas. Ann. Agric. Environ. Med. 2011;18:380–385. [PubMed] [Google Scholar]
  • 21.Samet J.M. The environment and health inequalities: Problems and solutions. J. Health Inequalities. 2019;5:21–27. doi: 10.5114/jhi.2019.87818. [DOI] [Google Scholar]
  • 22.Crombie I.K., Irvine L., Elliott L., Wallace H. Closing the Health Inequalities Gap: An International Perspective. WHO Regional Office for Europe; Copenhagen, Denmark: 2005. No. EUR/05/5048925. [Google Scholar]
  • 23.Spinakis A., Anastasiou G., Panousis V., Spiliopoulos K., Palaiologou S., Yfantopoulos J. Expert Review and Proposals for Measurement of Health Inequalities in the European Union—Full Report. European Commission Directorate General for Health and Consumers; Luxembourg: 2011. [Google Scholar]
  • 24.Barreto M.L. Health inequalities: A global perspective. Ciência Saúde Coletiva. 2017;22:2097–2108. doi: 10.1590/1413-81232017227.02742017. [DOI] [PubMed] [Google Scholar]
  • 25.Pizot C., Dragomir M., Macacu A., Koechlin A., Bota M., Boyle P. Global burden of pancreas cancer: Regional disparities in incidence, mortality and survival. J. Health Inequalities. 2019;5:96–112. doi: 10.5114/jhi.2019.87844. [DOI] [Google Scholar]
  • 26.Ozdenerol E. Spatial Health Inequalities: Adapting GIS Tools and Data Analysis. CRC Press; Boca Raton, FL, USA: 2016. [Google Scholar]
  • 27.Casper M., Kramer M.R., Peacock J.M., Vaughan A.S. Population Health, Place, and Space: Spatial Perspectives in Chronic Disease Research and Practice. Prev. Chronic Dis. 2019;16:E123. doi: 10.5888/pcd16.190237. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28.Gilliland J.A., Shah T.I., Clark A., Sibbald S., Seabrook J.A. A geospatial approach to understanding inequalities in accessibility to primary care among vulnerable populations. PLoS ONE. 2019;14:e0210113. doi: 10.1371/journal.pone.0210113. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 29.Wang Z., Nie K. Measuring Spatial Patterns of Health Care Facilities and Their Relationships with Hypertension Inpatients in a Network-Constrained Urban System. Int. J. Environ. Res. Public Health. 2019;16:3204. doi: 10.3390/ijerph16173204. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 30.Chen J., Bai Y., Zhang P., Qiu J., Hu Y., Wang T., Xu C., Gong P. A Spatial Distribution Equilibrium Evaluation of Health Service Resources at Community Grid Scale in Yichang, China. Sustainability. 2020;12:52. doi: 10.3390/su12010052. [DOI] [Google Scholar]
  • 31.Wierzbicka A. Taxonomic analysis of the Polish public health in comparison with selected European countries. Stat. Transition. New Ser. 2012;13:343–364. [Google Scholar]
  • 32.Bem A., Ucieklak-Jeż P., Siedlecki R. The spatial differentiation of the availability of health care in polish regions. Procedia-Soc. Behav. Sci. 2016;220:12–20. doi: 10.1016/j.sbspro.2016.05.464. [DOI] [Google Scholar]
  • 33.Shi L., Starfield B., Kennedy B., Kawachi I. Income inequality, primary care, and health indicators. J. Fam. Pract. 1999;48:275–284. [PubMed] [Google Scholar]
  • 34.Hübelová D., Kozumpliková A., Jadaczková V., Rousová G. Spatial differentiation of selected healyh factors of the South Moravian Region population. Geogr. Cassoviensis. 2018;XII:33–52. [Google Scholar]
  • 35.Górecki T., Krzyśko M. Functional Principal Components Analysis. In: Pociecha J., Decker R., editors. Data Analysis Methods and Its Applications. C.H. Beck; Munich, Germany: 2012. pp. 71–87. [Google Scholar]
  • 36.Górecki T., Krzyśko M., Waszak Ł., Wołyński W. Selected statistical methods of data for multivariate functional data. Stat. Pap. 2018;59:153–182. [Google Scholar]
  • 37.Hanusz Z., Krzyśko M., Nadulski R., Waszak Ł. Discriminant coordinates analysis for multivariate functional data. Commun. Stat. Theory Methods. 2020;49:4506–4519. doi: 10.1080/03610926.2019.1602650. [DOI] [Google Scholar]
  • 38.Krzyśko M., Wołyński W., Ratajczak W., Kierczyńska A., Wenerska B. Sustainable development of Polish macroregions-study by means of the kernel discriminant coordinates method. Int. J. Environ. Res. Public Health. 2020;17:7021. doi: 10.3390/ijerph17197021. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 39.Krzyśko M., Łukaszonek W., Wołyński W. Discriminant coordinates analysis in the case of multivariate repeated measures data. Stat. Transit. New Ser. 2018;19:495–506. doi: 10.21307/stattrans-2018-027. [DOI] [Google Scholar]
  • 40.Walesiak M., Dudek A. The Choice of Variable Normalization Method in Cluster Analysis. In: Soliman K.S., editor. Education Excellence and Innovation Management: A 2025 Vision to Sustain Economic Development During Global Challenges. International Business Information Management Association; Seville, Spain: 2020. pp. 325–340. [Google Scholar]
  • 41.Jajuga K., Walesiak M. Classification and Information Processing at the Turn of the Millennium. Springer; Berlin/Heidelberg, Germany: 2000. Standardisation of data set under different measurement scales; pp. 105–112. [Google Scholar]
  • 42.Seber G.A.F. Multiple Observations. John Wiley and Sons, Inc.; New York, NY, USA: 2004. [Google Scholar]
  • 43.Mirkin B. Clustering: A Data Recovery Approach. 2nd ed. Taylor and Francis Group, LLC.; London, UK: 2013. [Google Scholar]
  • 44.Krzyśko M., Wołyński W., Górecki T., Skorzybut M. Learning Systems: Pattern Recognition, Cluster Analysis and Dimensional Reduction. Scientific and Technical Publishers; Warsaw, Poland: 2008. [Google Scholar]
  • 45.Krzyśko M., Smaga Ł. A multivariate coefficient of variation for functional data. Stat. Its Interface. 2019;12:647–658. [Google Scholar]
  • 46.Albert A., Zhang L. A novel definition of the multivariate coefficient of variation. Biom. J. 2010;52:667–675. doi: 10.1002/bimj.201000030. [DOI] [PubMed] [Google Scholar]
  • 47.Szymkowiak M., Młodak A., Wawrowski Ł. Mapping poverty at the level of subregions in Poland using indirect estimation. Stat. Transit. New Ser. 2017;18:609–635. doi: 10.21307/stattrans-2017-003. [DOI] [Google Scholar]
  • 48.Marchetti S., Beręsewicz M., Salvati N., Szymkowiak M., Wawrowski Ł. The use of a three-level M-quantile model to map poverty at local administrative unit 1 in Poland. J. R. Stat. Soc. Ser. A. 2018;181:1077–1104. doi: 10.1111/rssa.12349. [DOI] [Google Scholar]
  • 49.Roszka W. Spatial microsimulation of personal income in Poland at the level of subregions. Stat. Transition. New Ser. 2019;20:133–153. doi: 10.21307/stattrans-2019-028. [DOI] [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Data Availability Statement

The data presented in this study are openly available in the Local Data Bank, which is Poland’s largest database of information relating to the economy, society and the environment.


Articles from International Journal of Environmental Research and Public Health are provided here courtesy of Multidisciplinary Digital Publishing Institute (MDPI)

RESOURCES