Reconstruction and normalization of LISA for spatial analysis

Yanguang Chen

doi:10.1371/journal.pone.0303456

2024 May 22;19(5):e0303456. doi: 10.1371/journal.pone.0303456

Editor: Yuxia Wang²

PMCID: PMC11111027 PMID: 38776327

Abstract

The local indicators of spatial association (LISA) are important measures for spatial autocorrelation analysis. However, the derivation of LISA in the literature contains an inadvertent fault, so that the local Moran and Geary indicators do not satisfy the second basic requirement for LISA: the sum of the local indicators is proportional to a global indicator. This paper aims to reconstruct the calculation formulae of the local Moran indexes and Geary coefficients through mathematical derivation and empirical evidence. Two sets of LISAs are clarified by new mathematical reasoning. One set is based on non-normalized weights and a centralized variable (MI1 and GC1), and the other is based on row-normalized weights and a standardized variable (MI2 and GC2). The results show that the first set of LISAs satisfies the above-mentioned second requirement, but the second set does not. A third set of LISAs is then proposed, which can be treated as canonical forms (MI3 and GC3). This set satisfies the second requirement. Observational data on city population and traffic mileage in the Beijing-Tianjin-Hebei region of China are employed to verify the theoretical results. This study helps to clarify misunderstandings about LISAs in the field of geospatial analysis.

1 Introduction

Geography has two core concepts concerning location effects: difference and dependence. The former is a classical topic of geography, while the latter is related to spatial correlation analysis. Spatial difference is also termed regional difference, a concept that came from areal differentiation [1–3]. The traditional concept of difference seemed to contradict the pursuit of general laws, so geography embarked on the road of "exceptionalism" [4]. After the quantitative revolution (1953–1976), geography began to attach importance to spatial organization and correlation, which indicate spatial dependence. Spatial interaction models and spatial autocorrelation analysis are the main approaches to studying spatial correlation processes [5, 6]. Spatial autocorrelation was originally a concept of biological statistics, used mainly to evaluate whether spatial sampling results meet traditional statistical requirements [7–9]. When geographers introduced spatial autocorrelation measures into geospatial analysis, they found that few phenomena are spatially uncorrelated. In this context, the spatial autocorrelation analysis method was developed [10–12]. Early spatial autocorrelation analysis worked only at the global level and rarely involved the local level, so it provided limited geospatial information. In other words, early spatial autocorrelation analysis focused on spatial dependence rather than spatial difference. After the theoretical revolution in the later period of the quantitative revolution was frustrated, the traditional regional school of thought in geography quietly returned, and the concept of regional difference was again valued by geographers under the new name of spatial heterogeneity [13]. Tobler proposed the first law of geography based on spatial dependence [14], and Harvey proposed that spatial heterogeneity be the second law of geography [15]. The study of spatial heterogeneity naturally involves spatial locality.
According to Fotheringham [16–18], there are three trends in the development of quantitative geography: localization, computation and visualization. In this sense, local spatial autocorrelation analysis came into being [13, 19–22]. Therefore, spatial difference (heterogeneity) and spatial correlation (dependency) have reached the same goal through different routes [13, 23].

Local spatial autocorrelation analysis was developed on the basis of global spatial autocorrelation analysis. The Local Indicators of Spatial Association (LISA) proposed by Anselin [19] play an important role in the local correlation analysis of geographical research. LISA includes local Moran indexes and local Geary coefficients. These spatial statistics, together with the G index proposed by Getis and Ord [21] and the Moran scatterplot proposed by Anselin [13], have become systematic tools for local autocorrelation analysis. However, even the wisest are not always free from error. Anselin's outstanding paper contains some important issues that need to be addressed. The main problems are as follows. First, there is an unintentional mistake in mathematical reasoning that results from a skipped step in a mathematical transformation. This mistake leads readers to misunderstand the relationship between the globally normalized spatial weight matrix and the row-normalized spatial weight matrix. Second, the row-normalized spatial weight matrix violates the distance axioms. A spatial weight matrix is based on a distance matrix or a generalized distance matrix, which must conform to the distance axioms. Otherwise, the calculated global or local Moran's index may be abnormal. Third, the basic difference between Moran's index and Geary's coefficient was overlooked. Moran's index is based on a spatial population, while Geary's coefficient is based on a spatial sample. Different definitions lead to different application directions. However, in the definitions of LISA, the local Geary's coefficient is based on a spatial population rather than a spatial sample. This is not consistent with the original aim of defining Geary's coefficient.

The above issues cause a series of consequences. First, the two sets of LISA values are not equivalent to each other. For example, the ratios of the LISA values based on the non-normalized spatial weight matrix to those based on the normalized spatial weight matrix are not constant. This is a serious logical problem. As we know, if two measures are equivalent to one another, their ratio is a constant. For example, the ratio of Student's t statistic to Pearson's part correlation coefficient is a constant, which equals the square root of the ratio of residuals mean square deviation to total sum of squares. Second, the calculated values of Moran's index and Geary's coefficient sometimes exceed reasonable upper and lower limits. Moran's index bears at least two sets of boundary values. One is the absolute boundary values, -1 and 1, which depend on the mathematical structure of the Moran's index formula and can be proved by the conditional extremum principle of quadratic forms. The other is the relative boundary values, which are determined by the maximum and minimum eigenvalues of the normalized spatial weight matrix [24–26]. Exceeding the boundary values of a spatial statistic is another logical problem. One of the key reasons is that the symmetric spatial contiguity matrix is replaced by an asymmetric row-normalized spatial weight matrix in the process of mathematical deduction. What is more, Anselin's LISAs lack clear boundary values and critical values. Spatial statistics are measures that may be used to describe or to infer. Whatever the goal, a good measure should have a clear critical value or boundary value. For example, the boundary values of the Pearson correlation coefficient are -1 and 1, and the critical value is 0. The purpose of this paper is to develop the spatial measures based on LISA. The remaining parts are organized as follows.
In Section 2, Anselin's mathematical reasoning process is sorted out and his unintentional mistakes are corrected. Based on the mathematical derivation, the local Moran index and local Geary coefficient are normalized. In addition, the strict mathematical relationship between Moran's indexes and Geary's coefficients is derived. In Section 3, observational data on the system of cities in the Beijing-Tianjin-Hebei region of China are employed to test the improved results. In Sections 4 and 5, the related questions are discussed, and the discussion is concluded by summarizing the main points of this study.

2 Theoretical results

2.1 Local spatial autocorrelation measurements

2.1.1 The first formula of local Moran index

One of the bases of spatial analysis is the spatial proximity matrix, which can be measured by a spatial distance matrix. A spatial distance matrix or spatial proximity matrix can be transformed into a spatial contiguity matrix by means of a spatial weight function such as a negative power law or a step function [27, 28]. A spatial contiguity matrix can be treated as a non-normalized spatial weight matrix. Suppose that there are n elements in a geographical region, and the size of the ith element is measured by x_i (i = 1, 2,…,n). The size variable x is not standardized, and the spatial contiguity matrix V = [v_ij] is not transformed into the globally normalized spatial weight matrix W = [w_ij]. Note that global normalization refers to the normalization of a matrix or vector by the sum of its elements, so it can also be termed sum-normalization or sum-based normalization. Correspondingly, row-normalization is a type of local normalization, which can also be called row-based normalization. Using the symbol system defined in this context, we can extract two sets of local spatial autocorrelation statistics (Table 1). The first local Moran index formula defined by Anselin [19] is as follows

I_{i}^{*} = (x_{i} - \bar{x}) \sum_{j = 1}^{n} v_{i j} (x_{j} - \bar{x}) = y_{i} \sum_{j = 1}^{n} v_{i j} y_{j},

(1)

where $y_{i} = x_{i} - \bar{x}$ and $y_{j} = x_{j} - \bar{x}$ denote centralized size variables, and $\bar{x}$ refers to the mean value. In Eq (1), i≠j; for i = j, v_ij = 0. The centralized variables can be transformed into standardized variables by means of the z-score formula. Based on the population standard deviation, the standardized variables can be expressed as

z_{i} = \frac{y_{i}}{σ} = \frac{x_{i} - \bar{x}}{σ}, z_{j} = \frac{y_{j}}{σ} = \frac{x_{j} - \bar{x}}{σ},

where z denotes standardized variable, and σ refers to population standard deviation. The sum of Eq (1) is

\sum_{i = 1}^{n} I_{i}^{*} = \sum_{i = 1}^{n} y_{i} \sum_{j = 1}^{n} v_{i j} y_{j} = \sum_{i = 1}^{n} \sum_{j = 1}^{n} v_{i j} y_{i} y_{j},

(2)

which is essentially the sum of spatially weighted outer products of centralized variables. The spatial weight coefficient is not normalized by sum. The sum of the elements in spatial contiguity matrix is

V_{0} = \sum_{i = 1}^{n} \sum_{j = 1}^{n} v_{i j} .

(3)

Table 1. Three sets of LISAs researched in this paper based on Anselin’s work.

Item	Index	Weight matrix	Size variable	Symbol
First set of local LISA	Local Moran’s I	No normalization	Centralization	MI1
First set of local LISA	Local Geary’s C	No normalization	Centralization	GC1
Second set of local LISA	Local Moran’s I	Row normalization	Standardization based on population standard deviation	MI2
Second set of local LISA	Local Geary’s C	Row normalization	Standardization based on population standard deviation	GC2
Third set of local LISA	Local Moran’s I	Global normalization	Standardization based on population standard deviation	MI3
Third set of local LISA	Local Geary’s C	Global normalization	Standardization based on sample standard deviation	GC3

Note: If a spatial dataset is large enough, the distinction between the population standard deviation and the sample standard deviation can be ignored. However, when the spatial dataset is not so large, this difference cannot be ignored; otherwise biased calculation results may lead to inappropriate conclusions.
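As a minimal numerical sketch of Eqs (1) to (3), the first local Moran indexes can be computed with a few array operations. The four-element data vector and the binary contiguity matrix below are hypothetical toy values chosen for illustration; they are not taken from the paper's dataset.

```python
import numpy as np

# Hypothetical toy data (not from the paper): four places on a line,
# with binary contiguity between immediate neighbours and v_ii = 0.
x = np.array([1.0, 2.0, 4.0, 8.0])
V = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

y = x - x.mean()                  # centralized variable y_i
local_moran_1 = y * (V @ y)       # Eq (1): I_i* = y_i * sum_j v_ij * y_j
sum_local = local_moran_1.sum()   # Eq (2): sum of the local indexes
V0 = V.sum()                      # Eq (3): sum of all contiguity elements

print(local_moran_1, sum_local, V0)
```

The sum in Eq (2) is the quadratic form of the centralized vector with V, which is why it can also be written as `y @ V @ y`.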

Dividing Eq (2) by V₀ yields spatial precision weighted auto-covariance as follows

\mathrm{Cov} = \frac{1}{\sum_{i = 1}^{n} \sum_{j = 1}^{n} v_{i j}} \sum_{i = 1}^{n} I_{i}^{*} = \frac{1}{V_{0}} \sum_{i = 1}^{n} \sum_{j = 1}^{n} v_{i j} y_{i} y_{j} .

(4)

Furthermore, the spatial weighted covariance can be divided by the population variance of the size variable, which is called the second moment in literature [19], that is

σ^{2} = \frac{1}{n} \sum_{i = 1}^{n} {(x_{i} - \bar{x})}^{2} = \frac{1}{n} \sum_{i = 1}^{n} y_{i}^{2} .

(5)

The result is global Moran’s index, I = Cov/σ². It can be expanded as

I = \frac{\frac{1}{\sum_{i = 1}^{n} \sum_{j = 1}^{n} v_{i j}} \sum_{i = 1}^{n} I_{i}^{*}}{\frac{1}{n} \sum_{i = 1}^{n} y_{i}^{2}} = \frac{n \sum_{i = 1}^{n} \sum_{j = 1}^{n} v_{i j} y_{i} y_{j}}{V_{0} \sum_{i = 1}^{n} y_{i}^{2}} = \frac{1}{σ^{2} V_{0}} \sum_{i = 1}^{n} \sum_{j = 1}^{n} v_{i j} y_{i} y_{j} = \sum_{i = 1}^{n} \sum_{j = 1}^{n} w_{i j} z_{i} z_{j},

(6)

where w_ij is the element of the globally normalized weight matrix W. According to Anselin [19], Eq (6) can be expressed as

I = \frac{1}{σ^{2} V_{0}} \sum_{i = 1}^{n} I_{i}^{*} .

(7)

The relationship between the sum of Anselin’s first local Moran’s indexes and the global Moran’s index is obtained as below

\sum_{i = 1}^{n} I_{i}^{*} = σ^{2} V_{0} I = γ I .

(8)

The proportionality coefficient in Eq (8) is

γ = σ^{2} V_{0} = (\frac{1}{n} \sum_{i = 1}^{n} y_{i}^{2}) (\sum_{i = 1}^{n} \sum_{j = 1}^{n} v_{i j}),

(9)

which represents the general expression of the ratio of the sum of local Moran’s indexes to the global Moran’s index. Please note that Eqs (8) and (9) are derived from the relations based on non-normalized spatial weight matrix. They cannot be directly applied to the mathematical processes based on row-normalized spatial weight matrix. According to Anselin [19], Eq (3) can be replaced by a vector indicating the sum of rows of the spatial contiguity matrix as below

V_{i} = \sum_{j = 1}^{n} v_{i j} .

(10)

Correspondingly, spatial contiguity matrix can be normalized by row. Anselin called it row-standardized spatial weights matrix [19]. In this way, Eq (4) becomes a locally weighted spatial auto-covariance, that is

\mathrm{Cov}_{i} = \frac{I_{i}^{*}}{\sum_{j = 1}^{n} v_{i j}} = \frac{1}{V_{i}} \sum_{j = 1}^{n} v_{i j} y_{i} y_{j} .

(11)

The summation of Eq (11) is

\sum_{i = 1}^{n} \mathrm{Cov}_{i} = \sum_{i = 1}^{n} \frac{I_{i}^{*}}{V_{i}} = \sum_{i = 1}^{n} \sum_{j = 1}^{n} \frac{v_{i j}}{V_{i}} y_{i} y_{j} .

(12)

Based on Eqs (11) and (12), it is impossible to obtain the global spatial weighted auto-covariance, and it is impossible to derive the simple summation relationship between local Moran index and global Moran index. If so, the reasoning from Eq (4) to Eq (9) will be invalid.

It can be seen that the local-global relationship based on Anselin's first local Moran index formula implies a globally normalized, symmetric weight matrix. The first local Moran index formula of Anselin [19] is correct, and it satisfies the two requirements defined by Anselin [19]. Its shortcoming is that it is not standardized. A good measure should have a clear critical value (reference value) or a pair of explicit boundary values. However, the local Moran index calculated by Eq (1) has neither boundary values nor a clear threshold value.
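The local-global relation in Eqs (6), (8) and (9) can be checked numerically. The sketch below assumes a hypothetical four-element toy dataset with a symmetric binary contiguity matrix; the helper name `global_moran` is introduced here for illustration only.

```python
import numpy as np

def global_moran(x, V):
    """Global Moran's I as in Eq (6), based on the population variance."""
    y = x - x.mean()
    return len(x) * (y @ V @ y) / (V.sum() * (y @ y))

# Hypothetical toy data (not from the paper).
x = np.array([1.0, 2.0, 4.0, 8.0])
V = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

y = x - x.mean()
sigma2 = np.mean(y ** 2)             # Eq (5): population variance
V0 = V.sum()                         # Eq (3)
gamma = sigma2 * V0                  # Eq (9): proportionality coefficient

sum_local = (y * (V @ y)).sum()      # sum of the first local Moran indexes
I = global_moran(x, V)

# Equivalent form of Eq (6) with standardized z and globally normalized W.
z = y / np.sqrt(sigma2)
W = V / V0
I_from_W = z @ W @ z

print(sum_local, gamma * I, I, I_from_W)
```

With these toy values, the sum of the local indexes equals γI, as Eq (8) states, and both expressions for I in Eq (6) agree.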

2.1.2 The second formula of local Moran index

Suppose that the variables are standardized and the spatial contiguity matrix is transformed into a spatial weight matrix normalized by row. In this way, V₀ is replaced by V_i in Eq (4). Dividing the revised Eq (4) by the population variance yields the second local Moran's index formula of Anselin [19], I_i** = Cov_i/σ², that is

I_{i}^{* *} = \frac{1}{σ^{2}} \sum_{j = 1}^{n} \frac{v_{i j}}{V_{i}} y_{i} y_{j} = \frac{1}{σ^{2}} y_{i} \sum_{j = 1}^{n} w_{i j}^{*} y_{j},

(13)

where w_ij* denotes the elements in the row-normalized spatial weight matrix, V^*. Apparently, Eq (13) is based on Eqs (10) and (11). Thus, in terms of Eq (10), the sum of the spatial weight matrix is

V_{0}^{*} = \sum_{i = 1}^{n} \sum_{j = 1}^{n} \frac{v_{i j}}{V_{i}} = \sum_{i = 1}^{n} (\frac{1}{V_{i}} \sum_{j = 1}^{n} v_{i j}) = \sum_{i = 1}^{n} (1) = n .

(14)

The variance of standardized variable is 1, namely, σ² = 1. For normalized matrix by row, the sum is V₀* = n, thus we have

γ = σ^{2} V_{0}^{*} = V_{0}^{*} = n .

(15)

Substituting Eq (15) into Eq (8) seems to yield the following relation

\sum_{i = 1}^{n} I_{i}^{* *} = n I,

(16)

which is one of the relations given by Anselin [19]. Note that the symbols have been slightly changed: V₀ is replaced by V₀*, and I_i* is replaced by I_i**. The added asterisk indicates the inherent difference between the two sets of local Moran's indexes. On the surface, there is no problem at all in the mathematical derivation. However, Anselin [19] inadvertently made a mistake in the above reasoning process (S1 File). Looking at Eq (14) alone, we may think that there is no problem. However, by summing Eq (13), it is impossible to extract an independent Eq (14), and this is exactly the problem. In fact, Anselin [19] unintentionally substituted one mathematical concept for another by directly applying results derived from non-normalized weight matrices to the relationship formula based on row-normalized spatial weight matrices. Regardless of whether the spatial contiguity matrix is symmetric or not, the non-normalized spatial weight matrix and the row-normalized spatial weight matrix are not isomorphic to each other. However, the non-normalized spatial weight matrix is isomorphic to the sum-based normalized spatial weight matrix.

Problems in mathematical deduction can be revealed through logical analysis and can also be reflected through empirical analysis. Let us check the problem from another angle. The relation between the second set of local Moran's indexes of Anselin [19] and the global Moran's index can be derived from Eq (13). The summation of the local Moran's indexes based on Eq (13) is

\sum_{i = 1}^{n} I_{i}^{* *} = \frac{1}{σ^{2}} \sum_{i = 1}^{n} \sum_{j = 1}^{n} \frac{v_{i j}}{V_{i}} y_{i} y_{j} = V_{0} \sum_{i = 1}^{n} \sum_{j = 1}^{n} \frac{w_{i j}}{V_{i}} z_{i} z_{j} = \sum_{i = 1}^{n} \sum_{j = 1}^{n} w_{i j}^{*} z_{i} z_{j} .

(17)

After variable standardization, the population standard deviation becomes 1, i.e., σ² = 1. However, the row sum of the spatial contiguity matrix, V_i, is not a constant. It can neither be eliminated nor converted to a constant. Therefore, there is no constant proportionality relation between the second set of local Moran's indexes and the global Moran's index. If and only if Eq (6) is introduced into Eq (17) can a proportional relationship similar to Eq (8) be derived. Based on Eq (6), Eq (17) can be re-expressed as

\sum_{i = 1}^{n} I_{i}^{* *} = \frac{\sum_{i = 1}^{n} \sum_{j = 1}^{n} w_{i j}^{*} z_{i} z_{j}}{\sum_{i = 1}^{n} \sum_{j = 1}^{n} w_{i j} z_{i} z_{j}} I .

(18)

Unfortunately, we cannot prove the following relation:

\sum_{i = 1}^{n} \sum_{j = 1}^{n} w_{i j}^{*} z_{i} z_{j} = n \sum_{i = 1}^{n} \sum_{j = 1}^{n} w_{i j} z_{i} z_{j} = n I .

(19)

This lends further support to the judgment that Eq (16) does not hold. However, the proportional relationship given in Eqs (17) and (18) can be easily verified by the observational data. Another angle is to examine the ratios of the two sets of local Moran indexes. If the ratios are constant, the two definitions are equivalent to one another; otherwise they are not. In fact, dividing the values in the first set of local Moran indexes by the corresponding values in the second set yields

\frac{I_{i}^{*}}{I_{i}^{* *}} = \frac{σ^{2} \sum_{j = 1}^{n} v_{i j} y_{i} y_{j}}{\sum_{j = 1}^{n} \frac{v_{i j}}{V_{i}} y_{i} y_{j}} = σ^{2} V_{i},

(20)

which, obviously, is a variable that changes with V_i rather than a constant.

It can be seen that the ratios of the two sets of local Moran's indexes are not constant, so the two sets are not equivalent to each other. This suggests that the second set of local Moran indexes cannot satisfy the second requirement of Anselin [19], which states, “The sum of the local indicators is proportional to a global indicator”. The reason for the fault is that Anselin [19] inadvertently replaced a concept in this mathematical derivation. Concretely speaking, the globally normalized symmetric weight matrix W becomes the locally normalized asymmetric weight matrix V^*. This substitution violates the law of identity of concepts and the principle of logical consistency in mathematical reasoning.
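The argument of this subsection can be illustrated with a small numerical sketch. It assumes a hypothetical toy dataset in which the row sums V_i differ and every location has at least one neighbour, and it computes the global Moran's index from the contiguity matrix V via Eq (6), as the text does.

```python
import numpy as np

# Hypothetical toy data (not from the paper); row sums of V are 1, 2, 2, 1.
x = np.array([1.0, 2.0, 4.0, 8.0])
V = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

n = len(x)
y = x - x.mean()
sigma2 = np.mean(y ** 2)                    # population variance, Eq (5)
row_sums = V.sum(axis=1)                    # V_i, Eq (10)
W_row = V / row_sums[:, None]               # row-normalized weights w_ij*

I1 = y * (V @ y)                            # first set, Eq (1)
I2 = y * (W_row @ y) / sigma2               # second set, Eq (13)
I_global = n * (y @ V @ y) / (V.sum() * (y @ y))   # Eq (6)

ratio = I1 / I2                             # Eq (20): equals sigma^2 * V_i
print(ratio, sigma2 * row_sums)
print(I2.sum(), n * I_global)               # the two sides of Eq (16)
```

For these toy values the ratio varies with V_i, and the sum of the second set differs from nI. If all row sums were equal (for example, on a ring in which each place has exactly two neighbours), V_i would be constant and the two sides of Eq (16) would coincide, which is consistent with the text's point that the non-constant V_i is the source of the discrepancy.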

2.1.3 The formula of local Geary coefficient

The global Geary coefficient is complementary to the global Moran index: the former is oriented to spatial sample analysis, and the latter is based on spatial statistical population. Similar to the treatment of local Moran index, two local Geary statistics were defined by Anselin [19]. It is assumed that the variables are not standardized and the spatial contiguity matrix is not transformed into a global normalized spatial weight matrix. Anselin [19] defined the first local Geary’s coefficient as

C_{i}^{*} = \sum_{j = 1}^{n} v_{i j} {(y_{i} - y_{j})}^{2},

(21)

in which the divisor 2 is ignored. Suppose that the variable is standardized, and the spatial contiguity matrix is transformed into a row normalized spatial weight matrix. Anselin [19] defines the second local Geary coefficient as

C_{i}^{* *} = \frac{1}{σ^{2}} \sum_{j = 1}^{n} w_{i j}^{*} {(y_{i} - y_{j})}^{2} .

(22)

Summation of Eq (21) divided by the population variance σ² is

\frac{1}{σ^{2}} \sum_{i = 1}^{n} C_{i}^{*} = \frac{n \sum_{i = 1}^{n} \sum_{j = 1}^{n} v_{i j} {(y_{i} - y_{j})}^{2}}{\sum_{i = 1}^{n} y_{i}^{2}} = \frac{2 n V_{0}}{n - 1} \frac{(n - 1) \sum_{i = 1}^{n} \sum_{j = 1}^{n} v_{i j} {(y_{i} - y_{j})}^{2}}{2 V_{0} \sum_{i = 1}^{n} y_{i}^{2}} = γ_{c} C,

(23)

where C refers to global Geary coefficient. It can be expressed as

C = \frac{(n - 1) \sum_{i = 1}^{n} \sum_{j = 1}^{n} v_{i j} {(y_{i} - y_{j})}^{2}}{2 V_{0} \sum_{i = 1}^{n} y_{i}^{2}} = \frac{1}{2 s^{2}} \sum_{i = 1}^{n} \sum_{j = 1}^{n} w_{i j} {(y_{i} - y_{j})}^{2} = \frac{1}{2} \sum_{i = 1}^{n} \sum_{j = 1}^{n} w_{i j} {(z_{i}^{*} - z_{j}^{*})}^{2} .

(24)

in which z* refers to the standardized size variable based on the sample standard deviation s, i.e.,

z_{i}^{*} = \frac{y_{i}}{s} = \frac{x_{i} - \bar{x}}{s}, z_{j}^{*} = \frac{y_{j}}{s} = \frac{x_{j} - \bar{x}}{s} .

Here s denotes the sample standard deviation, that is, s = σ[n/(n-1)]^{1/2}. In addition, the proportionality coefficient between the sum of the first local Geary coefficients divided by the population variance and the global Geary coefficient is as below

γ_{c} = \frac{2 n V_{0}}{n - 1} .

(25)

Therefore, the relationship between the sum of the first local Geary coefficients and the global Geary coefficients is

\sum_{i = 1}^{n} C_{i}^{*} = \frac{2 n V_{0} σ^{2}}{n - 1} C = γ_{c} σ^{2} C .

(26)

This formula is correct, and it satisfies the two requirements given by Anselin [19]. However, it is neither direct nor standard. Dividing the summation of Eq (21) by both the population variance σ² and the sum of the spatial weight matrix V₀ yields the relationship between the local Geary's coefficients and the global Geary coefficient, that is

\sum_{i = 1}^{n} C_{i}^{* *} = \frac{n \sum_{i = 1}^{n} \sum_{j = 1}^{n} v_{i j} {(y_{i} - y_{j})}^{2}}{V_{0} \sum_{i = 1}^{n} y_{i}^{2}} = \frac{2 n}{n - 1} \frac{(n - 1) \sum_{i = 1}^{n} \sum_{j = 1}^{n} w_{i j} {(y_{i} - y_{j})}^{2}}{2 \sum_{i = 1}^{n} y_{i}^{2}} = \frac{2 n}{n - 1} C .

(27)

This is the corrected expression of the relationship between the local Geary coefficients and the global Geary coefficient, which differs from that given by Anselin [19]. The reason is that the derivation of this relationship is based on the global normalization of the spatial weight matrix. However, because the divisor 2 is ignored in Eq (21), the sum of the local Geary coefficients does not equal the global Geary coefficient: when n is sufficiently large, the factor 2n/(n-1) in Eq (27) approaches 2. Based on the row-normalized weight matrix, the sum of the local Geary coefficients is

\sum_{i = 1}^{n} C_{i}^{* *} = \frac{n \sum_{i = 1}^{n} \frac{1}{V_{i}} \sum_{j = 1}^{n} v_{i j} {(y_{i} - y_{j})}^{2}}{\sum_{i = 1}^{n} y_{i}^{2}} = \frac{n V_{0} \sum_{i = 1}^{n} \frac{1}{V_{i}} \sum_{j = 1}^{n} w_{i j} {(y_{i} - y_{j})}^{2}}{\sum_{i = 1}^{n} y_{i}^{2}} .

(28)

The constant proportional relationship between the local Geary coefficients and the global Geary coefficient cannot be derived from Eq (28). Anselin [19] argued that, for the weight matrix normalized by row, V₀ = n, so Eq (25) gives γ_c = 2n²/(n-1); this step is correct in itself. He then gave the following relation

\sum_{i = 1}^{n} C_{i}^{* *} = \frac{2 n^{2}}{n - 1} C = γ_{c} C .

(29)

This is wrong and cannot be strictly derived by mathematical methods, nor can it be verified by observational data. Based on the row-normalized weight matrix, the correct result is

\sum_{i = 1}^{n} C_{i}^{* *} = \frac{2 n}{n - 1} \frac{\sum_{i = 1}^{n} \sum_{j = 1}^{n} w_{i j}^{*} {(z_{i}^{*} - z_{j}^{*})}^{2}}{\sum_{i = 1}^{n} \sum_{j = 1}^{n} w_{i j} {(z_{i}^{*} - z_{j}^{*})}^{2}} C = γ_{c}^{*} C,

(30)

in which γ_c* represents the proportionality coefficient. The coefficient can be expressed as

γ_{c}^{*} = \frac{2 n}{n - 1} \frac{\sum_{i = 1}^{n} \sum_{j = 1}^{n} w_{i j}^{*} {(z_{i}^{*} - z_{j}^{*})}^{2}}{\sum_{i = 1}^{n} \sum_{j = 1}^{n} w_{i j} {(z_{i}^{*} - z_{j}^{*})}^{2}},

(31)

which is not a constant. It cannot be proved that Eq (29) is equivalent to Eq (30). Moreover, starting from Eqs (21) and (22), the proportional relationship between the two sets of local Geary coefficients is

\frac{C_{i}^{*}}{C_{i}^{* *}} = \frac{σ^{2} \sum_{j = 1}^{n} v_{i j} {(y_{i} - y_{j})}^{2}}{\sum_{j = 1}^{n} \frac{v_{i j}}{V_{i}} {(y_{i} - y_{j})}^{2}} = σ^{2} V_{i} = \frac{I_{i}^{*}}{I_{i}^{* *}} .

(32)

This is obviously not a constant, but a variable that changes with the row sums of the spatial contiguity matrix. This shows that the two sets of local Geary coefficients are not equivalent to each other, and the ratio of the corresponding values of the two sets of local Geary coefficients is equal to the ratio of the values of the two sets of local Moran's indexes. In short, the second set of local Geary statistics does not satisfy the second requirement given by Anselin [19].
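The relations for the local Geary coefficients can be checked in the same way. The following sketch assumes a hypothetical toy dataset and computes the global Geary coefficient from the contiguity matrix V via Eq (24); it then tests Eqs (26), (29) and (32) numerically.

```python
import numpy as np

# Hypothetical toy data (not from the paper).
x = np.array([1.0, 2.0, 4.0, 8.0])
V = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

n = len(x)
y = x - x.mean()
sigma2 = np.mean(y ** 2)                          # population variance
V0 = V.sum()
row_sums = V.sum(axis=1)                          # V_i
diff2 = (y[:, None] - y[None, :]) ** 2            # matrix of (y_i - y_j)^2

C1 = (V * diff2).sum(axis=1)                      # Eq (21)
C2 = ((V / row_sums[:, None]) * diff2).sum(axis=1) / sigma2   # Eq (22)
C_global = (n - 1) * (V * diff2).sum() / (2 * V0 * (y @ y))   # Eq (24)

gamma_c = 2 * n * V0 / (n - 1)                    # Eq (25)
print(C1.sum(), gamma_c * sigma2 * C_global)      # Eq (26): both sides agree
print(C2.sum(), 2 * n**2 / (n - 1) * C_global)    # Eq (29): sides differ here
print(C1 / C2, sigma2 * row_sums)                 # Eq (32)
```

For these toy values Eq (26) holds, Eq (29) does not, and the ratio C_i*/C_i** equals σ²V_i, matching the ratio found for the local Moran indexes.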

2.2 Revised and normalized results

2.2.1 Adjustment of symbol system and clarification of concept

Concept is the cornerstone of logic. Only if a concept is clear can mistakes in reasoning be avoided. The premise of mathematical reasoning is the symbolization of concepts. Confusion of symbols can easily lead to mistakes in reasoning. The main reason for the inconsistency between the two sets of LISA proposed by Anselin [19] is the unintentional concept substitution caused by mixing the symbols of spatial measure matrices. At present, there are several problems concerning spatial autocorrelation in the geographical literature.

Firstly, the symbols of the spatial weight matrix need to be improved. The symbols of the spatial contiguity matrix (SCM), say, [1/d_ij], and those of the spatial weight matrix (SWM), say, [v_ij/∑∑v_ij], where v_ij = 1/d_ij, are confused with each other. The two matrices are regarded as equivalent and are both represented by the same symbol [w_ij]. In fact, the spatial distance matrix can be transformed into a spatial contiguity matrix according to a certain distance decay function, and the weight matrix can be obtained by normalizing the spatial contiguity matrix [29]. Although the final result is the same when the symbols are confused, the form of expression causes many unnecessary misunderstandings for beginners. This paper distinguishes the symbols as follows: SCM is represented by V, and its elements are represented by v_ij; SWM is represented by W, and its elements are expressed as w_ij. Thus we have SCM, V = [v_ij], and SWM, W = [w_ij] = [v_ij/∑∑v_ij].
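The chain from distance matrix to SCM to SWM described above can be sketched as follows. The coordinates are hypothetical, and the inverse-distance rule v_ij = 1/d_ij is the example decay function named in the text; other decay functions would be handled the same way.

```python
import numpy as np

# Hypothetical coordinates of four places (illustrative only).
coords = np.array([[0.0, 0.0], [3.0, 0.0], [3.0, 4.0], [0.0, 4.0]])

# Spatial distance matrix [d_ij] from pairwise Euclidean distances.
d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)

# SCM: v_ij = 1 / d_ij for i != j, and v_ii = 0.
V = np.zeros_like(d)
off_diag = ~np.eye(len(d), dtype=bool)
V[off_diag] = 1.0 / d[off_diag]

# SWM: global (sum-based) normalization, w_ij = v_ij / sum_i sum_j v_ij.
W = V / V.sum()

print(V)
print(W, W.sum())
```

Keeping V and W as separate arrays mirrors the symbol distinction adopted in this paper: V carries the raw contiguity values, while W sums to 1 and preserves the symmetry of V.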

Secondly, the definitions of the spatial matrices need to be explained. When the spatial contiguity matrix (SCM) is transformed into the spatial weight matrix (SWM), global normalization and local normalization by row are confused. Anselin [19], the original founder of the local Moran index, adopted the method of row normalization (he termed the process “row-standardization”). The sum of the SWM elements is thus equal to n. However, this method leads to two results: (1) The symmetry of the spatial distance matrix is broken. A spatial weight matrix comes from a spatial distance matrix or a generalized spatial distance matrix. One of the important properties of a distance measure is symmetry: d_ij = d_ji holds for all i and j [30]. This is one of the four principles of the distance axioms (positivity, specification, symmetry, and triangle inequality). (2) The absolute value of the calculated local Moran index may sometimes exceed 1. Moran's index is an autocorrelation coefficient whose value should fall between -1 and 1 in theory. The special boundary values of Moran's index determined by the maximum and minimum eigenvalues of the spatial weight matrix should be discussed in another work.
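The loss of symmetry under row normalization is easy to see on a small example. The sketch below uses a hypothetical symmetric contiguity matrix whose row sums are unequal and compares the two normalizations.

```python
import numpy as np

# Hypothetical symmetric contiguity matrix with unequal row sums.
V = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
n = V.shape[0]

W_global = V / V.sum()                       # global (sum-based) normalization
W_row = V / V.sum(axis=1, keepdims=True)     # row normalization

print(np.allclose(W_global, W_global.T))     # symmetry is kept
print(np.allclose(W_row, W_row.T))           # symmetry is broken
print(W_row.sum(), n)                        # elements sum to n
```

Here w*_12 = 1 while w*_21 = 0.5, so the row-normalized matrix no longer satisfies the symmetry property of a distance-based measure, whereas the globally normalized matrix does.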

Thirdly, the meanings and symbols of the two types of variance are different. The population variance is often confused with the sample variance in spatial statistics. Moran's index is defined based on the population variance, and Geary's coefficient is defined based on the sample variance [29]. According to Fisher's symbol system in statistics, the population variance is expressed as σ², and the denominator in its formula is n; the sample variance is expressed as s², and the denominator in its formula is n-1 [31]. The relationship between them is σ² = (n-1)s²/n.
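The relationship between the two variances can be confirmed with NumPy, whose `np.var` uses the divisor n by default (`ddof=0`) and the divisor n-1 when `ddof=1`. The data vector is a hypothetical toy example.

```python
import numpy as np

x = np.array([1.0, 2.0, 4.0, 8.0])   # hypothetical toy data
n = x.size

sigma2 = np.var(x)            # population variance, divisor n (ddof=0)
s2 = np.var(x, ddof=1)        # sample variance, divisor n - 1

print(sigma2, s2, (n - 1) * s2 / n)
```

For a small n such as this, σ² and s² differ noticeably, which is why the choice of variance matters for small spatial datasets.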

Fourth, the difference in numbering between rows and columns needs to be noted. There is sometimes confusion between row summation and column summation. The sum along a row vector is expressed as summation over j, and the sum along a column vector is expressed as summation over i. Based on the globally normalized weight matrix, the difference is only formal and has nothing to do with the results. However, based on the row-normalized weight matrix, the results of row summation differ from the results of column summation.
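A short sketch makes the row-versus-column point concrete. It assumes a hypothetical symmetric contiguity matrix and compares sums over j (rows) with sums over i (columns) under both normalizations.

```python
import numpy as np

# Hypothetical symmetric contiguity matrix (illustrative only).
V = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

W_global = V / V.sum()
W_row = V / V.sum(axis=1, keepdims=True)

# axis=1 sums over j (row sums); axis=0 sums over i (column sums).
print(W_global.sum(axis=1), W_global.sum(axis=0))   # identical vectors
print(W_row.sum(axis=1), W_row.sum(axis=0))         # different vectors
```

Under global normalization the two summation orders give the same vector because the matrix stays symmetric; under row normalization every row sums to 1 while the column sums vary.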

Fifth, the methods of value transformation need to be clarified. The concepts of normalization and standardization are often confused in the literature. Standardization in the generalized sense includes normalization. However, standardization and normalization each have different definitions and corresponding calculation formulas. The transformation formula of variables should be chosen according to the research objective (S2 File).

To make it easy for readers to understand, it is necessary to distinguish symbols and then clarify the concept of variable transformation. There are three principles for adopting symbols in this paper. First, the principle of consensus: priority is given to the conventional expressions in the field of mathematical statistics. For example, the population standard deviation is expressed as σ, and the sample standard deviation is expressed as s [31]. Second, the principle of direction. For example, the spatial weight matrix is represented by W because “W” is the capital form of the initial of “weight”. Third, the principle of distinction. For example, the spatial contiguity matrix is represented by V to distinguish it from the spatial weight matrix W, and this distinction facilitates mathematical reasoning. Among the above three principles, the distinction principle is the most important (Table 2). In the spatial autocorrelation literature, centralized variables (such as in defining the local Moran's index), standardized variables (such as in simplifying the calculation of the global Moran index) and globally normalized variables (such as in simplifying the calculation of the Getis-Ord index) are used, respectively (Table 3). In the literature, when the spatial weight matrix is normalized by row, the concept of row standardization is adopted, but the calculation formula is not given [19]. This can easily lead to misunderstandings for beginners in spatial autocorrelation analysis.

Table 2. Comparison between Anselin’s symbol system and the symbol system in this paper.

Measure set	Anselin	This paper
Spatial proximity matrix (SPM)	--	U = {d_ij}
Spatial contiguity matrix (SCM): non-normalized SWM	W = {w_ij}	V = {v_ij}
Row-normalized spatial weight matrix (RSWM)	W = {w_ij}	--
Sum-normalized spatial weight matrix (SSWM)	--	W = {w_ij}
Sum of elements of spatial contiguity matrix	S₀	V₀
Sum of elements of spatial weight matrix	S₀	W₀
Size variable	--	x_i, x_j
Centralized variable	z_i, z_j	y_i, y_j
Standardized variable	--	z_i, z_j
Population variance	m₂	σ²
Sample variance	--	s²
Global Moran’s I	I	I
Local Moran’s I	I_i	I_i
Global Geary’s C	c	C
Local Geary’s C	c_i	C_i


Note: In the context, the sum-normalized spatial weight matrix is also termed sum-based normalized spatial weight matrix or globally normalized spatial weight matrix by sum. Correspondingly, the row-normalized spatial weight matrix is also called row-based normalized spatial weight matrix or locally normalized spatial weight matrix by row.

Table 3. Value transformation methods, calculation formulas, and properties of converted variables.

Method	Calculation formula	Property
Centralization	y_i = x_i - $\bar{x}$	The mean value is 0
Standardization by z-score	z_i = (x_i - $\bar{x}$)/σ, z_i* = (x_i - $\bar{x}$)/s	The mean value is 0 and the standard deviation is 1
Range normalization	x_i^(r) = (x_i-x_min)/(x_max-x_min)	The values range from 0 to 1
Global normalization	x_i^(t) = x_i/∑_ix_i, w_ij = v_ij /∑_i∑_jv_ij	The values come between 0 and 1 and the sum of the values equals 1

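
The four transformations in Table 3 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the four-element size vector are hypothetical and do not come from the paper or its supplementary files.

```python
import math

def centralize(x):
    """Centralization: y_i = x_i - mean(x)."""
    mean = sum(x) / len(x)
    return [xi - mean for xi in x]

def zscore(x, sample=False):
    """z_i = y_i / sigma (population) or z_i* = y_i / s (sample)."""
    y = centralize(x)
    denom = len(x) - 1 if sample else len(x)
    sd = math.sqrt(sum(yi * yi for yi in y) / denom)
    return [yi / sd for yi in y]

def range_normalize(x):
    """Range normalization: values mapped onto [0, 1]."""
    lo, hi = min(x), max(x)
    return [(xi - lo) / (hi - lo) for xi in x]

def sum_normalize(x):
    """Global normalization: values divided by their total, so they sum to 1."""
    total = sum(x)
    return [xi / total for xi in x]

# Hypothetical size variable, used only to exercise the functions.
x = [12.0, 7.0, 3.0, 2.0]
y = centralize(x)
z = zscore(x)                     # based on population standard deviation
z_star = zscore(x, sample=True)   # based on sample standard deviation
```

For any data set the two z-scores differ by the constant factor z_i*/z_i = sqrt((n-1)/n). For n = 13 this factor is about 0.9608, which matches the σ row of the z* columns in Table 4.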

2.2.2 Definition of normalized local Moran’s index

Moran’s index is defined on the basis of population standard deviation rather than sample standard deviation. Accordingly, local Moran’s index should also be defined through population standard deviation. In light of Eq (7), canonical local Moran’s index can be defined as

I_{i} = \frac{I_{i}^{*}}{σ^{2} V_{0}} = \frac{1}{σ^{2}} y_{i} \sum_{j = 1}^{n} \frac{v_{i j}}{V_{0}} y_{j} = z_{i} \sum_{j = 1}^{n} w_{i j} z_{j} .

(33)

Further, according to Eq (7), the relation between global Moran’s index and the sum of local Moran’s indexes is

I = \sum_{i = 1}^{n} (\frac{I_{i}^{*}}{σ^{2} V_{0}}) = \sum_{i = 1}^{n} I_{i} .

(34)

According to Eq (33), the relation between Anselin’s first set of local Moran indexes and the local Moran’s indexes formula improved in this paper is

I_{i}^{*} = γ I_{i} = σ^{2} V_{0} I_{i} .

(35)

For the globally normalized spatial weight matrix W and the standardized variable z based on population standard deviation, we have σ² = 1 and V₀ = 1. Thus, Eq (9) should be replaced by

γ_{0} = σ^{2} V_{0} = (\frac{1}{n} \sum_{i = 1}^{n} z_{i}^{2}) (\sum_{i = 1}^{n} \sum_{j = 1}^{n} w_{i j}) = 1 .

(36)

This suggests that, according to the second basic requirement for LISA from Anselin [19], the sum of normalized local Moran’s index equals the global Moran’s index.
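
A minimal numerical sketch of Eqs (33) and (34) is given below. The three-point contiguity matrix and the size vector are hypothetical toy values, and the function names are illustrative assumptions rather than part of the paper.

```python
import math

def sum_normalize(v):
    """W = V / V0, where V0 is the sum of all contiguity elements."""
    v0 = sum(sum(row) for row in v)
    return [[vij / v0 for vij in row] for row in v]

def standardize(x):
    """z_i = (x_i - mean) / sigma, with sigma the population standard deviation."""
    n = len(x)
    mean = sum(x) / n
    sigma = math.sqrt(sum((xi - mean) ** 2 for xi in x) / n)
    return [(xi - mean) / sigma for xi in x]

def local_moran(x, v):
    """Eq (33): I_i = z_i * sum_j w_ij * z_j."""
    z, w = standardize(x), sum_normalize(v)
    n = len(x)
    return [z[i] * sum(w[i][j] * z[j] for j in range(n)) for i in range(n)]

def global_moran(x, v):
    """Conventional global Moran's I computed from centralized values."""
    n = len(x)
    mean = sum(x) / n
    y = [xi - mean for xi in x]
    v0 = sum(sum(row) for row in v)
    cross = sum(v[i][j] * y[i] * y[j] for i in range(n) for j in range(n))
    return n * cross / (v0 * sum(yi * yi for yi in y))

# Hypothetical toy data: three places on a line with inverse-distance contiguity.
v = [[0.0, 1.0, 0.5],
     [1.0, 0.0, 1.0],
     [0.5, 1.0, 0.0]]
x = [1.0, 2.0, 6.0]

local_i = local_moran(x, v)
print(sum(local_i), global_moran(x, v))  # the two values agree up to rounding error
```

The global index is computed independently from the centralized values, so the agreement of the two printed numbers illustrates Eq (34) rather than restating it.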

2.2.3 Definition of normalized local Geary’s coefficient

Geary’s coefficient is defined on the basis of sample standard deviation rather than population standard deviation. Accordingly, local Geary’s coefficient should also be defined through sample standard deviation. The generalized Geary’s coefficient is another case [29]. In terms of Eq (26), global Geary’s coefficient can be expressed as

C = \frac{n - 1}{2 n V_{0} σ^{2}} \sum_{i = 1}^{n} C_{i}^{*} = \frac{1}{2 V_{0} s^{2}} \sum_{i = 1}^{n} C_{i}^{*} = \sum_{i = 1}^{n} (\frac{C_{i}^{*}}{2 V_{0} s^{2}}) = \sum_{i = 1}^{n} C_{i},

(37)

where s² = nσ²/(n-1) reflects the relationship between sample variance s² and population variance σ². Thus local Geary’s coefficient can be defined as

C_{i} = \frac{C_{i}^{*}}{2 V_{0} s^{2}} = \frac{1}{2 V_{0} s^{2}} \sum_{j = 1}^{n} v_{i j} {(y_{i} - y_{j})}^{2} = \frac{1}{2} \sum_{j = 1}^{n} w_{i j} {(z_{i}^{*} - z_{j}^{*})}^{2} .

(38)

Summing Eq (38) yields global Geary’s coefficient, that is, Eq (24). According to Eq (37), the relation between Anselin’s first set of Geary’s coefficient and the local Geary’s coefficient formula improved in this paper is

C_{i}^{*} = γ_{c} σ^{2} C_{i} = 2 s^{2} V_{0} C_{i} .

(39)

For the globally normalized spatial weight matrix W and the standardized vector z* based on sample standard deviation, we have s² = 1 and V₀ = 1. Thus, according to Eq (26), the relation between the proportionality coefficients is

γ_{c} σ^{2} = 2 s^{2} V_{0} = 2 .

(40)
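
The definitions in Eqs (37) to (39) can be checked with a short sketch. The toy contiguity matrix and size vector are hypothetical, and the function names are assumptions made for this illustration.

```python
import math

def local_geary(x, v):
    """Eq (38): C_i = (1/2) * sum_j w_ij * (z_i* - z_j*)^2, with w_ij = v_ij / V0."""
    n = len(x)
    mean = sum(x) / n
    s = math.sqrt(sum((xi - mean) ** 2 for xi in x) / (n - 1))  # sample SD
    z_star = [(xi - mean) / s for xi in x]
    v0 = sum(sum(row) for row in v)
    return [0.5 * sum(v[i][j] / v0 * (z_star[i] - z_star[j]) ** 2 for j in range(n))
            for i in range(n)]

def first_local_geary(x, v):
    """Non-normalized form C_i* = sum_j v_ij * (y_i - y_j)^2 (y_i - y_j equals x_i - x_j)."""
    n = len(x)
    return [sum(v[i][j] * (x[i] - x[j]) ** 2 for j in range(n)) for i in range(n)]

def global_geary(x, v):
    """Conventional global Geary's C, based on the sample variance."""
    n = len(x)
    mean = sum(x) / n
    v0 = sum(sum(row) for row in v)
    num = (n - 1) * sum(v[i][j] * (x[i] - x[j]) ** 2 for i in range(n) for j in range(n))
    return num / (2 * v0 * sum((xi - mean) ** 2 for xi in x))

# Hypothetical toy data: three places on a line with inverse-distance contiguity.
v = [[0.0, 1.0, 0.5],
     [1.0, 0.0, 1.0],
     [0.5, 1.0, 0.0]]
x = [1.0, 2.0, 6.0]

c_local = local_geary(x, v)
c_star = first_local_geary(x, v)
ratios = [cs / c for cs, c in zip(c_star, c_local)]  # Eq (39): each ratio is 2 * s^2 * V0
```

For these toy values s² = 7 and V₀ = 5, so every ratio C_i*/C_i equals 70, and the local coefficients add up to the global coefficient 59/70.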

Moran’s index and Geary’s coefficient reflect the same problem from different angles of view. It can be proved that the relationship between global Moran’s I and global Geary’s C is as follows

C = \frac{\sum_{i = 1}^{n} \sum_{j = 1}^{n} v_{i j} y_{i}^{2} - \sum_{i = 1}^{n} \sum_{j = 1}^{n} v_{i j} y_{i} y_{j}}{V_{0} \frac{1}{n - 1} \sum_{i = 1}^{n} y_{i}^{2}} = \frac{n - 1}{n} (o^{T} W z^{2} - z^{T} W z) = \frac{n - 1}{n} (o^{T} W z^{2} - I),

(41)

where z denotes standardized vector based on population standard deviation, z² = diag(zz^T) refers to a vector composed of the squares of the elements in z, o^T = [1 1 … 1] is a ones vector in which all the elements are 1. The symbol “T” indicates transposition, and the function "diag" represents taking the diagonal elements of a matrix to form a vector. If the mean of the global Moran’s index is treated as I₀ = 1/(1-n), the mean of global Geary’s coefficient, C₀, can be estimated by

C_{0} = \frac{n - 1}{n} (o^{T} W z^{2} - I_{0}) = \frac{n - 1}{n} (o^{T} W z^{2} - \frac{1}{1 - n}) = \frac{n - 1}{n} o^{T} W z^{2} + \frac{1}{n} .

(42)

Further, the relationship between local Moran’s indexes and local Geary’s coefficient can be derived. From Eq (38) it follows

C_{i} = \frac{1}{2 V_{0} \frac{n}{n - 1} σ^{2}} \sum_{j = 1}^{n} v_{i j} {(y_{i} - y_{j})}^{2} = \frac{n - 1}{2 n} \sum_{j = 1}^{n} w_{i j} {(z_{i} - z_{j})}^{2} .

(43)

Changing the form of Eq (43) yields

C_{i} = \frac{n - 1}{2 n} (\sum_{j = 1}^{n} w_{i j} (z_{i}^{2} + z_{j}^{2}) - 2 \sum_{j = 1}^{n} w_{i j} z_{i} z_{j}) = \frac{n - 1}{2 n} (\sum_{j = 1}^{n} w_{i j} (z_{i}^{2} + z_{j}^{2}) - 2 I_{i}) .

(44)

This means that there is a strict numerical conversion relationship between local Moran’s indexes and local Geary’s coefficient, although they describe the same problem from different angles. It can be seen that Eq (41) can be obtained by summing Eq (44).
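
The conversion in Eq (44) and its summed form in Eq (41) can be illustrated numerically. The sketch below uses hypothetical toy data and an assumed function name; Eq (41) additionally relies on the symmetry of W, which holds for the toy matrix.

```python
import math

def moran_geary_link(x, v):
    """Return local Moran, local Geary by Eq (43), local Geary by Eq (44), and C by Eq (41)."""
    n = len(x)
    idx = range(n)
    mean = sum(x) / n
    sigma = math.sqrt(sum((xi - mean) ** 2 for xi in x) / n)
    z = [(xi - mean) / sigma for xi in x]            # population-based z
    v0 = sum(sum(row) for row in v)
    w = [[vij / v0 for vij in row] for row in v]     # sum-normalized weights
    moran = [z[i] * sum(w[i][j] * z[j] for j in idx) for i in idx]
    # Eq (43): local Geary written with population-based z.
    geary = [(n - 1) / (2 * n) * sum(w[i][j] * (z[i] - z[j]) ** 2 for j in idx)
             for i in idx]
    # Eq (44): the same quantity recovered from the local Moran index.
    geary_from_moran = [
        (n - 1) / (2 * n)
        * (sum(w[i][j] * (z[i] ** 2 + z[j] ** 2) for j in idx) - 2 * moran[i])
        for i in idx
    ]
    # Eq (41): o^T W z^2 is the weighted sum of squared z over all pairs.
    owz2 = sum(w[i][j] * z[j] ** 2 for i in idx for j in idx)
    global_c = (n - 1) / n * (owz2 - sum(moran))
    return moran, geary, geary_from_moran, global_c

# Hypothetical toy data: symmetric inverse-distance contiguity for three places.
v = [[0.0, 1.0, 0.5],
     [1.0, 0.0, 1.0],
     [0.5, 1.0, 0.0]]
x = [1.0, 2.0, 6.0]
moran, geary, geary_from_moran, global_c = moran_geary_link(x, v)
```

Eq (44) is an algebraic identity for each location, whereas the step from the sum of Eq (44) to Eq (41) uses the fact that the two double sums of w_ij z_i² and w_ij z_j² coincide when W is symmetric.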

In the new framework for LISA, the spatial weight matrix is normalized by sum, which is a type of global normalization in value transformation. There are several benefits to using a globally normalized weight matrix. Mathematics relies heavily on form, and the same mathematical method often has very different effects when expressed in different forms. For spatial autocorrelation, using a normalized spatial weight matrix instead of a non-normalized weight matrix brings at least the following advantages. First, with a normalized weight matrix, it is convenient to calculate the global Moran’s index I and the local Moran’s indexes I_i, and the relationship between I and I_i becomes clear [29]. Second, with a normalized weight matrix, we can obtain a standardized Moran’s scatterplot, in which the slope of the trend line is exactly equal to the global Moran’s index [32]. Third, based on a normalized weight matrix, the structure of the parameters of spatial autoregressive models can be clearly revealed using the spatial autocorrelation coefficients. Fourth, it makes the values of local Moran’s index and local Geary’s coefficient more intuitive. The fourth advantage is the most relevant to this work. Many basic measures and models of spatial statistical analysis are rooted in conventional statistics and were created by analogy with time series analysis methods. The common measures and models of time series analysis, such as autocorrelation coefficients and autoregressive models, are also rooted in traditional statistical theories. The development of statistics took place in the wider context of the Victorian culture of measurement [31]. For simplicity’s sake, numerous measurement results are usually condensed into an index [33]. In this case, an index is often treated as a characteristic measurement [6, 34].
A good index has a pair of clear boundary values, a clear critical value, or both. Based on the standardized variable and the globally normalized spatial weight matrix, the values of the local Moran’s indexes fall between -1 and 1, and the corresponding critical value is 0; the values of the local Geary’s coefficients fall between 0 and 2, and the corresponding critical value is 1.

3 Empirical analysis

3.1 Study area and data

The results of mathematical deduction ultimately need to be verified through mathematical reasoning and empirical analysis. After all, the success of the sciences rests on their emphasis on quantifiable data and the interplay of data with models [35]. Taking the cities in the Beijing, Tianjin and Hebei (BTH) region as an example, we carry out a concise computational case study. This is a demonstrative case, not an explanatory case. In other words, the example is used to verify the reasoning results rather than to study the spatial structure and characteristics of the BTH urban system. The study area includes Beijing city, Tianjin city, and the main cities of Hebei Province. The study region is also termed the Jing-Jin-Ji (JJJ) region in the literature [36]. The cities are all of prefecture level and above, and the number of cities is n = 13. The size measurement is the city population from the fifth census in 2000 and the sixth census in 2010. Town population is not taken into account. At present, urban population has several definitions: regional total population, municipal population, city population, and urban population consisting of city population and town population. This case uses the city population, which better reflects the characteristics of city size. City population size can be reflected by the night light area on maps [32, 36]. The population size was processed by centralization (y), population-based standardization (z) and sample-based standardization (z*) (Table 4). As for the spatial weight matrix, the basic data are the traffic mileages between cities (Table 5). The spatial weight function is a special negative power law, the inverse proportion function, which lies at the intersection of power-law and hyperbolic functions. Thus, the spatial contiguity is defined as

v_{i j} = \begin{cases} 1 / d_{i j}, & i \neq j \\ 0, & i = j \end{cases},

(45)

where d_ij denotes the distance by road between city i and city j. On this basis, the traffic mileage matrix (U) can be transformed into a spatial contiguity matrix (V), which can in turn be converted into the globally normalized weight matrix (W) and the row-normalized weight matrix (W*).
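
The chain from U to V, W and W* can be sketched as follows. The function names are illustrative assumptions. The three distances are taken from Table 5 (Beijing, Tianjin and Langfang), but the three-city subset is used only for illustration, so its V₀ differs from the value obtained with all 13 cities.

```python
def contiguity_from_distance(u):
    """Eq (45): v_ij = 1 / d_ij off the diagonal and 0 on the diagonal."""
    n = len(u)
    return [[0.0 if i == j else 1.0 / u[i][j] for j in range(n)] for i in range(n)]

def normalize_by_sum(v):
    """Globally normalized weights: w_ij = v_ij / V0."""
    v0 = sum(sum(row) for row in v)
    return [[vij / v0 for vij in row] for row in v]

def normalize_by_row(v):
    """Row-normalized weights: each element divided by its own row sum."""
    return [[vij / sum(row) for vij in row] for row in v]

def is_symmetric(m, tol=1e-12):
    n = len(m)
    return all(abs(m[i][j] - m[j][i]) <= tol for i in range(n) for j in range(n))

# Road distances among Beijing, Tianjin and Langfang, copied from Table 5.
u = [[0.0, 160.8855, 83.2755],
     [160.8855, 0.0, 86.1555],
     [83.2755, 86.1555, 0.0]]

v = contiguity_from_distance(u)
w = normalize_by_sum(v)        # W: still symmetric, elements sum to 1
w_row = normalize_by_row(v)    # W*: each row sums to 1, symmetry is lost
print(is_symmetric(w), is_symmetric(w_row))  # prints: True False
```

Dividing every element by the same constant V₀ preserves symmetry, whereas dividing each row by its own sum does not, because the row sums of V generally differ.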

Table 4. Beijing-Tianjin-Hebei city population and its centralization and standardization results.

City	2000				2010
City	x	y	z	z*	x	y	z	z*
Beijing	949.6688	769.1377	2.9976	2.8800	1555.2378	1284.2528	2.9870	2.8698
Tianjin	531.3702	350.8391	1.3673	1.3137	885.6234	614.6384	1.4296	1.3735
Shijiazhuang	193.0579	12.5268	0.0488	0.0469	275.6871	4.7021	0.0109	0.0105
Tanshan	140.3887	-40.1424	-0.1564	-0.1503	163.7579	-107.2271	-0.2494	-0.2396
Qinhuangdao	70.7267	-109.8044	-0.4279	-0.4112	95.1872	-175.7978	-0.4089	-0.3928
Handan	107.1068	-73.4243	-0.2862	-0.2749	111.7417	-159.2433	-0.3704	-0.3558
Xingtai	53.6282	-126.9029	-0.4946	-0.4752	63.7797	-207.2053	-0.4819	-0.4630
Baoding	90.2496	-90.2815	-0.3519	-0.3381	98.0177	-172.9673	-0.4023	-0.3865
Zhangjiakou	79.6580	-100.8731	-0.3931	-0.3777	90.0218	-180.9632	-0.4209	-0.4044
Chengde	32.5821	-147.9490	-0.5766	-0.5540	49.8293	-221.1557	-0.5144	-0.4942
Cangzhou	44.3561	-136.1750	-0.5307	-0.5099	48.9701	-222.0149	-0.5164	-0.4961
Langfang	29.5879	-150.9432	-0.5883	-0.5652	46.6539	-224.3311	-0.5218	-0.5013
Hengshui	24.5229	-156.0082	-0.6080	-0.5842	38.2976	-232.6874	-0.5412	-0.5200
Mean	180.5311	0.0000	0.0000	0.0000	270.9850	0.0000	0.0000	0.0000
σ	256.5845	256.5845	1.0000	0.9608	429.9496	429.9496	1.0000	0.9608
s	267.0616	267.0616	1.0408	1.0000	447.5057	447.5057	1.0408	1.0000


Table 5. Spatial distance matrix (d_ij) of Beijing-Tianjin-Hebei cities based on traffic mileage.

City	Beijing	Tianjin	Shijiazhuang	Tanshan	Qinhuangdao	Handan	Xingtai	Baoding	Zhangjiakou	Chengde	Cangzhou	Langfang	Hengshui
Beijing	0	160.8855	321.7625	185.4770	288.9055	479.9810	430.2520	187.1300	198.1975	194.5940	233.4440	83.2755	299.7580
Tianjin	160.8855	0	344.5825	101.4105	242.6355	454.8400	425.3890	201.9420	332.9375	280.6470	138.6135	86.1555	259.8555
Shijiazhuang	321.7625	344.5825	0	423.7510	568.1560	167.2815	114.0840	138.9090	430.8215	506.6400	221.7565	283.2495	142.5935
Tanshan	185.4770	101.4105	423.7510	0	151.3880	547.4205	517.8910	289.5120	376.8000	185.3500	215.0285	144.6130	352.4360
Qinhuangdao	288.9055	242.6355	568.1560	151.3880	0	711.7120	662.2960	433.9170	481.3360	222.2030	375.5205	292.9180	508.4835
Handan	479.9810	454.8400	167.2815	547.4205	711.7120	0	53.4600	296.7465	606.6940	664.8585	335.0465	440.4685	214.2995
Xingtai	430.2520	425.3890	114.0840	517.8910	662.2960	53.4600	0	245.8830	557.3515	615.1295	299.4430	391.1260	167.0325
Baoding	187.1300	201.9420	138.9090	289.5120	433.9170	296.7465	245.8830	0	278.0950	372.0075	150.5130	147.8300	144.8405
Zhangjiakou	198.1975	332.9375	430.8215	376.8000	481.3360	606.6940	557.3515	278.0950	0	372.8730	411.7425	257.5700	455.2955
Chengde	194.5940	280.6470	506.6400	185.3500	222.2030	664.8585	615.1295	372.0075	372.8730	0	407.1040	259.8085	495.3555
Cangzhou	233.4440	138.6135	221.7565	215.0285	375.5205	335.0465	299.4430	150.5130	411.7425	407.1040	0	149.7245	140.0620
Langfang	83.2755	86.1555	283.2495	144.6130	292.9180	440.4685	391.1260	147.8300	257.5700	259.8085	149.7245	0	237.8790
Hengshui	299.7580	259.8555	142.5935	352.4360	508.4835	214.2995	167.0325	144.8405	455.2955	495.3555	140.0620	237.8790	0


3.2 Calculation results

For the data of two years and two statistics, i.e., the local Moran index and the local Geary coefficient, three sets of calculation results are given. The calculation process is simple, and readers can repeat the author’s calculations using Microsoft Excel (see S1 and S2 Datasets). For the local spatial statistics defined by Anselin [19], the first set of local Moran indexes is denoted by MI1 and the second set by MI2; the first set of local Geary coefficients is denoted by GC1 and the second set by GC2. Accordingly, the modified local Moran index and Geary coefficient are denoted by MI3 and GC3, respectively (Fig 1). The results are as follows. First, the ratio of MI1 to MI2 is not a constant, and the ratio of GC1 to GC2 is also not a constant. This shows that the two sets of local Moran indexes and the two sets of local Geary coefficients of Anselin [19] are not equivalent to one another. Second, the ratio of MI1 to MI3 is a constant, and the ratio of GC1 to GC3 is also a constant. This shows that the first set of local Moran indexes of Anselin [19] is equivalent to the modified local Moran index in this paper, and the first set of local Geary coefficients of Anselin [19] is equivalent to the modified local Geary coefficient in this paper (Tables 6 and 7). The reason is that the first set of local Moran indexes and local Geary coefficients defined by Anselin [19] is based on the symmetric spatial contiguity matrix, and the modified statistics in this paper are based on the globally normalized spatial weight matrix, which is also symmetric. In contrast, the second set of local Moran indexes and local Geary coefficients defined by Anselin [19] is based on the locally normalized spatial weight matrix, in which the symmetry is broken.

Fig 1. (Note: Moran’s index is taken as an example in this figure. By analogy, we can know the conversion process of the Geary’s coefficient. In fact, using Eqs (42) and (44), we can readily achieve the numerical conversion between Moran’s index and Geary’s coefficient.)

Table 6. Comparison of three sets of local Moran index values in two years.

City	2000					2010
City	Local MI1	Local MI2	Local MI3	MI1/MI2	MI1/MI3	Local MI1	Local MI2	Local MI3	MI1/MI2	MI1/MI3
Beijing	-2686.4966	-0.7067	-0.0612	3801.3644	43916.8725	-7140.4536	-0.6690	-0.0579	10673.6704	123312.1000
Tianjin	-387.0133	-0.0951	-0.0088	4071.1117	43916.8725	-1175.2192	-0.1028	-0.0095	11431.0810	123312.1000
Shijiazhuang	-23.1481	-0.0068	-0.0005	3385.2705	43916.8725	-14.4935	-0.0015	-0.0001	9505.3402	123312.1000
Tanshan	-121.7919	-0.0343	-0.0028	3547.3310	43916.8725	-603.5770	-0.0606	-0.0049	9960.3823	123312.1000
Qinhuangdao	-142.9763	-0.0607	-0.0033	2356.2158	43916.8725	-379.2385	-0.0573	-0.0031	6615.9063	123312.1000
Handan	170.5561	0.0533	0.0039	3202.3026	43916.8725	594.8129	0.0662	0.0048	8991.5933	123312.1000
Xingtai	185.0124	0.0511	0.0042	3618.1153	43916.8725	637.3519	0.0627	0.0052	10159.1341	123312.1000
Baoding	-92.0058	-0.0244	-0.0021	3771.5181	43916.8725	-335.7750	-0.0317	-0.0027	10589.8666	123312.1000
Zhangjiakou	-231.9379	-0.1057	-0.0053	2194.2630	43916.8725	-708.7104	-0.1150	-0.0057	6161.1669	123312.1000
Chengde	-363.3994	-0.1476	-0.0083	2461.9446	43916.8725	-889.9662	-0.1287	-0.0072	6912.7772	123312.1000
Cangzhou	-194.7349	-0.0538	-0.0044	3620.4838	43916.8725	-561.9455	-0.0553	-0.0046	10165.7844	123312.1000
Langfang	-1369.3138	-0.3073	-0.0312	4455.7783	43916.8725	-3399.6518	-0.2717	-0.0276	12511.1681	123312.1000
Hengshui	27.8793	0.0081	0.0006	3431.1735	43916.8725	120.3620	0.0125	0.0010	9634.2291	123312.1000
Sum	-5229.3702	-1.4299	-0.1191	43916.8725	570919.3421	-13856.5039	-1.3523	-0.1124	123312.1000	1603057.3005
Expected	-5229.3702	-1.5480	-0.1191	43916.8725	570919.3421	-13856.5039	-1.4608	-0.1124	123312.1000	1603057.3005


Table 7. Comparison of three sets of local Geary coefficient values in two years.

City	2000					2010
City	Local GC1	Local GC2	Local GC3	GC1/GC2	GC1/GC3	Local GC1	Local GC2	Local GC3	GC1/GC2	GC1/GC3
Beijing	41036.8054	10.7953	0.4313	3801.3644	95153.2237	113754.5272	10.6575	0.4258	10673.6704	267176.2168
Tianjin	12819.0307	3.1488	0.1347	4071.1117	95153.2237	37929.2182	3.3181	0.1420	11431.0810	267176.2168
Shijiazhuang	2908.7705	0.8592	0.0306	3385.2705	95153.2237	8029.3420	0.8447	0.0301	9505.3402	267176.2168
Tanshan	5340.6947	1.5056	0.0561	3547.3310	95153.2237	15962.5572	1.6026	0.0597	9960.3823	267176.2168
Qinhuangdao	3628.6681	1.5400	0.0381	2356.2158	95153.2237	10073.4191	1.5226	0.0377	6615.9063	267176.2168
Handan	2044.0978	0.6383	0.0215	3202.3026	95153.2237	5920.6445	0.6585	0.0222	8991.5933	267176.2168
Xingtai	2655.7337	0.7340	0.0279	3618.1153	95153.2237	7227.0101	0.7114	0.0270	10159.1341	267176.2168
Baoding	5080.6946	1.3471	0.0534	3771.5181	95153.2237	14731.9805	1.3911	0.0551	10589.8666	267176.2168
Zhangjiakou	4499.9163	2.0508	0.0473	2194.2630	95153.2237	12851.4607	2.0859	0.0481	6161.1669	267176.2168
Chengde	5353.0964	2.1743	0.0563	2461.9446	95153.2237	14332.0819	2.0733	0.0536	6912.7772	267176.2168
Cangzhou	5400.0965	1.4915	0.0568	3620.4838	95153.2237	15101.1057	1.4855	0.0565	10165.7844	267176.2168
Langfang	13324.4547	2.9904	0.1400	4455.7783	95153.2237	35822.5797	2.8632	0.1341	12511.1681	267176.2168
Hengshui	4161.8231	1.2129	0.0437	3431.1735	95153.2237	10946.6401	1.1362	0.0410	9634.2291	267176.2168
Sum	108253.8824	30.4883	1.1377	43916.8725	1236991.9079	302682.5671	30.3506	1.1329	123312.1000	3473290.8178
Expected	108253.8824	32.0446	1.1377	43916.8725	1236991.9079	302682.5671	31.9099	1.1329	123312.1000	3473290.8178


Using the calculation results, we can verify two key equations. The relationship between the sum of the first set of local Moran indexes and the global Moran index satisfies Eq (8), and the relationship between the sum of the first set of local Geary coefficients and the global Geary coefficient satisfies Eq (26). However, the relationship between the sum of the second set of local Moran indexes and the global Moran index does not satisfy Eq (16), and the relationship between the sum of the second set of local Geary coefficients and the global Geary coefficient does not satisfy Eq (27). The sum of the elements of the spatial contiguity matrix is V₀ = 0.6671. In 2000, the population variance of city population in the Beijing-Tianjin-Hebei region is σ² = 65835.5974, thus γ = σ²V₀ = 43916.8725, the global Moran index is I = -0.1191, and the sum of the first set of local Moran indexes is ∑I_i^* = -5229.3702 = γI = 43916.8725*(-0.1191). On the other hand, n = 13, γ_c = 2nV₀/(n-1) = 1.4453, and the global Geary coefficient is C = 1.1377, so the sum of the first set of local Geary coefficients is ∑C_i^* = 108253.8824 = γ_cσ²C = 1.4453*65835.5974*1.1377. However, the sum of the second set of local Moran indexes is ∑I_i^** = -1.4299, while n*I = 13*(-0.1191) = -1.5480. The two values are not equal (-1.4299≠-1.5480). The sum of the second set of local Geary coefficients is ∑C_i^** = 30.4883, while 2n²*C/(n-1) = 28.1667*1.1377 = 32.0446. The two values are not equal (30.4883≠32.0446). These results indicate that, based on the conventional formulae for the second set of LISA, Anselin’s [19] second basic requirement cannot be met. In contrast, the sum of the third set of local Moran indexes is equal to the global Moran index, and the ratio of the first set of local Moran indexes to the corresponding third set of local Moran indexes is the constant γ = σ²V₀ = 43916.8725; the sum of the third set of local Geary coefficients equals the global Geary coefficient, and the ratio of the first set of local Geary coefficients to the corresponding third set of local Geary coefficients is the constant γ_cσ² = 1.4453*65835.5974 = 95153.2237 (Tables 6 and 7). This suggests that, based on the improved formulae, Anselin’s [19] second basic requirement is met by the calculation results.
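
The arithmetic for the year 2000 can be re-traced from the rounded figures quoted above. The sketch below is only a consistency check under the assumption that the published values were computed with unrounded inputs, so products of the four-decimal figures reproduce the first-set sums only approximately; the variable and function names are illustrative.

```python
# Values reported in the text and in Tables 6 and 7 for the year 2000 (all rounded).
n = 13
v0 = 0.6671                 # sum of the elements of the contiguity matrix
variance = 65835.5974       # population variance of city population
global_i, global_c = -0.1191, 1.1377
sum_mi1, sum_mi2 = -5229.3702, -1.4299
sum_gc1, sum_gc2 = 108253.8824, 30.4883

def relative_gap(observed, expected):
    """Relative difference between an observed sum and the value a relation predicts."""
    return abs(observed - expected) / abs(expected)

gamma = variance * v0               # proportionality coefficient for Moran's index
gamma_c = 2 * n * v0 / (n - 1)      # Geary counterpart, about 1.445

# First set: the sums are proportional to the global statistics.
gap_mi1 = relative_gap(sum_mi1, gamma * global_i)
gap_gc1 = relative_gap(sum_gc1, gamma_c * variance * global_c)

# Second set: the relations sum = n*I and sum = 2n^2*C/(n-1) are not met.
gap_mi2 = relative_gap(sum_mi2, n * global_i)
gap_gc2 = relative_gap(sum_gc2, 2 * n**2 * global_c / (n - 1))

print(f"first set:  {gap_mi1:.1e} {gap_gc1:.1e}")
print(f"second set: {gap_mi2:.1e} {gap_gc2:.1e}")
```

The gaps for the first set are of the size expected from rounding V₀, I and C to four decimals, whereas the gaps for the second set amount to several percent and cannot be attributed to rounding.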

The calculation result of one year may be regarded as an isolated case, so it is worth examining the situation in 2010 as well. Based on the sixth census data, the population variance of Beijing-Tianjin-Hebei city population is σ² = 184856.6464, thus γ = σ²V₀ = 123312.1000, the global Moran index is I = -0.1124, and the sum of the first set of local Moran indexes is ∑I_i^* = -13856.5039 = γI = 123312.1000*(-0.1124). On the other hand, γ_c = 1.4453, and the global Geary coefficient is C = 1.1329, so the sum of the first set of local Geary coefficients is ∑C_i^* = 302682.5671 = γ_cσ²C = 1.4453*184856.6464*1.1329. However, the sum of the second set of local Moran indexes is ∑I_i^** = -1.3523, while n*I = 13*(-0.1124) = -1.4608 (Fig 2(A)). The two numbers are not equal (-1.3523≠-1.4608). The sum of the second set of local Geary coefficients is ∑C_i^** = 30.3506, while 2n²*C/(n-1) = 28.1667*1.1329 = 31.9099. The two numbers are not equal (30.3506≠31.9099). These results once again indicate that Anselin’s [19] second basic requirement cannot be satisfied through the common formulae. The sum of the third set of local Moran indexes is equal to the global Moran index, and the ratio of the first set of local Moran indexes to the corresponding numbers in the third set is the constant γ = σ²V₀ = 123312.1000 (Fig 2(B)); the sum of the third set of local Geary coefficients equals the global Geary coefficient, and the ratio of the first set of local Geary coefficients to the corresponding third set of local Geary coefficients is the constant γ_cσ² = 1.4453*184856.6464 = 267176.2168 (Tables 6 and 7). This suggests that, based on the new formulae, Anselin’s [19] second basic requirement is satisfied once again by the calculation results. The calculation results of the two years thus support the previous theoretical inferences and related judgments.

Fig 2. (a) MI2 vs MI1 (high correlation). (b) MI3 vs MI1 (perfect fit). (Note: The second set of local Moran’s indexes (MI2) is highly correlated with the first set of local Moran’s indexes (MI1), but the two sets are not equivalent to one another. The third set of local Moran’s indexes (MI3) is equivalent to the first set of local Moran’s indexes (MI1). The coefficient is 1/γ = 1/123312.1000 = 0.000008110. MI2 does not satisfy the second requirement for LISAs given by Anselin [19].)

4 Questions and discussion

The re-expressed local Moran indexes and local Geary coefficients in this work are derived from Anselin’s correct definition and relationship, without substantial innovation. The contribution of this study lies in three aspects. First, it clarifies a series of logical misunderstandings of local spatial autocorrelation statistics and gives the correct expressions. Second, it normalizes the local spatial autocorrelation statistics, and the canonical results are more convenient to apply. Third, it clarifies a number of fundamental concepts related to spatial autocorrelation that have long been confused in the literature. Following the tradition of statistics, important concepts and their symbols have been distinguished. In particular, it emphasizes the distance axiom hidden behind the spatial weight matrix. If the spatial contiguity matrix is normalized by row, the locally normalized spatial weight matrix has a different mathematical structure from the non-normalized spatial weight matrix and from the spatial weight matrix globally normalized by sum. Applying the results derived from models based on the non-normalized spatial weight matrix to relation formulae based on the row-normalized spatial weight matrix results in wrong mathematical expressions. Generally speaking, the spatial contiguity matrix is symmetric. Therefore, the non-normalized spatial weight matrix and the globally normalized spatial weight matrix are symmetric. Substituting an asymmetric spatial weight matrix for the symmetric one leads to two wrong relations: first, that the sum of the local Moran indexes based on the standardized variable and the locally normalized weight matrix is equal to n times the global Moran index; second, that the sum of the local Geary coefficients based on the standardized variable and the locally normalized weight matrix is equal to 2n²/(n-1) times the global Geary coefficient. In fact, the two relations can never be derived from Anselin’s original assumptions.
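
The first of the two wrong relations can be illustrated with toy data. In the sketch below the contiguity matrices, the size vector and the function name are hypothetical; the row-normalized sum equals n times the global index only in the special case where all row sums of V are equal.

```python
import math

def moran_sums(x, v):
    """Return (sum of MI3, sum of MI2, n times global I) for data x and contiguity v."""
    n = len(x)
    idx = range(n)
    mean = sum(x) / n
    sigma = math.sqrt(sum((xi - mean) ** 2 for xi in x) / n)
    z = [(xi - mean) / sigma for xi in x]
    v0 = sum(sum(row) for row in v)
    lag = [sum(v[i][j] * z[j] for j in idx) for i in idx]    # sum_j v_ij * z_j
    sum_mi3 = sum(z[i] * lag[i] / v0 for i in idx)           # weights normalized by sum
    sum_mi2 = sum(z[i] * lag[i] / sum(v[i]) for i in idx)    # weights normalized by row
    global_i = sum(v[i][j] * z[i] * z[j] for i in idx for j in idx) / v0
    return sum_mi3, sum_mi2, n * global_i

x = [1.0, 2.0, 6.0]  # hypothetical size variable

# Unequal row sums (1.5, 2, 1.5): the row-normalized sum departs from n * I.
v_unequal = [[0.0, 1.0, 0.5],
             [1.0, 0.0, 1.0],
             [0.5, 1.0, 0.0]]
s3, s2, n_times_i = moran_sums(x, v_unequal)

# Equal row sums (2, 2, 2): row and sum normalization differ only by the factor n.
v_equal = [[0.0, 1.0, 1.0],
           [1.0, 0.0, 1.0],
           [1.0, 1.0, 0.0]]
s3_eq, s2_eq, n_times_i_eq = moran_sums(x, v_equal)
```

With unequal row sums the second-set sum is -31/28, about -1.107, while n times the global index is -72/70, about -1.029. With equal row sums both quantities are -1.5, which shows that the relation holds only when row normalization reduces to a rescaled global normalization.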

The errors caused by the wrong relations are not large in many cases, but the results have a far-reaching impact on geographical analysis. Concretely speaking, these incorrect relationships lead to a series of problems (Table 8): (1) The relationship between the two definitions of local Moran indexes is broken (they are not equivalent to each other). The first set of LISA is based on the symmetric spatial contiguity matrix, and the second set is based on the asymmetric spatial weight matrix normalized by row. As a result, the ratio of the values of the two sets of parameters is not a constant. (2) When defining the local spatial autocorrelation index, only the relationship between one element and the other elements is considered, and the pairwise correlation between all elements is ignored. That is, for the local index of the ith geographical element, only the relationships between element i and element j are taken into account, and the relationships between element j and element k are neglected (i, j, k = 1,2,3,…,n). In this case, the wholeness of a geographical system is overlooked in the local spatial analysis. (3) The absolute value of the local Moran index may exceed 1, thus decoupling it from the concept of correlation coefficient. Moran’s index was proposed by analogy with Pearson correlation, and its values fall between -1 and 1. (4) The parameters lack clear boundary values and critical values. The absolute boundary values of Moran’s index are -1 and 1. The critical value is 0 in theory and 1/(1-n) in experience. The boundary values of the Geary coefficient are 0 and 2, and the critical value is theoretically 1. In addition, Anselin [19] used the population standard deviation to replace the sample standard deviation when defining the local Geary coefficient. Logically this causes no problem, but historically it does: the result departs from the original intention behind the definition of the Geary coefficient.
In spatial analysis, it is sometimes difficult to distinguish between spatial samples and spatial populations. Moran’s index, which is derived from the Pearson correlation coefficient, as indicated above, is a statistic based on population standard deviation. Geary’s coefficient is defined by analogy with the Durbin-Watson statistic, based on sample standard deviation, in order to make up for the deficiency of Moran’s index. To define the local Geary coefficient, we should respect the original meaning of the definition of the Geary coefficient, so that the local Geary coefficient can be effectively associated with the global Geary coefficient. Judging from the existing literature, some readers have noticed Anselin’s mistakes, and some scholars adopt a compromise approach. For example, they use the globally normalized spatial weight matrix instead of the spatial weight matrix locally normalized by row, but multiply the corrected local Moran index formula by n. I found this kind of treatment in some teaching courseware. This ensures that the sum of the local Moran indexes is equal to n times the global Moran index.

Table 8. Functions and problems of Anselin’s LISA and the improved effect of this paper.

Definer	Variable	Statistic	Function	Advantages and disadvantages
Anselin	Centralized variable and non-normalized symmetric contiguity matrix	First local Moran’s I	Reflects local spatial dependence	Simple, but lacks clear boundary values and a critical value (reference value)
		First local Geary’s C	Reflects local spatial dependence	Simple, but lacks clear boundary values and a critical value (reference value)
	Standardized variable and row-normalized asymmetric weight matrix	Second local Moran’s I	Reflects local spatial dependence from the perspective of population	Decoupled from the first definition of local Moran’s I; decoupled from the correlation coefficient; the relationships between other pairs of elements in the system are ignored
		Second local Geary’s C	Reflects local spatial dependence from the perspective of population	Decoupled from the first definition of local Geary’s C; decoupled from the analogy with the Durbin-Watson statistic; the relationships between other pairs of elements in the system are ignored; sample standard deviation is replaced by population standard deviation
This paper	Standardized variable and globally normalized symmetric weight matrix	Third local Moran’s I	Reflects local spatial dependence from the perspective of population	Equivalent to the first definition of local Moran’s I; linked to the correlation coefficient; the spatial relationships of elements other than the target geographical element are considered; there are clear boundary values and critical values
		Third local Geary’s C	Reflects local spatial dependence from the perspective of samples	Equivalent to the first definition of local Geary’s C; linked to the generalized Durbin-Watson statistic; the spatial relationships of elements other than the target geographical element are considered; returns to the sample analysis perspective of the global Geary coefficient; there are clear boundary values and critical values


Anselin is a well-known and outstanding scholar in the field of geographical spatial analysis. Because of the far-reaching influence of Anselin’s work, its logical errors have caused confusion in application and interpretation. Science respects logic and facts, not authority; only pseudoscience starts from authoritative judgment. To solve the above problems, this paper proceeds as follows in the mathematical deduction. First, return to the essence of the spatial distance matrix behind the spatial weight matrix, and respect the basic distance axiom. The global spatial weight matrix is obtained by global normalization of the spatial contiguity matrix, and the globally normalized spatial weight matrix is used to replace Anselin’s row-normalized weight matrix. In this way, the connotation of the concept remains the same throughout and the logic is consistent, so that reasoning mistakes are avoided. Second, start from the original idea of Moran’s index and Geary’s coefficient. The normalized local Moran’s index is defined with the size variable standardized by the population standard deviation; the normalized local Geary’s coefficient is defined with the size variable standardized by the sample standard deviation. Third, start from the original intention of Anselin [19]. Anselin gives two sets of local Moran’s indexes and local Geary’s coefficients, but there is an inconsistency between them. By examining the reasoning process, we find that the error stems from an unintentional replacement of concepts. According to the symbol system and simplification principle of this paper, we transform Anselin’s second set of local Moran index and local Geary coefficient formulae. Comparing the two sets of results, we can see the problems and thus understand the similarities and differences between the two sets of formulae (Tables 8 and 9).

Table 9. Comparison between the normalized LISA and the equivalent transformation results of Anselin’s second set of LISA definitions.

Category	Measure	Definition in this paper	Anselin’s definition
Moran’s I	Global Moran’s I	$I = \sum_{i = 1}^{n} \sum_{j = 1}^{n} w_{i j} z_{i} z_{j} = z^{T} W z$	$I = \sum_{i = 1}^{n} \sum_{j = 1}^{n} w_{i j} z_{i} z_{j}$
	Local Moran’s I	$I_{i} = z_{i} \sum_{j = 1}^{n} w_{i j} z_{j}$	$I_{i} = \frac{z_{i}}{V_{i}} \sum_{j = 1}^{n} v_{i j} z_{j}$
	Sum of local Moran’s I	$\sum_{i = 1}^{n} I_{i} = I$	$\sum_{i = 1}^{n} I_{i} \approx n I$
Geary’s C	Global Geary’s C	$C = \frac{1}{2} \sum_{i = 1}^{n} \sum_{j = 1}^{n} w_{i j} {(z_{i}^{} - z_{j}^{})}^{2}$	$C = \frac{1}{2} \sum_{i = 1}^{n} \sum_{j = 1}^{n} w_{i j} {(z_{i}^{} - z_{j}^{})}^{2}$
	Local Geary’s C	$\begin{array}{l} C_{i} = \frac{1}{2} \sum_{j = 1}^{n} w_{i j} {(z_{i}^{} - z_{j}^{})}^{2} \\ = \frac{n - 1}{2 n} \left( \sum_{j = 1}^{n} w_{i j} (z_{i}^{2} + z_{j}^{2}) - 2 I_{i} \right) \end{array}$	$C_{i} = \frac{1}{V_{i}} \sum_{j = 1}^{n} v_{i j} {(z_{i} - z_{j})}^{2}$
	Sum of Local Geary’s C	$\sum_{i = 1}^{n} C_{i} = C = \frac{n - 1}{n} (e^{T} W z^{2} - I)$	$\sum_{i = 1}^{n} C_{i} \approx \frac{2 n^{2}}{n - 1} C$

Note: For comparison, Anselin’s definitions are transformed and re-expressed with new symbols. However, the new expressions are completely equivalent to Anselin’s original expressions.
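The definitions in the left-hand column of Table 9 can be checked numerically. The following is a minimal sketch, assuming a symmetric contiguity matrix with a zero diagonal; the function name `normalized_lisa` and the sample data are hypothetical. Following the text, the local Moran index uses the variable standardized by the population standard deviation, and the local Geary coefficient uses the variable standardized by the sample standard deviation.

```python
import numpy as np

def normalized_lisa(v, x):
    """Local and global Moran/Geary measures with globally normalized weights.

    v : symmetric contiguity matrix with zero diagonal (assumption)
    x : size variable, one value per place
    """
    v = np.asarray(v, dtype=float)
    x = np.asarray(x, dtype=float)
    n = x.size
    w = v / v.sum()                          # global (sum-based) normalization
    z = (x - x.mean()) / x.std(ddof=0)       # population standard deviation
    zs = (x - x.mean()) / x.std(ddof=1)      # sample standard deviation
    local_i = z * (w @ z)                    # I_i = z_i * sum_j w_ij z_j
    diff2 = (zs[:, None] - zs[None, :]) ** 2
    local_c = 0.5 * (w * diff2).sum(axis=1)  # C_i = 1/2 sum_j w_ij (z_i - z_j)^2
    global_i = z @ w @ z                     # I = z^T W z
    # Closed form from Table 9, valid for symmetric W:
    global_c = (n - 1) / n * (np.ones(n) @ w @ z**2 - global_i)
    return local_i, local_c, global_i, global_c

# Hypothetical data: four places on a line with made-up sizes.
V = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])
x = [2.0, 3.0, 5.0, 9.0]

local_i, local_c, global_i, global_c = normalized_lisa(V, x)
print(np.isclose(local_i.sum(), global_i), np.isclose(local_c.sum(), global_c))
```

If the definitions are implemented as intended, both comparisons hold: the local values add up to the corresponding global value without any extra scaling factor.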

Finally, it is appropriate to briefly discuss the definition of the spatial weight matrix. Spatial autocorrelation analysis depends on the spatial contiguity matrix, which has multiple definitions. In fact, the definition of spatial contiguity involves different spatial effects. Spatial effects of geographical processes fall into two categories: action at a distance and local action [37]. Local action can be expressed mathematically with a step function and numerically with a nominal variable. In spatial autocorrelation analysis, the spatial contiguity matrix based on local action is mainly applicable to relationships between regions. The contiguity of regions can be defined in three ways: Rook’s contiguity, Bishop’s contiguity, and Queen’s contiguity [38]. Rook’s contiguity plus Bishop’s contiguity yields Queen’s contiguity. Rook’s contiguity corresponds to the von Neumann neighborhood, while Queen’s contiguity corresponds to the Moore neighborhood [39]. Action at a distance can be reflected by a certain distance measure, including Euclidean distance, travel time, transportation mileage, and so on. Converting distances into a spatial contiguity matrix requires a spatial contiguity function. Common spatial contiguity functions include the absolute step function, the relative step function, the exponential function, and the inverse distance function (a type of hyperbolic function) [6, 12, 27]. A distance-based spatial contiguity matrix is suitable for networks of locations such as urban systems. In this case, with a step function, spatial contiguity is represented by a nominal variable (a dummy variable in discrete format); with the other functions, it is represented by a metric variable (a continuous variable).

Although the function expressions differ, the logic behind them is mutually consistent. Mathematics is the pinnacle of logic. In mathematics, the most basic function is the exponential function, and various forms of simple functions can be reduced to it. The step function is an extreme form of the exponential function, and a moving average on the step function can yield an inverse distance function [40]. So, using different functions to define the spatial contiguity matrix will definitely affect the calculation results, but it has no impact on the results of mathematical reasoning or on the logical relationships behind them. Row normalization of the weight matrix affects the results of mathematical reasoning because it changes the logic behind the spatial weight matrix, and this logic is regulated by the distance axiom. Scientific research typically involves three worlds: the real world, the mathematical world, and the computational world [41]. The process of mathematical transformation and derivation belongs to the mathematical world, while the selection of the form of the spatial weight matrix belongs to the computational world. The key is to choose the appropriate definition of the spatial contiguity matrix for different geographic systems according to the situation [27]. One obvious drawback of this study is the lack of empirical analysis based on different types of spatial weight matrices. Therefore, the discussion of how the type and structure of the spatial contiguity matrix influence the theoretical modelling and computational results of spatial autocorrelation is not yet supported by empirical evidence.
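As an illustration of how a distance matrix might be converted into a contiguity matrix, the sketch below implements three of the function families named above: a step function, an exponential function, and an inverse distance function. The function name, the `kind` labels, and the scale parameter `r0` are assumptions made for this example and do not come from the paper; the relative step function is omitted because its exact form is not given here. The distance matrix is made up.

```python
import numpy as np

def contiguity_from_distance(d, kind, r0=1.0):
    """Convert a distance matrix into a contiguity matrix (zero diagonal).

    kind : "step"        -> 1 if distance <= r0, else 0 (nominal variable)
           "exponential" -> exp(-distance / r0)         (metric variable)
           "inverse"     -> 1 / distance                (metric variable)
    Off-diagonal distances are assumed to be strictly positive.
    """
    d = np.asarray(d, dtype=float)
    off = ~np.eye(len(d), dtype=bool)   # mask selecting off-diagonal entries
    v = np.zeros_like(d)
    if kind == "step":
        v[off] = (d[off] <= r0).astype(float)
    elif kind == "exponential":
        v[off] = np.exp(-d[off] / r0)
    elif kind == "inverse":
        v[off] = 1.0 / d[off]
    else:
        raise ValueError(f"unknown contiguity function: {kind}")
    return v

# Hypothetical distances between three places.
D = np.array([[0.0, 1.0, 2.0],
              [1.0, 0.0, 3.0],
              [2.0, 3.0, 0.0]])

v_step = contiguity_from_distance(D, "step", r0=2.0)
v_exp = contiguity_from_distance(D, "exponential", r0=2.0)
v_inv = contiguity_from_distance(D, "inverse")
print(v_step)
```

Whichever function is chosen, the resulting matrix can then be normalized by its grand total to obtain a globally normalized weight matrix; the choice changes the numerical results but not the summation logic.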

5 Conclusions

The global spatial autocorrelation coefficients reflect the sum of the correlations between any two geographical elements in a region, while the local spatial autocorrelation indexes reflect the sum of the correlations between one geographical element and all other geographical elements. The sum of the parts is proportional to the whole. The first set of local Moran indexes and Geary coefficients defined by Anselin [19] is effective and consistent with the idea of the global Moran index and Geary coefficient. However, the second set of local Moran indexes and local Geary coefficients defined by him is not equivalent to the first set. The non-normalized spatial weight matrix is isomorphic to the sum-based normalized spatial weight matrix, but not to the row-based normalized spatial weight matrix. Results derived from the non-normalized spatial weight matrix therefore cannot be directly applied to mathematical relations based on the row-normalized spatial weight matrix. The key issue is that Anselin [19] directly applied the results derived from the non-normalized spatial weight matrix to the relationship formula based on the row-normalized spatial weight matrix. This paper is devoted to correcting the unintentional mistakes in his reasoning process and gives a third set of definitions of the local Moran index and local Geary coefficient in canonical form. The newly defined local Moran index and local Geary coefficient are simple and concise. The improved expressions are consistent with the original intention of Anselin [19] and with the statistical essence of the global Moran index and global Geary coefficient.

Local spatial autocorrelation analysis is a methodology developed on the basis of global spatial autocorrelation analysis, and the progress of science has no end. The main points of this paper are summarized as follows. First, the LISA defined in the literature is of great significance for the analysis of local spatial autocorrelation, but it also has some faults. The first set of LISA is based on centralized variables and a non-normalized spatial contiguity matrix, and it lacks clear boundary values and critical values. The second set of LISA is based on standardized variables and a row-normalized spatial weight matrix, which ignores the global relationship behind the local analysis. One result is that the two sets of indexes are not equivalent to each other. In addition, the population standard deviation is adopted in defining the second set of local Geary coefficients, which violates the original intention of the Geary coefficient. All the indexes lack clear boundary values and critical values, and they are decoupled from the correlation coefficient. One consequence is that the analysis process is complex; another is that the conclusions drawn from the two sets of indexes are often inconsistent with each other. Second, the LISA expressions are reconstructed using the sum-normalized spatial weight matrix and size variables standardized by z-score, which eliminates the defects of Anselin’s LISA definitions and yields canonical spatial autocorrelation measurements. The sum-based, globally normalized spatial weight matrix replaces the row-based, locally normalized spatial weight matrix. The population standard deviation is used to standardize the variables when defining the local Moran indexes, and the sample standard deviation is used when defining the local Geary coefficients. In this way, the problems with the LISA of Anselin [19] can be solved effectively, and the results are simpler and more concise. The results given in this paper are equivalent to those given by Anselin’s first set of formulae, i.e., the first set of local Moran indexes and local Geary coefficients, but they are not linearly proportional to the results of the second set of formulae, namely the second set of local Moran indexes and local Geary coefficients.
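The summation property behind these conclusions can be written out in a few lines. The following is a sketch in the notation of Table 9; the symbol $V_{0}$ for the grand total of the contiguity matrix is introduced here for illustration and is an assumption, not the paper’s own notation. With globally normalized weights, the local Moran indexes add up exactly to the global index:

$$
w_{i j} = \frac{v_{i j}}{V_{0}}, \qquad V_{0} = \sum_{i = 1}^{n} \sum_{j = 1}^{n} v_{i j}, \qquad \sum_{i = 1}^{n} I_{i} = \sum_{i = 1}^{n} z_{i} \sum_{j = 1}^{n} w_{i j} z_{j} = z^{T} W z = I .
$$

With row-normalized weights, each row is divided by its own sum $V_{i} = \sum_{j = 1}^{n} v_{i j}$, so that

$$
\sum_{i = 1}^{n} \frac{z_{i}}{V_{i}} \sum_{j = 1}^{n} v_{i j} z_{j} = \sum_{i = 1}^{n} \frac{V_{0}}{V_{i}} \, I_{i} .
$$

The factor $V_{0} / V_{i}$ changes from row to row unless all row sums are equal, so it cannot in general be taken outside the summation, and the sum is then not a constant multiple of $I$. In the special case of equal row sums, $V_{0} / V_{i} = n$ and the sum reduces exactly to $n I$.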

Supporting information

S1 File. Anselin’s derivation and expressions for LISA.

This is a microcosm of Anselin’s paper on LISA. The key parts of Anselin’s mathematical reasoning are extracted, and the main errors in the reasoning process are revealed. This file uses Anselin’s original symbol system. Through this file, readers can more easily grasp the essence of the problem.

(DOCX)

pone.0303456.s001.docx (106.4KB, docx)

S2 File. Value transformation methods and formulae.

This file shows common concepts and methods of value transformation and the corresponding formulae for variable standardization. It clarifies some confusing and inappropriate expressions regarding variable standardization in the literature.

(DOCX)

pone.0303456.s002.docx (37.6KB, docx)

S1 Dataset. Spatial data sets and calculation results of local spatial autocorrelation indexes for 2000.

This file includes the dataset of spatial distances and city population in 2000, global Moran’s indexes and Geary’s coefficients, three sets of local Moran’s index, and three sets of local Geary’s coefficients. The original data and calculation process are displayed for readers.

(XLSX)

pone.0303456.s003.xlsx (89.8KB, xlsx)

S2 Dataset. Spatial data sets and calculation results of local spatial autocorrelation indexes for 2010.

This file includes the dataset of spatial distances and city population in 2010, global Moran’s indexes and Geary’s coefficients, three sets of local Moran’s index, and three sets of local Geary’s coefficients. All the results are tabulated for comparison and references.

(XLSX)

pone.0303456.s004.xlsx (86.6KB, xlsx)

Acknowledgments

My student, Dr. Yuqing Long, extracted the spatial distance matrix data from the Beijing-Tianjin-Hebei urban network map for me, and I would like to express my gratitude. I would like to thank the anonymous reviewer and Dr. Yuxia Wang, whose interesting and constructive comments were very helpful in improving the quality of this paper. The academic editor, Dr. Yuxia Wang, put in tremendous effort to invite reviewers for this paper, and I am particularly grateful for it.

Data Availability

The data underlying the results presented in the study are available from the supporting information files.

Funding Statement

The project is funded by the National Natural Science Foundation of China (42171192). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

1. Hartshorne R. Perspective on the Nature of Geography. Chicago: Rand McNally & Company; 1959.
2. Hu ZL, Chen YG, Liu T. Three laws of the changes in economic geography. Economic Geography. 2018; 38(10): 1–4 [In Chinese].
3. Martin GJ. All Possible Worlds: A History of Geographical Ideas (4th Revised Edition). New York, NY: Oxford University Press; 2005.
4. Schaefer FK. Exceptionalism in geography: a methodological examination. Annals of the Association of American Geographers. 1953; 43: 226–249.
5. Griffith DA. Spatial Autocorrelation and Spatial Filtering: Gaining Understanding Through Theory and Scientific Visualization. Berlin: Springer; 2003.
6. Haggett P, Cliff AD, Frey A. Locational Analysis in Human Geography. London: Edward Arnold Ltd.; 1977.
7. Geary RC. The contiguity ratio and statistical mapping. The Incorporated Statistician. 1954; 5: 115–145.
8. Moran PAP. The interpretation of statistical maps. Journal of the Royal Statistical Society, Series B. 1948; 37(2): 243–251.
9. Moran PAP. Notes on continuous stochastic phenomena. Biometrika. 1950; 37: 17–33.
10. Cliff AD, Ord JK. Spatial Autocorrelation. London: Pion Limited; 1973.
11. Cliff AD, Ord JK. Spatial Processes: Models and Applications. London: Pion Limited; 1981.
12. Odland J. Spatial Autocorrelation. London: SAGE Publications; 1988.
13. Anselin L. The Moran scatterplot as an ESDA tool to assess local instability in spatial association. In: Fischer M, Scholten HJ, Unwin D (eds.). Spatial Analytical Perspectives on GIS. London: Taylor & Francis; 1996. pp. 111–125.
14. Tobler W. A computer movie simulating urban growth in the Detroit region. Economic Geography. 1970; 46(2): 234–240.
15. Tobler W. On the first law of geography: A reply. Annals of the Association of American Geographers. 2004; 94(2): 304–310.
16. Fotheringham AS. Trends in quantitative methods I: Stressing the local. Progress in Human Geography. 1997; 21: 88–96.
17. Fotheringham AS. Trends in quantitative methods II: Stressing the computational. Progress in Human Geography. 1998; 22: 283–292.
18. Fotheringham AS. Trends in quantitative methods III: Stressing the visual. Progress in Human Geography. 1999; 23(4): 597–606.
19. Anselin L. Local indicators of spatial association – LISA. Geographical Analysis. 1995; 27(2): 93–115.
20. Getis A, Aldstadt J. Constructing the spatial weights matrix using a local statistic. Geographical Analysis. 2004; 36(2): 90–104.
21. Getis A, Ord JK. An analysis of spatial association by use of distance statistic. Geographical Analysis. 1992; 24(3): 189–206.
22. Ord JK, Getis A. Local spatial autocorrelation statistics: Distributional issues and an application. Geographical Analysis. 1995; 27(4): 286–306.
23. Goodchild MF. GIScience, geography, form, and process. Annals of the Association of American Geographers. 2004; 94(4): 709–714.
24. de Jong P, Sprenger C, van Veen F. On extreme values of Moran’s I and Geary’s C. Geographical Analysis. 1984; 16(1): 985–999.
25. Tiefelsdorf M, Boots B. The exact distribution of Moran’s I. Environment and Planning A. 1995; 27(6): 985–999.
26. Xu F. Improving spatial autocorrelation statistics based on Moran’s index and spectral graph theory. Urban Development Studies. 2021; 28(12): 94–103 [In Chinese].
27. Chen YG. On the four types of weight functions for spatial contiguity matrix. Letters in Spatial and Resource Sciences. 2012; 5(2): 65–72.
28. Getis A. Spatial weights matrices. Geographical Analysis. 2009; 41(4): 404–410.
29. Chen YG. New approaches for calculating Moran’s index of spatial autocorrelation. PLoS ONE. 2013; 8(7): e68336. doi: 10.1371/journal.pone.0068336
30. Chen YG. Spatial autocorrelation approaches to testing residuals from least squares regression. PLoS ONE. 2016; 11(1): e0146865. doi: 10.1371/journal.pone.0146865
31. Magnello E, van Loon B. Introducing Statistics: A Graphic Guide. London: Icon Books; 2009.
32. Chen YG. Spatial autocorrelation equation based on Moran’s index. Scientific Reports. 2023; 13: 19296. doi: 10.1038/s41598-023-45947-x
33. Wheelan C. Naked Statistics: Stripping the Dread from the Data. New York and London: W. W. Norton & Company; 2013.
34. Taylor PJ. Quantitative Methods in Geography. Prospect Heights, Illinois: Waveland Press; 1983.
35. Louf R, Barthelemy M. Scaling: lost in the smog. Environment and Planning B: Planning and Design. 2014; 41: 767–769.
36. Long YQ, Chen YG. Multi-scaling allometric analysis of the Beijing-Tianjin-Hebei urban system based on nighttime light data. Progress in Geography. 2019; 38(1): 88–100 [In Chinese].
37. Chen YG, Li YJ, Feng S, Man XM, Long YQ. Gravitational scaling analysis on spatial diffusion of COVID-19 in Hubei province, China. PLoS ONE. 2021; 16(6): e0252889. doi: 10.1371/journal.pone.0252889
38. Widip CA, Utomo WH, Yulianto SJP. Identification of spatial patterns of food insecurity regions using Moran’s I (Case study: Boyolali regency). International Journal of Computer Applications. 2013; 72(2): 54–62.
39. Batty M, Couclelis H, Eichen M. Urban systems as cellular automata. Environment and Planning B: Planning and Design. 1997; 24: 159–164.
40. Chen YG. Power-law distributions based on exponential distributions: Latent scaling, spurious Zipf’s law, and fractal rabbits. Fractals. 2015; 23(2): 1550009.
41. Casti JL. Would-Be Worlds: How Simulation Is Changing the Frontiers of Science. New York: John Wiley and Sons; 1996.

PLoS One. doi: 10.1371/journal.pone.0303456.r001

Decision Letter 0

Yuxia Wang

21 Feb 2024

PONE-D-23-35394

Reconstruction and Normalization of LISA for Spatial Analysis

PLOS ONE

Dear Dr. Chen,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Dear authors, I invited around 30 reviewers but received only one set of comments. To ensure a timely review, I served as another reviewer. Please see the suggestions and comments below.

Please submit your revised manuscript by Apr 06 2024 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.
A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.
An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Yuxia Wang

Academic Editor

PLOS ONE

Journal requirements:

1. When submitting your revision, we need you to address these additional requirements.

Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf.

2. Note from Emily Chenette, Editor in Chief of PLOS ONE, and Iain Hrynaszkiewicz, Director of Open Research Solutions at PLOS: Did you know that depositing data in a repository is associated with up to a 25% citation advantage (https://doi.org/10.1371/journal.pone.0230416)? If you’ve not already done so, consider depositing your raw data in a repository to ensure your work is read, appreciated and cited by the largest possible audience. You’ll also earn an Accessible Data icon on your published paper if you deposit your data in any participating repository (https://plos.org/open-science/open-data/#accessible-data).

3. We note that the grant information you provided in the ‘Funding Information’ and ‘Financial Disclosure’ sections do not match.

When you resubmit, please ensure that you provide the correct grant numbers for the awards you received for your study in the ‘Funding Information’ section.

4. Thank you for stating the following in the Acknowledgments Section of your manuscript:

[This research was sponsored by the National Natural Science Foundation of China (Grant No. 42171192). The support is gratefully acknowledged.]

We note that you have provided funding information that is not currently declared in your Funding Statement. However, funding information should not appear in the Acknowledgments section or other areas of your manuscript. We will only publish funding information present in the Funding Statement section of the online submission form.

Please remove any funding-related text from the manuscript and let us know how you would like to update your Funding Statement. Currently, your Funding Statement reads as follows:

[The author(s) received no specific funding for this work.]

Please include your amended statements within your cover letter; we will change the online submission form on your behalf.

5. We note that Figure 1 in your submission contain [map/satellite] images which may be copyrighted. All PLOS content is published under the Creative Commons Attribution License (CC BY 4.0), which means that the manuscript, images, and Supporting Information files will be freely available online, and any third party is permitted to access, download, copy, distribute, and use these materials in any way, even commercially, with proper attribution. For these reasons, we cannot publish previously copyrighted maps or satellite images created using proprietary data, such as Google software (Google Maps, Street View, and Earth). For more information, see our copyright guidelines: http://journals.plos.org/plosone/s/licenses-and-copyright.

We require you to either (1) present written permission from the copyright holder to publish these figures specifically under the CC BY 4.0 license, or (2) remove the figures from your submission:

a. You may seek permission from the original copyright holder of Figure 1 to publish the content specifically under the CC BY 4.0 license.

We recommend that you contact the original copyright holder with the Content Permission Form (http://journals.plos.org/plosone/s/file?id=7c09/content-permission-form.pdf) and the following text:

“I request permission for the open-access journal PLOS ONE to publish XXX under the Creative Commons Attribution License (CCAL) CC BY 4.0 (http://creativecommons.org/licenses/by/4.0/). Please be aware that this license allows unrestricted use and distribution, even commercially, by third parties. Please reply and provide explicit written permission to publish XXX under a CC BY license and complete the attached form.”

Please upload the completed Content Permission Form or other proof of granted permissions as an "Other" file with your submission.

In the figure caption of the copyrighted figure, please include the following text: “Reprinted from [ref] under a CC BY license, with permission from [name of publisher], original copyright [original copyright year].”

b. If you are unable to obtain permission from the original copyright holder to publish these figures under the CC BY 4.0 license or if the copyright holder’s requirements are incompatible with the CC BY 4.0 license, please either i) remove the figure or ii) supply a replacement figure that complies with the CC BY 4.0 license. Please check copyright information on all replacement figures and update the figure caption with source information. If applicable, please specify in the figure caption text when a figure is similar but not identical to the original image and is therefore for illustrative purposes only.

The following resources for replacing copyrighted map figures may be helpful:

USGS National Map Viewer (public domain): http://viewer.nationalmap.gov/viewer/

The Gateway to Astronaut Photography of Earth (public domain): http://eol.jsc.nasa.gov/sseop/clickmap/

Maps at the CIA (public domain): https://www.cia.gov/library/publications/the-world-factbook/index.html and https://www.cia.gov/library/publications/cia-maps-publications/index.html

NASA Earth Observatory (public domain): http://earthobservatory.nasa.gov/

Landsat: http://landsat.visibleearth.nasa.gov/

USGS EROS (Earth Resources Observatory and Science (EROS) Center) (public domain): http://eros.usgs.gov/#

Natural Earth (public domain): http://www.naturalearthdata.com/

6. Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.

Additional Editor Comments:

The authors conduct a series of rigorous mathematical reasoning of LISA showing that using row-normalized spatial weight matrix would violate the second basic requirement for LISA. As stated by the authors, this is not substantial innovation, but it is helpful in figuring the logic of local spatial autocorrelation statistics. I have some minor comments.

1. Page 4. The spatial contiguity matrix V is not explained in detail. There are many definitions of spatial contiguity matrix, and would Rook, Queen, or distance-based matrix have any difference on the calculation of LISA?

2. Page 4. Is it necessary to stress that i≠j in Equation (1)?

3. Page 4 to Page 5. I may have misunderstood something. In the z-score normalization, the variable is only divided by σ; what is the meaning of σ? It seems that σ is calculated from x. But if we treat $x_{i} - \bar{x}$ as a whole, the normalization should be based on the σ of $x_{i} - \bar{x}$. Compared with Equation (1), dividing only by σ could not be called z-score normalization.

4. Page 6 Table 1. What is the benefit of using the normalized weight matrix instead of the original one?

5. Page 2 In the first paragraph of introduction, “Gravity models, spatial interaction models, and spatial autocorrelation analysis are the main approaches…”, parallel relationship might not be appropriate for gravity model and spatial interaction model since the former is one type of the latter.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The article is an interesting work. It provides clarifications on the local indicators of spatial association (LISA) and the local Geary indicator, which are widely used in the literature. The article aims to reconstruct the calculation formulas of the local Moran indices and Geary coefficients through mathematics, presenting corrections or modifications to these indicators. Finally, it presents an application to real data.

**********

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

Attachment

Submitted filename: PONE-D-23-35394 End.pdf

pone.0303456.s005.pdf (1.2MB, pdf)

PLoS One. 2024 May 22;19(5):e0303456. doi: 10.1371/journal.pone.0303456.r002

Author response to Decision Letter 0

11 Apr 2024

Please see the attached file entitled "Response to Reviewers"

Attachment

Submitted filename: Response to Academic Editor and Reviewer 2024-03-15.docx

pone.0303456.s006.docx (112.2KB, docx)

PLoS One. doi: 10.1371/journal.pone.0303456.r003

Decision Letter 1

Yuxia Wang

25 Apr 2024

Reconstruction and Normalization of LISA for Spatial Analysis

PONE-D-23-35394R1

Dear Dr. Chen,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice will be generated when your article is formally accepted. Please note, if your institution has a publishing partnership with PLOS and your article meets the relevant criteria, all or part of your publication costs will be covered. Please make sure your user information is up-to-date by logging into Editorial Manager® and clicking the 'Update My Information' link at the top of the page. If you have any questions relating to publication charges, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Yuxia Wang

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

PLoS One. doi: 10.1371/journal.pone.0303456.r004

Acceptance letter

Yuxia Wang

10 May 2024

PONE-D-23-35394R1

PLOS ONE

Dear Dr. Chen,

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now being handed over to our production team.

At this stage, our production department will prepare your paper for publication. This includes ensuring the following:

* All references, tables, and figures are properly cited

* All relevant supporting information is included in the manuscript submission

* There are no issues that prevent the paper from being properly typeset

If revisions are needed, the production department will contact you directly to resolve them. If no revisions are needed, you will receive an email when the publication date has been set. At this time, we do not offer pre-publication proofs to authors during production of the accepted work. Please keep in mind that we are working through a large volume of accepted articles, so please give us a few weeks to review your paper and let you know the next and final steps.

Lastly, if your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

If we can help with anything else, please email us at customercare@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Yuxia Wang

Academic Editor

PLOS ONE

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

S1 File. Anselin’s derivation and expressions for LISA.

(DOCX)

pone.0303456.s001.docx (106.4KB, docx)

S2 File. Value transformation methods and formulae.

(DOCX)

pone.0303456.s002.docx (37.6KB, docx)

S1 Dataset. Spatial data sets and calculation results of local spatial autocorrelation indexes for 2000.

(XLSX)

pone.0303456.s003.xlsx (89.8KB, xlsx)

S2 Dataset. Spatial data sets and calculation results of local spatial autocorrelation indexes for 2010.

(XLSX)

pone.0303456.s004.xlsx (86.6KB, xlsx)

Data Availability Statement

The data underlying the results presented in the study are available from the supporting information files.

1. Hartshorne R. Perspective on the Nature of Geography. Chicago: Rand McNally & Company; 1959.

2. Hu ZL, Chen YG, Liu T. Three laws of the changes in economic geography. Economic Geography. 2018; 38(10): 1–4 [In Chinese].

3. Martin GJ. All Possible Worlds: A History of Geographical Ideas (4th Revised Edition). New York, NY: Oxford University Press; 2005.

4. Schaefer FK. Exceptionalism in geography: a methodological examination. Annals of the Association of American Geographers. 1953; 43: 226–249.

5. Griffith DA. Spatial Autocorrelation and Spatial Filtering: Gaining Understanding Through Theory and Scientific Visualization. Berlin: Springer; 2003.

6. Haggett P, Cliff AD, Frey A. Locational Analysis in Human Geography. London: Edward Arnold Ltd.; 1977.

7. Geary RC. The contiguity ratio and statistical mapping. The Incorporated Statistician. 1954; 5: 115–145.

8. Moran PAP. The interpretation of statistical maps. Journal of the Royal Statistical Society, Series B. 1948; 10(2): 243–251.

9. Moran PAP. Notes on continuous stochastic phenomena. Biometrika. 1950; 37: 17–33.

10. Cliff AD, Ord JK. Spatial Autocorrelation. London: Pion Limited; 1973.

11. Cliff AD, Ord JK. Spatial Processes: Models and Applications. London: Pion Limited; 1981.

12. Odland J. Spatial Autocorrelation. London: SAGE Publications; 1988.

13. Anselin L. The Moran scatterplot as an ESDA tool to assess local instability in spatial association. In: Fischer M, Scholten HJ, Unwin D (eds.). Spatial Analytical Perspectives on GIS. London: Taylor & Francis; 1996. pp. 111–125.

14. Tobler W. A computer movie simulating urban growth in the Detroit region. Economic Geography. 1970; 46(2): 234–240.

15. Tobler W. On the first law of geography: A reply. Annals of the Association of American Geographers. 2004; 94(2): 304–310.

16. Fotheringham AS. Trends in quantitative methods I: Stressing the local. Progress in Human Geography. 1997; 21: 88–96.

17. Fotheringham AS. Trends in quantitative methods II: Stressing the computational. Progress in Human Geography. 1998; 22: 283–292.

18. Fotheringham AS. Trends in quantitative methods III: Stressing the visual. Progress in Human Geography. 1999; 23(4): 597–606.

19. Anselin L. Local indicators of spatial association—LISA. Geographical Analysis. 1995; 27(2): 93–115.

20. Getis A, Aldstadt J. Constructing the spatial weights matrix using a local statistic. Geographical Analysis. 2004; 36(2): 90–104.

21. Getis A, Ord JK. The analysis of spatial association by use of distance statistics. Geographical Analysis. 1992; 24(3): 189–206.

22. Ord JK, Getis A. Local spatial autocorrelation statistics: Distributional issues and an application. Geographical Analysis. 1995; 27(4): 286–306.

23. Goodchild MF. GIScience, geography, form, and process. Annals of the Association of American Geographers. 2004; 94(4): 709–714.

24. de Jong P, Sprenger C, van Veen F. On extreme values of Moran’s I and Geary’s C. Geographical Analysis. 1984; 16(1): 17–24.

25. Tiefelsdorf M, Boots B. The exact distribution of Moran’s I. Environment and Planning A. 1995; 27(6): 985–999.

26. Xu F. Improving spatial autocorrelation statistics based on Moran’s index and spectral graph theory. Urban Development Studies. 2021; 28(12): 94–103 [In Chinese].

27. Chen YG. On the four types of weight functions for spatial contiguity matrix. Letters in Spatial and Resource Sciences. 2012; 5(2): 65–72.

28. Getis A. Spatial weights matrices. Geographical Analysis. 2009; 41(4): 404–410.

29. Chen YG. New approaches for calculating Moran’s index of spatial autocorrelation. PLoS ONE. 2013; 8(7): e68336. doi: 10.1371/journal.pone.0068336

30. Chen YG. Spatial autocorrelation approaches to testing residuals from least squares regression. PLoS ONE. 2016; 11(1): e0146865. doi: 10.1371/journal.pone.0146865

31. Magnello E, van Loon B. Introducing Statistics: A Graphic Guide. London: Icon Books; 2009.

32. Chen YG. Spatial autocorrelation equation based on Moran’s index. Scientific Reports. 2023; 13: 19296. doi: 10.1038/s41598-023-45947-x

33. Wheelan C. Naked Statistics: Stripping the Dread from the Data. New York and London: W. W. Norton & Company; 2013.

34. Taylor PJ. Quantitative Methods in Geography. Prospect Heights, Illinois: Waveland Press; 1983.

35. Louf R, Barthelemy M. Scaling: lost in the smog. Environment and Planning B: Planning and Design. 2014; 41: 767–769.

36. Long YQ, Chen YG. Multi-scaling allometric analysis of the Beijing-Tianjin-Hebei urban system based on nighttime light data. Progress in Geography. 2019; 38(1): 88–100 [In Chinese].

37. Chen YG, Li YJ, Feng S, Man XM, Long YQ. Gravitational scaling analysis on spatial diffusion of COVID-19 in Hubei province, China. PLoS ONE. 2021; 16(6): e0252889. doi: 10.1371/journal.pone.0252889

38. Widip CA, Utomo WH, Yulianto SJP. Identification of spatial patterns of food insecurity regions using Moran’s I (Case study: Boyolali regency). International Journal of Computer Applications. 2013; 72(2): 54–62.

39. Batty M, Couclelis H, Eichen M. Urban systems as cellular automata. Environment and Planning B: Planning and Design. 1997; 24: 159–164.

40. Chen YG. Power-law distributions based on exponential distributions: Latent scaling, spurious Zipf’s law, and fractal rabbits. Fractals. 2015; 23(2): 1550009.

41. Casti JL. Would-Be Worlds: How Simulation Is Changing the Frontiers of Science. New York: John Wiley and Sons; 1996.
