PLOS One. 2024 May 22;19(5):e0303456. doi: 10.1371/journal.pone.0303456

Reconstruction and normalization of LISA for spatial analysis

Yanguang Chen 1,*
Editor: Yuxia Wang2
PMCID: PMC11111027  PMID: 38776327

Abstract

The local indicators of spatial association (LISA) are important measures for spatial autocorrelation analysis. However, there is an inadvertent fault in the mathematical derivation of LISA in the literature, so that the local Moran and Geary indicators do not satisfy the second basic requirement for LISA: the sum of the local indicators is proportional to a global indicator. This paper aims at reconstructing the calculation formulae of the local Moran indexes and Geary coefficients through mathematical derivation and empirical evidence. Two sets of LISAs were clarified by new mathematical reasoning. One set of LISAs is based on non-normalized weights and non-centralized variable (MI1 and GC1), and the other set is based on row-normalized weights and standardized variable (MI2 and GC2). The results show that the first set of LISAs satisfies the above-mentioned second requirement, but the second set does not. Then, a third set of LISAs was proposed, which can be treated as canonical forms (MI3 and GC3). This set of LISAs satisfies the second requirement. The observational data of city population and traffic mileage in the Beijing-Tianjin-Hebei region of China were employed to verify the theoretical results. This study helps to clarify the misunderstandings about LISAs in the field of geospatial analysis.

1 Introduction

Geography has two core concepts on location effect: difference and dependence. The former is related to a classical topic of geography, while the latter is related to spatial correlation analysis. The concept of spatial difference is also termed regional difference, which came from areal differentiation [1–3]. The traditional concept of difference seems to be in contradiction with the pursuit of general laws, so geography embarked on the road of "exceptionalism" [4]. After the quantitative revolution (1953–1976), geography began to attach importance to spatial organization and correlation, which indicates spatial dependence. Spatial interaction models and spatial autocorrelation analysis are the main approaches to researching spatial correlation processes [5, 6]. Spatial autocorrelation is originally a concept of biological statistics, which is mainly used to evaluate whether spatial sampling results meet traditional statistical requirements [7–9]. When geographers introduced spatial autocorrelation measures into geospatial analysis, they found that there are few spatially uncorrelated phenomena. In this context, the spatial autocorrelation analysis method was developed [10–12]. The early spatial autocorrelation analysis was only at the global level, rarely involving the local level, so it provided limited geospatial information. In other words, the initial spatial autocorrelation analysis focused on spatial dependence rather than spatial difference. After the theoretical revolution in the later period of the quantitative revolution was frustrated, the traditional regional school of thought in geography quietly returned, and the concept of regional difference was again valued by geographers under the new expression of spatial heterogeneity [13]. Tobler proposed the first law of geography based on spatial dependence [14], and Harvey proposed that spatial heterogeneity be the second law of geography [15]. The study of spatial heterogeneity naturally involves spatial locality. According to Fotheringham [16–18], there are three trends in the development of quantitative geography: localization, computation and visualization. In this sense, local spatial autocorrelation analysis came into being [13, 19–22]. Therefore, spatial difference (heterogeneity) and spatial correlation (dependency) have reached the same goal through different routes [13, 23].

Local spatial autocorrelation analysis is developed on the basis of global spatial autocorrelation analysis. The Local Indicators of Spatial Association (LISA) proposed by Anselin [19] play an important role in the local correlation analysis of geographical research. LISA include local Moran indexes and local Geary coefficients. These spatial statistics, together with the G index proposed by Getis and Ord [21] and the Moran scatterplot proposed by Anselin [13], have become systematic tools for local autocorrelation analysis. However, even the wisest are not always free from error. Anselin’s outstanding paper contains some important issues that need to be addressed. The main problems are as follows. First, there is an unintentional mistake in the mathematical reasoning, resulting from a skipped step in a mathematical transformation. This mistake leads readers to misunderstand the relationship between the globally normalized spatial weight matrix and the row-normalized spatial weight matrix. Second, the row-normalized spatial weight matrix violates the distance axiom. A spatial weight matrix is based on a distance matrix or generalized distance matrix, which must conform to the distance axiom. Otherwise, the calculated global or local Moran’s index may be abnormal. Third, the basic difference between Moran’s index and Geary’s coefficient was omitted. Moran’s index is based on a spatial population, while Geary’s coefficient is based on a spatial sample. Different definitions lead to different application directions. However, in the definitions of LISA, the local Geary’s coefficient is based on a spatial population rather than a spatial sample. This is not consistent with the original aim of defining Geary’s coefficient.

The above issues cause a series of consequences. First, the two sets of LISA values are not equivalent to each other. For example, the ratios of the LISA values based on the non-normalized spatial weight matrix to the LISA values based on the normalized spatial weight matrix are not constants. This is a serious logical problem. As we know, if two measures are equivalent to one another, the ratio of the two measures is a constant. For example, the ratio of Student’s t statistic to Pearson’s part correlation coefficient is a constant, which equals the square root of the ratio of residuals mean square deviation to total sum of squares. Second, the calculated values of Moran’s index and Geary’s coefficient sometimes exceed reasonable upper and lower limits. Moran’s index bears at least two sets of boundary values. One is the absolute boundary values, that is, -1 and 1, which depend on the mathematical structure of Moran’s index formula and can be proved by the conditional extremum principle of quadratic forms. The other is the relative boundary values, which are determined by the maximum and minimum eigenvalues of the normalized spatial weight matrix [24–26]. Exceeding the boundary values of spatial statistics is another logical problem. One of the key reasons is that the symmetric spatial contiguity matrix is replaced by the asymmetric row-normalized spatial weight matrix in the process of mathematical deduction. What is more, Anselin’s LISA lack clear boundary values and critical values. Spatial statistics are measures that may be used to describe or to infer. Whatever the goal, a good measure should have a clear critical value or boundary value. For example, the boundary values of the Pearson correlation coefficient are -1 and 1, and the critical value is 0. The purpose of this paper is to develop the spatial measures based on LISA. The remaining parts are organized as follows. In Section 2, Anselin’s mathematical reasoning process is sorted out and his unintentional mistakes are corrected. Based on the mathematical derivation, the local Moran index and local Geary coefficient are normalized. In addition, the strict mathematical relationship between Moran’s indexes and Geary’s coefficients is derived. In Section 3, the observational data of the system of cities in the Beijing-Tianjin-Hebei region of China are employed to test the improved results. In Sections 4 and 5, the related questions are discussed, and finally, the discussion is concluded by summarizing the main points of this study.

2 Theoretical results

2.1 Local spatial autocorrelation measurements

2.1.1 The first formula of local Moran index

One of the bases of spatial analysis is the spatial proximity matrix, which can be measured by a spatial distance matrix. A spatial distance matrix or spatial proximity matrix can be transformed into a spatial contiguity matrix by means of a spatial weight function such as a negative power law or a step function [27, 28]. A spatial contiguity matrix can be treated as a non-normalized spatial weight matrix. Suppose that there are n elements in a geographical region, and the size of the ith element is measured by xi (i = 1, 2,…,n). The size variable x is not standardized and the spatial contiguity matrix V = [vij] is not transformed into the globally normalized spatial weight matrix W = [wij]. Note that the so-called global normalization refers to the normalization of a matrix or vector by the sum of its elements. So, global normalization can also be termed sum-normalization or sum-based normalization. Correspondingly, row-normalization is a type of local normalization which can also be called row-based normalization. Using the symbol systems defined in this context, we can extract two sets of local spatial autocorrelation statistics (Table 1). The first local Moran index formula defined by Anselin [19] is as follows

$$I_i^{*} = (x_i - \bar{x}) \sum_{j=1}^{n} v_{ij} (x_j - \bar{x}) = y_i \sum_{j=1}^{n} v_{ij} y_j, \tag{1}$$

where yi = xi - x̄ and yj = xj - x̄ denote centralized size variables, and x̄ refers to the mean value. In Eq (1), i ≠ j; otherwise vij = 0. The centralized variables can be transformed into standardized variables by means of the z-score formula. Based on the population standard deviation, the standardized variables can be expressed as

$$z_i = \frac{y_i}{\sigma} = \frac{x_i - \bar{x}}{\sigma}, \quad z_j = \frac{y_j}{\sigma} = \frac{x_j - \bar{x}}{\sigma},$$

where z denotes standardized variable, and σ refers to population standard deviation. The sum of Eq (1) is

$$\sum_{i=1}^{n} I_i^{*} = \sum_{i=1}^{n} y_i \sum_{j=1}^{n} v_{ij} y_j = \sum_{i=1}^{n} \sum_{j=1}^{n} v_{ij} y_i y_j, \tag{2}$$

which is essentially the sum of spatially weighted outer products of centralized variables. The spatial weight coefficient is not normalized by sum. The sum of the elements in spatial contiguity matrix is

$$V_0 = \sum_{i=1}^{n} \sum_{j=1}^{n} v_{ij}. \tag{3}$$

Table 1. Three sets of LISAs researched in this paper based on Anselin’s work.

| Item | Index | Weight matrix | Size variable | Symbol |
|---|---|---|---|---|
| First set of local LISA | Local Moran’s I | No normalization | Centralization | MI1 |
| | Local Geary’s C | No normalization | Centralization | GC1 |
| Second set of local LISA | Local Moran’s I | Row normalization | Standardization based on population standard deviation | MI2 |
| | Local Geary’s C | Row normalization | Standardization based on population standard deviation | GC2 |
| Third set of local LISA | Local Moran’s I | Global normalization | Standardization based on population standard deviation | MI3 |
| | Local Geary’s C | Global normalization | Standardization based on sample standard deviation | GC3 |

Note: If a spatial dataset is large enough, the distinction between population standard deviation and sample standard deviation can be ignored. However, sometimes the spatial dataset is not so large, and this difference cannot be ignored; otherwise, biased calculation results may lead to inappropriate conclusions.

Dividing Eq (2) by V0 yields spatial precision weighted auto-covariance as follows

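$$\mathrm{Cov} = \frac{1}{\sum_{i=1}^{n} \sum_{j=1}^{n} v_{ij}} \sum_{i=1}^{n} I_i^{*} = \frac{1}{V_0} \sum_{i=1}^{n} \sum_{j=1}^{n} v_{ij} y_i y_j. \tag{4}$$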

Furthermore, the spatial weighted covariance can be divided by the population variance of the size variable, which is called the second moment in literature [19], that is

$$\sigma^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2 = \frac{1}{n} \sum_{i=1}^{n} y_i^2. \tag{5}$$

The result is the global Moran’s index, I = Cov/σ². It can be expanded as

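$$I = \frac{\dfrac{1}{\sum_{i=1}^{n} \sum_{j=1}^{n} v_{ij}} \sum_{i=1}^{n} I_i^{*}}{\dfrac{1}{n} \sum_{i=1}^{n} y_i^2} = \frac{n \sum_{i=1}^{n} \sum_{j=1}^{n} v_{ij} y_i y_j}{V_0 \sum_{i=1}^{n} y_i^2} = \frac{1}{\sigma^2 V_0} \sum_{i=1}^{n} \sum_{j=1}^{n} v_{ij} y_i y_j = \sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij} z_i z_j, \tag{6}$$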

where wij is the element of the globally normalized weight matrix W. According to Anselin [19], Eq (6) can be expressed as

$$I = \frac{1}{\sigma^2 V_0} \sum_{i=1}^{n} I_i^{*}. \tag{7}$$

The relationship between the sum of Anselin’s first local Moran’s indexes and the global Moran’s index is obtained as below

$$\sum_{i=1}^{n} I_i^{*} = \sigma^2 V_0 I = \gamma I. \tag{8}$$

The proportionality coefficient in Eq (8) is

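$$\gamma = \sigma^2 V_0 = \left( \frac{1}{n} \sum_{i=1}^{n} y_i^2 \right) \left( \sum_{i=1}^{n} \sum_{j=1}^{n} v_{ij} \right), \tag{9}$$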

which represents the general expression of the ratio of the sum of local Moran’s indexes to the global Moran’s index. Please note that Eqs (8) and (9) are derived from the relations based on non-normalized spatial weight matrix. They cannot be directly applied to the mathematical processes based on row-normalized spatial weight matrix. According to Anselin [19], Eq (3) can be replaced by a vector indicating the sum of rows of the spatial contiguity matrix as below

$$V_i = \sum_{j=1}^{n} v_{ij}. \tag{10}$$

Correspondingly, spatial contiguity matrix can be normalized by row. Anselin called it row-standardized spatial weights matrix [19]. In this way, Eq (4) becomes a locally weighted spatial auto-covariance, that is

$$\mathrm{Cov}_i = \frac{I_i^{*}}{\sum_{j=1}^{n} v_{ij}} = \frac{1}{V_i} \sum_{j=1}^{n} v_{ij} y_i y_j. \tag{11}$$

The summation of Eq (11) is

$$\sum_{i=1}^{n} \mathrm{Cov}_i = \sum_{i=1}^{n} \frac{I_i^{*}}{V_i} = \sum_{i=1}^{n} \sum_{j=1}^{n} \frac{v_{ij}}{V_i} y_i y_j. \tag{12}$$

Based on Eqs (11) and (12), it is impossible to obtain the global spatial weighted auto-covariance, and it is impossible to derive the simple summation relationship between local Moran index and global Moran index. If so, the reasoning from Eq (4) to Eq (9) will be invalid.

It can be seen that the local-global relationship based on Anselin’s first local Moran index formula suggests a globally normalized weight matrix with symmetry. The first local Moran index formula of Anselin [19] is correct: it satisfies the two requirements defined by Anselin [19]. Its shortcoming is that it is not standardized. A good measure should have a clear critical value (reference value) or a pair of explicit boundary values. However, the local Moran index calculated by Eq (1) has neither boundary values nor a clear threshold value.
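The relations in Eqs (1) to (9) can be checked numerically. The following is a minimal sketch in Python with NumPy, assuming a hypothetical four-place dataset (the values of x and the 0/1 contiguity matrix V are invented for illustration and are not the Beijing-Tianjin-Hebei data); the variable names are likewise illustrative.

```python
import numpy as np

# Hypothetical toy data (not from the paper): 4 places, symmetric 0/1 contiguity.
x = np.array([2.0, 4.0, 3.0, 11.0])
V = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

n = x.size
y = x - x.mean()                  # centralized variable
sigma2 = np.mean(y ** 2)          # population variance, Eq (5)
V0 = V.sum()                      # sum of contiguity elements, Eq (3)

local_I1 = y * (V @ y)            # first local Moran index, Eq (1)
global_I = n * (y @ V @ y) / (V0 * (y @ y))   # global Moran's I, Eq (6)
gamma = sigma2 * V0               # proportionality coefficient, Eq (9)

# Eq (8): the local values sum to gamma times the global index.
print(local_I1.sum(), gamma * global_I)
```

On this toy example the two printed numbers agree up to floating-point error, which is what Eq (8) predicts for a non-normalized weight matrix.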

2.1.2 The second formula of local Moran index

Suppose that the variables are standardized and the spatial contiguity matrix is transformed into a spatial weight matrix normalized by row. In this way, V0 is replaced by Vi in Eq (4). Thus, the revised Eq (4) divided by the population variance yields the second local Moran’s index formula of Anselin [19], Ii** = Covi/σ², that is

$$I_i^{**} = \frac{1}{\sigma^2} \sum_{j=1}^{n} \frac{v_{ij}}{V_i} y_i y_j = \frac{1}{\sigma^2} y_i \sum_{j=1}^{n} w_{ij}^{*} y_j, \tag{13}$$

where wij* denotes the elements in the row-normalized spatial weight matrix, V*. Apparently, Eq (13) is based on Eqs (10) and (11). Thus, in terms of Eq (10), the sum of the spatial weight matrix is

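$$V_0^{*} = \sum_{i=1}^{n} \sum_{j=1}^{n} \frac{v_{ij}}{V_i} = \sum_{i=1}^{n} \left( \frac{1}{V_i} \sum_{j=1}^{n} v_{ij} \right) = \sum_{i=1}^{n} (1) = n. \tag{14}$$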

The variance of a standardized variable is 1, namely, σ² = 1. For a matrix normalized by row, the sum is V0* = n, thus we have

$$\gamma = \sigma^2 V_0^{*} = V_0^{*} = n. \tag{15}$$

Substituting Eq (15) into Eq (8) seems to yield the following relation

$$\sum_{i=1}^{n} I_i^{**} = nI, \tag{16}$$

which is one of the relations given by Anselin [19]. Note that the symbols have been slightly changed. That is, V0 is replaced by V0*, and Ii* is replaced by Ii**. The newly added asterisk indicates the inherent difference between the two sets of local Moran’s indexes. On the surface, there is no problem at all in the mathematical derivation process. However, Anselin [19] inadvertently made a mistake in the above reasoning process (S1 File). Looking at Eq (14) alone, we may think that there is no problem. However, by summing Eq (13), it is impossible to extract an independent Eq (14), and this is exactly the problem. In fact, Anselin [19] unintentionally replaced a mathematical concept by directly applying the results derived from non-normalized weight matrices to the relationship formula based on row-normalized spatial weight matrices. Regardless of whether the spatial contiguity matrix is symmetric or not, the non-normalized spatial weight matrix and the row-normalized spatial weight matrix are not isomorphic to each other. However, the non-normalized spatial weight matrix is isomorphic to the sum-based normalized spatial weight matrix.

Problems in mathematical deduction can be revealed through logical analysis and can also be reflected through empirical analysis. Let us check the problem from another angle. The relation between the second set of local Moran’s indexes of Anselin [19] and the global Moran’s index can be derived from Eq (13). The summation of the local Moran’s indexes based on Eq (13) is

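$$\sum_{i=1}^{n} I_i^{**} = \frac{1}{\sigma^2} \sum_{i=1}^{n} \sum_{j=1}^{n} \frac{v_{ij}}{V_i} y_i y_j = V_0 \sum_{i=1}^{n} \sum_{j=1}^{n} \frac{w_{ij}}{V_i} z_i z_j = \sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij}^{*} z_i z_j. \tag{17}$$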

By variable standardization, the population standard deviation becomes 1 unit, i.e., σ² = 1. However, the row sum of the spatial contiguity matrix, Vi, is not a constant. It can neither be eliminated nor converted to a constant. Therefore, there is no constant proportionality relation between the second set of local Moran’s indexes and the global Moran’s index. If and only if Eq (6) is introduced into Eq (17) can a proportional relationship similar to Eq (8) be derived. Based on Eq (6), Eq (17) can be re-expressed as

$$\sum_{i=1}^{n} I_i^{**} = \frac{\sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij}^{*} z_i z_j}{\sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij} z_i z_j} I. \tag{18}$$

Unfortunately, we cannot prove the following relation:

$$\sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij}^{*} z_i z_j = n \sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij} z_i z_j = nI. \tag{19}$$

This lends further support to the judgment that Eq (16) does not hold. However, the proportional relationship given in Eqs (17) and (18) can be easily verified by the observational data. Another angle is to examine the ratios of the two sets of local Moran indexes. If the ratios are constant, the two definitions are equivalent to one another; otherwise they are not. In fact, the values in the first set of local Moran indexes divided by the corresponding values in the second set of local Moran indexes yield

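$$\frac{I_i^{*}}{I_i^{**}} = \frac{\sigma^2 \sum_{j=1}^{n} v_{ij} y_i y_j}{\sum_{j=1}^{n} \dfrac{v_{ij}}{V_i} y_i y_j} = \sigma^2 V_i, \tag{20}$$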

which, obviously, is a variable that changes with Vi rather than a constant.

It can be seen that the ratios of two sets of local Moran’s indexes are not constant, so they are not equivalent to each other. This suggests that, the second set of local Moran indexes cannot satisfy the second requirement of Anselin [19], which said, “The sum of the local indicators is proportional to a global indicator”. The reason for the fault is that Anselin [19] inadvertently replaced a concept in this mathematical derivation. Concretely speaking, the globally normalized symmetric weight matrix W becomes the locally normalized asymmetric weight matrix V*. This way violates the law of identity of concepts and the principle of logical consistency in mathematical reasoning.
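A small numerical check may make the argument concrete. The sketch below, again in Python with NumPy and using the same hypothetical four-place data (invented for illustration, not taken from the paper), computes both sets of local Moran indexes, compares the sum of the second set with nI as claimed in Eq (16), and compares the ratio of Eq (20) with σ²Vi.

```python
import numpy as np

# Hypothetical toy data (not from the paper); row sums of V are unequal.
x = np.array([2.0, 4.0, 3.0, 11.0])
V = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

n = x.size
y = x - x.mean()
sigma2 = np.mean(y ** 2)                # population variance
z = y / np.sqrt(sigma2)                 # standardized by population sd

Vi = V.sum(axis=1)                      # row sums, Eq (10)
W_row = V / Vi[:, None]                 # row-normalized weights
W = V / V.sum()                         # sum-normalized weights

local_I1 = y * (V @ y)                  # first local Moran index, Eq (1)
local_I2 = z * (W_row @ z)              # second local Moran index, Eq (13)
global_I = z @ W @ z                    # global Moran's I, Eq (6)

print("sum of second local indexes:", local_I2.sum())
print("n times global I:", n * global_I)
print("ratio I1/I2:", local_I1 / local_I2)
print("sigma2 * Vi:", sigma2 * Vi)
```

For this example the sum of the second set differs from nI, while the element-wise ratio matches σ²Vi, in line with Eqs (17) and (20). A single toy case is only an illustration, not a proof.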

2.1.3 The formula of local Geary coefficient

The global Geary coefficient is complementary to the global Moran index: the former is oriented to spatial sample analysis, and the latter is based on spatial statistical population. Similar to the treatment of local Moran index, two local Geary statistics were defined by Anselin [19]. It is assumed that the variables are not standardized and the spatial contiguity matrix is not transformed into a global normalized spatial weight matrix. Anselin [19] defined the first local Geary’s coefficient as

$$C_i^{*} = \sum_{j=1}^{n} v_{ij} (y_i - y_j)^2, \tag{21}$$

in which the divisor 2 is ignored. Suppose that the variable is standardized, and the spatial contiguity matrix is transformed into a row normalized spatial weight matrix. Anselin [19] defines the second local Geary coefficient as

$$C_i^{**} = \frac{1}{\sigma^2} \sum_{j=1}^{n} w_{ij}^{*} (y_i - y_j)^2. \tag{22}$$

Summation of Eq (21) divided by the population variance σ² is

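$$\frac{1}{\sigma^2} \sum_{i=1}^{n} C_i^{*} = \frac{n \sum_{i=1}^{n} \sum_{j=1}^{n} v_{ij} (y_i - y_j)^2}{\sum_{i=1}^{n} y_i^2} = \frac{2 n V_0}{n-1} \cdot \frac{(n-1) \sum_{i=1}^{n} \sum_{j=1}^{n} v_{ij} (y_i - y_j)^2}{2 V_0 \sum_{i=1}^{n} y_i^2} = \gamma_c C, \tag{23}$$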

where C refers to global Geary coefficient. It can be expressed as

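$$C = \frac{(n-1) \sum_{i=1}^{n} \sum_{j=1}^{n} v_{ij} (y_i - y_j)^2}{2 V_0 \sum_{i=1}^{n} y_i^2} = \frac{1}{2 s^2} \sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij} (y_i - y_j)^2 = \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij} (z_i^{*} - z_j^{*})^2, \tag{24}$$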

in which z* refers to the standardized size variable based on the sample standard deviation s, i.e.,

$$z_i^{*} = \frac{y_i}{s} = \frac{x_i - \bar{x}}{s}, \quad z_j^{*} = \frac{y_j}{s} = \frac{x_j - \bar{x}}{s}.$$

Here s denotes the sample standard deviation, that is, s = σ(n/(n-1))^(1/2). In addition, the proportional coefficient between the sum of the first local Geary coefficients divided by the population variance and the global Geary coefficient is as below

$$\gamma_c = \frac{2 n V_0}{n-1}. \tag{25}$$

Therefore, the relationship between the sum of the first local Geary coefficients and the global Geary coefficients is

$$\sum_{i=1}^{n} C_i^{*} = \frac{2 n V_0 \sigma^2}{n-1} C = \gamma_c \sigma^2 C. \tag{26}$$

This formula is correct, and it satisfies the two requirements given by Anselin [19]. However, it is neither direct nor standard. Dividing the summation of Eq (21) by both the population variance σ² and the sum of the spatial weight matrix V0 yields the relationship between the local Geary’s coefficients and the global Geary coefficient, that is

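$$\sum_{i=1}^{n} C_i^{**} = \frac{n \sum_{i=1}^{n} \sum_{j=1}^{n} v_{ij} (y_i - y_j)^2}{V_0 \sum_{i=1}^{n} y_i^2} = \frac{2n}{n-1} \cdot \frac{(n-1) \sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij} (y_i - y_j)^2}{2 \sum_{i=1}^{n} y_i^2} = \frac{2n}{n-1} C. \tag{27}$$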

This is the corrected expression of the relationship between local Geary coefficient and global Geary’s coefficient, differing from that given by Anselin [19]. The reason is that derivation of this relationship is based on the global normalization of spatial weight matrix. However, due to the fact that divisor 2 is ignored in Eq (21), when n is sufficiently large in Eq (27), the sum of local Geary’s coefficients does not equal the global Geary’s coefficient. Based on the row-normalized weight matrix, the sum of local Geary’s coefficients is

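$$\sum_{i=1}^{n} C_i^{**} = \frac{n \sum_{i=1}^{n} \dfrac{1}{V_i} \sum_{j=1}^{n} v_{ij} (y_i - y_j)^2}{\sum_{i=1}^{n} y_i^2} = \frac{n V_0 \sum_{i=1}^{n} \dfrac{1}{V_i} \sum_{j=1}^{n} w_{ij} (y_i - y_j)^2}{\sum_{i=1}^{n} y_i^2}. \tag{28}$$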

The constant proportional relationship between the local Geary coefficient and the global Geary coefficient cannot be derived in terms of Eq (28). Anselin [19] reasoned that, according to Eq (25), V0 = n for the weight matrix normalized by row, so that γc = 2n²/(n-1), which is right. He then gave the following relation

$$\sum_{i=1}^{n} C_i^{**} = \frac{2 n^2}{n-1} C = \gamma_c C. \tag{29}$$

This is wrong and cannot be strictly derived by mathematical methods, nor can it be verified by observational data. Based on the row-normalized weight matrix, the correct result is

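$$\sum_{i=1}^{n} C_i^{**} = \frac{2n}{n-1} \cdot \frac{\sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij}^{*} (z_i^{*} - z_j^{*})^2}{\sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij} (z_i^{*} - z_j^{*})^2} C = \gamma_c^{*} C, \tag{30}$$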

in which γc* represents the proportionality coefficient. The coefficient can be expressed as

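$$\gamma_c^{*} = \frac{2n}{n-1} \cdot \frac{\sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij}^{*} (z_i^{*} - z_j^{*})^2}{\sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij} (z_i^{*} - z_j^{*})^2}, \tag{31}$$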

which is not a constant. It cannot be proved that Eq (29) is equivalent to Eq (30). Moreover, starting from Eqs (21) and (22), the proportional relationship between the two sets of local Geary coefficients is

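$$\frac{C_i^{*}}{C_i^{**}} = \frac{\sigma^2 \sum_{j=1}^{n} v_{ij} (y_i - y_j)^2}{\sum_{j=1}^{n} \dfrac{v_{ij}}{V_i} (y_i - y_j)^2} = \sigma^2 V_i = \frac{I_i^{*}}{I_i^{**}}. \tag{32}$$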

This is obviously not a constant, but a variable that changes with the sum of the rows of the spatial proximity matrix. This shows that the two sets of local Geary coefficients are not equivalent to each other, and the ratio of the corresponding values of the two sets of local Geary coefficients is equal to the ratio of the values of the two sets of local Moran’s indices. In short, the second set of local Geary statistic does not satisfy the second requirement given by Anselin [19].
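The Geary relations can be examined in the same way. The sketch below uses Python with NumPy and the same hypothetical four-place data (invented for illustration); it evaluates Eqs (21), (22) and (24), then checks Eq (26), the disputed Eq (29), and the ratio in Eq (32).

```python
import numpy as np

# Hypothetical toy data (not from the paper).
x = np.array([2.0, 4.0, 3.0, 11.0])
V = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

n = x.size
y = x - x.mean()
sigma2 = np.mean(y ** 2)                 # population variance
V0 = V.sum()
Vi = V.sum(axis=1)                       # row sums, Eq (10)
D = (y[:, None] - y[None, :]) ** 2       # matrix of (y_i - y_j)^2

local_C1 = (V * D).sum(axis=1)                             # Eq (21)
local_C2 = ((V / Vi[:, None]) * D).sum(axis=1) / sigma2    # Eq (22)
global_C = (n - 1) * (V * D).sum() / (2 * V0 * (y @ y))    # Eq (24)

rhs26 = 2 * n * V0 * sigma2 / (n - 1) * global_C           # right side of Eq (26)
rhs29 = 2 * n ** 2 / (n - 1) * global_C                    # right side of Eq (29)

print("Eq (26):", local_C1.sum(), rhs26)
print("Eq (29):", local_C2.sum(), rhs29)
print("Eq (32):", local_C1 / local_C2, sigma2 * Vi)
```

On this example Eq (26) holds, the two sides of Eq (29) differ, and the element-wise ratio equals σ²Vi, consistent with the argument above.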

2.2 Revised and normalized results

2.2.1 Adjustment of symbol system and clarification of concept

Concept is the cornerstone of logic. If and only if a concept is clear, there will be no mistakes in reasoning. The premise of mathematical reasoning is the symbolization of concepts. Confusion of symbols can easily lead to mistakes in reasoning. The main reason for the inconsistency between the two sets of LISA proposed by Anselin [19] is the unintentional concept substitution caused by the symbol mixing of spatial measure matrixes. At present, there are several problems about spatial autocorrelation in geographical literature.

Firstly, the symbols of the spatial weight matrix need to be improved. The symbols of the spatial contiguity matrix (SCM), say, [1/dij], and those of the spatial weight matrix (SWM), say, [vij/∑∑vij], where vij = 1/dij, are confused with each other. The two matrixes are regarded as equivalent and are both represented by the same symbol [wij]. In fact, the spatial distance matrix can be transformed into a spatial contiguity matrix according to a certain distance decay function, and the weight matrix can be obtained by normalizing the spatial contiguity matrix [29]. Although the final result is the same in the case of symbol confusion, the form of expression causes many unnecessary misunderstandings for beginners. This paper distinguishes the symbols as follows: SCM is represented by V, and its elements are represented by vij; SWM is represented by W, and its elements are expressed as wij. Thus we have SCM, V = [vij], and SWM, W = [wij] = [vij/∑∑vij].

Secondly, the definitions of spatial matrixes need to be explained. After the spatial contiguity matrix (SCM) is transformed into the spatial weight matrix (SWM), global normalization and local normalization by row are confused. Anselin [19], the original founder of the local Moran index, adopted the method of row normalization (he termed the process “row-standardization”). The sum of the SWM elements is thus equal to n. However, this method leads to two results: (1) The symmetry of the spatial distance matrix is broken. The spatial weight matrix comes from a spatial distance matrix or generalized spatial distance matrix. One of the important properties of a distance measure is symmetry: dij = dji holds for all i and j [30]. This is one of the four principles of the distance axioms (positivity, specification, symmetry, and triangle inequality). (2) The absolute value of the calculated local Moran index may sometimes exceed 1. Moran’s index is an autocorrelation coefficient whose value should fall between -1 and 1 in theory. As for the special boundary values of Moran’s index determined by the maximum and minimum eigenvalues of the spatial weight matrix, they should be discussed in another work.

Thirdly, the meanings and symbols of the two types of variance are different. The population variance is often confused with the sample variance in spatial statistics. Moran’s index is defined based on population variance, and Geary’s coefficient is defined based on sample variance [29]. According to Fisher’s symbol system in statistics, the population variance is expressed as σ², with denominator n in its formula; the sample variance is expressed as s², with denominator n-1 [31]. The relationship between them is σ² = (n-1)s²/n.

Fourth, the difference in numbering between rows and columns needs to be noted. There is sometimes confusion between row summation and column summation. The sum based on a row vector is expressed as summation by j, and the sum based on a column vector is expressed as summation by i. Based on a globally normalized weight matrix, the difference is only formal and has nothing to do with the results. However, based on a row-normalized weight matrix, the results of row summation differ from the results of column summation.

Fifth, the methods of value transformation need to be particularly clarified. The concepts of normalization and standardization are always confused in literature. Generalized standardization includes normalization. However, both standardization and normalization have different definition methods and corresponding calculation formulas. The transformation formula of variables should be determined according to different research objectives (S2 File).

To make it easy for readers to understand, it is necessary to distinguish symbols and then clarify the concept of variable transformation. There are three principles for adopting symbols in this paper. First, the principle of consensus. Priority is given to the conventional expressions in the field of mathematical statistics. For example, the population standard deviation is expressed as σ, and the sample standard deviation is expressed as s [31]. Second, the principle of direction. For example, the spatial weight matrix is represented by W because “W” is the capital form of the initial of “weight”. Third, the principle of distinction. For example, the spatial contiguity matrix is represented by V, so as to distinguish it from the spatial weight matrix W, and this distinction facilitates mathematical reasoning. Among the above three principles, the distinction principle is the most important (Table 2). In the spatial autocorrelation literature, centralized variables (such as in defining local Moran’s index), standardized variables (such as in simplifying the calculation of global Moran’s index) and globally normalized variables (such as in simplifying the calculation of Getis-Ord’s index) are used, respectively (Table 3). In the literature, when the spatial weight matrix is normalized by row, the concept of row standardization is adopted, but the calculation formula is not given [19]. This can easily lead to misunderstandings for beginners of spatial autocorrelation analysis.

Table 2. Comparison between Anselin’s symbol system and the symbol system in this paper.

| Measure set | Anselin | This paper |
|---|---|---|
| Spatial proximity matrix (SPM) | -- | U = {dij} |
| Spatial contiguity matrix (SCM): non-normalized SWM | W = {wij} | V = {vij} |
| Row-normalized spatial weight matrix (RSWM) | W = {wij} | -- |
| Sum-normalized spatial weight matrix (SSWM) | -- | W = {wij} |
| Row-normalized spatial weight matrix | W = {wij} | -- |
| Sum of elements of spatial contiguity matrix | S0 | V0 |
| Sum of elements of spatial weight matrix | S0 | W0 |
| Size variable | -- | xi, xj |
| Centralized variable | zi, zj | yi, yj |
| Standardized variable | -- | zi, zj |
| Population variance | m2 | σ² |
| Sample variance | -- | s² |
| Global Moran’s I | I | I |
| Local Moran’s I | Ii | Ii |
| Global Geary’s C | c | C |
| Local Geary’s C | ci | Ci |

Note: In the context, the sum-normalized spatial weight matrix is also termed sum-based normalized spatial weight matrix or globally normalized spatial weight matrix by sum. Correspondingly, the row-normalized spatial weight matrix is also called row-based normalized spatial weight matrix or locally normalized spatial weight matrix by row.

Table 3. Value transformation methods, calculation formulas, and properties of converted variables.

| Method | Calculation formula | Property |
|---|---|---|
| Centralization | yi = xi - x̄ | The mean value is 0 |
| Standardization by z-score | zi = (xi - x̄)/σ, zi* = (xi - x̄)/s | The mean value is 0 and the standard deviation is 1 |
| Range normalization | xi(r) = (xi - xmin)/(xmax - xmin) | The values range from 0 to 1 |
| Global normalization | xi(t) = xi/∑i xi, wij = vij/∑i∑j vij | The values come between 0 and 1 and the sum of the values equals 1 |
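The transformations in Table 3 and the two ways of normalizing a contiguity matrix can be sketched in a few lines. The example below uses Python with NumPy on hypothetical data (invented for illustration); it also shows that sum-normalization preserves the symmetry of V whereas row normalization does not, and that row sums and column sums then differ.

```python
import numpy as np

# Hypothetical toy data (not from the paper).
x = np.array([2.0, 4.0, 3.0, 11.0])
V = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

n = x.size
y = x - x.mean()                               # centralization
z = y / x.std(ddof=0)                          # z-score with population sd (sigma)
z_star = y / x.std(ddof=1)                     # z-score with sample sd (s)
x_range = (x - x.min()) / (x.max() - x.min())  # range normalization
x_share = x / x.sum()                          # global normalization of a vector

W = V / V.sum()                                # sum-normalized (global) weights
W_row = V / V.sum(axis=1, keepdims=True)       # row-normalized weights

print("W symmetric:", np.allclose(W, W.T))
print("W_row symmetric:", np.allclose(W_row, W_row.T))
print("row sums of W_row:", W_row.sum(axis=1))
print("column sums of W_row:", W_row.sum(axis=0))
```

NumPy's `ddof` argument selects the denominator: `ddof=0` gives n (population) and `ddof=1` gives n-1 (sample), which matches the distinction between σ and s drawn above.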

2.2.2 Definition of normalized local Moran’s index

Moran’s index is defined on the basis of population standard deviation rather than sample standard deviation. Accordingly, local Moran’s index should also be defined through population standard deviation. In light of Eq (7), canonical local Moran’s index can be defined as

$$I_i = \frac{I_i^{*}}{\sigma^2 V_0} = \frac{1}{\sigma^2} y_i \sum_{j=1}^{n} \frac{v_{ij}}{V_0} y_j = z_i \sum_{j=1}^{n} w_{ij} z_j. \tag{33}$$

Further, according to Eq (7), the relation between global Moran’s index and the sum of local Moran’s indexes is

$$I = \sum_{i=1}^{n} \left( \frac{I_i^{*}}{\sigma^2 V_0} \right) = \sum_{i=1}^{n} I_i. \tag{34}$$

According to Eq (33), the relation between Anselin’s first set of local Moran indexes and the local Moran’s indexes formula improved in this paper is

$$I_i^{*} = \gamma I_i = \sigma^2 V_0 I_i. \tag{35}$$

For the globally normalized spatial weight matrix W and the variable z standardized by the population standard deviation, we have σ² = 1 and V0 = 1. Thus, Eq (9) should be replaced by

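$$\gamma_0 = \sigma^2 V_0 = \left( \frac{1}{n} \sum_{i=1}^{n} z_i^2 \right) \left( \sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij} \right) = 1. \tag{36}$$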

This suggests that, according to the second basic requirement for LISA from Anselin [19], the sum of normalized local Moran’s index equals the global Moran’s index.
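As an illustration of Eqs (33) to (35), the following Python sketch with NumPy computes the normalized local Moran’s indexes on the same hypothetical four-place data (invented for illustration) and checks that they sum to the global index.

```python
import numpy as np

# Hypothetical toy data (not from the paper).
x = np.array([2.0, 4.0, 3.0, 11.0])
V = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

y = x - x.mean()
sigma2 = np.mean(y ** 2)          # population variance
V0 = V.sum()
z = y / np.sqrt(sigma2)           # standardized by population sd
W = V / V0                        # sum-normalized weight matrix

local_I = z * (W @ z)             # normalized local Moran's index, Eq (33)
global_I = z @ W @ z              # global Moran's I, Eq (6)
local_I1 = y * (V @ y)            # Anselin's first local index, Eq (1)

print("sum of local indexes:", local_I.sum())
print("global Moran's I:", global_I)
```

Because W and z are both normalized, no extra proportionality coefficient is needed here; the first-set values are recovered by multiplying by σ²V0, as in Eq (35).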

2.2.3 Definition of normalized local Geary’s coefficient

Geary’s coefficient is defined on the basis of sample standard deviation rather than population standard deviation. Accordingly, local Geary’s coefficient should also be defined through sample standard deviation. The generalized Geary’s coefficient is another case [29]. In terms of Eq (26), global Geary’s coefficient can be expressed as

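$$C = \frac{n-1}{2 n V_0 \sigma^2} \sum_{i=1}^{n} C_i^{*} = \frac{1}{2 V_0 s^2} \sum_{i=1}^{n} C_i^{*} = \sum_{i=1}^{n} \left( \frac{C_i^{*}}{2 V_0 s^2} \right) = \sum_{i=1}^{n} C_i, \tag{37}$$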

where s² = nσ²/(n-1) reflects the relationship between sample variance s² and population variance σ². Thus local Geary’s coefficient can be defined as

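$$C_i = \frac{C_i^{*}}{2 V_0 s^2} = \frac{1}{2 V_0 s^2} \sum_{j=1}^{n} v_{ij} (y_i - y_j)^2 = \frac{1}{2} \sum_{j=1}^{n} w_{ij} (z_i^{*} - z_j^{*})^2. \tag{38}$$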

Summing Eq (38) yields global Geary’s coefficient, that is, Eq (24). According to Eq (37), the relation between Anselin’s first set of Geary’s coefficient and the local Geary’s coefficient formula improved in this paper is

$$C_i^* = \gamma_c \sigma^2 C_i = 2 s^2 V_0 C_i. \tag{39}$$

For the globally normalized spatial weight matrix W and the vector z* standardized by the sample standard deviation, we have $s^2 = 1$ and $V_0 = 1$. Thus, according to Eq (26), the relation between proportionality coefficients is

$$\gamma_c \sigma^2 = 2 s^2 V_0 = 2. \tag{40}$$
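A minimal sketch of Eqs (37) to (39) is given below. It assumes the same hypothetical four-location data as an illustration, and the function name `normalized_local_geary` is introduced here only for the example. The sketch computes the normalized local Geary’s coefficients from the sample-based standardized variable and compares the first-set values $C_i^*$ with them.

```python
from math import sqrt

def normalized_local_geary(x, v):
    """Return the local Geary's coefficients C_i of Eq (38)."""
    n = len(x)
    v0 = sum(sum(row) for row in v)                       # V0: grand total of contiguity values
    mean = sum(x) / n
    s = sqrt(sum((xi - mean) ** 2 for xi in x) / (n - 1)) # sample standard deviation
    z_star = [(xi - mean) / s for xi in x]
    return [0.5 * sum(v[i][j] / v0 * (z_star[i] - z_star[j]) ** 2 for j in range(n))
            for i in range(n)]

# Hypothetical data: four locations, inverse-distance contiguity with zero diagonal.
x = [10.0, 4.0, 3.0, 1.0]
d = [[0, 2, 4, 5], [2, 0, 3, 6], [4, 3, 0, 2], [5, 6, 2, 0]]
n = len(x)
v = [[0.0 if i == j else 1.0 / d[i][j] for j in range(n)] for i in range(n)]

local_C = normalized_local_geary(x, v)
global_C = sum(local_C)                                   # Eq (37): the sum is the global coefficient

# First-set coefficients C_i* = sum_j v_ij (y_i - y_j)^2; note y_i - y_j = x_i - x_j.
first_set = [sum(v[i][j] * (x[i] - x[j]) ** 2 for j in range(n)) for i in range(n)]
ratios = [first_set[i] / local_C[i] for i in range(n)]    # Eq (39): each ratio is 2 * s^2 * V0
print(round(global_C, 4), [round(r, 4) for r in ratios])
```

The printed ratios should all coincide, which is the numerical counterpart of the constant proportionality coefficient in Eq (39).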

Moran’s index and Geary’s coefficient reflect the same problem from different angles of view. It can be proved that the relationship between global Moran’s I and global Geary’s C is as follows

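$$C = \frac{\sum_{i=1}^{n} \sum_{j=1}^{n} v_{ij} y_i^2 - \sum_{i=1}^{n} \sum_{j=1}^{n} v_{ij} y_i y_j}{V_0 \, \frac{1}{n-1} \sum_{i=1}^{n} y_i^2} = \frac{n-1}{n} \left( \mathbf{o}^{\mathrm{T}} \mathbf{W} \mathbf{z}^2 - \mathbf{z}^{\mathrm{T}} \mathbf{W} \mathbf{z} \right) = \frac{n-1}{n} \left( \mathbf{o}^{\mathrm{T}} \mathbf{W} \mathbf{z}^2 - I \right), \tag{41}$$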

where $\mathbf{z}$ denotes the standardized vector based on population standard deviation, $\mathbf{z}^2 = \mathrm{diag}(\mathbf{z}\mathbf{z}^{\mathrm{T}})$ refers to the vector composed of the squares of the elements of $\mathbf{z}$, and $\mathbf{o}^{\mathrm{T}} = [1\ 1\ \dots\ 1]$ is a vector in which all the elements are 1. The symbol “T” indicates transposition, and the function "diag" takes the diagonal elements of a matrix to form a vector. If the mean of the global Moran’s index is taken as $I_0 = 1/(1-n)$, the mean of the global Geary’s coefficient, $C_0$, can be estimated by

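$$C_0 = \frac{n-1}{n} \left( \mathbf{o}^{\mathrm{T}} \mathbf{W} \mathbf{z}^2 - I_0 \right) = \frac{n-1}{n} \left( \mathbf{o}^{\mathrm{T}} \mathbf{W} \mathbf{z}^2 - \frac{1}{1-n} \right) = \frac{n-1}{n}\, \mathbf{o}^{\mathrm{T}} \mathbf{W} \mathbf{z}^2 + \frac{1}{n}. \tag{42}$$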

Further, the relationship between local Moran’s indexes and local Geary’s coefficient can be derived. From Eq (38) it follows

$$C_i = \frac{1}{2 V_0 \frac{n}{n-1} \sigma^2} \sum_{j=1}^{n} v_{ij} (y_i - y_j)^2 = \frac{n-1}{2n} \sum_{j=1}^{n} w_{ij} (z_i - z_j)^2. \tag{43}$$

Changing the form of Eq (43) yields

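$$C_i = \frac{n-1}{2n} \left( \sum_{j=1}^{n} w_{ij} (z_i^2 + z_j^2) - 2 \sum_{j=1}^{n} w_{ij} z_i z_j \right) = \frac{n-1}{2n} \left( \sum_{j=1}^{n} w_{ij} (z_i^2 + z_j^2) - 2 I_i \right). \tag{44}$$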

This means that there is a strict numerical conversion relationship between local Moran’s indexes and local Geary’s coefficient, although they describe the same problem from different angles. It can be seen that Eq (41) can be obtained by summing Eq (44).
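The conversion in Eq (44) and its global counterpart in Eq (41) can be checked numerically. The sketch below again uses a hypothetical four-location data set with a symmetric contiguity matrix, which is an assumption of this illustration; variable names such as `local_C_from_I` are introduced only for the example.

```python
from math import sqrt

# Hypothetical data: four locations, symmetric inverse-distance contiguity.
x = [10.0, 4.0, 3.0, 1.0]
d = [[0, 2, 4, 5], [2, 0, 3, 6], [4, 3, 0, 2], [5, 6, 2, 0]]
n = len(x)
v = [[0.0 if i == j else 1.0 / d[i][j] for j in range(n)] for i in range(n)]
v0 = sum(sum(row) for row in v)
w = [[vij / v0 for vij in row] for row in v]              # globally normalized weights

mean = sum(x) / n
sigma = sqrt(sum((xi - mean) ** 2 for xi in x) / n)       # population standard deviation
z = [(xi - mean) / sigma for xi in x]

# Local Moran's indexes, Eq (33).
local_I = [z[i] * sum(w[i][j] * z[j] for j in range(n)) for i in range(n)]

# Local Geary's coefficients obtained from the local Moran's indexes, Eq (44).
local_C_from_I = [(n - 1) / (2 * n)
                  * (sum(w[i][j] * (z[i] ** 2 + z[j] ** 2) for j in range(n)) - 2 * local_I[i])
                  for i in range(n)]

# Local Geary's coefficients computed directly from Eq (38); z* = z * sqrt((n - 1) / n).
z_star = [zi * sqrt((n - 1) / n) for zi in z]
local_C_direct = [0.5 * sum(w[i][j] * (z_star[i] - z_star[j]) ** 2 for j in range(n))
                  for i in range(n)]

# Global relation, Eq (41): C = (n - 1) / n * (o^T W z^2 - I).
oWz2 = sum(w[i][j] * z[j] ** 2 for i in range(n) for j in range(n))
global_C = (n - 1) / n * (oWz2 - sum(local_I))
print([round(c, 4) for c in local_C_direct], round(global_C, 4))
```

The two routes to the local Geary’s coefficients should agree element by element, and their sum should reproduce the global value from Eq (41).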

In the new framework for LISA, the spatial weight matrix is normalized by its sum, which is a type of global normalization. Mathematics relies heavily on form, and the same mathematical method often has vastly different effects when expressed in different forms. For spatial autocorrelation, using a globally normalized spatial weight matrix instead of a non-normalized one has at least the following advantages. First, it is convenient to calculate the global Moran’s index $I$ and the local Moran’s indexes $I_i$, and the relationship between the two is clear [29]. Second, it yields a standardized Moran’s scatterplot in which the slope of the trend line is exactly equal to the global Moran’s index [32]. Third, the structure of the parameters of the spatial autoregressive models can be clearly revealed using the spatial autocorrelation coefficients. Fourth, the values of the local Moran’s index and the local Geary’s coefficient become more intuitive. The fourth advantage is the most relevant to this work. Many basic measures and models of spatial statistical analysis are rooted in conventional statistics and are created by analogy with time series analysis methods. The common measures and models of time series analysis, such as autocorrelation coefficients and autoregressive models, are also rooted in traditional statistical theories. The development of statistics took place in the wider context of the Victorian culture of measurement [31]. For simplicity’s sake, numerous measurement results are usually condensed into an index [33]. In this case, an index is often treated as a characteristic measurement [6, 34]. A good index has a pair of clear boundary values, a clear critical value, or both. Based on the standardized variable and the globally normalized spatial weight matrix, the values of the local Moran’s indexes fall between -1 and 1, with a critical value of 0, and the values of the local Geary’s coefficients fall between 0 and 2, with a critical value of 1.

3 Empirical analysis

3.1 Study area and data

The results of mathematical deduction ultimately need to be verified through mathematical reasoning and empirical analysis. After all, the success of sciences rests with their great emphasis on the role of quantifiable data and their interplay with models [35]. Taking the cities in the Beijing, Tianjin and Hebei (BTH) region as an example, we present a concise computational case study. This is a demonstrative case, not an explanatory case. In other words, the example is used to verify the reasoning results rather than to study the spatial structure and characteristics of the BTH urban system. The study area includes Beijing city, Tianjin city, and the main cities of Hebei Province. The study region is also termed the Jing-Jin-Ji (JJJ) region in literature [36]. The cities are all of prefecture level and above, and the number of cities is n = 13. The size measurement is the city population of the fifth census in 2000 and the sixth census in 2010. Town population is not taken into account. At present, urban population has several definitions: regional total population, municipal population, city population, and urban population consisting of city population and town population. This case uses the city population, which better reflects the characteristics of city size. City population size can be reflected by night light area in maps [32, 36]. The population size was processed by centralization (y), population-based standardization (z) and sample-based standardization (z*) (Table 4). As for the spatial weight matrix, the basic data are derived from the traffic mileage between cities (Table 5). The spatial weight function adopts a special negative power law, the inverse proportion function, which is actually the intersection of the power law and the hyperbolic function. Thus, the spatial contiguity is defined as

$$v_{ij} = \begin{cases} 1/d_{ij}, & i \neq j \\ 0, & i = j \end{cases}, \tag{45}$$

where $d_{ij}$ denotes the distance by road between city i and city j. On this basis, the traffic mileage matrix (U) can be transformed into a spatial contiguity matrix (V), which can in turn be converted into the globally normalized weight matrix (W) and the row-normalized weight matrix (W*).
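The transformation from U to V, W and W* can be sketched as follows. For brevity, the sketch uses only the first four cities of Table 5, so the resulting weights are an illustrative subset and do not reproduce the 13-city matrices used in this paper; the helper `is_symmetric` is a hypothetical name.

```python
# Road distances between the first four cities of Table 5 (illustrative subset only).
cities = ["Beijing", "Tianjin", "Shijiazhuang", "Tanshan"]
d = [
    [0.0, 160.8855, 321.7625, 185.4770],
    [160.8855, 0.0, 344.5825, 101.4105],
    [321.7625, 344.5825, 0.0, 423.7510],
    [185.4770, 101.4105, 423.7510, 0.0],
]
n = len(d)

# Eq (45): inverse-distance contiguity with zeros on the diagonal.
v = [[0.0 if i == j else 1.0 / d[i][j] for j in range(n)] for i in range(n)]

# Global normalization: divide every element by the grand total V0.
v0 = sum(sum(row) for row in v)
w_global = [[vij / v0 for vij in row] for row in v]

# Row normalization: divide every element by the sum of its own row.
w_row = [[vij / sum(row) for vij in row] for row in v]

def is_symmetric(m, tol=1e-12):
    """Check whether a square matrix equals its transpose within a tolerance."""
    size = len(m)
    return all(abs(m[i][j] - m[j][i]) <= tol for i in range(size) for j in range(size))

print(is_symmetric(v), is_symmetric(w_global), is_symmetric(w_row))
```

In this subset the globally normalized matrix keeps the symmetry of V, whereas the row-normalized matrix does not, because each row is divided by a different row sum.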

Table 4. Beijing-Tianjin-Hebei city population and its centralization and standardization results.

| City | x (2000) | y (2000) | z (2000) | z* (2000) | x (2010) | y (2010) | z (2010) | z* (2010) |
|---|---|---|---|---|---|---|---|---|
| Beijing | 949.6688 | 769.1377 | 2.9976 | 2.8800 | 1555.2378 | 1284.2528 | 2.9870 | 2.8698 |
| Tianjin | 531.3702 | 350.8391 | 1.3673 | 1.3137 | 885.6234 | 614.6384 | 1.4296 | 1.3735 |
| Shijiazhuang | 193.0579 | 12.5268 | 0.0488 | 0.0469 | 275.6871 | 4.7021 | 0.0109 | 0.0105 |
| Tanshan | 140.3887 | -40.1424 | -0.1564 | -0.1503 | 163.7579 | -107.2271 | -0.2494 | -0.2396 |
| Qinhuangdao | 70.7267 | -109.8044 | -0.4279 | -0.4112 | 95.1872 | -175.7978 | -0.4089 | -0.3928 |
| Handan | 107.1068 | -73.4243 | -0.2862 | -0.2749 | 111.7417 | -159.2433 | -0.3704 | -0.3558 |
| Xingtai | 53.6282 | -126.9029 | -0.4946 | -0.4752 | 63.7797 | -207.2053 | -0.4819 | -0.4630 |
| Baoding | 90.2496 | -90.2815 | -0.3519 | -0.3381 | 98.0177 | -172.9673 | -0.4023 | -0.3865 |
| Zhangjiakou | 79.6580 | -100.8731 | -0.3931 | -0.3777 | 90.0218 | -180.9632 | -0.4209 | -0.4044 |
| Chengde | 32.5821 | -147.9490 | -0.5766 | -0.5540 | 49.8293 | -221.1557 | -0.5144 | -0.4942 |
| Cangzhou | 44.3561 | -136.1750 | -0.5307 | -0.5099 | 48.9701 | -222.0149 | -0.5164 | -0.4961 |
| Langfang | 29.5879 | -150.9432 | -0.5883 | -0.5652 | 46.6539 | -224.3311 | -0.5218 | -0.5013 |
| Hengshui | 24.5229 | -156.0082 | -0.6080 | -0.5842 | 38.2976 | -232.6874 | -0.5412 | -0.5200 |
| Mean | 180.5311 | 0.0000 | 0.0000 | 0.0000 | 270.9850 | 0.0000 | 0.0000 | 0.0000 |
| σ | 256.5845 | 256.5845 | 1.0000 | 0.9608 | 429.9496 | 429.9496 | 1.0000 | 0.9608 |
| s | 267.0616 | 267.0616 | 1.0408 | 1.0000 | 447.5057 | 447.5057 | 1.0408 | 1.0000 |
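The centralization and the two standardizations reported in Table 4 can be reproduced with a few lines. The sketch below uses the 2000 city population column (x) of Table 4; the variable names are chosen here for illustration.

```python
from math import sqrt

# City population in 2000 (column x of Table 4), in the order of the table rows.
x = [949.6688, 531.3702, 193.0579, 140.3887, 70.7267, 107.1068, 53.6282,
     90.2496, 79.6580, 32.5821, 44.3561, 29.5879, 24.5229]
n = len(x)

mean = sum(x) / n
y = [xi - mean for xi in x]                      # centralized variable
sigma = sqrt(sum(yi ** 2 for yi in y) / n)       # population standard deviation
s = sqrt(sum(yi ** 2 for yi in y) / (n - 1))     # sample standard deviation

z = [yi / sigma for yi in y]                     # population-based standardization
z_star = [yi / s for yi in y]                    # sample-based standardization

print(round(mean, 4), round(sigma, 4), round(s, 4))
print(round(z[0], 4), round(z_star[0], 4))       # values for Beijing
```

The two standardized variables differ only by the constant factor $\sqrt{(n-1)/n}$, which is about 0.9608 for n = 13 and corresponds to the σ row of the z* columns in Table 4.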

Table 5. Spatial distance matrix (dij) of Beijing-Tianjin-Hebei cities based on traffic mileage.

City Beijing Tianjin Shijiazhuang Tanshan Qinhuangdao Handan Xingtai Baoding Zhangjiakou Chengde Cangzhou Langfang Hengshui
Beijing 0 160.8855 321.7625 185.4770 288.9055 479.9810 430.2520 187.1300 198.1975 194.5940 233.4440 83.2755 299.7580
Tianjin 160.8855 0 344.5825 101.4105 242.6355 454.8400 425.3890 201.9420 332.9375 280.6470 138.6135 86.1555 259.8555
Shijiazhuang 321.7625 344.5825 0 423.7510 568.1560 167.2815 114.0840 138.9090 430.8215 506.6400 221.7565 283.2495 142.5935
Tanshan 185.4770 101.4105 423.7510 0 151.3880 547.4205 517.8910 289.5120 376.8000 185.3500 215.0285 144.6130 352.4360
Qinhuangdao 288.9055 242.6355 568.1560 151.3880 0 711.7120 662.2960 433.9170 481.3360 222.2030 375.5205 292.9180 508.4835
Handan 479.9810 454.8400 167.2815 547.4205 711.7120 0 53.4600 296.7465 606.6940 664.8585 335.0465 440.4685 214.2995
Xingtai 430.2520 425.3890 114.0840 517.8910 662.2960 53.4600 0 245.8830 557.3515 615.1295 299.4430 391.1260 167.0325
Baoding 187.1300 201.9420 138.9090 289.5120 433.9170 296.7465 245.8830 0 278.0950 372.0075 150.5130 147.8300 144.8405
Zhangjiakou 198.1975 332.9375 430.8215 376.8000 481.3360 606.6940 557.3515 278.0950 0 372.8730 411.7425 257.5700 455.2955
Chengde 194.5940 280.6470 506.6400 185.3500 222.2030 664.8585 615.1295 372.0075 372.8730 0 407.1040 259.8085 495.3555
Cangzhou 233.4440 138.6135 221.7565 215.0285 375.5205 335.0465 299.4430 150.5130 411.7425 407.1040 0 149.7245 140.0620
Langfang 83.2755 86.1555 283.2495 144.6130 292.9180 440.4685 391.1260 147.8300 257.5700 259.8085 149.7245 0 237.8790
Hengshui 299.7580 259.8555 142.5935 352.4360 508.4835 214.2995 167.0325 144.8405 455.2955 495.3555 140.0620 237.8790 0

3.2 Calculation results

For the data of the two years and the two statistics, i.e., the local Moran index and the local Geary coefficient, three sets of calculation results are given. The calculation process is simple and easy to understand, and readers can repeat the author’s calculations using Microsoft Excel (see S1 and S2 Datasets). For the local spatial statistics defined by Anselin [19], the first set of local Moran indexes is denoted MI1 and the second set MI2; the first set of local Geary coefficients is denoted GC1 and the second set GC2. Accordingly, the modified local Moran index and Geary coefficient are denoted MI3 and GC3, respectively (Fig 1). The results are as follows. First, the ratio of MI1 to MI2 is not a constant, and neither is the ratio of GC1 to GC2. This shows that the two sets of local Moran indexes and the two sets of local Geary coefficients of Anselin [19] are not equivalent to one another. Second, the ratio of MI1 to MI3 is a constant, and so is the ratio of GC1 to GC3. This shows that the first set of local Moran indexes of Anselin [19] is equivalent to the modified local Moran indexes in this paper, and that the first set of local Geary coefficients of Anselin [19] is equivalent to the modified local Geary coefficients of this paper (Tables 6 and 7). The reason is that the first set of local Moran indexes and local Geary coefficients defined by Anselin [19] is based on the symmetric spatial contiguity matrix, and the modified statistics in this paper are based on the globally normalized spatial weight matrix, which is also symmetric. In contrast, the second set of local Moran indexes and local Geary coefficients defined by Anselin [19] is based on the locally normalized spatial weight matrix, in which the symmetry is broken.
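The ratio comparison described above can be sketched with a small hypothetical data set (four locations, not the BTH cities). The three lists `mi1`, `mi2` and `mi3` are illustrative names for the first set, the second set and the modified set of local Moran indexes.

```python
from math import sqrt

# Hypothetical data: four locations, symmetric inverse-distance contiguity.
x = [10.0, 4.0, 3.0, 1.0]
d = [[0, 2, 4, 5], [2, 0, 3, 6], [4, 3, 0, 2], [5, 6, 2, 0]]
n = len(x)
v = [[0.0 if i == j else 1.0 / d[i][j] for j in range(n)] for i in range(n)]
v0 = sum(sum(row) for row in v)

mean = sum(x) / n
y = [xi - mean for xi in x]
sigma = sqrt(sum(yi ** 2 for yi in y) / n)
z = [yi / sigma for yi in y]

# MI1: centralized variable with the non-normalized contiguity matrix.
mi1 = [y[i] * sum(v[i][j] * y[j] for j in range(n)) for i in range(n)]
# MI2: standardized variable with the row-normalized weight matrix.
mi2 = [z[i] * sum(v[i][j] / sum(v[i]) * z[j] for j in range(n)) for i in range(n)]
# MI3: standardized variable with the globally normalized weight matrix.
mi3 = [z[i] * sum(v[i][j] / v0 * z[j] for j in range(n)) for i in range(n)]

ratio_12 = [a / b for a, b in zip(mi1, mi2)]   # varies with the row sums of V
ratio_13 = [a / b for a, b in zip(mi1, mi3)]   # constant, equal to sigma^2 * V0
print([round(r, 4) for r in ratio_12])
print([round(r, 4) for r in ratio_13])
print(round(sum(mi2), 4), round(n * sum(mi3), 4))
```

In this example the ratio MI1/MI3 is the same for every location, the ratio MI1/MI2 changes from row to row, and the sum of MI2 differs from n times the sum of MI3, which mirrors the pattern reported in Table 6.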

Fig 1. A schematic flowchart of the conversion relationship from Moran’s index to different types of LISAs.

(Note: Moran’s index is taken as an example in this figure. The conversion process for Geary’s coefficient follows by analogy. In fact, using Eqs (42) and (44), the numerical conversion between Moran’s index and Geary’s coefficient can be achieved readily.)

Table 6. Comparison of three sets of local Moran index values in two years.

| City | MI1 (2000) | MI2 (2000) | MI3 (2000) | MI1/MI2 (2000) | MI1/MI3 (2000) | MI1 (2010) | MI2 (2010) | MI3 (2010) | MI1/MI2 (2010) | MI1/MI3 (2010) |
|---|---|---|---|---|---|---|---|---|---|---|
| Beijing | -2686.4966 | -0.7067 | -0.0612 | 3801.3644 | 43916.8725 | -7140.4536 | -0.6690 | -0.0579 | 10673.6704 | 123312.1000 |
| Tianjin | -387.0133 | -0.0951 | -0.0088 | 4071.1117 | 43916.8725 | -1175.2192 | -0.1028 | -0.0095 | 11431.0810 | 123312.1000 |
| Shijiazhuang | -23.1481 | -0.0068 | -0.0005 | 3385.2705 | 43916.8725 | -14.4935 | -0.0015 | -0.0001 | 9505.3402 | 123312.1000 |
| Tanshan | -121.7919 | -0.0343 | -0.0028 | 3547.3310 | 43916.8725 | -603.5770 | -0.0606 | -0.0049 | 9960.3823 | 123312.1000 |
| Qinhuangdao | -142.9763 | -0.0607 | -0.0033 | 2356.2158 | 43916.8725 | -379.2385 | -0.0573 | -0.0031 | 6615.9063 | 123312.1000 |
| Handan | 170.5561 | 0.0533 | 0.0039 | 3202.3026 | 43916.8725 | 594.8129 | 0.0662 | 0.0048 | 8991.5933 | 123312.1000 |
| Xingtai | 185.0124 | 0.0511 | 0.0042 | 3618.1153 | 43916.8725 | 637.3519 | 0.0627 | 0.0052 | 10159.1341 | 123312.1000 |
| Baoding | -92.0058 | -0.0244 | -0.0021 | 3771.5181 | 43916.8725 | -335.7750 | -0.0317 | -0.0027 | 10589.8666 | 123312.1000 |
| Zhangjiakou | -231.9379 | -0.1057 | -0.0053 | 2194.2630 | 43916.8725 | -708.7104 | -0.1150 | -0.0057 | 6161.1669 | 123312.1000 |
| Chengde | -363.3994 | -0.1476 | -0.0083 | 2461.9446 | 43916.8725 | -889.9662 | -0.1287 | -0.0072 | 6912.7772 | 123312.1000 |
| Cangzhou | -194.7349 | -0.0538 | -0.0044 | 3620.4838 | 43916.8725 | -561.9455 | -0.0553 | -0.0046 | 10165.7844 | 123312.1000 |
| Langfang | -1369.3138 | -0.3073 | -0.0312 | 4455.7783 | 43916.8725 | -3399.6518 | -0.2717 | -0.0276 | 12511.1681 | 123312.1000 |
| Hengshui | 27.8793 | 0.0081 | 0.0006 | 3431.1735 | 43916.8725 | 120.3620 | 0.0125 | 0.0010 | 9634.2291 | 123312.1000 |
| Sum | -5229.3702 | -1.4299 | -0.1191 | 43916.8725 | 570919.3421 | -13856.5039 | -1.3523 | -0.1124 | 123312.1000 | 1603057.3005 |
| Expected | -5229.3702 | -1.5480 | -0.1191 | 43916.8725 | 570919.3421 | -13856.5039 | -1.4608 | -0.1124 | 123312.1000 | 1603057.3005 |

Table 7. Comparison of three sets of local Geary coefficient values in two years.

| City | GC1 (2000) | GC2 (2000) | GC3 (2000) | GC1/GC2 (2000) | GC1/GC3 (2000) | GC1 (2010) | GC2 (2010) | GC3 (2010) | GC1/GC2 (2010) | GC1/GC3 (2010) |
|---|---|---|---|---|---|---|---|---|---|---|
| Beijing | 41036.8054 | 10.7953 | 0.4313 | 3801.3644 | 95153.2237 | 113754.5272 | 10.6575 | 0.4258 | 10673.6704 | 267176.2168 |
| Tianjin | 12819.0307 | 3.1488 | 0.1347 | 4071.1117 | 95153.2237 | 37929.2182 | 3.3181 | 0.1420 | 11431.0810 | 267176.2168 |
| Shijiazhuang | 2908.7705 | 0.8592 | 0.0306 | 3385.2705 | 95153.2237 | 8029.3420 | 0.8447 | 0.0301 | 9505.3402 | 267176.2168 |
| Tanshan | 5340.6947 | 1.5056 | 0.0561 | 3547.3310 | 95153.2237 | 15962.5572 | 1.6026 | 0.0597 | 9960.3823 | 267176.2168 |
| Qinhuangdao | 3628.6681 | 1.5400 | 0.0381 | 2356.2158 | 95153.2237 | 10073.4191 | 1.5226 | 0.0377 | 6615.9063 | 267176.2168 |
| Handan | 2044.0978 | 0.6383 | 0.0215 | 3202.3026 | 95153.2237 | 5920.6445 | 0.6585 | 0.0222 | 8991.5933 | 267176.2168 |
| Xingtai | 2655.7337 | 0.7340 | 0.0279 | 3618.1153 | 95153.2237 | 7227.0101 | 0.7114 | 0.0270 | 10159.1341 | 267176.2168 |
| Baoding | 5080.6946 | 1.3471 | 0.0534 | 3771.5181 | 95153.2237 | 14731.9805 | 1.3911 | 0.0551 | 10589.8666 | 267176.2168 |
| Zhangjiakou | 4499.9163 | 2.0508 | 0.0473 | 2194.2630 | 95153.2237 | 12851.4607 | 2.0859 | 0.0481 | 6161.1669 | 267176.2168 |
| Chengde | 5353.0964 | 2.1743 | 0.0563 | 2461.9446 | 95153.2237 | 14332.0819 | 2.0733 | 0.0536 | 6912.7772 | 267176.2168 |
| Cangzhou | 5400.0965 | 1.4915 | 0.0568 | 3620.4838 | 95153.2237 | 15101.1057 | 1.4855 | 0.0565 | 10165.7844 | 267176.2168 |
| Langfang | 13324.4547 | 2.9904 | 0.1400 | 4455.7783 | 95153.2237 | 35822.5797 | 2.8632 | 0.1341 | 12511.1681 | 267176.2168 |
| Hengshui | 4161.8231 | 1.2129 | 0.0437 | 3431.1735 | 95153.2237 | 10946.6401 | 1.1362 | 0.0410 | 9634.2291 | 267176.2168 |
| Sum | 108253.8824 | 30.4883 | 1.1377 | 43916.8725 | 1236991.9079 | 302682.5671 | 30.3506 | 1.1329 | 123312.1000 | 3473290.8178 |
| Expected | 108253.8824 | 32.0446 | 1.1377 | 43916.8725 | 1236991.9079 | 302682.5671 | 31.9099 | 1.1329 | 123312.1000 | 3473290.8178 |

Using the calculation results, we can verify two key equations. The relationship between the sum of the first set of local Moran indexes and the global Moran index satisfies Eq (8), and the relationship between the sum of the first set of local Geary coefficients and the global Geary coefficient satisfies Eq (26). However, the relationship between the sum of the second set of local Moran indexes and the global Moran index does not satisfy Eq (16), and the relationship between the sum of the second set of local Geary coefficients and the global Geary coefficient does not satisfy Eq (27). The sum of the elements of the spatial contiguity matrix is $V_0 = 0.6671$. In 2000, the population variance of city population in the Beijing-Tianjin-Hebei region is $\sigma^2 = 65835.5974$, thus $\gamma = \sigma^2 V_0 = 43916.8725$, the global Moran index is $I = -0.1191$, and the sum of the first set of local Moran indexes is $\sum I_i^* = -5229.3702 = \gamma I = 43916.8725 \times (-0.1191)$. On the other hand, n = 13, $\gamma_c = 2nV_0/(n-1) = 1.4453$, and the global Geary coefficient is $C = 1.1377$, so the sum of the first set of local Geary coefficients is $\sum C_i^* = 108253.8824 = \gamma_c \sigma^2 C = 1.4453 \times 65835.5974 \times 1.1377$. However, the sum of the second set of local Moran indexes is $\sum I_i^{**} = -1.4299$, while $nI = 13 \times (-0.1191) = -1.5480$. The two values are not equal ($-1.4299 \neq -1.5480$). The sum of the second set of local Geary coefficients is $\sum C_i^{**} = 30.4883$, while $2n^2 C/(n-1) = 28.1667 \times 1.1377 = 32.0446$. Again, the two values are not equal ($30.4883 \neq 32.0446$). These results indicate that, based on the conventional formulae for the second set of LISA, Anselin’s [19] second basic requirement cannot be met. In contrast, the sum of the third set of local Moran indexes is equal to the global Moran index, and the ratio of the first set of local Moran indexes to the corresponding third set is the constant $\gamma = \sigma^2 V_0 = 43916.8725$; the sum of the third set of local Geary coefficients equals the global Geary coefficient, and the ratio of the first set of local Geary coefficients to the corresponding third set is the constant $\gamma_c \sigma^2 = 1.4453 \times 65835.5974 = 95153.2237$ (Tables 6 and 7). This suggests that, based on the improved formulae, Anselin’s [19] second basic requirement can be met by the calculation results.
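The arithmetic for 2000 can be retraced from the rounded figures quoted in the text. The sketch below is only a plausibility check: it starts from the rounded values of $V_0$, $\sigma^2$, I and C, so its results agree with the reported ones only up to rounding, on the assumption that the reported products were computed from unrounded values.

```python
# Rounded figures for 2000 as quoted in the text.
n = 13
V0 = 0.6671              # sum of the spatial contiguity matrix
sigma2 = 65835.5974      # population variance of city population
I, C = -0.1191, 1.1377   # global Moran's index and global Geary's coefficient

gamma = sigma2 * V0                  # proportionality coefficient for the first set of Moran indexes
gamma_c = 2 * n * V0 / (n - 1)       # coefficient for the first set of Geary coefficients
geary_factor = 2 * n ** 2 / (n - 1)  # multiplier in the conventional relation for the second set

print(round(gamma, 1), round(gamma_c, 4), round(geary_factor, 4))
print(round(gamma * I, 1), round(gamma_c * sigma2 * C, 1))   # compare with the sums of MI1 and GC1
print(round(n * I, 4), round(geary_factor * C, 4))           # compare with the sums of MI2 and GC2
```

The last line gives the values that the sums of MI2 and GC2 would have to match under the conventional relations; they differ visibly from the reported sums of -1.4299 and 30.4883.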

The calculation result of one year may be regarded as an isolated case, so it is worth examining the situation in 2010. Based on the sixth census data, the population variance of Beijing-Tianjin-Hebei city population is $\sigma^2 = 184856.6464$, thus $\gamma = \sigma^2 V_0 = 123312.1000$, the global Moran index is $I = -0.1124$, and the sum of the first set of local Moran indexes is $\sum I_i^* = -13856.5039 = \gamma I = 123312.1000 \times (-0.1124)$. On the other hand, $\gamma_c = 1.4453$, and the global Geary coefficient is $C = 1.1329$, so the sum of the first set of local Geary coefficients is $\sum C_i^* = 302682.5671 = \gamma_c \sigma^2 C = 1.4453 \times 184856.6464 \times 1.1329$. However, the sum of the second set of local Moran indexes is $\sum I_i^{**} = -1.3523$, while $nI = 13 \times (-0.1124) = -1.4608$ (Fig 2(A)). The two numbers are not equal ($-1.3523 \neq -1.4608$). The sum of the second set of local Geary coefficients is $\sum C_i^{**} = 30.3506$, while $2n^2 C/(n-1) = 28.1667 \times 1.1329 = 31.9099$. Again, the two numbers are not equal ($30.3506 \neq 31.9099$). These results once again indicate that Anselin’s [19] second basic requirement cannot be satisfied through the common formulae. In contrast, the sum of the third set of local Moran indexes is equal to the global Moran index, and the ratio of the first set of local Moran indexes to the corresponding third set is the constant $\gamma = \sigma^2 V_0 = 123312.1000$ (Fig 2(B)); the sum of the third set of local Geary coefficients equals the global Geary coefficient, and the ratio of the first set of local Geary coefficients to the corresponding third set is the constant $\gamma_c \sigma^2 = 1.4453 \times 184856.6464 = 267176.2168$ (Tables 6 and 7). This suggests that, based on the new formulae, Anselin’s [19] second basic requirement can be satisfied once again by the calculation results. The calculation results of the two years thus fully support the previous theoretical inferences and related judgments.

Fig 2. The relationships between three sets of local Moran’s indexes of BTH cities in 2010.

(a) MI2 vs MI1 (high correlation). (b) MI3 vs MI1 (perfect fit). (Note: The second set of local Moran’s indexes (MI2) is highly correlated with the first set of local Moran’s indexes (MI1), but the two sets are not equivalent to one another. The third set of local Moran’s indexes (MI3) is equivalent to the first set (MI1). The coefficient is $1/\gamma = 1/123312.1000 = 0.000008110$. MI2 does not satisfy the second requirement for LISAs given by Anselin [19].)

4 Questions and discussion

The re-expressed local Moran indexes and local Geary coefficients in this work are derived from Anselin’s correct definitions and relationships, without substantial innovation. The contribution of this study lies in three aspects. First, it clarifies a series of logical misunderstandings of local spatial autocorrelation statistics and gives the correct expressions. Second, it normalizes the local spatial autocorrelation statistics, and the canonical results are helpful for more convenient application. Third, it clarifies a number of fundamental concepts related to spatial autocorrelation that have long been confused in literature. In keeping with the tradition of statistics, important concepts and their symbols have been distinguished. In particular, the study emphasizes the distance axiom hidden behind the spatial weight matrix. If the spatial contiguity matrix is normalized by row, the locally normalized spatial weight matrix bears a different mathematical structure from both the non-normalized spatial weight matrix and the spatial weight matrix globally normalized by sum. Applying the results derived from models based on the non-normalized spatial weight matrix to relation formulae based on the row-normalized spatial weight matrix results in wrong mathematical expressions. Generally speaking, the spatial contiguity matrix is symmetric. Therefore, the non-normalized spatial weight matrix and the globally normalized spatial weight matrix are symmetric as well. Substituting an asymmetric spatial weight matrix for a symmetric one leads to two wrong relations. First, the sum of the local Moran indexes based on the standardized variable and the locally normalized weight matrix is taken to be equal to n times the global Moran index. Second, the sum of the local Geary coefficients based on the standardized variable and the locally normalized weight matrix is taken to be equal to $2n^2/(n-1)$ times the global Geary coefficient. In fact, the two relations can never be derived from Anselin’s original assumptions.

The errors based on the wrong relations are not too significant in many cases, but the results have a far-reaching impact on geographical analysis. Concretely speaking, these incorrect relationships lead to a series of problems (Table 8). (1) The relationship between the two definitions of local Moran indexes is broken, that is, they are not equivalent to each other. The first set of LISA is based on the symmetric spatial adjacency matrix, and the second set is based on the asymmetric spatial weight matrix normalized by row. As a result, the ratio of the values of the two sets of parameters is not a constant. (2) When defining the local spatial autocorrelation index, only the relationship between one element and the other elements is considered, and the pairwise correlation between all elements is ignored. That is, for the local index of the ith geographical element, only the relationships between element i and element j are taken into account, while the relationships between element j and element k are neglected (i, j, k = 1, 2, 3, …, n). In this case, the wholeness of a geographical system is overlooked in the local spatial analysis. (3) The absolute value of the local Moran index may exceed 1, thus decoupling it from the concept of correlation coefficient. Moran’s index was proposed by analogy with the Pearson correlation, and its values lie between -1 and 1. (4) The parameters lack clear boundary values and critical values. The boundary values of Moran’s index are -1 and 1, and the critical value is 0 in theory and 1/(1-n) empirically. The boundary values of the Geary coefficient are 0 and 2, and the critical value is theoretically 1. In addition, Anselin [19] used the population standard deviation to replace the sample standard deviation when defining the local Geary coefficient. This is logically acceptable but historically problematic: the result departs from the original intention of the definition of the Geary coefficient. In spatial analysis, it is sometimes difficult to distinguish between spatial samples and spatial populations. Moran’s index, which is derived from the Pearson correlation coefficient, as indicated above, is a statistic based on population standard deviation. Geary’s coefficient is defined by analogy with the Durbin-Watson statistic based on sample standard deviation in order to make up for the deficiency of Moran’s index. To define the local Geary coefficient, we should respect the original meaning of the definition of the Geary coefficient, so that the local Geary coefficient can be effectively associated with the global Geary coefficient. In the existing literature, some readers have noticed Anselin’s mistakes, and some scholars adopt a compromise approach. For example, they use the globally normalized spatial weight matrix instead of the row-normalized spatial weight matrix, but multiply the corrected local Moran index formula by n; I found this kind of treatment in some teaching courseware. This ensures that the sum of the local Moran indexes is equal to n times the global Moran index.

Table 8. Functions and problems of Anselin’s LISA and the improved effect of this paper.

| Definer | Variable | Statistic | Function | Advantages and disadvantages |
|---|---|---|---|---|
| Anselin | Centralized variable and non-normalized symmetric contiguity matrix | First local Moran’s I | Reflect local spatial dependence | Simple, but lacks clear boundary values and critical value (reference value) |
| Anselin | Centralized variable and non-normalized symmetric contiguity matrix | First local Geary’s C | Reflect local spatial dependence | Simple, but lacks clear boundary values and critical value (reference value) |
| Anselin | Standardized variable and row-normalized asymmetric weight matrix | Second local Moran’s I | Reflect local spatial dependence from the perspective of population | Decoupled from the first definition of local Moran’s I; decoupled from the correlation coefficient; the relationships between two elements in the system are ignored |
| Anselin | Standardized variable and row-normalized asymmetric weight matrix | Second local Geary’s C | Reflect local spatial dependence from the perspective of population | Decoupled from the first definition of local Geary’s C; decoupled from the analogy with the Durbin-Watson statistic; the relationships between two elements in the system are ignored; sample standard deviation is replaced by population standard deviation |
| This paper | Standardized variable and globally normalized symmetric weight matrix | Third local Moran’s I | Reflect local spatial dependence from the perspective of population | Equivalent to the first definition of local Moran’s I; linked to the correlation coefficient; the spatial relationships of elements other than the target geographical element are considered; there are clear boundary values and critical values |
| This paper | Standardized variable and globally normalized symmetric weight matrix | Third local Geary’s C | Reflect local spatial dependence from the perspective of samples | Equivalent to the first definition of local Geary’s C; linked to the generalized Durbin-Watson statistic; the spatial relationships of elements other than the target geographical element are considered; returns to the sample analysis perspective of the global Geary coefficient; there are clear boundary values and critical values |

Anselin is a well-known and outstanding scholar in the field of geographical spatial analysis. Because of the far-reaching influence of Anselin’s work, its logical errors have caused confusion in application and interpretation. Science respects logic and facts, not authority; only pseudoscience starts from authoritative judgment. To solve the above problems, this paper proceeds as follows in the mathematical deduction. First, it returns to the essence of the spatial distance matrix behind the spatial weight matrix and respects the basic distance axiom. The global spatial weight matrix is obtained by global normalization of the spatial contiguity matrix, and this globally normalized spatial weight matrix is used to replace Anselin’s row-normalized weight matrix. In this way, the connotation of the concept is unified before and after the transformation and the logic is consistent, so that reasoning mistakes are avoided. Second, it starts from the original idea of Moran’s index and Geary’s coefficient. The normalized local Moran’s index is defined with the size variable standardized by the population standard deviation, and the normalized local Geary’s coefficient is defined with the size variable standardized by the sample standard deviation. Third, it starts from the original intention of Anselin [19]. Anselin gives two sets of local Moran’s indexes and local Geary’s coefficients, but there is an inconsistency between them. By examining the reasoning process, we can find that the error arises from an unintentional replacement of concepts. According to the notation and simplification principle of this paper, we transform Anselin’s second set of local Moran index and local Geary coefficient formulae. Comparing the two sets of results reveals the problems and clarifies the similarities and differences between the two sets of formulae (Tables 8 and 9).

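Table 9. Comparison between the normalized LISA and the equivalent transformation results of Anselin’s second set of LISA definitions.

| Category | Measure | Definition in this paper | Anselin’s definition |
|---|---|---|---|
| Moran’s I | Global Moran’s I | $I = \sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij} z_i z_j = \mathbf{z}^{\mathrm{T}} \mathbf{W} \mathbf{z}$ | $I = \sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij} z_i z_j$ |
| Moran’s I | Local Moran’s I | $I_i = z_i \sum_{j=1}^{n} w_{ij} z_j$ | $I_i = \frac{z_i}{V_i} \sum_{j=1}^{n} v_{ij} z_j$ |
| Moran’s I | Sum of local Moran’s I | $\sum_{i=1}^{n} I_i = I$ | $\sum_{i=1}^{n} I_i \neq nI$ |
| Geary’s C | Global Geary’s C | $C = \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij} (z_i^* - z_j^*)^2$ | $C = \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij} (z_i^* - z_j^*)^2$ |
| Geary’s C | Local Geary’s C | $C_i = \frac{1}{2} \sum_{j=1}^{n} w_{ij} (z_i^* - z_j^*)^2 = \frac{n-1}{2n} \left( \sum_{j=1}^{n} w_{ij} (z_i^2 + z_j^2) - 2 I_i \right)$ | $C_i = \frac{1}{V_i} \sum_{j=1}^{n} v_{ij} (z_i - z_j)^2$ |
| Geary’s C | Sum of local Geary’s C | $\sum_{i=1}^{n} C_i = C = \frac{n-1}{n} \left( \mathbf{o}^{\mathrm{T}} \mathbf{W} \mathbf{z}^2 - I \right)$ | $\sum_{i=1}^{n} C_i \neq \frac{2n^2}{n-1} C$ |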

Note: For comparison, Anselin’s definitions are transformed and re-expressed with new symbols. However, the new expressions are completely equivalent to Anselin’s original expressions.

Finally, it is appropriate to briefly discuss the definition of the spatial weight matrix. Spatial autocorrelation analysis depends on the spatial contiguity matrix, which has multiple definitions. In fact, the definition of spatial contiguity involves different spatial effects. Spatial effects of geographical processes fall into two categories: action at a distance and local action [37]. Local action can be expressed mathematically with a step function and numerically with a nominal variable. In spatial autocorrelation analysis, the spatial contiguity matrix based on local action is mainly applicable to relationships between regions. The spatial contiguity of regions can be defined in three ways, namely Rook’s contiguity, Bishop’s contiguity, and Queen’s contiguity [38]. Rook’s contiguity plus Bishop’s contiguity yields Queen’s contiguity. In fact, Rook’s contiguity corresponds to von Neumann’s neighborhood definition, while Queen’s contiguity corresponds to Moore’s neighborhood definition [39]. Action at a distance can be reflected by a certain distance measure, including Euclidean distance, travel time, transportation mileage and so on. When converting distances into a spatial contiguity matrix, a spatial contiguity function needs to be adopted. Common spatial contiguity functions include the absolute step function, the relative step function, the exponential function, and the inverse distance function (a type of hyperbolic function) [6, 12, 27]. A distance-based spatial contiguity matrix is suitable for networks of locations such as urban systems. In this case, based on the step function, spatial contiguity is represented by a nominal variable (a dummy variable in discrete format); based on the other functions, spatial contiguity is represented by a metric variable (a continuous variable). Although the function expressions are different, the logic behind them is consistent. Mathematics is the pinnacle of logic. In mathematics, the most basic function is the exponential function, and various forms of simple functions can be reduced to it. The step function is an extreme form of an exponential function, and a moving average on the step function can yield an inverse distance function [40]. So, using different functions to define spatial contiguity matrices will definitely affect the calculation results, but it has no impact on the mathematical reasoning results and the logical relationships behind them. The row-normalized weight matrix affects the mathematical reasoning results because the logic behind the spatial weight matrix, which is regulated by the distance axiom, has been changed. Scientific research typically involves three worlds: the real world, the mathematical world, and the computational world [41]. The process of mathematical transformation and derivation belongs to the mathematical world, while the selection of the form of the spatial weight matrix belongs to the computational world. The key is to choose the appropriate definition of the spatial contiguity matrix for different geographic systems based on different situations [27]. One obvious drawback of this study is the lack of empirical analysis based on different types of spatial weight matrices. Therefore, the influence of the types and structure of spatial contiguity matrices on the theoretical modelling and computational results of spatial autocorrelation remains unexamined here.
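The difference between a nominal and a metric representation of spatial contiguity can be illustrated with a short sketch. The three functions below are simplified assumptions made for this illustration: the threshold `r` and the characteristic distance `r0` are hypothetical parameters, the relative step function is omitted, and the function names are not taken from any cited source.

```python
from math import exp

def step_contiguity(d, r):
    """Step function: 1 if two distinct places lie within the threshold r, else 0."""
    return 1.0 if 0 < d <= r else 0.0

def exponential_contiguity(d, r0):
    """Exponential decay with characteristic distance r0; the diagonal (d = 0) is set to 0."""
    return exp(-d / r0) if d > 0 else 0.0

def inverse_distance_contiguity(d):
    """Inverse distance function, as in Eq (45)."""
    return 1.0 / d if d > 0 else 0.0

def contiguity_matrix(dist, func, *params):
    """Apply a contiguity function to every element of a distance matrix."""
    return [[func(dij, *params) for dij in row] for row in dist]

# Hypothetical symmetric distance matrix for three places.
dist = [[0.0, 100.0, 250.0], [100.0, 0.0, 180.0], [250.0, 180.0, 0.0]]

v_step = contiguity_matrix(dist, step_contiguity, 200.0)        # nominal (0 or 1) values
v_exp = contiguity_matrix(dist, exponential_contiguity, 150.0)  # metric values
v_inv = contiguity_matrix(dist, inverse_distance_contiguity)    # metric values
print(v_step)
```

All three matrices inherit the symmetry of the distance matrix, so each of them could in principle be normalized by its sum in the way described in this paper.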

5 Conclusions

The global spatial autocorrelation coefficients reflect the sum of the correlations between any two geographical elements in a region, while the local spatial autocorrelation indexes reflect the sum of the correlations between one geographical element and all other geographical elements. The sum of the parts should therefore be proportional to the whole. The first set of local Moran indexes and Geary coefficients defined by Anselin [19] is effective and consistent with the idea of the global Moran index and Geary coefficient. However, the second set of local Moran indexes and local Geary coefficients that he defined is not equivalent to the first set. The non-normalized spatial weight matrix is isomorphic to the sum-based normalized spatial weight matrix, but not to the row-based normalized spatial weight matrix. Results derived from the non-normalized spatial weight matrix therefore cannot be applied directly to mathematical relations based on the row-normalized spatial weight matrix. The key issue is that Anselin [19] applied results derived from the non-normalized spatial weight matrix directly to the relationship formula based on the row-normalized spatial weight matrix. This paper is devoted to correcting this unintentional mistake in his reasoning process and gives a third set of definitions of the local Moran indexes and local Geary coefficients in canonical form. The newly defined local Moran index and local Geary coefficient are simple and concise. The improved expressions are consistent with the original intention of Anselin [19] and with the statistical essence of the global Moran index and global Geary coefficient.
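
The distinction between sum-based and row-based normalization can be checked numerically. The sketch below is an illustration under stated assumptions, not the paper's own computation. It takes the local Moran value of location i to be z_i times the weighted sum of z_j, with z standardized by the population standard deviation, which is one reading of the canonical form summarized here. The data x, the inverse-distance weights v, and the function names are hypothetical.

```python
import numpy as np

def moran_global(x, v):
    """Textbook global Moran's I for weights v with zero diagonal."""
    d = x - x.mean()
    return len(x) / v.sum() * (d @ v @ d) / (d @ d)

def local_moran(x, w):
    """Local values z_i * sum_j w_ij z_j, using the population standard deviation."""
    z = (x - x.mean()) / x.std()  # np.std uses ddof=0 by default
    return z * (w @ z)

# Hypothetical data: four locations on a line with inverse-distance weights.
positions = np.array([0.0, 1.0, 3.0, 7.0])
dist = np.abs(positions[:, None] - positions[None, :])
v = np.zeros_like(dist)
mask = dist > 0
v[mask] = 1.0 / dist[mask]
x = np.array([10.0, 8.0, 3.0, 1.0])

w_sum = v / v.sum()                       # sum-based (global) normalization
w_row = v / v.sum(axis=1, keepdims=True)  # row-based (local) normalization

local_sum = local_moran(x, w_sum)
local_row = local_moran(x, w_row)

# Sum-normalized local values add up to the global index of the raw weights.
print(local_sum.sum(), moran_global(x, v))
# Row-normalized local values add up to n times a different global index.
print(local_row.sum(), len(x) * moran_global(x, w_row))
```

In this setup each row-normalized local value equals the sum-normalized one multiplied by the total weight divided by that row's sum. The two sets are therefore proportional only when all row sums are equal, which is not the case for these weights.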

Local spatial autocorrelation analysis is a methodology developed on the basis of global spatial autocorrelation analysis. The progress of science has no end. The main points of this paper are summarized as follows. First, the LISA defined in the literature is of great significance for the analysis of local spatial autocorrelation, but it also has some faults. The first set of LISA is based on centralized variables and a non-normalized spatial contiguity matrix, and it lacks clear boundary values and critical values. The second set of LISA is based on standardized variables and a row-normalized spatial weight matrix, which ignores the global relationship behind the local analysis. One result is that the two sets of indexes are not equivalent to one another. In addition, the population standard deviation is adopted in defining the second set of local Geary coefficients, which violates the original intention of the Geary coefficient. All the indexes lack clear boundary values and critical values, and they are uncoupled from the correlation coefficient. One consequence is that the analysis process is complex; another is that the conclusions drawn from the two sets of indexes are often inconsistent with each other. Second, the LISA expressions are reconstructed by using the sum-normalized spatial weight matrix and size variables standardized by z-score, which eliminates the defects of Anselin’s LISA definitions and yields canonical spatial autocorrelation measurements. The sum-based, globally normalized spatial weight matrix replaces the row-based, locally normalized spatial weight matrix. The population standard deviation is used to standardize the variables when defining the local Moran indexes, and the sample standard deviation is used when defining the local Geary coefficients. In this way, the LISA problem of Anselin [19] can be solved effectively, and the results are simpler and more concise. The results given in this paper are equivalent to those given by Anselin’s first set of formulas, i.e., the first set of local Moran indexes and local Geary coefficients, but they are not linearly proportional to the results of the second set of formulas, namely the second set of local Moran indexes and local Geary coefficients.
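
The role of the sample standard deviation in the Geary case can be illustrated with a short numerical sketch. It assumes that the local Geary value of location i is half the weighted sum of squared differences between z_i and z_j under sum-normalized weights; the placement of the factor 1/2, the data, and the function names are assumptions made for illustration and may differ from the paper's exact expression.

```python
import numpy as np

def geary_global(x, v):
    """Textbook global Geary's C for weights v with zero diagonal."""
    n = len(x)
    d = x - x.mean()
    diff2 = (x[:, None] - x[None, :]) ** 2
    return (n - 1) / (2.0 * v.sum()) * (v * diff2).sum() / (d @ d)

def local_geary(x, w, ddof):
    """Local values 0.5 * sum_j w_ij (z_i - z_j)^2.

    ddof=1 standardizes with the sample standard deviation,
    ddof=0 with the population standard deviation.
    """
    z = (x - x.mean()) / x.std(ddof=ddof)
    diff2 = (z[:, None] - z[None, :]) ** 2
    return 0.5 * (w * diff2).sum(axis=1)

# Hypothetical data: four locations on a line with inverse-distance weights.
positions = np.array([0.0, 1.0, 3.0, 7.0])
dist = np.abs(positions[:, None] - positions[None, :])
v = np.zeros_like(dist)
mask = dist > 0
v[mask] = 1.0 / dist[mask]
x = np.array([10.0, 8.0, 3.0, 1.0])
n = len(x)

w_sum = v / v.sum()  # sum-based normalization
c_global = geary_global(x, v)

local_sample = local_geary(x, w_sum, ddof=1)
local_population = local_geary(x, w_sum, ddof=0)

# With the sample standard deviation the local values add up to global C.
print(local_sample.sum(), c_global)
# With the population standard deviation the sum is inflated by n / (n - 1).
print(local_population.sum(), c_global * n / (n - 1))
```

Under these assumptions, only the sample standard deviation makes the local values sum exactly to the global coefficient, which is consistent with the choice of standard deviation described above.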

Supporting information

S1 File. Anselin’s derivation and expressions for LISA.

This is a microcosm of Anselin’s paper on LISA. The key parts of Anselin’s mathematical reasoning are extracted, and the main errors in the reasoning process are revealed. This file uses Anselin’s original symbol system. Through this file, readers can more easily grasp the essence of the problem.

(DOCX)

pone.0303456.s001.docx (106.4KB, docx)
S2 File. Value transformation methods and formulae.

This file shows common concepts and methods of value transformation and the corresponding formulae for variable standardization. It clarifies some confusing and inappropriate expressions regarding variable standardization in the literature.

(DOCX)

pone.0303456.s002.docx (37.6KB, docx)
S1 Dataset. Spatial data sets and calculation results of local spatial autocorrelation indexes for 2000.

This file includes the dataset of spatial distances and city population in 2000, global Moran’s indexes and Geary’s coefficients, three sets of local Moran’s index, and three sets of local Geary’s coefficients. The original data and calculation process are displayed for readers.

(XLSX)

pone.0303456.s003.xlsx (89.8KB, xlsx)
S2 Dataset. Spatial data sets and calculation results of local spatial autocorrelation indexes for 2010.

This file includes the dataset of spatial distances and city population in 2010, global Moran’s indexes and Geary’s coefficients, three sets of local Moran’s index, and three sets of local Geary’s coefficients. All the results are tabulated for comparison and references.

(XLSX)

pone.0303456.s004.xlsx (86.6KB, xlsx)

Acknowledgments

I would like to thank my student, Dr. Yuqing Long, who extracted the spatial distance matrix data from the Beijing-Tianjin-Hebei urban network map for me. I would also like to thank the anonymous reviewer and Dr. Yuxia Wang, whose interesting and constructive comments were very helpful in improving the quality of this paper. The academic editor, Dr. Yuxia Wang, put tremendous effort into inviting reviewers for this paper, for which I am particularly grateful.

Data Availability

The data underlying the results presented in the study are available from the supporting information files.

Funding Statement

The project is funded by the National Natural Science Foundation of China (42171192). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1. Hartshorne R. Perspective on the Nature of Geography. Chicago: Rand McNally & Company; 1959.
  • 2. Hu ZL, Chen YG, Liu T. Three laws of the changes in economic geography. Economic Geography. 2018; 38(10): 1–4 [In Chinese].
  • 3. Martin GJ. All Possible Worlds: A History of Geographical Ideas (4th Revised Edition). New York, NY: Oxford University Press; 2005.
  • 4. Schaefer FK. Exceptionalism in geography: a methodological examination. Annals of the Association of American Geographers. 1953; 43: 226–249.
  • 5. Griffith DA. Spatial Autocorrelation and Spatial Filtering: Gaining Understanding Through Theory and Scientific Visualization. Berlin: Springer; 2003.
  • 6. Haggett P, Cliff AD, Frey A. Locational Analysis in Human Geography. London: Edward Arnold Ltd.; 1977.
  • 7. Geary RC. The contiguity ratio and statistical mapping. The Incorporated Statistician. 1954; 5: 115–145.
  • 8. Moran PAP. The interpretation of statistical maps. Journal of the Royal Statistical Society, Series B. 1948; 10(2): 243–251.
  • 9. Moran PAP. Notes on continuous stochastic phenomena. Biometrika. 1950; 37: 17–33.
  • 10. Cliff AD, Ord JK. Spatial Autocorrelation. London: Pion Limited; 1973.
  • 11. Cliff AD, Ord JK. Spatial Processes: Models and Applications. London: Pion Limited; 1981.
  • 12. Odland J. Spatial Autocorrelation. London: SAGE Publications; 1988.
  • 13. Anselin L. The Moran scatterplot as an ESDA tool to assess local instability in spatial association. In: Fischer M, Scholten HJ, Unwin D (eds.). Spatial Analytical Perspectives on GIS. London: Taylor & Francis; 1996. pp. 111–125.
  • 14. Tobler W. A computer movie simulating urban growth in the Detroit region. Economic Geography. 1970; 46(2): 234–240.
  • 15. Tobler W. On the first law of geography: A reply. Annals of the Association of American Geographers. 2004; 94(2): 304–310.
  • 16. Fotheringham AS. Trends in quantitative methods I: Stressing the local. Progress in Human Geography. 1997; 21: 88–96.
  • 17. Fotheringham AS. Trends in quantitative methods II: Stressing the computational. Progress in Human Geography. 1998; 22: 283–292.
  • 18. Fotheringham AS. Trends in quantitative methods III: Stressing the visual. Progress in Human Geography. 1999; 23(4): 597–606.
  • 19. Anselin L. Local indicators of spatial association—LISA. Geographical Analysis. 1995; 27(2): 93–115.
  • 20. Getis A, Aldstadt J. Constructing the spatial weights matrix using a local statistic. Geographical Analysis. 2004; 36(2): 90–104.
  • 21. Getis A, Ord JK. An analysis of spatial association by use of distance statistic. Geographical Analysis. 1992; 24(3): 189–206.
  • 22. Ord JK, Getis A. Local spatial autocorrelation statistics: Distributional issues and an application. Geographical Analysis. 1995; 27(4): 286–306.
  • 23. Goodchild MF. GIScience, geography, form, and process. Annals of the Association of American Geographers. 2004; 94(4): 709–714.
  • 24. de Jong P, Sprenger C, van Veen F. On extreme values of Moran’s I and Geary’s C. Geographical Analysis. 1984; 16(1): 17–24.
  • 25. Tiefelsdorf M, Boots B. The exact distribution of Moran’s I. Environment and Planning A. 1995; 27(6): 985–999.
  • 26. Xu F. Improving spatial autocorrelation statistics based on Moran’s index and spectral graph theory. Urban Development Studies. 2021; 28(12): 94–103 [In Chinese].
  • 27. Chen YG. On the four types of weight functions for spatial contiguity matrix. Letters in Spatial and Resource Sciences. 2012; 5(2): 65–72.
  • 28. Getis A. Spatial weights matrices. Geographical Analysis. 2009; 41(4): 404–410.
  • 29. Chen YG. New approaches for calculating Moran’s index of spatial autocorrelation. PLoS ONE. 2013; 8(7): e68336. doi: 10.1371/journal.pone.0068336
  • 30. Chen YG. Spatial autocorrelation approaches to testing residuals from least squares regression. PLoS ONE. 2016; 11(1): e0146865. doi: 10.1371/journal.pone.0146865
  • 31. Magnello E, van Loon B. Introducing Statistics: A Graphic Guide. London: Icon Books; 2009.
  • 32. Chen YG. Spatial autocorrelation equation based on Moran’s index. Scientific Reports. 2023; 13: 19296. doi: 10.1038/s41598-023-45947-x
  • 33. Wheelan C. Naked Statistics: Stripping the Dread from the Data. New York and London: W. W. Norton & Company; 2013.
  • 34. Taylor PJ. Quantitative Methods in Geography. Prospect Heights, Illinois: Waveland Press; 1983.
  • 35. Louf R, Barthelemy M. Scaling: lost in the smog. Environment and Planning B: Planning and Design. 2014; 41: 767–769.
  • 36. Long YQ, Chen YG. Multi-scaling allometric analysis of the Beijing-Tianjin-Hebei urban system based on nighttime light data. Progress in Geography. 2019; 38(1): 88–100 [In Chinese].
  • 37. Chen YG, Li YJ, Feng S, Man XM, Long YQ. Gravitational scaling analysis on spatial diffusion of COVID-19 in Hubei province, China. PLoS ONE. 2021; 16(6): e0252889. doi: 10.1371/journal.pone.0252889
  • 38. Widip CA, Utomo WH, Yulianto SJP. Identification of spatial patterns of food insecurity regions using Moran’s I (Case study: Boyolali regency). International Journal of Computer Applications. 2013; 72(2): 54–62.
  • 39. Batty M, Couclelis H, Eichen M. Urban systems as cellular automata. Environment and Planning B: Planning and Design. 1997; 24: 159–164.
  • 40. Chen YG. Power-law distributions based on exponential distributions: Latent scaling, spurious Zipf’s law, and fractal rabbits. Fractals. 2015; 23(2): 1550009.
  • 41. Casti JL. Would-Be Worlds: How Simulation Is Changing the Frontiers of Science. New York: John Wiley and Sons; 1996.

Decision Letter 0

Yuxia Wang

21 Feb 2024

PONE-D-23-35394

Reconstruction and Normalization of LISA for Spatial Analysis

PLOS ONE

Dear Dr. Chen,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.  

Dear authors, I invited around 30 reviewers in succession but received comments from only one. To ensure a timely review, I served as another reviewer. Please see the suggestions and comments below.

Please submit your revised manuscript by Apr 06 2024 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Yuxia Wang

Academic Editor

PLOS ONE

Journal requirements:

1. When submitting your revision, we need you to address these additional requirements.

Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at 

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and 

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf.

2. Note from Emily Chenette, Editor in Chief of PLOS ONE, and Iain Hrynaszkiewicz, Director of Open Research Solutions at PLOS: Did you know that depositing data in a repository is associated with up to a 25% citation advantage (https://doi.org/10.1371/journal.pone.0230416)? If you’ve not already done so, consider depositing your raw data in a repository to ensure your work is read, appreciated and cited by the largest possible audience. You’ll also earn an Accessible Data icon on your published paper if you deposit your data in any participating repository (https://plos.org/open-science/open-data/#accessible-data).

3. We note that the grant information you provided in the ‘Funding Information’ and ‘Financial Disclosure’ sections do not match. 

When you resubmit, please ensure that you provide the correct grant numbers for the awards you received for your study in the ‘Funding Information’ section.

4. Thank you for stating the following in the Acknowledgments Section of your manuscript: 

[This research was sponsored by the National Natural Science Foundation of China (Grant No. 42171192). The support is gratefully acknowledged.]

We note that you have provided funding information that is not currently declared in your Funding Statement. However, funding information should not appear in the Acknowledgments section or other areas of your manuscript. We will only publish funding information present in the Funding Statement section of the online submission form. 

Please remove any funding-related text from the manuscript and let us know how you would like to update your Funding Statement. Currently, your Funding Statement reads as follows: 

 [The author(s) received no specific funding for this work.]

Please include your amended statements within your cover letter; we will change the online submission form on your behalf.

5. We note that Figure 1 in your submission contain [map/satellite] images which may be copyrighted. All PLOS content is published under the Creative Commons Attribution License (CC BY 4.0), which means that the manuscript, images, and Supporting Information files will be freely available online, and any third party is permitted to access, download, copy, distribute, and use these materials in any way, even commercially, with proper attribution. For these reasons, we cannot publish previously copyrighted maps or satellite images created using proprietary data, such as Google software (Google Maps, Street View, and Earth). For more information, see our copyright guidelines: http://journals.plos.org/plosone/s/licenses-and-copyright.

We require you to either (1) present written permission from the copyright holder to publish these figures specifically under the CC BY 4.0 license, or (2) remove the figures from your submission:

a. You may seek permission from the original copyright holder of Figure 1 to publish the content specifically under the CC BY 4.0 license.  

We recommend that you contact the original copyright holder with the Content Permission Form (http://journals.plos.org/plosone/s/file?id=7c09/content-permission-form.pdf) and the following text:

“I request permission for the open-access journal PLOS ONE to publish XXX under the Creative Commons Attribution License (CCAL) CC BY 4.0 (http://creativecommons.org/licenses/by/4.0/). Please be aware that this license allows unrestricted use and distribution, even commercially, by third parties. Please reply and provide explicit written permission to publish XXX under a CC BY license and complete the attached form.”

Please upload the completed Content Permission Form or other proof of granted permissions as an "Other" file with your submission.

In the figure caption of the copyrighted figure, please include the following text: “Reprinted from [ref] under a CC BY license, with permission from [name of publisher], original copyright [original copyright year].”

b. If you are unable to obtain permission from the original copyright holder to publish these figures under the CC BY 4.0 license or if the copyright holder’s requirements are incompatible with the CC BY 4.0 license, please either i) remove the figure or ii) supply a replacement figure that complies with the CC BY 4.0 license. Please check copyright information on all replacement figures and update the figure caption with source information. If applicable, please specify in the figure caption text when a figure is similar but not identical to the original image and is therefore for illustrative purposes only.

The following resources for replacing copyrighted map figures may be helpful:

USGS National Map Viewer (public domain): http://viewer.nationalmap.gov/viewer/

The Gateway to Astronaut Photography of Earth (public domain): http://eol.jsc.nasa.gov/sseop/clickmap/

Maps at the CIA (public domain): https://www.cia.gov/library/publications/the-world-factbook/index.html and https://www.cia.gov/library/publications/cia-maps-publications/index.html

NASA Earth Observatory (public domain): http://earthobservatory.nasa.gov/

Landsat: http://landsat.visibleearth.nasa.gov/

USGS EROS (Earth Resources Observatory and Science (EROS) Center) (public domain): http://eros.usgs.gov/#

Natural Earth (public domain): http://www.naturalearthdata.com/

6. Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.

Additional Editor Comments:

The authors conduct a series of rigorous mathematical reasoning of LISA showing that using row-normalized spatial weight matrix would violate the second basic requirement for LISA. As stated by the authors, this is not substantial innovation, but it is helpful in figuring the logic of local spatial autocorrelation statistics. I have some minor comments.

1. Page 4. The spatial contiguity matrix V is not explained in detail. There are many definitions of spatial contiguity matrix, and would Rook, Queen, or distance-based matrix have any difference on the calculation of LISA?

2. Page 4. Is it necessary to stress that i≠j in Equation (1)?

3. Page 4 to Page 5. It might be that I misunderstand something. The z-score normalization only divides by σ; what is the meaning of σ? It seems that σ is calculated from x. But if we treat x_i − x̄ as a whole, the normalization should be based on the σ of x_i − x̄. Compared with Equation (1), dividing only by σ could not be called z-score normalization.

4. Page 6 Table 1. What is the benefit of using the normalized weight matrix instead of the original one?

5. Page 2 In the first paragraph of introduction, “Gravity models, spatial interaction models, and spatial autocorrelation analysis are the main approaches…”, parallel relationship might not be appropriate for gravity model and spatial interaction model since the former is one type of the latter.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The article is an interesting work. It provides clarifications on the local indicators of spatial association (LISA) and the local Geary indicator, which are widely used in the literature. The article aims to reconstruct the calculation formulas of the local Moran indices and Geary coefficients through mathematics, presenting corrections or modifications to these indicators. Finally, it presents an application to real data.

**********

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

Attachment

Submitted filename: PONE-D-23-35394 End.pdf

pone.0303456.s005.pdf (1.2MB, pdf)
PLoS One. 2024 May 22;19(5):e0303456. doi: 10.1371/journal.pone.0303456.r002

Author response to Decision Letter 0


11 Apr 2024

Please see the attached file entitled "Response to Reviewers"

Attachment

Submitted filename: Response to Academic Editor and Reviewer 2024-03-15.docx

pone.0303456.s006.docx (112.2KB, docx)

Decision Letter 1

Yuxia Wang

25 Apr 2024

Reconstruction and Normalization of LISA for Spatial Analysis

PONE-D-23-35394R1

Dear Dr. Chen,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice will be generated when your article is formally accepted. Please note, if your institution has a publishing partnership with PLOS and your article meets the relevant criteria, all or part of your publication costs will be covered. Please make sure your user information is up-to-date by logging into Editorial Manager® and clicking the ‘Update My Information' link at the top of the page. If you have any questions relating to publication charges, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Yuxia Wang

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Acceptance letter

Yuxia Wang

10 May 2024

PONE-D-23-35394R1

PLOS ONE

Dear Dr. Chen,

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now being handed over to our production team.

At this stage, our production department will prepare your paper for publication. This includes ensuring the following:

* All references, tables, and figures are properly cited

* All relevant supporting information is included in the manuscript submission,

* There are no issues that prevent the paper from being properly typeset

If revisions are needed, the production department will contact you directly to resolve them. If no revisions are needed, you will receive an email when the publication date has been set. At this time, we do not offer pre-publication proofs to authors during production of the accepted work. Please keep in mind that we are working through a large volume of accepted articles, so please give us a few weeks to review your paper and let you know the next and final steps.

Lastly, if your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

If we can help with anything else, please email us at customercare@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Yuxia Wang

Academic Editor

PLOS ONE

    Articles from PLOS ONE are provided here courtesy of PLOS
