Abstract
Spatial association rule mining can reveal the inherent laws of spatial object interdependence and is an important part of spatial data mining. Most existing algorithms for mining local spatial association rules are oriented towards the spatial association between two categories of points and cannot fully reflect the spatial heterogeneity of complex spatial relations among multiple categories of points. In addition, the interactions between points in different categories are often asymmetrical, yet existing algorithms ignore this asymmetry. To address these problems, an algorithm for mining local spatial association rules for point data of multiple categories based on location quotients is proposed. First, the proximity relationship between points is determined by an adaptive filter, and spatial weights are assigned by a Gaussian kernel function. Then, the multivariate local colocation quotient of each point is calculated to measure the strength of the local spatial association rule. Finally, Monte Carlo simulation is used to generate a random sample distribution to test the significance of the results. The algorithm is verified on artificial simulation data and real Point of Interest (POI) data. The experimental results show that the algorithm can identify significant association regions of different spatial association rules for point sets.
Keywords: Spatial association rule, Spatial data mining, Colocation rule, Public service facility
1. Introduction
Spatial association rule mining has played an important role in the fields of the ecological environment [1], public social security [2] and economic development. Moreover, spatial association rule mining is a popular topic and has attracted the research interest of geographers. According to the First Law of Geography, everything is related to everything else, but nearby things are more closely related than distant things [3]. The distribution of points in space is not random and is affected by various geographical and nongeographical factors. The phenomenon in which different categories of points in space are interdependent is called spatial association. There is spatial heterogeneity in spatial associations [4]. In most cases, the spatial association is not significant at the global scale but is significant at the local scale. Therefore, mining local spatial association rules has become a new research hotspot in spatial data mining. For example, in terms of the local spatial associations, criminal behaviour often occurs around entertainment places, and this is a strong association. However, the strength of the association between different types of crimes and entertainment places also varies. Crimes such as robbery and violence often occur around entertainment places, while crimes such as burglary and fraud do not have a strong spatial association with these entertainment places. Additionally, a certain type of entertainment place can be spatially associated with multiple types of criminal behaviour. Therefore, mining the local spatial associations of multi-category points has important practical significance.
Most existing local spatial association research methods focus on the spatial association between two categories of points and cannot fully reflect the complex spatial relationships of objects. Furthermore, most existing methods are based on spatial colocation patterns [5]. However, methods based on spatial colocation patterns assume by default that the interactions between spatial elements are equal, and they cannot distinguish asymmetric interactions between different spatial elements. For example, the spatial distribution of hospitals does not depend on the spatial distribution of pharmacies, while pharmacies are usually concentrated around hospitals. Therefore, there are asymmetric interactions between hospitals and pharmacies. Exploring these asymmetric spatial interactions can aid in location selection for hospitals and pharmacies in urban planning. Therefore, mining the spatial heterogeneity of asymmetric interactions and applying it to various datasets are highly important for spatial association rule mining. In addition, determining adjacency relationships is an important problem in the mining of spatial association rules. In most methods, adjacency relationships are judged by a single global threshold. However, neighbourhood relationships may be misjudged with a single global threshold when the points are clustered in local regions.
To address the above problems, we propose an algorithm for mining local spatial association rules and construct a new index, the multivariate local colocation quotient (MV-LCLQ), to measure the strength of local spatial association rules. First, we judge the adjacency relationships of points via an adaptive filter based on K-neighbourhoods to eliminate the influence of spatial statistics as much as possible. Then, the strength of each local spatial association rule is calculated by the MV-LCLQ based on the idea of the location quotient. Finally, the point data are randomly simulated, and the MV-LCLQ is recalculated on the simulated data to obtain its simulated sample distribution. The significance of the MV-LCLQ is tested against this simulated sample distribution.
The remainder of this article is organized as follows. In Section 2, the related research on mining spatial association rules is briefly reviewed. In Section 3, the definition of the problem, the description of the multivariate local colocation quotient, the significance test of the index and the evaluation of the algorithm are introduced in detail. The verification of the algorithm through simulated data and the mining results of real POI data are presented in Section 4. Finally, conclusions and suggestions for future work are presented in Section 5. Table A below shows the acronyms used in this paper and their meanings. Table B below shows the symbols used in this paper and their meanings.
2. Literature review
In this section, we review the current research status of spatial colocation pattern mining algorithms, spatial association rule mining, and spatiotemporal association rule mining, and we analyse the advantages and disadvantages of existing algorithms.
2.1. Algorithms for mining spatial colocation patterns
A spatial colocation pattern refers to an observed regularity of proximity or group occurrence of objects in space. A spatial colocation pattern is also a special spatial association between features. Methods of mining spatial colocation patterns have been widely used in urban planning [6], industrial layout [7] and urban security [8]. According to the mining scale, existing algorithms for mining spatial colocation patterns can be roughly divided into algorithms for mining global spatial colocation patterns and algorithms for mining local spatial colocation patterns [9], [10].
Algorithms for mining global spatial colocation patterns aim at mining significant spatial colocation patterns in the global scope. Global algorithms can be divided into algorithms based on frequent item sets and density-based algorithms.
Algorithms based on frequent item sets aim to discover association patterns between item combinations that occur more frequently than a predetermined threshold in different geographic spatial regions. In the research on algorithms based on frequent item sets, some scholars have replaced transaction sets with spatial neighbourhoods and proposed the spatial colocation pattern participation index. Spatial colocation patterns are mined by presetting participation index thresholds [11], [12]. Huang et al. incorporated the definition of the cross K function and further proposed mining global spatial colocation patterns with the maximum participation ratio [13]. Yoo et al. mined spatial colocation patterns with a joinless approach that reduced the time needed to generate spatial colocation instances [14].
Density-based algorithms utilize high-density areas in the data space to mine spatial association patterns, while considering the distribution characteristics of objects in different spatial regions. In the research on density-based algorithms, several scholars have calculated the network kernel density of spatial features and mined spatial colocation patterns by evaluating the strength of the distribution associations between spatial features [15]. Xiao et al. divided the study area into units and identified spatial colocation instances in the unit with the highest density to judge whether the spatial colocation pattern in the current unit was significant or not [16].
Algorithms based on frequent item sets and density-based algorithms have achieved good results in mining global spatial association patterns. However, due to spatial heterogeneity [4], spatial colocation patterns in local regions may not be significant at the global level. Therefore, the application of algorithms based on frequent item sets or density-based methods in mining local spatial association patterns is very limited.
To address the problem that mining global spatial colocation patterns cannot effectively identify significant spatial colocation patterns in local regions, scholars have carried out a large number of studies on mining local spatial colocation patterns [17]. In early studies on mining local spatial colocation patterns, the research area was divided into subunits by partitioning, and the local spatial colocation pattern was identified in each subunit by a global spatial colocation pattern mining algorithm. Common partitioning methods include quadtree partitioning [18] and K-nearest neighbourhood graph partitioning [19]. At present, local spatial colocation pattern mining methods can be roughly divided into clustering-based methods and visualization-based methods. Among the clustering-based algorithms, Ding et al. divided the research area into subunits, mined spatial colocation patterns in each subunit, and identified significant regions of local spatial colocation patterns through grid-based clustering [20]. Cai et al. proposed a distributed clustering method to extract candidate data by combining clustering methods with statistical methods and established the statistical significance of the results with the distribution offset correction method to discover significant spatial colocation patterns [21]. Among the visualization-based methods, some scholars generated density surfaces by kernel density estimation and then mined local spatial colocation patterns by mixing red and blue colours [22], [23]. Some scholars have also proposed an RGB colour mixing model based on two-colour mixing to mine local spatial colocation patterns [24].
In addition, to solve the problem that parameter setting in the algorithm may affect the mining results, scholars have introduced fuzzy set theory [25], parameter tests [26] and the Delaunay triangulation network [27] to reduce the influence of parameter setting on spatial colocation pattern mining results.
Existing algorithms for mining spatial colocation patterns can discover spatial colocation patterns at different scales. However, spatial association rule mining differs from spatial colocation pattern mining. A spatial colocation pattern can only reflect the frequent occurrences of different spatial objects or phenomena in adjacent locations, which is a specific kind of spatial association. The results of mining spatial colocation patterns are the colocation associations of spatial points; however, the associations between spatial points are often asymmetric. As a result, spatial associations derived from spatial colocation patterns ignore the one-way dependence of spatial points.
2.2. Mining spatial association rules
Association rule mining is an important part of data mining [28], [29], [30]. Spatial association rule mining is significantly different from transaction-based association rule mining. Spatial association rule mining is oriented towards spatial data which often show spatial dependence and spatial heterogeneity. According to the mining scale, spatial association rule mining methods can be divided into global spatial association rule mining and local spatial association rule mining [31].
Global spatial association rule mining algorithms are mostly based on traditional transaction-based association rule mining [32], [33], [34]. For example, Koperski et al. extended the transaction association mining algorithm to spatial association mining and mined spatial association rules by minimum support and minimum confidence [35]. Some scholars discretized spatial relations and attributes and proposed the FP-growth algorithm based on a frequent pattern tree structure to improve the efficiency of spatial association pattern mining [36]. Some scholars reconstructed FP-tree data structures to improve the FP-growth algorithm. Although algorithms for mining global spatial association patterns perform well at the global scale, their application in local spatial association pattern mining is limited due to the complexity of local spatial objects and the spatial heterogeneity of local space.
Regarding early local association rule mining algorithms, some scholars used rough sets and Boolean inference to find association rules from local frequent item sets, and these traditional transaction-based local association pattern mining algorithms have also been applied to spatial and other dimensions [37]. Considering spatial heterogeneity, several new algorithms for local spatial association rule mining have been proposed. Common local methods can be divided into methods based on frequent item sets and methods based on statistics. Among the methods based on frequent item sets, Jang et al. proposed a region-based frequent-rule growth algorithm to search for association rules in dense areas and discovered spatial association rules in local areas [38]. Among the methods based on statistics, Cromley et al. proposed the local colocation quotient method to mine the local spatial association rules between two categories of features by comparing the probability of discovering spatial association rules in local regions with the expected probability of discovering them [39]. Anselin applied local Geary's C statistics to mine the spatial associations between different features [40]. Wang et al. developed a simulation-based statistical test for the local indicator of the colocation quotient (LCLQ) [41]. Chen et al. used the LCLQ method with POI data to identify the spatial associations and heterogeneity among six medical resources in Wuhan, China [42]. Several scholars have proposed an algorithm combining LightGBM and Apriori to establish a mining model of strong association rules [43]. Zhang et al. proposed an association rule mining algorithm based on spatial autocorrelation that finds local hot spots with the local Moran's I and mines spatial association rules with the Apriori algorithm [44].
In addition, several scholars have applied the local Moran's I indicator to mine local spatial association rules for urban facilities, public health and other applications [45], [46].
Regardless of whether local or global mining algorithms are used, most of the existing spatial association rule mining algorithms are oriented toward points of two categories, while few algorithms are available for points of multiple categories. The expansion of spatial association rules from two categories of points to multiple categories of points leads to an increase in the dimensions, which impacts the effectiveness of the algorithms. The spatial associations between actual spatial features are complex and multivariate, and binary spatial association rules cannot fully reflect complex spatial relations at different points of different categories.
2.3. Mining spatiotemporal association rules
With the continuous development of spatial and temporal big data, mining of co-occurring temporal and spatial data has become a new research hotspot. Cai et al. proposed a method to detect spatiotemporal co-occurrence patterns (STCOPs) against the null hypothesis that the spatiotemporal distributions of different features are independent of each other [47]. Several scholars proposed a framework for mining spatiotemporal association patterns based on complex events and evaluated the proposed method by analysing air pollution in the Beijing-Tianjin-Hebei region; this method avoids the shortcomings of traditional algorithms that model geographical phenomena as simple spatiotemporal point events [48]. To address the deficiency of using unilateral distance information to measure the distance between uncertain events, several scholars proposed a probabilistic distance-based uncertain spatiotemporal co-occurrence pattern (USTCP) discovery method to determine the STCOPs in multiple types of crimes where events frequently occur in adjacent spaces and times [49].
To better explain the different methods, a summary of the relevant literature is shown in Table 1. With the rapid development of technology, massive spatiotemporal data acquisition has become easier. There are several meaningful association patterns in these spatiotemporal data, showing a trend from spatial association to spatiotemporal association. However, due to the scale, diversity and heterogeneity of the data, it is still challenging to extract and understand these patterns from spatiotemporal data. In this paper, a new algorithm for local spatial association rule mining for point data of multiple categories is proposed, and one-to-many spatial association rules are also provided.
Table 1.
Summary of the relevant literature.
| Theoretical category | Authors | Summary | Method |
|---|---|---|---|
| Frequent item sets | Shekhar et al. (2001) [12] | Propose a notion of user-specified neighbourhoods in place of transactions to specify groups of items. | Transaction-based approach |
| Density-based algorithm | Cheruiyot et al. (2022) [15] | Calculate the network kernel density of spatial features and mine spatial colocation patterns. | Density-based algorithm |
| Location quotient | Leslie et al. (2011) [50] | Present the colocation quotient (CLQ) to quantify spatial association between categories of a population. | CLQ |
| Location quotient | Wang et al. (2017) [41] | Develop a simulation-based statistical test for the local indicator of colocation quotient (LCLQ). | LCLQ |
| Location quotient | Chen et al. (2023) [42] | Present the first analysis of spatial patterns and directional spatial associations between six medical resources across Wuhan city by POI data and LCLQ method. | LCLQ |
| Local Moran's I | Zhang et al. (2022) [44] | Propose a method for association rule mining based on spatial autocorrelation clustering events and apply it to polymetallic ore deposits. | Adopt local Moran's I, Apriori algorithm |
| Local Moran's I | Sansuk et al. (2023) [46] | Explore the spatial autocorrelation between socioeconomic factors, health service factors and sepsis mortality. | Local indicators of spatial association (LISA) |
| Nonparametric significance test | Cai et al. (2019) [51] | Develop both point-dependent and location-dependent network-constrained summary statistics to construct the null model of the test, and model the degree of co-location patterns' prevalence as the significance level. | Network-constrained summary statistics |
| Spatiotemporal episode pattern mining | He et al. (2020) [48] | Propose a novel complex event-based spatiotemporal association pattern mining framework. | An adaptive spatiotemporal episode pattern mining algorithm |
3. Methodology
In this section, we introduce definitions relevant to the research problem, as well as a detailed description of the algorithm proposed in the paper, and we elaborate on the statistical significance test method for the algorithm results.
3.1. Problem definition
Spatial association rule mining involves the discovery of implicit rules between objects or events in a geospatial database. Spatial association is not only the result of interactions among different features but also an inherent expression of the First Law of Geography. According to the mining scale, methods of spatial association rule mining can be divided into global and local methods. As shown in Fig. 1(a) and Fig. 1(c), global spatial association rule mining aims to discover spatial association rules that are significant at the global scale. As shown in Fig. 1(b) and Fig. 1(d), local spatial association rule mining aims to discover spatial association rules that are nonsignificant at the global scale but significant at the local scale. According to the number of data categories, spatial association rules are divided into bivariate and multivariate spatial association rules. As shown in Fig. 1(a) and Fig. 1(b), bivariate spatial association rules relate two categories of point data in a global or local region. As shown in Fig. 1(c) and Fig. 1(d), multivariate spatial association rules relate three or more categories of point data. The interactions among different categories of spatial points lead to complex and diverse spatial association rules.
Figure 1.
Mining types of spatial association rules. (a) Mining bivariate spatial association rules in global space, (b) mining bivariate spatial association rules in local space, (c) mining multivariate spatial association rules in global space, (d) mining multivariate spatial association rules in local space.
Spatial association rule mining is easily affected by two factors: the asymmetry of the interactions between spatial elements and agglomeration effects. As shown in Fig. 2(a), category B points exist only around category A points, so category B points can be considered to be associated with category A points; however, half of the category A points are not dependent on category B points, so category A points cannot be regarded as associated with category B points. In Fig. 2(b), which is otherwise similar to Fig. 2(a), the spatial forces on points of categories A and B are symmetric. Local spatial association rule algorithms must be able to distinguish between these two cases. As shown in Fig. 2(c), mining spatial association rules requires determining the proximity relationships of points, and a fixed filter is commonly used to determine the proximity relationship by means of a preset distance threshold. However, due to the agglomeration effect, points in some regions may aggregate, and a single global threshold may misjudge the spatial proximity relationship, leading to inaccurate mining results. The asymmetry of interactions between spatial elements and agglomeration effects are two important issues in mining spatial association rules [50].
Figure 2.
Two issues to be considered in the mining of spatial association rules. (a, b) Asymmetry in the spatial association rules, (c) spatial association rules are easily affected by spatial statistics.
3.2. Algorithm description and implementation
A one-to-one spatial association rule between points of category A and points of category B is usually denoted as “$A \rightarrow B$”. Directional spatial associations can be denoted as “$A \rightarrow B$” or “$B \rightarrow A$”, where “$A \rightarrow B$” represents the attraction of category A to category B. Asymmetric attraction usually stems from the unidirectional dependence between different classes, which is called the directionality of spatial association. For example, stationery stores are usually located around schools, but the distribution of schools does not depend on that of stationery stores. Stationery stores can therefore be considered to depend on schools, i.e. school → stationery store. There are many similar examples of spatial association directionality, such as the local spatial association between drugstores and hospitals. Due to spatial heterogeneity and complex spatial relationships among different types of point data, such directional spatial associations also exist among multicategory point data in local regions. The multivariate local spatial association introduced in this paper refers to the attraction of multiple categories of point sets to a certain category of point set in a local region.
Determining the directionality of a multivariate spatial association rule requires finding the points adjacent to the origin point of the association rule. In Fig. 3, a ternary association rule is taken as an example. A, B and C represent different types of point data. A ternary association rule with A as the origin point can search for B and C in several ways. For example, $A \rightarrow B \rightarrow C$ and $A \rightarrow C \rightarrow B$ represent chain distributions, $A \rightarrow \{B, C\}$ represents a divergent distribution, and $A \rightarrow B \rightarrow C \rightarrow A$ represents a ring distribution. In the chain-like distribution, A is taken as the origin point to search for the first category of adjacent points, and then the first category of adjacent points is taken as the starting point to search for the next category of adjacent points. The drawback of the chain-like ternary association rule is that the spatial distance between the two end points may be large. According to the First Law of Geography, if the two end points are far apart, their degree of association in space is weak, so the chain-like multivariate association model is not accurate. The divergent association rule $A \rightarrow \{B, C\}$ involves finding two categories of association points at the same time, which is a reasonable multivariate association rule. The ring distribution is a special association rule. Unlike the divergent association rule, its dependencies form a closed loop. Considering that the multivariate association rule mining algorithm is not limited to ternary rules but also extends to n variables, the divergent ternary association rule is denoted as $A \rightarrow \{B, C\}$, and additional multivariate association rules can be denoted as $A \rightarrow \{B, C, \dots, n\}$.
Figure 3.
Different methods for searching binary proximity points.
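The difference between the chain and the divergent search can be illustrated with a small sketch. It runs on made-up coordinates and is not the authors' implementation; the helper `nearest_of` and the toy points are hypothetical. A chain search looks up C from the B point it has just found, whereas a divergent search looks up both B and C from the origin point A.

```python
import math

def nearest_of(points, labels, origin, category):
    """Index of the point of `category` closest to the point at index `origin`."""
    candidates = [j for j, lab in enumerate(labels) if lab == category and j != origin]
    return min(candidates, key=lambda j: math.dist(points[origin], points[j]))

# Hypothetical toy layout: one A, one B and two C points.
points = [(0, 0), (3, 0), (6, 0), (0, 4)]
labels = ["A", "B", "C", "C"]

# Chain search A -> B -> C: C is searched from the B point, not from A.
b = nearest_of(points, labels, 0, "B")
c_chain = nearest_of(points, labels, b, "C")

# Divergent search A -> {B, C}: both categories are searched from A.
c_divergent = nearest_of(points, labels, 0, "C")

chain_end_distance = math.dist(points[0], points[c_chain])
divergent_distance = math.dist(points[0], points[c_divergent])
```

In this layout the chain search ends at a C point that is 6 units from A, while the divergent search finds a C point only 4 units away, which illustrates why the chain form can link points that are weakly related in space.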
Because spatial association rules can be directional, the algorithm determines the directionality of association rules through spatial colocation rules. A spatial colocation rule is the conditional probability of observing another set of spatial features given that a certain set of spatial features has been observed. For example, in the binary spatial association rule, the points of category A are defined as the starting point of the association rule, and the points of category B are defined as the adjacent association points. The adjacent relationship is judged by searching outwards from the points of category A. All points in the neighbourhood of a point of category A are regarded as adjacent points of the point of category A. A spatial colocation rule is based on the probability of finding points of category B in the neighbourhood of points of category A. A multivariate association rule also has directionality, and association rules over the same feature categories but with different directions are different rules. The three spatial association rules $A \rightarrow \{B, C\}$, $B \rightarrow \{A, C\}$ and $C \rightarrow \{A, B\}$ are different. The difference among the three spatial association rules is mainly due to the use of different starting points for adjacent point searches.
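The effect of the starting point can be made concrete with a short sketch. It is an illustration only, under the assumption that the neighbourhood of a point is its single nearest neighbour; the function names and the toy layout are hypothetical and do not come from the paper. The same two categories yield different probabilities depending on which category the search starts from.

```python
import math

def knn(points, i, k):
    """Indices of the k nearest neighbours of point i, excluding i itself."""
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: math.dist(points[i], points[j]))
    return others[:k]

def rule_probability(points, labels, origin, target, k=1):
    """Share of `target` points among the k nearest neighbours of all `origin` points."""
    hits = 0
    total = 0
    for i, lab in enumerate(labels):
        if lab != origin:
            continue
        for j in knn(points, i, k):
            total += 1
            if labels[j] == target:
                hits += 1
    return hits / total

# Hypothetical layout: every B sits next to an A, but four A points have no B nearby.
points = [(0, 0), (10, 0), (20, 0), (21, 0), (30, 0), (31, 0), (0.5, 0), (10.5, 0)]
labels = ["A", "A", "A", "A", "A", "A", "B", "B"]

p_from_a = rule_probability(points, labels, "A", "B")  # search starts at A points
p_from_b = rule_probability(points, labels, "B", "A")  # search starts at B points
```

Searching from the B points always finds an A point, whereas searching from the A points finds a B point in only one third of the cases, so the two directions describe different rules.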
In the process of spatial association rule mining, it is important to judge the proximity relationship between two features. A point within the neighbourhood bandwidth of a specified feature space is usually identified as a neighbour point. The common methods for determining the bandwidth of a spatial neighbourhood include spatial fixed filters and spatial adaptive filters [52]. The spatial fixed filter judges the neighbour relationship from the fixed bandwidth. The points within the bandwidth are regarded as the neighbour points. The selection of the bandwidth of the spatial fixed filter determines the neighbour relationship. If the association between spatial objects occurs only within a fixed range, a fixed filter should be used. A global uniform fixed bandwidth is easily affected by spatial statistics, resulting in misjudgement of the neighbouring relationships. The spatial adaptive filter allows the bandwidth to fluctuate from point to point, ensuring that a certain number of points are located in the neighbourhood bandwidth, and the bandwidth of each point is usually different. To determine the neighbourhood range and neighbourhood relationship more accurately and avoid the influence of spatial statistics on the results, this algorithm determines the proximity relationship via a spatial adaptive filter [39].
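The contrast between the two filters can be sketched as follows. This is a minimal illustration that assumes the adaptive filter is a k-nearest-neighbour search whose bandwidth is the distance to the k-th neighbour; the function names, the radius and the toy coordinates are hypothetical.

```python
import math

def fixed_neighbours(points, i, radius):
    """Fixed filter: every other point within `radius` of point i."""
    return [j for j in range(len(points))
            if j != i and math.dist(points[i], points[j]) <= radius]

def adaptive_neighbours(points, i, k):
    """Adaptive filter: the k nearest points and the resulting per-point bandwidth."""
    ranked = sorted((j for j in range(len(points)) if j != i),
                    key=lambda j: math.dist(points[i], points[j]))
    chosen = ranked[:k]
    bandwidth = math.dist(points[i], points[chosen[-1]])
    return chosen, bandwidth

# Hypothetical layout: a dense cluster near the origin and three sparse points.
points = [(0, 0), (0.1, 0), (0, 0.1), (0.1, 0.1), (5, 5), (8, 5), (5, 9)]

dense_fixed = fixed_neighbours(points, 0, 1.0)    # the three other cluster points
sparse_fixed = fixed_neighbours(points, 4, 1.0)   # nothing within the radius
dense_adaptive = adaptive_neighbours(points, 0, 2)
sparse_adaptive = adaptive_neighbours(points, 4, 2)
```

With the fixed radius, the sparse point at (5, 5) has no neighbours at all, whereas the adaptive filter always returns two neighbours and simply widens the bandwidth from about 0.1 in the cluster to 4.0 in the sparse area.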
Based on Leslie's colocation quotient (CLQ) and Cromley's local colocation quotient (LCLQ), a multivariate local colocation quotient (MV-LCLQ) is proposed, which aims to discover the multivariate association rules in local regions. The MV-LCLQ draws on the calculation method of location quotients in economic geography [53]. The larger the MV-LCLQ value is, the stronger the local spatial association. Taking the ternary spatial association rule $A \rightarrow \{B, C\}$ as an example, the MV-LCLQ of the ith point of category A is equal to the ratio of the probability of discovering points of category B and points of category C simultaneously in the neighbourhood of that point to the expected probability of discovering them simultaneously. The formula for $\text{MV-LCLQ}_{A_i \rightarrow \{B, C\}}$ is presented in equation (1):
$$\text{MV-LCLQ}_{A_i \rightarrow \{B, C\}} = \frac{P_{A_i \rightarrow \{B, C\}}}{P_{\{B, C\}}} \tag{1}$$
Where $A_i$ represents the ith point of category A; $P_{A_i \rightarrow \{B, C\}}$ is the probability of the actual simultaneous occurrence of points of category B and points of category C in the neighbourhood of the ith point of category A; $P_{\{B, C\}}$ is the expected probability of simultaneous occurrence of points of category B and points of category C in the neighbourhood of points of category A.
The expected probability of the simultaneous occurrence of points of category B and points of category C is equal to the product of the expected probabilities of occurrence of points of category B and points of category C. The formula is presented in equation (2):
$$P_{\{B, C\}} = \frac{N_B}{N} \times \frac{N_C}{N} \tag{2}$$
Where $N_B$ is the number of points of category B in the study area; $N_C$ is the number of points of category C in the study area; $N$ is the total number of points of all categories in the study area.
$P_{A_i \rightarrow \{B, C\}}$ is the probability of simultaneous occurrence of points of category B and points of category C in the neighbourhood of the ith point of category A, and the formula is presented in equation (3):
$$P_{A_i \rightarrow \{B, C\}} = \frac{N_{A_i \rightarrow B}}{N_{A_i}} \times \frac{N_{A_i \rightarrow C}}{N_{A_i}} \tag{3}$$
Where $N_{A_i \rightarrow B}$ is the number of points of category B in the neighbourhood of the ith point of category A; $N_{A_i \rightarrow C}$ is the number of points of category C in the neighbourhood of the ith point of category A; $N_{A_i}$ is the total number of points of all categories in the neighbourhood of the ith point of category A.
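As a worked illustration of the ratio described above, consider hypothetical counts: a study area with $N = 100$ points, of which $N_B = 20$ are of category B and $N_C = 10$ are of category C, and a neighbourhood of one point $A_i$ that contains 10 points, 4 of category B and 2 of category C. The numbers are invented, and the calculation assumes that each expected probability is the plain global share of the category.

```latex
\begin{aligned}
\text{expected probability} &= \frac{20}{100} \times \frac{10}{100} = 0.02 \\
\text{observed probability} &= \frac{4}{10} \times \frac{2}{10} = 0.08 \\
\text{quotient at } A_i &= \frac{0.08}{0.02} = 4
\end{aligned}
```

A quotient of 4 means that B and C occur together around this point four times as often as their global shares would suggest, which indicates a strong local association at $A_i$.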
According to the First Law of Geography, the smaller the distance is, the stronger the association is. The geographical relationship between points is therefore expressed by a geographical weight, and the geographical weight of each adjacent point in the neighbourhood of the target point is determined by a Gaussian kernel. The formula is presented in equation (4):
$$w_{ij} = \exp\left(-0.5 \times \frac{d_{ij}^{2}}{d_{ib}^{2}}\right) \tag{4}$$
Where $w_{ij}$ is the geographical weight of the jth adjacency point in the neighbourhood of the ith point of category A; $d_{ij}$ is the distance from the ith point of category A to the jth adjacency point; $d_{ib}$ is the bandwidth distance of the ith point of category A. The bandwidth distance is calculated by Euclidean distance. The adjacency set of the ith point of category A is sorted by distance value from small to large.
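A minimal sketch of the Gaussian weighting is given below. It assumes the common kernel form exp(-0.5 d²/b²) and takes the bandwidth b to be the distance to the farthest neighbour in the sorted adjacency set; both choices, as well as the function name `gaussian_weights`, are assumptions made for illustration.

```python
import math

def gaussian_weights(distances):
    """Gaussian kernel weights for neighbour distances sorted in ascending order.

    Assumption: the bandwidth is the largest distance in the adjacency set.
    """
    bandwidth = distances[-1]
    return [math.exp(-0.5 * (d / bandwidth) ** 2) for d in distances]

# Hypothetical distances from one origin point to its three neighbours.
weights = gaussian_weights([1.0, 2.0, 4.0])
```

The weights decrease monotonically with distance, and under these assumptions the farthest neighbour always receives the weight exp(-0.5), so no neighbour inside the bandwidth is discarded entirely.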
Considering the geographical weight relationship between the starting point of category A and the adjacent points, the probability of the simultaneous occurrence of points of category B and points of category C in the neighbourhood of the ith point of category A is presented in equation (5):
$$P_{A_i \rightarrow \{B, C\}} = \frac{\sum_{j} w_{ij} f_{ij}(B)}{\sum_{j} w_{ij}} \times \frac{\sum_{j} w_{ij} f_{ij}(C)}{\sum_{j} w_{ij}} \tag{5}$$
Where $f_{ij}(X)$ represents a binary function. The value of $f_{ij}(X)$ is 1 when the category of the jth adjacent point of the ith point of category A is X, and the value of $f_{ij}(X)$ is 0 when the category of the jth adjacent point of the ith point of category A isn't X. X represents the category of a point.
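The weighted probability can be sketched in a few lines. The code assumes that the share of each category is its summed weight divided by the total weight of the neighbourhood, and that the joint probability is the product of these shares; the function names and the sample weights are hypothetical.

```python
def weighted_share(weights, categories, target):
    """Weighted share of `target` among the neighbours (binary indicator times weight)."""
    matched = sum(w for w, c in zip(weights, categories) if c == target)
    return matched / sum(weights)

def weighted_joint_probability(weights, categories, targets):
    """Product of the weighted shares of every category in `targets`."""
    probability = 1.0
    for target in targets:
        probability *= weighted_share(weights, categories, target)
    return probability

# Hypothetical neighbourhood: a close B point and two more distant points.
weights = [1.0, 0.5, 0.5]
categories = ["B", "C", "D"]
p_bc = weighted_joint_probability(weights, categories, ["B", "C"])
```

Here the B point carries half of the total weight and the C point a quarter, so the joint probability is 0.125; without weights the same neighbourhood would give 1/3 × 1/3.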
This is a ternary local spatial association rule, and the algorithm supports additional multivariate local association rule mining. The formulas for are presented in equations (6), (7) and (8):
| (6) |
| (7) |
| (8) |
where the MV-LCLQ is the multivariate local co-location quotient in the spatial association model A → B...Z; the local probability is the probability of simultaneous occurrence of points of categories B...Z in the neighbourhood of the ith point of category A; $P_{A \to B \ldots Z}$ is the probability of simultaneous occurrence of those points in the global scope; $N_B$ represents the number of points of category B in the global scope; $N_n$ represents the number of points of category n in the global scope; $N$ represents the total number of points of all categories in the global scope; n represents the number of point categories in the association rule; $W_{ij}$ represents the geographical weight from the ith point of category A to the jth point in the neighbourhood; and $f(j, X)$ is a binary function, equal to 1 when the category of the jth adjacent point of the ith point of category A is X and 0 otherwise. X represents the category of a point.
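The definitions above can be illustrated with a minimal sketch of a weighted local quotient for one starting point. It is not a transcription of equations (6) to (8), which are not recoverable here. It assumes that the local probability is the product, over the categories in the rule, of the weighted share of that category in the neighbourhood, and that the global probability is the product of N_X / (N - 1), as in the colocation quotient literature. The function name `mv_lclq` and its arguments are hypothetical.

```python
import numpy as np

def mv_lclq(weights, neighbour_cats, rule_cats, global_counts):
    """Sketch of a multivariate local colocation quotient for one start point.

    Assumptions (not confirmed by the paper): the local probability is the
    product of weighted category shares in the neighbourhood, and the
    expected probability is the product of N_X / (N - 1) over rule categories.
    """
    weights = np.asarray(weights, dtype=float)
    cats = np.asarray(neighbour_cats)
    n_total = sum(global_counts.values())
    local_p = 1.0
    expected_p = 1.0
    for cat in rule_cats:
        f = (cats == cat).astype(float)                # binary function f(j, X)
        local_p *= np.sum(weights * f) / np.sum(weights)
        expected_p *= global_counts[cat] / (n_total - 1)
    return local_p / expected_p

# Category counts of the simulated dataset described in Section 4.1.
counts = {"A": 8, "B": 12, "C": 10, "D": 11}
only_bc = mv_lclq([1.0, 1.0], ["B", "C"], ["B", "C"], counts)   # about 3.33
no_c = mv_lclq([1.0, 0.5], ["B", "D"], ["B", "C"], counts)      # 0.0, no C nearby
```

With a product form of this kind, the quotient is 0 whenever one of the rule categories is absent from the neighbourhood, which matches the behaviour described for points that have B but no C nearby. A global value would then be the mean of the local values over all points of category A.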
The global MV-LCLQ value is the average of the local MV-LCLQ values of all points of class A. If the value of MV-LCLQ is greater than or equal to 1, it indicates that there is an association in the global scope. If the value is less than 1, it indicates that there is no spatial association in the global scope. The formula is presented in equation (9):
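| $\mathrm{MV\text{-}LCLQ}_{Global} = \dfrac{1}{N_A} \sum_{i=1}^{N_A} \mathrm{MV\text{-}LCLQ}_{i}$ | (9) |
where $\mathrm{MV\text{-}LCLQ}_{Global}$ is the global MV-LCLQ, $\mathrm{MV\text{-}LCLQ}_{i}$ is the local MV-LCLQ at the ith point of category A, and $N_A$ represents the number of points of category A.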
The MV-LCLQ value is computed by the procedure whose pseudocode is presented in Algorithm 1.
Algorithm 1.
Multivariate Local Colocation Quotient (MV-LCLQ).
In summary, the flowchart of the proposed algorithm in this paper is shown in Fig. 4:
Figure 4.
Flowchart of the proposed algorithm.
Step 1: Input the point dataset, count the numbers of points of the different categories, and calculate the expected probability of occurrence of the spatial association rule in the whole point set.
Step 2: According to the categories of the points in the spatial association rule, perform K-nearest neighbour analysis with the set parameters to obtain the K-nearest neighbour table.
Step 3: Assign geographical weights to the Euclidean distance between each pair of points in the nearest neighbour table using a Gaussian kernel function.
Step 4: Count the starting points of the association rule and the neighbouring points conforming to the association rule, and calculate the actual probability of occurrence of the association rule in each local region based on the geographical weights.
Step 5: Calculate the observed MV-LCLQ of each local region through the expected probability and actual probability.
Step 6: Label the categories of points in the point set by restricted random labelling and update the categories of points in the nearest neighbour table through the unique value fields of the points.
Step 7: Assign geographical weights using Gaussian kernel functions and calculate the MV-LCLQ of the local region after random simulation.
Step 8: Obtain the simulated MV-LCLQ dataset through multiple simulations and extract the simulated MV-LCLQ dataset that meets the requirement of the two-tailed significance test.
Step 9: Obtain the significant p-value of the simulated MV-LCLQ and observed MV-LCLQ through the two-tailed significance test.
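Step 2 above can be sketched with a brute-force nearest neighbour search. This is only an illustration: it assumes planar (projected) coordinates and assumes that the neighbours of a starting point are searched among the non-starting points. The function name `knn_table` is hypothetical and is not the ArcMap tool used by the authors.

```python
import numpy as np

def knn_table(start_xy, other_xy, k):
    """For each start point, return the indices and Euclidean distances
    of its k nearest points in other_xy, sorted from near to far."""
    start_xy = np.asarray(start_xy, dtype=float)
    other_xy = np.asarray(other_xy, dtype=float)
    # Pairwise coordinate differences, shape (n_start, n_other, 2).
    diff = start_xy[:, None, :] - other_xy[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    idx = np.argsort(dist, axis=1)[:, :k]
    return idx, np.take_along_axis(dist, idx, axis=1)

# One start point and three candidate neighbours; keep the two nearest.
idx, dist = knn_table([(0.0, 0.0)], [(5.0, 0.0), (1.0, 0.0), (0.0, 2.0)], k=2)
```

Because the table stores only indices and distances, it can be reused unchanged when the category labels are reshuffled in Steps 6 to 8.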
3.3. Statistical significance test
To determine whether the calculated MV-LCLQ is statistically significant, the experimental results are subjected to a significance test. The null hypothesis of the MV-LCLQ is that “given the point set, there is no spatial association between the pairs of classified subsets of points.” That is, we take the geometric pattern of the point set as given and search for associations that cannot be explained by that geometric pattern alone. Random samples are generated through Monte Carlo simulation: the spatial locations of the points remain unchanged, and the categories of the points are relabelled through the restricted random labelling method proposed by Kronenfeld and Leslie [41], [54]. During the simulation at point i, the categories of the other points are randomly relabelled, and the number of points of each category in the study area remains unchanged. As shown in Fig. 5(a) and Fig. 5(b), the positions and categories of the points in the boxes are not changed, and the positions of the other points are also unchanged; only the categories of the points outside the boxes are relabelled. After each simulation trial, the MV-LCLQ at each point is recalculated, and the simulated sample distribution of the MV-LCLQ at each point is obtained through multiple simulations. The number of trials in which the simulated MV-LCLQ is less than or equal to the observed MV-LCLQ and the number in which it is greater than or equal to the observed value are counted, and the smaller of the two counts is multiplied by two to give the number of simulated samples used in the significance test. The two-tailed significance test is then performed on the observed MV-LCLQ value and the selected simulated samples at each point [51].
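The relabelling and the two-tailed test described above can be sketched as follows. The sketch assumes that the restriction set is given as a list of point indices whose labels are kept, and that the p-value is twice the smaller tail count divided by the number of simulations, capped at 1. The exact normalisation used by the authors is not stated in the text. The names `restricted_relabel` and `two_tailed_p` are hypothetical.

```python
import numpy as np

def restricted_relabel(labels, fixed_idx, rng):
    """Shuffle category labels among all points except those in fixed_idx.

    Positions are untouched and the count of each category is preserved.
    """
    labels = np.asarray(labels).copy()
    free = np.setdiff1d(np.arange(labels.size), np.asarray(fixed_idx))
    labels[free] = rng.permutation(labels[free])
    return labels

def two_tailed_p(observed, simulated):
    """Two-tailed Monte Carlo p-value: twice the smaller tail count over
    the number of simulations (an assumed normalisation), capped at 1."""
    simulated = np.asarray(simulated, dtype=float)
    n_le = np.sum(simulated <= observed)
    n_ge = np.sum(simulated >= observed)
    return min(1.0, 2.0 * min(n_le, n_ge) / simulated.size)

rng = np.random.default_rng(0)
labels = np.array(["A", "B", "B", "C", "D", "C"])
shuffled = restricted_relabel(labels, fixed_idx=[0], rng=rng)
```

In a full run, each simulation would call `restricted_relabel`, recompute the local quotient from the unchanged neighbour table, and collect the values that are finally passed to `two_tailed_p`.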
Figure 5.
Schematic diagram of restricted random labelling. (a) Given an observed marked point rule, (b) implies that labels are randomly assigned to all individuals except for those in the restriction set enclosed in square boxes.
The calculation of the MV-LCLQ and the significance test are implemented in Python as a script tool based on ArcMap 10.2. The tool provides two types of proximity determination methods, namely, a fixed-bandwidth filter and an adaptive filter. The adaptive filter takes a specified number K of nearest neighbours. The algorithm works on a generated nearest neighbour table or spatial weight matrix. In the significance test, Monte Carlo simulation is used to relabel the categories of points, but the nearest neighbour table or spatial weight matrix does not need to be regenerated. The neighbour analysis generates the neighbour table with a time complexity of O((N-s)s), where s represents the number of starting points and N represents the total number of points. Each Monte Carlo simulation reassigns the categories of the points other than the starting point, without regenerating the neighbour table. Therefore, the time complexity of the algorithm is the larger of O((N-s)s) and O((N-s)m), where m is the number of simulations.
4. Case study
In this section, we verify the proposed algorithm through artificial simulation data, compare it with the existing LCLQ algorithm, and apply the proposed algorithm to real data to mine four kinds of local spatial association rules.
4.1. Experiment of simulated dataset
To verify the significance test of the algorithm, artificial simulation data with labels are constructed. There are 41 points in the artificial simulation dataset, including 8 points of category A, 12 points of category B, 10 points of category C and 11 points of category D. The spatial distribution of the artificial simulation data is shown in Fig. 6. Taking the spatial association rule A → BC as an example, there are many points of category B and points of category C around the local areas of A2 and A6 in Fig. 6, showing strong spatial association. A group of points of category B and points of category C can be found around A4, A5 and A7, with relatively weak spatial association. In a small range of A1, A3 and A8, category B points and category C points cannot be found at the same time; that is, within a certain range of A1, A3 and A8, the intensity of the spatial association rule is 0. The p-value is the probability, under the assumption that the null hypothesis is correct, that the MV-LCLQ values calculated in the repeated random simulations are at least as extreme as the actual observed result.
Figure 6.
Spatial distribution of synthetic dataset.
To verify the existence of spatial association asymmetry, we calculated the one-way global spatial associations of each ordered pair of categories among the three categories A, B and C. In this experiment, K was taken as 3, and the results of 1000 simulated runs are shown in Table 2. Table 2 shows that the values of spatial association rules with the same two categories of points but different directions are not the same, indicating that the spatial association interactions between objects are asymmetric.
Table 2.
$\mathrm{MV\text{-}LCLQ}_{Global}$ values of different binary spatial associations.
| Spatial association rule | A → B | B → A | A → C | C → A | B → C | C → B |
|---|---|---|---|---|---|---|
| $\mathrm{MV\text{-}LCLQ}_{Global}$ | 1.05 | 1.26 | 1.47 | 1.39 | 1.02 | 0.72 |
To further verify the effectiveness of the algorithm, we compared the proposed MV-LCLQ with the existing binary local indicator of the colocation quotient, LCLQ. The dataset is shown in Fig. 6, with K set to 5. The results of binary LCLQ mining (A → B) and MV-LCLQ mining (A → BC) are compared in Table 3.
Table 3.
Comparison of MV-LCLQ and LCLQ when K = 5.
| Rule | POINT_ID | A1 | A2 | A3 | A4 | A5 | A6 | A7 | A8 |
|---|---|---|---|---|---|---|---|---|---|
| LCLQ (A → B) | LCLQ | 0.04 | 2.51 | 1.12 | 1.56 | 0.70 | 1.63 | 0.55 | 0.00 |
| | p-value (100 times) | 0.46 | 0.06 | 0.70 | 0.72 | 0.94 | 0.28 | 0.92 | 0.32 |
| | p-value (1000 times) | 0.46 | 0.08 | 0.56 | 0.70 | 1.00 | 0.38 | 0.89 | 0.39 |
| MV-LCLQ (A → BC) | MV-LCLQ | 0.82 | 1.82 | 0.00 | 1.21 | 1.16 | 3.26 | 1.06 | 0.00 |
| | p-value (100 times) | 0.10 | 0.25 | 0.00 | 0.18 | 0.67 | 0.00 | 0.79 | 0.00 |
| | p-value (1000 times) | 0.00 | 0.00 | 0.00 | 0.01 | 0.04 | 0.00 | 0.01 | 0.00 |
As shown in Table 3, whenever the MV-LCLQ of a point of category A is greater than 0, the corresponding LCLQ value is also greater than 0: only when there are points of category B in the vicinity of a point of category A can both points of category B and points of category C be found near it. That is, an LCLQ greater than 0 is a prerequisite for an MV-LCLQ greater than 0. However, when the LCLQ is greater than 0, the MV-LCLQ is not necessarily greater than 0. For example, the neighbourhood of A3 contains points of category B and points of category D, but there is no point of category C near A3, so the value of the MV-LCLQ is 0. As shown in Table 3, the p-values of the LCLQ are generally greater than those of the MV-LCLQ, suggesting that the statistical significance of the association pattern increases with increasing association pattern complexity. When the number of simulations is 1000, most of the LCLQ p-values of the points of category A are greater than or equal to 0.38; the exception is the p-value of A2, which is 0.08 and still cannot pass the test at the 95% confidence level. Through comparative analysis, LCLQ can identify only the local-region binary spatial association rules, while MV-LCLQ obtains more significant results in mining complex local spatial association patterns.
The ternary association rule A → BC is mined from the simulation dataset, and the proximity relationship is judged with different numbers K of nearest neighbours. Simulation numbers of 100 and 1000 are used, and the results are shown in Table 4.
Table 4.
Spatial association rule A → BC mining results when K is different.
| POINT_ID | A1 | A2 | A3 | A4 | A5 | A6 | A7 | A8 |
|---|---|---|---|---|---|---|---|---|
| K = 3 | | | | | | | | |
| MV-LCLQ | 0.00 | 2.98 | 0.00 | 1.69 | 1.38 | 3.03 | 1.19 | 0.00 |
| p-value(100 times) | 0.00 | 0.00 | 0.00 | 0.68 | 0.36 | 0.00 | 0.54 | 0.00 |
| p-value(1000 times) | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| K = 5 | | | | | | | | |
| MV-LCLQ | 0.82 | 1.82 | 0.00 | 1.21 | 1.16 | 3.26 | 1.06 | 0.00 |
| p-value(100 times) | 0.10 | 0.25 | 0.00 | 0.18 | 0.67 | 0.00 | 0.79 | 0.00 |
| p-value(1000 times) | 0.00 | 0.00 | 0.00 | 0.01 | 0.04 | 0.00 | 0.01 | 0.00 |
| K = 7 | | | | | | | | |
| MV-LCLQ | 1.02 | 1.03 | 0.00 | 1.22 | 1.10 | 1.91 | 0.87 | 0.80 |
| p-value(100 times) | 0.09 | 0.24 | 0.00 | 0.88 | 0.31 | 0.21 | 0.56 | 0.19 |
| p-value(1000 times) | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
Taking K = 3 and 1000 simulations as an example, the MV-LCLQ value of A6 is the largest, followed by that of A2. It is obvious from Fig. 6 that there are only points of categories B and C around A2 and A6, and the association is strong. Points of category B and points of category C are not found together around A1, A3 or A8, so the result is 0. There are points of category B and points of category C around A4, A5 and A7, but there are also points of category D that do not belong to this association rule; thus, the result is closer to 1, which is mainly due to the geographical weight. With the increase in the number of K-nearest neighbours, the numbers of points of category B and points of category C in the neighbourhoods of A1 and A8 increase, which leads to an increase in their MV-LCLQ values. However, for other points, the increase in K adds points of category D to the neighbourhood, which leads to smaller MV-LCLQ values. Although the number of K-nearest neighbours increases from 3 to 7, points of category B and points of category C are still not found together in the neighbourhood of A3; thus, the MV-LCLQ value of A3 remains 0, while the MV-LCLQ values of several other points fluctuate due to the continuous changes in the numbers of points of category B, category C and category D in their neighbourhoods. These results show that the selection of bandwidth has a great influence on the MV-LCLQ results. According to the p-values in Table 4, most of the results are significant. With an increasing number of simulations, the p-values decrease and tend towards 0, which suggests that the simulated sample is sufficiently representative.
The algorithm was implemented on a computer configured with an Intel Core i5-9300H CPU, a GTX 1660 Ti graphics card, 16 GB of DDR4 memory, and a 1 TB SSD. Fig. 7(a) shows the variation in running time with different numbers of points in the ternary local association rule. With the continuous increase in the number of points, the algorithm running time also increases, but the growth is relatively stable. Fig. 7(b) shows the running time of the algorithm in mining association rules with different numbers of categories on 1000 points. In this case, it can be seen that when the number of categories is greater than 7, the running time increases rapidly. Unlike that of a purely computational algorithm, the running time of the algorithm proposed in this paper is mainly spent on complicated geospatial analysis processes.
Figure 7.
Running time of Algorithm. (a) The variation of running time with different number of points in the ternary local association rule, (b) the running time of the algorithm to mine association rules of different categories in 1000 points.
4.2. Experiment of real-world dataset
4.2.1. Study area and data description
To verify the effectiveness of the algorithm, multivariate local spatial association rules were mined from the POI data of urban public service facilities. The five central administrative districts of Chengdu in Sichuan Province were selected as the study area. The five administrative districts are Jinniu District, Chenghua District, Qingyang District, Wuhou District and Jinjiang District, which are the political, economic and cultural centres of Chengdu. The POI dataset used in this study was obtained from GaodeMap (https://www.amap.com/), which is one of the major internet maps in China, and the attributes of the POI data, including longitude, latitude, name, address and type, were obtained. A total of 9206 POIs were used in the dataset, including coffee houses, tea houses, ice cream shops and dessert houses, the categories of which are represented by A, B, C and D, respectively. The POI data were categorized according to the original categories of the POI dataset from GaodeMap. The number of POIs of each category is shown in Table 5. Considering the small number of dessert houses, to determine whether there is a local spatial association between the distribution of dessert houses and the other three types of POI, the ternary local spatial association rules (D → AB, D → AC and D → BC) and the quaternary local spatial association rule (D → ABC) are mined. The association rule D → AB indicates that the probability of the simultaneous occurrence of coffee houses and tea houses in the neighbourhood of dessert houses exceeds the expected probability. The association rule D → AC indicates that the probability of the simultaneous occurrence of coffee houses and ice cream shops in the neighbourhood of dessert houses exceeds the expected probability. The association rule D → BC indicates that the probability of the simultaneous occurrence of tea houses and ice cream shops in the neighbourhood of dessert houses exceeds the expected probability. The association rule D → ABC indicates that the probability of the simultaneous occurrence of coffee houses, tea houses and ice cream shops in the neighbourhood of dessert houses exceeds the expected probability.
Table 5.
POI information of real dataset.
| Category | Code name | Count | Percentage (%) |
|---|---|---|---|
| Coffee house | A | 1465 | 15.91 |
| Tea house | B | 4828 | 52.44 |
| Ice cream shop | C | 2435 | 26.45 |
| Dessert house | D | 478 | 5.20 |
4.2.2. Result
To determine the appropriate neighbourhood relationship, the results for the four spatial association rules under different numbers K of neighbourhood points are compared, and the results are shown in Fig. 8. The test is run 1000 times for each value of K in the adaptive filter, with K set to 3, 5, 7 and 9.
Figure 8.

Results of spatial global association rule mining under different K proximity relations.
As shown in Fig. 8, the $\mathrm{MV\text{-}LCLQ}_{Global}$ values of D → AC are greater than 1, indicating that this association rule is significant at the global level. The values of the other association rules are less than 1, indicating no significant association at the global level; the rule with the smallest value for each K can be read from Fig. 8. With the increase in the number K of adjacent points, the values of all association rules increase. Under the same spatial rule, when K is below 7, the value clearly increases with the increase in K. Beyond K = 7, the growth of the MV-LCLQ is smaller, and the results tend to become stable. Therefore, the adaptive filter with K = 7 is selected to mine local spatial association patterns.
The results of the four local association rules mined when the number K of neighbourhood points of the adaptive filter is 7 are shown in Table 6. The table lists the points that are not significant according to the significance test and groups the remaining points by MV-LCLQ value. The number of nonsignificant points under the four association rules is small. There are many points with MV-LCLQ values between 0 and 1, which indicates that most of the points show no association in the local region. Among the POIs with MV-LCLQs greater than 2.0, the number of points under the association rule D → AC is the largest, which reflects the high $\mathrm{MV\text{-}LCLQ}_{Global}$ value of D → AC.
Table 6.
Number of D points with different MV-LCLQ values under four local spatial association rules.
| Category | D → AB | D → AC | D → BC | D → ABC |
|---|---|---|---|---|
| Not significant | 65 | 54 | 31 | 28 |
| 0.0-1.0 | 324 | 233 | 361 | 378 |
| 1.0-2.0 | 65 | 48 | 86 | 72 |
| >2.0 | 24 | 143 | 0 | 0 |
The results of the four local association rules mined when the number of neighbours K of the adaptive filter is 7 are shown in Fig. 9. MV-LCLQ results that did not pass the significance test (p-value > 0.05) are shown as hollow circles in Fig. 9. The other results in Fig. 9 correspond to MV-LCLQs that pass the significance test. In addition to the administrative divisions, the whole research area can be divided into three circles from the inside to the outside by their relationship with the expressway: the inner city circle, the suburban circle and the outer suburban circle. As shown in Fig. 9(a), the local areas significantly associated with the association rule D → AB are mainly concentrated in the urban centre and the south of Wuhou District, while the other areas are not significant, which indicates that coffee houses and tea houses jointly depend on dessert houses in space there, forming significant local association regions. The spatial association of D → AB in the inner city circle of the research area is strong, while there are many insignificant associations in the outer ring area of the city. This could be because there are more white-collar workers in the central urban area, who have a higher demand for coffee and afternoon tea.
Figure 9.
Mining results of different association rules (K = 7). (a) Mining result of D → AB, (b) Mining result of D → AC, (c) Mining result of D → BC, (d) Mining result of D → ABC.
As shown in Fig. 9(b), most local areas show significant associations because the association rule D → AC is globally significant, which indicates that consumers who prefer desserts also prefer coffee and ice cream. Among the significant association areas, POIs with MV-LCLQs greater than 2 account for a large proportion; these POIs are mostly distributed in the central area of the city and southeastern Wuhou District. The significant local association areas of the D → AC rule are mainly concentrated in the central area and distributed southwards along the east side of Wuhou District, while the other areas are not significant. As shown in Fig. 9(c), the significant local association areas of the association rule D → BC are mainly distributed at the edge and in the suburbs of the city. The MV-LCLQs of most POI points in the core area of the city are less than 1, indicating that the association in the central city is not significant and that there are different consumer groups for tea houses and ice cream shops in the central urban area. The mining results of the association rule D → ABC are shown in Fig. 9(d). The obvious local association areas of the association rule D → ABC are concentrated in the centre and suburbs of the city, which indicates that the spatial distribution of D is easily influenced by the interactions among A, B and C.
In Fig. 9(b), the MV-LCLQ values of the POI points distributed southeast of Wuhou District are larger, while in the same range, as shown in Fig. 9(d), the MV-LCLQ values of the POI points distributed southeast of Wuhou District are smaller, possibly because the proportion of young residents in Wuhou District is relatively high. Young people have greater demand for coffee and ice cream but less demand for tea.
To verify the accuracy and reliability of the mining results, and considering that the mining of association rules in real scenarios may be affected by other categories of catering POI data, we added four categories of catering POI data to the experiment based on the real dataset, namely, bakeries, fast food restaurants, foreign food restaurants and leisure food restaurants. There are thus eight categories of catering POI data, namely, coffee house, tea house, ice cream shop, dessert house, bakery, fast food restaurant, foreign food restaurant and leisure food restaurant, and their numbers and proportions are shown in Table 7. The number of catering POIs in the real dataset is now 19,279, roughly double the original number. Therefore, K = 14 is selected for judging the nearest neighbour relationship, and the mining results are shown in Fig. 10. Compared with the results of Fig. 9, the spatial distributions of the results in Fig. 10(a), 10(b), 10(c) and 10(d) are similar to those in Fig. 9(a), 9(b), 9(c) and 9(d), respectively. Therefore, the mining results of significantly strong association patterns are not easily affected by the interference of other categories of points.
Table 7.
The information of all categories of catering POI dataset.
| Category | Count | Percentage (%) |
|---|---|---|
| Coffee house | 1465 | 7.60 |
| Tea house | 4828 | 25.04 |
| Ice cream shop | 2435 | 12.63 |
| Dessert house | 478 | 2.48 |
| Bakery | 1884 | 9.77 |
| Fast food restaurant | 6675 | 34.62 |
| Foreign food restaurant | 1413 | 7.33 |
| Leisure food restaurant | 101 | 0.53 |
Figure 10.
Mining results of different association rules (K = 14) under the real scenario case considering all categories of catering POI data. (a) Mining result of D → AB, (b) Mining result of D → AC, (c) Mining result of D → BC, (d) Mining result of D → ABC.
5. Discussion and conclusions
In summary, we propose a local spatial association rule mining algorithm that measures the strength of local spatial associations by constructing the multivariate local colocation quotient (MV-LCLQ). First, an adaptive filter is used to determine the spatial proximity between different points, and the nearest neighbour table is generated. The geographical weights are assigned to the nearest neighbour table by a Gaussian kernel function. Then, the MV-LCLQ is obtained by calculating the ratio of the probability of finding spatial association patterns in local areas to the expected probability. Finally, restricted random labelling is performed on the data, and the simulated sample distribution of the MV-LCLQ is recalculated. The significance of the MV-LCLQ is tested through the simulated sample distribution. We verify the algorithm with artificial simulated data and real data. With the artificial simulation data, $\mathrm{MV\text{-}LCLQ}_{Global}$ is used to demonstrate the existence of asymmetry between spatial objects, and the experimental comparison with the existing LCLQ algorithm demonstrates the effectiveness of the proposed algorithm. Furthermore, the proposed algorithm is applied to real POI data, and four kinds of local spatial association rules for four categories of POI data are mined. The experimental results show that the proposed algorithm can recognize such local spatial association rules, and its results can also provide a reference basis for shop location selection. Moreover, the results show that the proposed algorithm overcomes the limitations of one-to-one spatial association mining and can identify one-to-many spatial association patterns. The proposed algorithm can not only quantitatively measure the intensity of local spatial association patterns but also express one-way dependencies in spatial association mining.
The algorithm provides a new way of thinking for mining spatial association rules. The asymmetric interaction force between different categories of point data measures the degree of attraction of multiple categories of point data to a certain category of point set, further identifies significant areas, and mines unique multivariate local spatial association patterns. The algorithm can not only be used for urban service facilities to mine local spatial association rules between different categories of such facilities but can also be applied to other spatial classifications of point data, including spatial associations between crime and urban facilities or between pollutants and morbidity, as well as industrial spatial associations. By examining the spatial association patterns of different regions, we can further analyse the driving forces of regional differences and identify deeper influencing factors.
In existing research, the parameters of adaptive filters, such as K, still need to be set, and they have a certain impact on the mining results. In future research, appropriate filters should be selected based on the type and actual situation of the patterns being mined, and comparative analysis should be conducted over a finer grid of parameter values to ensure the accuracy and reliability of the parameters.
In the process of POI-oriented spatial association rule mining, distance is calculated by the Euclidean distance between points in this paper, while the Manhattan distance may be more consistent with real city distances. According to the specific research areas examined in the future, we can also choose the most realistic distance calculation method by comparing the Manhattan distance or Chebyshev distance.
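To make the comparison of distance measures concrete, the sketch below computes the three distances mentioned above for a pair of points. It assumes planar (projected) coordinates rather than raw longitude and latitude, and the helper name `pairwise_distance` is hypothetical.

```python
import numpy as np

def pairwise_distance(p, q, metric="euclidean"):
    """Distance between two planar points under one of three metrics."""
    delta = np.abs(np.asarray(p, dtype=float) - np.asarray(q, dtype=float))
    if metric == "euclidean":
        return float(np.sqrt(np.sum(delta ** 2)))   # straight-line distance
    if metric == "manhattan":
        return float(np.sum(delta))                 # sum of axis-wise offsets
    if metric == "chebyshev":
        return float(np.max(delta))                 # largest axis-wise offset
    raise ValueError(f"unknown metric: {metric}")

p, q = (0.0, 0.0), (3.0, 4.0)
distances = {m: pairwise_distance(p, q, m)
             for m in ("euclidean", "manhattan", "chebyshev")}
```

For the same pair of points the Manhattan distance is never smaller than the Euclidean distance, so switching metrics can change which points fall among the K nearest neighbours and, in turn, the kernel weights.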
In addition, POI represents a spatial object as a point, which ignores the real physical size of the spatial object. Hence, the calculated distance may deviate from the real distance. The proposed algorithm can also be extended from spatial point datasets to complex spatial objects to analyse the multivariate local spatial associations between complex objects.
In terms of dimensionality, the algorithm can be expanded from multivariate spatial association pattern mining to multivariate spatiotemporal association pattern mining to explore more complex correlations under different spatiotemporal conditions. Moreover, with the expansion of related research from space to both space and time, the algorithm should be further improved to explore the spatiotemporal variation characteristics of local regional association patterns.
Funding
This research was funded by Shandong Soft Science Project (Grant No.: 2023RKY07013), the Housing and Urban Rural Development Science and Technology Project of Shandong Province (Grant No.: 2018-K2-03), University Science and Technology Project of Shandong Province (Grant No.: J18KB088), Ministry of Education Industry-University Cooperative Education Project (Grant No.: 220604903123137), Ministry of Education Industry-University Cooperative Education Project (Grant No.: 220602746121513).
CRediT authorship contribution statement
Fei Cai: Writing – review & editing, Writing – original draft, Project administration, Funding acquisition, Conceptualization. Jie Chen: Writing – original draft, Validation, Supervision, Methodology, Formal analysis, Data curation, Conceptualization. Telin Chen: Writing – review & editing, Validation, Methodology, Formal analysis, Data curation. Banghua Zhang: Software, Resources. Wenping Fan: Data curation, Software.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgements
The authors would like to thank the editor and the anonymous reviewers for their helpful feedback.
Appendix A. Acronym
Table A lists the acronyms used in the paper.
Table A.
Acronyms defined throughout the paper.
| Acronym | Description |
|---|---|
| CLQ | Co-location Quotient |
| FP-growth | Frequent Pattern growth |
| FP-tree | Frequent Pattern tree |
| KNN | K-nearest neighbourhood |
| LCLQ | Local indicator of Colocation Quotient |
| LISA | Local Indicators of Spatial Association |
| GBM | Gradient Boosting Machine |
| MV-LCLQ | Multivariate Local Colocation Quotient |
| POI | Point of interest |
| STCOP | Spatiotemporal Co-occurrence Pattern |
| USTCP | Uncertain Spatiotemporal Co-occurrence Pattern |
Appendix B. Symbol
Table B lists the symbols used in the paper.
Table B.
Symbols defined throughout the paper.
| Symbol | Description |
|---|---|
| | The ith point of category A. |
| | The probability of the actual simultaneous occurrence of points of category B and points of category C in the neighbourhood of the ith point of category A. |
| $P_{A\to BC}$ | The expected probability of simultaneous occurrence of points of category B and points of category C in the neighbourhood of points of category A. |
| | The probability of simultaneous occurrence of points of category B and points of category C in the neighbourhood of the ith point of category A. |
| $N_B$ | The number of points of category B in the study area. |
| $N$ | The total number of points of all categories in the study area. |
| | The number of points of category B in the neighbourhood of the ith point of category A. |
| | The total number of points of all categories in the neighbourhood of the ith point of category A. |
| $W_{ij}$ | The geographical weight of the jth adjacent point in the neighbourhood of the ith point of category A. |
| $d_{ij}$ | The distance from the ith point of category A to the jth adjacent point. |
| $b_i$ | The bandwidth (a Euclidean distance) of the ith point of category A. |
| $f(j,X)$ | A binary indicator function: $f(j,X)$ is 1 when the category of the jth adjacent point of the ith point of category A is X, and 0 otherwise. X denotes the category of a point. |
| | The probability of simultaneous occurrence of points of category B...Z in the neighbourhood of the ith point of category A. |
| $P_{A\to B\ldots Z}$ | The probability of simultaneous occurrence of B...Z points in the global scope. |
| $N_B$ | The number of points of category B in the global scope. |
| $N_n$ | The number of points of category n in the global scope. |
| $N$ | The total number of points of all categories in the global scope. |
| $s$ | The number of starting point set. |
| $m$ | The number of simulation runs. |
| $K$ | The number of nearest neighbours. |
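
As an illustration of how several of the symbols in Table B relate to one another, the following Python sketch computes a Gaussian geographical weight from a distance and a bandwidth, applies the binary indicator f(j,X), and sums the weighted neighbours of one category. It is a minimal sketch under stated assumptions, not the authors' released code: the kernel form exp(-0.5 (d/b)^2), the choice of the bandwidth as the distance to the farthest of the K nearest neighbours, and all function names are assumptions made for this example.

```python
import math


def gaussian_weight(d_ij: float, b_i: float) -> float:
    """Geographical weight W_ij for a neighbour at distance d_ij.

    Assumed kernel form: exp(-0.5 * (d_ij / b_i)^2).
    """
    return math.exp(-0.5 * (d_ij / b_i) ** 2)


def indicator(category_j: str, x: str) -> int:
    """Binary function f(j, X): 1 if the j-th neighbour has category X, else 0."""
    return 1 if category_j == x else 0


def adaptive_bandwidth(neighbours):
    """Assumed adaptive bandwidth b_i: distance to the farthest of the K neighbours."""
    return max(d for _, d in neighbours)


def weighted_count(neighbours, x: str, b_i: float) -> float:
    """Weighted number of category-X points among the neighbours of one point.

    neighbours is a list of (category, distance) pairs for the K nearest points.
    """
    return sum(gaussian_weight(d, b_i) * indicator(c, x) for c, d in neighbours)


def local_share(neighbours, x: str, b_i: float) -> float:
    """Weighted share of category X among all neighbours of one point."""
    total = sum(gaussian_weight(d, b_i) for _, d in neighbours)
    return weighted_count(neighbours, x, b_i) / total if total > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical neighbourhood of one category-A point with K = 3.
    nb = [("B", 0.0), ("C", 1.0), ("B", 2.0)]
    b = adaptive_bandwidth(nb)
    print(weighted_count(nb, "B", b), local_share(nb, "B", b))
```

Nearer neighbours contribute more to the weighted count than distant ones, which is the role the geographical weight plays in a locally computed quotient.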
Data availability
The data and code are submitted as a supplementary data file and are also available at the following link: https://figshare.com/s/cc249f3d68a0ecd31b75.