Abstract
Statistical modelling of a spatial point pattern often begins by testing the hypothesis of spatial randomness. Classical tests are based on quadrat counts and distance-based methods. As an alternative, we propose a new statistical test of spatial randomness based on the fractal dimension, calculated through the box-counting method. This provides an inferential perspective, in contrast to the more common descriptive use of the method. We also develop a graphical test based on the log–log plot used to calculate the box-counting dimension. We evaluate the performance of our methodology through a simulation study and the analysis of a COVID-19 dataset. The results support the good performance of the method, which emerges as an alternative to the more classical distance-based strategies.
Keywords: Box-counting dimension, Complete spatial randomness, Fractal dimension, Poisson distribution, Spatial point patterns
Introduction
Spatial statistics is the branch of statistics that deals with the modelling of realisations of spatially indexed stochastic processes (Schabenberger and Gotway, 2017). This field covers three acknowledged areas: geostatistics, areal data, and spatial point patterns (Cressie, 1991). The last of these concerns the analysis of the spatial distribution of the locations of events such as earthquakes, landslides or forest fires (Baddeley et al., 2013). Other examples are patterns of towns in a region, trees in a forest or galaxies in space (Ripley, 1977). In all these cases, the relative position of the points is compared with that expected under clustered, random, or regular generating processes (Bivand et al., 2013). For a theoretical review of spatial point patterns, the reader is referred to Daley and Vere-Jones (2008), Diggle (2013), Illian et al. (2008), and Møller and Waagepetersen (2004). A more practical overview can be found in Baddeley et al. (2015) and Gaetan and Guyon (2010). The implementation of methods and models to analyse point patterns with the R software (R Core Team, 2020) is described in Baddeley et al. (2015), Bivand et al. (2013) and Plant (2012). The theory of spatial point patterns is an active research field with challenging theoretical problems and applications in a broad range of sciences, such as agriculture, astronomy, biology, climatology, ecology, epidemiology and geology, among many others (Baddeley et al., 2006).
Complete spatial randomness (CSR) describes a point process in which point events occur within a given study area in a completely random fashion. It is synonymous with a homogeneous spatial Poisson process. Usually, the first step in analysing a spatial point pattern is to test for CSR. If the hypothesis is not rejected, the pattern can be treated as random, that is, as a homogeneous Poisson point pattern (Illian et al., 2008). Tests based on quadrat counts and on distances between the locations of events are generally used for this purpose (Banerjee et al., 2015).
Several distance-based functions are often used to test for CSR (Diggle, 2013). Widely used tools include the distribution of distances from an event to its nearest neighbour (function G), the distribution of distances from an arbitrary point of the plane to its nearest event (F), the function J (calculated in terms of F and G), and the number of events encountered up to a given distance of any particular event (Ripley’s K function) (Baddeley et al., 2013). However, the theoretical form of these functions is known only under a few models and is mathematically unknown for many other types of spatial dependence. Since these functions depend on distances, a particular metric has to be chosen. The usual one is the Euclidean distance, but the functions have to be adapted when this distance is not realistic. Moreover, they are highly time-consuming when the number of points increases: with current technologies, point patterns with thousands of events are common, and it is often infeasible to calculate, say, the K function. These drawbacks motivate our proposal for testing the hypothesis of CSR. Specifically, we propose a new, computationally efficient alternative for testing this hypothesis, based on calculating the fractal dimension (Wiegand and Moloney, 2013) by means of the box-counting method (Foroutan-pour et al., 1999). Several authors have used box-counting and the spectrum of generalised dimensions to analyse point patterns (see, for example, Salvadori et al. 1997; Tuia and Kanevski 2008; Vega et al. 2015). However, all these contributions use this tool from a descriptive point of view. One intuitive advantage of the methodology considered here is that the statistic defined summarises the information of the spatial point pattern in a single value. Also, since it does not depend on distances, it is not necessary to consider edge effects. This makes computation more straightforward and much faster than with the classical tests.
The notion of a fractal dimension was introduced by Mandelbrot (1967), who used it as an indicator of surface roughness. A shape with a higher fractal dimension is rougher than one with a lower dimension. Many methods exist for estimating the fractal dimension; box-counting, R/S analysis and the variation method can be used for this purpose (Breslin and Belward, 1999). The fractal dimension and its estimation by the box-counting method have been used in different fields of statistics, for example time series analysis (Kopytov et al., 2016), clustering analysis (Bones et al., 2016), principal components analysis (Mo and Huang, 2010) and geostatistics (Vidal et al., 2010). As mentioned before, we show how these concepts can be used in point pattern analysis from an inferential perspective. We develop a test based on the box-counting dimension of a spatial point pattern, and we also propose a graphical test along the lines of the classical graphical tests based on the G, F or K functions. We evaluate the performance of our methodology through a simulation study based on three well-known spatial structures that can be generated using the library spatstat (Baddeley et al., 2015) in R (R Core Team, 2020). In all cases, the results are consistent with those found using the G, F and Ripley’s K functions (Diggle, 2003).
The paper is organised as follows. Section 2 introduces the box-counting methodology. Section 3 presents the proposed test, an illustration through simulated and real data, and a power study. Section 4 describes a graphical approach to test for CSR, also based on the box-counting dimension. Section 5 shows an application of the method to a real data set of COVID-19 cases in Cali, Colombia. The article ends with a brief discussion and suggestions for further research.
Box-counting estimation of the fractal dimension for spatial point patterns under CSR
The hypothesis of CSR for a spatial point pattern asserts that the number of events in any region follows a Poisson distribution whose mean depends only on the area of the region, and that the events are independently and uniformly distributed over space. In other words, the events are equally likely to occur anywhere and do not interact with each other. Here, "uniform" is used in the sense of following a uniform probability distribution across the study region, not in the sense of being evenly dispersed across it. The intensity of events does not vary over the plane, and there are no interactions amongst the events: the independence assumption would be violated if the existence of one event either encouraged or inhibited the occurrence of other events in its neighbourhood. In this sense, CSR acts as a benchmark hypothesis to distinguish randomness from clustering or regularity due to some form of interaction.
A fractal is a non-regular geometric shape with the same degree of non-regularity at all scales. It can be treated as a self-similar structure in the sense that even an indefinitely small part of a shape is geometrically similar to the whole (Debnath, 2006). The fractal dimension is a ratio providing a statistical index of complexity comparing how the details in a pattern change with the scale at which they are measured (Falconer, 2004). The dimension of self-similar fractals is given by
$$D = \frac{\log M}{\log(1/s)}, \tag{1}$$
where M is the number of self-similar pieces and s is a scale factor, so that $M = (1/s)^D$. In Eq. (1), log denotes the logarithm to base 10; we use this notation throughout the paper to be consistent with the related published literature. The direct use of Eq. (1) is quite limited in practice. An alternative is the box-counting method (Liebovitch and Toth, 1989). Suppose the object of interest is covered with a grid of non-overlapping squares with sides of length $\delta$, and let $N_\delta$ be the number of squares that contain part of the object. The box-counting estimate of the fractal dimension (hereinafter box-counting dimension) is given by (Addison, 1997)
$$D = \lim_{\delta \to 0} \frac{\log N_\delta}{\log(1/\delta)}. \tag{2}$$
In practice, D in (2) is calculated as the slope of a linear regression of $\log N_\delta$ on $\log(1/\delta)$. Given a number of values $\delta_i$ ($i = 1, \ldots, q$), D is defined by means of the linear model (see Addison, 1997)
$$\log N_{\delta_i} = \beta_0 + D \log\left(\frac{1}{\delta_i}\right) + \varepsilon_i, \qquad i = 1, \ldots, q. \tag{3}$$
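A minimal Python sketch of Eqs. (2)–(3) may help fix ideas. It assumes, for illustration only, that the object is a finite set of points in the square $[0, k)^2$, that the boxes form a regular grid of side $\delta$, and that D is the ordinary least-squares slope of $\log N_\delta$ on $\log(1/\delta)$ with base-10 logarithms. The dyadic choice $\delta = 2^{-j}$ in the usage example and the function names are hypothetical; the text does not prescribe them.

```python
import math


def box_count(points, delta, k=1.0):
    """N_delta: number of grid boxes of side delta over [0, k)^2 holding at least one point."""
    cells = round(k / delta)  # boxes per side
    occupied = set()
    for x, y in points:
        # Clamp indices so points on the upper/right edge fall in the last box.
        i = min(int(x / delta), cells - 1)
        j = min(int(y / delta), cells - 1)
        occupied.add((i, j))
    return len(occupied)


def box_counting_dimension(points, deltas, k=1.0):
    """Least-squares slope D in log N_delta = beta_0 + D log(1/delta), Eq. (3)."""
    xs = [math.log10(1.0 / d) for d in deltas]
    ys = [math.log10(box_count(points, d, k)) for d in deltas]
    x_bar, y_bar = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


# Usage: 1024 evenly spaced points on the diagonal occupy 2**j boxes at delta = 2**-j,
# so the fitted slope is that of a line (D = 1).
diagonal = [((i + 0.5) / 1024, (i + 0.5) / 1024) for i in range(1024)]
deltas = [1 / 2**j for j in range(9)]
print(round(box_counting_dimension(diagonal, deltas), 6))
```

The same function applied to points filling the square would return a slope close to 2, the dimension of the plane.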
Expected value of D under CSR
We now show how the box-counting dimension given in (3) can be adapted to the context of spatial point patterns and used to test the hypothesis of CSR. Under CSR, the number of events $N(A)$ in a square A, with area |A| and sides of length k (without loss of generality, we can take k = 1), is Poisson distributed with mean $\lambda|A|$, where $\lambda$ is the constant intensity of the point process; that is, the probability function of the number of events in A is
$$P\{N(A) = x\} = \frac{e^{-\lambda|A|}(\lambda|A|)^x}{x!}, \qquad x = 0, 1, 2, \ldots \tag{4}$$
From (4), we have
$$P\{N(A) = 0\} = e^{-\lambda|A|}, \qquad P\{N(A) \geq 1\} = 1 - e^{-\lambda|A|}.$$
Assume the original square A is divided into $1/\delta^2$ non-overlapping squares $A_{i\delta}$, $i = 1, \ldots, 1/\delta^2$, with sides of length $\delta$ (see Fig. 1).
Fig. 1.
The square A with sides of length 1 is divided into $1/\delta^2$ non-overlapping squares with sides of length $\delta$
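Then, denoting by $N(A_{i\delta})$ the number of events in $A_{i\delta}$, we have
$$P\{N(A_{i\delta}) = x\} = \frac{e^{-\lambda\delta^2}(\lambda\delta^2)^x}{x!}, \qquad x = 0, 1, 2, \ldots \tag{5}$$
Under the CSR condition, $E[N(A_{i\delta})] = \lambda\delta^2$, the mean of a homogeneous Poisson process in a region of area $\delta^2$. From Eq. (5),
$$P\{N(A_{i\delta}) = 0\} = e^{-\lambda\delta^2}$$
and consequently
$$P\{N(A_{i\delta}) \geq 1\} = 1 - e^{-\lambda\delta^2}.$$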
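Define the random variable $N_\delta$ as the number of squares of side $\delta$ containing at least one event; that is, $N_\delta$ corresponds to the number of squares required to cover the point pattern (see Figs. 1 and 2). This variable can be defined as
$$N_\delta = \sum_{i=1}^{1/\delta^2} \mathbb{1}\{N(A_{i\delta}) \geq 1\}, \tag{6}$$
where $\mathbb{1}\{\cdot\}$ denotes the indicator function. The expected value of $N_\delta$ in (6) is
$$E(N_\delta) = \sum_{i=1}^{1/\delta^2} P\{N(A_{i\delta}) \geq 1\} = \frac{1 - e^{-\lambda\delta^2}}{\delta^2}.$$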
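Note that in order to define $E(D)$ in (3), it is necessary to find $E(\log N_\delta)$. Using the first-order Taylor expansion of $\log N_\delta$ around $E(N_\delta)$, we have
$$E(\log N_\delta) \approx \log E(N_\delta) = \log\left(\frac{1 - e^{-\lambda\delta^2}}{\delta^2}\right). \tag{7}$$
Taking expectations in (3) and using (7), we have under CSR
$$\log\left(\frac{1 - e^{-\lambda\delta_i^2}}{\delta_i^2}\right) \approx \beta_0 + E(D)\log\left(\frac{1}{\delta_i}\right), \qquad i = 1, \ldots, q. \tag{8}$$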
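The expression for $E(N_\delta)$ and the approximation (7) can be checked by simulation. The Python sketch below simulates a homogeneous Poisson process of intensity $\lambda$ on the unit square, counts the occupied boxes of side $\delta$, and compares the Monte Carlo averages of $N_\delta$ and $\log N_\delta$ with $(1 - e^{-\lambda\delta^2})/\delta^2$ and its logarithm. The Poisson sampler (counting exponential inter-arrival times), the values $\lambda = 100$ and $\delta = 1/8$, and all function names are our own illustrative choices, not part of the original method.

```python
import math
import random


def poisson_count(mean, rng):
    """Poisson(mean) draw: number of arrivals of a rate-`mean` process in [0, 1]."""
    n, t = 0, rng.expovariate(mean)
    while t < 1.0:
        n += 1
        t += rng.expovariate(mean)
    return n


def simulate_csr(lam, rng):
    """Homogeneous Poisson process with intensity lam on the unit square (|A| = 1)."""
    n = poisson_count(lam, rng)
    return [(rng.random(), rng.random()) for _ in range(n)]


def occupied_boxes(points, delta):
    """N_delta of Eq. (6): boxes of side delta containing at least one event."""
    m = round(1.0 / delta)
    return len({(min(int(x / delta), m - 1), min(int(y / delta), m - 1)) for x, y in points})


def expected_occupied_boxes(lam, delta):
    """E(N_delta) = (1 - exp(-lam * delta**2)) / delta**2 under CSR."""
    return -math.expm1(-lam * delta * delta) / (delta * delta)


def monte_carlo(lam, delta, reps=2000, seed=1):
    """Monte Carlo means of N_delta and log10 N_delta over `reps` CSR realisations."""
    rng = random.Random(seed)
    counts = [occupied_boxes(simulate_csr(lam, rng), delta) for _ in range(reps)]
    mean_n = sum(counts) / reps
    # max(c, 1) avoids log10(0) for an empty realisation (negligible for large lam).
    mean_log = sum(math.log10(max(c, 1)) for c in counts) / reps
    return mean_n, mean_log


lam, delta = 100, 1 / 8
mean_n, mean_log = monte_carlo(lam, delta)
print(mean_n, expected_occupied_boxes(lam, delta))                # E(N_delta)
print(mean_log, math.log10(expected_occupied_boxes(lam, delta)))  # approximation (7)
```

Each printed pair should agree closely; the small remaining gap in the second line reflects the second-order term neglected by the Taylor expansion.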
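When $\lambda \to \infty$ in (8), $1 - e^{-\lambda\delta_i^2} \to 1$, so that $\log E(N_{\delta_i}) \to 2\log(1/\delta_i)$ and hence $E(D) = 2$, the dimension of the plane.
In general, if A (Fig. 1) is a square with side length k, again divided into $k^2/\delta^2$ non-overlapping squares with sides of length $\delta$, we then have
$$E(N_\delta) = \frac{k^2}{\delta^2}\left(1 - e^{-\lambda\delta^2}\right).$$
Using again the first-order Taylor expansion of log around $E(N_\delta)$, we have
$$E(\log N_\delta) \approx \log\left[\frac{k^2}{\delta^2}\left(1 - e^{-\lambda\delta^2}\right)\right]. \tag{9}$$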
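Then, under CSR, the expected value of the fractal dimension for a square of side k calculated with the box-counting method is defined by the linear model
$$\log\left[\frac{k^2}{\delta_i^2}\left(1 - e^{-\lambda\delta_i^2}\right)\right] \approx \beta_0 + E(D)\log\left(\frac{1}{\delta_i}\right), \qquad i = 1, \ldots, q. \tag{10}$$
Note that $\lambda$ in (10) is usually unknown. Based on a single realisation of a homogeneous Poisson process, we can estimate the expected number of events $\lambda|A| = \lambda k^2$ by n (the number of points of the observed point pattern), which gives $\hat{\lambda} = n/k^2$. Taking $\lambda = n/k^2$ in (10), we obtain
$$\log\left[\frac{k^2}{\delta_i^2}\left(1 - e^{-n\delta_i^2/k^2}\right)\right] \approx \beta_0 + E(D)\log\left(\frac{1}{\delta_i}\right). \tag{11}$$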
Fig. 2.
Graphical representation of the square A, the boxes $A_{i\delta}$ and the count $N_\delta$ (for a given $\delta$)
log–log relationship
The functional relationship between $\log E(N_\delta)$ and $\log(1/\delta)$ defined in Eq. (10) allows us to characterise the behaviour of $E(D)$. Writing $\mu = \lambda k^2$ for the expected number of events of the spatial point pattern in A, $E(N_\delta) = (k^2/\delta^2)\{1 - e^{-\mu\delta^2/k^2}\}$ depends on k (the side length of the original square) and on $\mu$. Given $\mu$, the shape of the curves does not change with k (Fig. 3): the greater the value of k, the more the curve is shifted to the left. Likewise, for a fixed k, the effect of $\mu$ is reflected in the maximum of $\log E(N_\delta)$: since $E(N_\delta) \to \mu$ as $\delta \to 0$, the greater $\mu$, the greater the value at which $\log E(N_\delta)$ becomes constant (Fig. 4).
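To visualise Eq. (10) in terms of $\mu = \lambda k^2$, the following Python sketch evaluates the theoretical curve at $\delta = k/2^j$ (a dyadic grid chosen only for illustration) and checks two of the properties just stated: the plateau at $\log\mu$ as $\delta \to 0$, and the fact that changing k only translates the curve horizontally by $\log k$. The function name is hypothetical.

```python
import math


def csr_loglog(mu, k, j_max):
    """Theoretical CSR log-log curve of Eq. (10) at delta = k / 2**j, j = 0..j_max.

    Returns (log10(1/delta), log10 E(N_delta)) pairs, with lambda = mu / k**2.
    """
    lam = mu / (k * k)
    curve = []
    for j in range(j_max + 1):
        delta = k / 2**j
        # -expm1(-x) = 1 - exp(-x), computed accurately even when x is tiny.
        e_n = (k / delta) ** 2 * -math.expm1(-lam * delta * delta)
        curve.append((math.log10(1.0 / delta), math.log10(e_n)))
    return curve


c1 = csr_loglog(100, 1.0, 20)
c10 = csr_loglog(100, 10.0, 20)
print(c1[-1][1])             # plateau, close to log10(mu) = 2
print(c1[5][0] - c10[5][0])  # horizontal shift between k = 1 and k = 10, about log10(10) = 1
```

Because the ordinates at $\delta = k/2^j$ depend only on $\mu$ and j, the curves for different k have the same shape, which is why the slope used below turns out not to depend on k.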
Fig. 3.
Relation between $\log E(N_\delta)$ and $\log(1/\delta)$ when the initial square has sides of length k (0.01, 0.1, 1, 10, 100), for a fixed expected number of events $\mu$. Black points on each curve correspond to the coordinates $(x_1, y_1)$ and $(x_2, y_2)$, respectively (see text for explanations of these values). The slopes of the dashed lines define $D_E$ (see Eq. 15), which takes the same value in each case. Black lines (slope 2) correspond to $\log(k^2/\delta^2)$ (limit when $\mu \to \infty$)
Fig. 4.
Relation between $\log E(N_\delta)$ and $\log(1/\delta)$ according to the expected number of points of the pattern ($\mu$), when the initial square has a fixed side length k. Black points on each curve correspond to the coordinates $(x_1, y_1)$ and $(x_2, y_2)$, respectively (see text for explanations of these values). The slopes of the dashed lines define $D_E$ (see Eq. 15); for increasing values of $\mu$, these are, respectively, 1.60, 1.80, 1.87 and 1.90. In general, when $\mu \to \infty$, $D_E \to 2$ (black line)
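The minimum number of boxes required to cover the point pattern is obtained when $\delta = k$ (the initial square). In this case, $E(N_k) = 1 - e^{-\mu}$, and the corresponding point of the curve has abscissa $\log(1/k)$ and ordinate $\log(1 - e^{-\mu})$. On the other hand, the maximum number of partitions (corresponding to the minimum size of $\delta$) is taken as the one for which the expected number of events in each box $A_{i\delta}$ is $\lambda\delta^2 = 1$. Under this scenario, we have
$$\delta_{\min} = \frac{1}{\sqrt{\lambda}} = \frac{k}{\sqrt{\mu}}. \tag{12}$$
Replacing (12) into the left-hand side of (10) with $\delta = \delta_{\min}$, we obtain
$$\log E(N_{\delta_{\min}}) \approx \log\left[\mu\left(1 - e^{-1}\right)\right]. \tag{13}$$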
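As a purely illustrative worked example of Eqs. (12) and (13), take k = 1 and $\mu = 100$ (values chosen for this example only, with base-10 logarithms):
$$\delta_{\min} = \frac{1}{\sqrt{100}} = 0.1, \qquad \log E(N_{\delta_{\min}}) \approx \log\left[100\left(1 - e^{-1}\right)\right] = \log 63.21 \approx 1.801,$$
$$\log E(N_1) = \log\left(1 - e^{-100}\right) \approx 0.$$
The two endpoints of the curve are therefore approximately $(0, 0)$ and $(1, 1.801)$, and the slope of the line joining them is about 1.80, below the limiting value 2.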
The log–log plots (Figs. 3 and 4) show a multifractal behaviour, i.e. the dependence between $\log E(N_\delta)$ and $\log(1/\delta)$ is non-linear. The box-counting dimension D in (3) is usually calculated with the portion of the data that admits a linear fit (see, for example, Kenkel, 2013; Mou and Wang, 2014; Vega et al., 2015; Jaquette and Schweinhart, 2013). This option might not be appropriate for discriminating between different types of spatial point patterns; in this context, it is important to take into account the minimum and maximum values of the log–log curves. Thus, we propose to characterise the relationship between $\log E(N_\delta)$ and $\log(1/\delta)$ in Eq. (10) by the slope of the straight line through the points $(x_1, y_1)$ and $(x_2, y_2)$ (black points in Figs. 3 and 4), whose coordinates are
$$(x_1, y_1) = \left(\log\frac{1}{k},\ \log\left(1 - e^{-\mu}\right)\right), \qquad (x_2, y_2) = \left(\log\frac{\sqrt{\mu}}{k},\ \log\left[\mu\left(1 - e^{-1}\right)\right]\right). \tag{14}$$
We denote this slope by $D_E$ instead of $E(D)$ to emphasise that we do not employ the traditional linear fitting used in box-counting estimation. Under CSR, since $x_2 - x_1 = \tfrac{1}{2}\log\mu$, we have
$$D_E = \frac{y_2 - y_1}{x_2 - x_1} = \frac{2\left\{\log\left[\mu\left(1 - e^{-1}\right)\right] - \log\left(1 - e^{-\mu}\right)\right\}}{\log\mu}, \tag{15}$$
which does not depend on k.
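Eq. (15) is simple enough to evaluate directly. The Python function below (name hypothetical) computes $D_E$ from $\mu$ with base-10 logarithms, using `math.expm1` to evaluate $1 - e^{-\mu}$ accurately. The guard $\mu > 1$ is our addition: for $\mu \le 1$ the point $\delta_{\min} = k/\sqrt{\mu}$ does not lie strictly inside the initial square and the denominator vanishes or changes sign.

```python
import math


def expected_box_dimension(mu):
    """D_E of Eq. (15): slope through (x1, y1) and (x2, y2) under CSR, base-10 logs.

    The side length k cancels, since x2 - x1 = log10(sqrt(mu)) = 0.5 * log10(mu).
    """
    if mu <= 1:
        raise ValueError("mu must be greater than 1 (otherwise delta_min >= k)")
    y1 = math.log10(-math.expm1(-mu))        # log(1 - e^{-mu}), box of side k
    y2 = math.log10(mu * -math.expm1(-1.0))  # log[mu (1 - e^{-1})], box of side delta_min
    return (y2 - y1) / (0.5 * math.log10(mu))


for mu in (10, 42, 100, 1000, 10000):
    print(mu, round(expected_box_dimension(mu), 3))
```

The values increase towards 2 as $\mu$ grows. For $\mu = 42$ the function gives approximately 1.755, in line with the value 1.754 reported for n = 42 in Table 1 (the small difference presumably reflects rounding).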
CSR testing using the statistic $\hat{D}$
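In practice, with real data, $\mu$ in Eq. (15) is unknown. In this case, to test for CSR, we can take $\hat{\mu} = n$, with n the number of points of the observed pattern; that is, we assume that $E[N(A)] = n$. In this scenario, the expected box-counting dimension under CSR is defined as
$$D_E = \frac{y_2 - y_1}{x_2 - x_1} = \frac{2\left\{\log\left[\hat{\mu}\left(1 - e^{-1}\right)\right] - \log\left(1 - e^{-\hat{\mu}}\right)\right\}}{\log\hat{\mu}}, \qquad \hat{\mu} = n, \tag{16}$$
with $(x_1, y_1) = \left(\log(1/k), \log(1 - e^{-\hat{\mu}})\right)$ and $(x_2, y_2) = \left(\log(\sqrt{\hat{\mu}}/k), \log[\hat{\mu}(1 - e^{-1})]\right)$, and its estimate is given by
$$\hat{D} = \frac{\hat{y}_2 - y_1}{x_2 - x_1}, \tag{17}$$
where $x_1$, $x_2$ and $y_1$ are defined as in (16), and $\hat{y}_2$ is calculated from the scatter plot of $\log N_\delta$ against $\log(1/\delta)$. Specifically, $\hat{y}_2$ is the ordinate corresponding to the abscissa $x_2 = \log(\sqrt{n}/k)$, with k the side length of the square. In practice, an interpolation procedure (linear, polynomial, etc.) may be required to calculate $\hat{y}_2$. By way of illustration, we show the results obtained with a simulation from a homogeneous Poisson process on the unit square (k = 1). Figure 5 shows the spatial distribution of the n = 114 simulated events, the number of events per cell for each of three partitions, and the value of $N_\delta$ in each case.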
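The estimator in Eqs. (16)–(17) can be sketched in a few lines of Python. This is our illustration, not the authors' code: it assumes dyadic box sizes $\delta = k/2^j$, stops at the first j for which $2^j \ge \sqrt{n}$ so that the abscissa $x_2 = \log(\sqrt{n}/k)$ is bracketed, obtains $\hat{y}_2$ by linear interpolation (one of the options mentioned above), and takes $y_1$ as in Eq. (16). Function names are hypothetical.

```python
import math
import random


def box_count(points, delta, k):
    """Number of boxes of side delta over [0, k)^2 with at least one event."""
    m = round(k / delta)
    return len({(min(int(x / delta), m - 1), min(int(y / delta), m - 1)) for x, y in points})


def b_statistic(points, k=1.0):
    """Return (D_E, D_hat, B = D_E - D_hat) for n >= 2 events in [0, k)^2, Eqs. (16)-(17).

    Sketch assumptions: boxes of side delta = k / 2**j, j = 0..J, with J the smallest
    integer such that 2**J >= sqrt(n); y_hat_2 obtained by linear interpolation.
    """
    n = len(points)
    x1, x2 = math.log10(1.0 / k), math.log10(math.sqrt(n) / k)
    y1 = math.log10(-math.expm1(-n))        # log(1 - e^{-n}), as in Eq. (16)
    y2 = math.log10(n * -math.expm1(-1.0))  # log[n (1 - e^{-1})]
    d_e = (y2 - y1) / (x2 - x1)

    J = max(1, math.ceil(math.log2(math.sqrt(n))))
    xs = [math.log10(2**j / k) for j in range(J + 1)]
    ys = [math.log10(box_count(points, k / 2**j, k)) for j in range(J + 1)]
    # Last segment whose left end is <= x2 (robust to rounding at the right end).
    j = max(i for i in range(J) if xs[i] <= x2)
    y2_hat = ys[j] + (x2 - xs[j]) / (xs[j + 1] - xs[j]) * (ys[j + 1] - ys[j])
    d_hat = (y2_hat - y1) / (x2 - x1)
    return d_e, d_hat, d_e - d_hat


rng = random.Random(7)
csr = [(rng.random(), rng.random()) for _ in range(1000)]
clustered = [(0.05 * rng.random(), 0.05 * rng.random()) for _ in range(1000)]
print(b_statistic(csr))        # B close to zero
print(b_statistic(clustered))  # B clearly positive (D_hat well below D_E)
```

A polynomial interpolation of the log–log points could replace the linear step without changing the rest of the sketch.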
Fig. 5.
A simulated spatial point pattern of size n = 114 in a square of side k = 1 (top left). The numbers in each panel indicate how many events fall into each box, and $N_\delta$ is the number of boxes with one or more events for each of the three partitions shown (top right, bottom left and bottom right, respectively)
We observe that the smaller the size of the partition, the greater the number of boxes without events (boxes with zero counts). Calculating $\log N_\delta$ and $\log(1/\delta)$ for a sequence of values of $\delta$, we obtain the log–log scatter plot (white circles) shown in Fig. 6. As expected, its behaviour is similar to that of the theoretical log–log curve under CSR (red line). The black points in this plot are the coordinates used to calculate the expected box-counting dimension $D_E$ under the null hypothesis (Eq. 16); in this case, $D_E$ is obtained from Eq. (16) with n = 114. The intersection of the blue lines corresponds to the coordinate $(x_2, \hat{y}_2)$ ($\hat{y}_2$ is found by linear interpolation between the two nearest values), which is substituted in Eq. (17) to obtain the estimated box-counting dimension $\hat{D}$. By generating m point patterns under CSR and repeating the procedure described above, we obtain m estimates $\hat{D}_1, \ldots, \hat{D}_m$ under the null hypothesis of CSR. A value of $\hat{D}$ in an extreme tail of this null distribution would indicate that the spatial randomness hypothesis should be rejected. Analogously, defining $B = D_E - \hat{D}$, we reject the randomness hypothesis if B lies in an extreme tail of its null distribution. Using B may be preferable because, regardless of the type of pattern considered, zero is the reference value for the centre of its distribution (see Fig. 7). This is illustrated in Sect. 3.1 with a simulation study. The procedure to test for CSR based on $D_E$, $\hat{D}$ and B described above is summarised in Algorithm 1, which presents schematically the steps required to perform the spatial randomness test using the box-counting method and shows how to estimate the corresponding p value.
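The Monte Carlo procedure just described (summarised in Algorithm 1, which is not reproduced here) could be sketched as follows. The null patterns are simulated conditional on n (uniform points, i.e. a binomial process), as was done for Fig. 7. The dyadic box sizes, the two-sided p value formula and the function names are our assumptions and may differ from the details of Algorithm 1.

```python
import math
import random


def b_statistic(points, k=1.0):
    """B = D_E - D_hat (Eqs. 16-17) for n >= 2 events in [0, k)^2, dyadic boxes."""
    n = len(points)
    x1, x2 = math.log10(1.0 / k), math.log10(math.sqrt(n) / k)
    y1 = math.log10(-math.expm1(-n))
    d_e = (math.log10(n * -math.expm1(-1.0)) - y1) / (x2 - x1)
    J = max(1, math.ceil(math.log2(math.sqrt(n))))
    xs, ys = [], []
    for j in range(J + 1):
        m = 2**j  # boxes per side; delta = k / m
        boxes = {(min(int(x * m / k), m - 1), min(int(y * m / k), m - 1)) for x, y in points}
        xs.append(math.log10(m / k))
        ys.append(math.log10(len(boxes)))
    j = max(i for i in range(J) if xs[i] <= x2)
    y2_hat = ys[j] + (x2 - xs[j]) / (xs[j + 1] - xs[j]) * (ys[j + 1] - ys[j])
    return d_e - (y2_hat - y1) / (x2 - x1)


def csr_test(points, k=1.0, m=500, alpha=0.05, seed=None):
    """Monte Carlo CSR test based on B, with null patterns simulated conditional on n."""
    rng = random.Random(seed)
    n = len(points)
    b_obs = b_statistic(points, k)
    b_null = [
        b_statistic([(rng.uniform(0, k), rng.uniform(0, k)) for _ in range(n)], k)
        for _ in range(m)
    ]
    # Two-sided Monte Carlo p value (an assumption of this sketch).
    p_upper = (1 + sum(b >= b_obs for b in b_null)) / (m + 1)
    p_lower = (1 + sum(b <= b_obs for b in b_null)) / (m + 1)
    p_value = min(1.0, 2 * min(p_upper, p_lower))
    # Sign of B: positive suggests clustering, negative suggests inhibition.
    return b_obs, p_value, p_value < alpha


rng = random.Random(11)
clustered = [(0.1 * rng.random(), 0.1 * rng.random()) for _ in range(100)]
print(csr_test(clustered, m=99, seed=3))  # large positive B and a small p value
```

With m simulations the smallest attainable two-sided p value is $2/(m+1)$, so m should be large enough for the chosen significance level.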
Fig. 6.
Scatter plot (white circles) of the pairs $(\log(1/\delta), \log N_\delta)$ calculated with the simulated pattern of size n = 114. The red line is the theoretical log–log curve under CSR (assuming $\mu = n$). Black circles are the coordinates $(x_1, y_1)$ and $(x_2, y_2)$ used to calculate the box-counting dimension $D_E$ under the null hypothesis (Eq. (16)). The intersection of the blue lines corresponds to the coordinate $(x_2, \hat{y}_2)$ used to obtain the estimated box-counting dimension $\hat{D}$ (Eq. (17)); $\hat{y}_2$ is calculated by linear interpolation between the two nearest points
Fig. 7.
Point patterns under spatial randomness (simulated, top left), clustering (simulated, centre left) and inhibition (the cells dataset, bottom left). The number of events is n = 42 in all cases. On the right, we show the null distributions of the statistic B generated by Monte Carlo simulation (histogram density estimate in grey and kernel density estimate in red). Dashed black lines correspond to the quantiles $q_{\alpha/2}$ and $q_{1-\alpha/2}$ of the simulated B values. The dashed blue line corresponds to the value of B calculated with the point pattern in the left panel
An illustration of the test
As an initial assessment of the test proposed in Sect. 3, we present the results of a Monte Carlo simulation study describing the behaviour of the test under the three types of point structures generally considered in point pattern analysis. Based on simulations from homogeneous Poisson and Matérn cluster point processes and on a real dataset (the cells point pattern), we display the performance of the statistic B, and implicitly of the statistic $\hat{D}$, under randomness, clustering and inhibition.
Figure 7 shows realisations of three spatial point patterns with different underlying structures: a spatial point pattern of size n = 42 simulated from a homogeneous Poisson model (top left), a clustered point pattern of n = 42 events (centre left) generated from a Matérn cluster process with parameters $(\kappa, \nu, r)$, and a regular point pattern (bottom left), which corresponds to the cells dataset, widely known and used as an example in many works on point patterns (Ripley, 1977, 1981; Diggle, 1983). Note that in the case of the Matérn process, we use $\nu$ instead of $\mu$ for the mean number of offspring per cluster, to avoid confusion with the notation in Eq. (16). The intensity of the Matérn cluster process is $\kappa\nu$ (Waagepetersen, 2007), and the level of aggregation is determined by the parameter r. For fixed $\kappa$ and $\nu$, the aggregation level increases as r decreases (Fig. 8).
Fig. 8.
Simulations of Matérn cluster point patterns with parameters $(\kappa, \nu, r)$ for different values of r. The expected number of events in each case is $\kappa\nu$. The numbers of events simulated for each value of r are: 182 (r = 0.1), 111 (r = 0.2), 65 (r = 0.4), 77 (r = 0.6), 73 (r = 0.8), and 79 (r = 1.0)
We deliberately selected simulations of size 42 for the point patterns under randomness and clustering (top left and centre left of Fig. 7), so that the results are easily comparable with those of the cells pattern (which has 42 events). The distributions of the statistic B on the right of Fig. 7 were generated assuming a fixed n, although the results do not change significantly if an unconditional simulation is considered. The functions rpoispp and rMatClust of the spatstat library (Baddeley et al., 2015) of R (R Core Team, 2020) were used to simulate the random and clustered patterns; the cells point pattern is also available in spatstat. We apply the methodology presented in Sect. 3 to test the hypothesis of CSR with each of these datasets. Using the n values in Table 1 and Eqs. (16) and (17), we calculate $D_E$, $\hat{D}$ and B for each of the patterns in Fig. 7 (Table 1).
Table 1.
Expected number of events ($\mu$), number of events recorded (n), expected box-counting dimension conditional on n ($D_E$), and estimates ($\hat{D}$ and B) for each of the three types of point patterns considered
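| Point pattern | $\mu$ | $n$ | $D_E$ | $\hat{D}$ | $B$ | $q_{\alpha/2}$ | $q_{1-\alpha/2}$ |
|---|---|---|---|---|---|---|---|
| Homogeneous Poisson | 42 | 42 | 1.754 | 1.762 | -0.007 | -0.056 | 0.070 |
| Matérn cluster | 40 | 42 | 1.754 | 1.556 | 0.198 | -0.074 | 0.093 |
| Cells | NA | 42 | 1.754 | 1.897 | -0.132 | -0.065 | 0.067 |

$q_{\alpha/2}$ and $q_{1-\alpha/2}$ are the $\alpha/2$ and $1-\alpha/2$ quantiles of the simulated null distribution of B (500 simulations); NA: not applicable (observed dataset)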
A quick inspection of the results in Table 1 reveals that the value of $\hat{D}$ found with the point pattern simulated under CSR (top left of Fig. 7) is very close to the expected value $D_E$ under complete spatial randomness, while in the other two cases $\hat{D}$ is relatively far from this reference value (below it when the pattern is clustered and above it when it is inhibitory). The B statistic conveys the same information, with zero as the reference value: the value of B under the Poisson process is close to zero, while the B values of the Matérn cluster and cells patterns are far from zero (above zero when the pattern is clustered and below zero when it is inhibitory). The distribution of the statistic B under the null hypothesis was estimated by generating 500 simulations under CSR with the same number of events as the observed pattern (see the histograms in the right panels of Fig. 7), which gave $B_1, \ldots, B_{500}$. A kernel density estimate (Sheather, 2004) of the B distribution was also obtained in each case (red curves in the right panels of Fig. 7), using a Gaussian kernel with the bandwidth given by Silverman’s rule (Sheather, 2004). Note that Fig. 7 shows three different distributions of B under randomness. Only one of these distributions could have been used; however, to present the results in more detail, we include three sets of independent simulations. Using $B_1, \ldots, B_{500}$ and the function quantile of the R library stats (R Core Team, 2020), the percentiles $q_{\alpha/2}$ and $q_{1-\alpha/2}$ of the B distribution (black dashed lines in Fig. 7) were calculated. In each case, the null hypothesis of CSR is rejected if B is lower than $q_{\alpha/2}$ or greater than $q_{1-\alpha/2}$. The density estimates (histograms and red curves) in Fig. 7 suggest that the distributions of B under CSR are symmetric around zero. A large value of B (in the upper tail of its null distribution) indicates that the pattern under study is clustered.
Conversely, a very low value of B (in the lower tail of its null distribution) gives evidence that the process of interest follows an inhibition model.
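For readers working outside R, a minimal Python sketch of the two ingredients used here: sample quantiles with linear interpolation between order statistics (R's default, type 7, which is what the function quantile returns unless told otherwise) and a Gaussian kernel density estimate with Silverman's rule-of-thumb bandwidth $0.9\min(\mathrm{sd}, \mathrm{IQR}/1.34)\,n^{-1/5}$ (the form of R's default bandwidth bw.nrd0). The function names are hypothetical, $\alpha = 0.05$ is only a default, and the null sample in the usage lines is a synthetic placeholder, not the simulations reported in Table 1.

```python
import math
import random
import statistics


def quantile_type7(values, p):
    """Sample quantile with linear interpolation between order statistics (R type 7)."""
    xs = sorted(values)
    h = (len(xs) - 1) * p
    lo = math.floor(h)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])


def silverman_bandwidth(values):
    """Silverman's rule of thumb: 0.9 * min(sd, IQR / 1.34) * n ** (-1/5)."""
    sd = statistics.stdev(values)
    iqr = quantile_type7(values, 0.75) - quantile_type7(values, 0.25)
    spread = min(sd, iqr / 1.34) if iqr > 0 else sd
    return 0.9 * spread * len(values) ** (-0.2)


def gaussian_kde(values, h):
    """Return the Gaussian kernel density estimate f(x) with bandwidth h."""
    c = 1.0 / (len(values) * h * math.sqrt(2.0 * math.pi))
    return lambda x: c * sum(math.exp(-0.5 * ((x - v) / h) ** 2) for v in values)


def csr_decision(b_obs, b_null, alpha=0.05):
    """Reject CSR if B lies below q_{alpha/2} or above q_{1-alpha/2} of the null B values."""
    q_lo = quantile_type7(b_null, alpha / 2)
    q_hi = quantile_type7(b_null, 1 - alpha / 2)
    if b_obs < q_lo:
        return "reject: inhibition"
    if b_obs > q_hi:
        return "reject: clustering"
    return "do not reject CSR"


# Placeholder null sample (NOT the paper's simulations): 500 draws from N(0, 0.03^2).
rng = random.Random(0)
b_null = [rng.gauss(0.0, 0.03) for _ in range(500)]
f = gaussian_kde(b_null, silverman_bandwidth(b_null))
print(f(0.0), csr_decision(0.198, b_null), csr_decision(0.0, b_null))
```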
Two aspects are noted from Table 1 and Fig. 7. On the one hand, the B value calculated with the point pattern simulated under randomness (-0.007) (dashed blue line in the top right panel of Fig. 7) lies around the centre of the null distribution; as expected, the test indicates that there is no evidence to reject the null hypothesis of CSR. On the other hand, for the Matérn cluster point pattern (centre left) and cells (bottom left), the values of the statistic B (Table 1) lie in the tails of the corresponding distributions under randomness (the right tail for the Matérn cluster process and the left tail for the inhibition pattern; Fig. 7), indicating that the hypothesis of randomness should be rejected. In summary, the plots in the right panel of Fig. 7 show that the proposed test (based on B or $\hat{D}$) reached the correct decision in all three cases. When the hypothesis of spatial randomness is rejected, the sign of B indicates whether the pattern is clustered or inhibitory. From Table 1, it is important to note that, conditional on n, there is a reference value ($D_E$) for the randomness hypothesis, and the simulation-based distributions allow us to establish whether the estimate $\hat{D}$ differs significantly from this value. The value of B also measures the strength of inhibition or clustering: the further B is from zero (below or above), the greater the degree of inhibition or clustering, respectively, of the point pattern under consideration.
Power of the test
We generate realisations of a Matérn cluster process with parameters $(\kappa, \nu, r)$ (Waagepetersen, 2007). The method first generates a uniform Poisson point process of "parent" points with intensity $\kappa$. Then, each parent point is replaced by a random cluster of "offspring" points, the number of points per cluster being Poisson distributed with mean $\nu$, and their positions being placed independently and uniformly inside a disc of radius r centred on the parent point (Waagepetersen, 2007). We use the function rMatClust of the library spatstat (Baddeley et al., 2015) to generate the simulations. For six selected values of r (0.1, 0.2, 0.4, 0.6, 0.8, and 1.0), one simulated realisation is shown in Fig. 8. These plots make clear that the smaller r, the greater the aggregation, and therefore the stronger the evidence to reject the hypothesis of spatial randomness. Conversely, as r increases, the configuration of points looks more similar to a realisation of a random process under CSR. With this in mind, to estimate the rejection probability of the test under different levels of spatial aggregation, we carried out a simulation study considering a more extensive set of values of r between 0.1 and 1 (0.1, 0.15, 0.20, ..., 0.90, 0.95, 1). Point patterns with a high level of aggregation are generated first (small r values), and then, increasing r, we simulate patterns with point configurations similar to those obtained under spatial randomness. The procedure used is analogous to that described in Algorithm 1: for each r value, the rejection probability of the CSR hypothesis is estimated using the iterative procedure given in Algorithm 2. The rejection probabilities for each r are shown in Table 2. The values in this table show an inverse relationship between r (column 1) and the probability of rejecting the null hypothesis (column 13): the lower the r value, the more evident the spatial aggregation and the greater the rejection probability of the complete spatial randomness hypothesis. Conversely, as r tends to one, the corresponding rejection probabilities of the randomness hypothesis tend to zero. We include in Table 2 the first 10 values of B (out of a total of 500) with the corresponding empirical p values. These values show, in general, a transition in B. When r is small (r = 0.1, 0.15, 0.20), the values of B tend to be relatively large, and therefore the simulation-based p values are close to zero; when r is large (r = 0.9, 0.95, 1.00), the opposite occurs: the values of B tend to be relatively small (close to zero or even negative), and consequently the corresponding empirical p values are greater than the significance level. These results suggest that the power of the test increases as the level of spatial aggregation increases.
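A minimal Python sketch of the Matérn cluster simulation described above, together with a generic rejection-rate loop in the spirit of Algorithm 2 (which is not reproduced here). It is not spatstat's rMatClust: the expanded simulation window (by r on each side, so that clusters whose parents fall just outside the unit square still contribute offspring), the Poisson sampler and all names are our assumptions. The `reject` argument stands for any CSR test that returns True or False for a pattern, for example a Monte Carlo test based on B.

```python
import math
import random


def poisson_count(mean, rng):
    """Poisson(mean) draw via exponential inter-arrival times on [0, 1]."""
    n, t = 0, rng.expovariate(mean)
    while t < 1.0:
        n += 1
        t += rng.expovariate(mean)
    return n


def matern_cluster(kappa, nu, r, rng):
    """Matérn cluster process on the unit square.

    Parents: homogeneous Poisson with intensity kappa on [-r, 1 + r]^2.
    Offspring: Poisson(nu) points per parent, uniform in a disc of radius r.
    Only offspring falling inside [0, 1)^2 are kept.
    """
    side = 1.0 + 2.0 * r
    points = []
    for _ in range(poisson_count(kappa * side * side, rng)):
        px, py = rng.uniform(-r, 1.0 + r), rng.uniform(-r, 1.0 + r)
        for _ in range(poisson_count(nu, rng)):
            # Uniform point in a disc: radius r * sqrt(U), angle 2 * pi * V.
            rho, theta = r * math.sqrt(rng.random()), 2.0 * math.pi * rng.random()
            x, y = px + rho * math.cos(theta), py + rho * math.sin(theta)
            if 0.0 <= x < 1.0 and 0.0 <= y < 1.0:
                points.append((x, y))
    return points


def rejection_rate(reject, kappa, nu, r, sims=500, seed=None):
    """Fraction of Matérn realisations for which reject(points) is True."""
    rng = random.Random(seed)
    return sum(bool(reject(matern_cluster(kappa, nu, r, rng))) for _ in range(sims)) / sims


rng = random.Random(5)
pattern = matern_cluster(kappa=10, nu=5, r=0.1, rng=rng)
print(len(pattern))  # on average kappa * nu = 50 events
# A deliberately trivial rule, only to show the call signature.
print(rejection_rate(lambda pts: len(pts) > 0, kappa=10, nu=5, r=0.1, sims=100, seed=1))
```

Looping `rejection_rate` over a grid of r values with a real test plugged in would produce a column like the last one of Table 2.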
Table 2.
Statistic B and empirical p values obtained with the first 10 simulations (out of 500) of Matérn cluster point patterns with parameters $(\kappa, \nu, r)$. The rejection probability of the null hypothesis (last column) is calculated from the 500 simulations
| r | Statistic | Sim. 1 | Sim. 2 | Sim. 3 | Sim. 4 | Sim. 5 | Sim. 6 | Sim. 7 | Sim. 8 | Sim. 9 | Sim. 10 | Rejection probability |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.10 | B | 0.37 | 0.50 | 0.39 | 0.43 | 0.24 | 0.46 | 0.39 | 0.28 | 0.28 | 0.45 | 1.00 |
| | p value | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 0.15 | B | 0.18 | 0.29 | 0.25 | 0.30 | 0.22 | 0.44 | 0.47 | 0.24 | 0.26 | 0.13 | 1.00 |
| | p value | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 0.20 | B | 0.18 | 0.16 | 0.10 | 0.19 | 0.09 | 0.16 | 0.08 | 0.13 | 0.19 | 0.20 | 1.00 |
| | p value | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 0.25 | B | 0.14 | 0.23 | 0.06 | 0.25 | 0.10 | 0.13 | 0.13 | 0.13 | 0.01 | 0.06 | 0.96 |
| | p value | 0.06 | 0.04 | 0.08 | 0.09 | 0.09 | 0.11 | 0.20 | 0.11 | 0.08 | 0.13 | |
| 0.30 | B | 0.12 | 0.00 | 0.03 | 0.06 | 0.01 | 0.04 | 0.09 | 0.19 | 0.11 | 0.21 | 0.78 |
| | p value | 0.00 | 0.52 | 0.02 | 0.00 | 0.41 | 0.00 | 0.01 | 0.00 | 0.00 | 0.00 | |
| 0.35 | B | 0.00 | 0.01 | 0.07 | 0.03 | 0.13 | 0.23 | 0.00 | 0.17 | 0.05 | 0.02 | 0.64 |
| | p value | 0.00 | 0.00 | 0.01 | 0.00 | 0.12 | 0.24 | 0.00 | 0.00 | 0.01 | 0.20 | |
| 0.40 | B | 0.02 | 0.06 | 0.05 | 0.15 | 0.02 | 0.02 | 0.00 | 0.03 | 0.03 | 0.01 | 0.46 |
| | p value | 0.09 | 0.00 | 0.08 | 0.00 | 0.06 | 0.28 | 0.44 | 0.02 | 0.03 | 0.37 | |
| 0.45 | B | 0.06 | 0.00 | 0.09 | 0.05 | 0.01 | 0.02 | 0.02 | 0.02 | 0.00 | 0.02 | 0.43 |
| | p value | 0.01 | 0.40 | 0.03 | 0.01 | 0.75 | 0.11 | 0.22 | 0.82 | 0.52 | 0.14 | |
| 0.50 | B | 0.00 | 0.04 | 0.03 | 0.05 | 0.04 | 0.06 | 0.03 | 0.05 | 0.00 | 0.00 | 0.23 |
| | p value | 0.41 | 0.01 | 0.08 | 0.01 | 0.07 | 0.00 | 0.22 | 0.01 | 0.37 | 0.47 | |
| 0.55 | B | 0.00 | 0.04 | 0.00 | 0.00 | 0.10 | 0.02 | 0.01 | 0.01 | 0.00 | 0.04 | 0.20 |
| | p value | 0.44 | 0.01 | 0.46 | 0.48 | 0.00 | 0.81 | 0.43 | 0.19 | 0.46 | 0.6 | |
| 0.60 | B | 0.01 | 0.00 | 0.02 | 0.01 | 0.01 | 0.01 | 0.02 | 0.02 | 0.01 | 0.01 | 0.17 |
| | p value | 0.29 | 0.56 | 0.18 | 0.62 | 0.77 | 0.70 | 0.84 | 0.09 | 0.30 | 0.64 | |
| 0.65 | B | 0.00 | 0.03 | 0.02 | 0.01 | 0.00 | 0.05 | 0.02 | 0.02 | 0.01 | 0.1 | 0.10 |
| | p value | 0.49 | 0.10 | 0.92 | 0.72 | 0.30 | 0.98 | 0.14 | 0.97 | 0.59 | 0.74 | |
| 0.70 | B | 0.04 | 0.04 | 0.05 | 0.01 | 0.00 | 0.03 | 0.01 | 0.01 | 0.03 | 0.00 | 0.08 |
| | p value | 0.06 | 0.06 | 0.05 | 0.64 | 0.64 | 0.58 | 0.06 | 0.36 | 0.21 | 0.58 | |
| 0.75 | B | 0.04 | 0.01 | 0.03 | 0.00 | 0.01 | 0.01 | 0.03 | 0.02 | 0.00 | 0.05 | 0.08 |
| | p value | 0.04 | 0.16 | 0.92 | 0.50 | 0.66 | 0.26 | 0.10 | 0.86 | 0.43 | 0.98 | |
| 0.80 | B | 0.04 | 0.03 | 0.01 | 0.01 | 0.00 | 0.00 | 0.02 | 0.01 | 0.01 | 0.00 | 0.06 |
| | p value | 0.98 | 0.06 | 0.34 | 0.62 | 0.48 | 0.50 | 0.10 | 0.13 | 0.74 | 0.44 | |
| 0.85 | B | 0.00 | 0.01 | 0.02 | 0.02 | 0.01 | 0.00 | 0.01 | 0.01 | 0.01 | 0.02 | 0.04 |
| | p value | 0.56 | 0.30 | 0.88 | 0.84 | 0.35 | 0.52 | 0.24 | 0.28 | 0.28 | 0.18 | |
| 0.90 | B | 0.01 | 0.01 | 0.01 | 0.00 | 0.02 | 0.01 | 0.03 | 0.03 | 0.01 | 0.01 | 0.04 |
| | p value | 0.74 | 0.70 | 0.80 | 0.50 | 0.88 | 0.60 | 0.08 | 0.02 | 0.71 | 0.32 | |
| 1.00 | B | 0.01 | 0.05 | 0.01 | 0.01 | 0.00 | 0.02 | 0.02 | 0.02 | 0.02 | 0.03 | 0.02 |
| | p value | 0.35 | 0.01 | 0.22 | 0.76 | 0.56 | 0.83 | 0.12 | 0.12 | 0.83 | 0.04 | |
CSR testing using the log–log plot
In the analysis of spatial point patterns, the test for CSR is often based on graphical methods. Generally, the distribution functions of the event–event distance (function G (Clark and Evans, 1954)), point–event distance (function F (Bartlett, 1964)), and the number of events encountered up to a given distance of any particular event (function Ripley’s K Ripley (1977)) are employed for this purpose. These functions are typically inspected by plotting the empirical function calculated from the data, together with the theoretical function of the homogeneous Poisson process with the same average intensity Baddeley et al. (2015). To assess the statistical significance of deviations between the observed and theoretical functions, it is required to know the expected variability when the pattern is completely random. To this purpose, simulated realisations under CSR are generated, and pointwise envelopes based on the minimum and maximum are calculated. In this Section, we show how the log–log plot defined in Sect. 2.2 can be used as an alternative to the functions G, F, and K to test graphically for CSR. The steps to define the graphical test based on the log–log plot are the following. Initially, calculate the log–log plot defined in Sect. 2.2 with the observed dataset. Then simulate m realisations from , and for each simulated point pattern obtain the log–log plot. From the generated m curves, define pointwise envelopes as in the case of the G, F, and K functions mentioned above. We illustrate the use of the log–log function based on the same point patterns considered in Sect. 3.1 (Fig. 7). The results obtained are compared with those found with the G, F, and K functions. We use the library spatstat (Baddeley et al., 2015) to generate the envelopes. Specifically, the functions Gest, Fest, Kest and envelope of the same library were used for carrying out the graphical tests. 
In all cases, the simulations used to obtain the envelopes were conditioned to have the same number of events as the corresponding original point pattern (random, regular, and clustered). Figures 9, 10 and 11 show the corresponding envelopes (grey shading) for the G (top left), F (top right), K (bottom left), and log–log (bottom right) functions generated from the point patterns in Fig. 7. In every case, the results obtained with the log–log function agree with those given by the functions G, F, and K: the estimated log–log function lies inside the envelopes for the Poisson pattern (Fig. 9) and outside them for the clustered and inhibited (cells) patterns (Figs. 10 and 11, respectively). Empirically, therefore, the log–log plot performs as well as the traditional G, F, and K functions. The log–log function is interpreted analogously to the F function: there is clustering when the estimated function lies below the envelopes and inhibition when it lies above them (Figs. 10 and 11). The results based on the log–log plot are also consistent with those described in Sect. 3.1. Recall that under inhibition the estimated box-counting dimension is greater than its expected value under randomness, whereas it is smaller if the process is clustered. The same behaviour can be seen in Figs. 10 and 11. The log–log plot for the pattern cells (Fig. 11) lies above the envelopes, that is, it is greater than the expected log–log curve under CSR. Likewise, in Fig. 10 (Matérn cluster process) the estimated log–log function (black line) lies below the envelopes, that is, the log–log plot of a clustered point pattern is lower than expected under CSR. These results suggest a direct relationship between the test based on the box-counting dimension and the graphical test based on the log–log plot.
Fig. 9.
Envelopes of G (top left), F (top right), K (bottom left), and log–log (bottom right) functions calculated with a Poisson process
Fig. 10.
Envelopes of G (top left), F (top right), K (bottom left), and log–log (bottom right) functions calculated with a Matérn cluster process
Fig. 11.
Envelopes of G (top left), F (top right), K (bottom left), and log–log (bottom right) functions calculated with the point pattern cells
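The reading rule for the log–log envelopes (below suggests clustering, above suggests inhibition) and its link with the box-counting dimension can be illustrated with a small self-contained sketch. It assumes a unit-square window, dyadic grid resolutions `ks`, and the least-squares slope of log N(k) on log k as the box-counting estimate; the regular lattice and the clustered pattern are hypothetical stand-ins for the cells and Matérn cluster patterns of Fig. 7, and all names are hypothetical.

```python
import math
import random

def box_dimension(points, ks=(2, 4, 8, 16, 32)):
    """Box-counting estimate: least-squares slope of log N(k) against log k (box side 1/k)."""
    xs, ys = [], []
    for k in ks:
        occupied = {(min(int(x * k), k - 1), min(int(y * k), k - 1)) for x, y in points}
        xs.append(math.log(k))
        ys.append(math.log(len(occupied)))
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    return sxy / sxx

def side_of_envelope(curve, lo, hi):
    """Below the CSR envelopes suggests clustering, above suggests inhibition."""
    below = any(c < l for c, l in zip(curve, lo))
    above = any(c > h for c, h in zip(curve, hi))
    if below and not above:
        return "clustering"
    if above and not below:
        return "inhibition"
    return "mixed" if below else "compatible with CSR"

rng = random.Random(7)
csr = [(rng.random(), rng.random()) for _ in range(225)]
lattice = [((i + 0.5) / 15, (j + 0.5) / 15) for i in range(15) for j in range(15)]
clustered = []
for _ in range(9):  # 9 hypothetical clusters of 25 events each
    cx, cy = rng.uniform(0.1, 0.9), rng.uniform(0.1, 0.9)
    clustered += [(cx + rng.uniform(-0.03, 0.03), cy + rng.uniform(-0.03, 0.03))
                  for _ in range(25)]

for name, pts in [("clustered", clustered), ("CSR", csr), ("lattice", lattice)]:
    print(name, round(box_dimension(pts), 3))
```

With these choices the lattice should give the largest estimate and the clustered pattern the smallest, mirroring the ordering recalled from Sect. 3.1; the exact values depend on `ks` and on the simulated patterns.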
Spatial randomness test of COVID-19 cases in Cali, Colombia
Spatial statistics has emerged as a helpful tool in epidemiology for describing the spatial and spatio-temporal spread and incidence of different pathogens. It is now commonly used to study the spread of COVID-19, the disease caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) (Chhikara et al., 2020). Spatial statistics helps to understand how the COVID-19 outbreak is distributed in space (Ramírez-Aldana et al., 2020). Studying the spatial behaviour of COVID-19 spread at local and regional levels is essential for government and health authorities to formulate control and mitigation measures. For this reason, there has been a growing number of academic and scientific works on the spatial modelling of its spread patterns (Kang et al., 2020; Miller et al., 2020). In this section, we show how the methodology given in Sect. 3 can be used for this purpose.
In particular, we apply the proposed test to a dataset of COVID-19 cases recorded in March 2020 in the metropolitan area of Cali, a city in the southwest of Colombia (Fig. 12). The virus was confirmed to have reached Colombia in March 2020. Between March 2 and 31, 2020, there were 443 reported COVID-19 infections in Cali. As input for our analysis, we take 405 spatial coordinates corresponding to the residence locations of the infected people in this municipality. Duplicate coordinates were excluded, so the infections of several people at the same place are considered a single event. Figure 13 shows the spatial distribution of the events in this month. The southernmost part of the city is rural and unpopulated, so we restrict the analysis to the perimeter of the inhabited area.
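The two preprocessing steps described in this paragraph, keeping one event per repeated location and discarding events outside the urban perimeter, can be sketched as follows. This is an illustrative Python sketch, not the authors' code: it assumes projected planar coordinates given as (x, y) tuples and represents the perimeter as a simple polygon (a hypothetical list of vertices). The function names are hypothetical, and the even-odd ray-casting rule is one standard choice for the point-in-polygon test.

```python
def point_in_polygon(x, y, polygon):
    """Even-odd (ray casting) rule; polygon is a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # the edge straddles the horizontal line through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def prepare_events(coords, perimeter):
    """Keep one event per distinct location and only locations inside the perimeter."""
    seen = set()
    events = []
    for xy in coords:
        if xy in seen:  # several cases at the same place count as a single event
            continue
        seen.add(xy)
        if point_in_polygon(xy[0], xy[1], perimeter):
            events.append(xy)
    return events

# Usage with a hypothetical square perimeter and four raw coordinates.
perimeter = [(0, 0), (10, 0), (10, 10), (0, 10)]
raw = [(1, 1), (1, 1), (5, 5), (12, 3)]
print(prepare_events(raw, perimeter))  # [(1, 1), (5, 5)]
```

Points lying exactly on the perimeter may be classified either way by this rule.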
Fig. 12.
Geographical location of the study area. Cali is the capital of the Department of Valle del Cauca
Fig. 13.

COVID-19 infection sites (March 2020) in the urban area of Cali, Colombia. Sites outside the perimeter of the urban area (symbol +) are not considered in the analysis
Observing both the right panel of Fig. 12 and the point pattern in Fig. 13, we identify zones with a high case burden. The largest aggregation of cases is in the south of the city; however, other minor hot spots are located to the west, the east, and the north. A detailed description is given in Cuartas et al. (2020). Based on the coordinates of the spatial point pattern in Fig. 13, we estimated the functions G, F, and K (Fig. 14). The three plots agree and confirm the above: the pattern of COVID-19 cases in Cali during the first month of the pandemic was clustered. We also obtained the distribution under CSR of the statistic B defined in Sect. 3 (top left panel of Fig. 14), together with its value for the point pattern in Fig. 13 (dashed blue line in Fig. 14). The observed value of B lies in the right tail of this distribution. Consequently, the null hypothesis of CSR is rejected, which is the same conclusion reached by the classical graphical tests.
Fig. 14.
Tests based on the expected value of the box-counting dimension (top left) and the functions G (top right), F (bottom left), and K (bottom right)
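The Monte Carlo logic behind this test (simulate the statistic under CSR, locate the observed value in that distribution, and report a p value) can be sketched as below. The statistic B is defined in Sect. 3 and is not reproduced here, nor is Algorithm 2. As a hypothetical stand-in, this sketch uses the absolute deviation of the estimated box-counting dimension from its average over the observed and simulated patterns, so that both clustering and inhibition push the observed value into the right tail, and it computes the usual Monte Carlo rank p value. It also assumes a unit-square window; for the irregular Cali perimeter, the CSR events would have to be simulated inside that polygon instead. Function names are hypothetical.

```python
import math
import random

def box_dimension(points, ks=(2, 4, 8, 16, 32)):
    """Box-counting estimate: least-squares slope of log N(k) against log k (box side 1/k)."""
    xs, ys = [], []
    for k in ks:
        occupied = {(min(int(x * k), k - 1), min(int(y * k), k - 1)) for x, y in points}
        xs.append(math.log(k))
        ys.append(math.log(len(occupied)))
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    return sxy / sxx

def mc_csr_test(points, m=99, seed=2020):
    """Monte Carlo CSR test with a stand-in statistic T = |D - mean(D)| (not the paper's B).

    Returns T for the observed pattern and the p value (1 + #{T_sim >= T_obs}) / (m + 1).
    """
    rng = random.Random(seed)
    n = len(points)
    dims = [box_dimension(points)]
    for _ in range(m):
        # CSR conditioned on n events in the unit square
        dims.append(box_dimension([(rng.random(), rng.random()) for _ in range(n)]))
    # Centring on all m + 1 values keeps them exchangeable under CSR.
    centre = sum(dims) / len(dims)
    t = [abs(d - centre) for d in dims]
    p_value = (1 + sum(ti >= t[0] for ti in t[1:])) / (m + 1)
    return t[0], p_value

# Usage with a hypothetical clustered pattern of 225 events.
rng = random.Random(11)
clustered = []
for _ in range(9):
    cx, cy = rng.uniform(0.1, 0.9), rng.uniform(0.1, 0.9)
    clustered += [(cx + rng.uniform(-0.03, 0.03), cy + rng.uniform(-0.03, 0.03))
                  for _ in range(25)]
stat, p = mc_csr_test(clustered)
print(round(stat, 3), p)
```

With m = 99 simulations the smallest attainable p value is 0.01, so m bounds the resolution of the test.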
We have analysed only one dataset of COVID-19 cases, and the four strategies lead to the same conclusion. However, the method based on box-counting estimation has some implicit advantages. On the one hand, it provides a p value (see Algorithm 2 for its estimation), which allows a conclusive decision, whereas graphical tests are sometimes inconclusive. On the other hand, through B (or, equivalently, the estimated box-counting dimension) the point pattern under study is characterised by a single value. This opens the door to applying many traditional techniques (regression, ANOVA, longitudinal data analysis, time series, etc.) when a collection of point patterns must be analysed simultaneously (obtained, for example, in different periods or under various experimental conditions).
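To illustrate how a one-number summary per pattern feeds a standard technique, the hypothetical sketch below computes the box-counting estimate for each pattern in two groups and a one-way ANOVA F statistic on those estimates. The groups (CSR patterns versus clustered patterns), the unit-square window, and the grid resolutions are assumptions for illustration only; in practice one would also check the ANOVA assumptions and obtain a p value from the F distribution.

```python
import math
import random

def box_dimension(points, ks=(2, 4, 8, 16, 32)):
    """Box-counting estimate: least-squares slope of log N(k) against log k (box side 1/k)."""
    xs, ys = [], []
    for k in ks:
        occupied = {(min(int(x * k), k - 1), min(int(y * k), k - 1)) for x, y in points}
        xs.append(math.log(k))
        ys.append(math.log(len(occupied)))
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    return sxy / sxx

def one_way_anova_f(groups):
    """One-way ANOVA F statistic: between-group mean square over within-group mean square."""
    values = [v for g in groups for v in g]
    grand = sum(values) / len(values)
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (mu - grand) ** 2 for g, mu in zip(groups, means))
    ss_within = sum((v - mu) ** 2 for g, mu in zip(groups, means) for v in g)
    k, n = len(groups), len(values)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def clustered_pattern(rng, n_clusters=9, per_cluster=25, half_width=0.03):
    """Hypothetical clustered pattern: uniform offsets around uniform cluster centres."""
    pts = []
    for _ in range(n_clusters):
        cx, cy = rng.uniform(0.1, 0.9), rng.uniform(0.1, 0.9)
        pts += [(cx + rng.uniform(-half_width, half_width),
                 cy + rng.uniform(-half_width, half_width)) for _ in range(per_cluster)]
    return pts

rng = random.Random(5)
group_csr = [box_dimension([(rng.random(), rng.random()) for _ in range(225)]) for _ in range(10)]
group_clu = [box_dimension(clustered_pattern(rng)) for _ in range(10)]
print(round(one_way_anova_f([group_csr, group_clu]), 1))
```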
Conclusions and further research
We have proposed a test of the hypothesis of complete spatial randomness based on the fractal dimension and its estimation by the box-counting method. We also derived a graphical test. Using simulated point patterns under randomness, inhibition, and clustering, we found that both approaches perform well, and their results are concordant with those obtained from the classical graphical tests (G, F, and K functions). The graphical interpretation of the proposed test is similar to that of the F function. Because the tests are not based on distances, edge effects need not be considered. A simulation study was carried out to show the behaviour of the proposed test under the null hypothesis (randomness) and under the classical alternatives (inhibition and clustering), with satisfactory results. A detailed study of the power of the test was also conducted, which allows us to conclude that the test performs well under different levels of clustering. An advantage of the proposed methodology is that it computes a single statistic (B or, equivalently, the estimated box-counting dimension) that summarises the information of the point pattern in one value. This can be useful from many inferential perspectives, for example, for modelling spatio-temporal point patterns or for comparing groups of point patterns through ANOVA.
Acknowledgements
This work is part of the research project “Modelación Espacio-Temporal del Covid-19 en Colombia” financed by Dirección de Investigación of Universidad Nacional de Colombia. We thank the epidemiological surveillance group of the Secretary of Health of Cali for providing us with the analysed information.
Footnotes
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Contributor Information
Yolanda Caballero, Email: ybcaballero@unal.edu.co.
Ramón Giraldo, Email: rgiraldoh@unal.edu.co.
Jorge Mateu, Email: mateu@mat.uji.es.
References
- Addison P. Fractals and chaos: an illustrated course. London: CRC Press; 1997.
- Baddeley A, Gregori P, Mateu J, Stoica R, Stoyan D. Case studies in spatial point process modeling. Berlin: Springer; 2006.
- Baddeley A, Turner R, Mateu J, Bevan A. Hybrids of Gibbs point process models and their implementation. J. Stat. Softw. 2013;55(11):1–43. doi: 10.18637/jss.v055.i11.
- Baddeley A, Rubak E, Turner R. Spatial point patterns: methodology and applications with R. Boca Raton: Chapman and Hall/CRC; 2015.
- Banerjee S, Carlin B, Gelfand A. Hierarchical modeling and analysis for spatial data. Boca Raton: CRC Press; 2015.
- Bartlett M. The spectral analysis of two-dimensional point processes. Biometrika. 1964;51(3/4):299–311. doi: 10.2307/2334136.
- Bivand R, Pebesma E, Gomez-Rubio V. Applied spatial data analysis with R. Berlin: Springer; 2013.
- Bones C, Romani L, de Sousa E. Clustering multivariate data streams by correlating attributes using fractal dimension. J. Inf. Data Manag. 2016;7(3):249–249.
- Breslin M, Belward J. Fractal dimensions for rainfall time series. Math. Comput. Simul. 1999;48(4–6):437–446. doi: 10.1016/S0378-4754(99)00023-3.
- Chhikara B, Rathi B, Singh J, Poonam F. Corona virus SARS-CoV-2 disease COVID-19: infection, prevention and clinical advances of the prospective chemical drug therapeutics. Chem. Biol. Lett. 2020;7(1):63–72.
- Clark P, Evans F. Distance to nearest neighbor as a measure of spatial relationships in populations. Ecology. 1954;35(4):445–453. doi: 10.2307/1931034.
- Cressie N. Statistics for spatial data. Hoboken: Wiley; 1991.
- Cuartas, et al. SARS-CoV-2 spatio-temporal analysis in Cali, Colombia. Revista de Salud Pública. 2020;22(2):1–6.
- Daley D, Vere-Jones D. An introduction to the theory of point processes. Berlin: Springer; 2008.
- Debnath L. A brief historical introduction to fractals and fractal geometry. Int. J. Math. Educat. Sci. Technol. 2006;37(1):29–50. doi: 10.1080/00207390500186206.
- Diggle P. Statistical analysis of spatial point patterns. Cambridge: Academic Press; 1983.
- Diggle P. Statistical analysis of spatial point patterns. Edward Arnold; 2003.
- Diggle P. Statistical analysis of spatial and spatio-temporal point patterns. Boca Raton: CRC Press; 2013.
- Falconer K. Fractal geometry: mathematical foundations and applications. Hoboken: Wiley; 2004.
- Foroutan-pour K, Dutilleul P, Smith D. Advances in the implementation of the box-counting method of fractal dimension estimation. Appl. Math. Comput. 1999;105(2–3):195–210. doi: 10.1016/S0096-3003(98)10096-6.
- Gaetan C, Guyon X. Spatial statistics and modeling. Berlin: Springer; 2010.
- García L, Bravo L, Collazos P, Ramírez O, Carrascal E, Nuñez M, Portilla, Millan E. Métodos del Registro de Cáncer en Cali, Colombia. Revista Colombia Médica. 2018;49(1):109–120.
- Illian J, Penttinen A, Stoyan H, Stoyan D. Statistical analysis and modelling of spatial point patterns. Hoboken: Wiley; 2008.
- Jaquette J, Schweinhart B. Fractal dimension estimation with persistent homology: a comparative study. Commun. Nonlinear Sci. Numer. Simul. 2020;84:105163. doi: 10.1016/j.cnsns.2019.105163.
- Kang D, Choi H, Kim J, Choi J. Spatial epidemic dynamics of the COVID-19 outbreak in China. Int. J. Infect. Dis. 2020;94:96–102. doi: 10.1016/j.ijid.2020.03.076.
- Kenkel N. Sample size requirements for fractal dimension estimation. Commun. Ecol. 2013;14(2):144–152. doi: 10.1556/ComEc.14.2013.2.4.
- Kopytov V, Petrenko V, Tebueva F, Streblianskaia N. An improved Brown's method applying fractal dimension to forecast the load in a computing cluster for short time series. Indian J. Sci. Technol. 2016;9(19):93909.
- Liebovitch L, Toth T. A fast algorithm to determine fractal dimensions by box counting. Phys. Lett. A. 1989;141(8–9):386–390. doi: 10.1016/0375-9601(89)90854-2.
- Mandelbrot B. How long is the coast of Britain? Statistical self-similarity and fractional dimension. Science. 1967;156:636–644. doi: 10.1126/science.156.3775.636.
- Mandelbrot B. The fractal geometry of nature. New York: Freeman; 1982.
- Miller L, Bhattacharyya R, Miller A. Spatial analysis of global variability in Covid-19 burden. Risk Manag. Healthc. Policy. 2020;13:519–522. doi: 10.2147/RMHP.S255793.
- Mo D, Huang S. Fractal-based intrinsic dimension estimation and its application in dimensionality reduction. IEEE Trans. Knowl. Data Eng. 2010;24(1):59–71.
- Møller J, Waagepetersen R. Statistical inference and simulation for spatial point processes. London: Chapman and Hall/CRC; 2004.
- Mou D, Wang Z. Fractal dimension of well logging curves associated with the texture of volcanic rocks. In: 2014 International Conference on Mechatronics, Electronic, Industrial and Control Engineering (MEIC-14); 2014.
- Plant R. Spatial data analysis in ecology and agriculture using R. London: CRC Press; 2012.
- R Core Team. R: a language and environment for statistical computing. Vienna: R Foundation for Statistical Computing; 2020. https://www.R-project.org/
- Ramírez-Aldana R, Gomez-Verjan J, Bello-Chavolla O. Spatial analysis of COVID-19 spread in Iran: insights into geographical and structural transmission determinants at a province level. PLoS Neglect. Trop. Dis. 2020;14(1):e0008875. doi: 10.1371/journal.pntd.0008875.
- Ripley B. Modelling spatial patterns. J. R. Stat. Soc. Ser. B. 1977;39(2):172–192.
- Ripley B. Spatial statistics. Hoboken: Wiley; 1981.
- Salvadori G, Ratti S, Belli G. Modelling spatial patterns. Environ. Sci. Pollut. Res. 1997;4(2):91–98. doi: 10.1007/BF02986286.
- Schabenberger O, Gotway C. Statistical methods for spatial data analysis. London: Chapman and Hall/CRC; 2017.
- Sheather S. Density estimation. Stat. Sci. 2004;19(4):588–597.
- Tuia D, Kanevski M. Environmental monitoring network characterization and clustering. In: Advanced mapping of environmental data: geostatistics, machine learning and Bayesian maximum entropy. 2008. pp. 19–46.
- Vega C, Golay J, Kanevski M. Multifractal portrayal of the Swiss population. Cybergeo: Eur. J. Geogr. 2015. http://journal.openedition.org/cybergeo/26829
- Vidal E, Vieira S, Clerici I, Paz A. Fractal dimension and geostatistical parameters for soil microrelief as a function of cumulative precipitation. Scientia Agricola. 2010;67(1):78–83. doi: 10.1590/S0103-90162010000100011.
- Waagepetersen R. An estimating function approach to inference for inhomogeneous Neyman-Scott processes. Biometrics. 2007;63:252–258. doi: 10.1111/j.1541-0420.2006.00667.x.
- Wiegand T, Moloney K. Handbook of spatial point-pattern analysis in ecology. London: CRC Press; 2013.












