PLOS One. 2021 Feb 9;16(2):e0246796. doi: 10.1371/journal.pone.0246796

Spatial dependence in the rank-size distribution of cities – weak but not negligible

Rolf Bergs
Editor: Yannis Ioannides
PMCID: PMC7872244  PMID: 33561181

Abstract

Power law distributions characterise several natural and social phenomena. Zipf’s law for cities is one of those. The study examines the question of whether that global regularity is independent of different spatial distributions of cities. For that purpose, a typical Zipfian rank-size distribution of cities is generated with random numbers. This distribution is then cast into two different settings of spatial coordinates. For the estimation, the variables rank and size are supplemented by considerations of spatial dependence within a spatial econometric approach. Results suggest that distance potentially matters. This finding is further corroborated by four country analyses even though estimates reveal only modest effects.

1 Introduction

Zipf’s law of the rank-size distribution of cities is regarded as a fascinating and rare instance of social physics. Krugman [1] has even described this phenomenon as an embarrassment for economic theory (p. 42–46). At first glance, the relationship between size and rank appears tautological because size directly determines rank and vice versa. The independent variable is thus perhaps no true predictor but could just be part of a simple universal statistical phenomenon. A power law exponent typically close to -1 and a coefficient of determination (R2) close to one are an indication of that. Some authors, notably Gan et al. [2], have therefore emphatically questioned its relevance for economic analysis. Yet such a striking and ubiquitous regularity has motivated the exploration of the hidden explanatory factors behind it. Various authors, such as Gabaix, Fujita et al., Brakman et al., Reggiani and Nijkamp, and Ioannides, have done this [3–7]. In contrast to the Zipf distribution of the frequency of words in languages [8], the rank-size distribution of cities appears slightly more varied between countries, as shown e.g. by Rosen and Resnick [9], and less stable over time, as found e.g. by Brakman et al. [10], but there is a secular convergence that is explained by Gibrat’s law and its resulting steady state [3]. The fact that this happens in all countries, regardless of their economic structures and histories, still lacks a truly sufficient explanation. When comparing such power law distributions for different types of data, one criterion could perhaps add minor insight: a spatial versus a non-spatial context. Spatial dependence in terms of contiguity or distance between cities of different rank or size may affect Zipf’s law for cities; in contrast, space can never predict the rank distribution of words.

To have a closer look at that context, I first reflect on theoretical considerations of spatial dependence in Zipf’s law before simulating a typical rank-size distribution of cities for a varied distribution of spatial coordinates. The objective is to see how much influence spatial distance could have on the ranks of cities and thus on the shape of the distribution. Especially in countries with a geographical concentration of bigger cities there is reason to assume that these cities have evolved due to certain spatial advantages (e.g. raw materials, climate, accessibility or certain random determinants). Those city clusters are often characterised by specific industries of national importance. Whether and how dispersion and concentration forces determine the rank-size distribution of cities has been a widely researched subject in urban economics.

Surprisingly, there has been little research shedding light on spatial dependence associated with Zipf’s law. Lalanne examines the dichotomous urban structure of Canada [11]. She rejects the Zipf law and its underlying scale invariance and shows that the Canadian urban structure has evolved in a deterministic process based on urban size (inhabitants within administratively defined boundaries), previous growth and the spatial setting. Coefficients for the years 1971 to 2001 vary between -0.77 and -0.81. The spatial component is not part of the Pareto regression; instead, growth of cities is regressed on size and previous growth in standard spatial regression models (SEM and SAR). Le Gallo and Chasco explore Zipf’s law for Spain by applying a SUR model which they cast into spatial autoregressive and error specifications. Zipf’s law does not hold between 1900 and 2001. While the simple OLS estimate varies between -0.54 and -0.66, the extended spatial models deviate even further from Zipf’s law, thus revealing spatial impacts [12]. Cheng and Zhuang [13], who look at urban evolution in central China under consideration of Zipf’s law, show that the estimation of the Pareto coefficient has displayed an undulatory pattern between 1985 and 2009. They cannot confirm Zipf’s law at any point in time. The OLS estimates are then augmented by the use of spatial autoregressive or spatial error specifications. As in the study of Le Gallo and Chasco [12], spatial dependence increases the deviation from Zipf’s law. It is, however, to be stressed that in the three studies on Canada, Spain and China the size of cities is not defined by functionality but by the number of inhabitants within administratively defined boundaries of all cities, i.e. including the lower tail of the distribution. This is a reason why Zipf’s law often does not hold [5: 301–306].

In the study at hand I intend to show (i) whether and how spatial dependence of a Zipfian rank-size distribution varies among different geographical settings of cities and (ii) how these settings behave differently along the entire distribution of cities and specifically the upper Pareto tail. Evidence suggests that the rank-size distribution is not homogeneously following a Pareto shape, but rather a combination of an upper Pareto and a lower lognormal section. By using a switching model, Ioannides and Skouras [14] show for US cities with 2000 Census places data that there is a narrow transition corridor around slightly more than 60,000 inhabitants where the upper tail Pareto distribution merges with a lower tail lognormal distribution. They reject Eeckhout’s standpoint [15], that the entire rank-size distribution of cities is best described by a lognormal distribution. Evidence of the hybrid distribution was corroborated by several further studies, e.g. Malevergne et al. [16], so that this is explicitly considered in my paper.

In this paper I first explore an ideal type random-generated hybrid distribution with two different spatial settings. This should demonstrate that the chosen econometric methodology is powerful enough to detect distance effects if the patterns are sufficiently strong. The simulation exercise is then followed by four country studies covering the USA, the United Kingdom, Germany and Slovenia, the latter representing a former smaller province of Yugoslavia. For Slovenia, not only population was used as the size variable but also the detected extent of natural cities to better represent their true functional size. In essence, the paper is a theory-led artificial simulation of Zipf’s law enriched with real world studies.

2 Some theoretical considerations

The spatial relevance of the rank-size distribution of cities was already emphasised by Zipf himself in his widely noticed lemma [17]: If effort of interaction among the possible pairs of cities is optimum (with least effort for all individuals), the cities (settlements) of different size are ranked in a way so that the total population of a country Sc equals the sum of a harmonic series:

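$$S_c = \frac{S_p}{1^{\alpha}} + \frac{S_p}{2^{\alpha}} + \frac{S_p}{3^{\alpha}} + \dots + \frac{S_p}{r_n^{\alpha}} = \sum_{r=1}^{n} \frac{S_p}{r^{\alpha}}, \tag{1}$$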

where S_p is the population of the primate city, r is the rank of an individual city and α is the power exponent determining the shape; in the case of a perfect Zipfian relationship, the cumulative distribution function then follows

$$S = \frac{B}{R^{\alpha}} \tag{2}$$

or in its reversed Pareto form with R (rank) as the dependent variable:

$$R = \frac{C}{S^{\alpha}} \tag{3}$$

with α = 1 and B = C. This power law describes a scale-invariant pattern with very few large and very many small items, as is found in many natural systems. According to Zipf, the slope of that particular distribution requires the effort of interaction between the communities to be at a minimum when multiplied by the distances d between the communities. Zipf’s lemma describes a stylized equilibrium model in which there is a scattered distribution of settlements close to the raw materials (first economy) and one big city where all the raw materials are processed (second economy). Since living in either place will create opportunity costs for any dweller, both economies are in conflict over unification and diversification. The conflict between those forces plays a central role in the determination of the effort-minimizing number, location and sizes of settlements or, in other words, the built environment is created so that the costs of primary production, processing, and the transport of goods and factors between the two economies are minimized. An equilibrium is found when the magnitudes of the centrifugal and centripetal forces are equal. In this optimum case α = 1, and the equilibrium is then Pareto efficient. If one imagines a growing network of cities, it becomes obvious that the number of connections represents an economic value (utility), and this again is highest when the minimum possible effort is needed. As explained by Kak [18], the value of a potential network with n items (cities) then grows in proportion to n ∙ log(n). This explains a power law behavior as a precondition of least effort.
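As a minimal numerical sketch of Eqs (1) to (3), the city sizes S_p / r^α can be generated and the log-log slope of rank on size recovered. The primate city size of 1,000,000 and the number of 100 cities are hypothetical values chosen only for illustration, and the function names are not taken from the paper.

```python
import math

def zipf_sizes(primate_size, n, alpha=1.0):
    """City sizes S_p / r**alpha for ranks r = 1..n, the terms of Eq (1)."""
    return [primate_size / r ** alpha for r in range(1, n + 1)]

def loglog_slope(sizes):
    """OLS slope of ln(rank) on ln(size).

    For sizes generated by zipf_sizes the slope is -1/alpha,
    i.e. -1 in the Zipf case alpha = 1.
    """
    ordered = sorted(sizes, reverse=True)
    x = [math.log(s) for s in ordered]
    y = [math.log(r) for r in range(1, len(ordered) + 1)]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

sizes = zipf_sizes(1_000_000, 100)   # hypothetical country with 100 cities
total_population = sum(sizes)        # S_c in Eq (1)
slope = loglog_slope(sizes)
```

With α = 1 the total population is the primate city size times the harmonic number of n, which is why Eq (1) is referred to as a harmonic series.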

According to Zipf [17], this pattern only works in social systems that exactly produce what they consume and where all members of the population receive an equal share of the national income. This understanding very obviously assumes constant returns to scale in both economies. Consequently, if the system is not in an equilibrium (e.g. with the occurrence of increasing returns), diversifying (centrifugal) and unifying (centripetal) forces do not offset each other. In this case the slope of the power law changes. It is to be stressed that Zipf’s spatial equilibrium essentially depends on the existence of spatial heterogeneity. Perfect divisibility of space would rule out any equilibrium (Starrett’s spatial impossibility theorem) [5, 19]. Hence, (i) for cities to evolve anyway, indivisibility is needed and (ii) for cities to evolve efficiently with optimum allocation of resources needed for interaction, their rank-size distribution should converge to Zipf’s law. This deserves some closer examination since indivisibility of space unveils an important explanatory limitation of Zipf’s considerations of a spatial equilibrium.

In Zipf’s model, the difference between the first and second economy is solely explained by their respective functional roles. However, the evolution of the spatial economy, characterised by a dynamic rural-urban differentiation, essentially exhibits spatial factor and goods price differentials that originate from several interrelated determinants, such as increasing returns in manufacturing production [20], monopolistic competition and higher real urban income through the supply of a variety of substitutable goods and lower transport costs [21], stronger knowledge spillovers in agglomerated urban settings [22], trickle-down effects of individual specialised skills on the local qualification and productivity levels [23] or agglomeration fuelled by entrepreneurial uncertainty and risk [24], to mention some. In those settings consumers aim to maximise their utility not only by minimising the effort of transport but also by maximising their real income through an optimum choice of expenditure on food and on the variety of substitutable manufactured products. The “love for variety effect”, scale economies and less transportation effort are circularly caused. They are a bonus for larger markets [5] and constitute an urban amenity.

Interestingly, the major thread of the subsequent theoretical literature on Zipf’s law since the 1950s centered around statistical and largely non-economic explanations based on random growth of population [25]. Later the size- and variance-independent growth of cities was discussed to explain the inherent fractal dimension of Zipf’s law by Gibrat’s law. In these models the potential spatial dimension was largely ignored. Indeed, in Gabaix’s [3] model of zero normalized city growth, space and distance between cities do not appear to be meaningful factors under the strict assumption of Gibrat’s law. In this model, the economic foundation of Gibrat’s law is explained by scale-independent regional and policy shocks with the same variance for all cities in addition to specific shocks that affect particular industries, thus implying a decreasing variance with city size. However, for the upper tail of the city size distribution, industrial shocks may die out so that, according to Gabaix, variance rather depends on the policy and regional shocks.

While urban economics and regional science in the 20th century generated an abundance of theoretical models to explain agglomeration economies, most of them had pursued a partial focus. During the last twenty years, economic geography models became more consolidated as to nest different hypotheses that may constitute forces of agglomeration. An important seed of those efforts was the seminal work of Fujita, Krugman and Venables [4] which reveals the sensitivity of a spatial evolution path simulated by the relationship between the share of manufacturing employment and transport costs adjusted with few decisive parameter settings (substitution elasticity, iceberg losses). Those determine tipping points (bifurcations) causing either centrifugal or centripetal spatial evolution paths. This work also contributed to more insight into the economic determinants of the rank-size distribution of cities. Brakman et al. [5] develop a core economic geography model with monopolistic competition and further extensions to explore the behavior of the spatial economy under different parameter adjustments. By extending their model with congestion, as the major counterforce of agglomeration, they simulate Zipf’s law historically and show that there is an N-shaped pattern of the Zipf coefficient over time from the pre-industrial to the post-industrial era (for log of city size as the dependent variable and log of rank as the predictor). In this model, aimed to explain Zipf’s law, space is explicitly considered, but in terms of agglomeration, rather than inter-city distances.

Only recently, a growing record of research stressing the relevance of distance and accessibility in the evolution of urban space can be observed. Indeed, the functional differentiation between Zipf’s first and second (spatial) economy closely corresponds to the Central Place theory provided the existence of agglomeration economies is not ignored. Distance, or the effort to cover it, is then the major friction in city interaction. This makes spatial distance not only important for city interaction but eventually also relevant to the rank-size distribution of those cities.

In a highly comprehensive analysis to capture the determinants of Zipf’s law economically and spatially, Ioannides [7] demonstrates the limits of explanatory power of otherwise well-founded theories. This relates to independently and identically distributed (i.i.d.) growth rates of cities when using normalised city sizes and to an entirely lognormal city size distribution determined by Gibrat’s law [3, 15]. In a growing and increasingly urbanizing economy the assumption that city growth rates are essentially i.i.d. may not further hold. Especially the relationship between the variation of fixed costs and the number of production sites appears to be an important element to explain a power law distribution in the upper tail of the city sizes. The finding reveals important explanatory power of the Central Places theory. Firms with lower fixed costs are spatially more dispersed (i.e. in big and small cities), while those with high fixed costs locate close to those with lower fixed costs (usually in larger cities). This refers to the work of Hsu [26] who concludes that “… The power law for cities and firms and the NAS [“Number-Average-Size”] rule arise when the distribution of scale economies is regularly varying. In fact, this is the condition for ensuring that a central place hierarchy is a fractal structure…” (p. 923). The Central Place theory can be thus reconciled with the mainstream economic theories and a power law behavior of the upper tail of the city distribution: Larger cities are not only more diversified than small ones because many small and big industries are agglomerated there but because bigger cities specialize in industries with higher scale economies.

In a very recent analysis of this thread of studies, Mori et al. [27] compare real with random city systems at national level and within the hierarchy of a system with central places. They find strong evidence of a fractal dimension in the rank-size distribution of cities, but this is not governed by random growth of cities but rather by local city systems surrounding major cities. In another study, Jiang et al. [28] explores the system of cities from the viewpoint of the design of space and finds that cities are not isolated but coherent entities within a connected whole, whereas cities themselves comprise coherent hotspots. Also arguing with the Central Place theory, Jiang concludes that the order of the built environment corresponds to the order in nature and that scaling law and spatial dependence “… are fundamental not only to geographical phenomena, but also to any other living structure that recurs between the Planck length and the size of the universe itself. …” (p. 311).

In addition to economic geography models there are also further recent contributions aimed to explain a spatial Zipf law from the viewpoint of social physics. With a probability-based approach of urban evolution, Rybski et al. simulate formation and growth of cities under the assumption of Tobler’s law, namely that urban growth takes place close to other urban settlements [29]. In comparative simulations with different numbers of iterations and different strengths of distance decay γ they show that for sites in a grid with a central site already occupied (w = 1) any other site j in that grid (w = 0) will be occupied with a probability:

$$p_j = A\,\frac{\sum_{k \neq j} w_k\, d_{j,k}^{-\gamma}}{\sum_{k \neq j} d_{j,k}^{-\gamma}}, \tag{4}$$

where d_{j,k} is the Euclidean distance between locations j and k, and A is a normalization constant so that the maximum probability is 1. Hence, in this simulation, supported by a real world study on the urban development of Paris, evolution of new sites solely depends on distance to sites already existing. Despite the fact that the simulated urban evolution is not random but deterministic the authors confirm Zipf’s law and scale invariance of the clusters generated, except the primate one. Thus, Zipf’s law can be also reproduced by “spatial explicit preferential attachment”.
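The occupation probability of Eq (4) can be illustrated for a single step on a small grid. This is a minimal sketch rather than the simulation of Rybski et al.: the 5 x 5 grid, the value γ = 2.5 and the choice to normalise over the currently empty sites are assumptions made for the example.

```python
import math

def occupation_probabilities(occupied, gamma):
    """Eq (4) on a square grid; occupied[i][j] holds w (1 = occupied, 0 = empty).

    Returns a dict mapping each empty cell (i, j) to its probability.
    Assumption: A is chosen so that the largest probability among the
    empty cells equals 1.
    """
    n = len(occupied)
    cells = [(i, j) for i in range(n) for j in range(n)]
    raw = {}
    for j in cells:
        num = den = 0.0
        for k in cells:
            if k == j:
                continue
            weight = math.dist(j, k) ** (-gamma)   # distance decay d^(-gamma)
            num += occupied[k[0]][k[1]] * weight
            den += weight
        raw[j] = num / den
    empty = [c for c in cells if occupied[c[0]][c[1]] == 0]
    peak = max(raw[c] for c in empty)
    return {c: raw[c] / peak for c in empty}

grid = [[0] * 5 for _ in range(5)]
grid[2][2] = 1                                      # central site already occupied
probs = occupation_probabilities(grid, gamma=2.5)   # gamma is a hypothetical value
```

Cells adjacent to the occupied centre receive the highest probability and corner cells the lowest, which is the distance decay the text attributes to Tobler’s law.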

A well-known example of such peculiar spatial trajectories is the Ruhr area in Germany. Here the theory of Central Places seems to fail in explaining the fact that bigger cities (more than 100,000 inhabitants) are just medium centers or even cities with minor central relevance. Findings of Dobkins and Ioannides [30] for US cities also confirm that large cities tend to have large neighbors. This may suggest a possible inconsistency with the Central Place theory, not for the regional setting of cities (as new neighbors entering are still relatively small compared to the older ones) but perhaps from a national viewpoint with a larger variation of city sizes within the different centrality classes. Here it may happen that the size of cities in urban clusters no longer corresponds to centrality. But an opposite type of spatial setting is also imaginable, such as big urban areas surrounded by only particularly small municipalities with little centrality function, e.g. the Berlin urban zone.

Such peculiar agglomeration phenomena then essentially imply positive or negative spatial autocorrelation of city ranks or sizes and may have an influence on the Pareto coefficient at national scale, so that size and growth of such cities are not necessarily random but spatially autoregressive (or disturbed by spatially autocorrelated error) and thus partly depending on rank or size of neighbor settlements and their growth.

In the real world, Zipf’s law for cities is never absolutely perfect. Many empirical studies have shown this, as mentioned earlier. Reasons for that can stem from the regional political economy, notably a functionally inadequate administrative delineation of urban space or the politically emphasized weight of the primate city (in many cases the capital). A further reason can be the typical hybrid distribution form mentioned earlier (Pareto and lognormal) and whether the full or a curtailed range of city sizes is regarded; coefficients estimated can differ strongly. In addition to such data issues, increasing returns or congestion can have an influence on the rank-size distribution of cities. This does not at all mean that Zipf’s law fails in such cases. A spatial distance influence can improve the fit of the power law or it can move α further away from 1. But as long as evolution of big cities in urban clusters tends to exhibit spatial dependence in the rank-size distribution, its effects would be essentially concealed by a non-spatial regression. In conclusion, this suggests a spatial econometric approach when testing Zipf’s law.

3 Methodological approach

The central assumption is as follows: For the typical upper Pareto (3,1) tail the expected exponent α is approximately 1 (±0.1) as empirically confirmed ubiquitously. In its log-linear form the above described cumulative distribution function (3) to be estimated is:

$$\ln(R) = \ln(C) - \alpha \ln(S) + \varepsilon \tag{5}$$

where the residual error is assumed i.i.d. with ε ~ N(0, σ²). OLS or maximum likelihood are possible estimators of α. For the combined Pareto and lognormal tails the exponent usually does not fit Zipf’s law but only for the Pareto section. In addition to that, the theoretical considerations put forward earlier suggest that the rank-size distribution of cities is potentially affected by spatial distance in the sense of Tobler’s law. Such spatial forces are however not incorporated into Zipf’s law, so that in theory a Pareto exponent α ≈ 1 in one country may remain stable under consideration of spatial dependence of rank while, in another one, the incorporation of such interaction could perhaps lead to a minor or major change of α. For spatial dependence to be considered for the rank-size distribution of cities, some methodological considerations are relevant: Compared to gravity estimations that address the interaction of places (i.e. the number of combinations), the estimation of the rank-size distribution cannot include distance as one regular predictor. Either distance enters the model as a large matrix of single independent variables (one for every city combination) or one controls for spatial spillover or error of the residuals in regression analyses. It is to be stressed that a stand-alone construction of numerous independent distance variables would ignore the possible endogeneity of distance (spillover effects of the dependent variable or residual spatial autocorrelation). The underlying economic rationale is the utility of interaction with respect to spatial distance between cities of either similar or very different ranks. A more precise approach would thus be a spatial econometric procedure [31]:

$$\ln(R) = \rho W \ln(R) + \ln(C) - \alpha \ln(S) + \varepsilon \quad \text{(SAR)} \tag{6}$$

or

$$\begin{cases} \ln(R) = \ln(C) - \alpha \ln(S) + \nu \\ \nu = \lambda W \nu + \varepsilon \end{cases} \quad \text{(SEM)} \tag{7}$$

where W is an N × N row-standardised weight matrix (inverse distance) to capture a potential distance effect and C is a constant, while ρ (spatial spillover) and λ (spatial autocorrelation in the residuals) in addition to α (direct effects) are the coefficients estimated. The error term ν in the SEM case consists of spatial error and the residual ε.
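To make the weight matrix concrete, the following minimal sketch builds a row-standardised inverse-distance matrix W and the spatial lag W ln(R) that appears in Eq (6). The four coordinates are hypothetical, the function names are illustrative, and the code assumes that no two cities share the same location.

```python
import math

def inverse_distance_weights(coords):
    """Row-standardised inverse-distance matrix W with a zero diagonal."""
    w = []
    for i, a in enumerate(coords):
        row = [0.0 if i == j else 1.0 / math.dist(a, b)
               for j, b in enumerate(coords)]
        total = sum(row)
        w.append([v / total for v in row])   # each row sums to one
    return w

def spatial_lag(w, values):
    """Matrix-vector product W * values, e.g. W ln(R)."""
    return [sum(wij * v for wij, v in zip(row, values)) for row in w]

coords = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 3.0)]   # hypothetical cities
ln_rank = [math.log(r) for r in range(1, 5)]
W = inverse_distance_weights(coords)
lagged_ln_rank = spatial_lag(W, ln_rank)
```

Because of the row standardisation, each element of the lag is a distance-weighted average of ln(R) over all other cities, so nearby cities contribute more than distant ones.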

The right choice between both models can be determined by different tests, such as the z-score of Moran’s I of the residuals and (Robust) Lagrange multiplier (LM) tests. The different estimation types in Tables 1 and 2 correspond to the respective choice.
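The Moran’s I statistic mentioned above can be sketched for a vector of values such as OLS residuals. This is a simplified illustration that returns only the statistic itself, not the z-score or the LM tests reported in the tables; it assumes row-standardised inverse-distance weights, and the coordinates and values are hypothetical.

```python
import math

def morans_i(values, coords):
    """Moran's I with row-standardised inverse-distance weights.

    With row-standardised W the sum of all weights equals n, so the
    statistic reduces to e'We / e'e, where e are deviations from the mean.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    cross = 0.0
    for i in range(n):
        row = [0.0 if i == j else 1.0 / math.dist(coords[i], coords[j])
               for j in range(n)]
        total = sum(row)
        cross += dev[i] * sum(wij / total * dj for wij, dj in zip(row, dev))
    return cross / sum(d * d for d in dev)

line = [(float(i), 0.0) for i in range(10)]      # ten sites on a line
trend = [float(i) for i in range(10)]            # neighbours have similar values
alternating = [(-1.0) ** i for i in range(10)]   # neighbours have opposite values
i_trend = morans_i(trend, line)
i_alternating = morans_i(alternating, line)
```

A spatially ordered pattern yields a positive statistic and an alternating pattern a negative one, which mirrors the contrast between the ranked and the random arrangement of cities discussed in this paper.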

Table 1. Simulated spatial extension of Zipf’s law (full rank-size distribution).

Endogenous variable: ln(Rank)    I             II
ln(Size)                         -0.403        -0.322
(Standard error)                 (0.024)***    (0.028)***
Constant                         3.871         1.057
(Standard error)                 (0.037)***    (2.998)***
λ                                -0.321        0.969
(Standard error)                 (0.650)       (0.030)***
ρ                                -             -
(Standard error)                 -             -
Log Likelihood                   -78.581       -44.774
Wald test (λ and ρ = 0): χ2      0.244         1016.678
z-score: Moran’s I (resid.)      0.141         10.081
(p)                              (0.888)       (0.000)
Lagrange Multiplier (LM)         0.163         80.748
(p)                              (0.686)       (0.000)
Robust LM                        0.028         15.462
(p)                              (0.867)       (0.000)
Obs.                             109           109
OLS
ln(Size)                         -0.403        -0.403
(Standard error)                 (0.025)***    (0.025)***
Constant                         3.870         3.870
(Standard error)                 (0.049)***    (0.049)***
Adj. R2                          0.709         0.709

Note: Column I shows estimates for the arrangement of all city ranks with a normal distribution across space (see Fig 1) while column II displays the respective estimates for the geographically ranked arrangement of cities (see Fig 2). The choice between either SAR or SEM was determined by the z-score of Moran’s I of the residuals and the Lagrange multiplier test statistic.

Source: Own data simulations.

Table 2. Simulated spatial extension of Zipf’s law (rank-size distribution for the Pareto tail).

Endogenous variable: ln(Rank)    I             II
ln(Size)                         -0.995        -0.862
(Standard error)                 (0.018)***    (0.029)***
Constant                         5.086         3.846
(Standard error)                 (0.430)***    (0.208)***
λ                                -             -
(Standard error)                 -             -
ρ                                -0.056        0.262
(Standard error)                 (0.145)       (0.050)***
Log Likelihood                   39.618        50.343
Wald test (λ and ρ = 0): χ2      0.149         27.289
z-score: Moran’s I (resid.)      -             -
(p)                              -             -
Lagrange Multiplier (LM)         0.146         23.817
(p)                              (0.703)       (0.000)
Robust LM                        0.081         14.067
(p)                              (0.776)       (0.000)
Obs.                             50            50
OLS
ln(Size)                         -0.995        -0.995
(Standard error)                 (0.018)***    (0.018)***
Constant                         4.921         4.921
(Standard error)                 (0.039)***    (0.039)***
Adj. R2                          0.984         0.984

Note: Column I shows the estimates for arrangement of the Pareto tail of the city ranks with a normal distribution across space (see Fig 1) while column II displays the respective estimates for the geographically ranked arrangement of cities (see Fig 2). The choice between either SAR or SEM was determined by the z-score of Moran’s I of the residuals and the Lagrange multiplier test statistic.

Source: Own data simulations.

Spatial lags can also be expected for the independent variable. In an extended Spatial Durbin model, both the dependent and the independent variables appear simultaneously as lagged variables. As proposed by Halleck Vega and Elhorst, a simpler approach to consider the spatial lag of the predictor is offered by the SLX model [32]:

$$\ln(R) = \ln(C) - \alpha \ln(S) \pm \theta W \ln(S) + \varepsilon. \tag{8}$$

The coefficients α and θ can be estimated by OLS. This model is applied in addition to the SEM/SAR estimations to control for spatial dependence of the variable S.
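Since Eq (8) is linear in its parameters, it can be fitted by ordinary least squares on a constant, ln(S) and the spatial lag W ln(S). The sketch below uses only the standard library and solves the normal equations directly; the coordinates, the sizes and the synthetic dependent variable (constructed with ln C = 6, α = 1 and θ = 0.3 so that the fit can be checked) are hypothetical and serve only to show the mechanics.

```python
import math

def row_standardised_weights(coords):
    """Inverse-distance weights with zero diagonal; each row sums to one."""
    w = []
    for i, a in enumerate(coords):
        row = [0.0 if i == j else 1.0 / math.dist(a, b)
               for j, b in enumerate(coords)]
        total = sum(row)
        w.append([v / total for v in row])
    return w

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [a[i][:] + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def slx_fit(ln_rank, ln_size, w):
    """OLS on Eq (8); returns (ln C, alpha, theta), theta with a plus sign."""
    lag = [sum(wij * s for wij, s in zip(row, ln_size)) for row in w]
    cols = [[1.0] * len(ln_size), ln_size, lag]
    xtx = [[sum(p * q for p, q in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(p * q for p, q in zip(ci, ln_rank)) for ci in cols]
    intercept, b_size, b_lag = solve(xtx, xty)
    return intercept, -b_size, b_lag   # alpha enters Eq (8) with a minus sign

coords = [(0, 0), (1, 0), (3, 1), (4, 4), (0, 5), (6, 2), (2, 6), (7, 7)]
ln_size = [5.0, 4.2, 3.9, 3.1, 2.8, 2.0, 1.5, 1.1]
w = row_standardised_weights(coords)
lag = [sum(wij * s for wij, s in zip(row, ln_size)) for row in w]
# Synthetic dependent variable built from known coefficients (not real ranks).
ln_rank = [6.0 - 1.0 * s + 0.3 * l for s, l in zip(ln_size, lag)]
ln_c, alpha, theta = slx_fit(ln_rank, ln_size, w)
```

In applied work one would normally rely on an established regression or spatial econometrics package, which also reports standard errors; the explicit normal equations are used here only to keep the example self-contained.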

One major shortcoming of all such spatial econometric procedures needs to be stressed: Inverse distance never properly represents the effort needed to access a close or distant place. Natural transport infrastructure, topographical characteristics and the energy resources available also determine mutual accessibility and city interaction. Distance is thus only a proxy for effort and time needed, given it is understood in the same way as Zipf had defined the problem in his lemma. But this caveat applies to all such spatial econometric models as long as there are no differentiated data that can replace inverse distance in the spatial weight matrix. In the end, a comparative view over time (different years) based on a true effort-specific weight matrix could much better reveal the important dynamic of spatial dependence in specific geographical settings during phases of major structural change. But this would be a subject of future research.

4 Data

The two simulated “countries” each describe a distribution with 109 observations for cities, the upper 50 being Pareto (3,1) distributed and the lower 59 having a lognormal shape. Both distributions are consecutively random-generated and then matched into one data set. The first step is the generation of 100 observations for both distributions. The upper 50 observations of the Pareto set are then matched at the point with the next smaller observation in the lognormal set. The proportion of observations in both tails could be different, e.g. exhibiting a larger lognormal tail, but this would only affect the shape of the full distribution. The only purpose is to simulate one realistic rank-size distribution and to explore how sensitively it reacts to changing geographic coordinates. In the next step this hybrid rank-size distribution is combined with different distributions of coordinates X and Y, the first one being randomly generated to fit a normal distribution (Fig 1). Based on this configuration a spatial weight matrix is derived. With a normal distribution of both X and Y coordinates big and small cities are spread evenly.
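The matching of the two tails can be sketched as follows. Several details are assumptions rather than the paper’s settings: “Pareto (3,1)” is read here as scale 3 and shape 1, the lognormal parameters mu = 0 and sigma = 1 are arbitrary, and the seed is hypothetical, so the number of lower-tail observations will generally differ from the 59 reported above.

```python
import random

def hybrid_city_sizes(seed=0, n_draws=100, n_upper=50,
                      scale=3.0, shape=1.0, mu=0.0, sigma=1.0):
    """Upper Pareto tail joined to a lower lognormal tail, sorted by size.

    The n_upper largest Pareto draws are kept; every lognormal draw that is
    smaller than the smallest kept Pareto draw is appended below them.
    """
    rng = random.Random(seed)
    pareto = sorted((scale * rng.paretovariate(shape) for _ in range(n_draws)),
                    reverse=True)
    lognormal = sorted((rng.lognormvariate(mu, sigma) for _ in range(n_draws)),
                       reverse=True)
    upper = pareto[:n_upper]
    lower = [s for s in lognormal if s < upper[-1]]
    return upper + lower

sizes = hybrid_city_sizes(seed=42)        # index i corresponds to rank i + 1
ranks = list(range(1, len(sizes) + 1))
```

The returned list is already in descending order, so ranks follow directly from the position in the list.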

Fig 1. Normal spatial distribution of cities (Pareto and lognormal tails).


In the second variation (Fig 2) with the same rank distribution the normally distributed coordinates of X and Y are both ranked as well, so that all cities are geographically positioned on a diagonal line, ordered along rank, the biggest city in the outer North-East, the smallest one in the outer South-West (like a one-dimensional von Thünen assembly).

Fig 2. Spatial distribution of cities with ranked coordinates (Pareto and lognormal tails).


It is expected that in this case both rank and size exhibit spatial autocorrelation, even though such a setting is hardly encountered in the real world.

The only purpose of this extreme setting is to show the potential impact of distance depending on the distribution of coordinates. In other words, identical coefficients confirming Zipf’s law may have a different meaning for different countries.
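The two coordinate settings of Figs 1 and 2 can be sketched as below. Standard normal coordinates and the seed are assumptions; the paper does not report the mean or variance used. Cities are indexed by rank, so index 0 is the largest city.

```python
import random

def spatial_settings(n_cities, seed=0):
    """Return (random_setting, ranked_setting) as lists of (x, y) by rank.

    random_setting: normally distributed coordinates assigned in draw order,
    so location is unrelated to rank (Fig 1).
    ranked_setting: the same X and Y values, each sorted in descending order,
    so the largest city lies furthest North-East (Fig 2).
    """
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n_cities)]
    ys = [rng.gauss(0.0, 1.0) for _ in range(n_cities)]
    random_setting = list(zip(xs, ys))
    ranked_setting = list(zip(sorted(xs, reverse=True), sorted(ys, reverse=True)))
    return random_setting, ranked_setting

setting_1, setting_2 = spatial_settings(109, seed=1)
```

Sorting X and Y separately places the cities on a monotone path from North-East to South-West, so neighbours in space are also neighbours in rank, which is what produces the strong spatial autocorrelation in the second setting.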

The two simulated cases are then compared with respect to the stability of α and the significance of the spatial parameters ρ or λ respectively. I hypothesise that with a normally distributed arrangement of coordinates the spatial weight parameters are insignificant and meaningless, representing a Zipf distribution of the upper tail similar to that of words in a language. However, when modifying the coordinates and the spatial distribution of cities by building clusters of cities with a different level of size we may expect some stronger and significant distance impact. The interesting question is then how stable the original Zipf distribution of the upper tail remains.

Finally, and in addition to the simulation analysis, this artificial exercise needs to be examined in the real world. The two questions are: do we find countries with significant spatial dependence in the rank-size distribution of cities and, if yes, how strong could it be? For that purpose I examine the spatial distance influence on Zipf’s law with population data on US, German, British and Slovene urban areas respectively. For Slovenia, as a particular case of a young and small country, natural cities (extracted from nocturnal satellite imagery) are also explored in order to better capture the true functional urban space in that country. The respective image segmentation methodology is further explained in Bergs [33]. The source of data is the National Oceanic and Atmospheric Administration (NOAA) [34].

For the USA, the UK and Germany the database is truncated below 100,000 inhabitants. This is in line with Gabaix [3], Giesen and Südekum [35] and Brakman et al. [5]. Slovenia, being a former minor province of Yugoslavia, represents a lower scaling level with the primate city slightly more than twice as large as the above truncation point. Therefore cities larger than 10,000 inhabitants are covered. To demonstrate that all cities regarded are within the upper Pareto tail, a Shapiro-Wilk test was carried out for the log-transformed observations of population (or natural size).

As for the simulations, the SAR and SEM models were used. In order to control for possible spatial dependence of the predictor in the country studies, SLX was also tested as an alternative spatial model. In all country models, the rank R in the dependent variable was replaced by R − 1/2, giving ln(R − 1/2) (Gabaix-Ibragimov estimate), to avoid a potential bias of standard errors [36].
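
The non-spatial part of this specification can be sketched as follows. This is a minimal illustration and not the code used for the estimates reported here: the function name and the toy sample are hypothetical, and the standard error is the asymptotic approximation α·√(2/n) proposed in [36].

```python
import math

def rank_half_regression(sizes):
    """OLS of ln(rank - 1/2) on ln(size); returns (alpha, constant, se_alpha)."""
    ordered = sorted(sizes, reverse=True)            # rank 1 = largest city
    n = len(ordered)
    x = [math.log(s) for s in ordered]               # ln(size)
    y = [math.log(r - 0.5) for r in range(1, n + 1)] # ln(rank - 1/2)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    constant = mean_y - slope * mean_x
    alpha = -slope                                   # Pareto exponent estimate
    se_alpha = alpha * math.sqrt(2.0 / n)            # asymptotic s.e. from [36]
    return alpha, constant, se_alpha

# Toy sample (hypothetical): exact Zipf sizes S_r = 1,000,000 / r
sizes = [1_000_000 / r for r in range(1, 201)]
alpha, constant, se_alpha = rank_half_regression(sizes)
print(f"alpha = {alpha:.3f}, constant = {constant:.3f}, s.e. = {se_alpha:.3f}")
```

The spatial models add a spatially lagged term or a spatially correlated error to this regression; they require maximum likelihood estimation and are not reproduced in the sketch.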

5 Results

First I take a look at the different results of the simulation exercise. Table 1 shows the full rank-size distribution, while Table 2 displays the estimates only for the upper (Pareto) tail. Regarding case I, Zipf’s law is only confirmed for the curtailed distribution: α≈1 (Table 2). The lognormal extension with the smaller cities reduces the estimate of α to a large extent. As expected, the spatial coefficients λ of the full distribution and ρ of the Pareto tail are not significant. The normal distribution of coordinates of big and small cities leads to zero spatial dependence. With or without the spatial extension of the model the α coefficient remains the same. Zipf’s law is well confirmed for the Pareto tail.

Case II shows the same rank-size distribution but ordered geographically along the coordinates. Now, there is a significant spatial error influence confirmed by λ for the full distribution and a spillover effect ρ for the Pareto tail. The absolute value of α decreases substantially when incorporating spatial dependence. Zipf’s law can no longer be confirmed for the upper (Pareto) tail, even though the randomly generated distribution had been a Pareto (3,1) one. At first glance this finding may be puzzling, but it simply confirms the potential effect of a spatial arrangement with extremely enhanced spatial autocorrelation.

The simulated distributions of coordinates show that, in theory, spatial distance may have an impact on the coefficient of the rank-size distribution of cities. The stylised models above are, however, artificial and not likely to be encountered in the real world. Therefore data of four countries (three big ones, one small) are used to see how the spatial arrangement of cities may influence the estimate of the rank-size distribution in the real world. As expected, the results are less spectacular than for simulation II, but they still suggest that spatial dependence plays a role in the rank-size distribution of cities for some countries:

For all samples the Shapiro-Wilk test rejects the null hypothesis of a normal distribution of the log-transformed observations in the upper tail, so the distributions regarded represent the respective Pareto tail.

Estimates for the rank-size distribution of US urban zones are very much in line with Zipf’s law (α = 1.005). The spatial error effect is small but highly significant, and it slightly improves the estimate. Hence, there is a very minor distance effect. A similar result is obtained for Germany: in the distribution of urban zones a significant spatial error improves the Pareto coefficient from 0.930 to 0.948. I also compared this estimate with German cities proper. In this sample, a coefficient of α = 1.239 could not confirm Zipf’s law; however, even here a significant spatial lag effect moves the estimate slightly closer to Zipf (α = 1.227).

In contrast to the USA and Germany, distance effects are insignificant in the case of the United Kingdom, both for spatial error (not displayed) and for spatial lag.

The Slovene case is somewhat different. There is no distance effect on the rank-size distribution of municipalities larger than 10,000 inhabitants. However, when viewing natural cities extracted from night satellite images we find that spatial spillovers (ρ) are significant at the p<0.05 level; the α coefficient, however, changes from 0.983 (within the Zipf tolerance of α = 1±0.1) to 0.860. In this case, accounting for spatial dependence moves the estimate away from Zipf’s law.

To complete the econometric findings, a possible spatial lag of the predictor was also examined: the SLX model did not reveal significant spillover effects for any of the countries (Table 3). Hence, spatial dependence is only found for the dependent variable R.
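
For orientation, the SLX specification behind Table 3 can be written as below. This is a sketch in assumed notation: R_i and S_i denote the rank and size of city i, w_ij the elements of the spatial weight matrix, c the constant and ε_i the error term. Only θ, the coefficient of the spatially lagged predictor, is a symbol taken from the table itself.

```latex
\ln\!\left(R_i - \tfrac{1}{2}\right)
  = c + \beta \ln S_i + \theta \sum_{j \neq i} w_{ij} \ln S_j + \varepsilon_i ,
  \qquad \beta = -\alpha
```

An insignificant θ, as in Table 3, means that the sizes of neighbouring cities add no explanatory power once a city's own size is controlled for.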

Table 3. Detection of spatial dependence by SLX regressions for selected countries (rank-size distribution of upper tails).

Endogenous variable: ln(Rank-0.5) | USA (urb. zones) | Germany (urb. zones) | Germany (cities proper) | UK (urban zones) | Slovenia (municipalities)
ln(Size) | -1.007 | -0.946 | -1.239 | -1.055 | -0.851
(Standard error) | (0.004)*** | (0.020)*** | (0.140)*** | (0.013)*** | (0.073)***
Constant | 17.234 | 15.162 | 18.077 | 15.966 | 9.592
(Standard error) | (0.126)*** | (0.444)*** | (0.675)*** | (0.350)*** | (1.836)***
θ | 0.002 | -0.037 | 0.107 | 0.008 | 0.034
(Standard error) | (0.009) | (0.027) | (0.051) | (0.232) | (0.132)
Adj. R2 | 0.993 | 0.978 | 0.988 | 0.989 | 0.943
Breusch-Pagan test (p>χ2) | (0.000) | (0.000) | (0.000) | (0.000) | (0.678)
Obs. | 409 | 50 | 91 | 74 | 12

Note: The estimation of spatial dependence in the SLX model is limited to the spatial lag of the predictor variable. For Slovenia it was not possible to run this regression for the segmented VIIRS patches because coordinates are generated by image analysis (ImageJ). These have an equal area projection but are not transformable into kilometer distances by the Vincenty formula.

Data sources: See Table 4.

Now, an interesting question is where such spatial dependence effects of rank originate. For this purpose the local Moran’s I coefficients (LISA indicators) may offer insight [37]. The spatial weight matrices are those generated for the SEM/SAR regressions (data sources: see Table 4). In Fig 3 the resulting coefficients and the p-values for the USA, Germany, the UK and Slovenia are displayed and compared. The p-values are particularly important for the interpretation.
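
How a local Moran coefficient is obtained can be sketched as follows, using the definition in [37]. The sketch rests on several assumptions that are not taken from this paper: planar toy coordinates instead of kilometre distances, row-standardised inverse-distance weights, hypothetical helper names and a five-city toy sample. The p-values, which are usually derived from conditional permutations, are not computed.

```python
import math

def inverse_distance_weights(coords):
    """Row-standardised inverse-distance weight matrix with a zero diagonal."""
    n = len(coords)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                w[i][j] = 1.0 / math.dist(coords[i], coords[j])
        row_sum = sum(w[i])
        w[i] = [v / row_sum for v in w[i]]   # each row sums to one
    return w

def local_moran(values, w):
    """Local Moran's I_i = (z_i / m2) * sum_j w_ij * z_j."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    m2 = sum(v * v for v in z) / n           # second moment of the deviations
    return [z[i] / m2 * sum(w[i][j] * z[j] for j in range(n)) for i in range(n)]

# Toy data (hypothetical): five cities, variable of interest ln(rank - 1/2)
coords = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0), (6.0, 5.0)]
values = [math.log(r - 0.5) for r in (1, 2, 3, 4, 5)]

w = inverse_distance_weights(coords)
lisa = local_moran(values, w)
# With row-standardised weights the mean of the local coefficients
# equals the global Moran's I.
global_i = sum(lisa) / len(lisa)
print([round(v, 3) for v in lisa], round(global_i, 3))
```

A positive coefficient indicates that a city is surrounded by cities with similar values, a negative one that its neighbours differ markedly from it.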

Table 4. Detection of spatial dependence by SEM/SAR regressions for selected countries (rank-size distribution of upper tails).

Endogenous variable: ln(Rank-0.5) | USA (urban zones >100,000 inhabitants) | Germany (urban zones >100,000 inhabitants) | Germany (cities proper >100,000 inhabitants) | United Kingdom (urban zones >100,000 inhabitants) | Slovenia (municipalities >10,000 inhabitants) | Slovenia (segmented VIIRS patches; upper tail)
ln(Size) | -1.004 | -0.948 | -1.227 | -1.059 | -0.875 | -0.860
(Standard error) | (0.004)*** | (0.012)*** | (0.014)*** | (0.013)*** | (0.057)*** | (0.071)***
Constant | 17.171 | 14.735 | 18.738 | 15.801 | 10.275 | 5.765
(Standard error) | (0.048)*** | (0.148)*** | (0.286)*** | (0.307)*** | (0.483)*** | (0.227)***
λ | -0.781 | -3.520 | - | - | - | -
(Standard error) | (0.263)*** | (0.856)*** | - | - | - | -
ρ | - | - | -0.224 | 0.110 | -0.045 | -0.549
(Standard error) | - | - | (0.100)* | (0.101) | (0.237) | (0.240)*
Log Likelihood | 499.317 | 57.624 | 86.480 | 73.348 | 6.395 | 8.389
Wald test (λ and ρ = 0): χ2 | 8.412 | 16.880 | 4.981 | 1.177 | 0.036 | 5.255
z-score: Moran’s I (resid.) | -2.637 | -2.137 | - | - | - | -
(p) | (1.992) | (1.967) | - | - | - | -
Lagrange Multiplier (LM) | 7.357 | 3.872 | 4.099 | 1.438 | 0.048 | 3.465
(p) | (0.007) | (0.049) | (0.043) | (0.230) | (0.827) | (0.063)
Robust LM | 7.034 | 4.902 | 4.487 | 1.121 | 0.065 | 3.502
(p) | (0.008) | (0.027) | (0.034) | (0.290) | (0.798) | (0.061)
Obs. | 451 | 73 | 91 | 79 | 16 | 11
OLS: ln(Rank-0.5)
ln(Size) | -1.005 | -0.930 | -1.239 | -1.056 | -0.881 | -0.983
(Standard error) | (0.004)*** | (0.015)*** | (0.014)*** | (0.013)*** | (0.051)*** | (0.064)***
Constant | 17.178 | 14.503 | 18.213 | 16.087 | 10.271 | 5.786
(Standard error) | (0.049)*** | (0.186)*** | (0.169)*** | (0.160)*** | (0.517)*** | (0.309)***
Adj. R2 | 0.993 | 0.982 | 0.989 | 0.989 | 0.955 | 0.959
Shapiro-Wilk (p>z) | 0.000 | 0.000 | 0.000 | 0.000 | 0.001 | 0.000

Note: The choice between either SAR or SEM was determined by the z-score of Moran’s I of the residuals and the Lagrange multiplier test statistic.

Missing geographical coordinates are compiled from: https://worldpopulationreview.com/countries/cities and Wikipedia.

Fig 3. LISA coefficients (local spatial autocorrelation) for the biggest urban areas.


Strikingly, most of the 25 biggest urban zones in the USA exhibit significant spatial autocorrelation. Negative LISA coefficients prevail, but there is a remarkable spread, especially for the largest observations, e.g. a LISA coefficient of +3.9 for New York, which lies outside the range of the Y axis. For Germany, the nine biggest urban zones exhibit significant spatial autocorrelation, the strongest being the Ruhr area on rank 1 (for German cities proper, only five of the biggest). For the UK, a significant LISA coefficient is only found for the first three urban zones. For Slovenia, spatial autocorrelation can only be established for the capital city area (Ljubljana). Returning to the spatial econometric estimates, it should be remembered that the most significant distance impact is found for the US urban areas, followed by German urban areas. This seems to be reflected by the LISA coefficients.

To summarise, the estimations discussed above display partly significant though modest spatial dependence in the city rank-size distribution of a few selected countries. The existence of a significant type II error thus confirms the existence of spatial dependence. There may be stronger or weaker such disturbances in other countries that are not covered by this small sample. A large comparative study covering all countries, ideally over time and with more realistic weight matrices in the spatial econometric models (see earlier), could shed light on the global variation of spatial dependence in Zipf’s law for cities.

6 Conclusion and further interpretation

Compared to Zipf’s law for words in languages, the results suggest that in the case of cities their spatial arrangement matters: Zipf’s law for cities will behave like Zipf’s law for words only if small and big cities are normally distributed in space. This is shown by the two simulations. In the predominant theory, cities may change their size over time, but the slope of the rank-size distribution remains rather stable [3]. This has been explained by its scale invariance and by city growth being independent of city size (Gibrat’s law). Hence, the change of city ranks might be well explained by economic forces, but it is not directly visible in a changing slope of the rank-size distribution of cities. For this line of argument spatial impact has no particular relevance. But studies combining Zipf’s law with Central Place theory show that a spatial relationship between centers of different layers is also in line with scale invariance [26–28]. Zipf’s law can also be established in a model where distance exclusively governs the probability of city formation and growth [29]. There is thus reason to assume that dispersion and concentration forces determine the geographical distribution and centrality levels of cities, occasionally with more or less spatial dependence in their rank-size distribution. A spatial econometric approach can shed light on such residual spatial dependence. If Gan et al. [2] were right, and Zipf’s law represented nothing more than a pure statistical relationship, the extension of the model with spatial distance effects would not change α. Where such spatial impact is significant, whether strong or modest, Zipf’s law for cities is certainly more than a pure statistical phenomenon.

Supporting information

S1 File

(ZIP)

Data Availability

Data are available from Harvard Dataverse: https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/EK4CNU

Funding Statement

The author is affiliated as a partner with PRAC, a private institute organised as a partnership company. The research presented in this paper is a secondary outcome of a current research project funded by the European Commission’s “Horizon 2020” programme (grant No. 727988). The funder was not involved in the study design, data collection, decision to publish or preparation of the manuscript. The only criterion to be fulfilled by the author is the thematic relevance of this manuscript for the purpose of the above-mentioned research project. The funder provided support in the form of salaries for authors [RB], but did not have any additional role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript. The specific roles of these authors are articulated in the ‘author contributions’ section. My commercial affiliation (PRAC) did not play any role in this context. The paper was exclusively prepared and written by myself.

References

  • 1. Krugman P. Development, Geography and Economic Theory. MIT Press, Cambridge (Mass.) 1997.
  • 2. Gan L, Li D, Song S. Is the Zipf law spurious in explaining city-size distributions? Econ Lett. 2006; 92(2): p. 256–262.
  • 3. Gabaix X. Zipf’s Law for Cities: An Explanation. Q J Econ. 1999; 114(3): p. 739–767.
  • 4. Fujita M, Krugman P, Venables AJ. The Spatial Economy: Cities, Regions and International Trade. MIT Press, Cambridge (Mass.) 1999.
  • 5. Brakman S, Garretsen H, van Marrewijk C. The New Introduction to Geographical Economics. Cambridge University Press, Cambridge 2009.
  • 6. Reggiani A, Nijkamp P. Did Zipf anticipate Socio-economic Spatial Networks? Environ Plann B. 2015; 42(3): p. 468–489.
  • 7. Ioannides YM. From Neighborhood to Nations: The Economics of Social Interactions. Princeton University Press, Princeton (NJ) 2013.
  • 8. Zipf GK. Human Behavior and the Principle of Least Effort. Addison Wesley, New York 1949.
  • 9. Rosen KT, Resnick M. The Size and Distribution of Cities: An Examination of Pareto Law and Primacy. J Urban Econ. 1980; 8(2): p. 165–186.
  • 10. Brakman S, Garretsen H, van Marrewijk C. The return of Zipf: Towards a further understanding of the rank-size distribution. J Reg Sci. 1999; 39(1): p. 183–213.
  • 11. Lalanne A. Zipf’s Law and Canadian Urban Growth. Urb Stud. 2014; 51(8): p. 1725–1740.
  • 12. Le Gallo J, Chasco C. Spatial analysis of urban growth in Spain, 1900–2001. Empir Econ. 2008; 34: p. 59–80.
  • 13. Cheng K, Zhuang Y. Spatial Econometric Analysis of the Rank-size Rule for the Urban System: A Case of Prefectural-level Cities in China’s Middle Area. Sci Geogr Sinica. 2012; 32(8): p. 905–912.
  • 14. Ioannides Y, Skouras S. US city size distribution: Robustly Pareto, but only in the tail. J Urb Econ. 2013; 73: p. 18–29.
  • 15. Eeckhout J. Gibrat’s Law for (All) Cities. Am Econ Rev. 2004; 94(5): p. 1429–1451.
  • 16. Malevergne Y, Pisarenko V, Sornette D. Testing the Pareto against the lognormal distributions with the uniformly most powerful unbiased test applied to the distribution of cities. Phys Rev E. 2011; 83: 036111. doi: 10.1103/PhysRevE.83.036111.
  • 17. Zipf GK. The P1P2/D hypothesis: On the intercity movement of persons. Am Soc Rev. 1946; 11(6): p. 677–686.
  • 18. Kak S. Power series models of self-similarity in social networks. Inf Sci. 2017; 376: p. 31–38.
  • 19. Starrett D. Market allocations of location choice in a model with free mobility. J Econ Theory. 1978; 17(1): p. 21–37.
  • 20. Krugman P. Increasing returns and economic geography. J Pol Econ. 1991; 99(3): p. 483–499.
  • 21. Fujita M, Krugman P. A monopolistic competition model of urban systems and trade. In: Huriot JM, Thisse JF (eds): Economics of Cities: Theoretical Perspectives. Cambridge University Press, Cambridge 2000.
  • 22. Black D, Henderson V. A theory of urban growth. J Pol Econ. 1999; 107(2): p. 252–284.
  • 23. Eaton J, Eckstein Z. Cities and growth: Theory and evidence from France and Japan. Reg Sci Urb Econ. 1997; 27: p. 443–474.
  • 24. Strange W, Hejazi W, Tang J. The uncertain city: Competitive instability, skills, innovation and the strategy of agglomeration. J Urb Econ. 2006; 59: p. 331–351.
  • 25. Simon HA. On a class of skew distribution functions. Biometrika. 1955; 42(3/4): p. 425–440.
  • 26. Hsu WT. Central place theory and city size distribution. Econ J. 2011; 122(563): p. 903–932.
  • 27. Mori T, Smith TE, Hsu WT. Common power laws for cities and spatial fractal structures. Proc Natl Acad Sci U S A. 2020; 117(12): p. 6469–6475. doi: 10.1073/pnas.1913014117.
  • 28. Jiang B. A Topological Representation for Taking Cities as a Coherent Whole. Geogr Anal. 2017; 50(3): p. 298–313.
  • 29. Rybski D, García Cantú Ros A, Kropp JP. Distance-weighted city growth. Phys Rev E. 2013; 87: 042114. doi: 10.1103/PhysRevE.87.042114.
  • 30. Dobkins LH, Ioannides YM. Spatial interactions among U.S. cities: 1900–1990. Reg Sci Urb Econ. 2001; 31: p. 701–731.
  • 31. Harris R, Moffat J, Kravtsova V. In search of ‘W’. Spat Econ Anal. 2011; 6(3): p. 249–270.
  • 32. Halleck Vega S, Elhorst JP. The SLX model. J Reg Sci. 2015; 55(3): p. 339–363.
  • 33. Bergs R. The detection of natural cities in the Netherlands–Nocturnal satellite imagery and Zipf’s law. Rev Reg Res. 2018; 38(2): p. 111–140.
  • 34. National Oceanic and Atmospheric Administration. Version 1 VIIRS Day/Night Band Nighttime Lights. 2017. https://ngdc.noaa.gov/eog/viirs/download_dnb_composites.html.
  • 35. Giesen K, Südekum J. Zipf’s law for cities in the regions and the country. J Econ Geogr. 2011; 11(4): p. 667–686.
  • 36. Gabaix X, Ibragimov R. Rank-1/2: A simple way to improve the OLS estimation of tail exponents. J Bus Econ Stat. 2011; 29(1): p. 24–39.
  • 37. Anselin L. Local indicators of spatial association–LISA. Geogr Anal. 1995; 27: p. 93–115.

Decision Letter 0

Yannis Ioannides

4 Aug 2020

PONE-D-20-16834

Spatial dependence in the rank-size distribution of cities

PLOS ONE

Dear Dr. Bergs,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please make your best effort to address both referees' concerns who are top specialists on the topic. This is very important for the process, since one of them recommends rejection and the other major revision. If you cannot address their concerns adequately, please explain in a separate note why this is so. In particular, if you disagree with them, please explain at length.

Please submit your revised manuscript by Sep 18 2020 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

We look forward to receiving your revised manuscript.

Kind regards,

Yannis Ioannides

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2.Thank you for stating the following in the Financial Disclosure section:

[Funding was provided by the European Commission via its Horizon2020 research funding (https://cordis.europa.eu/project/id/727988/de). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.].   

We note that one or more of the authors are employed by a commercial company: PRAC

  1. Please provide an amended Funding Statement declaring this commercial affiliation, as well as a statement regarding the Role of Funders in your study. If the funding organization did not play a role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript and only provided financial support in the form of authors' salaries and/or research materials, please review your statements relating to the author contributions, and ensure you have specifically and accurately indicated the role(s) that these authors had in your study. You can update author roles in the Author Contributions section of the online submission form.

Please also include the following statement within your amended Funding Statement.

“The funder provided support in the form of salaries for authors [insert relevant initials], but did not have any additional role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript. The specific roles of these authors are articulated in the ‘author contributions’ section.”

If your commercial affiliation did play a role in your study, please state and explain this role within your updated Funding Statement.

2. Please also provide an updated Competing Interests Statement declaring this commercial affiliation along with any other relevant declarations relating to employment, consultancy, patents, products in development, or marketed products, etc.  

Within your Competing Interests Statement, please confirm that this commercial affiliation does not alter your adherence to all PLOS ONE policies on sharing data and materials by including the following statement: "This does not alter our adherence to  PLOS ONE policies on sharing data and materials.” (as detailed online in our guide for authors http://journals.plos.org/plosone/s/competing-interests) . If this adherence statement is not accurate and  there are restrictions on sharing of data and/or materials, please state these. Please note that we cannot proceed with consideration of your article until this information has been declared.

Please include both an updated Funding Statement and Competing Interests Statement in your cover letter. We will change the online submission form on your behalf.

Please know it is PLOS ONE policy for corresponding authors to declare, on behalf of all authors, all potential competing interests for the purposes of transparency. PLOS defines a competing interest as anything that interferes with, or could reasonably be perceived as interfering with, the full and objective presentation, peer review, editorial decision-making, or publication of research or non-research articles submitted to one of the journals. Competing interests can be financial or non-financial, professional, or personal. Competing interests can arise in relationship to an organization or another person. Please follow this link to our website for more details on competing interests: http://journals.plos.org/plosone/s/competing-interests


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Partly

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: No

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: See report.

Reviewer #2: This paper presents estimates from rank-size regressions that control for spatial effects. The main focus is on whether controlling for spatial effects influences estimates of power law exponents significantly. In the Netherlands (2011), Slovenia (2017) and Austria (2017) my reading of the results suggests the influence is weak and that the estimates of spatial dependence suggest moderate to weak dependence.

Regarding methodology:

1. We are left wondering why focus on Netherlands, Slovenia and Austria rather than more widely studied countries like the US or even better a very comprehensive list of countries, ideally at several points in time. The more countries the better in my opinion, but if a selection is made the basis of that selection needs to be explained. Any statistical methodology is undermined if it is applied to an arbitrary subset of the potential data.

2. I would like to see the results from simple rank-size regressions alongside the regressions that control for spatial dependence.

3. Would it be econometrically sensible to control for spatial dependence in Gabaix-Ibragimov regressions like those of Table 3? If you cannot answer this question, it may be interesting to nevertheless run these regressions subject to appropriate disclaimers in order to allow direct comparisons of estimates.

4. I would prefer to see the discussion surrounding simulated data significantly condensed as I am not sure it adds much.

5. It may be worth noting that combinations of lognormal-power laws similar to those used in the simulations have been studied by Ioannides & Skouras 2013.

Regarding data: No specific link to data is provided so I am not sure whether the source provided in the last sentence of section 2 is sufficient. I would prefer to see a link to the data actually used in the regressions (including to simulated data), not a link e.g. to a NOAA source from which the data were derived after extensive manipulation according to methods published elsewhere.

**********

6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

Attachment

Submitted filename: PLOS D-20-16834 (1).pdf

PLoS One. 2021 Feb 9;16(2):e0246796. doi: 10.1371/journal.pone.0246796.r002

Author response to Decision Letter 0


15 Sep 2020

Re.: Spatial dependence in the rank-size distribution of cities

Rebuttal letter #20-16834

Dear editor and reviewers,

Thank you very much for your highly valuable critique and suggestions. In the revised manuscript I tried my best to address all the points raised. Originally, my paper had been a secondary statistical outcome of our ongoing research project on rural-urban interaction in the EU. I submitted the paper in May 2020 to the preprint server arXiv. Shortly afterwards I was kindly invited by PLOS ONE to submit my/our research to a forthcoming special issue on “Complex city systems”. In short, this is the background of my original paper. Inspired by the points you raised, I have now completely revised the manuscript, also using larger datasets for a better empirical foundation. I thus hope that the effort has helped to meet the expectations of the reviewers and the editor.

Reviewer 1:

1. There is really no economic model here; the model is purely statistical. The gravity model is not derived in this context from primitives, and is rather hokey. This is, in a way, a step backward.

I fully agree with this view. It is not sufficient to determine a finding purely statistically when arguing from the viewpoint of economics. Zipf’s law is however extremely tricky when it comes to explaining it with economic theory (Krugman labeled the law as “spooky”). I have now tried to solve the issue of the insufficiently explained relationship with gravity in a new section by first going back to Zipf’s own explanation of the “principle of least effort” (lines 73-113) and then linking this very early insight (Zipf’s lemma) to the respective recent findings in the context of Central Place theory (with a reference to the suggested paper of Mori, Smith and Hsu and a second paper on that by Jiang). Both papers make clear that distance and spatial dependence matter in these Zipf-style regressions, so that they offer important justification for considering a spatial econometric approach (lines 119-138).

2.The pre-Gabaix and post-Gabaix periods are distinguished by the presence of economic models in the post-Gabaix work. The problem with these models is often that they are designed to generate Zipf-like laws, and nothing else.

This was a useful hint for me. My understanding is that Gabaix’s model of normalized zero growth of cities, determined by the underlying Gibrat’s law, is only part of the explanation of scale invariance in Zipf’s law. The papers by Mori et al. (2020) and Jiang (2020) show that the functional differentiation of centrality matters, also with explanatory power for the fractal dimension of the law. But there even seems to be no contradiction between size-independent (random) growth and a deterministic evolution of urban space. Therefore, I also added to my argumentation the important findings from the paper by Rybski et al. on distance-weighted city growth, which confirms scale invariance and Zipf’s law with "spatial explicit preferential attachment" (lines 139-159).

3.This is not the first paper to notice the transition between Pareto at the top of the distribution and lognormal at the bottom (lines 55-69); Ioannides and Skouras, JUE, 2013. Not even cited.

It would be a misunderstanding to interpret my formulations as a claim to be the first to notice this peculiarity. This is not at all the case. For my earlier paper (Rev Reg Res 2018 38(2)) I had explored the well-known controversy between Malevergne et al. and Eeckhout, so I simply carried this knowledge over from that earlier research. Reading the paper by Ioannides and Skouras (2013) has further enriched the argumentation for differentiating the distributional form of city rank-size. The JUE paper is now addressed and cited (lines 54-64).

4. There is a theory for why distance should matter in these Zipf-style regressions, called Central Place theory, which is almost as famous as Zipf’s law. It is the long-time pursuit of Wen-Tai Hsu. See for a recent example, Mori, Smith and Hsu, PNAS, 2020, 117(12) 6469-6475.

I fully agree. See my explanations on point 1 above.

5. Given the spatial dependence in rank R (the dependent variable) I wonder if there is also spatial dependence in population S, the independent variable.

For the four country studies I have now additionally tested spatial dependence of the predictor and the dependent variable in a Spatial Durbin model. This procedure did not work properly, as estimates are mostly outside their permitted intervals. A simpler approach to address spatial dependence of the predictor is the Spatial-lag-of-X (SLX) model, which directly shows spatial spillovers of the variable S by OLS estimation. In contrast to the SEM/SAR models, no significant spatial dependence was found for S. This finding is now documented in an additional table (lines 311-315) and two further inserted paragraphs (lines 211-217 and 346-348).
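As a rough illustration of the SLX idea described in this reply (not the Stata routine actually used for the paper), the following sketch regresses ln(rank) on ln(size) and on a spatially lagged ln(size) by OLS. The row-standardised inverse-distance weights, the function names and the tiny solver are assumptions made purely for illustration.

```python
import math

def inverse_distance_weights(coords):
    """Row-standardised inverse-distance matrix W with a zero diagonal (assumed form)."""
    n = len(coords)
    W = []
    for i in range(n):
        row = [0.0 if i == j else 1.0 / math.dist(coords[i], coords[j])
               for j in range(n)]
        total = sum(row)
        W.append([w / total for w in row])
    return W

def ols(X, y):
    """OLS coefficients from the normal equations X'X b = X'y (Gauss-Jordan)."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[p] * r[q] for r in X) for q in range(k)]
         + [sum(r[p] * yi for r, yi in zip(X, y))]
         for p in range(k)]
    for c in range(k):
        pivot_row = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[pivot_row] = A[pivot_row], A[c]
        pivot = A[c][c]
        A[c] = [v / pivot for v in A[c]]
        for r in range(k):
            if r != c:
                factor = A[r][c]
                A[r] = [vr - factor * vc for vr, vc in zip(A[r], A[c])]
    return [A[p][k] for p in range(k)]

def slx_fit(coords, ln_size, ln_rank):
    """Fit ln_rank = a + b*ln_size + theta*(W ln_size) + e and return [a, b, theta]."""
    W = inverse_distance_weights(coords)
    lagged = [sum(w * s for w, s in zip(row, ln_size)) for row in W]
    X = [[1.0, s, ws] for s, ws in zip(ln_size, lagged)]
    return ols(X, ln_rank)
```

In such a sketch, a coefficient `theta` close to zero would correspond to the absence of spatial spillovers of S reported above; significance testing is deliberately left out.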

6. The structure of spatial dependence in the regression should really be derived from the underlying economics

This critique applies to every spatial econometric modelling that defines W simply as a contiguity or inverse distance matrix. Therefore, on the one hand, inverse distance that I used in my Stata exercises is admittedly not more than a proxy. A more realistic matrix reflecting the effort (real costs) to cover distance, given the heterogeneity of space, such as local topography or energy resources available, would be in fact a superior solution. However, the bottleneck for that is missing data. I have therefore added a clarifying paragraph stressing this general weakness (lines 218-227). On the other hand, a spatial econometric approach would shed light on the endogeneity of spatial distance in the Zipf regressions. This is a specific advantage, because here peculiar forms of deterministic spatial evolution (e.g. the German Ruhr area) suggest spatial dependence in the Zipf regressions already by viewing a map (clusters of big cities with often minor centrality relevance). In those cases, spatial dependence does not necessarily conceal a distribution that is not in line with Zipf (i.e. a false positive signal). On the contrary, as estimates for the US and Germany show, spatial dependence can also lead to estimates closer to Zipf’s law. Therefore, I additionally had a look at the city distribution in the upper tails (25 biggest cities for the USA, Germany and the UK, the 16 biggest for Slovenia) in terms of local spatial autocorrelation (lines 355-362). For the US and Germany there are many significant local Moran-I (LISA) coefficients, while this is not the case for the UK and Slovenia. Results of the local Moran I analysis thus correspond to the SEM/SAR estimates. A further figure on the LISA analysis was added to the paper (lines 353-354).
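To make the notions of an inverse-distance matrix W and local Moran's I (LISA) more concrete, here is a minimal sketch. It assumes row-standardised inverse-distance weights and planar coordinates; the function names are hypothetical, and no significance test (such as the permutation inference usually reported with LISA) is included.

```python
import math

def row_standardised_inverse_distance(coords):
    """Inverse-distance weights, zero diagonal, each row scaled to sum to one."""
    n = len(coords)
    W = []
    for i in range(n):
        row = [0.0 if i == j else 1.0 / math.dist(coords[i], coords[j])
               for j in range(n)]
        total = sum(row)
        W.append([w / total for w in row])
    return W

def local_morans_i(coords, values):
    """Local Moran's I_i = (z_i / m2) * sum_j w_ij * z_j, with m2 = mean of z^2."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    m2 = sum(zi * zi for zi in z) / n
    W = row_standardised_inverse_distance(coords)
    return [z[i] / m2 * sum(W[i][j] * z[j] for j in range(n)) for i in range(n)]

def global_morans_i(coords, values):
    """With row-standardised weights, global I equals the mean of the local I_i."""
    local = local_morans_i(coords, values)
    return sum(local) / len(local)
```

Cities placed along a line in order of size give a positive global I under this definition, whereas strictly alternating large and small neighbours give a negative one.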

Reviewer 2:

1. We are left wondering why focus on Netherlands, Slovenia and Austria rather than more widely studied countries like the US or even better a very comprehensive list of countries, ideally at several points in time. The more countries the better in my opinion, but if a selection is made the basis of that selection needs to be explained. Any statistical methodology is undermined if it is applied to an arbitrary subset of the potential data.

I fully understand and agree to this concern. In fact, the research is funded by the European Commission in the context of a project dealing with rural-urban interaction covered by some EU country studies including Austria, Slovenia and the Netherlands where we had tested spatial size (segmented by cluster analysis of night satellite images) with Zipf’s law. I just used those data for the first manuscript. In addition to that, the EU funding of an open access paper should also be justified by the respective project context, e.g. the countries covered by that project. Hence, the selection had not been arbitrary. But still, the reviewer’s argument is admittedly too strong. Therefore, an alternative justification could be derived from the theoretical context elaborated, and hopefully this is convincing. I therefore explored databases on urban areas for three bigger countries (USA, UK and Germany) and left only Slovenia as a particular case of a young EU country (former province of Yugoslavia) for comparison purposes (Results for the Netherlands and Austria are now removed). It was, however, not possible for the three bigger countries to extract and transform the VIIRS night-light data in order to classify natural urban space. There are millions of observations in the highly resolved digital images for which outliers in such skewed pixel distributions need first to be removed by a quite demanding procedure. The computing capacity in our small institute is not sufficient for procedures with such extremely large datasets. Therefore, for the Zipf style regressions on the US, the UK and Germany I used population data as usual. Here I deemed it useful not to take cities proper but (functional) urban areas, because evidence has shown that urban areas much better represent a Zipfian distribution. A further comparative analysis over time was not possible given the limited time for the revision. 
But the original purpose of the paper was to show that there can be spatial dependence in Zipf’s law for cities rather than exploring change of spatial dependence over time in Zipf’s law. Nevertheless, such an extended comparative or panel approach combined with a more realistic spatial weight matrix (cf. Point 6 of reviewer 1) is now discussed as an interesting open question for future research on this thread of regional science (lines 224-227).

2. I would like to see the results from simple rank-size regressions alongside the regressions that control for spatial dependence.

These simple rank-size regressions had been (and still are) displayed in the tables alongside the spatial regressions, including their interpretation. They might have been overlooked when reading the manuscript, or I may have misunderstood the point raised. Nevertheless, I fully agree that without the simple rank-size regressions the respective spatial regressions would hardly be meaningful (see Tables 1, 2 and 3: rows "OLS ln(Size)" and the respective remarks in the section "Results").

3. Would it be econometrically sensible to control for spatial dependence in Gabaix-Ibragimov regressions like those of table 3? If you cannot answer this question, it may be interesting to nevertheless run these regressions subject to appropriate disclaimers in order to allow direct comparisons of estimates.

Yes, my earlier idea had been anyway to generally apply the Gabaix-Ibragimov approach for the country estimates, i.e. also for the ML estimator that is needed in the SEM/SAR regressions. This approach had been also pursued by le Gallo & Chasco (2008) in one of the very few spatial econometric studies on Zipf’s law. Estimates differ only minimally from the simple regression model. Now, all the country estimates displayed are Gabaix-Ibragimov ones (lines 271-272).
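For readers unfamiliar with the Gabaix-Ibragimov correction, the following sketch shows the plain (non-spatial) version: ln(rank − 1/2) is regressed on ln(size), and the standard error is approximated asymptotically by the exponent times sqrt(2/n). The function name is hypothetical, and the sketch does not reproduce the ML-based SEM/SAR estimation used for the country tables.

```python
import math

def gabaix_ibragimov(sizes):
    """Regress ln(rank - 1/2) on ln(size); return (exponent, approximate std. error)."""
    ordered = sorted(sizes, reverse=True)          # rank 1 = largest city
    n = len(ordered)
    x = [math.log(s) for s in ordered]
    y = [math.log(rank - 0.5) for rank in range(1, n + 1)]
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
             / sum((xi - x_bar) ** 2 for xi in x))
    exponent = -slope                              # Zipf's law corresponds to about 1
    std_error = exponent * math.sqrt(2.0 / n)      # asymptotic approximation
    return exponent, std_error
```

The shift of 1/2 is intended to reduce the small-sample bias of the ordinary rank-size regression, which is why the estimates differ only slightly from the simple model.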

4. I would prefer to see the discussion surrounding simulated data significantly condensed as I am not sure it adds much.

I have deleted the simulations III and IV and just concentrated on the two potential extreme cases of normally distributed coordinates versus spatially ranked ones. This simulation should reveal the potential of spatial dependence in such Zipf regressions.

5. It may be worth noting that combinations of lognormal-power laws similar to those studied in the simulations have been studied by Ioannides & Skouras 2013.

As mentioned in the answer on point 3 of reviewer 1, the suggested JUE paper is now addressed and cited.

6. No specific link to data is provided so I am not sure whether the source provided in the last sentence of section 2 is sufficient. I would prefer to see a link to the data actually used in the regressions (including to simulated data), not a link e.g. to a NOAA source from which the data were derived after extensive manipulation according to methods published elsewhere

This is absolutely right. However, the guidelines say that „PLOS journals require authors to make all data necessary to replicate their study’s findings publicly available without restriction at the time of publication.“ During the online submission procedure I was therefore uncertain whether to attach the data files together with the manuscript or only once such a paper is accepted for publication. Shortly afterwards I contacted the editorial staff of PlosOne asking whether I can still upload the data for the peer review process. I was told that it will be fine to provide the full datasets once the manuscript has been accepted, so there was no need for me to do anything by then.

Reviewer 2 also remarked that estimates on Austrian, Slovene and Dutch natural urban space suggest minor to moderate spatial dependence. This is true, and I also believe that there is hardly any country where spatial dependence will change the estimate of the rank-size distribution to a large extent. The purpose of my paper was to show that there can be minor but significant spatial dependence; this is also confirmed by the estimates for the USA and Germany. It was less my intention to uncover major disturbance induced by spatial dependence. But with statistically significant spatial dependence Zipf’s law would not be a purely tautological phenomenon of a pre-determined spurious correlation as put forward by some authors.

Attachment

Submitted filename: Response to reviewers_16834.docx

Decision Letter 1

Yannis Ioannides

24 Dec 2020

PONE-D-20-16834R1

Spatial dependence in the rank-size distribution of cities

PLOS ONE

Dear Dr. Bergs,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Feb 07 2021 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

We look forward to receiving your revised manuscript.

Kind regards,

Yannis Ioannides

Academic Editor

PLOS ONE

Additional Editor Comments (if provided):

Dear author:

I have immense respect for both referees, and wish to encourage you to revise according to Reviewer 2, who submitted a detailed report.

I also want you to heed the comments of Reviewer 1, who now is very encouraging in his direct communication with me. I agree with him that section 2 needs more work, so that the paper is more appealing to economists who read it. And, most certainly, this is a worthy goal. Reviewer 1 writes, inter alia:

"I think that the author has done as good a job as it is possible to do to address my comments regarding the empirical part of the paper. But I find the theory (Section 2) to be very annoying. It is not theory in a sense that a decent economist would recognize, as it is in the tradition of econophysics rather than mainstream economics. The econophysics models tend to be fairly mechanical models (including stochastic elements) rather than using an equilibrium based on individual optimization. A hint is that prices are nowhere to be found in this paper."

There is a well-developed theory with behavioral foundations, including notably Gabaix's QJE paper, which you cite, and material in Ch. 8 of Yannis M. Ioannides, From Neighborhoods to Nations, Princeton University Press, 2013. I think heeding Reviewer 2's critique will improve the paper enormously.

All the best!

Looking forward to reviewing an updated version, which I very much hope that you will undertake.

Yannis M. Ioannides

Academic Editor

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed

Reviewer #2: (No Response)

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Partly

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: (No Response)

Reviewer #2: This revision has addressed all the concerns I raised in my first report and the author has clearly made a serious effort at improving the quality of his paper.

However, the revision has also revealed some new problems which I summarize below:

1. My reading of the new empirical results is that the author is able to detect only a very weak impact of distance between cities on power exponent estimates in the four countries he examines. I think the author should say this more clearly (instead in the abstract, he states his finding as "distance matters"). The author should also avoid conflating distance with "spatial effects" - there may well be other spatial effects he has not tested for. There may also be distance effects in other countries or data, so the author should be more explicit that he is conducting an analysis with limited power and subject to significant type 2 error when interpreted as a test of "spatial dependence in the rank-size distribution of cities" (perhaps the title itself should be modified to reflect the more modest nature of the analysis). Summarizing, there is a little too much overselling for my taste, but this may be a style issue.

2. The results of the second simulated data set in Table 2.II are puzzling. They suggest that even when the data really is generated to satisfy Zipf's law, the econometric approach used reveals a significant deviation from Zipf's law. This suggests a problem with the econometric method or its application or the data. Maybe I am missing something, but if so the author needs to explain.

3. I am not fully comfortable with the author's description of the data used in the simulation. The author says he draws the "upper" 50 cities from a Pareto and the "lower" 59 from a lognormal. If the draws are really random, the largest lognormal draw could be larger than the smallest Pareto draw, but the phrasing suggests this cannot happen, or at least did not happen in the two draws the author used. The author should explain this more clearly and make sure he isn't choosing a sample with the properties he wants. While I don't expect the author to do this at this stage, the proper way to simulate this data would be from a single distribution which had both a lognormal and a pareto component.

4. It should be made clearer that the value of the simulations is to demonstrate that the chosen econometric methodology is powerful enough to detect distance effects if the patterns are sufficiently strong. In my view the simulations are purely a prelude to motivate the empirical analysis.

5. I would like to see more detailed table legends, so that tables can be interpreted without having to refer to the text. We have to guess what Columns I and II mean. Please explain each item in the table in detail in the legend - a little spoon feeding for the reader can only help.

**********

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

Attachment

Submitted filename: PONE-D-20-16834 (2).pdf

PLoS One. 2021 Feb 9;16(2):e0246796. doi: 10.1371/journal.pone.0246796.r004

Author response to Decision Letter 1


22 Jan 2021

Re.: Spatial dependence in the rank-size distribution of cities

Rebuttal letter #20-16834

Dear editor and reviewers,

Thank you again for your further effort in reviewing the revised version of my paper. The review reports are very encouraging. All suggestions are highly valuable and could be addressed in the revised manuscript. I hope this revised manuscript meets your expectations and look forward to your response. Below you will find my answers to each comment.

Reviewer #1 (including the suggestion of the editor):

" I think that the author has done as good a job as it is possible to do to address my comments regarding the empirical part of the paper. But I find the theory (Section 2) to be very annoying. It is not theory in a sense that a decent economist would recognize, as it is in the tradition of econophysics rather than mainstream economics. The econophysics models tend to be fairly mechanical models (including stochastic elements) rather than using an equilibrium based on individual optimization. A hint is that prices are nowhere to be found in this paper."

There is a well-developed theory with behavioral foundations, including notably Gabaix's QJE paper, which you cite, and material in Ch. 8 of Yannis M. Ioannides, From Neighborhoods to Nations, Princeton University Press, 2013. I think heeding Reviewer 2's critique will improve the paper enormously

Reply: I have now substantially revised and complemented section 2. The explanation of Zipf’s law based on early considerations by Zipf himself is now complemented by considerations of the economics of agglomeration. A useful starting point seemed to me to be the indivisibility of space (Starrett’s spatial impossibility theorem) as a precondition for an equilibrium that cannot be explained by Zipf’s lemma. Here I found it worthwhile to draw on the evolution of urban economic modeling from the 1970s to the new geographical economics since around 2000, in order to grasp the increasingly "spatial" understanding of Zipf’s law from the economics viewpoint (in particular the more recent contributions of Central Place theory). In addition to the paper by Gabaix (1999b), I now discuss the relevant findings of Fujita et al. (1999), Brakman et al. (2009) and those in more recent books and papers stressing the importance of Central Place theory for the explanation of Zipf’s law. Here I drew on the book by the editor (Ioannides 2013), Hsu (2013), Mori et al. (2020) and Jiang (2017) [LINES 118-130; 142-182]. I also found a useful argument for taking the path of spatial econometrics in Dobkins and Ioannides (2001) on spatial interactions among US cities [LINES 210-215].

Reviewer #2: This revision has addressed all the concerns I raised in my first report and the author has clearly made a serious effort at improving the quality of his paper.

However, the revision has also revealed some new problems which I summarize below:

1. My reading of the new empirical results is that the author is able to detect only a very weak impact of distance between cities on power exponent estimates in the four countries he examines. I think the author should say this more clearly (instead in the abstract, he states his finding as "distance matters"). The author should also avoid conflating distance with "spatial effects" - there may well be other spatial effects he has not tested for. There may also be distance effects in other countries or data, so the author should be more explicit that he is conducting an analysis with limited power and subject to significant type 2 error when interpreted as a test of "spatial dependence in the rank-size distribution of cities" (perhaps the title itself should be modified to reflect the more modest nature of the analysis). Summarizing, there is a little too much overselling for my taste, but this may be a style issue.

Reply: I agree with this comment. My original research interest was led by the idea that spatial dependence could potentially matter, and here the emphasis still had been on the simulations which reveal major potential of such disturbances. Now the emphasis is on the country studies where spatial autocorrelation detected is weak but still partly significant. Hence I changed the title of the paper and the wording in a number of paragraphs in particular to avoid suggesting something like globally valid relationships [Abstract; LINES 320-322; 444-449; 468-469]. I also revised the formulations conflating spatial effects with distance effects [Various revisions of wording in the text; cf. track-change file].

2. The results of the second simulated data set in Table 2.II are puzzling. They suggest that even when the data really is generated to satisfy Zipf's law, the econometric approach used reveals a significant deviation from Zipf's law. This suggests a problem with the econometric method or its application or the data. Maybe I am missing something, but if so the author needs to explain.

Reply: With this simulation I deliberately aimed to generate the most powerful possible disturbance of the Zipf coefficient by maximising spatial autocorrelation. In the former manuscript this was commented on only in half a sentence. In the new version I have added a further explanation, which is in fact central because it shows the theoretical potential of such effects [LINES 396-398]. Incidentally, a repeated estimation to rule out error produced the same results.

3. I am not fully comfortable with the author's description of the data used in the simulation. The author says he draws the "upper" 50 cities from a Pareto and the "lower" 59 from a lognormal. If the draws are really random, the largest lognormal draw could be larger than the smallest Pareto draw, but the phrasing suggests this cannot happen, or at least did not happen in the two draws the author used. The author should explain this more clearly and make sure he isn't choosing a sample with the properties he wants. While I don't expect the author to do this at this stage, the proper way to simulate this data would be from a single distribution which had both a lognormal and a pareto component.

Reply: I have now explained how this hybrid distribution was generated. My original idea had been to generate just one typical rank-size distribution of cities and as long as this can be sufficiently rigged by distorting the location of cities via the coordinates the exercise would prove the potential existence of spatial dependence in Zipf’s law. It is right that the lower (lognormal) tail could be also much larger. This would have an influence on the estimation of the entire distribution, however not when looking at the upper Pareto tail. This is now explained in section 4 [LINES 289-296].
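The reviewer's concern about overlapping draws can be illustrated with a small sketch. It draws 50 sizes from a Pareto distribution and 59 from a lognormal, counts how many lognormal draws exceed the smallest Pareto draw, and assigns ranks only after pooling. All parameter values (`alpha`, `x_min`, `mu`, `sigma`, `seed`) are hypothetical and are not the ones used in the paper.

```python
import random

def simulate_hybrid(n_upper=50, n_lower=59, alpha=1.0, x_min=100_000.0,
                    mu=10.0, sigma=1.0, seed=0):
    """Draw a Pareto upper tail and a lognormal body; all parameters are assumed."""
    rng = random.Random(seed)
    # paretovariate returns values >= 1, so the upper draws are >= x_min
    upper = [x_min * rng.paretovariate(alpha) for _ in range(n_upper)]
    lower = [rng.lognormvariate(mu, sigma) for _ in range(n_lower)]
    # Lognormal draws larger than the smallest Pareto draw: the overlap in question
    smallest_upper = min(upper)
    overlap = sum(1 for s in lower if s > smallest_upper)
    # Ranks follow from the pooled, sorted sample rather than from the source distribution
    sizes = sorted(upper + lower, reverse=True)
    return sizes, overlap

sizes, overlap = simulate_hybrid()
print(len(sizes), "cities;", overlap, "lognormal draws above the smallest Pareto draw")
```

Ranking the pooled sample, rather than treating the two groups as pre-ordered, avoids implicitly selecting a sample in which no overlap occurs.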

4. It should be made clearer that the value of the simulations is to demonstrate that the chosen econometric methodology is powerful enough to detect distance effects if the patterns are sufficiently strong. In my view the simulations are purely a prelude to motivate the empirical analysis.

Reply: I fully agree with that. The issue is now clarified [LINES 68-70].

5. I would like to see more detailed table legends, so that tables can be interpreted without having to refer to the text. We have to guess what Columns I and II mean. Please explain each item in the table in detail in the legend - a little spoon feeding for the reader can only help.

Reply: I agree to that and added respective explanatory notes to Tables 1 and 2 [LINES 354-357; 368-371].

Further corrections

In addition to the revisions suggested by the reviewers and the editor, I also corrected the SEM and SAR estimates of the four countries and the respective LISA figures. For the population-based estimations it was necessary to change the geographical projection of the raw data into an equal-area one. The correction reveals only negligible differences, most of them even improving the estimates expected [LINES 374-375; 431-432]. In addition, a few transcription errors were corrected. There was no need to change the projection for the simulations and the estimation on segmented natural cities in Slovenia, since in those cases the coordinates already represent an equal-area projection. There was also no need to modify the projection for the SLX estimates, as the respective Stata command automatically calculates the true distances by the Vincenty formula. For Slovene municipalities (>10,000 inhabitants) the SLX estimation had to be corrected because of a wrong entry of the distance threshold, but this has not changed the insignificant result.

A few further style and orthographic corrections can be identified by inspecting the track-change version of the manuscript.
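Why the treatment of coordinates matters for an inverse-distance matrix can be seen in a short sketch. It uses the haversine formula on a sphere as a simpler stand-in for the ellipsoidal Vincenty formula mentioned above; the function name and the choice of Earth radius are assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; spherical approximation, not Vincenty

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

# One degree of longitude covers roughly half the ground distance at 60°N
# compared with the equator, so raw degree differences distort distances.
at_equator = haversine_km(0.0, 0.0, 0.0, 1.0)
at_60_north = haversine_km(60.0, 0.0, 60.0, 1.0)
print(round(at_equator, 1), round(at_60_north, 1))
```

Treating unprojected longitude and latitude as planar coordinates would therefore overstate east-west distances at higher latitudes, which affects every entry of an inverse-distance weight matrix.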

Attachment

Submitted filename: Rebuttal2.docx

Decision Letter 2

Yannis Ioannides

27 Jan 2021

Spatial dependence in the rank-size distribution of cities - weak but not negligible

PONE-D-20-16834R2

Dear Dr. Bergs,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Yannis Ioannides

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Dear author:

Thank you for patiently and diligently dealing with the editorial comments on your submission.

I am happily recommending acceptance of your submission for publication by PLOS One.

All the best

Yannis M. Ioannides

Academic Editor

Acceptance letter

Yannis Ioannides

29 Jan 2021

PONE-D-20-16834R2

Spatial dependence in the rank-size distribution of cities – weak but not negligible

Dear Dr. Bergs:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Yannis Ioannides

Academic Editor

PLOS ONE

Associated Data

    Supplementary Materials

    S1 File

    (ZIP)

    Attachment

    Submitted filename: PLOS D-20-16834 (1).pdf

    Attachment

    Submitted filename: Response to reviewers_16834.docx

    Attachment

    Submitted filename: PONE-D-20-16834 (2).pdf

    Attachment

    Submitted filename: Rebuttal2.docx

    Data Availability Statement

    Data are available from Harvard Dataverse: https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/EK4CNU


    Articles from PLoS ONE are provided here courtesy of PLOS
