Skip to main content
Royal Society Open Science logoLink to Royal Society Open Science
. 2017 Oct 18;4(10):170590. doi: 10.1098/rsos.170590

The scaling structure of the global road network

Emanuele Strano 1,2,3,, Andrea Giometto 4,5, Saray Shai 6, Enrico Bertuzzo 7,5, Peter J Mucha 6, Andrea Rinaldo 5,8
PMCID: PMC5666254  PMID: 29134071

Abstract

Because of increasing global urbanization and its immediate consequences, including changes in patterns of food demand, circulation and land use, the next century will witness a major increase in the extent of paved roads built worldwide. To model the effects of this increase, it is crucial to understand whether possible self-organized patterns are inherent in the global road network structure. Here, we use the largest updated database comprising all major roads on the Earth, together with global urban and cropland inventories, to suggest that road length distributions within croplands are indistinguishable from urban ones, once rescaled to account for the difference in mean road length. Such similarity extends to road length distributions within urban or agricultural domains of a given area. We find two distinct regimes for the scaling of the mean road length with the associated area, holding in general at small and at large values of the latter. In suitably large urban and cropland domains, we find that mean and total road lengths increase linearly with their domain area, differently from earlier suggestions. Scaling regimes suggest that simple and universal mechanisms regulate urban and cropland road expansion at the global scale. As such, our findings bear implications for global road infrastructure growth based on land-use change and for planning policies sustaining urban expansions.

Keywords: spatial networks, global land use, urbanization, global road network

1. Introduction

Modern civilizations developed along with road networks, simple and efficient systems designed to colonize free land, improve human mobility and move goods among locations. Today, the road system grooves and fragments the 19 million hectare surface of the Earth with more than 14 million km of paved surface. At the backbone of human colonization, road expansion embodies a complex blend of economic growth and often unsustainable development [1,2]. From an economic perspective, road expansion has typically been associated with economic growth, poverty reduction [3] and urbanization processes [4]. However, roads cutting through ecosystems may cause severe environmental degradation, like habitat fragmentation and biodiversity loss, facilitating urban sprawl and efficient deforestation [5,6]. Pressure from global population growth with the resulting increase in food demand [7,8] and from the ongoing global urban transition [9,10] will foster a massive road expansion in the upcoming decades. It has been estimated that the total paved length will increase by an additional 25 million km by 2050 [5]. Controlling such an expansion will be of crucial importance for global environmental conservation strategies and sustainable agricultural development. Yet despite the fundamental role of road expansion in global human–environment relations and some attempts made to reconcile their double-edged consequences [5], a quantitative and exhaustive description of the structure of the global road network (GRN) necessary to model this expansion is currently missing.

Statistical laws governing road networks have been extensively explored [11,12]. Such studies investigated issues such as scaling [1316], road centrality [17], evolution [4,1820] and urban sprawl [21], within the general domain of complex networks analysis [22]. However, such efforts have focused on urban road networks, neglecting connections among nodes serving areas dedicated to non-urban land uses. Here, by coupling a detailed dataset on the global road network with global land-use inventories, we provide an analysis of the structure of the network of major roads as of year 2015 and examine its dependence on land use. Such analysis is carried out by studying the distributions of road lengths and their scaling relationships.

2. Results

The GRN has been extracted from commercial vectorial maps of major roads on the Earth that are used mostly for navigation and cartography (see Material and methods). The GRN contains the major roads network for 2015 organized in four hierarchies: primary roads with limited access (H1), primary roads with non-limited access (H2), secondary roads (H3) and local roads (H4). The GRN does not contain local urban roads but only the major ones; therefore, the GRN can be used to analyse the major road infrastructure but not the urban morphology related to urban block size and form. The total road length deduced from the GRN database (14 522 470 km in 2015) is much larger than that estimated from a dataset recently used for global road environmental impact estimation (gROADS) [5,23], which accounts for a total road length of 7 644 410 km. The gROADS dataset comprises only hierarchies H1, H2 and H3 for the year 2009. Comparing gROADS with the correspondent subset of GRN, one can estimate an annual growth of 5.7% between 2009 and 2015. However, by considering all GRN road hierarchies (H1∪H2∪H3∪H4), up to 30% of the total road length at the global scale was not represented in previous analyses.

Starting from the original database, we produced the GRN as a primal road network [24], in which nodes are the road junctions and links are road segments, and each link carries a weight indicating its length (l). To extract the final GRN network, we removed street junctions connecting only two roads and split each link, regardless of hierarchy, at any intersection with three or more roads. Defining three layers of major land uses—urban, cropland and seminatural—we labelled each road with the land use it belongs to. These three land-use classes, corresponding to the three main land uses on the Earth, are extensively used in the global land-use literature [25]. We extracted the global urban footprint for 2013 from night-time lights (NTLs) [26] and cropland from a recent global cropland inventory for 2005 [27]. We then assigned each road to three mutually exclusive land-use classes as follows: roads have been labelled as urban (U) if they totally belong to urban areas, cropland (C) if they totally or partially overlap a cropland area and seminatural (Sn) if they are free of any agricultural or urban land use (that is, they are not dominated by direct human presence, e.g. roads crossing remote areas or natural parks). Figure 1a shows a global overview of the GRN and the three classes. It is important to note that the adopted labelling methodology allows us to avoid multiple land-use associations and treat croplands differently, as a single road segment can connect more than one cropland unit. A detailed description of the dataset along with an atlas of detailed visualizations and spatial analysis methodologies are provided in Material and methods and electronic supplementary material. Key to our spatial inventory at the global scale [8,25], we show that the threshold used to discriminate land-use classes, such as illumination threshold, does not affect our main results (see electronic supplementary material). Moreover, considering that cropland had very little growth in the last 10 years [8], data from different years should not affect the main results.

Figure 1.

Figure 1.

(a) Visualization of the global road network (GRN), with different colours representing classification into different land uses: urban (red), cropland (green) and seminatural (turquoise). (b) Fraction of total road length composed by roads with length less than l for each land-use class. (c) Probability distributions p(l) of road length for different land uses. (d) Collapseof distributions obtained by plotting p(l)lγ versus l/〈lα (see Material and methods), where 〈l〉 is computed separately for each road class, for α=1 and γ=1. The inset shows an objective function which reaches its minimum at the exponent α that gives the best collapse [28] (see Material and methods). A slight deviation between the urban and cropland distribution tails is visible in d, caused by very long cropland roads that are associated with small cropland patches (see Material and methods). Roads associated with seminatural areas deviate visibly from the collapse of the curves.

We here focused on a seemingly simple yet fundamental road network feature: the lengths (l) of road segments [29,30]. The GRN in the year 2015 spanned a total length of 14 522 470 km divided into more than 3 million road segments, with the mean road (segment) length independent of land use 〈l〉=4.8 km (with standard deviation (s.d.) 8.9 km).

The U, C and Sn classes cover, respectively, 12%, 37% and 51% of the GRN by road length as shown in figure 1b. These percentages depend mildly on the threshold levels used to discriminate between classes (see Material and methods and electronic supplementary material). The mean road lengths in the different land-use classes are 〈lU=1.2 km (s.d. 1.3 km), 〈lC=7.4 km (s.d. 9.0 km) and 〈lSn=7.0 km (s.d. 12.0 km). The large standard deviation values highlight an important feature of road length distributions: these distributions are heavy-tailed and potentially reminiscent of power laws [31], and thus are not grouped tightly around a typical value. The mean road length in different land-use classes highlights a second feature of these distributions: C and Sn roads are generally longer those U roads. Indeed, the distribution of U road lengths is very different from that of C and Sn segment lengths. Expectedly, cities encompass shorter streets than agricultural or seminatural areas. But apart from these different mean lengths, how fundamentally different are these distributions? Are they possibly rescaled versions of the same universal distribution, obtained by varying only the mean road length? To test this hypothesis, we examined whether road length distributions can be described by a finite-size scaling form [3234]:

p(l)=1lγF(llα), 2.1

where 〈l〉 is the mean road length in the land-use class of interest and the function F(⋅) is identical across classes. Normalization of the distribution requires that the exponents γ and α must satisfy α=1/(2−γ) [34]. Furthermore, F(x) must satisfy appropriate limiting behaviours as x→0 and x (see [34] for details). We verified our hypothesis (equation (2.1)) on the functional form of p(l) by data collapse [28], i.e. by plotting lγp(l) versus l/〈lα for each land-use class separately and varying α until the curves describing each class collapse onto the same curve, thus providing a plot of the scaling function F. Figure 1c shows the plot of p(l) for each land use and figure 1d shows the resulting plot with α=1 and γ=1 (see Material and methods for details on probability distribution collapse). We find that road length distributions associated with urban and cropland classes collapse onto nearly the same curve, whereas the road length distribution associated with seminatural areas fails to do so. Different definitions of urban boundaries and different methods for the classification of roads crossing land-use boundaries led to indistinguishable results (see electronic supplementary material). Therefore, road length distributions in urban and cropland areas share the same fundamental structure at the global scale, despite large differences in the mean road length.

Equation (2.1) describes the ensemble distributions of road lengths obtained by grouping roads belonging to urban and cropland patches of different extents. Following previous approaches [35,36], we fragmented urban land use into separate components by means of spatial contiguity of urbanized and cropland cells, extracting all urban and cropland patches on the Earth (see Material and methods). As in our global analysis, we labelled each road with the land use, as well as the area of the patch it belongs too. We then study the road length distribution as a function of patch area for urban and cropland patches (figure 2a,d), finding different scaling behaviours in urban patches compared to cropland ones as explained below.

Figure 2.

Figure 2.

Length distributions for urban (ac) and cropland (df) roads conditional on patch areas. Road length distributions conditional on various values of urban (a) and cropland (d) patch area A, divided into logarithmic bins (colour-coded as indicated in the insets of b,e). (b,e) Road length distributions rescaled according to equations (2.2) and (2.3), respectively. The insets show the mean road lengths 〈l | AU (b) and 〈l | AC (e) as functions of A. Distributions on double-logarithmic scales of total road length L in urban (c) and cropland (f) patches of different areas, considering all urban and cropland patches on the Earth. Red lines and dots indicate the mean total road length as a function of patch areas. Colourmaps display logarithmic counts of patches in base 10.

We found that the mean road length 〈l | AU within an urban patch of area A increases sublinearly with A, 〈l | AUAδ with δ=0.41±0.02 (mean ± s.e. estimated via least squares fit of log-transformed data) for areas below AUT≃4×107 m2, above which it remains relatively constant (figure 2b, inset). The distribution pU(l | A) of urban road lengths conditional on the urban patch area A (figure 2b) appears to be well described by the scaling form:

pU(l|A)=1lGU(ll|AU), 2.2

where GU is a scaling function with suitable properties [34], as verified by data collapse (figure 2a,b). For each patch, we computed the total road length L and found that the mean total road length in urban patches increases sublinearly with A, 〈LUAβ with β=0.62±0.01 (again, via least squares fit of log-transformed data) below AUT, above which it becomes effectively linear (β=1.06±0.02) (figure 2c). As 〈l | AU is relatively constant above AUT, the linear scaling of 〈LU implies that the mean total number of roads increases linearly with A above AUT.

Road length statistics in cropland patches also display multiple scaling regimes, although with different behaviour compared to urban areas. Specifically, the mean road length 〈l | AC associated with a cropland patch of area A displays a triphasic behaviour (figure 2e, inset). Initially, 〈l|AC increases until A≃105 m2, although very few crop patches are found below this scale and we will thus neglect these data in our discussion. Then, 〈l | AC decreases until ACT≃109 m2 and remains relatively constant above ACT. The distribution pC(l | A) of cropland road lengths conditional on the cropland patch area A (figure 2d) appears to be well described by the scaling form:

pC(l|A)=1lGC(ll|AC), 2.3

where GC is a scaling function with suitable properties [34], as demonstrated by data collapse in figure 2d,e. Analogous to the distribution of urban road lengths, the distribution pC(l | A) is invariant for all cropland areas larger than ACT. Conversely, the mean total road length in cropland patches (figure 2f) is approximately constant below ACT and increases linearly above this threshold (exponent β=0.95±0.05, estimated via least squares fit of log-transformed data), implying that the mean total number of cropland roads also increases linearly with A above ACT. We thus found that the average road length versus cropland patch area is well described by 〈l|AC with ACT=(7.4±0.5)×107 m2.

The above results lead to a natural question: are the scaling functions GC and GU related? Superimposing the two scaling functions (figures 2b,e, as shown in figure 3a) suggests that GCGU, such that equations (2.2) and (2.3) approximately coincide, although with different dependencies of 〈l | AU and 〈l | AC on A.

Figure 3.

Figure 3.

(a) Superimposed rescaled urban (red curves) and cropland (green curves) data from figures 2b,e, demonstrating that the scaling functions GU and GC coincide, further confirming the universality of road length distributions in different land-use classes. (b) According to our approximation (see Material and methods), the ensemble distribution of urban road lengths (dashed red curve here and in figure 1c) coincides with the distribution of urban road lengths belonging to urban patches larger than A>108 m2 (red solid curve). The same approximation also holds for the distribution of cropland road lengths, but the tail of the ensemble distribution of cropland road lengths (green dashed line) is ‘fatter’ than the distribution of cropland road lengths associated with cropland areas larger than A>109 m2 (solid green line), leading to a slight deviation in the collapse of the tails of the ensemble distributions visible in figure 1d.

The scaling structure of a road network can be also investigated by observing the recursive fragmentation dependent on the hierarchy of roads. The GRN subdivides the surface of the Earth into non-overlapping regions called faces (roads networks are planar graphs consisting of a series of land cells suitably surrounded by road segments [4]). We defined ensembles of faces for each road hierarchy (H1–H4) and investigated the probabilistic relationship between the size of any face and the length of the roads within it at each scale of observation. For example, by considering only the coarse-grained view provided by highway faces alone (H1), it is possible to extract a series of large faces and study how they are fragmented at the smaller scale by major roads (H2). Then, similarly, we can study how the faces of major roads (H2) are fragmented by secondary roads (H3) and so on. Figure 4a gives a global overview of the hierarchical organization of GRN and figure 4b,c illustrates the procedure of nested fragmentation. Formally, once all road segments are labelled by their hierarchical class, EH(i), a protocol has been set up to assign each of them to a face of area AH(i−1), on which we speculate that the length distribution of such road segments is conditional. This is equivalent to a coarse-graining procedure. We binned the areas of all faces (i.e. at each scale of observation) into 10 log-binned intervals (A(k), k=1,…,10) to account for variable numerosity of the samples [31]. We collected the road lengths {l}A(k) (k=1,…,10) belonging to the faces whose areas are included in the kth face area bin (irrespective of the hierarchical class of these roads) and computed the relative proportion p(l | A)=p(l | A(k)) that measures the probability to find a road of a given length in the area bin A(k). We then tested whether the curves lγp(l | A) plotted against l/〈l | Aα (where 〈l | A〉 is the average road length in the set {l}k) for each of the area bins collapse onto the same curve, i.e. whether p(l | A) displays finite-size scaling (figure 4d,e). By using the full dataset, a satisfactory collapse has been found for α=1.1 and γ=1.1 as shown in figure 4d,e. Such collapse indicates a universal scaling curve that regulates the fragmentation of the road network at all scales of observation.

Figure 4.

Figure 4.

(a) An overview of the hierarchical organization of the GRN. (b) A detailed view of the Indian road network, where each colour, from red to green, represents the proper hierarchy from H1 to H4. (c) A sketch illustrating the process ofhierarchical fragmentation by which, starting from H1, each face is fragmented by the link of the lower hierarchy. (d) Probability distributions p(l) of the link belonging to a face in the range k. (e) Collapse of road length distributions, with the best collapse found for p(l)l1.1 versus l/〈l1.1.

3. Discussion

Our study provides evidence for general statistical laws describing global road lengths conditional on land use. At the global scale, urban and cropland roads share a universal distribution after rescaling that satisfies (equation (2.1) with α=γ=1)

p(l)=1lF(ll), 3.1

where F(⋅) is independent of the land-use class, with land use instead impacting the mean road length 〈l〉. At finer scales, we investigated the dependence of road length distributions on the area of the homogeneous patch under study, separately for each land-use class. We found that road length statistics conditional on patch area closely match a universal scaling law of the form

p(l|A)=1lG(ll|A), 3.2

where G(⋅) is independent of the land-use class (i.e. G=GU=GC). That is, land use affects p(l | A) solely through the scaling of the mean road length 〈l | A〉 with patch area A.

Equation (3.2) has interesting implications. Because 〈l | AU and 〈l | AC are approximately constant above the respective thresholds, AUT and ACT, the distributions pU(l | A) and pC(l | A) are invariant for all cities and cropland patches larger than such a threshold. In particular, because AUT4×107m2 is the area of a relatively small town (approx. 3.6 km radius), our results suggest that the distribution of urban road lengths is virtually independent of the urban patch size for nearly all cities. Moreover, the finite-size scaling functions F and G are quite similar in shape and nearly coincide if one neglects small urban and cropland patches, i.e. below the critical sizes AUT and ACT (see Material and methods). The observation of nearly universal probability distributions of urban and cropland road lengths suggests that the processes that govern road expansion are to a large extent inevitable, regardless of climate, topography and social and economic constraints, echoing general results on self-organized patterns [37,38]. Properties arising from physical constraints imposed by the planarity of the road network may also be at play [19].

These results suggest that local conditions, such as the socio-economic development or the demography of a specific region, may simply accelerate or delay the development of road infrastructure, affecting the characteristic length scale of the road length distribution but not its scaling form. Significantly, we find that road length distributions belonging to seminatural areas do not collapse onto the same distribution, even after rescaling. We argue that such diversity stems from the diverse purpose of roads in natural areas which are not built specifically for direct access to the land and are therefore regulated by diverse, site-specific processes and possibly far from optimized. The linkage of optimality and self-organization is known to induce a variety of scaling phenomena, as shown by studies on network structures derived from selection principles [18,3942], including small-world constructs [43], fluvial trees [4446] and the topology of the fittest networks [47]. At this stage, however, attributing the departure of the scaling features of roads serving seminatural areas to transient or non-optimized processes would be premature.

Urban scaling approaches have suggested that larger cities require less infrastructure than bigger ones. Indeed, if the total road length (L) scales sublinearly with the total population of a city (P) within its boundary, i.e. LPη with 0.7<η<0.9 [48,49], such scaling would be directly related to the management of mega-cities [50] which would potentially be more efficient than small towns. Our results put such scaling results in a different perspective. According to the linear scaling shown in figure 2c, an urban connected component of 109 m2 and ten components of 108 m2 together require approximately the same total length of major roads. This holds for all urban patches bigger than AUT≃4×107 m2, thus including all major cities. This result is consistent with other recent studies performed for a set of cities in Europe and the USA [51] and for the entire UK [52], showing that the actual sizes of urban patches scale linearly with the total lengths of major roads within them. This then implies that the proposed sublinear scaling between length and population is due to a superlinear scaling between population density and city size, and that road network lengths are not a good approximation of urban population density. However, more analysis is necessary to prove such speculation, because the suggestion of sub-linearity in the scaling relationship between L and population may have been an artefact of census-based data, which are built regardless of land use within the political boundary of a given region. Therefore, it seems plausible that roads from different land-use classes, with correspondingly different characteristic lengths, may have been mixed. Moreover, census areas are defined differently among various countries, thus precluding cross-country comparative analyses. Indeed, it has been shown that different methods to define city boundaries, based on tuning urban population density, could drastically affect the scaling of urban measurements [5254]. We used here a definition of urban patches only based on contiguity of urban land, therefore the proposed results must be interpreted on the basis of used datasets and adopted methodology.

Although the above implications are in need of deeper scrutiny, which should include scaling analyses of different urban measurables (e.g. energy and material flows sensu [50] or other infrastructures), our findings can be directly used as an urban planning tool aimed to estimate road infrastructure requirements. For example, the analysis of global urban evolution, carried out by using 16 night-light layers (see electronic supplementary material), reveals that the total number of urban patches is proportional to AUtot and thus also to LUtot. Therefore, our ansatz suggests, for example, that a 10% increase of global urban surface would imply a parallel increase in total urban road length of 10% (see electronic supplementary material for details). As it has been estimated that by 2030 the total urban area, AUtot, could potentially grow by 200% [9,10], such urban expansion would entail a total length Ltot of 3 485 400 km of new paved urban roads.

Our approach also suggests that cropland patches smaller than ACT=7×107m2 require a road investment per unit area larger than cropland patches above such a threshold. That said, of course many other issues must be considered before suggesting a development policy for agricultural land based on these observations. Most importantly, agricultural development policies must account for the preservation of ecological corridors and minimize the fragmentation of wildlands. Relatedly, because the total road length in cropland patches scales linearly with the area, large cropland patches (say, of area A) require the same investment in road paving as n smaller cropland patches of area A/n, provided that A/n>ACT. From this perspective, larger cropland patches are predicted to be as efficient in terms of road infrastructure requirements as smaller ones (within the linear regime highlighted in figure 2d). By contrast, cropland areas smaller than ACT require a relatively greater length of roads (figure 2e,f). Such cropland patches are typically composed of several small and scattered cropland units located in wild or mountainous areas. A typical example is given by scattered agricultural plots in forest areas (see electronic supplementary material; figure 3a,b); in such configurations, long roads serve tiny cropland units, indicating potentially higher environmental impacts of sparse agricultural expansion. Meanwhile, cropland patches around urban areas are generally more compact and more fragmented, in the linear regime. Because cropland roads can connect more than one cropland patch, however, the proportion NCACtotLCtot would overestimate LCtot. Moreover, by considering the competition among cropland patches and seminatural areas and the scarcity of residual free land to colonize [8], future cropland roads might merely partially substitute for the existing seminatural roads, thus partially conserving the existing extent of road length, Ltot. Such a hypothesis is corroborated by the similarity of the distributions p(l)C and p(l)Sn shown in figure 1c, which implies a similar level of fragmentation of seminatural areas. An important distinction arises therefore between urban and cropland areas in that the site-specific degree of reuse of existing infrastructure in cropland areas prevents the same kind of strict predictability allowed for urban areas.

4. Material and methods

4.1. Data preparation and data fusion

The general idea here is to transfer the land-use information, represented by continuous values, onto the road network, which is represented by lines vectorial geometry. To do so, a series of spatial analysis operations have been performed, mainly in ArchMap, Qgis and Python environments. An illustration of the data preparation schema is presented in the electronic supplementary material. Data preparation includes the extraction of the urban footprints, a preparation of the road network and a join phase between land uses and the road network. Below we report details of each phase; in the electronic supplementary material, we report detailed visualization of the final road network.

4.2. Urban footprints extraction

Urban footprint data have been extracted from inter-calibrated night-time light (DMSP-OLS) remote imagery using original images and data processing by NOAA’s National Geophysical Data Center and DMSP data collected by the US Air Force Weather Agency (https://www.ngdc.noaa.gov). The NTL data contain continuous values from D=1 to D=63, representing the intensity of stable light in a grid of 30 arc seconds (approx. 1 km). DMSP-OLS have been widely used to characterize urban footprint and urban evolution at the global scale [23,26,55,56]. After removal of gas flaring, the Jenks cluster algorithm has been applied to extract an urban settlement mask [57]. The Jenks algorithm is an unsupervised classification method which imposes the number of clusters and is widely used in geographical analysis to cluster continuous surface datasets in separated areas. Starting from a set of randomly selected values, the Jenks algorithm works to minimize the variance inside classes while maximizing variance between classes. We thus classified the entire NTL into three classes and took the most illuminated class to be the urban footprint, considering a pixel to be classified as urban if D>28. We tested other classification methods as well as different numbers of classes; but given the sharp separation between highly and poorly illuminated places, different values of D did not affect the main results of the scaling analysis (see the electronic supplementary material). Using the same procedure, we classified urban footprints from 1997 to 2012.

4.3. Cropland

The 1 km global IIASA-IFPRI [27] cropland likelihood map has been developed by integrating a number of individual cropland maps at global to regional to national scales. The individual map products include existing global land cover maps such as GlobCover 2005 and MODIS v. 5, regional maps such as AFRICOVER, and national maps from mapping agencies and other organizations. IIASA-IFPRI is a public dataset that can be downloaded from the Geo-wiky platform (http://cropland.geo-wiki.org/downloads/). Detailed visualizations of the cropland mask are provided in the electronic supplementary material.

4.4. Global road network preparation and land-use labelling method

Geo-located vectorial road data are usually produced for GPS navigation and cartography. Our dataset was produced by and acquired from DeLorme (GARMIN, Yarmouth, ME, USA). These data represent a complete and updated dataset of primary and secondary roads at the global scale. These data have commercial restrictions; additional information and request for data samples can be addressed to the contact details at https://developer.garmin.com/datasets/digital-atlas/. The topology of the road network has been corrected using Archmap software and ad hoc Python scripts for the purpose of joining connected roads at junctions with only two roads and to remove small road links representing highway ramps and crossroads intersections which are not representative of any fragmentation process and are potential noise for the statistical analysis. Roads shorter than 100 m cover only ≈0.03% of the total road length; these roads, as confirmed by an extensive and scrupulous inspection of the GRN dataset, appear to be highway ramps or road segments for large road junctions and were therefore excluded from our analyses. Coupling the GRN with two global land-use inventories, roads have been classified into three categories: urban roads, i.e. roads that entirely belong to urban areas; cropland roads, i.e. road that intersect or belong to cropland areas; and seminatural roads, i.e. roads that are completely free of direct urban or cropland use (see electronic supplementary material for a detailed visualization of the final classified road network). Data of road network are released under licence agreement with DeLorme (GARMIN, Yarmouth, ME, USA).

4.5. Probability distribution collapse

Normalization of the distribution p(l)=lγF(l/〈lα) requires that the exponents γ and α satisfy α=1/(2−γ) [34]. Furthermore, F(x) must satisfy appropriate limiting behaviours for x→0 and x (see [34] for details). We verified our hypothesis (equation (2.1)) on the functional form of p(l) by data collapse [28], i.e. by plotting lγp(l) versus l/〈lα for each land-use class separately and varying α until the curves describing each class collapse onto the same curve, thus providing a plot of the scaling function F(⋅). We similarly verified our scaling ansätze (equations (2.1)–(2.3)) on the scaling form of the road length distributions via data collapse, as routinely used in statistical physics and beyond [28,34]. We used an objective method [28] to determine the scaling exponent that gives the best data collapse in terms of minimizing the area between the rescaled distributions (figure 1d, inset).

4.6. Relationship between scaling functions

The relationship between the scaling functions F and G is mediated by the distribution of patch areas p(A), via p(l)=p(A)p(l|A)dA. In general, the integral cannot be computed analytically; however, focusing for this calculation on urban roads and neglecting the sublinear regime A<AUT, we find

p(l)=p(A)p(l|A)dA=p(A)1lG(ll|A)dA=1lG(ll)p(A)dA=1lG(ll), 4.1

where we have used the independence of 〈l|A〉 on A that holds approximately for A>AUT. Equation (4.1) coincides with equation (3.2) for A>AUT. Within this approximation, therefore, the scaling functions F and G coincide. The same result can be derived for croplands, neglecting patches with area A<ACT. Deviations between F and G are thus ascribable to urban and cropland patches in the sublinear scaling regimes. The quality of the approximation can be tested by contrasting the ensemble distribution of urban road lengths (red dashed curves in figures 1c and 3b) with the distribution of urban road lengths conditional on the urban area being A>108 m2 (solid red curve in figure 3b, corresponding to the curves from yellow to red in figure 2b). Figure 3b shows that the two distributions overlap almost completely, implying that equation (4.1) is an excellent approximation for urban patches. The analogous approximation for croplands is slightly less satisfactory, caused by the fact that the longest roads are found in croplands of smaller area (see the inset of figure 2e). This unintuitive result appears to be due to long roads fragmenting wild areas for agricultural purposes. Therefore, the tail of the ensemble distribution of cropland road lengths (green solid line in figure 1c and green dashed line in figure 3b) is ‘fatter’ than the tail of the cropland road length distribution conditional on cropland area A>109 m2 (i.e. above the threshold area ACT, green solid line in figure 3b).

Supplementary Material

Methodological details, results and data visualization
rsos170590supp1.pdf (50.5MB, pdf)

Acknowledgements

E.S. thanks Francois Golay, Marc Barthelemy, Riccardo di Clemente and Marta Gonzalez.

Data accessibility

The data that support the findings of this study are generated combining three datasets, each of which is either publicly available or available under licence agreement. A detailed description of the data and the availability of each dataset is reported in Material and methods.

Authors' contributions

E.S., A.G., S.S., E.B., P.J.M. and A.R. designed research. E.S. performed spatial analysis. E.S., A.G., S.S. and E.B. performed research. E.S., A.G., S.S., E.B., P.J.M. and A.R. analysed data. E.S., A.G., S.S., E.B., P.J.M. and A.R. wrote the paper.

Competing interests

The authors declare no competing interests.

Funding

E.S., A.G., E.B. and A.R. acknowledge the support provided by ENAC (EPFL) through the Innovative research on urban dimension grant. S.S. and P.J.M. acknowledge support from the James S. McDonnell Foundation 21st Century Science Initiative—Complex Systems Scholar Award grant no. 220020315. E.S. has been funded by the Swiss National Science Foundation. A.G. acknowledges funds from the Swiss National Science Foundation Project P2ELP2_168498.

References

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Methodological details, results and data visualization
rsos170590supp1.pdf (50.5MB, pdf)

Data Availability Statement

The data that support the findings of this study are generated combining three datasets, each of which is either publicly available or available under licence agreement. A detailed description of the data and the availability of each dataset is reported in Material and methods.


Articles from Royal Society Open Science are provided here courtesy of The Royal Society

RESOURCES