PLOS One. 2023 Dec 18;18(12):e0295848. doi: 10.1371/journal.pone.0295848

Improved prediction of hiking speeds using a data driven approach

Andrew Wood 1,*, William Mackaness 2, T Ian Simpson 1, J Douglas Armstrong 1
Editor: Yuxia Wang 3
PMCID: PMC10727444  PMID: 38109382

Abstract

Hikers and hillwalkers typically use the gradient in the direction of travel (walking slope) as the main variable in established methods for predicting walking time (via the walking speed) along a route. Research into fell-running has suggested further variables which may improve speed algorithms in this context: the gradient of the terrain (hill slope) and the level of terrain obstruction. Recent improvements in data availability, as well as widespread use of GPS tracking, now make it possible to explore these variables in a walking speed model at a sufficient scale to test statistical significance. We tested various established models used to predict walking speed against public GPS data from almost 88,000 km of UK walking / hiking tracks. Tracks were filtered to remove breaks and non-walking sections. A new generalised linear model (GLM) was then used to predict walking speeds. Key differences between the GLM and established rules were that the GLM considered the gradient of the terrain (hill slope) irrespective of walking slope, as well as the terrain type and level of terrain obstruction in off-road travel. All of these factors were shown to be highly significant, and this is supported by a lower root-mean-square error (RMSE) than existing functions. We also observed that the difference in RMSE between the GLM and established methods increases as hill slope increases, further supporting the importance of this variable.

Introduction

Knowing how fast people are able to walk between locations is critical information in many situations. In hiking and hillwalking scenarios, this information is vital for safety reasons. If you are leaving in the morning for a hike then it is good practice to provide an estimated return time such that emergency services can be contacted if you get into difficulty and do not return [1]. An inaccurate estimate for how long a route will take could lead to unnecessary callouts, or delay a callout in a situation where every minute is important. Furthermore, in circumstances where a hiker has gone missing, an accurate measure of walking speed can help to restrict a potential search area around a last known location. Finally, when out on a hike there are situations where hikers may be deciding whether to follow a footpath, or take a more direct cross-country route. Accurate estimates of the walking speed and time for both scenarios are required to be able to select the optimal route.

There are a multitude of factors which can impact the walking speed and time predictions for a route [2], although these can generally be split into two categories [3, 4]. The first category covers the individual effects which depend on who precisely is undertaking the walk, and when they are doing it. These effects include group size (larger groups often walk slower), age or fitness of participants, and weather conditions, as well as the aim of the walk (afternoon stroll vs. specific hike). The second category covers the fixed effects which will affect all individuals who attempt the same route. These include how steep the terrain is and whether the route is paved, along a track or in wild country.

Most of the individual effects cannot be modelled without considerable prior knowledge about the person who is planning a route. Therefore, most existing hiking route planners calculate the walking speed solely based on the terrain, and this is presented as the average time (or time range) it takes to complete a hike. It is then left up to the individual to tune the predicted time for a hike given their knowledge about personal ability and circumstances.

Formulae of varying complexity have been proposed to estimate human walking speed and time along a projected path. A popular early method, still widely used, was put forward by Naismith [5]. It calculates walking time under normal conditions as:

an hour for every three miles on the map, with an additional hour for every 2,000 feet of ascent.

This approximates to a walking speed of 5 km/h with 10 minutes added for every 100 m of ascent. This was later adjusted by Aitken [6], who introduced a reduced base movement speed of 4 km/h on surfaces which are not paths or roads. Naismith’s rule is still used today by Scout groups and other casual hikers due to the ease of calculating walking time by hand using a paper map. However, despite its widespread use, Naismith’s rule has a well-known limitation: the predicted speed does not change when descending a hill, regardless of the gradient.
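A minimal Python sketch of the metric approximation, including Aitken’s off-path base speed, follows. The function names are hypothetical, and the speed helper is an assumed way of turning the time rule into an equivalent speed over 1 km of map distance at a constant walking slope, which is how the rule can be plotted against slope as in Fig 1.

```python
import math


def naismith_time_hours(distance_km: float, ascent_m: float, on_path: bool = True) -> float:
    """Walking time in hours: 5 km/h on paths or roads (4 km/h elsewhere, after Aitken)
    plus 10 minutes per 100 m of ascent. Descent adds nothing."""
    base_speed_kmh = 5.0 if on_path else 4.0
    return distance_km / base_speed_kmh + ascent_m / 600.0  # 10 min per 100 m = 1 h per 600 m


def naismith_speed_kmh(walking_slope_deg: float, on_path: bool = True) -> float:
    """Equivalent speed over 1 km of map distance at a constant walking slope."""
    ascent_m = max(0.0, 1000.0 * math.tan(math.radians(walking_slope_deg)))
    return 1.0 / naismith_time_hours(1.0, ascent_m, on_path)


print(naismith_time_hours(10.0, 600.0))         # 3.0 (hours, on paths)
print(naismith_time_hours(10.0, 600.0, False))  # 3.5 (hours, off paths)
print(naismith_speed_kmh(-20.0))                # 5.0 (km/h): the descent is not penalised
```

The last call shows the limitation described above: any descent, however steep, returns the flat-ground speed.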

An alternative hiking function proposed by Tobler [7] has become more popular in recent research and in other situations where speeds do not need to be calculated by hand:

$$W = 6\exp\left(-3.5\,\lvert S + 0.05 \rvert\right),$$

where

W = velocity (km/h)

S = gradient of slope in the direction of travel (dh/dx, i.e. rise over run; positive uphill).

Like Naismith’s rule, this gives a speed of 5 km/h on flat ground, with a maximum speed of 6 km/h on a mild descent (around 3 degrees). In a similar manner to Aitken’s correction, a factor of 0.6 is applied to the calculated speed for all off-road travel. Tobler’s function avoids the issues seen in Naismith’s rule when descending slopes, but it predicts a sharp peak in walking speed on mild descents, which may be unrealistic. The formulae discussed here are directly compared in Fig 1.
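The sketch below (Python, hypothetical function name) evaluates Tobler’s function for a walking slope given in degrees, assuming the angle is first converted to the rise-over-run gradient S, and applies the 0.6 off-path factor. It reproduces the two features noted above: roughly 5 km/h on flat ground and a 6 km/h peak where S = -0.05.

```python
import math


def tobler_speed_kmh(walking_slope_deg: float, on_path: bool = True) -> float:
    """Tobler's hiking function W = 6 exp(-3.5 |S + 0.05|), with S = dh/dx."""
    s = math.tan(math.radians(walking_slope_deg))  # angle -> rise-over-run gradient
    w = 6.0 * math.exp(-3.5 * abs(s + 0.05))
    return w if on_path else 0.6 * w  # blanket factor of 0.6 for off-path travel


peak_slope_deg = math.degrees(math.atan(-0.05))    # about -2.86 degrees
print(round(tobler_speed_kmh(0.0), 2))             # 5.04 (km/h on flat ground)
print(round(tobler_speed_kmh(peak_slope_deg), 2))  # 6.0 (km/h at the peak)
```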

Fig 1. Existing functions used to calculate walking speed.


Naismith’s rule [5], Tobler’s hiking function [7] and Campbell et al.’s function [8] plotted as predicted walking speed in km/h against the slope in the direction of travel (walking slope) in degrees where positive is uphill. For Naismith’s function and Tobler’s function, on and off-path versions are shown.

Other studies have also looked at providing alternative methods to calculate walking speeds [9–11], but all continue to use walking slope as the main variable to determine walking speed (with various multiplicative factors applied for off-road travel).

When exploring speeds of fell-runners, Arnet [12] suggested that movement velocity may be dependent on three factors: obstruction (with different factors applied depending on the kind of obstruction), ascent in the run direction (walking slope) and slope of the terrain (hill slope). The actual values used in Arnet’s calculations cannot be directly applied to walking speeds as they were based on orienteering championships where participants were running.

Experience tells us that traversing on a steep hill (while maintaining constant elevation) is more difficult than traversing flat ground. However, the existing methods estimate the same walking speed for both situations. Similarly, high levels of terrain obstruction in off-road areas (such as a thick gorse bush) are much more difficult to walk through than empty fields. The simple multipliers for off-road travel in Aitken’s correction and Tobler’s function do not provide any further distinction between two such regions.

Wood and Schmidtlein [13] took all three of Arnet’s factors into account in a study of evacuating citizens in the event of a hurricane. They applied Tobler’s function to both the hill slopes and walking slopes, and calculated the terrain obstruction coefficients based on energy usage rather than walking speed (using [14]). They acknowledged that these were likely not the correct values, but were unable to find any better alternatives. Campbell, Dennison, and Butler [15] conducted a study using lidar data to explore the effects of ground roughness and vegetation density on firefighter evacuation speeds, but they did not consider the hill slope separately.

All of the studies mentioned above utilised relatively small sample sizes. However, the rise in use of global navigation satellite systems (GNSS), more frequently referred to as GPS tracking, means that a data-driven approach to modelling walking speed is now possible, which provides two main benefits. Firstly, it is possible to access GPS tracks from a wide variety of regions and terrains. Secondly, each track can easily be broken down into individual sections, enabling specific route features to be investigated at much higher spatio-temporal resolution. This has been explored in recent work [8, 16]; however, the crowdsourced nature of these studies meant that data collection was not controlled, so the data could not be assumed to consist wholly of walking or hiking tracks. In [16], data from hikes, jogs and runs was processed together, resulting in a very wide range of movement speed estimates. Campbell et al. attempted to overcome this in [8] by only considering data points with a speed between 0.2 m/s and 5 m/s (and the resulting model is shown in Fig 1). However, 5 m/s (18 km/h) is much higher than the maximum predicted speeds from existing methods (such as Naismith’s rule), so it is likely some non-walking data remained. Furthermore, applying a blanket 0.2 m/s minimum speed may well overlook valid datapoints recorded by particularly slow individuals, or in especially difficult regions. Finally, although these studies had the benefit of using large sample sizes, they both looked solely at the effect of the walking slope on speed, and did not explore additional variables.

Here we used a data-driven approach to explore the impact of all three factors discussed by Arnet on walking speeds. These are the walking slope, the hill slope and the terrain obstruction. We aimed to use these factors to develop a model for the walking speed for an average individual. As with the existing methods, this model did not seek to model individual effects, and would still require tuning based on personal ability or conditions.

Materials and methods

Data set, cleaning and key assumptions

Full details of the various datasets used in this study are provided in S1 File. Further, a detailed description of the data filtering processes, and choices/assumptions made during data processing are described in S2 File.

In summary, GPS tracks were obtained for hikes in the UK from Hikr.org [17] and OpenStreetMap (OSM) [18]. Elevation and walking slope values were calculated and added to every GPS point using data from the Ordnance Survey Terrain 5 Digital Terrain Map (DTM), which provides elevation data at 5 m intervals across the whole of the UK [19]. Hill slope values were found using the quadratic surface method [20, 21]. Each data point was then classified as paved road, unpaved road or off-road by searching an OSM road dataset [22] within a 50 m radius of the point. Paved and unpaved roads were distinguished using [23], with the unpaved road values being ‘path’, ‘bridleway’ and ‘track’.
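The exact slope calculations are described in S2 File. As an illustration only, the sketch below assumes that walking slope is the angle of the elevation change over the horizontal length of a section, and computes hill slope with the Evans-Young form of a quadratic surface fitted to a 3×3 window of 5 m DTM cells. Refs [20, 21] may use a different quadratic formulation, and the function names are hypothetical.

```python
import math
from typing import Sequence


def walking_slope_deg(start_elev_m: float, end_elev_m: float, horizontal_dist_m: float) -> float:
    """Slope along the direction of travel in degrees; positive uphill."""
    return math.degrees(math.atan2(end_elev_m - start_elev_m, horizontal_dist_m))


def hill_slope_deg(window: Sequence[Sequence[float]], cell_size_m: float = 5.0) -> float:
    """Terrain slope at the centre of a 3x3 DTM window from a least-squares quadratic fit.

    Rows run north to south and columns west to east:
        z1 z2 z3
        z4 z5 z6
        z7 z8 z9
    On a regular 3x3 grid the fitted first-order coefficients reduce to the
    differences below (Evans-Young form); the centre cell does not affect them.
    """
    (z1, z2, z3), (z4, _, z6), (z7, z8, z9) = window
    dz_dx = (z3 + z6 + z9 - z1 - z4 - z7) / (6.0 * cell_size_m)  # east-west gradient
    dz_dy = (z1 + z2 + z3 - z7 - z8 - z9) / (6.0 * cell_size_m)  # north-south gradient
    return math.degrees(math.atan(math.hypot(dz_dx, dz_dy)))


# A plane rising 1 m per 5 m cell towards the east.
plane = [[0.0, 1.0, 2.0]] * 3
print(round(hill_slope_deg(plane), 1))                    # 11.3
print(round(walking_slope_deg(100.0, 110.0, 100.0), 1))   # 5.7
```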

Terrain obstruction information was calculated using lidar datasets [24–26], as the difference in values between a Digital Surface Map (DSM) and Digital Terrain Map (DTM). This meant that any physical feature which protruded from the ground was regarded as an obstruction. We had access to lidar data at 2 m resolution covering large areas of England and Wales, but the coverage was not complete. Of our off-road data (∼2,900 km, spread across over 1,200 tracks), over 2,000 km had lidar data available. Exploration of the lidar data (see S5 File) showed that there was a clear drop in walking speeds once the height of an obstruction was greater than 10 cm, beyond which the speed was relatively constant. We used this information to classify points into heavy obstruction (> 10 cm) or light obstruction (≤ 10 cm) for modelling purposes.
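A hedged sketch of the per-point terrain classification described in the two paragraphs above follows. It assumes the OSM highway tag of the nearest road within 50 m is already known (how competing nearby roads were resolved is not stated here), treats every highway value other than ‘path’, ‘bridleway’ and ‘track’ as paved, and applies the 10 cm threshold to the DSM minus DTM height. All names are hypothetical.

```python
from typing import Optional

UNPAVED_HIGHWAY_VALUES = {"path", "bridleway", "track"}  # unpaved values named in the text
HEAVY_OBSTRUCTION_M = 0.10  # obstructions taller than 10 cm count as heavy


def obstruction_height_m(dsm_m: Optional[float], dtm_m: Optional[float]) -> Optional[float]:
    """Height of anything protruding from the ground; None where there is no lidar coverage."""
    if dsm_m is None or dtm_m is None:
        return None
    return dsm_m - dtm_m


def terrain_category(nearest_highway_tag: Optional[str], obstruction_m: Optional[float]) -> str:
    """Assign one point to a terrain class.

    nearest_highway_tag is a hypothetical input: the OSM highway value of the nearest
    road within 50 m of the point, or None if no road lies within that radius.
    """
    if nearest_highway_tag is not None:
        return "unpaved road" if nearest_highway_tag in UNPAVED_HIGHWAY_VALUES else "paved road"
    if obstruction_m is None:
        return "off-road (obstruction unknown)"
    if obstruction_m > HEAVY_OBSTRUCTION_M:
        return "off-road (heavy obstruction)"
    return "off-road (light obstruction)"  # 10 cm or less
```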

Visual inspection of the tracks showed that a large number contained long breaks which could impact the accuracy of a walking speed model. Fig 2 shows examples of regions where breaks are visible in a GPS track, and the process developed to identify these regions is outlined in Algorithm 1.

Fig 2. A GPS track where 3 breaks can be identified by finding point clusters.


Clusters of points can form on a GPS track when a break is taken during a hike. By identifying these clusters as potential breaks we are able to remove most break periods from the datasets used for our analysis of walking speeds. For full details of these and other data filtering methods see S2 File. Background images from OpenStreetMap and OpenStreetMap Foundation [27], visualised using QGIS [28].

Algorithm 1 Breakfinding process for a GPX track segment

1: Breakpoint_list = ∅

2: Find the median distance (r_median) and speed (s_median) of the segment

3: for point (p_i) in segment do

4:  Calculate travel direction quadrant and point angle

5:  Calculate break likelihood using the point speed and angle

6:  if speed == 0 or distance >1 km or duration >3 minutes then

7:   Breakpoint_list += p_i

8:  end if

9:  if speed >10 km/h and duration(p_{i−1}) >3 minutes then

10:   Breakpoint_list += p_i

11:  end if

12: end for

13: for point (p) in segment do

14:  if Neighbourhood of p is a cluster (C) then ▷ See Defs 1 & 2, S2 File

15:   for point (p_c) in C do

16:    if Neighbourhood of p_c is a new cluster (C_n) then

17:     C = C ∪ C_n

18:    end if

19:   end for

20:   Remove points at the ends of the cluster with low break likelihood

21:   Add ‘missing’ points to the cluster (to make a continuous run of points) to form a Potential Break (B*)

22:   if less than half the points in B* have low break likelihood and there is travel in opposite quadrants (Q1 & 3 or Q2 & 4) then

23:    Breakpoint_list += B*

24:   end if

25:  end if

26: end for
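Lines 3–12 of Algorithm 1 combine a break-likelihood calculation defined in S2 File with simple per-point threshold rules. The Python sketch below covers only the threshold rules (lines 6–11), assuming each point carries the distance and duration since the previous GPS fix; the break-likelihood and cluster steps are not reproduced because their definitions are in S2 File. Class and function names are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class TrackPoint:
    distance_km: float   # distance covered since the previous GPS fix
    duration_min: float  # time elapsed since the previous GPS fix

    @property
    def speed_kmh(self) -> float:
        if self.duration_min == 0:
            return 0.0 if self.distance_km == 0 else math.inf
        return self.distance_km / (self.duration_min / 60.0)


def initial_breakpoints(segment: List[TrackPoint]) -> List[int]:
    """Indices flagged by the per-point threshold rules of Algorithm 1."""
    flagged: List[int] = []  # starts empty, as Breakpoint_list does
    for i, p in enumerate(segment):
        # Stationary, a jump of over 1 km, or a gap of over 3 minutes.
        if p.speed_kmh == 0 or p.distance_km > 1.0 or p.duration_min > 3.0:
            flagged.append(i)
        # A point faster than 10 km/h immediately after a gap of over 3 minutes.
        elif i > 0 and p.speed_kmh > 10.0 and segment[i - 1].duration_min > 3.0:
            flagged.append(i)
    return flagged
```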

Where the datapoints in the original GPS track were under 50 m in length, they were merged together to minimise the effects of errors in the GPS location values. The distance of each merged section was the sum of the distances of its constituent GPS points, so it may be longer than the straight-line distance between its start and end coordinates. Similarly, hill slope, walking slope and obstruction height were calculated as averages over the constituent points, weighted by point duration.
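The merging step can be sketched as follows. The sketch assumes points are merged greedily in track order until each section reaches 50 m, and that a short leftover at the end of a segment is folded into the previous section; neither detail is stated above. Distances are summed and slopes are duration-weighted averages, as described. Names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Section:
    distance_m: float
    duration_s: float
    walking_slope_deg: float
    hill_slope_deg: float


def _combine(group: List[Section]) -> Section:
    distance = sum(p.distance_m for p in group)  # summed path length, not straight-line
    duration = sum(p.duration_s for p in group)

    def weighted(values: List[float]) -> float:
        # Duration-weighted mean; obstruction height would be averaged the same way.
        if duration == 0:
            return sum(values) / len(values)
        return sum(v * p.duration_s for v, p in zip(values, group)) / duration

    return Section(
        distance,
        duration,
        weighted([p.walking_slope_deg for p in group]),
        weighted([p.hill_slope_deg for p in group]),
    )


def merge_sections(points: List[Section], min_length_m: float = 50.0) -> List[Section]:
    groups: List[List[Section]] = []
    current: List[Section] = []
    length = 0.0
    for p in points:
        current.append(p)
        length += p.distance_m
        if length >= min_length_m:
            groups.append(current)
            current, length = [], 0.0
    if current:
        if groups:
            groups[-1].extend(current)  # assumption: fold a short tail into the last section
        else:
            groups.append(current)      # the whole segment is shorter than min_length_m
    return [_combine(g) for g in groups]
```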

While the Hikr dataset consisted of tracks which were tagged as a walk or hike, within some of these there were segments where it was clear that the participant was driving to or from the hike location, based on the observed speeds. The OSM data, on the other hand, was not filtered by transport type. There were a large number of tracks which were clearly from faster modes of transport, as their speed was implausible for a hiker. A process to remove these non-walking tracks and segments was created, whereby the known Hikr walking segments were used to create filtering bounds of plausible walking speeds, which could then be applied to the remainder of the dataset. This process is summarised in Algorithm 2.

Algorithm 2: Filtering process for GPS data from Hikr and OpenStreetMap

1: Remove duplicate segments (containing sections with identical start location, end location, start time and duration)

2: Remove all segments with median speed >10 km/h

3: Remove all breaks with duration >30 seconds

4: Remove all breaks containing points with speed >10 km/h or distance >1 km

5: Merge remaining points into sections at least 50 m in length.

6: Recursively remove points with speed >10 km/h adjacent to a break, or the end of the track

7:

8: if Hikr data then

9:  if segment mean speed >10 km/h then

10:   remove segment

11:  end if

12:  Calculate filtering bounds    ▷ Eqs (1)–(4), S2 File

13: else

14:  Identify Key Points    ▷ see S2 File

15:  Remove single datapoints between Key Points

16:  Remove points where median speed between consecutive key points >Eq (1)

17:  while segment length is not consistent do

18:   Remove points with speed >10 km/h adjacent to a break, or the end of the track

19:   if segment median speed >Eq (1) or segment minimum speed >Eq (2) or segment upper quartile speed >Eq (3) or segment upper whisker speed <Eq (4) or segment duration <2.5 minutes then

20:    Remove segment

21:   end if

22:  end while

23: end if

24:

25: Combine all segments into a single dataset

26: Remove the fastest and slowest 0.5% of the data
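Most of Algorithm 2 depends on bounds defined in S2 File, but steps 2 and 26 are fully specified. A minimal Python sketch of those two steps follows. It assumes the segment median is taken over unweighted point speeds, and that removing the fastest and slowest 0.5% means trimming 0.5% of records from each end of the speed distribution. Function names are hypothetical.

```python
import statistics
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def drop_fast_segments(segments: List[Sequence[float]],
                       max_median_kmh: float = 10.0) -> List[Sequence[float]]:
    """Step 2: discard segments whose median point speed exceeds 10 km/h."""
    return [seg for seg in segments if statistics.median(seg) <= max_median_kmh]


def trim_extremes(records: List[T], speed: Callable[[T], float],
                  fraction: float = 0.005) -> List[T]:
    """Step 26: drop the slowest and fastest `fraction` of records, keeping input order."""
    k = int(len(records) * fraction)
    order = sorted(range(len(records)), key=lambda i: speed(records[i]))
    keep = set(order[k:len(order) - k])
    return [r for i, r in enumerate(records) if i in keep]
```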

Following this, a decision was made to remove data from tracks found in Scotland. Lidar data covering the walking tracks was necessary to model the terrain obstruction, and was not sufficiently available in Scotland at the time of the study. Furthermore, analysis showed that walking speeds in Scotland were at the extreme end of what is seen throughout the rest of the UK (see S4 File). Including this data without also including a corresponding extreme dataset where lidar data is available may result in incorrect modelling. All OSM track segments which took place within Scotland were excluded from further processing. Similarly, Hikr tracks which were tagged as taking place in Scotland, and which took place entirely in Scotland, were excluded.

Our final modelling dataset consisted of 7,636 GPS tracks from England and Wales, with over 1.4 million individual data points and almost 88,000 km of travel. Each datapoint represented approximately 50–100 m of travel, and contained:

  • Start coordinate

  • End coordinate

  • Start time

  • Duration

  • Distance

  • Speed

  • Elevation

  • Walking slope

  • Hill slope

  • On-road flag

  • Paved road flag (if on-road)

  • Obstruction data available flag (if off-road)

  • Heavy obstruction flag (if off-road and obstruction data available)

Modelling

Model formulation

Pilot studies were conducted to identify an appropriate model framework, using tracks within Scotland (see S3 File). Generalised linear model (GLM) and generalised additive model (GAM) approaches were explored, and within both we examined the relationship between the walking and hill slopes and the walking speed, with a small number of prior assumptions. As it is more challenging to walk on steeper slopes, for both the hill and walking slope components we knew that the walking speed should be a decreasing function of the magnitude of slope (with some allowance for faster walking speeds on mild descents). Models which failed to predict this were removed, on the assumption that they had overfitted the data. Furthermore, previous work [11, 29–31] has identified the existence of a critical gradient: the angle at which it is faster to zig-zag up a hill, rather than ascend directly. This occurs at a walking slope of around 15–21 degrees, so models which failed to predict a critical gradient below 21 degrees were removed.
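One way to check a candidate model against the critical-gradient criterion is numerical. The sketch below assumes the model returns a horizontal (map) speed, so the rate of height gain is v·tan|θ|, and takes the critical gradient to be the walking slope that maximises this rate, ignoring the cost of turns. The cited work may define it differently, and the function names are hypothetical. Tobler’s function serves as a test case because its maximum can be found by hand: uphill, the rate is proportional to S·exp(-3.5S), which peaks at S = 1/3.5, or about 15.9 degrees.

```python
import math
from typing import Callable


def critical_gradient_deg(speed_kmh: Callable[[float], float], uphill: bool = True,
                          max_deg: float = 45.0, step_deg: float = 0.01) -> float:
    """Walking slope (degrees) that maximises the rate of height gain or loss.

    Assumes speed_kmh(theta) is a horizontal speed, so the vertical rate is
    speed * tan|theta|. On steeper ground, zig-zagging at the returned angle
    gains (or loses) height faster than a direct line.
    """
    sign = 1.0 if uphill else -1.0
    best_deg, best_rate = 0.0, -math.inf
    for i in range(1, int(round(max_deg / step_deg))):
        theta = sign * i * step_deg
        rate = speed_kmh(theta) * math.tan(math.radians(abs(theta)))
        if rate > best_rate:
            best_deg, best_rate = theta, rate
    return best_deg


def tobler(theta_deg: float) -> float:
    return 6.0 * math.exp(-3.5 * abs(math.tan(math.radians(theta_deg)) + 0.05))


print(critical_gradient_deg(tobler))         # close to 15.9 degrees uphill
print(critical_gradient_deg(tobler, False))  # close to -15.9 degrees downhill
```

Under this assumed definition, the filter described above amounts to rejecting any model for which the uphill result is 21 degrees or more.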

10-fold cross-validation was used to compare the remaining model parameters, looking at R-squared values, root-mean-squared error (RMSE) and mean absolute error. Where multiple models performed equally well, the simplest model was selected for ease of interpretability and real-world application. The selected model type was a Generalised Linear Model (GLM). Models were implemented using R version 3.6.1 [32].

Terrain types

Each of the three road types (paved road, unpaved road, off-road) was included in the model, both as factor variables, and as interaction terms with each of the slope variables.

Before adding terrain obstruction data to the model, we checked that there was no systematic difference between the walking speeds in regions where we had lidar data, and regions where we did not (see S5 File). Thus our findings in regions where lidar data was available could be extended to those where it was unavailable. Factor variables were then added to the model for each obstruction level (heavy, light or unknown obstruction).

Statistical analysis

Variables within the model were tested for significance using the Wald test, which allows us to account for correlation between points within the same track (coeftest function within lmtest package in R).

To measure the impact of our model, we compared walking speed predictions of our model against those of Naismith’s, Tobler’s and Campbell et al.’s models. Four different metrics were compared: the average percentage error, mean squared error (MSE), root-mean-squared error (RMSE) and R-squared value. These were explored both for individual 50 m track sections and for predicted walking times of whole tracks. Finally, we isolated the off-road track sections in order to assess the improvement of our model at predicting walking speeds for off-path travel.
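The four metrics can be sketched as below. The definitions are assumptions: average percentage error is taken as the mean absolute percentage error relative to the observed speed, and R² as 1 − SS_res/SS_tot computed on the evaluation data. Under that definition R² is negative whenever predictions fit worse than a constant equal to the mean observed speed, which is one way negative values such as those in Table 3 can arise. The function name is hypothetical.

```python
import math
from typing import Dict, Sequence


def prediction_metrics(observed: Sequence[float], predicted: Sequence[float]) -> Dict[str, float]:
    n = len(observed)
    residuals = [o - p for o, p in zip(observed, predicted)]
    ss_res = sum(r * r for r in residuals)
    mean_obs = sum(observed) / n
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    mse = ss_res / n
    return {
        "avg_pct_error": 100.0 * sum(abs(r) / o for r, o in zip(residuals, observed)) / n,
        "mse": mse,
        "rmse": math.sqrt(mse),
        "r2": 1.0 - ss_res / ss_tot,  # negative when worse than predicting the mean
    }


print(prediction_metrics([4.0, 5.0, 6.0], [6.0, 5.0, 4.0])["r2"])  # -3.0
```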

Results

We started by assembling a dataset of hikes derived from approximately 20,000 public GPS tracks. These tracks recorded a variety of transport methods and required significant filtering. This process included iterative data cleaning to remove erroneous or non-walking data and identify/remove breaks (e.g. Fig 2) to give us a final usable dataset containing 7,636 GPS tracks, with over 1.4 million individual data points and covering almost 88,000 km of travel in the U.K. Each data point represents at least 50 m of travel (with a mean distance of 60.3 m), and the breakdown of the data by slope angle and terrain type is shown in Table 1. Previous research has found that most walking takes place on low walking slopes [33], and this is evidenced by our data (∼98% of our data was from walking slopes of under 10 degrees).

Table 1. Total distance of data under different terrain conditions (km).

Terrain | Hill slope 0–10° | Hill slope 10–20° | Hill slope >20° | Walking slope magnitude 0–10° | Walking slope magnitude 10–20° | Walking slope magnitude >20°
Paved road | 62159.1 | 7841.2 | 2081.9 | 70726.5 | 1277.3 | 78.4
Unpaved road | 9996.9 | 2210.3 | 700.7 | 12421.7 | 460.0 | 26.2
Off road (obstruction unknown) | 773.5 | 114.2 | 17.8 | 871.7 | 31.7 | 2.0
Off road (light obstruction) | 1282.9 | 150.1 | 23.8 | 1424.6 | 30.6 | 1.7
Off road (heavy obstruction) | 428.7 | 105.2 | 28.5 | 543.5 | 18.5 | 0.4

Our curated hike dataset allowed us to create a data-driven model which we can directly compare with existing walking speed algorithms. The model formulation was selected using a small-scale exploratory study which considered data from Scotland (see S3 File). In this exploratory study, multiple different model types were explored which could fit the data, and which matched existing knowledge about walking speeds. Cross-validation methods showed that there was very little difference in performance of the best models, therefore the final model was a Generalised Linear Model (GLM), which was chosen as it was the simplest of those tested (we had no evidence that a more complex model would be superior). This choice also meant that our model was both easy to interpret, and simple to apply to future work.

This final GLM included all three of the variables suggested by Arnet [12]:

$$v = \exp\left(a + b\varphi + c\theta + d\theta^{2}\right) \qquad (1)$$

where

v = walking speed (km/h)

φ = hill slope angle (degrees)

θ = walking slope angle (degrees)

Terrain obstruction level was included as a factor variable, while road types were included both as factor variables and as interaction terms. Not all terms were significant; we therefore started from a model with all possible terms and removed them one at a time (in order of least significance) until all remaining terms were significant at the 95% confidence level (using the Wald test). The final values for a, b, c and d are given in Table 2 for each of the terrain obstruction levels and road types. The critical gradient for this model is between 14 and 16 degrees when walking uphill and between -16 and -18 degrees when walking downhill (depending on road and obstruction conditions), which is in line with previous findings.

Table 2. Final walking speed model variable coefficients.

Terrain | a | b | c | d
Paved road | 1.580 | -0.00389 | -0.00726 | -0.00218
Unpaved road | 1.580 | -0.00389 | -0.00965 | -0.00248
Off-road (obstruction unknown) | 1.536 | -0.00731 | -0.00965 | -0.00187
Off-road (light obstruction) | 1.580 | -0.00731 | -0.00965 | -0.00187
Off-road (heavy obstruction) | 1.443 | -0.00731 | -0.00965 | -0.00187
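Eq (1) with the Table 2 coefficients can be evaluated directly. The Python sketch below is not the authors’ R implementation; it assumes both angles are in degrees, with hill slope as the non-negative terrain gradient and walking slope positive uphill. The dictionary keys are hypothetical labels mirroring the table rows, and because the coefficients are rounded, results are approximate.

```python
import math

# (a, b, c, d) for each terrain class, copied from Table 2.
COEFFICIENTS = {
    "paved road": (1.580, -0.00389, -0.00726, -0.00218),
    "unpaved road": (1.580, -0.00389, -0.00965, -0.00248),
    "off-road (obstruction unknown)": (1.536, -0.00731, -0.00965, -0.00187),
    "off-road (light obstruction)": (1.580, -0.00731, -0.00965, -0.00187),
    "off-road (heavy obstruction)": (1.443, -0.00731, -0.00965, -0.00187),
}


def glm_speed_kmh(hill_slope_deg: float, walking_slope_deg: float, terrain: str) -> float:
    """Eq (1): v = exp(a + b*phi + c*theta + d*theta**2), with both angles in degrees."""
    a, b, c, d = COEFFICIENTS[terrain]
    phi, theta = hill_slope_deg, walking_slope_deg
    return math.exp(a + b * phi + c * theta + d * theta ** 2)


# Flat paved ground: exp(1.580), roughly 4.85 km/h.
flat = glm_speed_kmh(0.0, 0.0, "paved road")
# Light vs heavy off-road obstruction on flat ground: roughly 0.6 km/h apart.
obstruction_penalty = (glm_speed_kmh(0.0, 0.0, "off-road (light obstruction)")
                       - glm_speed_kmh(0.0, 0.0, "off-road (heavy obstruction)"))
# Traversing (walking slope 0) a 20 degree off-road slope vs a 40 degree slope on a road.
offroad_20 = glm_speed_kmh(20.0, 0.0, "off-road (light obstruction)")
road_40 = glm_speed_kmh(40.0, 0.0, "paved road")
```

The example values mirror two observations in the text that follows: heavy obstruction lowers flat-ground speed by more than 0.5 km/h, and an off-road traverse comes out close to a road traverse of roughly double the hill slope.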

Fig 3 shows the predicted walking speeds under different conditions. The importance of including both the hill slope and terrain obstruction variables can be clearly seen when looking at the Off Road Light Obstruction speed predictions. When directly ascending or descending a slope, the walking speed is comparable to walking on a road. However, when traversing a slope while off road, the walking speed is comparable to traversing a slope of double the gradient while on a road or path. Similarly, comparing the walking speed predictions of Off Road Light Obstruction and Off Road Heavy Obstruction reveals that just 10 cm of vegetation (our cutoff point for heavy obstruction) can reduce the walking speed by more than 0.5 km/h.

Fig 3. Walking speed predictions under different terrain conditions.


When: (A) travelling directly up or down hills of varying slope (walking slope), (B) traversing across hills of varying slope (hill slope).

Fig 4 shows the same walking speed predictions as Fig 3, alongside the confidence interval for the mean walking speed for each terrain type. In the low-slope regions where most walking occurs, our model fits closely with the mean data confidence intervals. Our model does deviate from the confidence interval in some areas, particularly in high-slope and off-road regions. However, these are also the areas where we have the least amount of data (see Table 1). In Fig 4J the confidence interval for the mean would suggest that it is faster to walk on hill slopes of 30 degrees than hill slopes of 10 degrees. We have less than 30 km of data recorded in heavy obstruction regions on hill slopes of over 20 degrees, and less than 20 km of this had a walking slope magnitude of under 5 degrees (indicating that the slope was being traversed). Further, even within this range, the data is skewed towards the lower hill slope values. This lack of data explains the widening confidence interval and the counter-intuitive observations, and we suggest that a targeted study would be required to collect more data in this region.

Fig 4. Walking speed predictions under different terrain conditions.


When: (A,C,E,G,I) travelling directly up or down hills of varying slope (walking slope), (B,D,F,H,J) traversing across hills of varying slope (hill slope). Also shown in each plot is the 95% confidence interval of the mean value of the walking speed for the terrain type, calculated at 5 degree intervals, using data bins with a width of 10 degrees. Note that the confidence intervals were calculated using only data which is within 5 degrees of directly ascending (A,C,E,G,I) or traversing (B,D,F,H,J) the slope.
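The confidence intervals described in this caption can be sketched as overlapping bins: centres every 5 degrees, each bin 10 degrees wide. The sketch assumes a normal-approximation interval (mean ± 1.96 standard errors) and that points have already been restricted to near-direct ascents or traverses; the caption does not say which interval formula was used. Names are hypothetical.

```python
import math
import statistics
from typing import List, Optional, Sequence, Tuple

Row = Tuple[float, int, Optional[float], Optional[float], Optional[float]]


def binned_mean_ci(slopes_deg: Sequence[float], speeds_kmh: Sequence[float],
                   centres_deg: Sequence[float], half_width_deg: float = 5.0,
                   z: float = 1.96) -> List[Row]:
    """(centre, n, mean, lower, upper) for each bin centre; None where n < 2."""
    rows: List[Row] = []
    for c in centres_deg:
        # Bins are half-open, [c - 5, c + 5), so neighbouring bins overlap by 5 degrees.
        vals = [v for s, v in zip(slopes_deg, speeds_kmh)
                if c - half_width_deg <= s < c + half_width_deg]
        if len(vals) < 2:
            rows.append((c, len(vals), None, None, None))
            continue
        mean = statistics.fmean(vals)
        se = statistics.stdev(vals) / math.sqrt(len(vals))
        rows.append((c, len(vals), mean, mean - z * se, mean + z * se))
    return rows
```

For example, passing centres_deg=range(-40, 45, 5) gives bins centred at every 5 degrees from -40 to 40.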

Fig 5 compares the Paved Road and Off Road Heavy Obstruction speed predictions from our model against the existing functions from Naismith, Tobler and Campbell et al. When looking at the walking slope, the largest deviation between our model and Naismith’s rule occurs when descending a slope, as Naismith’s rule does not predict a reduced speed in this scenario. For both Tobler’s and Campbell et al.’s functions, the shape of the walking slope component is relatively similar to our new model, with the main distinction being the peak predicted speed on flat ground. None of the existing functions account for the hill slope, which leads to large disparities when predicting the walking speed for slope traversals. A further example of this can be seen in S6 File, which shows the walking speeds for a simulated off-road route which encounters the full range of hill and walking slopes.

Fig 5. Comparison of new model and existing hiking functions.


Predicted walking speeds of the new model, Naismith’s rule, Tobler’s function and Campbell et al.’s function when: (A, C, E) travelling directly up or down hills of varying slope (walking slope), (B, D, F) traversing across hills of varying slope (hill slope).

When comparing the performances of each of the models (Table 3), the predicted speeds for individual 50 m sections had a lower RMSE and percentage error, and a higher R-squared value, using our new model than using the existing ones. The R-squared value is still very low; however, we suggest that this is due to the variability within the data. We have previously acknowledged that there are many individual effects which can impact the walking speed, and which we did not attempt to capture in our model. Instead it captures the general trend of the walking speed for an average individual under average conditions, and does this better than existing models (evidenced by the improved RMSE).

Table 3. Comparison of new model against existing methods to calculate walking speeds.

Metric | New model | Naismith | Tobler | Campbell
Average % error | 23.68 | 26.36 | 26.17 | 25.33
MSE ((km/h)²) | 1.20 | 1.61 | 1.53 | 1.58
RMSE (km/h) | 1.10 | 1.27 | 1.24 | 1.26
R² | 0.09 | -0.22 | -0.16 | -0.19

To isolate the impact of each of the slope variables, we filtered the results to look at the data where a slope was being directly climbed or traversed. Figs 6A, 6B, 7A and 7B show the RMSE and mean residuals for each of the models, for data which was within 5 degrees of directly climbing (A) or traversing (B) hills of varying slope. From this we can clearly see that Naismith’s rule consistently overestimates walking speeds when descending a slope, and underestimates speeds when climbing a slope. When ascending or descending a slope, the RMSE of our GLM is similar to that of Tobler’s hiking function. However, one of the main areas where we see an improvement using our model is on slight declines. Tobler’s hiking function suggests that walking speed increases on mild descents up to a maximum of 6 km/h. It is clear from Fig 6A that Tobler’s function overestimates the walking speed in this region. Campbell et al.’s function has a slightly lower RMSE value than our new model on the steepest walking slopes; however, it underestimates the walking speeds on flat ground and mild slopes, the regions where most walking occurs. Improved walking speed predictions in this region therefore have the greatest impact in real-world situations. Within this region our model consistently has a lower RMSE than the existing functions, and a mean residual error close to 0 km/h.

Fig 6. Comparing RMSE values for the new model, Naismith’s rule, Tobler’s function and Campbell et al.’s function.


When: (A) travelling directly up or down hills of varying slope (all data, walking slope), (B) traversing across hills of varying slope (all data, hill slope), (C) travelling directly up or down hills of varying slope (off-road data only, walking slope), (D) traversing across hills of varying slope (off-road data only, hill slope). Campbell et al.’s function does not provide off-road speed estimates, so was not included in the off-road data comparisons.

Fig 7. Comparing mean residual values for the new model, Naismith’s rule, Tobler’s function and Campbell et al.’s function.


When: (A) travelling directly up or down hills of varying slope (all data, walking slope), (B) traversing across hills of varying slope (all data, hill slope), (C) travelling directly up or down hills of varying slope (off-road data only, walking slope), (D) traversing across hills of varying slope (off-road data only, hill slope). Campbell et al.’s function does not provide off-road speed estimates, so was not included in the off-road data comparisons.

We also see an improvement in RMSE when using our model to predict speeds for hill traversals (Fig 6B). We can note from Fig 7B that both Naismith’s rule and Tobler’s hiking function consistently overestimate the walking speed when traversing a slope, as they do not take into account the impact that the hill slope has on reducing walking speeds. The performance of Campbell et al.’s model improves as the hill slope increases, although we suggest this is more due to it underestimating the speed on shallow slopes. We do see that the average error in our model increases as the hill slope increases, but we believe that this is due to limited volumes of data at high hill slopes (∼0.5% of our data occurs on hill slopes steeper than 40 degrees).

As well as looking at the overall performance of our new model, we explored how well it performed in off-road conditions, compared with the off-road adjustments for the existing functions (Naismith’s reduced base speed of 4 km/h, and Tobler’s correction factor of 0.6). Figs 6C, 6D, 7C and 7D show the RMSE and mean residuals for data recorded in off-road conditions only. From Figs 6C and 7C it is clear that Tobler’s function consistently underestimates the walking speed when off-road: the factor of 0.6 is a larger reduction in walking speed than we observe in our data. As we found when looking at our data as a whole, Naismith’s rule underestimates the walking speed when climbing a slope and overestimates it when descending a slope. Our new model does not suffer from these problems, with both a lower RMSE and a lower absolute mean residual across all walking slopes. Both of these existing models also consistently underestimate walking speeds when traversing a slope, unlike our new model, which has a mean residual of less than 0.4 km/h on hill slopes of up to 35 degrees. The error in the predictions of our new model does increase with hill slope, though its RMSE is generally lower than that of the existing models. On the steepest hill slopes our model appears to perform less well than the existing ones, though only 0.2% of our off-road data occurred on hill slopes steeper than 40 degrees.
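
For reference, the two off-road adjustments compared above can be written as speed functions of the walking slope s (rise over horizontal run). This is a sketch under stated assumptions: Tobler's hiking function is taken in its usual form, 6·exp(-3.5·|s + 0.05|) km/h, multiplied by 0.6 off-road; Naismith's rule is assumed to be 1 hour per 5 km of horizontal distance (4 km off-road, per the reduced base speed above) plus 1 hour per 600 m of ascent, with descents treated as flat. The exact form used in the paper (Fig 1) may differ, for example in whether distance is measured horizontally or along the slope.

```python
import math


def tobler_speed_kmh(slope, off_road=False):
    """Tobler's hiking function; slope = rise / horizontal run (dimensionless).
    Peaks at 6 km/h on a gradient of -0.05; off-road speeds are scaled by 0.6."""
    speed = 6.0 * math.exp(-3.5 * abs(slope + 0.05))
    return 0.6 * speed if off_road else speed


def naismith_speed_kmh(slope, off_road=False):
    """Horizontal speed implied by Naismith's rule (assumed form):
    1 h per base_speed km of horizontal distance + 1 h per 600 m of ascent.
    Descents are treated as flat ground, as in the basic rule."""
    base_speed = 4.0 if off_road else 5.0
    ascent_m_per_km = 1000.0 * max(slope, 0.0)  # metres climbed per horizontal km
    hours_per_km = 1.0 / base_speed + ascent_m_per_km / 600.0
    return 1.0 / hours_per_km


for s in (-0.2, -0.05, 0.0, 0.1, 0.2):
    print(f"slope {s:+.2f}: Tobler {tobler_speed_kmh(s):.2f} km/h "
          f"(off-road {tobler_speed_kmh(s, True):.2f}), "
          f"Naismith {naismith_speed_kmh(s):.2f} km/h "
          f"(off-road {naismith_speed_kmh(s, True):.2f})")
```

Under these forms Naismith's rule predicts its flat-ground speed on every descent, consistent with the overestimation on descents reported above, and the 0.6 factor reduces Tobler's flat-ground speed of about 5.04 km/h to about 3.0 km/h off-road.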

Although we have shown an improvement in walking speed predictions over short sections of routes, this did not translate into similar gains when predicting walking times for whole routes. Our model and all of the existing models explored here had an average percentage error of 13.5% to 15.5% when predicting the time taken for a complete route. However, based on the errors seen in Figs 6 and 7, we believe that this is because errors cancel out over the course of a hike. For example, while ascending a hill, Naismith’s rule will underestimate the walking speed (and thus overestimate the walking time), but it will then overestimate the walking speed on the subsequent descent, leading to a relatively accurate total time estimate. The results here suggest that Naismith’s rule, and other existing functions, are still a good rule of thumb for calculating times for whole routes, but their time estimates for individual sections of a route will be less accurate than those of the new model presented here.
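
The cancellation argument can be made concrete with a toy calculation. The distances and speeds below are hypothetical, chosen only to illustrate the mechanism: segment time is distance divided by speed, route time is the sum over segments, and a predictor that is too slow on the ascent and too fast on the descent can still give a whole-route time close to the observed one.

```python
def segment_hours(distance_km, speed_kmh):
    return distance_km / speed_kmh


def percent_error(predicted, observed):
    return 100.0 * (predicted - observed) / observed


# Hypothetical route: 3 km up, then 3 km down (horizontal distances).
# Each tuple: (name, distance_km, observed_speed_kmh, predicted_speed_kmh)
segments = [
    ("ascent", 3.0, 3.0, 2.5),   # predictor too slow uphill: time overestimated
    ("descent", 3.0, 4.0, 5.0),  # predictor too fast downhill: time underestimated
]

total_obs = total_pred = 0.0
for name, dist, v_obs, v_pred in segments:
    t_obs, t_pred = segment_hours(dist, v_obs), segment_hours(dist, v_pred)
    total_obs += t_obs
    total_pred += t_pred
    print(f"{name}: {percent_error(t_pred, t_obs):+.1f}% time error")

print(f"whole route: {percent_error(total_pred, total_obs):+.1f}% time error")
```

Here the two segment times are off by +20% and -20%, yet the whole-route error is under 3%, which is why section-level accuracy is better judged on segments than on total route time.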

Discussion

We have developed a walking speed model that is robust, owing to the large volume of data (88,000 km) used to build it, and that agrees with the data over a wider range of conditions than commonly used formulae. Data from tracks confirm that walking slope, hill slope and terrain type or obstruction are each significant factors in determining walking speeds. The model improves on existing methods for predicting walking speeds (Figs 6 & 7). We have also shown the specific improvement that our new model offers in predicting walking speeds in off-road conditions, compared with the simple off-road speed reductions used by existing models. The existing methods for calculating walking speeds require tuning for use in real-world scenarios, as there are many factors beyond slope and obstruction level that can affect an individual’s walking speed (such as the weather, fitness level or age) [24]. The model presented here requires the same tuning as these existing methods but provides a more accurate population-average walking speed across a wide range of terrain and slope conditions.

Our results confirm that Naismith’s rule (Fig 1) is still a good rule of thumb for estimating the total walking time for a route, especially when the calculation must be done by hand. However, the findings here can be used to complement Naismith’s rule: under Naismith’s rule, the predicted ascent time is likely to be overestimated and the predicted descent time underestimated. It is not uncommon for hikers to contact one another when they reach the summit of a hill and provide an estimated arrival time back at the campsite. Knowing that the descent will likely take longer than Naismith’s rule estimates will lead to more accurate arrival estimates. Similarly, knowing how the hill slope reduces walking speeds, or that just 10 cm of vegetation can reduce walking speeds by up to 0.6 km/h, may well affect route choices made when out on a walk. For example, if a hiker is following a footpath but can see from their map that the path forms a large curve, they can use our findings to decide whether it will be faster to travel off-road and cut the corner. On flat terrain with heavy levels of obstruction, our model suggests that such a shortcut will be faster if the distance covered on the path is more than 15% longer than the off-road distance. Speed is not the only factor in this decision, as safety and navigability are also important, but these results can help people make more informed choices when on a hike.
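
The corner-cutting comparison reduces to comparing two travel times. The shortcut is faster whenever d_off / v_off < d_path / v_path, that is, when the ratio of path distance to off-road distance exceeds v_path / v_off. A 15% break-even therefore corresponds to a path speed about 1.15 times the off-road speed. The speeds in the sketch (4.6 and 4.0 km/h) are hypothetical values chosen to reproduce that ratio; they are not outputs of the model.

```python
def break_even_ratio(v_path_kmh, v_off_kmh):
    """Path/off-road distance ratio above which the off-road shortcut is faster."""
    return v_path_kmh / v_off_kmh


def shortcut_is_faster(d_path_km, d_off_km, v_path_kmh, v_off_kmh):
    """Compare travel times (distance / speed) for the two options."""
    return d_off_km / v_off_kmh < d_path_km / v_path_kmh


v_path, v_off = 4.6, 4.0  # hypothetical flat-ground speeds, km/h
print(f"break-even: path {100 * (break_even_ratio(v_path, v_off) - 1):.0f}% longer")
print(shortcut_is_faster(1.10, 1.0, v_path, v_off))  # False: path only 10% longer
print(shortcut_is_faster(1.20, 1.0, v_path, v_off))  # True: path 20% longer
```

The same comparison applies to any pair of route options, provided each speed is evaluated for the terrain, walking slope and hill slope along that option.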

The benefit of using crowdsourced GPS data to build our model is also a limitation of the approach, as we had no control over data collection. As a result, our models could not account for potential biases in the data, such as group size, ability and composition, or for other variables that may affect walking speed, such as weather conditions (although we would expect the volume of data to cause most of these effects to average out).

Unlike previous work [8], we did not use fixed thresholds to identify breaks and non-walking (or non-hiking) tracks. Instead, we developed filters based on the attributes of known walking data (see S2 File). Although the filtering methods were chosen blind to the outcome of the model generation, the choice of filters will inevitably have affected the dataset and the resulting model, and no ground truth was available against which to test our assumptions.

Our method of calculating the terrain obstruction value was relatively crude, looking only at the obstruction height at each GPS point. While this did prove to be successful, and we observed a clear difference in walking speeds between areas of light and heavy obstruction (see S5 File), the inaccuracies present within GPS data may have led to some erroneous obstruction measurements, for example in a field sparsely populated with trees. In future, efforts should be made to refine this approach, such as considering the average obstruction level over a wider area around each point.
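
One way to implement the suggested refinement is to replace the single-pixel obstruction height with a focal mean over a circular neighbourhood of a gridded obstruction-height raster. The sketch below is hypothetical: it assumes a NumPy array of obstruction heights in metres, a GPS point already converted to a row/column index, and a radius given in pixels. It shows how a point falling in a gap beside a tree reads as unobstructed on its own pixel but not over its neighbourhood.

```python
import numpy as np


def focal_mean(raster, row, col, radius_px):
    """Mean raster value within radius_px pixels of (row, col), clipped at edges."""
    n_rows, n_cols = raster.shape
    r0, r1 = max(0, row - radius_px), min(n_rows, row + radius_px + 1)
    c0, c1 = max(0, col - radius_px), min(n_cols, col + radius_px + 1)
    yy, xx = np.ogrid[r0:r1, c0:c1]
    in_circle = (yy - row) ** 2 + (xx - col) ** 2 <= radius_px ** 2
    return float(raster[r0:r1, c0:c1][in_circle].mean())


# Hypothetical 5 x 5 obstruction-height raster (metres) with one 10 m tree.
heights = np.zeros((5, 5))
heights[2, 2] = 10.0

point_value = heights[2, 3]                  # GPS point lands beside the tree
neighbourhood = focal_mean(heights, 2, 3, 1)
print(point_value, neighbourhood)
```

The radius trades sensitivity to GPS error against spatial detail; a radius comparable to typical GPS positional error would be a natural starting point.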

A further limitation arose when we classified points as being on paved roads, on unpaved roads or off-road. A combination of GPS drift and map error introduces significant positional uncertainty, so we had to use a search radius around each data point to identify potential roads. We suspect that this led us to overclassify points as being on roads. While our model appears to be robust to this overclassification (owing to the volume of correctly classified data used), it left us with fewer off-road data points from which to predict off-road travel speeds.
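
A search-radius classification of this kind can be sketched as a nearest-segment test. This is a hypothetical illustration rather than the authors' implementation: it assumes planar projected coordinates in metres (for example a national grid), road geometries given as polylines, and it assigns each point the class of the nearest road within the radius, or off-road otherwise.

```python
import math


def point_segment_distance(p, a, b):
    """Shortest distance from point p to segment a-b (planar coordinates, metres)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:  # degenerate segment: a == b
        return math.hypot(px - ax, py - ay)
    # Projection of p onto the segment, clamped to its end points.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def classify_point(p, roads, radius_m):
    """Return the class of the nearest road within radius_m, else 'off-road'.

    roads: list of (road_class, [(x, y), ...]) polylines.
    """
    best_class, best_dist = "off-road", radius_m
    for road_class, line in roads:
        for a, b in zip(line, line[1:]):
            d = point_segment_distance(p, a, b)
            if d <= best_dist:
                best_class, best_dist = road_class, d
    return best_class


roads = [("paved", [(0.0, 0.0), (100.0, 0.0)]),
         ("unpaved", [(0.0, 40.0), (100.0, 40.0)])]
point = (50.0, 12.0)  # 12 m from the paved road, 28 m from the track
for radius in (10.0, 15.0, 30.0):
    print(radius, classify_point(point, roads, radius))
```

Widening the radius absorbs more GPS drift and map error but also moves genuinely off-road points onto roads: in this toy example the point switches from off-road to paved when the radius grows from 10 m to 15 m.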

Furthermore, the use of crowdsourced data meant that all of our data came, by definition, from ‘walkable’ regions. When including the terrain obstruction variable, we were unable to determine whether there are levels of terrain obstruction that make walking impossible. Similarly, the vast majority of the data were collected on shallow hill and walking slopes, leading to a sparsity of data in steeper areas. While this means that we can be very confident about our walking speed predictions in less steep regions (where most walking occurs), it is unclear whether the lack of data in steeper regions arises because steep slopes are relatively rare, or because they cannot easily be navigated and hikers choose alternative paths. As described above, we had to make a number of assumptions regarding data filtering and processing, including model selection, and other choices may give different results. To support anyone who wants to challenge or test these assumptions, or try different models, we have made all our code available on GitHub. Further, all of the data sources used are detailed in S1 File, and the filters and assumptions we used to clean the data are fully detailed in S2 File.

Conclusion

Widely used algorithms for estimating walking/hiking speed (e.g. Naismith’s rule) are simple to understand and easy to calculate, but are based on limited observations. Here we curated a dataset of almost 88,000 km of walking and hiking data. We found that the existing algorithms perform quite well against the dataset, but they tend to overestimate ascent time and underestimate descent time, and most ignore terrain obstruction and hill slope, both of which we found to be significant factors. We used the data to derive a new model that takes these variables into account. We demonstrated that the model provides more accurate walking speeds than the existing methods across most of the conditions tested, and particularly in off-road regions. By providing improved walking speed predictions in these off-road regions, we have enabled more accurate calculations of the fastest route to or from any given location, which could save minutes in an emergency situation where every second is important.
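
As an illustration of how walking speed predictions feed into fastest-route calculations, the sketch below runs Dijkstra's algorithm over a directed graph whose edge cost is travel time (distance divided by predicted speed). Edges are directed because predicted speeds differ uphill and downhill. The graph and speeds are hypothetical; in practice each speed would come from a walking speed model evaluated for that edge's terrain and slopes.

```python
import heapq
import math


def fastest_route(edges, start, goal):
    """Dijkstra's algorithm on a directed graph whose edge cost is travel time.

    edges: iterable of (from_node, to_node, distance_km, speed_kmh).
    Returns (hours, [start, ..., goal]), or (math.inf, []) if goal is unreachable.
    """
    graph = {}
    for u, v, dist_km, speed_kmh in edges:
        graph.setdefault(u, []).append((v, dist_km / speed_kmh))

    best = {start: 0.0}
    previous = {}
    queue = [(0.0, start)]
    while queue:
        hours, node = heapq.heappop(queue)
        if node == goal:
            break
        if hours > best.get(node, math.inf):
            continue  # stale queue entry
        for nxt, edge_hours in graph.get(node, []):
            candidate = hours + edge_hours
            if candidate < best.get(nxt, math.inf):
                best[nxt] = candidate
                previous[nxt] = node
                heapq.heappush(queue, (candidate, nxt))

    if goal not in best:
        return math.inf, []
    route = [goal]
    while route[-1] != start:
        route.append(previous[route[-1]])
    return best[goal], route[::-1]


edges = [
    ("S", "P", 1.0, 4.6),  # along the footpath (hypothetical speed)
    ("P", "E", 0.5, 4.6),
    ("S", "E", 1.2, 4.0),  # off-road shortcut (hypothetical speed)
]
hours, route = fastest_route(edges, "S", "E")
print(f"{hours * 60:.0f} min via {' -> '.join(route)}")
```

With these made-up values the off-road shortcut wins; making the off-road edge longer or slower flips the result back to the footpath.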

Supporting information

S1 File. Data sources.

(PDF)

S2 File. Data acquisition and preparation.

(PDF)

S3 File. Exploratory data modelling study.

(PDF)

S4 File. Exploring the differences between Scotland and the rest of the UK.

(PDF)

S5 File. Exploring the impact of terrain obstruction.

(PDF)

S6 File. Comparison of walking speed changes while crossing a simulated off-road terrain region.

(PDF)

Acknowledgments

Preprocessing of the GPX files made use of the resources provided by the Edinburgh Compute and Data Facility (ECDF) [34].

Data Availability

Data cannot be shared publicly because they were accessed under the terms of the Ordnance Survey Educational User Licence, and cannot be made directly available to third parties. The data underlying the results presented in the study are available from the sources listed in S1 File. URLs to the original Hikr and OpenStreetMap data are provided in S1 File. The elevation and Lidar data used throughout were accessed through Digimap (link in S1 File). The specific data sources and resolutions are listed. For each data source, all available UK data at the time of the study were requested (the date on which each dataset was accessed is also provided in S1 File). The study can be replicated by others if the listed data are downloaded, and detailed steps to replicate the data processing are provided in S2 File. Further, the original code used to process the data is available on GitHub (link at the end of the Methods section). The Hikr and OpenStreetMap data are publicly available. The data obtained through Digimap were accessed under an Educational User Licence. This is not fully public, but is available freely to all educational users; this covers: "All activities that a fair-minded and reasonable person would agree falls within the spirit and intention of ‘Educational Use’. Educational use at all levels – including schools, colleges, universities and research councils, (whether on site or remotely); Educational Use within an Elective Home Education." (full licence here: https://digimap.edina.ac.uk/help/copyright-and-licensing/os_eula/).

Funding Statement

Author Andrew Wood was funded by the UK Engineering and Physical Sciences Research Council (grant EP/R513209/1; https://www.ukri.org/councils/epsrc/). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

Decision Letter 0

Yuxia Wang

24 Jul 2023

PONE-D-23-19311

Improved prediction of hiking speeds using a data driven approach

PLOS ONE

Dear Dr. Wood,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Sep 07 2023 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Yuxia Wang

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at 

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and 

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. Thank you for stating the following in the Acknowledgments Section of your manuscript: 

   "This work was funded by the UK Engineering and Physical Sciences Research Council (grant EP/R513209/1) and the University of Edinburgh. It was supported in data acquisition by Ordnance Survey and Digimap.Preprocessing of the GPX files made use of the resources provided by the Edinburgh Compute and Data Facility (ECDF)"

We note that you have provided funding information that is not currently declared in your Funding Statement. However, funding information should not appear in the Acknowledgments section or other areas of your manuscript. We will only publish funding information present in the Funding Statement section of the online submission form. 

Please remove any funding-related text from the manuscript and let us know how you would like to update your Funding Statement. Currently, your Funding Statement reads as follows: 

   "Author Andrew Wood funded by the UK Engineering and Physical Sciences Research Council (grant EP/R513209/1), https://www.ukri.org/councils/epsrc/

The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript."

Please include your amended statements within your cover letter; we will change the online submission form on your behalf.

3. In your Data Availability statement, you have not specified where the minimal data set underlying the results described in your manuscript can be found. PLOS defines a study's minimal data set as the underlying data used to reach the conclusions drawn in the manuscript and any additional data required to replicate the reported study findings in their entirety. All PLOS journals require that the minimal data set be made fully available. For more information about our data policy, please see http://journals.plos.org/plosone/s/data-availability.

"Upon re-submitting your revised manuscript, please upload your study’s minimal underlying data set as either Supporting Information files or to a stable, public repository and include the relevant URLs, DOIs, or accession numbers within your revised cover letter. For a list of acceptable repositories, please see http://journals.plos.org/plosone/s/data-availability#loc-recommended-repositories. Any potentially identifying patient information must be fully anonymized.

Important: If there are ethical or legal restrictions to sharing your data publicly, please explain these restrictions in detail. Please see our guidelines for more information on what we consider unacceptable restrictions to publicly sharing data: http://journals.plos.org/plosone/s/data-availability#loc-unacceptable-data-access-restrictions. Note that it is not acceptable for the authors to be the sole named individuals responsible for ensuring data access.

We will update your Data Availability statement to reflect the information you provide in your cover letter.

4. We note that Figure 2 in your submission contains map/satellite images which may be copyrighted. All PLOS content is published under the Creative Commons Attribution License (CC BY 4.0), which means that the manuscript, images, and Supporting Information files will be freely available online, and any third party is permitted to access, download, copy, distribute, and use these materials in any way, even commercially, with proper attribution. For these reasons, we cannot publish previously copyrighted maps or satellite images created using proprietary data, such as Google software (Google Maps, Street View, and Earth). For more information, see our copyright guidelines: http://journals.plos.org/plosone/s/licenses-and-copyright.

We require you to either (a) present written permission from the copyright holder to publish these figures specifically under the CC BY 4.0 license, or (b) remove the figures from your submission:

a. You may seek permission from the original copyright holder of Figure 2 to publish the content specifically under the CC BY 4.0 license.  

We recommend that you contact the original copyright holder with the Content Permission Form (http://journals.plos.org/plosone/s/file?id=7c09/content-permission-form.pdf) and the following text:

“I request permission for the open-access journal PLOS ONE to publish XXX under the Creative Commons Attribution License (CCAL) CC BY 4.0 (http://creativecommons.org/licenses/by/4.0/). Please be aware that this license allows unrestricted use and distribution, even commercially, by third parties. Please reply and provide explicit written permission to publish XXX under a CC BY license and complete the attached form.”

Please upload the completed Content Permission Form or other proof of granted permissions as an "Other" file with your submission.

In the figure caption of the copyrighted figure, please include the following text: “Reprinted from [ref] under a CC BY license, with permission from [name of publisher], original copyright [original copyright year].”

b. If you are unable to obtain permission from the original copyright holder to publish these figures under the CC BY 4.0 license or if the copyright holder’s requirements are incompatible with the CC BY 4.0 license, please either i) remove the figure or ii) supply a replacement figure that complies with the CC BY 4.0 license. Please check copyright information on all replacement figures and update the figure caption with source information. If applicable, please specify in the figure caption text when a figure is similar but not identical to the original image and is therefore for illustrative purposes only.

The following resources for replacing copyrighted map figures may be helpful:

USGS National Map Viewer (public domain): http://viewer.nationalmap.gov/viewer/

The Gateway to Astronaut Photography of Earth (public domain): http://eol.jsc.nasa.gov/sseop/clickmap/

Maps at the CIA (public domain): https://www.cia.gov/library/publications/the-world-factbook/index.html and https://www.cia.gov/library/publications/cia-maps-publications/index.html

NASA Earth Observatory (public domain): http://earthobservatory.nasa.gov/

Landsat: http://landsat.visibleearth.nasa.gov/

USGS EROS (Earth Resources Observatory and Science (EROS) Center) (public domain): http://eros.usgs.gov/#

Natural Earth (public domain): http://www.naturalearthdata.com/

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Partly

Reviewer #2: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: This research centers on the prediction of hiking speed using a novel generalized linear model. By integrating public GPS data, this model has the ability to forecast walking speed by considering the gradient of the terrain (hill slope) and the degree of terrain obstruction. Despite the announcement of substantial improvement compared to various established models, there are several revisions that need to be addressed first.

1. The methods of the paper need further clarification. First, detailed descriptions of the data and data processing should be included in the main body of the paper, rather than in the Supporting Information, as this is a paper focusing on a data-driven approach. This leads to the structure of the paper being fragmented and some terminologies sound weird and confusing. For instance, what is meant by "datapoint" mentioned in line 152?

2. Additionally, more details regarding the Generalised Linear Model, which forms the foundation of the research, should be provided. Questions that need to be addressed include: What are the inputs? How are these variables organized? What is the relationship between variables a, b, c, d and the results presented in Table 1? How does the critical gradient affect the performance of the model? Are the speed and characteristics aggregated at a point level or in line string segments?

3. The introduction section of the document is excessively lengthy and would benefit from a reorganization of its content. I suggest incorporating some of the introductory material about the hiking speed model into the methods section. Furthermore, the introductory material concerning the hiking speed model should be summarized (in introduction section) and present the formulations instead of listing various models and qualitatively describe these models (in method section). This will facilitate the comprehension of the paper.

4. Considering that the model utilizes multiple variables and employs GLM for regression, the improvement achieved by the model is not as significant, which could potentially impact its practical application. While the model is expected to outperform rule-based models that do not fit the empirical data in previous studies, I have reservations about its applicability in real-life scenarios. The author should provide more details about the results and discuss the implications of these performance changes on hiking activities.

5. In terms of the conclusion, the research should interpret the social implications of the findings derived from the data. The importance of hiking speed estimations and the contributions of the paper should be emphasized in both introduction and conclusion.

Reviewer #2: Summary

In this paper, Wood et al. propose a new set of equations for predicting travel rates as a function of landscape conditions. Whereas several functions exist for the prediction of rates driven by walking slope (slope in the direction of pedestrian travel), few take into consideration the hill slope (slope in the direction of the terrain’s steepest descent) and/or the presence of terrain obstructions above the ground surface. Wood et al. incorporate both of these two characteristics, finding significant effects of each and providing novel quantitative insight that could be valuable for more robust travel rate predictions, particularly in off-road/off-trail environments. The paper is well presented, and the topic is of wide interdisciplinary interest – well-suited for PLOS ONE. I have a few major and minor concerns that I believe should be addressed prior to publication, but I do believe the work will eventually be a valuable contribution to the existing literature.

Major Comments

- It seems terrain obstructions could be handled in a more elegant way. Simply using height in a binary fashion as the basis of determining off-road impedance seems like a missed opportunity. Height is definitely one important consideration (it’s easier to step over short vegetation than tall vegetation), but density is another arguably more important one. One can easily walk through tall but sparse vegetation just as one can easily walk through short but dense vegetation. Some focal measure of density (e.g., the number of lidar-derived pixels above a certain height threshold within a given neighborhood) would be worth examining.
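
The reviewer's suggested density measure can be sketched as a moving-window fraction of lidar-derived pixels above a height threshold. This is a hypothetical illustration, not part of the published method: the 0.5 m threshold, the square window and the use of a summed-area table (integral image) are choices made here for the example. Pixels beyond the raster edge are excluded from the denominator, so edge windows are averaged over valid pixels only.

```python
import numpy as np


def window_sums(values, half_window):
    """Sum of `values` over a (2*half_window+1)^2 square window centred on each
    pixel, treating pixels beyond the raster edge as zero (summed-area table)."""
    k = 2 * half_window + 1
    padded = np.pad(values, half_window, mode="constant", constant_values=0)
    table = np.zeros((padded.shape[0] + 1, padded.shape[1] + 1))
    table[1:, 1:] = padded.cumsum(axis=0).cumsum(axis=1)
    return table[k:, k:] - table[:-k, k:] - table[k:, :-k] + table[:-k, :-k]


def focal_density(heights_m, threshold_m=0.5, half_window=1):
    """Fraction of pixels taller than threshold_m in each pixel's neighbourhood."""
    above = (heights_m > threshold_m).astype(float)
    valid = np.ones_like(above)
    return window_sums(above, half_window) / window_sums(valid, half_window)


# Hypothetical 3 x 3 lidar height grid (metres): one 2 m shrub in short grass.
heights = np.array([[0.1, 0.1, 0.1],
                    [0.1, 2.0, 0.1],
                    [0.1, 0.1, 0.1]])
print(focal_density(heights))
```

Unlike a single height value, this measure distinguishes a lone tall shrub from a continuous thicket of the same height, and it could be used alongside a height-based measure rather than replacing it.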

- I would like to see comparisons to more existing travel rate functions. Several papers have already demonstrated superiority over Tobler and Naismith. To ensure that this work truly represents a valuable contribution to the literature requires comparisons to more contemporary algorithms. Although it’s not necessarily the most important statistical measure of model performance in the context of GLMs, an R-squared value of 0.09 does not leave the reader with a high degree of confidence in your new model.

- Related to the point above, you point out the travel rates are largely just a means to getting at a more useful measure in travel time. You provide several examples about the importance of estimating travel time. Can you demonstrate how your new algorithm provides an opportunity for the accurate estimation of travel time? I think the results would look a lot better than your travel rate estimates, since predicting time over a longer hike should have smaller margins of error than instantaneous travel rates derived from erroneous GNSS data.

- I think a lot of valuable information that could be included in the main manuscript is placed in the supplementary materials. For example, the main manuscript does not have any depiction of the final function forms (line plots of speed vs. slope, e.g., S4 Fig 3), which seems like an important omission. Also S3 Fig 1 provides really useful insight into the complexity of trying to predict travel rates, given the extreme variability in the data.

Minor Comments

L4: Extremely minor point but I’m not sure I would consider it “standard practice”, per se. It’s certainly *good* practice.

L2-8: You begin the paragraph stating that travel rates are important in “many situations” and then only proceed to give one example. You might consider adding more to frame the importance of your study.

L14-15: I appreciate the split into individual and external factors, but to say that the effects of slope, for example, “will be consistent across all individuals” isn’t true. Slope, and other landscape conditions, affects people very differently.

L2-23: The first three paragraphs have not a single reference to existing literature. I’ll grant you that a lot of the discussion up to this point is based on intuition/experience, but there are several statements that would carry more weight with references to existing studies.

L33-34: I don’t understand this statement. Are you suggesting that Naismith’s rule does not account for slope? That’s *precisely* what Naismith’s rule does… Please clarify/rephrase.

L67: very different situations than what?

L84 and throughout: You might consider the more universal term of global navigation satellite systems (GNSS) rather than the US-specific Global Positioning System (GPS)

L94 and elsewhere: It seems you should define “terrain obstruction” explicitly. I assume this means the presence of vegetation, primarily? But I could also imagine a cliff being considered a terrain obstruction.

Figure 2 caption: should read “can *be* identified”

L145-151: So speeds are based solely on the distance between sequential GNSS positions and the timing of those same positions? Wouldn’t this provide an underestimate of speeds? I’m picturing a winding, zig-zagging trail… If a GNSS position is only recorded every, say, 30 seconds, then the resulting “track” may appear to move straight through the zigzags, giving an underestimate of the distance actually traveled.
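
The reviewer's concern can be illustrated with a synthetic zig-zag track. The sketch below uses hypothetical coordinates in metres on a projected grid with one fix every 10 s; it computes average speed as the summed straight-line distance between consecutive fixes divided by elapsed time, then repeats the calculation after keeping only every second fix. Because dropping intermediate fixes can only shorten the measured path (triangle inequality), the sparser track underestimates the speed.

```python
import math


def path_length_m(points):
    """Sum of straight-line distances between consecutive (x, y) fixes."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))


def mean_speed_kmh(points, times_s):
    return path_length_m(points) / (times_s[-1] - times_s[0]) * 3.6


# Synthetic zig-zag: 10 m forward and 10 m sideways between fixes.
track = [(10.0 * i, 10.0 * (i % 2)) for i in range(9)]
times = [10.0 * i for i in range(9)]

full = mean_speed_kmh(track, times)
sparse = mean_speed_kmh(track[::2], times[::2])  # keep every second fix
print(f"every fix: {full:.2f} km/h, every second fix: {sparse:.2f} km/h")
```

In this example the sparser track drops from about 113 m to 80 m of measured distance over the same 80 s, lowering the apparent speed by roughly 30%.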

L174: more important than the version of RStudio (simply the IDE) is the version of R

S2

In the Break Finding section, about 2/3 through the second paragraph, you say “…paused recording for a break – example here”. Is this an error?

**********

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2023 Dec 18;18(12):e0295848. doi: 10.1371/journal.pone.0295848.r002

Author response to Decision Letter 0


7 Sep 2023

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

• The paper was prepared on Overleaf using the PLOS LaTeX template (https://www.overleaf.com/latex/templates/latex-template-for-plos-public-library-of-science-articles/wdmgcwzgvhnn), so we believe it meets the requirements. Please let us know if further changes are required.

2. Thank you for stating the following in the Acknowledgments Section of your manuscript:

"This work was funded by the UK Engineering and Physical Sciences Research Council (grant EP/R513209/1) and the University of Edinburgh. It was supported in data acquisition by Ordnance Survey and Digimap. Preprocessing of the GPX files made use of the resources provided by the Edinburgh Compute and Data Facility (ECDF)."

We note that you have provided funding information that is not currently declared in your Funding Statement. However, funding information should not appear in the Acknowledgments section or other areas of your manuscript. We will only publish funding information present in the Funding Statement section of the online submission form.

Please remove any funding-related text from the manuscript and let us know how you would like to update your Funding Statement. Currently, your Funding Statement reads as follows:

"Author Andrew Wood funded by the UK Engineering and Physical Sciences Research Council (grant EP/R513209/1), https://www.ukri.org/councils/epsrc/

The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript."

Please include your amended statements within your cover letter; we will change the online submission form on your behalf.

• We have removed the funding source from the acknowledgements section. The current funding statement is correct.

3. In your Data Availability statement, you have not specified where the minimal data set underlying the results described in your manuscript can be found. PLOS defines a study's minimal data set as the underlying data used to reach the conclusions drawn in the manuscript and any additional data required to replicate the reported study findings in their entirety. All PLOS journals require that the minimal data set be made fully available. For more information about our data policy, please see http://journals.plos.org/plosone/s/data-availability.

"Upon re-submitting your revised manuscript, please upload your study’s minimal underlying data set as either Supporting Information files or to a stable, public repository and include the relevant URLs, DOIs, or accession numbers within your revised cover letter. For a list of acceptable repositories, please see http://journals.plos.org/plosone/s/data-availability#loc-recommended-repositories. Any potentially identifying patient information must be fully anonymized.

Important: If there are ethical or legal restrictions to sharing your data publicly, please explain these restrictions in detail. Please see our guidelines for more information on what we consider unacceptable restrictions to publicly sharing data: http://journals.plos.org/plosone/s/data-availability#loc-unacceptable-data-access-restrictions. Note that it is not acceptable for the authors to be the sole named individuals responsible for ensuring data access.

We will update your Data Availability statement to reflect the information you provide in your cover letter.

• It is not possible to share the data directly, as it is under copyright. Data sources and a detailed methodology to extract the data from original sources to reproduce the work have been provided in the Supplementary Information. This has been confirmed by the reviewers in their response to Question 3.

4. We note that Figure 2 in your submission contain map/satellite images which may be copyrighted. All PLOS content is published under the Creative Commons Attribution License (CC BY 4.0), which means that the manuscript, images, and Supporting Information files will be freely available online, and any third party is permitted to access, download, copy, distribute, and use these materials in any way, even commercially, with proper attribution. For these reasons, we cannot publish previously copyrighted maps or satellite images created using proprietary data, such as Google software (Google Maps, Street View, and Earth). For more information, see our copyright guidelines: http://journals.plos.org/plosone/s/licenses-and-copyright.

We require you to either (a) present written permission from the copyright holder to publish these figures specifically under the CC BY 4.0 license, or (b) remove the figures from your submission:

• Figure 2 has been changed to use imagery from OpenStreetMap, which is available under the Open Data Commons Open Database License.

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: This research centers on the prediction of hiking speed using a novel generalized linear model. By integrating public GPS data, this model has the ability to forecast walking speed by considering the gradient of the terrain (hill slope) and the degree of terrain obstruction. Despite the announcement of substantial improvement compared to various established models, there are several revisions that need to be addressed first.

1. The methods of the paper need further clarification. First, detailed descriptions of the data and data processing should be included in the main body of the paper, rather than in the Supporting Information, as this is a paper focusing on a data-driven approach. As it stands, the structure of the paper is fragmented, and some terminology sounds weird and confusing. For instance, what is meant by "datapoint" mentioned in line 152?

• Algorithms explaining the data processing steps have been added to the main body of the paper (pages 6-7). Additional minor changes have been made to the Methods to provide more explanation.

2. Additionally, more details regarding the Generalised Linear Model, which forms the foundation of the research, should be provided. Questions that need to be addressed include: What are the inputs? How are these variables organized? What is the relationship between variables a, b, c, d and the results presented in Table 1? How does the critical gradient affect the performance of the model? Are the speed and characteristics aggregated at a point level or in line string segments?

• Table 1 has been changed to better reflect the GLM. The critical gradient does not affect the performance of the model, other than that the model was selected from those which put the critical gradient in the correct region. This has been added to the main body of the paper (Lines 179-183).

3. The introduction section of the document is excessively lengthy and would benefit from a reorganization of its content. I suggest incorporating some of the introductory material about the hiking speed model into the methods section. Furthermore, the introductory material concerning the hiking speed model should be summarized (in introduction section) and present the formulations instead of listing various models and qualitatively describe these models (in method section). This will facilitate the comprehension of the paper.

• The introduction has been reduced in length, removing extraneous information. Information regarding the critical gradient and data has been moved to the Materials and Methods section.

4. Considering that the model utilizes multiple variables and employs GLM for regression, the improvement achieved by the model is not as significant, which could potentially impact its practical application. While the model is expected to outperform rule-based models that do not fit the empirical data in previous studies, I have reservations about its applicability in real-life scenarios. The author should provide more details about the results and discuss the implications of these performance changes on hiking activities.

• Due to the crowdsourced nature of our dataset, we do not have a single route which has been recorded by multiple individuals. This makes time comparison on a real route difficult, due to the variance within walking speeds. We could cherry-pick a route for which our model outperforms the existing ones, but we have no way of knowing whether that user was walking at the true population average speed, so we feel that this would not further readers’ understanding.

• We have added a discussion on travel time over routes as a whole (where all models perform equally), and suggested that this is due to errors cancelling out (Lines 310-321, 333-339). We believe that the main strength of the new model is shown when predicting the walking speed and time for individual sections of a route.

• A simulation of a hike section has been added as S6 Supporting Information, which shows how the different models predict walking speeds while crossing a simulated hill. Walking time estimates for this simulated route are not given, as there is no way to know which of the models comes closest to predicting the true walking time.

• Hypothetical scenarios in which the new model provides an improvement over current knowledge, and can affect decisions made while walking, have been added to the discussion (Lines 341-348).
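The explanation above, that route-level errors cancel out, can be illustrated with a toy calculation. The section times below are invented for illustration and are not taken from the study; they show how a model that is too slow on ascents and too fast on descents can have a sizeable per-section error while its total route time is almost exactly right.

```python
import math

# Hypothetical (true_minutes, predicted_minutes) for four route sections.
# These numbers are invented for illustration, not taken from the study.
sections = [
    (30.0, 36.0),  # steep ascent: model predicts too long
    (20.0, 15.0),  # steep descent: model predicts too short
    (25.0, 27.0),  # gentle ascent
    (15.0, 13.0),  # gentle descent
]

errors = [pred - true for true, pred in sections]
section_rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
total_true = sum(true for true, _ in sections)
total_pred = sum(pred for _, pred in sections)

# Per-section RMSE is about 4 min, yet the route total is off by only 1 min.
print(f"section RMSE: {section_rmse:.2f} min")
print(f"route total: true {total_true:.0f} min, predicted {total_pred:.0f} min")
```

This is why section-level comparisons, rather than whole-route totals, are the more sensitive test of a walking-speed model.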

5. In terms of the conclusion, the research should interpret the social implications of the findings derived from the data. The importance of hiking speed estimations and the contributions of the paper should be emphasized in both introduction and conclusion.

• More examples of when the hiking time is important have been added to the introduction (Lines 8-13). The discussion has also been expanded with hypothetical scenarios where the new model provides an improvement over current knowledge (Lines 341-348).

Reviewer #2: Summary

In this paper, Wood et al. propose a new set of equations for predicting travel rates as a function of landscape conditions. Whereas several functions exist for the prediction of rates driven by walking slope (slope in the direction of pedestrian travel), few take into consideration the hill slope (slope in the direction of the terrain’s steepest descent) and/or the presence of terrain obstructions above the ground surface. Wood et al. incorporate both of these two characteristics, finding significant effects of each and providing novel quantitative insight that could be valuable for more robust travel rate predictions, particularly in off-road/off-trail environments. The paper is well presented, and the topic is of wide interdisciplinary interest – well-suited for PLOS ONE. I have a few major and minor concerns that I believe should be addressed prior to publication, but I do believe the work will eventually be a valuable contribution to the existing literature.

Major Comments

- It seems terrain obstructions could be handled in a more elegant way. Simply using height in a binary fashion as the basis of determining off-road impedance seems like a missed opportunity. Height is definitely one important consideration (it’s easier to step over short vegetation than tall vegetation), but density is another arguably more important one. One can easily walk through tall but sparse vegetation just as one can easily walk through short but dense vegetation. Some focal measure of density (e.g., the number of lidar-derived pixels above a certain height threshold within a given neighborhood) would be worth examining.

• Terrain obstruction had not been explored for its impact on walking speeds prior to this work. During exploration we found that the simple metric of obstruction height is very significant at predicting the walking speed. We acknowledge that this is a potential limitation and avenue for further research and have added this to the discussion (Lines 361-367).
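Reviewer #2's suggested alternative, a focal density of obstruction, could be computed from a lidar-derived canopy height model along the following lines. This is a minimal sketch, not the method used in the paper (which uses obstruction height): it assumes the canopy heights are already available as a NumPy array in metres, and the 0.5 m threshold, the 3 x 3 window and the function name `focal_density` are hypothetical choices.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def focal_density(chm: np.ndarray, height_threshold: float, window: int) -> np.ndarray:
    """Fraction of cells in each window x window neighbourhood taller than
    height_threshold.

    chm: 2-D canopy height model (metres above ground), e.g. from lidar.
    Cells beyond the raster edge are treated as unobstructed (height 0).
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    above = (chm > height_threshold).astype(float)  # 1 where obstructed
    pad = window // 2
    padded = np.pad(above, pad, mode="constant", constant_values=0.0)
    # One window x window view centred on every original cell.
    windows = sliding_window_view(padded, (window, window))
    return windows.mean(axis=(-2, -1))


# Hypothetical 4 x 4 raster with a 2 m tall patch in the top-left corner.
chm = np.zeros((4, 4))
chm[0:2, 0:2] = 2.0
density = focal_density(chm, height_threshold=0.5, window=3)
print(np.round(density, 3))
```

A density layer of this kind could sit alongside, or replace, the height-based obstruction classes; whether it improves predictions is the open question the authors list as further work.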

- I would like to see comparisons to more existing travel rate functions. Several papers have already demonstrated superiority over Tobler and Naismith. To ensure that this work truly represents a valuable contribution to the literature requires comparisons to more contemporary algorithms. Although it’s not necessarily the most important statistical measure of model performance in the context of GLMs, an R-squared value of 0.09 does not leave the reader with a high degree of confidence in your new model.

• Other methods have not gained widespread use, likely due to the very small sample sizes on which they were based. A comparison with the most up-to-date work by Campbell et al., which also uses crowdsourced data, has been added throughout (Figs 1, 4, 5 and 6 and corresponding text).

- Related to the point above, you point out the travel rates are largely just a means to getting at a more useful measure in travel time. You provide several examples about the importance of estimating travel time. Can you demonstrate how your new algorithm provides an opportunity for the accurate estimation of travel time? I think the results would look a lot better than your travel rate estimates, since predicting time over a longer hike should have smaller margins of error than instantaneous travel rates derived from erroneous GNSS data.

• Due to the crowdsourced nature of our dataset, we do not have a single route which has been recorded by multiple individuals. This makes time comparison on a real route difficult, due to the variance within walking speeds. We could cherry-pick a route for which our model outperforms the existing ones, but we have no way of knowing whether that user was walking at the true population average speed, so we feel that this would not further readers’ understanding.

• We have added a discussion on travel time over routes as a whole (where all models perform equally), and suggested that this is due to errors cancelling out (Lines 310-321, 333-339). We believe that the main strength of the new model is shown when predicting the walking speed and time for individual sections of a route.

• A simulation of a hike section has been added as S6 Supporting Information, which shows how the different models predict walking speeds while crossing a simulated hill. Walking time estimates for this simulated route are not given, as there is no way to know which of the models comes closest to predicting the true walking time.

• Hypothetical scenarios in which the new model provides an improvement over current knowledge, and can affect decisions made while walking, have been added to the discussion (Lines 341-348).

- I think a lot of valuable information that could be included in the main manuscript is placed in the supplementary materials. For example, the main manuscript does not have any depiction of the final function forms (line plots of speed vs. slope, e.g., S4 Fig 3), which seems like an important omission. Also S3 Fig 1 provides really useful insight into the complexity of trying to predict travel rates, given the extreme variability in the data.

• Depictions of the final function forms have been added to the main body of the paper (Fig 3), as have Algorithms detailing the break-finding and data filtering processes (pages 6-7).

Minor Comments

L4: Extremely minor point but I’m not sure I would consider it “standard practice”, per se. It’s certainly *good* practice.

• Changed to good practice and added citation (Line 4)

L2-8: You begin the paragraph stating that travel rates are important in “many situations” and then only proceed to give one example. You might consider adding more to frame the importance of your study.

• Further examples have been added (Lines 8-13)

L14-15: I appreciate the split into individual and external factors, but to say that the effects of slope, for example, “will be consistent across all individuals” isn’t true. Slope, and other landscape conditions, affects people very differently.

• We were trying to suggest that the variable will be consistent (i.e. the slope will always be that steep), rather than the effect of the variable. This has been reworded to make this clearer (Lines 19-20).

L2-23: The first three paragraphs have not a single reference to existing literature. I’ll grant you that a lot of the discussion up to this point is based on intuition/experience, but there are several statements that would carry more weight with references to existing studies.

• References have been added to back up some of the statements (citations 1 & 2, lines 6,16)

L33-34: I don’t understand this statement. Are you suggesting that Naismith’s rule does not account for slope? That’s *precisely* what Naismith’s rule does… Please clarify/rephrase.

• Naismith's rule accounts for slope when walking uphill, but not when walking downhill. This has been rephrased to make it clearer (Lines 38-40).
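The asymmetry described in this response can be seen by coding Naismith's rule next to Tobler's hiking function, the other established rule named in the reviews. The sketch below uses the commonly quoted metric form of Naismith's rule (1 hour per 5 km plus 1 hour per 600 m of ascent) and Tobler's function v = 6 exp(-3.5 |s + 0.05|) km/h, with s the walking slope as rise over run. Neither is the paper's GLM, and refinements such as Tobler's off-path factor are left out; the function names are illustrative.

```python
import math


def naismith_hours(horizontal_km: float, climb_m: float) -> float:
    """Naismith's rule, common metric form: 1 h per 5 km of horizontal distance
    plus 1 h per 600 m of ascent. A descent (climb_m < 0) adds no time."""
    return horizontal_km / 5.0 + max(climb_m, 0.0) / 600.0


def tobler_hours(horizontal_km: float, climb_m: float) -> float:
    """Tobler's hiking function v = 6 exp(-3.5 |s + 0.05|) km/h, applied to a
    section of constant walking slope s = rise / run."""
    slope = (climb_m / 1000.0) / horizontal_km
    speed_kmh = 6.0 * math.exp(-3.5 * abs(slope + 0.05))
    return horizontal_km / speed_kmh


# 1 km sections: flat, 20% ascent, 20% descent.
for climb in (0.0, 200.0, -200.0):
    print(f"climb {climb:+5.0f} m: Naismith {naismith_hours(1.0, climb):.3f} h, "
          f"Tobler {tobler_hours(1.0, climb):.3f} h")
```

With these inputs Naismith's rule gives the same time for a 20% descent as for flat ground, whereas Tobler's function is fastest on a slight downhill (s = -0.05) and slows again on steeper descents.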

L67: very different situations than what?

• This has been changed to emphasise that the participants were running and not walking (Line 60)

L84 and throughout: You might consider the more universal term of global navigation satellite systems (GNSS) rather than the US-specific Global Positioning System (GPS)

• GNSS has been added (line 77), with a note that GPS is the term more frequently used by the general public.

• We chose to use GPS throughout the rest of the body of the paper, as it is the more widespread term, and felt that using it makes the paper more accessible to a wider audience.

L94 and elsewhere: It seems you should define “terrain obstruction” explicitly. I assume this means the presence of vegetation, primarily? But I could also imagine a cliff being considered a terrain obstruction.

• A definition of terrain obstruction has been included (Line 116)

Figure 2 caption: should read “can *be* identified”

• Thank you, this has been changed

L145-151: So speeds are based solely on time between sequential GNSS positions and the timing of those same positions? Wouldn’t this provide an underestimate of speeds? I’m picturing a windy, zig-zagging trail… If a GNSS position is only recorded every, say, 30 seconds, then the resulting “track” may appear to move straight through the zigzags, giving an underestimate of the distance actually traveled.

• In practice, the data positions are generally recorded every 3-5 seconds as the device detects movement, so this is not an issue.
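The reviewer's concern and the authors' answer can both be checked with a small simulation. The sketch below assumes planar (x, y) coordinates in metres (real GPX tracks store latitude and longitude, which would need a geodesic distance such as the haversine formula), a hypothetical zig-zag path and a constant walking speed of 1.2 m/s. It estimates speed as described in the comment: straight-line distance between sequential fixes divided by elapsed time. The helper names are illustrative only.

```python
import math


def path_length(points):
    """Sum of straight-line distances between consecutive (x, y) points."""
    return sum(math.hypot(x1 - x0, y1 - y0)
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def interpolate(corners, s):
    """Position reached after walking distance s along the polyline corners."""
    for (x0, y0), (x1, y1) in zip(corners, corners[1:]):
        leg = math.hypot(x1 - x0, y1 - y0)
        if s <= leg:
            f = s / leg
            return (x0 + f * (x1 - x0), y0 + f * (y1 - y0))
        s -= leg
    return corners[-1]


def estimated_speed(corners, walking_speed, interval_s):
    """Mean speed recovered from fixes logged every interval_s seconds by a
    walker moving at walking_speed (m/s) along the true path."""
    total_time = path_length(corners) / walking_speed
    n_intervals = int(total_time // interval_s)
    times = [i * interval_s for i in range(n_intervals + 1)]
    fixes = [interpolate(corners, walking_speed * t) for t in times]
    # Distance between sequential fixes, divided by elapsed time.
    return path_length(fixes) / times[-1]


# Hypothetical switchback: 10 legs, each 30 m across and 30 m forward in plan.
corners = [(0.0 if i % 2 == 0 else 30.0, 30.0 * i) for i in range(11)]
true_speed = 1.2  # m/s
est4 = estimated_speed(corners, true_speed, 4)
est30 = estimated_speed(corners, true_speed, 30)
print(f"true {true_speed:.3f} m/s; 4 s fixes {est4:.3f}; 30 s fixes {est30:.3f}")
```

With fixes every 4 s only the corners are cut, so the recovered speed stays within a few percent of 1.2 m/s; with fixes every 30 s whole parts of each zig-zag are skipped and the speed is underestimated by well over 10%.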

L174: more important than the version of RStudio (simply the IDE) is the version of R

• Thank you, this has been changed (Line 188)

S2

In the Break Finding section, about 2/3 through the second paragraph, you say “…paused recording for a break – example here”. Is this an error?

• Thank you, this has been changed

Attachment

Submitted filename: Response to Reviewers.docx

Decision Letter 1

Yuxia Wang

9 Oct 2023

PONE-D-23-19311R1

Improved prediction of hiking speeds using a data driven approach

PLOS ONE

Dear Dr. Wood,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Nov 23 2023 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Yuxia Wang

Academic Editor

PLOS ONE

Additional Editor Comments:

Dear Authors,

We received two reviews of your manuscript. While Reviewer #2 recommended acceptance, Reviewer #1 pointed out some significant concerns which need to be addressed. Therefore, I would like to suggest a major revision. During the revision, please pay attention to the suggestions and comments of Reviewer #1. Please note that your revised version will be further assessed by external reviewers.


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: (No Response)

Reviewer #2: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: (No Response)

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The revised version of the paper has undergone substantial improvements in terms of its writing, logic, and clarity. However, there remain some significant concerns that, in my opinion, should be addressed before publication.

Statistical Significance of Model Improvement: One critical concern, previously highlighted in the last round of review comments, pertains to the practical significance of the model's performance. Given that the model incorporates multiple variables and employs Generalized Linear Models (GLM) for regression, the reported improvement achieved by the model appears to be modest. This issue has also been noted by other reviewers, particularly concerning the low R-squared value of 0.09. The authors should not overlook this concern and should provide a more thorough discussion of the model's practical utility and limitations.

Detailed Presentation of Data-Driven Approach: As the paper focuses on a data-driven approach to modeling walking speed, it would greatly enhance the understanding of the methodology if the authors provided more quantitative characteristics of the original data. For instance, Figure 3 illustrates walking speed predictions under various terrain conditions. It is essential to include statistical characteristics of the actual speed data (e.g., average, standard deviation, or the 95% confidence interval) when individuals are traversing different types of terrain, derived from real datasets. Additionally, incorporating statistical characteristics of the ground truth data in Figure 4 would strengthen the paper's credibility and the evaluation of the model's performance.

Transparency in Exclusion of Data Points: It would be beneficial for readers to know the number and percentage of data points or sections that were excluded due to the filtering process. This transparency will provide a clearer picture of the data selection and processing steps.

Additional Minor Comments:

Introduction References: In the introduction section, consider providing more references when discussing the classification of factors that can impact walking speed into two groups. This will enhance the depth of the literature review and provide a stronger foundation for your work.

Model Performance Evaluation: Highlighting the fact that the model's performance evaluations are conducted at the section level should be done earlier in the paper to prevent any misunderstanding. Additionally, consider presenting statistical characteristics of the sections, such as their length and the distribution of slopes, to provide a comprehensive assessment.

Clarify Terminology: Distinguish between "hill slope" and "walking slope" throughout the paper to avoid confusion. For instance, in the caption for Figure 3, specify the meaning of "slope" in Figures 3A and 3B to enhance clarity for readers.
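The distinction requested here can be stated precisely: hill slope is the magnitude of the terrain gradient (the slope along the steepest descent), while walking slope is the component of that gradient along the direction of travel. The sketch below assumes the gradient is available as east and north components (for example from a digital elevation model) and a heading measured clockwise from north; these conventions and the function name are illustrative, not taken from the paper.

```python
import math


def hill_and_walking_slope(dz_dx: float, dz_dy: float, heading_deg: float):
    """Return (hill_slope, walking_slope) as rise / run ratios.

    dz_dx, dz_dy: terrain gradient towards east (x) and north (y).
    heading_deg:  direction of travel in degrees clockwise from north.
    """
    # Hill slope: magnitude of the gradient, i.e. slope along steepest descent.
    hill_slope = math.hypot(dz_dx, dz_dy)
    # Walking slope: directional derivative along the unit travel vector.
    theta = math.radians(heading_deg)
    east, north = math.sin(theta), math.cos(theta)
    walking_slope = dz_dx * east + dz_dy * north
    return hill_slope, walking_slope


# Hypothetical hillside rising to the north with a 30% gradient.
for heading in (0.0, 90.0, 180.0):
    hill, walk = hill_and_walking_slope(0.0, 0.3, heading)
    print(f"heading {heading:5.1f} deg: hill slope {hill:.2f}, walking slope {walk:+.2f}")
```

Walking east across ground that rises 30% to the north gives a walking slope of zero but a hill slope of 0.3: the contouring case in which a rule based only on walking slope sees flat ground.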

Overall, while the revisions have improved the paper's quality, addressing these concerns and making the suggested enhancements will further enhance the paper's scientific rigor and comprehensibility.

Reviewer #2: I thank the authors for addressing my comments thoroughly and am happy to recommend that this revised version be accepted for publication. It will be a valuable contribution to the literature.

**********

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

**********



PLoS One. 2023 Dec 18;18(12):e0295848. doi: 10.1371/journal.pone.0295848.r004

Author response to Decision Letter 1


22 Nov 2023

We would like to thank the reviewers for their additional comments and appreciate the improvements that these have made to the paper.

Statistical Significance of Model Improvement: One critical concern, previously highlighted in the last round of review comments, pertains to the practical significance of the model's performance. Given that the model incorporates multiple variables and employs Generalized Linear Models (GLM) for regression, the reported improvement achieved by the model appears to be modest. This issue has also been noted by other reviewers, particularly concerning the low R-squared value of 0.09. The authors should not overlook this concern and should provide a more thorough discussion of the model's practical utility and limitations.

The low R-squared indicates that there remains variability in the data that is not captured in the model. We have acknowledged this from the beginning by indicating that we know of many variables which will affect the speed (such as group ability), for which there are no data available to include in the model. This has been made more explicit at the end of the introduction [lines 99-101]. Further acknowledgement of this has also been added to the results [lines 288-292], and the following has been added to the discussion [Lines 356-361]: “The existing methods to calculate walking speeds require tuning for use in real-world scenarios, as there are many factors which can affect an individual's walking speed beyond the slope and obstruction level (such as the weather, fitness level or age). The model presented here requires the same tuning as these existing methods but provides a more accurate population average walking speed across a wide range of terrain and slope conditions.”
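The argument that a low R-squared reflects unobserved individual variation, rather than a poor estimate of the population average, can be illustrated with simulated data. In the sketch below every number is hypothetical: a "true" population mean speed falls linearly with absolute slope, each observation adds large individual noise standing in for fitness, group ability or weather, and ordinary least squares (a Gaussian GLM with identity link) is used as a simple stand-in for the paper's model, whose exact specification is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 20_000

# Hypothetical data-generating process (not the paper's fitted model).
slope = rng.uniform(-0.3, 0.3, n)         # walking slope, rise / run
mean_speed = 4.5 - 5.0 * np.abs(slope)    # population mean speed, km/h
noise = rng.normal(0.0, 1.4, n)           # unobserved individual effects
speed = mean_speed + noise

# Fit speed ~ 1 + |slope| by ordinary least squares.
X = np.column_stack([np.ones(n), np.abs(slope)])
coef, *_ = np.linalg.lstsq(X, speed, rcond=None)


def r_squared(observed: np.ndarray, predicted: np.ndarray) -> float:
    ss_res = np.sum((observed - predicted) ** 2)
    ss_tot = np.sum((observed - observed.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


r2_fitted = r_squared(speed, X @ coef)
r2_true = r_squared(speed, mean_speed)  # even the exact population mean
print("estimated coefficients:", np.round(coef, 2))
print(f"R^2 fitted model: {r2_fitted:.3f}, R^2 true mean: {r2_true:.3f}")
```

Both R-squared values come out low, yet the fitted coefficients land close to the values used to generate the data: a model can recover the population average well while explaining only a small share of individual variation.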

Detailed Presentation of Data-Driven Approach: As the paper focuses on a data-driven approach to modeling walking speed, it would greatly enhance the understanding of the methodology if the authors provided more quantitative characteristics of the original data. For instance, Figure 3 illustrates walking speed predictions under various terrain conditions. It is essential to include statistical characteristics of the actual speed data (e.g., average, standard deviation, or the 95% confidence interval) when individuals are traversing different types of terrain, derived from real datasets. Additionally, incorporating statistical characteristics of the ground truth data in Figure 4 would strengthen the paper's credibility and the evaluation of the model's performance.

A new table (Table 1) and figure (Figure 4) have been added which illustrate the breakdown of the tracks by slope angle, and the confidence intervals for the mean walking speed. These demonstrate how closely our model fits the average walking speed, particularly in the slope ranges where most walking occurs (where we have the most data), and thus where accuracy is most important. We have also added an explanation for the deviation from the confidence interval at high slopes [lines 260-272].
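As an illustration of the kind of per-band summary requested by the reviewer (mean, standard deviation and 95% confidence interval of the observed speed), a minimal sketch follows. It is not the authors' published processing code: the function name, the half-open band edges and the normal-approximation interval (1.96 × SD / √n) are all assumptions; for bands with few sections a t-distribution quantile would be more appropriate.

```python
import numpy as np


def speed_summary_by_band(slope_deg, speed_kmh, edges):
    """Per slope band: (lo, hi, n, mean, sd, ci_low, ci_high) of observed speed.

    Bands are half-open [edges[i], edges[i+1]); bands with fewer than two
    sections are skipped because a sample SD cannot be computed.
    """
    slope_deg = np.asarray(slope_deg, dtype=float)
    speed_kmh = np.asarray(speed_kmh, dtype=float)
    rows = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        v = speed_kmh[(slope_deg >= lo) & (slope_deg < hi)]
        if v.size < 2:
            continue
        mean = v.mean()
        sd = v.std(ddof=1)                   # sample standard deviation
        half = 1.96 * sd / np.sqrt(v.size)   # normal-approximation 95% half-width
        rows.append((lo, hi, v.size, mean, sd, mean - half, mean + half))
    return rows


# Toy usage with made-up section values (not study data):
for row in speed_summary_by_band([0, 1, 2, 3, 12, 14],
                                 [5.0, 4.0, 5.0, 4.0, 3.0, 2.0],
                                 edges=[0, 10, 20]):
    print(row)
```

Comparing each model's predicted speed against the `ci_low`/`ci_high` columns band by band is one way to read a table or figure of this kind.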

Transparency in Exclusion of Data Points: It would be beneficial for readers to know the number and percentage of data points or sections that were excluded due to the filtering process. This transparency will provide a clearer picture of the data selection and processing steps.

Responding to this point is difficult because “data points” were not formally defined or counted until after the initial filtering and merging. There is therefore no equivalent count of data points in the unfiltered data that would support a direct comparison. However, we support the need for transparency, and therefore all code used to filter the data is included for checking. The initial number of GPS tracks (~20,000) has been included both in Supplementary Information 2 and in the results. This reduced to ~7600 after processing (also mentioned in the results). Most of the removed data came from non-walking tracks (mostly identifiable from the velocity), but a detailed breakdown of exclusions by filtering reason was not recorded. The main details of the filtering are in Supplementary Information 2, but the results section has been changed to make the need for significant filtering clearer [lines 217-220].

Additional Minor Comments:

Introduction References: In the introduction section, consider providing more references when discussing the classification of factors that can impact walking speed into two groups. This will enhance the depth of the literature review and provide a stronger foundation for your work.

Further references have been added, which discuss the factors which can affect walking speeds. [Line 15]

Model Performance Evaluation: Highlighting the fact that the model's performance evaluations are conducted at the section level should be done earlier in the paper to prevent any misunderstanding. Additionally, consider presenting statistical characteristics of the sections, such as their length and the distribution of slopes, to provide a comprehensive assessment.

A line has been added to the start of the results giving the minimum and average point distance [lines 222/223]. Table 1 has been added, detailing the slope distributions of the data.

Clarify Terminology: Distinguish between "hill slope" and "walking slope" throughout the paper to avoid confusion. For instance, in the caption for Figure 3, specify the meaning of "slope" in Figures 3A and 3B to enhance clarity for readers.

Figure captions have been adjusted as requested.

Attachment

Submitted filename: ResubmissionResponseToReviewers.docx

Decision Letter 2

Yuxia Wang

1 Dec 2023

Improved prediction of hiking speeds using a data driven approach

PONE-D-23-19311R2

Dear Dr. Wood,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Yuxia Wang

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: I thank the authors for addressing my comments thoroughly and am happy to recommend that this revised version be accepted for publication.

**********

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

**********

Acceptance letter

Yuxia Wang

8 Dec 2023

PONE-D-23-19311R2

Improved prediction of hiking speeds using a data driven approach

Dear Dr. Wood:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at customercare@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Yuxia Wang

Academic Editor

PLOS ONE

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 File. Data sources.

    (PDF)

    S2 File. Data acquisition and preparation.

    (PDF)

    S3 File. Exploratory data modelling study.

    (PDF)

    S4 File. Exploring the differences between Scotland and the rest of the UK.

    (PDF)

    S5 File. Exploring the impact of terrain obstruction.

    (PDF)

    S6 File. Comparison of walking speed changes while crossing a simulated off-road terrain region.

    (PDF)

    Attachment

    Submitted filename: Response to Reviewers.docx

    Attachment

    Submitted filename: ResubmissionResponseToReviewers.docx

    Data Availability Statement

    Data cannot be shared publicly because it was accessed under the terms of the Ordnance Survey Educational User Licence, and cannot be made directly available to third parties. The data underlying the results presented in the study are available from the sources listed in S1 File. URLs to the original Hikr and OpenStreetMap data are provided in S1 File. The elevation and Lidar data used throughout were accessed through Digimap (link in S1 File). The specific data sources and resolutions are listed. For each data source, all available UK data at the time of the study was requested (the dates each dataset was accessed are also provided in S1 File). The study can be replicated by others if the listed data is downloaded, and detailed steps to replicate the data processing are provided in S2 File. Further, the original code used to process the data is available on GitHub (link at the end of the Methods section). The Hikr and OpenStreetMap data are publicly available. The data obtained through Digimap was accessed under an Educational User Licence. This is not fully public, but is available freely to all educational users; this covers: "All activities that a fair-minded and reasonable person would agree falls within the spirit and intention of ‘Educational Use’. Educational use at all levels – including schools, colleges, universities and research councils, (whether on site or remotely); Educational Use within an Elective Home Education." (full licence here: https://digimap.edina.ac.uk/help/copyright-and-licensing/os_eula/).
