PLOS One. 2022 Aug 3;17(8):e0271670. doi: 10.1371/journal.pone.0271670

Are there lane advantages in track and field?

David Munro 1,*
Editor: Roy Cerqueti 2
PMCID: PMC9348673  PMID: 35921267

Abstract

Shorter distance events in track and field are replete with folk tales about which lane assignments on the track are advantageous. Estimating the causal effect of lane assignments on race times is a difficult task as lane assignments are typically non-random. To estimate these effects I exploit a random assignment rule for the first round of races in short distance events. Using twenty years of data from the IAAF world athletic championships and U20 world championships, I find no evidence of lane advantages in the 100m. Contrary to popular belief, the data suggest that outside lanes in the 200m and 400m produce faster race times. In the 800m, which is unique in having a lane break, there is some weak evidence that outside lanes produce slower race times, possibly reflecting the advantage of inside lanes having an established position on the track at the lane break. Given that these results do not support common convictions on lane advantages, they also serve as an interesting case study on false beliefs.

Introduction

In shorter distance track and field events one frequently encounters tales about lane advantages, that is, which lane assignments on the track produce the fastest event times. This track and field folklore is often heard from coaches and teammates, but it is also codified in competition rules and appears in the popular press (e.g. [1, 2]). These beliefs are held at the highest levels of the sport. Following his bronze medal win in the 100m at the 2020 Olympics, Andre De Grasse noted: “I knew it was going to be a tough one after I drew lane nine. I didn’t have a great semifinal and I knew I had to come out and try and execute as best as I can” [3]. In the context of this paper, readers may find it interesting to note that De Grasse’s race in lane nine was his personal record.

Common narratives claim that in races with corners, running in the outside lanes (typically 7 through 8 or 9) is disadvantageous as the runner cannot see any of their competitors, and that the very inside lanes (typically 1 and 2) are also undesirable as they have the tightest corners. Therefore, the middle lanes of the track are deemed the most desirable. In support of the belief that inside corners are slower, researchers examining the biomechanics of running find evidence that tighter corners do in fact slow runners down. Tighter corners both reduce running speeds (e.g. [4, 5]) and lower foot force production [6]. However, no such empirical evidence exists to support the claim that one’s inability to see competitors when running in an outside lane creates a disadvantage. While beliefs about lane advantages are commonly connected to races with corners, the De Grasse quote above highlights that these beliefs persist in events run on straightaways. There is also no empirical work examining the existence of lane advantages in straightaway races. While assessing the impact of seeing competitors, and its ultimate effect on race times, is clearly a relevant question in the context of track and field performance, it also relates more broadly to questions about the performance effects of motivational or psychological factors. For example, there is some evidence [7] that sports teams losing games at halftime end up winning the game more frequently (though [8] finds opposing evidence), that professional golfers respond differently to the possibility of losses [9], and, in the non-sports world, that incentives framed as losses (as opposed to gains) can increase worker productivity [10]. This paper contributes to this literature by analyzing these motivational or psychological effects in a track and field context.

Estimating the causal impact of these lane advantages is a difficult empirical task as lane assignments are typically non-random and instead are a function of seed times or race times in prior elimination rounds of the event. Lanes deemed as advantageous in the folklore are assigned to runners with superior seed times or qualifying times. This endogenous assignment to treatment (lanes) prevents a causal interpretation of any differences observed in race times by lane.

Besides the biomechanical evidence, other researchers have approached this question in various ways. Using a mathematical model of track geometry, [11] exploit different track designs and find that tighter corners (smaller radii) produce slower times. [12] examines the effects of lane assignments on world records, but does not control for endogenous assignment to lanes. [6] examines the effects of lane assignments on placings in races, and only finds statistical differences in events with endogenous assignment to lanes.

To overcome the issue of endogenous assignment to lanes and obtain a causal estimate of the effect of lane assignments on race times, I exploit a random assignment rule used in the first round of major track meets. Using twenty years of data from the IAAF, I estimate these effects in the 100m, 200m, 400m, and 800m. Beginning with the 100m, the data suggests that lane assignments have no effect on race times. This null effect is precisely measured and, given the level of statistical power in the analysis, if true lane advantages do exist in the 100m, the precise null results suggest they must be quite small. In the 200m, there is robust evidence that outside lanes on the track produce the fastest race times. Further, average race times appear to be roughly monotonically decreasing with lane number. This is consistent with the evidence on the biomechanics of running, but inconsistent with the view that outside lanes are undesirable. The common belief in the track and field folklore is that the middle lanes (often 3–6) are the most desirable. Thus, these estimates suggest that these beliefs are incorrect. To give a brief sense of magnitude, depending on the estimation strategy, I find that lane 8 produces, on average, race times which are between 0.084 and 0.178 seconds faster than lane 2. While this is small in absolute magnitude, it could amount to important differences in race results given the standard deviation of race times in the data is less than 1 second.

Consistent with the results in the 200m, outside lanes in the 400m also appear to produce faster race times on average, though these results are somewhat weaker statistically than those from the 200m. It is important to emphasize that as average race times increase so does the dispersion of race times, so statistical power becomes more of an issue in the 400m and 800m. In an alternative estimation approach which pools runners to increase statistical power, there is stronger statistical evidence that outside lanes in the 400m are faster on average, and the magnitude of the effects is similar to that found in the 200m. Finally, in the 800m, there is some mixed evidence that outside lanes produce slower race times. The 800m is unique in this collection of events in its use of a lane break. Thus, the result that outside lanes produce slower times is consistent with the notion that inside lanes are advantageous in the 800m as those runners have an established position on the inside of the track at the lane break. Results from all these events are generally consistent across various statistical models.

These baseline results reflect the net impact of lane assignments where both the tighter corners (biomechanical effects) and the motivational or psychological effects related to seeing competitors could be simultaneously impacting race times. The evidence from the 200m highlights that if the motivational/psychological effects do slow runners down in outside lanes, they are dominated by the tight corners effect. Leveraging the fact that being in the outermost occupied lane on the track (where runners see no other runners for some portion of the race in the 200m and 400m) is not perfectly correlated with lane number, I also estimate the marginal effect of being in the outermost lane. In the 200m, there is evidence that, all else equal, being in the outermost lane slows runners down. This suggests that race times may be influenced to some degree by motivational or psychological factors related to seeing competitors.

I end the paper with some discussion of the implications of these results for race rules, and address why common beliefs about lane advantages are not supported by the data. Finally, I also highlight other events or competitions in which this approach to estimating the effects of lane assignments could be implemented.

Data and empirical strategy

The data come from IAAF World Championships and U20 World Championships from 2000 to 2019 and were accessed from [13]. Prior to 2000, World Championship data did not include reaction times for the 100m through 400m, and data on season’s bests and personal bests, which are important regressors below, become very sparse. As a result, I focus on post-2000 data. Data were collected for Men’s and Women’s 100m, 200m, 400m, and 800m. In aggregate, this amounts to roughly 8000 individual race times for these events over this time period. A replication package for the analyses conducted in this paper can be found here: [14]

Causal inference framework

Estimating the causal effect of lane assignments involves asking how a runner’s performance would have changed had they run their race in a different lane. Denote the observed race time of runner $i$ in heat $j$ as $Y_{i,j}$, and the potential race time of that runner in lane $k$ as $Y_{k,i,j}$. There are typically nine lanes on the track, and so possibly nine treatment statuses, but as a simple illustrative example, suppose we are interested in measuring the causal effect of running in lane 8. One lane must be chosen as a reference point to compare all other lanes against, and as is discussed below, I choose lane 2 for this. Denote $T_{8,i,j}$ as a binary indicator variable for assignment to lane 8: $T_{8,i,j} \in \{0, 1\}$. The observed race time can be written in terms of potential outcomes:

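$$
Y_{i,j} =
\begin{cases}
Y_{8,i,j} & \text{if } T_{8,i,j} = 1 \\
Y_{2,i,j} & \text{if } \sum_{1 \le k \le 9,\, k \ne 2} T_{k,i,j} = 0
\end{cases}
\tag{1}
$$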

Where the last condition is when all other indicator variables are zero, and the runner is in lane 2. In the lane 8 example,

$$
Y_{i,j} = Y_{2,i,j} + (Y_{8,i,j} - Y_{2,i,j})\, T_{8,i,j} \tag{2}
$$

$(Y_{8,i,j} - Y_{2,i,j})$ is the causal effect of running in lane 8 (relative to lane 2). The fundamental empirical challenge in estimating the true causal effect of lane assignments on race times is that, for most races, the assignment to lanes is conditional on a runner’s ability. To understand this issue in this example, the difference in average race times between lanes 2 and 8 can be written as:

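$$
\begin{aligned}
\underbrace{E[Y_{i,j} \mid T_{8,i,j}=1] - E\Big[Y_{i,j} \,\Big|\, \textstyle\sum_{1 \le k \le 9,\, k \ne 2} T_{k,i,j}=0\Big]}_{\text{Observed difference in average race times}}
&= \underbrace{E[Y_{8,i,j} \mid T_{8,i,j}=1] - E[Y_{2,i,j} \mid T_{8,i,j}=1]}_{\text{Avg. treatment effect on the treated}} \\
&\quad + \underbrace{E[Y_{2,i,j} \mid T_{8,i,j}=1] - E\Big[Y_{2,i,j} \,\Big|\, \textstyle\sum_{1 \le k \le 9,\, k \ne 2} T_{k,i,j}=0\Big]}_{\text{Selection bias}}
\end{aligned}
\tag{3}
$$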

The selection bias term captures the difference in average $Y_{2,i,j}$ between those who were assigned to lanes 8 and 2. In races where assignment to lanes is not random, the selection bias term will not be equal to zero (e.g. runners assigned to lanes conditional on ability). Random assignment of lanes eliminates the selection bias term in Eq (3). With random assignment, $E[Y_{2,i,j} \mid T_{8,i,j}=1] = E[Y_{2,i,j} \mid \sum_{1 \le k \le 9,\, k \ne 2} T_{k,i,j}=0]$ (independence of runner ability and treatment status), so the selection bias term in Eq (3) is zero. For more discussion on this see [15].
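As an illustration that is not part of the original analysis, the decomposition in Eq (3) can be checked with a small simulation. The sketch below assumes a hypothetical true lane 8 effect of -0.10 seconds and hypothetical 200m-like ability levels; the function name `simulate` and all numbers are assumptions chosen only to make the point visible.

```python
import random
import statistics


def simulate(random_lanes, n=20000, lane_effect=-0.10, seed=0):
    """Observed lane 8 minus lane 2 difference in mean race times.

    All numbers are illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    lane2, lane8 = [], []
    for _ in range(n):
        ability = rng.gauss(21.0, 0.5)       # hypothetical Y_2: time if run in lane 2
        if random_lanes:
            in_lane8 = rng.random() < 0.5    # lane drawn by lot
        else:
            in_lane8 = ability > 21.0        # slower runners sorted into lane 8
        if in_lane8:
            lane8.append(ability + lane_effect)  # Y_8 = Y_2 + causal effect
        else:
            lane2.append(ability)
    return statistics.fmean(lane8) - statistics.fmean(lane2)


diff_random = simulate(random_lanes=True)
diff_seeded = simulate(random_lanes=False)
print(f"random assignment: {diff_random:+.3f}")
print(f"seeded assignment: {diff_seeded:+.3f}")
```

Under the lottery, the observed difference should sit close to the assumed effect. Under ability-based sorting, the selection bias term dominates and lane 8 looks slower even though its assumed causal effect is to make runners faster.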

To quantify the observed difference of race times across lanes I estimate the following statistical model:

$$
Y_{i,j} = \alpha_0 + \sum_{1 \le k \le 9,\, k \ne 2} \beta_k T_{k,i,j} + \alpha_f X_{f,i,j} + \epsilon_{i,j} \tag{4}
$$

Where $Y_{i,j}$ denotes the observed race time of runner $i$ in heat $j$, $T_{k,i,j}$ denotes indicator variables for each lane (excluding lane 2, the reference lane), and $X_{f,i,j}$ denotes a collection of control variables. Simply estimating Eq (4) on all race data would likely lead to biased estimates of treatment effects as lane assignments are typically non-random. To overcome this fundamental issue and to estimate the causal effects of lane assignments on race times, I leverage the random assignment rule implemented by the IAAF in the first round of each event. This random assignment is important from a causal inference perspective as the runners in each lane will (on average) have the same characteristics, and thus any differences in race times can be attributed to lane assignments. This is the independence criterion highlighted above. Specifically, [16] states: “In the first round and any additional preliminary qualification round as per Rule 166.1, the lane order shall be drawn by lot.” From personal correspondence with rules officials at the IAAF I have confirmed that this random assignment rule was initiated in the 1985–86 rulebook under rule 141.11 and is still in place today. In recent years, the Men’s 100m in the World Championship also included Preliminary Round heats, which occurred prior to Round 1 heats. This preliminary round is for “unqualified” athletes. According to the rules, random assignment to lanes is supposed to occur in both the Preliminary Round and Round 1 heats. However, from examining the data, and corresponding with rules officials, it appears that the fastest race times from the Preliminary Round heats were sorted into the outside lanes for the Round 1 heats, resulting in a non-random assignment in Round 1 of these events. As such, for the events where there is a Preliminary Round prior to Round 1, I exclude the Round 1 data from the analysis.

In practice, one could explore the average differences in race times across lanes by comparing lane means directly (e.g. with t-tests). However, runner ability varies quite a bit in the first heats and, as a result, there is substantial variation in race times. This makes detecting any statistical differences challenging. The statistical model in Eq (4) is useful as it includes various control variables which explain much of this variation and help to sharpen the estimates of lane effects.

The following control variables ($X_{f,i,j}$) are included in the regression: the recorded wind measurement in each heat (included only in 100m and 200m events), where positive (negative) measurements denote tailwinds (headwinds); the runner’s reaction time to the start gun (included in the 100m, 200m, and 400m); the runner’s season’s best race time; the runner’s personal best race time; and, when data for Men’s and Women’s races are pooled, a dummy variable indicating male events. An additional desirable feature of including season’s best is that it controls for any year effects (e.g. sprinters getting faster over time). The coefficients of interest are those on the lane dummy variables, which estimate the causal effect of lane assignments on race times. The additional covariates are useful in explaining much of the variation in race times, which helps to sharpen the estimates of lane effects. Another approach to estimate lane effects would be to exploit within-sprinter variation (i.e. observing the same sprinter in multiple lane assignments). However, the vast majority of runners (70–80%) appear in the data only once, which severely hampers such an approach. The inclusion of personal best in regression Eq (4) plays this role to some degree for athletes who are in the data more than once, but only if personal best does not change between observations of the same athlete.
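To make the structure of Eq (4) concrete, the following is a minimal sketch, not the code from the replication package [14]. It builds the lane indicators with lane 2 as the omitted reference and fits ordinary least squares on synthetic, noise-free data. The helper names (`lane_dummies`, `solve`, `ols`), the two controls used, and every coefficient value are assumptions for illustration. It does not compute the robust standard errors reported in the tables.

```python
def lane_dummies(lane, reference=2, lanes=range(1, 10)):
    """Indicators T_k for every lane except the reference lane."""
    return [1.0 if lane == k else 0.0 for k in lanes if k != reference]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def ols(rows, y):
    """Least-squares coefficients from the normal equations X'X b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


# Synthetic, noise-free runners: every number below is made up for illustration.
true_lane_effect = {1: 0.02, 2: 0.0, 3: -0.01, 4: -0.02, 5: -0.03,
                    6: -0.04, 7: -0.045, 8: -0.05, 9: -0.06}
rows, y = [], []
for i in range(90):
    lane = i % 9 + 1
    wind = ((3 * i) % 5 - 2) * 0.5        # tailwind (+) or headwind (-)
    season_best = 10.0 + 0.1 * (i // 9)   # seconds
    rows.append([1.0] + lane_dummies(lane) + [wind, season_best])
    y.append(1.0 + true_lane_effect[lane] - 0.05 * wind + 0.9 * season_best)

names = ["const"] + [f"lane{k}" for k in range(1, 10) if k != 2] + ["wind", "sb"]
estimates = dict(zip(names, ols(rows, y)))
for name in ("lane8", "wind", "sb"):
    print(f"{name}: {estimates[name]:+.4f}")
```

Because the synthetic data contain no noise, the fit should recover the assumed coefficients up to floating-point error. Each lane coefficient is read as a difference in seconds relative to lane 2, which is how the coefficients in the regression tables are interpreted.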

Results

100m

I begin by analyzing lane assignment effects in the 100m. Narratives about lane advantages tend to be focused on races with corners (200m and 400m) but the quote from Andre De Grasse in the introduction highlights that they also persist in the 100m. Similar to the 200m and 400m, the belief that middle lanes are best in the 100m could relate to the idea that middle lanes improve a runner’s vantage and help them judge where they are relative to their competitors. Indeed, in lane assignment rules used for later rounds of races, the fastest qualifying times are assigned to inside lanes, which suggests they are viewed as favorable.

To begin each analysis, I confirm whether the randomization across lanes is effective. To do this I estimate the following statistical model:

$$
Y^{SB}_{i,j} = \alpha_0 + \sum_{1 \le k \le 9,\, k \ne 2} \beta_k T_{k,i,j} + \alpha_f X_{f,i,j} + \epsilon_{i,j} \tag{5}
$$

Where $Y^{SB}_{i,j}$ denotes a runner’s season’s best. If runners are assigned lanes based on ability (e.g. their performance in meets taking place earlier in the season), this would be highly problematic for assessing the causal impact of lane assignments. To qualify for the World or U20 Championships, athletes must meet the entry standard in a window that typically spans a year prior to the event. To ensure that lane assignments in Round 1 are indeed random, I proxy for an athlete’s ability with their season’s best (prior to the event being analyzed) and test whether there are statistical differences in season’s best across lanes. Results from this randomization check are reported in Table 1 below. As discussed below, lane 2 was chosen as the baseline to compare against all other lanes.

Table 1. Randomization check for 100m.

| Coeff. (Ind. Var.) | Men’s (1) | Men’s (2) | Women’s (3) | Women’s (4) | Pooled (5) | Pooled (6) |
|---|---|---|---|---|---|---|
| β1 (Lane 1) | -0.0167 | -0.0354 | 0.1576 | 0.1642** | 0.0365 | 0.0344 |
| | (0.0510) | (0.0408) | (0.1112) | (0.0751) | (0.0555) | (0.0394) |
| | [74] | [71] | [40] | [37] | [114] | [108] |
| β3 (Lane 3) | -0.0226 | -0.0474 | 0.0175 | 0.0381 | -0.0016 | -0.0043 |
| | (0.0521) | (0.0429) | (0.0905) | (0.0682) | (0.0525) | (0.0403) |
| | [98] | [93] | [103] | [95] | [201] | [188] |
| β4 (Lane 4) | -0.0497 | -0.0483 | -0.1836** | -0.119* | -0.1168** | -0.0838** |
| | (0.0488) | (0.0427) | (0.0857) | (0.0611) | (0.0494) | (0.0371) |
| | [104] | [101] | [104] | [101] | [208] | [202] |
| β5 (Lane 5) | 0.0101 | -0.0133 | -0.0046 | 0.0332 | 0.0017 | -0.0090 |
| | (0.0502) | (0.0409) | (0.0857) | (0.0650) | (0.0500) | (0.0378) |
| | [111] | [105] | [101] | [95] | [212] | [201] |
| β6 (Lane 6) | -0.0432 | -0.0627* | -0.0068 | -0.0100 | -0.0264 | -0.0372 |
| | (0.0463) | (0.0377) | (0.0929) | (0.0646) | (0.0515) | (0.0368) |
| | [110] | [106] | [104] | [97] | [214] | [203] |
| β7 (Lane 7) | 0.0992* | 0.0474 | -0.1383* | -0.0737 | -0.0193 | -0.0149 |
| | (0.0549) | (0.0462) | (0.0834) | (0.0575) | (0.0503) | (0.0370) |
| | [105] | [96] | [104] | [101] | [209] | [197] |
| β8 (Lane 8) | 0.0081 | -0.0391 | -0.0822 | 0.0219 | -0.0371 | -0.0080 |
| | (0.0521) | (0.0444) | (0.0785) | (0.0597) | (0.0471) | (0.0371) |
| | [98] | [90] | [97] | [95] | [195] | [185] |
| β9 (Lane 9) | 0.0588 | 0.0487 | -0.2905*** | -0.1990** | -0.1313* | -0.0890 |
| | (0.0810) | (0.0730) | (0.1008) | (0.0823) | (0.0680) | (0.0578) |
| | [26] | [25] | [32] | [31] | [58] | [56] |
| α1 (Male) | | | | | -1.125*** | -1.087*** |
| | | | | | (0.0249) | (0.0194) |
| α0 (constant) | 10.53*** | 10.50*** | 11.71*** | 11.58*** | 11.68*** | 11.58*** |
| | (0.0321) | (0.0284) | (0.0658) | (0.0444) | (0.0414) | (0.0292) |
| N | 828 | 786 | 788 | 748 | 1616 | 1534 |
| R² | 0.0135 | 0.0136 | 0.028 | 0.0305 | 0.568 | 0.68 |
| Outliers Removed | No | Yes | No | Yes | No | Yes |
| F-stat. | 1.19 | 1.21 | 3.28 | 3.44 | 1.84 | 1.69 |
| p-value | 0.2995 | 0.2893 | 0.0011 | 0.0007 | 0.0658 | 0.0957 |

This table reports the coefficients estimated from model Eq (5). To ease interpretation of the results, the independent variable associated with each coefficient estimate is highlighted in parentheses. The number of observations per lane are reported in square brackets. Robust standard errors (see [19]) are reported in parentheses.

, *, **, and *** denote significance at the one-sided 10%, two-sided 10%, 5%, and 1% levels, respectively.

Columns 1 and 3 in Table 1 report the results from estimating Eq (5) using all data from the Men’s and Women’s races, respectively. As discussed in more detail below, columns 2 and 4 report the results with outliers excluded. Columns 5 and 6 report the results using all data and data with outliers removed for the pooled Men’s and Women’s data.

In general, the randomization appears to effectively balance runners into lanes based on their season’s best times. In a few cases there are statistically significant results, but these normally appear in lanes with a low number of observations, reported in square brackets. Lane 2 is not reported in the table because it is the reference lane and has no corresponding regression coefficient, but it has a similar number of observations to lane 3. A common issue in the 100m, and all other events, is that in these first round races lanes 1 and 9 are often empty. As an example, in Table 1 the Women’s 100m has 40 or fewer observations in lanes 1 and 9, relative to around 100 observations in the other lanes. Because lanes 1 and 9 have far fewer observations than the other lanes, they are more susceptible to issues relating to low statistical power. As such, all estimated lane effects for lanes 1 and 9 throughout this paper should be treated with caution as they are more susceptible to Type-1 error (see, e.g., [17]). In addition, small sample sizes are susceptible to Type-M error (exaggerating the magnitude of the effects) [18].

At the bottom of the randomization tables, F-statistics and their associated p-values are reported for joint significance tests of the lanes. Only the Women’s races are jointly significant at the 5% level, and when these data are pooled with the Men’s data, the lane estimates are no longer jointly significant at the 5% level, suggesting that the randomization is generally effective.

As an additional robustness check, I examine if the propensity (probability) a runner is assigned to a specific lane is statistically related to their season’s best. These results are reported in Tables 12–15 in S1 Appendix. None of the regressions show a significant relationship between season’s best and treatment status (lane assignments), providing additional evidence that the randomization is effective.

Moving on to the estimates of the effect of lane assignments on race times, results from model Eq (4) for the 100m data are reported in Table 2. Racers who do not start (DNS) or who are disqualified (DQ) do not register race times. In addition, I exclude any racers with missing season’s best or personal best data as these are important in the analysis. Again, I report the results separately for Men’s and Women’s races and also pool the Men’s and Women’s data in the “Pooled” columns to help improve statistical power. I chose lane 2 as the baseline to compare the other lanes against. I do this because, as is highlighted above, lane 1 consistently has far fewer observations than lanes 2 through 8 and thus may be more susceptible to issues relating to low statistical power. Columns 1 and 3 in Table 2 do not show any systematic effect of lane assignments on race times. There are a few statistically significant lane effects in the Men’s data in column 1. For example, lane 3 produces race times which are on average 0.061 seconds slower than lane 2 (significant at the 5% level). In the Women’s data (column 3), by contrast, lane 7 is 0.0442 seconds faster on average relative to lane 2 (weakly significant at the 1-sided 10% level). However, the lack of consistency between lanes within the Men’s and Women’s races, along with the lack of consistency between genders, suggests these results may be anomalous. Pooling the Men’s and Women’s data (column 5) to improve statistical power only yields a weakly significant effect (1-sided 10%) for lane 9 (0.043 seconds faster than lane 2). But again, this result should be treated with caution as lane 9 has far fewer observations than other lanes.

Table 2. Regression results for 100m.

| Coeff. (Ind. Var.) | Men’s (1) | Men’s (2) | Women’s (3) | Women’s (4) | Pooled (5) | Pooled (6) | Pooled (7) |
|---|---|---|---|---|---|---|---|
| β1 (Lane 1) | 0.0194 | 0.0257 | -0.0137 | -0.0267 | 0.0030 | 0.0033 | 0.0081 |
| | (0.0232) | (0.0222) | (0.0473) | (0.0339) | (0.0238) | (0.0194) | (0.0201) |
| β3 (Lane 3) | 0.0610** | 0.0539*** | -0.0238 | -0.0320 | 0.0223 | 0.0145 | 0.0130 |
| | (0.0255) | (0.0199) | (0.0333) | (0.0257) | (0.0210) | (0.0163) | (0.0175) |
| β4 (Lane 4) | 0.0453 | 0.0227 | -0.0187 | -0.0059 | 0.0133 | 0.0087 | 0.0039 |
| | (0.0326) | (0.0172) | (0.0292) | (0.0251) | (0.0220) | (0.0154) | (0.0170) |
| β5 (Lane 5) | 0.0450 | 0.0234 | -0.0264 | -0.0127 | 0.0105 | 0.0071 | 0.0118 |
| | (0.0356) | (0.0171) | (0.0308) | (0.0265) | (0.0238) | (0.0158) | (0.0170) |
| β6 (Lane 6) | 0.0243 | 0.0258 | -0.0406 | -0.0411* | -0.0044 | -0.0043 | -0.0013 |
| | (0.0211) | (0.0190) | (0.0318) | (0.0246) | (0.0190) | (0.0156) | (0.0168) |
| β7 (Lane 7) | 0.0216 | 0.0166 | -0.0442 | -0.0353 | -0.0143 | -0.0106 | -0.0125 |
| | (0.0232) | (0.0184) | (0.0293) | (0.0246) | (0.0190) | (0.0155) | (0.0166) |
| β8 (Lane 8) | 0.0564** | 0.0375* | -0.0258 | -0.0143 | 0.0146 | 0.0142 | 0.0121 |
| | (0.0230) | (0.0195) | (0.0324) | (0.0278) | (0.0198) | (0.0171) | (0.0181) |
| β9 (Lane 9) | -0.0039 | -0.0008 | -0.0686* | -0.090*** | -0.0431 | -0.0513** | -0.0594** |
| | (0.0355) | (0.0327) | (0.0398) | (0.0283) | (0.0280) | (0.0218) | (0.0239) |
| α1 (Wind) | -0.0446*** | -0.0429*** | -0.0555*** | -0.0532*** | -0.0508*** | -0.0490*** | -0.0481*** |
| | (0.0100) | (0.0058) | (0.0061) | (0.006) | (0.0056) | (0.0041) | (0.0045) |
| α2 (Reaction Time) | 1.01*** | 0.8025*** | 1.252*** | 1.220*** | 1.089*** | 0.992*** | 0.940*** |
| | (0.253) | (0.1802) | (0.3187) | (0.233) | (0.211) | (0.151) | (0.159) |
| α3 (PB) | 0.484*** | 0.448*** | 0.2563*** | 0.4095*** | 0.355*** | 0.444*** | 0.391*** |
| | (0.0688) | (0.0573) | (0.1026) | (0.0799) | (0.0779) | (0.0534) | (0.0562) |
| α4 (SB) | 0.351*** | 0.349*** | 0.6797*** | 0.4568*** | 0.549*** | 0.399*** | 0.429*** |
| | (0.0686) | (0.0583) | (0.1079) | (0.0861) | (0.0820) | (0.0565) | (0.0600) |
| α5 (Male) | | | | | -0.106*** | -0.178*** | -0.235*** |
| | | | | | (0.0187) | (0.0164) | (0.0201) |
| α0 (constant) | 1.731*** | 2.152*** | 0.7524*** | 1.557*** | 1.113*** | 2.151*** | 0.4742*** |
| | (0.278) | (0.211) | (0.222) | (0.216) | (0.184) | (0.156) | (0.179) |
| N | 828 | 786 | 788 | 748 | 1616 | 1534 | 1285 |
| R² | 0.684 | 0.759 | 0.888 | 0.854 | 0.920 | 0.946 | 0.952 |
| Outliers Removed | No | Yes | No | Yes | No | Yes | Yes |
| Pos 1/2 Removed | No | No | No | No | No | No | Yes |

This table reports the coefficients estimated from model Eq (4). To ease interpretation of the results, the independent variable associated with each coefficient estimate is highlighted in parentheses. Robust standard errors are reported in parentheses.

, *, **, and *** denote significance at the one-sided 10%, two-sided 10%, 5%, and 1% levels, respectively.

Another concern one might have with the data is the presence of extreme outliers. For example, in the data there are race times that are more than three standard deviations slower than the mean. It is possible these extreme outliers have an important influence on the estimated lane effects. It also seems plausible that these extreme outliers are unrelated to lane assignments. For example, a runner who sustains an injury during the race may have a much slower race time than the norm. To control for these outliers, I conduct the same analysis where I exclude the slowest 5% of the race times in the Men’s and Women’s race, reported in columns 2 and 4 respectively, and results from the pooled analysis are reported in column 6. Excluding these extreme outliers does not generate a meaningful change in the overall regression results in the 100m. In the pooled data with outliers excluded only lane 9 again has a significant lane effect, being on average 0.0513 seconds faster than lane 2. While this effect should be treated with caution because of the low number of observations, it is also the opposite effect relative to the common narrative pertaining to outside lanes in the 100m.
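The trimming step can be sketched as follows. The paper does not state exactly how the 5% cutoff is applied, so this version assumes the cut is made by count, rounded down, within a single set of race times (for example, Men’s and Women’s times trimmed separately before pooling). The function name `drop_slowest` and the sample times are hypothetical.

```python
def drop_slowest(times, percent=5):
    """Return race times with the slowest `percent` of observations removed.

    Assumption: the cutoff is applied by count, rounded down. Integer
    arithmetic avoids floating-point surprises at exact multiples.
    """
    n_drop = len(times) * percent // 100
    keep = len(times) - n_drop
    return sorted(times)[:keep]


# 39 ordinary times plus one injured-runner style outlier (made-up values).
times = [10.1 + 0.01 * i for i in range(39)] + [14.8]
trimmed = drop_slowest(times)
print(len(times), "->", len(trimmed), "observations")
```

A count-based rule is only one reasonable reading. A percentile threshold with ties handled differently would drop a slightly different number of runners.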

An important point worth emphasizing with this empirical strategy is that runners, of course, are not blinded to their lane assignments. In an analogy from clinical trials for drugs, it is as if “control” subjects do not receive a placebo and are aware of their treatment status. A concern with non-placebo trials is that control subjects engage in differential behavior because of their status (e.g. seek their own treatment), which may impact the estimates of treatment effects. Though leveraging random assignment ensures runner characteristics will be balanced across lanes, it is possible that runners adjust their effort in response to their lane assignments. The main objective in these early rounds is to qualify to advance to later rounds. Runners may be interested in preserving energy for later races and, as such, give “just enough” effort to advance. The concern is that these “just enough” effort types may supply different levels of effort conditional on their lane assignments, which may impact the estimates of lane effects. The IAAF rules that determine qualification for later rounds vary by meet as they can be determined by Technical Delegates. However, it is common that two or three racers from each heat automatically advance, with the possibility of more runners qualifying on time. Aside from these “just enough” types, the remaining athletes are likely to be “maximum effort” types in attempting to qualify to advance. While it is certainly plausible that differential effort provision conditional on lanes could exist, it is important to note that these athletes would constitute the minority of runners because of qualification rules. As an additional robustness check, I re-estimate the model Eq (4) excluding runners who finished in first or second place. Because excluding two runners per race amounts to an important reduction in sample size, I do this on the pooled data. Excluding these runners does not have a meaningful impact on the results, reported in the final column of Table 2, and suggests that differential provision of effort across lanes does not impact the estimates of lane effects. Collectively, these results suggest that there is no robust evidence of lane effects in 100m races. To ease interpretation of the results, Fig 1a plots the lane coefficient estimates from the second-to-last column of Table 2 (i.e. the results generated from pooled data excluding outliers).

Fig 1. Graphical display of regression results.

These figures plot the estimated lane effects using pooled men’s and women’s data and excluding outliers. 95% confidence intervals are denoted by the smaller symbols.

An issue that is relevant throughout this paper is statistical power. From a null finding of lane effects one cannot, of course, conclude that no lane effects exist. One can only conclude that given the statistical power in this analysis, if true lane effects do exist, their magnitude was not detectable. To provide a sense of the role that statistical power is playing in these null results I briefly highlight the lane effects that could be detected given this sample size. Following [20] I report some Minimum Detectable Effects (MDE) from the above regressions. Analyzing MDEs is a common approach to evaluate ex-post statistical power (see, e.g., [21]). At statistical power of 0.8 and a significance level of 5% or 10%, the MDE is found by multiplying the standard error on the coefficient estimate by 2.8 and 2.49, respectively. For example, using the results from the pooled data in column 6 in Table 2, the standard error on the lane 8 coefficient is 0.0171. Thus, the MDEs at the 5 or 10% significance level would be 0.0479 and 0.0426, respectively. While one cannot rule out true lane effects in the 100m from the null results in Table 2, these MDEs help establish that if lane effects do exist in the 100m, they must be quite small.
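The MDE arithmetic above can be reproduced in a few lines. This sketch assumes the multipliers come from the usual normal approximation (a two-sided critical value plus the quantile for the chosen power), which yields roughly 2.80 and 2.49. The function names are illustrative and not taken from [20].

```python
from statistics import NormalDist


def mde_multiplier(alpha, power=0.8):
    """Two-sided critical value plus the power quantile (standard normal)."""
    z = NormalDist()
    return z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)


def minimum_detectable_effect(std_error, alpha, power=0.8):
    """Smallest true effect detectable with the given power and test size."""
    return mde_multiplier(alpha, power) * std_error


se_lane8 = 0.0171  # lane 8, pooled data with outliers removed (Table 2, column 6)
mde_5 = minimum_detectable_effect(se_lane8, alpha=0.05)
mde_10 = minimum_detectable_effect(se_lane8, alpha=0.10)
print(f"multipliers: {mde_multiplier(0.05):.2f}, {mde_multiplier(0.10):.2f}")
print(f"MDE at 5%: {mde_5:.4f} s, MDE at 10%: {mde_10:.4f} s")
```

Using unrounded multipliers, the 10% figure can differ from the value in the text in the fourth decimal place, since the text multiplies by the rounded 2.49.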

200m

200m races are more generally thought to have lane advantages and a common view is that periphery lanes—outside and inside lanes—are slower. Here I repeat the same general analysis strategy as above. To begin, the randomization check is reported in Table 9 in the S1 Appendix. These results show the randomization is effective. Only in lane 1 of the pooled data does season’s best appear to be (weakly) related to lane assignments, which again could be a result of many fewer observations in lane 1.

Results from running model Eq (4) on the 200m data are reported in Table 3. The Men’s, Women’s, and Pooled data including or excluding outliers all show evidence that outside lanes produce lower average race times than lane 2. This consistency across Men’s and Women’s races, and the fact that these lane advantages seem to monotonically increase as the lane number increases are reassuring results. The estimated lane coefficients using pooled data and excluding outliers are plotted in Fig 1b. The estimates are also robust to excluding runners who finish first or second in each race, reported in the final column of Table 3. As discussed above, this suggests that differential provision of effort across lanes from faster athletes is not driving the results.
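One simple way to inspect the "roughly monotonic" pattern is to count how many adjacent lane steps move toward faster times. The sketch below uses the pooled, outliers-removed coefficients from column 6 of Table 3 and sets the reference lane 2 to zero. The counting approach itself is an illustrative choice and not a test used in the paper.

```python
# Lane coefficients relative to lane 2 (pooled data, outliers removed;
# column 6 of Table 3). Lane 2 is the reference lane, so its effect is 0.
lane_effects = {1: 0.0351, 2: 0.0, 3: -0.0517, 4: -0.0691, 5: -0.0628,
                6: -0.0796, 7: -0.0967, 8: -0.1222, 9: -0.1677}

lanes = sorted(lane_effects)
steps = [lane_effects[b] - lane_effects[a] for a, b in zip(lanes, lanes[1:])]
n_faster = sum(step < 0 for step in steps)
exceptions = [b for b, step in zip(lanes[1:], steps) if step >= 0]
print(f"{n_faster} of {len(steps)} adjacent steps are faster; "
      f"exceptions at lanes {exceptions}")
```

This count ignores the standard errors, so a single reversal between neighbouring lanes should not be read as evidence against the overall trend.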

Table 3. Regression results for 200m.

| Coeff. (Ind. Var.) | Men’s (1) | Men’s (2) | Women’s (3) | Women’s (4) | Pooled (5) | Pooled (6) | Pooled (7) |
|---|---|---|---|---|---|---|---|
| β1 (Lane 1) | 0.0307 | 0.0856* | -0.0798 | -0.0524 | 0.0010 | 0.0351 | 0.0459 |
| | (0.0807) | (0.0448) | (0.0752) | (0.0774) | (0.0564) | (0.0441) | (0.0461) |
| β3 (Lane 3) | -0.0295 | -0.0289 | -0.0645 | -0.0872 | -0.0421 | -0.0517* | -0.0645** |
| | (0.0730) | (0.0352) | (0.0561) | (0.0529) | (0.0481) | (0.0306) | (0.0312) |
| β4 (Lane 4) | -0.0626 | -0.0540 | -0.0900 | -0.0966* | -0.0717 | -0.0691** | -0.0704** |
| | (0.0750) | (0.0350) | (0.0610) | (0.0584) | (0.0503) | (0.0317) | (0.0340) |
| β5 (Lane 5) | -0.0930 | -0.0413 | -0.0677 | -0.0975* | -0.0812* | -0.0628** | -0.0653** |
| | (0.0692) | (0.0347) | (0.0617) | (0.0548) | (0.0481) | (0.0308) | (0.0323) |
| β6 (Lane 6) | -0.0598 | -0.0343 | -0.1389*** | -0.1409*** | -0.0941* | -0.0796*** | -0.0842*** |
| | (0.0784) | (0.0345) | (0.0533) | (0.0524) | (0.0500) | (0.0294) | (0.0303) |
| β7 (Lane 7) | -0.1035 | -0.0595* | -0.1598*** | -0.1472*** | -0.1263*** | -0.0967*** | -0.0804** |
| | (0.0714) | (0.0349) | (0.0544) | (0.0533) | (0.0467) | (0.0306) | (0.0321) |
| β8 (Lane 8) | -0.1270* | -0.0838** | -0.1621*** | -0.1781*** | -0.1396*** | -0.1222*** | -0.1059*** |
| | (0.0706) | (0.0350) | (0.0610) | (0.0550) | (0.0484) | (0.0313) | (0.0329) |
| β9 (Lane 9) | -0.1871** | -0.114** | -0.232*** | -0.2417*** | -0.2040*** | -0.1677*** | -0.1508*** |
| | (0.0860) | (0.0523) | (0.0795) | (0.0802) | (0.0586) | (0.0465) | (0.0498) |
| α1 (Wind) | -0.0623*** | -0.0608*** | -0.1023*** | -0.0853*** | -0.0809*** | -0.0719*** | -0.0660*** |
| | (0.0164) | (0.0087) | (0.0147) | (0.0142) | (0.0112) | (0.0081) | (0.0086) |
| α2 (Reaction Time) | 1.017** | 1.541*** | 0.9392** | 0.949** | 1.001*** | 1.325*** | 1.530*** |
| | (0.432) | (0.2789) | (0.4312) | (0.419) | (0.312) | (0.2449) | (0.2518) |
| α3 (PB) | 0.463*** | 0.385*** | 0.5875*** | 0.4966*** | 0.5464*** | 0.4491*** | 0.4229*** |
| | (0.0673) | (0.0507) | (0.0654) | (0.0580) | (0.0509) | (0.0407) | (0.0430) |
| α4 (SB) | 0.469*** | 0.475*** | 0.2832*** | 0.350*** | 0.346*** | 0.4027*** | 0.4162*** |
| | (0.0739) | (0.0548) | (0.0710) | (0.0632) | (0.0584) | (0.0447) | (0.0480) |
| α5 (Male) | | | | | -0.312*** | -0.432*** | -0.515*** |
| | | | | | (0.0643) | (0.0396) | (0.0501) |
| α0 (constant) | 1.674*** | 3.013*** | 3.383*** | 3.918*** | 2.823*** | 3.677*** | 3.985*** |
| | (0.5837) | (0.3728) | (0.716) | (0.461) | (0.553) | (0.324) | (0.382) |
| N | 926 | 880 | 708 | 671 | 1634 | 1551 | 1335 |
| R² | 0.647 | 0.75 | 0.81 | 0.757 | 0.931 | 0.958 | 0.960 |
| Outliers Removed | No | Yes | No | Yes | No | Yes | Yes |
| Pos 1/2 Removed | No | No | No | No | No | No | Yes |

This table reports the coefficients estimated from model (4). To ease interpretation of the results, the independent variable associated with each coefficient estimate is highlighted in parentheses. Robust standard errors are reported in parentheses.

, *, **, and *** denote significance at the one-sided 10%, two-sided 10%, 5%, and 1% levels, respectively.

These estimates suggest the advantage of outside lanes can be sizable. For example, in the Women’s data, excluding outliers, lane 8 is estimated to be 0.1781 seconds faster than lane 2. The standard deviation (SD) of race times in this data is 0.68 seconds. A common way to gauge the magnitude of an effect is to compute the effect size, $|\text{Effect}|/\text{SD}$. Thus, these estimated results produce an effect size of 0.262, which is sizable. Put a different way, these lane effects could easily be the difference between qualifying, or not, to advance to the next round of the race.
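As a quick check of this arithmetic, the effect size can be computed directly from the two numbers quoted above. The helper name is illustrative only.

```python
def effect_size(effect: float, sd: float) -> float:
    """Standardized effect size: absolute effect divided by the standard deviation."""
    if sd <= 0:
        raise ValueError("sd must be positive")
    return abs(effect) / sd


# Lane 8 coefficient (Women's 200m, outliers excluded) and SD of race times, from the text.
lane8_women_200m = -0.1781
sd_women_200m = 0.68

es = effect_size(lane8_women_200m, sd_women_200m)
print(round(es, 3))  # 0.262
```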

Of particular interest is the fact that these estimated lane advantages are the opposite of what is commonly believed regarding outside lanes. The seeming persistence and pervasiveness of false beliefs is interesting and I return to it in the Discussion section.

400m

Turning to the 400m races, I again first present the randomization check in Table 10, reported in the S1 Appendix. These results again show robust evidence that the randomization successfully balances racers by their season’s bests across the different lanes. The only, weakly, significant result is for lane 3 in the Women’s data, and this disappears when outliers are excluded.

The estimates of lane effects on race times in the 400m are somewhat consistent with the 200m, but are much noisier. These results are reported in Table 4. Wind speed is not recorded in the 400m, so these results are estimated by running model Eq (4) without wind as a control. There is some mixed evidence in the Women’s data that outside lanes produce faster race times, consistent with the lane advantages estimated in the 200m races. However, they do not appear to be monotonically decreasing with lane number, and, in addition, they are absent in the Men’s data. When the data is pooled together and outliers are excluded, lanes 4, 5, 6, 7 and 9 show some evidence of faster race times relative to lane 2. These results are not greatly impacted by excluding runners who finish in first or second (reported in the final column). The estimates using pooled data and excluding outliers are graphically depicted in Fig 1c. Visually, the results for the 200m and 400m look somewhat similar, with race times tending to decrease with lane number, but from the 95% confidence intervals, it is clear that the 400m results are statistically weaker.

Table 4. Regression results for 400m.

Coeff. (Ind. Var.) Men’s (1) Men’s (2) Women’s (3) Women’s (4) Pooled (5) Pooled (6) Pooled (7)
β1 (Lane 1) 0.0531 0.0526 0.0717 -0.0357 0.0597 0.0123 -0.0234
(0.1126) (0.1035) (0.2403) (0.1921) (0.1160) (0.0981) (0.0993)
β3 (Lane 3) -0.0123 -0.0251 0.1369 -0.1217 0.0431 -0.0689 -0.0358
(0.0929) (0.0780) (0.1726) (0.1472) (0.0906) (0.0758) (0.0805)
β4 (Lane 4) -0.0536 -0.0302 -0.1042 -0.1875 -0.0760 -0.0987 -0.0952
(0.0908) (0.0839) (0.1343) (0.1229) (0.0770) (0.0704) (0.0760)
β5 (Lane 5) -0.0900 -0.1009 -0.0273 -0.0990 -0.0608 -0.0994 -0.0997
(0.1055) (0.0882) (0.1439) (0.1348) (0.0857) (0.0761) (0.0827)
β6 (Lane 6) 0.0454 -0.0551 -0.2576** -0.2674** -0.0953 -0.1528** -0.1296*
(0.1228) (0.0849) (0.1222) (0.1219) (0.0862) (0.0703) (0.0735)
β7 (Lane 7) -0.0005 -0.0350 -0.1512 -0.1744 -0.0703 -0.0991 -0.0942
(0.1019) (0.0839) (0.1228) (0.1196) (0.0778) (0.0693) (0.0738)
β8 (Lane 8) -0.0525 -0.0535 -0.0775 -0.0639 -0.0634 -0.0540 -0.0290
(0.0975) (0.0860) (0.1376) (0.1343) (0.0809) (0.0754) (0.0798)
β9 (Lane 9) 0.0178 0.0150 -0.3458** -0.3465** -0.1547 -0.1578 -0.1527
(0.1234) (0.1138) (0.1578) (0.1607) (0.0996) (0.0975) (0.0997)
α1 (Reaction Time) 1.272** 1.493*** 2.319*** 1.781*** 1.807*** 2.070*** 1.927***
(0.600) (0.442) (0.741) (0.6689) (0.4697) (0.385) (0.412)
α2 (PB) 0.6397*** 0.5588*** 0.700*** 0.5851*** 0.6760*** 0.5750*** 0.5834***
(0.0699) (0.0622) (0.0845) (0.0755) (0.0572) (0.0523) (0.0577)
α3 (SB) 0.3667*** 0.3344*** 0.2691*** 0.2958*** 0.3099*** 0.3100*** 0.3050***
(0.0746) (0.0667) (0.0883) (0.0826) (0.0604) (0.0573) (0.0638)
α4 (Male) -0.3683*** -0.9782*** -1.015***
(0.1329) (0.1034) (0.1356)
α0 (constant) 0.2126 5.337*** 2.250*** 6.784*** 1.467 6.629*** 6.532***
(1.359) (0.9447) (1.330) (1.107) (1.025) (0.778) (0.951)
N 872 826 677 643 1549 1469 1269
R2 0.793 0.748 0.79 0.758 0.947 0.9605 0.9610
Outliers Removed No Yes No Yes No Yes Yes
Pos 1/2 Removed No No No No No No Yes

This table reports the coefficients estimated from model (4). To ease interpretation of the results, the independent variable associated with each coefficient estimate is highlighted in parentheses. Robust standard errors are reported in parentheses.

, *, **, and *** denote significance at the one-sided 10%, two-sided 10%, 5%, and 1% levels, respectively.

One important issue with the 400m, and 800m below, is that the longer average race times tend to be associated with greater dispersion in race times. For example, as noted above, the standard deviation of race times excluding outliers in the 200m Women’s data is 0.68 seconds. The analogous standard deviation in the 400m is 1.64 seconds. As a result, for a given number of observations, statistical power weakens as event times increase. To give a sense of statistical power, I again report the MDE for the 400m. For example, using the pooled data without outliers, the estimate for lane 8 has a standard error of 0.0754. With statistical power of 0.8, this gives an MDE of 0.2111 and 0.1878 for the 5% and 10% significance levels, respectively. Thus, given the number of observations in the 400m data, statistical power would be insufficient to pick up lane effects similar in magnitude to those in the 200m. Of course, it is also important to emphasize that even if there are lane effects in the 400m of similar magnitude to those in the 200m, their relative importance would be much smaller in the 400m since they represent a much smaller fraction of the mean or standard deviation of race times.

Also of interest is that these results are in contrast to the common belief that outside lanes are a significant disadvantage in the 400m. In both [1, 2] there is discussion about the gold medal race in the 2016 Olympics by Wayde van Niekerk. He is the first man to win the 400m from lane 8 and these articles clearly highlight the sentiment that this is impressive because lane 8 places runners at a disadvantage. However, the results in Table 4 show that, if anything, outside lanes produce average race times that are faster than lane 2. Of course, winning from lane 8 is impressive in that the runner registered one of the slowest qualifying times for the final, but this does not necessarily suggest that lane 8 itself is a disadvantage: van Niekerk’s improvement from his semifinal time was an impressive 1.42 seconds, whereas the average improvement of all the other runners in that race was 0.172 seconds.

800m

The 800m race differs from the above events in that lane assignments are not fixed for the duration of the race. Runners are assigned to a lane and must remain in that lane until the break line 100m from the start. This unique feature of the 800m, relative to the other shorter distance events, makes it interesting to explore in the context of lane assignment effects.

I again begin with the randomization check for the 800m data, reported in Table 11 in the S1 Appendix. There appears to be robust evidence that the randomization is effective. Moving on to the estimates of lane effects, I again implement model Eq (4). However, wind speed and reaction times are not recorded for the 800m and are thus not included in the regression. In addition, since the 800m tends to be a pack race (runners tend to run together in a pack for some portion of the race), I also include race fixed effects. Thus, lane effects are estimated after controlling for the average time in a race. On occasion, when tracks do not have a ninth lane, 800m races can have two runners assigned to lane 8. This is quite rare in the data, but I exclude these racers when it does occur. These regression results are reported in Table 5.
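To illustrate what race fixed effects do, the following sketch estimates lane dummies after subtracting each race's mean (the "within" transformation), which is numerically equivalent to including one dummy per race. Everything here is an assumption made for illustration: the data are synthetic, the lane effects are made up, the PB and SB controls are omitted, and NumPy is used rather than the software in the replication package.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 200 races, each with lanes 1-8 occupied.
n_races = 200
lanes = list(range(1, 9))
true_lane_effect = {1: 0.3, 2: 0.0, 3: 0.1, 4: 0.0, 5: 0.2, 6: 0.1, 7: 0.3, 8: 0.2}  # made up

race_id = np.repeat(np.arange(n_races), len(lanes))
lane = np.tile(np.array(lanes), n_races)
race_pace = rng.normal(110.0, 3.0, n_races)  # pace shared by everyone in a race
time = (race_pace[race_id]
        + np.array([true_lane_effect[int(l)] for l in lane])
        + rng.normal(0.0, 0.05, lane.size))

# Lane dummies with lane 2 as the omitted baseline.
other_lanes = [l for l in lanes if l != 2]
D = np.column_stack([(lane == l).astype(float) for l in other_lanes])


def demean_by_group(a, groups, n_groups):
    """Subtract each group's mean from its members (within transformation)."""
    a = a.reshape(len(groups), -1)
    sums = np.zeros((n_groups, a.shape[1]))
    np.add.at(sums, groups, a)  # accumulate column sums per group
    counts = np.bincount(groups, minlength=n_groups)[:, None]
    return a - (sums / counts)[groups]


y_within = demean_by_group(time, race_id, n_races).ravel()
D_within = demean_by_group(D, race_id, n_races)

# OLS on the demeaned data; no intercept is needed after demeaning.
beta_hat, *_ = np.linalg.lstsq(D_within, y_within, rcond=None)
estimates = dict(zip(other_lanes, beta_hat))
```

Because the race-level pace is removed by demeaning, the large differences in pace between races do not add noise to the lane estimates, which is the motivation given above for controlling for the average time in a race.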

Table 5. Regression results for 800m.

Coeff. (Ind. Var.) Men’s (1) Men’s (2) Women’s (3) Women’s (4) Pooled (5) Pooled (6) Pooled (7)
β1 (Lane 1) 0.2290 0.1579 0.6594 0.5756** 0.3135 0.333* 0.2244
(0.4478) (0.222) (0.6427) (0.2872) (0.3725) (0.1835) (0.1900)
β3 (Lane 3) 0.0351 0.1168 -0.0693 0.1869 -0.1469 0.1270 0.1981
(0.3065) (0.1929) (0.5944) (0.2327) (0.2969) (0.1547) (0.1592)
β4 (Lane 4) 0.6195 -0.0455 -0.1565 0.1541 0.1903 -0.0061 0.0879
(0.6913) (0.1656) (0.5726) (0.2264) (0.4413) (0.1462) (0.1476)
β5 (Lane 5) -0.0641 -0.1209 0.4735 0.7145** 0.1108 0.2660* 0.2474
(0.2896) (0.1742) (0.5744) (0.2923) (0.2796) (0.1616) (0.1722)
β6 (Lane 6) -0.0958 0.0446 0.3534 0.1648 0.0583 0.1461 0.1446
(0.3152) (0.1690) (0.6474) (0.2644) (0.3148) (0.1389) (0.1510)
β7 (Lane 7) 0.4814 0.2301 0.1087 0.5689** 0.2314 0.3570** 0.3834**
(0.3506) (0.1913) (0.5692) (0.2679) (0.2991) (0.1602) (0.1771)
β8 (Lane 8) 0.0114 -0.0435 0.9580 0.5302* 0.3375 0.2131 0.1522
(0.3157) (0.1832) (0.6791) (0.2764) (0.3211) (0.1600) (0.1709)
β9 (Lane 9) 2.029 0.1053 -0.7562 0.0782 0.6528 0.0437 0.1797
(1.823) (0.2424) (0.8611) (0.3968) (1.085) (0.2330) (0.2198)
α1 (PB) 0.7129*** 0.2647*** 0.4550** 0.1629 0.531** 0.218* 0.490***
(0.1879) (0.0691) (0.2158) (0.1182) (0.2228) (0.1257) (0.0312)
α2 (SB) 0.331*** 0.280*** 0.2558 0.1038* 0.2923* 0.1364* 0.0410**
(0.0904) (0.0731) (0.1558) (0.0570) (0.1746) (0.0755) (0.0162)
α3 (Male) -5.47** -16.43*** -12.65***
(2.717) (1.918) (1.051)
α0 (constant) -2.016 49.61*** 36.88** 90.11*** 27.31** 86.04*** 63.55***
(16.12) (3.13) (14.56) (12.00) (12.37) (9.15) (3.74)
N 803 762 613 582 1416 1344 1169
R2 0.565 0.816 0.700 0.817 0.886 0.976 0.9795
Outliers Removed No Yes No Yes No Yes Yes
Pos 1/2 Removed No No No No No No Yes

This table reports the coefficients estimated from model (4). To ease interpretation of the results, the independent variable associated with each coefficient estimate is highlighted in parentheses. Robust standard errors are reported in parentheses.

, *, **, and *** denote significance at the one-sided 10%, two-sided 10%, 5%, and 1% levels, respectively.

The results are somewhat mixed, possibly due to the issues regarding longer race times and dispersion highlighted above, but there is some weak evidence that outside lanes tend to produce slower race times on average. For example, in the pooled data excluding outliers, lanes 5, 7 and 8 show positive and significant (weakly in some cases) effects on race times, ranging from 0.213 to 0.357 seconds. These results are generally consistent, but somewhat statistically weaker, when runners who finish in first or second are excluded. The results using pooled data and excluding outliers are reported in Fig 1d.

Of interest, the result that outside lanes produce slower race times on average is the opposite of the general result found in the 200m and 400m. As noted above, one possible explanation for this may be the unique lane break feature of the 800m. Since the inside lane of the track minimizes the distance covered, after the break line all runners converge to the inside lanes. This might make the inside lanes advantageous as runners in the outside lanes either have to jockey for position with runners who have an established position on the inside of the track, or continue to run in lanes which lengthen the distance travelled around the track.

Vantage points and effort effects

As noted above, the narrative that outside lanes are undesirable in races with corners stems from the idea that not being able to see competitors puts runners at a disadvantage. It could be the case that seeing a competitor generates additional motivation for runners and spurs increased effort. Because of staggered starts, higher lane numbers will be able to see fewer runners, and the outermost lane can see no other runners (until they are passed). This effect will likely be the most dramatic in the 200 and 400m. If these “effort effects” of lanes do exist, a natural interpretation is that they would cause average race times to increase with lane number (i.e. outside lanes would be slower). This effect goes in the opposite direction compared to the biomechanical effects of tight corners. The results reported above should be thought of as the net effects of lane assignments. It is possible that both margins (effort and biomechanical effects) impact runners, but the results in the 200 and 400m suggest that, if anything, race times decrease with lane number. As such, the narrative that outside lanes are undesirable because of effort effects is not well supported by the data.

This, of course, does not rule out that effort effects are active, just that the tight corner effects dominate. While the net effects are clearly what ultimately matters in terms of assessing the desirability of lanes, it is still an interesting question to know if effort effects are present. To assess this question, I revisit the 200 and 400m results and leverage the fact that it is not always the same lane which is the outermost one. Because not all lanes are full in each race and/or some tracks do not have a 9th lane, there is some variability in which lane is the outermost (commonly lanes 8 or 9, and occasionally 7). Because the outermost lane and lane numbers are not perfectly correlated, I can leverage this fact to estimate the separate effect of being in the outermost lane (i.e. not having any competitors ahead of you to start a race).

To estimate these effects I run the same model as Eq (4) but where I also include an additional indicator variable taking a value of 1 (0) when a runner is (is not) in the outermost lane. In other words, controlling for lane effects as in Eq (4), is there a separate statistical effect of being in the outermost lane? To cut down on repetition, these results are estimated on the data without outliers and are reported in Table 6.
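A minimal sketch of how such an indicator could be constructed is shown below. The record layout (`race_id`, `lane`) and the function name are hypothetical and are not taken from the replication package; the point is only that the flag depends on which lanes are occupied in each race, not on the lane number alone.

```python
from collections import defaultdict


def add_outermost_flag(rows):
    """Return copies of rows with outermost=1 for the highest occupied lane in each race."""
    max_lane = defaultdict(int)
    for r in rows:
        max_lane[r["race_id"]] = max(max_lane[r["race_id"]], r["lane"])
    return [{**r, "outermost": int(r["lane"] == max_lane[r["race_id"]])} for r in rows]


# Hypothetical heats: A fills lanes 2-9, B fills lanes 1-8, C fills lanes 3-7.
rows = (
    [{"race_id": "A", "lane": l} for l in range(2, 10)]
    + [{"race_id": "B", "lane": l} for l in range(1, 9)]
    + [{"race_id": "C", "lane": l} for l in range(3, 8)]
)
flagged = add_outermost_flag(rows)

# Lane 8 is outermost in heat B but not in heat A, so the flag is not collinear with lane 8.
lane8_flags = {r["race_id"]: r["outermost"] for r in flagged if r["lane"] == 8}
print(lane8_flags)  # {'A': 0, 'B': 1}
```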

Table 6. Regression results for 200m and 400m with a separate indicator variable for the outermost lane.

Coeff. (Ind. Var.) 200m Men’s 200m Women’s 200m Pooled 400m Men’s 400m Women’s 400m Pooled
β1 (Lane 1) 0.0856 -0.0524 0.0351 0.0525 -0.0347 0.0124
(0.0524) (0.0774) (0.0441) (0.104) (0.1922) (0.0981)
β3 (Lane 3) -0.0285 -0.0873 -0.0516* -0.0251 -0.1210 -0.0689
(0.0332) (0.0530) (0.0296) (0.0800) (0.1474) (0.0758)
β4 (Lane 4) -0.0536 -0.0971* -0.0691** -0.0302 -0.1871 -0.0988
(0.0333) (0.0585) (0.0317) (0.0839) (0.1231) (0.0704)
β5 (Lane 5) -0.0407 -0.0977* -0.0628** -0.1009 -0.0985 -0.0994
(0.0344) (0.0548) (0.0308) (0.0883) (0.1350) (0.0761)
β6 (Lane 6) -0.0341 -0.1409*** -0.0796*** -0.0551 -0.2669** -0.1529**
(0.0334) (0.0525) (0.0294) (0.0849) (0.1220) (0.0703)
β7 (Lane 7) -0.0608* -0.1499*** -0.0989*** -0.0336 -0.1846 -0.1010
(0.0353) (0.0533) (0.0306) (0.0841) (0.1189) (0.0692)
β8 (Lane 8) -0.1526*** -0.2182*** -0.1761*** 0.0002 -0.1948 -0.0910
(0.0479) (0.0689) (0.0405) (0.1418) (0.1545) (0.1035)
β9 (Lane 9) -0.2110*** -0.3020*** -0.2458*** 0.0948 -0.5781** -0.2171
(0.0748) (0.1063) (0.0623) (0.1807) (0.2441) (0.1498)
α1 (Outermost) 0.0973* 0.0597 0.0778* -0.0798 0.2315 0.0591
(0.0499) (0.0682) (0.0410) (0.1400) (0.1829) (0.1128)
α2 (Wind) -0.0610*** -0.0861*** -0.0726***
(0.009) (0.0143) (0.0081)
α3 (Reaction Time) 1.498*** 0.954** 1.309*** 1.502*** 2.694*** 2.073***
(0.290) (0.416) (0.2443) (0.443) (0.671) (0.386)
α4 (PB) 0.387*** 0.501*** 0.453*** 0.559*** 0.5829*** 0.5746***
(0.0573) (0.0581) (0.0407) (0.0623) (0.0755) (0.0523)
α5 (SB) 0.474*** 0.346*** 0.399*** 0.334*** 0.2960*** 0.3101***
(0.0640) (0.0633) (0.0448) (0.0669) (0.0824) (0.0573)
α6 (Male) -0.431*** -0.981***
(0.0395) (0.1037)
α0 (constant) 3.00*** 3.916*** 3.671*** 5.333*** 6.884*** 6.646***
(0.395) (0.461) (0.324) (0.946) (1.072) (0.779)
N 880 671 1551 826 643 1469
R2 0.752 0.757 0.958 0.615 0.759 0.9605
Outliers Removed Yes Yes Yes Yes Yes Yes

This table reports the coefficients estimated from model (4) with a separate indicator variable for the outermost lane. To ease interpretation of the results, the independent variable associated with each coefficient estimate is highlighted in parentheses. Robust standard errors are reported in parentheses.

, *, **, and *** denote significance at the one-sided 10%, two-sided 10%, 5%, and 1% levels, respectively.

The coefficient of interest is on the outermost indicator variable (α1). In the men’s 200m, the estimated coefficient is 0.0973 with a p-value of 0.052. So there is reasonable statistical evidence that, after controlling for lane effects, being in the outermost lane does generate slower race times on average. In the women’s 200m, the coefficient falls to 0.0597 and is insignificant at standard levels. And in the pooled data the coefficient is 0.0778 with a p-value of 0.058. Overall, while the evidence is somewhat mixed between the men’s and women’s races, there is some evidence that being in the outermost lane does have a negative impact on runners. Again, it is important to emphasize that these results do not mean that the outside lanes generate an overall slowdown in race times. The results above clearly highlight that being in the outside lanes in the 200m generates, on average, faster race times. The positive coefficient on the outermost variable is simply the marginal impact of being in the outermost lane.
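The quoted p-values can be roughly recovered from the coefficients and standard errors in Table 6. The sketch below assumes a two-sided test with a standard normal reference distribution and uses the rounded table values, so small differences in the third decimal place from the reported p-values are expected.

```python
from statistics import NormalDist


def two_sided_p(coef: float, std_error: float) -> float:
    """Approximate two-sided p-value using a standard normal reference distribution."""
    z = abs(coef / std_error)
    return 2 * (1 - NormalDist().cdf(z))


# Outermost-lane coefficients and robust standard errors for the 200m (Table 6).
p_men = two_sided_p(0.0973, 0.0499)     # close to the reported 0.052
p_women = two_sided_p(0.0597, 0.0682)   # well above conventional thresholds
p_pooled = two_sided_p(0.0778, 0.0410)  # close to the reported 0.058

print(round(p_men, 3), round(p_women, 3), round(p_pooled, 3))
```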

The results from the 400m races are much more mixed, with none of the outermost coefficients being statistically significant. Again, this could stem from weaker statistical power at this distance.

Alternative regression models

A desirable feature of model Eq (4) is that it is agnostic about the structure of lane advantages, allowing each lane to have a separate treatment effect. However, one of the downsides of this approach is that it requires eight regressors, which compromises statistical power. This may be especially worrisome in the 400 and 800m where statistical power issues are more salient. In this section I explore two alternative statistical models to help alleviate the statistical power issue. In the first, I repeat the general approach in Eq (4) but instead pool runners together in lanes 1 and 2, 3 and 4, 5 and 6, and 7, 8, and 9. This dramatically increases the number of observations per regressor but, of course, has the obvious downside of assuming the statistical impact of, for example, lanes 3 and 4 is identical. Specifically, with this alternative regression I estimate:

$$Y_{i,j} = \alpha_0 + \beta_1 T_{i,j}^{3,4} + \beta_2 T_{i,j}^{5,6} + \beta_3 T_{i,j}^{7,8,9} + \alpha_f X_{f,i,j} + \epsilon_{i,j} \qquad (6)$$

where $T^{3,4}$, $T^{5,6}$, and $T^{7,8,9}$ denote dummy variables which take a value of 1 when racers are in lanes 3 or 4, 5 or 6, and 7 or 8 or 9, respectively, and take a value of 0 otherwise. Lanes 1 and 2 are now the baseline grouping. To reduce repetition, I report only results which pool men’s and women’s data and exclude outliers.
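As a small illustration of the coarser grouping, the three dummy variables can be built from a lane number as follows. The dictionary keys and function name are hypothetical labels chosen for this sketch.

```python
def lane_group_dummies(lane: int) -> dict:
    """Map a lane number (1-9) to the three grouped-lane dummies.

    Lanes 1 and 2 form the baseline, so all three dummies are 0 for them.
    """
    if lane not in range(1, 10):
        raise ValueError("lane must be an integer from 1 to 9")
    return {
        "T_3_4": int(lane in (3, 4)),
        "T_5_6": int(lane in (5, 6)),
        "T_7_8_9": int(lane in (7, 8, 9)),
    }


design = {lane: lane_group_dummies(lane) for lane in range(1, 10)}
print(design[2])  # {'T_3_4': 0, 'T_5_6': 0, 'T_7_8_9': 0}
print(design[8])  # {'T_3_4': 0, 'T_5_6': 0, 'T_7_8_9': 1}
```

Each lane switches on at most one dummy, so the model estimates three lane coefficients instead of eight.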

Results from Eq (6) are reported in Table 7. Overall, the general results are quite similar to those generated from the original model Eq (4). There is no evidence of lane assignments impacting race times in the 100m and there is strong evidence that outside lanes produce faster race times in the 200m. The results from the 400m become somewhat more significant and show some evidence that outside lanes in the 400m are also faster. And in the 800m, outside lanes tend to produce slower race times, but the effects remain quite weak statistically.

Table 7. Regression results for coarser lane groupings.

Coeff. (Ind. Var.) 100m 200m 400m 800m
β1 (Lane 3, 4) 0.0105 -0.0700*** -0.0880 -0.0409
(0.0121) (0.0244) (0.0572) (0.1242)
β2 (Lane 5, 6) 0.0002 -0.0806*** -0.1295** 0.1047
(0.0120) (0.0240) (0.0573) (0.1172)
β3 (Lane 7, 8, 9) -0.0064 -0.1257*** -0.0919* 0.1570
(0.0121) (0.0241) (0.0549) (0.1215)
α1 (Wind) -0.0487*** -0.0720***
(0.0041) (0.0081)
α2 (Reaction Time) 0.994*** 1.343** 2.068***
(0.1507) (0.244) (0.384)
α3 (PB) 0.447*** 0.447*** 0.576*** 0.220*
(0.0527) (0.0403) (0.0523) (0.1258)
α4 (SB) 0.397*** 0.4048*** 0.310*** 0.1344*
(0.0557) (0.0443) (0.0574) (0.0757)
α5 (Male) -0.176*** -0.4312*** -0.9738*** -16.45***
(0.0164) (0.0396) (0.1032) (1.926)
α0 (constant) 1.829*** 3.683*** 6.602*** 86.09***
(0.1560) (0.3235) (0.777) (10.90)
N 1534 1551 1469 1344
R2 0.946 0.9579 0.9604 0.9756
Outliers Removed Yes Yes Yes Yes

This table reports the coefficients estimated from model Eq (6), utilizing coarser lane groupings. To ease interpretation of the results, the independent variable associated with each coefficient estimate is highlighted in parentheses. Robust standard errors are reported in parentheses.

, *, **, and *** denote significance at the one-sided 10%, two-sided 10%, 5%, and 1% levels, respectively.

For the last model, instead of utilizing indicator variables for lane assignments I implement a continuous lane variable. In particular, I estimate the following statistical model:

$$Y_{i,j} = \alpha_0 + \beta_1 Z_{i,j} + \alpha_f X_{f,i,j} + \epsilon_{i,j} \qquad (7)$$

where $Z_{i,j}$ is a continuous lane variable taking values from 1 to 9. Of course, the implied assumption here is that lane numbers impact race times in a linear fashion. This further increases statistical power, but, of course, has the undesirable feature of imposing a functional form which may or may not capture the true data generating process. Results of this regression are reported in Table 8.
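The linear specification can be sketched on synthetic data as follows. This is an illustration under stated assumptions, not the paper's estimation code: the data are simulated with a made-up lane slope, only a single control (a stand-in for PB) is included, and the robust standard errors use the HC1 formula, which is one common choice; the text does not say which robust estimator was used.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 1500

# Synthetic data: lane Z in 1..9, one control X (a stand-in for PB), made-up coefficients.
lane = rng.integers(1, 10, size=n).astype(float)
pb = rng.normal(21.0, 0.7, size=n)
true_beta_lane = -0.02
time = 1.0 + true_beta_lane * lane + 0.97 * pb + rng.normal(0.0, 0.2, size=n)

# OLS: columns are constant, lane, and the control.
X = np.column_stack([np.ones(n), lane, pb])
beta_hat, *_ = np.linalg.lstsq(X, time, rcond=None)
resid = time - X @ beta_hat

# Heteroskedasticity-robust (HC1) covariance: sandwich estimator with n/(n-k) scaling.
k = X.shape[1]
XtX_inv = np.linalg.inv(X.T @ X)
meat = X.T @ (X * resid[:, None] ** 2)
cov_hc1 = (n / (n - k)) * XtX_inv @ meat @ XtX_inv
se_hc1 = np.sqrt(np.diag(cov_hc1))

print("lane slope:", beta_hat[1], "robust SE:", se_hc1[1])
```

With a single slope in place of eight lane dummies, all observations contribute to one lane parameter, which is the source of the gain in statistical power described above.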

Table 8. Regression results with lane effects modeled linearly.

Coeff. (Ind. Var.) 100m 200m 400m 800m
β1 (Lane) -0.0021 -0.0189*** -0.0132 0.0214
(0.0018) (0.00349) (0.00822) (0.0176)
α1 (Wind) -0.0487*** -0.0716***
(0.0041) (0.00807)
α2 (Reaction Time) 1.00*** 1.319*** 2.119***
(0.1510) (0.2445) (0.381)
α3 (PB) 0.4453*** 0.4475*** 0.575*** 0.221*
(0.0524) (0.0403) (0.0525) (0.1263)
α4 (SB) 0.3980*** 0.4054*** 0.3106*** 0.1344*
(0.0554) (0.0443) (0.0576) (0.0760)
α5 (Male) -0.177*** -0.4281*** -0.9718*** -16.44***
(0.0164) (0.0395) (0.1033) (1.932)
α0 (constant) 1.85*** 3.680*** 6.566*** 86.00***
(0.1553) (0.3225) (0.7744) (10.92)
N 1534 1551 1469 1344
R2 0.946 0.9579 0.9603 0.9756
Outliers Removed Yes Yes Yes Yes

This table reports the coefficients estimated from model Eq (7) with lane effects modeled linearly. Robust standard errors are reported in parentheses.

, *, **, and *** denote significance at the one-sided 10%, two-sided 10%, 5%, and 1% levels, respectively.

The results in Table 8 are again quite similar to those found from the original model Eq (4). The β1 coefficient is small and highly insignificant in the 100m and it is negative and highly significant in the 200m. While the coefficient is negative in the 400m, and of a similar magnitude to that in the 200m, it is only weakly significant (p-value = 0.109). Finally, the β1 coefficient is positive in the 800m, but also fails significance at standard levels (p-value = 0.22).

In summary, employing these alternative statistical models seems to buttress the initial estimates of lane effects found from estimating model Eq (4). In all cases there is no evidence of lane effects in the 100m and robust evidence that outside lanes are faster in the 200m. With coarser lane groupings the evidence of faster outside lanes in the 400m becomes somewhat stronger, and this aligns with the effects seen in the 200m. And finally, through all the different statistical models the evidence from the 800m suggests outside lanes are slower, but these effects are quite weak statistically.

Discussion

Leveraging a random assignment rule implemented in the first round of IAAF events, this paper provides causal estimates of lane assignments in sprint distance track and field events. I find no evidence of lane advantages in the 100m, which suggests that a runner’s vantage point is inconsequential for their performance. In the 200m, I find robust evidence that outside lanes on the track produce faster race times. This result is consistent with the biomechanical evidence on the impact of tight corners on running speeds. While average race times in outside lanes are also faster in the 400m, the statistical evidence is somewhat weaker than the 200m. But it is important to note that statistical power becomes more of an issue in events with longer race times. Finally, I find some weak evidence that outside lanes in the 800m tend to produce slower race times, which may be a product of the unique lane break feature of the 800m.

There are a number of interesting points worth discussing. The first is the fact that results in the 200m and 400m suggest that the commonly held belief that middle lanes are best is incorrect. Why these seemingly false beliefs persist is an interesting question. One possible interpretation is that in most observations of track and field races, slower athletes are assigned to the periphery lanes of the track. For example, in the IAAF rules, after round 1, athletes are ranked by their round 1 race times and: “Three draws will be made: i) one for the four highest ranked athletes or teams to determine placings in lanes 3, 4, 5 and 6, ii) another for the fifth and sixth ranked athletes or teams to determine placings in lanes 7 and 8, and iii) another for the two lowest ranked athletes or teams to determine placings in lanes 1 and 2.” Thus, the runners assigned to the periphery lanes are the slowest runners in the race. As another example, in the widely used track and field software called Hytek, used in Olympic trials and NCAA championships, the “standard lane preferences” option in the software ranks lanes from most preferred to least preferred as: 4, 5, 3, 6, 2, 7, 1, 8. Again, the slowest runners are assigned to the periphery lanes. Failure to account for this non-random assignment to lanes may reinforce the idea that periphery lanes are slower. While this can possibly explain the persistence of false beliefs about lane advantages, it fails to explain the origin of these lane assignment rules. One possible explanation of the origin of these rules is a technological constraint. I have heard, but unfortunately have not been able to find documentation to support, that assigning faster runners to the middle lanes was done in the hand timing era to make it easier for timers to see runners cross the finish line in an “inverted-V” pattern, with the middle runners crossing first. This would allow hand timers on either side of the track to observe runners in a sequential finish, and make accurate timing easier. This is an interesting possible explanation: a technical constraint was the impetus for lane assignment rules, which themselves led to beliefs that runners in middle lanes perform the best because of a failure to account for non-random assignment to lanes. Beyond addressing the specific question of lane advantages, the results in this paper could also be viewed as an interesting case study in the persistence of false beliefs (e.g. [22, 23]).
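The selection mechanism described above can be illustrated with a small simulation. It is a sketch on made-up numbers, not an analysis of real results: runners have a fixed ability, lanes for the final are allocated by round 1 rank following the quoted three-draw rule, and lanes have no causal effect on times by construction.

```python
import random
from statistics import mean

random.seed(1)


def simulate_finals(n_races: int = 2000):
    """Mean final times by lane group when lanes are assigned by round 1 rank."""
    middle, outer, inner = [], [], []
    for _ in range(n_races):
        ability = [random.gauss(45.0, 0.5) for _ in range(8)]    # hypothetical 400m finalists
        round1 = [a + random.gauss(0.0, 0.2) for a in ability]   # noisy round 1 times
        ranked = sorted(range(8), key=lambda i: round1[i])       # fastest first
        final = [a + random.gauss(0.0, 0.2) for a in ability]    # no lane effect at all
        middle += [final[i] for i in ranked[:4]]   # ranks 1-4 draw lanes 3-6
        outer += [final[i] for i in ranked[4:6]]   # ranks 5-6 draw lanes 7-8
        inner += [final[i] for i in ranked[6:]]    # ranks 7-8 draw lanes 1-2
    return mean(middle), mean(outer), mean(inner)


mean_middle, mean_outer, mean_inner = simulate_finals()
print(mean_middle, mean_outer, mean_inner)
```

In this simulation the middle lanes show the fastest average times and lanes 1 and 2 the slowest, even though no lane has any effect on performance. A naive comparison of averages by lane would therefore appear to confirm that periphery lanes are slow.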

These results are also interesting in the context of the design of lane assignment rules. Lane assignment rules are designed to “reward” the fastest qualifying times with advantageous lanes in later rounds. However, the results here suggest that estimated lane advantages are not consistent with the implied advantages in lane assignment rules. This opens an interesting discussion about fairness and whether these lane assignment rules should be modified. In addition, in track meets that use seed times to assign lanes in the first round of events, there is a question about the fairness of the competition. If the goal of competition is to put all athletes on an even playing field to begin the competition, events that use non-random lane assignments in the first round put some runners at an entrenched disadvantage.

There are a number of ways this work could be expanded. To begin, I have not examined lane advantages in sprint distance hurdle events or relays. Perhaps more interesting given the results from the 200m and 400m is to examine the effect of lane assignments for indoor events, which use a 200m track that has tighter corners than outdoor 400m tracks. It is also possible this methodology could be extended to examine the effect of lane assignments in other sports. If a similar random assignment rule is used at some point in the competition, one could examine this question in swimming, cycling, and speed skating events.

Supporting information

S1 File

(ZIP)

S1 Appendix

(PDF)

Acknowledgments

I thank Erick Gong for helpful discussions as well as the Editor, Roy Cerqueti, and three anonymous reviewers for helpful comments.

Data Availability

All data is publicly available and can be accessed via https://www.worldathletics.org/competitions. A full replication package is included in my submission materials and, in addition, can be located here: https://github.com/dmunro-git/Lane-Advantages.

Funding Statement

The author received no specific funding for this work.

References

  • 1. Morgan F. Are there lane advantages in athletics, swimming, and track cycling?; 2016. http://www.bbc.co.uk/newsbeat/article/37083059/are-there-lane-advantages-in-athletics-swimming-and-track-cycling.
  • 2. Boylan P. How do lane assignments and starting spots work in track?; 2016. https://www.sbnation.com/2016/8/15/12486250/rio-2106-track-athletics-lane-staggered-start-400-record-wayde-van-niekerk.
  • 3. Strashin J. Andre De Grasse wins bronze medal in Olympic men’s 100m; 2021. https://www.cbc.ca/sports/olympics/summer/trackandfield/track/olympics-track-and-field-100-metre-august-1-1.6126000.
  • 4. Taboga P, Kram R, Grabowski AM. Maximum-speed curve-running biomechanics of sprinters with and without unilateral leg amputations. Journal of Experimental Biology. 2016;219(6):851–858. doi: 10.1242/jeb.133488
  • 5. Churchill SM, Trewartha G, Salo AI. Bend sprinting performance: new insights into the effect of running lane. Sports Biomechanics. 2019;18(4):437–447. doi: 10.1080/14763141.2018.1427279
  • 6. Hanley B, Casado A, Renfree A. Lane and heat draw have little effect on placings and progression in Olympic and IAAF World Championship 800 m running. Frontiers in Sports and Active Living. 2019;1:19. doi: 10.3389/fspor.2019.00019
  • 7. Berger J, Pope D. Can losing lead to winning? Management Science. 2011;57(5):817–827. doi: 10.1287/mnsc.1110.1328
  • 8. Klein Teeselink B, van den Assem MJ, van Dolder D. Does losing lead to winning? An empirical analysis for four sports. Management Science. 2022. doi: 10.1287/mnsc.2022.4372
  • 9. Pope DG, Schweitzer ME. Is Tiger Woods loss averse? Persistent bias in the face of experience, competition, and high stakes. American Economic Review. 2011;101(1):129–57. doi: 10.1257/aer.101.1.129
  • 10. Hossain T, List JA. The behavioralist visits the factory: Increasing productivity using simple framing manipulations. Management Science. 2012;58(12):2151–2167. doi: 10.1287/mnsc.1120.1544
  • 11. Quinn MD. The effect of track geometry on 200-and 400-m sprint running performance. Journal of Sports Sciences. 2009;27(1):19–25. doi: 10.1080/02640410802392707
  • 12. Morton RH. Statistical effects of lane allocation on times in running races. Journal of the Royal Statistical Society: Series D (The Statistician). 1997;46(1):101–104. doi: 10.1111/1467-9884.00063
  • 13. World Athletics. Competitions; 2020. https://www.worldathletics.org/competitions/.
  • 14. Munro D. Replication Package; 2022. https://github.com/dmunro-git/Lane-Advantages.
  • 15. Angrist JD, Pischke JS. Mostly harmless econometrics: An empiricist’s companion. Princeton University Press; 2009.
  • 16. World Athletics. Competition Rules 2018-2019; 2018.
  • 17. Leppink J, Winston K, O’Sullivan P. Statistical significance does not imply a real effect. Perspectives on Medical Education. 2016;5(2):122–124. doi: 10.1007/s40037-016-0256-6
  • 18. Gelman A, Carlin J. Beyond power calculations: Assessing type S (sign) and type M (magnitude) errors. Perspectives on Psychological Science. 2014;9(6):641–651. doi: 10.1177/1745691614551642
  • 19. White H. A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity. Econometrica. 1980; p. 817–838. doi: 10.2307/1912934
  • 20. Bloom HS. Minimum detectable effects: A simple way to report the statistical power of experimental designs. Evaluation Review. 1995;19(5):547–556. doi: 10.1177/0193841X9501900504
  • 21. McKenzie D, Ozier O. Why ex-post power using estimated effect sizes is bad, but an ex-post MDE is not. World Bank Development Impact Blog. 2019.
  • 22. Laney C, Fowler NB, Nelson KJ, Bernstein DM, Loftus EF. The persistence of false beliefs. Acta Psychologica. 2008;129(1):190–197. doi: 10.1016/j.actpsy.2008.05.010
  • 23. Nunn N, Sanchez de la Sierra R. Why being wrong can be right: Magical warfare technologies and the persistence of false beliefs. American Economic Review. 2017;107(5):582–87. doi: 10.1257/aer.p20171091

Decision Letter 0

Roy Cerqueti

13 Jan 2022

PONE-D-21-35820

Are there lane advantages in track and field?

PLOS ONE

Dear Dr. Munro,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we have decided that your manuscript does not meet our criteria for publication and must therefore be rejected.

I am sorry that we cannot be more positive on this occasion, but hope that you appreciate the reasons for this decision.

Yours sincerely,

Roy Cerqueti, Ph.D.

Academic Editor

PLOS ONE

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: No

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: No

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: No

Reviewer #2: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: Overall, this paper is interesting and mostly well executed. However, it sometimes lacks in motivation and a deeper analysis that tries to establish the root causes for the (non-)findings, which could be done with the present data. There are also a few issues in the presentation, with the potential for improvement.

Motivation

- The main scientific relevance of the research question – as it is presented in the intro – currently rests with how much one trusts the claims in the cited literature on biomechanics and physical factors. The author should make the significance of this literature more clear if this is intended.

- If not, the author should make the other motivation for this paper – and the results’ potential wider implications - more clear.

- This applies in particular to the motivational/psychological factors at play.

Physics, psychology and measurement

- Given that not all nine lanes are always filled and that each lane has its own average curvature (e.g., measured by radians covered per meter run and its concentration), could one not construct a related variable which measures and tests the biomechanics argument directly (and continuously)?

- It should be made clear whether there are potentially opposing effects from other factors. For example, Berger and Pope (2011) show that “being behind” spurs effort. People on inner tracks, if randomly assigned, are on average more often (literally) behind their opponents. Could this be an explanation for null effects? (See also Teeselink, van den Assem and van Dolder, 2021, for an opposing account.)

- By constructing appropriate measures (e.g., whether someone was on the outmost lane and could not see competitors vs accounting for lane curvature which might not co-vary perfectly if not all lanes are filled) one could account for – and test - different theories.

- I was kind of surprised that wind but not wind shadow created by opponents – where lane assignment is probably relevant – was discussed.

- What if the average skill level of athletes in the first round of these tournaments is not high enough for the finer features of the biomechanics and other subtle factors affecting performance to be decisive? Maybe even the first round is highly challenging to get into and only achievable for pros, but without further contextual info that’s hard to judge.

Econometric specification and presentation

- Are 8 lane dummies really the best specification, especially when there are concerns about statistical power? If effects are expected to be monotone across lanes, why not a continuous (linear or quadratic) function?

- If there is a good reason not to use a continuous specification, why can lanes not be grouped (e.g., into 1&2, 3&4, 5&6, 7&8) to achieve more power per coefficient?

- Reading essentially the same specification and table five times, each with a different data set, is hard and makes it difficult to compare findings. Why not present only the 100m table as an example in the main text and then depict the (relevant) lane coefficients and their SEs graphically (e.g., lanes on the x-axis, normalised coefficients on the y-axis with SEs and a line connecting them)? This graphical presentation could then be added for all other distances to the same graph (e.g. by stacking the connecting lines with the coefficients). All other tables and the randomness check can then go to an appendix and the paper would be much more comprehensible. (Ideally the graph would also depict a normalised interval for MDE for each line.)

Other (some minor, some not) issues

- “was put in place during in the 1985-86 rules under rule 141.11” reads as if the rule was only in place in the 1985/1986 season, while elsewhere we find “IAAF World Championships and U20 World Championships from 2000 to 2019”. I suggest clarifying this (also footnote 3, which is hard to read).

- In general, I would advise to describe a bit more clearly how/when/how long the random assignment was introduced. Right now it reads just like it was and then a bunch of technical details. Maybe that can be presented more story-like.

- I would also advise to refer to measures in the flow of the text not by their variable names but by what they describe in order to improve readability.

- The variable SB is first described as “SB is the runner’s season best race time”, then as “assigned lanes based on prior race results (proxied here by SB)”. Am I correct that the author claims that results prior to the race can be proxied by SB, i.e., the best result across the whole season?

- The sum in the regression equation should index over only 8 dummies, not 9, as one lane is the baseline.

- I did not find the data in the paper or an appendix, except for a link to a sports website. I would expect the authors to share, with the manuscript, i) the original dataset used, including any outliers or incomplete data dropped for the actual analysis, ii) a short description on how it was generated, iii) the script(s) used to analyse and pre-process the data and to generate all tables/figures.

Reviewer #2: The topic of the paper is undoubtedly interesting and appealing. However, I have very serious concerns on the validity of the results presented in the paper due to the very poor and inaccurate description of the applied statistical methodology. Following, the details of the review are divided into major and minor comments.

Major comments:

1) The general description of the statistical methodology applied in the paper is completely missing. This issue does not allow one to appropriately evaluate the validity of the results reported in the paper. The author should carefully describe the applied statistical methodology in detail in a separate section, reporting in a rigorous way the main theory (formulas, assumptions, and the corresponding references). Furthermore, the specifications of the statistical models in formulas (1) and (2) are completely inaccurate; for instance, subscripts are missing in these formulas, the random components are just reported as “error” rather than through the well-known statistical notation, and the models’ assumptions are completely missing.

2) My main concern relates to the validity of the results reported in the paper. Since the description of the statistical methodology is completely missing, from what I see, it seems that the author applies a linear regression model? At the same time, the author claims throughout the manuscript to estimate “the causal effect of lane assignments on race times”. However, it is well-known that we cannot speak about a causal effect when considering the “classical” linear regression model. There are several specific approaches for causal inference, such as the potential outcomes framework, causal graphs and similar; however, nothing is mentioned in the paper on this point. The issue mentioned above is of crucial importance for the validity and interpretation of the results reported in this paper, and it should be carefully justified and explained in detail.

3) Statistical model diagnostics are completely missing. They should be performed and reported in order to appropriately evaluate the estimated statistical models.

4) Tables no.1-no.8: in all the tables, the estimated coefficients are reported incorrectly as Lane 1, Lane 3, Wind, etc. For example, the author should write $\hat{\beta}_1$ rather than Lane 1, $\hat{\beta}_3$ rather than Lane 3, $\hat{\alpha}_1$ rather than Wind, and so on. Moreover, $R^2$ is erroneously reported as “R ^ 2”, and furthermore it should be discussed.

Minor comments:

1) To the best of my knowledge, the PLOS One Guidelines for authors require that the references in the text are reported by numbers, and footnotes are not permitted. Please, correct.

**********

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

PLoS One. 2022 Aug 3;17(8):e0271670. doi: 10.1371/journal.pone.0271670.r002

Author response to Decision Letter 0


3 Feb 2022

Response to reviewer comments:

Reviewer 1:

Overall, this paper is interesting and mostly well executed. However, it sometimes lacks in motivation and a deeper analysis that tries to establish the root causes for the (non-)findings, which could be done with the present data. There are also a few issues in the presentation, with the potential for improvement.

Motivation

- The main scientific relevance of the research question – as it is presented in the intro – currently rests with how much one trusts the claims in the cited literature on biomechanics and physical factors. The author should make the significance of this literature more clear if this is intended.

- If not, the author should make the other motivation for this paper – and the results’ potential wider implications - more clear.

- This applies in particular to the motivational/psychological factors at play.

Response: Thanks for these suggestions. I have added more discussion in the introduction of the relevance of the paper for the literature examining the performance effects of motivational/psychological factors.

Physics, psychology and measurement

- Given that not all nine lanes are always filled and that each lane has its own average curvature (e.g., measured by radians covered per meter run and its concentration), could one not construct a related variable which measures and tests the biomechanics argument directly (and continuously)?

Response: This is interesting. From the biomechanics literature there is no single “mechanism” that would slow runners down in tighter corners. Some ideas are that tight corners increase step frequency, lower foot force production, create asymmetries between legs, etc. Furthermore, while one can model the geometry of the track mathematically, to my knowledge there are no mathematical models relating this geometry to biomechanical factors. Without such a theory it’s hard to know how to treat the effect of curvature. For example, is the curvature effect linear, non-linear, etc.? What I like about the baseline regression specification is that it’s agnostic about the lane-specific treatment effects. In relation to one of your points below, I have added two more specifications to the paper to explore the robustness of the results to other specifications.

- It should be made clear whether there are potentially opposing effects from other factors. For example, Berger and Pope (2011) show that “being behind” spurs effort. People on inner tracks, if randomly assigned, are on average more often (literally) behind their opponents. Could this be an explanation for null effects? (See also Teeselink, van den Assem and van Dolder, 2021, for an opposing account.)

Response: See next response.

- By constructing appropriate measures (e.g., whether someone was on the outmost lane and could not see competitors vs accounting for lane curvature which might not co-vary perfectly if not all lanes are filled) one could account for – and test - different theories.

Response: Thanks for these great comments. The evidence from the baseline specification is simply showing the net impact of being in a specific lane. I think this is ultimately what we care about in the track and field context, i.e. “are the middle lanes best?” But there is a subtle point you raise that the “being behind spurs effort” channel could still be active, but it’s not strong enough to make the outside lanes slower. This question is more relevant for the motivational/psychological factors you highlight. Following your suggestion, I’ve added a separate analysis where I include an indicator variable for the outmost lane (which, as you note, is not perfectly correlated with lane number). I pursue this additional analysis in the 200 and 400m, where the staggered starts/effort effects would be most noticeable and find some evidence that the outermost lanes in the 200m do slow runners down. I find no statistical effect of the outermost lane in the 400m, which could be a product of noisier data. Anyway, thanks for these suggestions, I think they have really strengthened the paper.
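The point that the outermost-lane indicator is not perfectly correlated with lane number can be illustrated with a small sketch. It assumes each observation records a heat identifier and a lane number; the field names `heat`, `lane`, and `outermost` are hypothetical and are not taken from the replication package.

```python
from collections import defaultdict

def add_outermost_indicator(rows):
    """Flag the runner in the highest occupied lane of each heat.

    `rows` is a list of dicts with hypothetical keys "heat" and "lane".
    Returns new dicts with an extra 0/1 key "outermost".
    """
    # First pass: find the highest occupied lane in every heat.
    max_lane = defaultdict(int)
    for r in rows:
        max_lane[r["heat"]] = max(max_lane[r["heat"]], r["lane"])
    # Second pass: a runner is "outermost" if no one in the heat is outside them.
    return [{**r, "outermost": int(r["lane"] == max_lane[r["heat"]])} for r in rows]

# Illustrative data only: heat A leaves lane 9 empty, heat B fills it.
rows = [
    {"heat": "A", "lane": 2}, {"heat": "A", "lane": 5}, {"heat": "A", "lane": 8},
    {"heat": "B", "lane": 3}, {"heat": "B", "lane": 6}, {"heat": "B", "lane": 9},
]
flagged = add_outermost_indicator(rows)
```

In this made-up example the lane 8 runner in heat A is flagged even though lane 9 exists on the track, which is why the indicator carries information beyond the lane number itself.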

- I was kind of surprised that wind but not wind shadow created by opponents – where lane assignment is probably relevant – was discussed.

Response: Thanks for this comment. I’ve never encountered it anecdotally from my participation in the sport and I searched for any discussion in track and field forums about wind shadows in sprint events and was unable to find anything (it’s certainly a factor in long-distance events where runners draft, but I couldn’t find anything in relation to sprinting). So, it seems like it’s not a common thing that is highlighted in relation to lane assignments. I worry about adding discussion about this since it doesn’t seem to be widely discussed, so I have left it out.

- What if the average skill level of athletes in the first round of these tournaments is not high enough for the finer features of the biomechanics and other subtle factors affecting performance to be decisive? Maybe even the first round is highly challenging to get into and only achievable for pros, but without further contextual info that’s hard to judge.

Response: I think this is an interesting question. My off-the-cuff response is that these are the world championships, so clearly these are the best runners in the world. I’ve tried to think of different ways to provide more contextual info, but nothing obvious came to mind... (e.g. look at average season’s bests of this group of runners, but compare it to who?) Being the world championships doesn’t necessarily rule out that these lane effects are only/more salient for the cream of the crop, but I don’t know if there is a way to assess that question reliably. Ultimately, exploiting the random assignment feature is important for obvious reasons, and this feature doesn’t exist in the more elite rounds (e.g. semis/finals). I don’t disagree with the possibility that lane effects would be more relevant for more elite athletes, but without a reliable way to empirically assess the question it feels very conjecture-y, so I have left this discussion out.

Econometric specification and presentation

- Are 8 lane dummies really the best specification, especially when there are concerns about statistical power? If effects are expected to be monotone across lanes, why not a continuous (linear or quadratic) function?

Response: See below.

- If there is a good reason not to use a continuous specification, why can lanes not be grouped (e.g., into 1&2, 3&4, 5&6, 7&8) to achieve more power per coefficient?

Response: Thanks for these comments. In the baseline analysis I chose dummies for each lane to be as agnostic as possible regarding the functional form of any lane effects. As additional robustness checks, and to improve statistical power, I have added additional results where I group 1&2, 3&4, 5&6, 7&8&9 and where I treat lanes as a continuous variable. The general conclusions from the original baseline specification are also reflected in these additional results.
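The three ways of coding the lane variable described in this response can be sketched as follows. This is an illustration under stated assumptions, not the code from the replication package: the function names are hypothetical, and the choice of lane 5 as the omitted baseline is arbitrary because the baseline lane is not stated here.

```python
def lane_dummies(lane, baseline=5, n_lanes=9):
    """Baseline coding: one 0/1 indicator per lane, omitting the baseline lane.

    With nine lanes this yields eight indicators. The default baseline is an
    assumption made for this sketch.
    """
    return [int(lane == j) for j in range(1, n_lanes + 1) if j != baseline]

def lane_group(lane):
    """Grouped coding: lanes 1&2, 3&4, 5&6, 7&8&9, as described in the response."""
    groups = {1: "1-2", 2: "1-2", 3: "3-4", 4: "3-4",
              5: "5-6", 6: "5-6", 7: "7-9", 8: "7-9", 9: "7-9"}
    return groups[lane]

def lane_continuous(lane):
    """Continuous coding: the lane number enters the regression as a single regressor."""
    return float(lane)

# The same runner in lane 8 under each coding.
example = (lane_dummies(8), lane_group(8), lane_continuous(8))
```

The dummy coding imposes no shape on the lane effects, the grouped coding trades resolution for more observations per coefficient, and the continuous coding assumes the effect changes by a constant amount per lane.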

- Reading essentially the same specification and table five times, each with a different data set, is hard and makes it difficult to compare findings. Why not present only the 100m table as an example in the main text and then depict the (relevant) lane coefficients and their SEs graphically (e.g., lanes on the x-axis, normalised coefficients on the y-axis with SEs and a line connecting them)? This graphical presentation could then be added for all other distances to the same graph (e.g. by stacking the connecting lines with the coefficients). All other tables and the randomness check can then go to an appendix and the paper would be much more comprehensible. (Ideally the graph would also depict a normalised interval for MDE for each line.)

Response: I agree that there are a lot of tables to look at. I’ve moved the randomness checks for the 200, 400, and 800m to the appendix. I’ve also added a graphical depiction of the main results. Stacking everything in one graph became visually very cluttered. So, what I’ve done is to graphically display the estimated lane effects (from pooling men and women and excluding outliers) together with 95% confidence intervals in side-by-side figures for the 100, 200, 400 and 800m. I think this graphical display helps to interpret the results. I’ve left the regression result tables in the main text as I refer to them frequently in the text and it would be burdensome for readers to flip back and forth between the appendix.
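As a reminder of how an interval of this kind is usually formed from a coefficient and its standard error, here is a minimal sketch. It assumes a normal approximation with critical value 1.96, which may differ from the exact procedure used for the figures; the function name and the numbers are illustrative and are not estimates from the paper.

```python
def confidence_interval_95(estimate, std_error, z=1.96):
    """Return (lower, upper) for a coefficient under a normal approximation."""
    half_width = z * std_error
    return (estimate - half_width, estimate + half_width)

# Hypothetical lane coefficient (seconds) and standard error.
lower, upper = confidence_interval_95(-0.05, 0.02)
# An interval that excludes zero corresponds to significance at the 5% level.
excludes_zero = lower > 0 or upper < 0
```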

Other (some minor, some not) issues

- “was put in place during in the 1985-86 rules under rule 141.11” reads as if the rule was only in place in the 1985/1986 season, while elsewhere we find “IAAF World Championships and U20 World Championships from 2000 to 2019”. I suggest clarifying this (also footnote 3, which is hard to read).

- In general, I would advise to describe a bit more clearly how/when/how long the random assignment was introduced. Right now it reads just like it was and then a bunch of technical details. Maybe that can be presented more story-like.

Response: I have re-worded the section to emphasize that the random assignment rule was initiated in the 1985-86 season and remains in place today. I wish I had more of a “rule history” (i.e. why were rules changed, etc.) to rely on to tell a story here, but unfortunately I’ve been unable to find that information anywhere.

- I would also advise to refer to measures in the flow of the text not by their variable names but by what they describe in order to improve readability.

Response: I’ve done this where appropriate.

- The variable SB is first described as “SB is the runner’s season best race time”, then as “assigned lanes based on prior race results (proxied here by SB)”. Am I correct that the author claims that results prior to the race can be proxied by SB, i.e., the best result across the whole season?

Response: I have re-written this section to help clarify.

- The sum in the regression equation should index over only 8 dummies, not 9, as one lane is the baseline.

Response: Thanks for pointing this out; I have made this change.
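For concreteness, the corrected indexing can be sketched as below. The notation is an assumption made for illustration and is not copied from the manuscript: $T_i$ is runner $i$'s race time, $\mathrm{Lane}_{ij}$ is a 0/1 indicator that runner $i$ ran in lane $j$, $b$ is the omitted baseline lane, $X_i$ collects the other covariates (such as wind), and $\varepsilon_i$ is the error term.

```latex
T_i = \alpha + \sum_{\substack{j = 1 \\ j \neq b}}^{9} \beta_j \, \mathrm{Lane}_{ij} + X_i'\gamma + \varepsilon_i
```

With nine lanes and one baseline, the sum has eight terms, and each $\beta_j$ is read as the average difference in race time relative to lane $b$. Including all nine indicators alongside the intercept would make the regressors perfectly collinear.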

- I did not find the data in the paper or an appendix, except for a link to a sports website. I would expect the authors to share, with the manuscript, i) the original dataset used, including any outliers or incomplete data dropped for the actual analysis, ii) a short description on how it was generated, iii) the script(s) used to analyse and pre-process the data and to generate all tables/figures.

Response: My apologies for this. I’m used to providing replication packages during the publication process, not prior. While the data is publicly available in uncompiled form from the IAAF, I have now included a link to a replication package with the compiled data.

Reviewer 2:

The topic of the paper is undoubtedly interesting and appealing. However, I have very serious concerns on the validity of the results presented in the paper due to the very poor and inaccurate description of the applied statistical methodology. Following, the details of the review are divided into major and minor comments.

Major comments:

1) The general description of the statistical methodology applied in the paper is completely missing. This issue does not allow one to appropriately evaluate the validity of the results reported in the paper. The author should carefully describe the applied statistical methodology in detail in a separate section, reporting in a rigorous way the main theory (formulas, assumptions, and the corresponding references). Furthermore, the specifications of the statistical models in formulas (1) and (2) are completely inaccurate; for instance, subscripts are missing in these formulas, the random components are just reported as “error” rather than through the well-known statistical notation, and the models’ assumptions are completely missing.

Response: My apologies for this omission. In my field, it would be seen as unnecessary/obvious to discuss the theory/assumptions underlying the regression analysis. With random assignment to treatment, the analysis is quite straightforward and amounts to reporting average treatment effects. To be as agnostic as possible about the structure of these effects, I estimate them using dummy (0,1) variables for lanes in a regression that controls for the other covariates. I have added some discussion of this to clarify.

2) My main concern relates to the validity of the results reported in the paper. Since the description of the statistical methodology is completely missing, from what I see, it seems that the author applies a linear regression model? At the same time, the author claims throughout the manuscript to estimate “the causal effect of lane assignments on race times”. However, it is well-known that we cannot speak about a causal effect when considering the “classical” linear regression model. There are several specific approaches for causal inference, such as the potential outcomes framework, causal graphs and similar; however, nothing is mentioned in the paper on this point. The issue mentioned above is of crucial importance for the validity and interpretation of the results reported in this paper, and it should be carefully justified and explained in detail.

Response: Respectfully, “However, it is well-known that we cannot speak about a causal effect when considering the “classical” linear regression model” is simply incorrect. There is nothing inherently problematic about using linear regression to estimate causal effects. The issue is whether the assumptions of OLS are met or not. In non-experimental data, it is unlikely that the assumptions are met. But the whole point of the paper is to leverage random assignment to ensure the exogeneity assumption is met. There is nothing wrong with using OLS to estimate average treatment effects with random assignment.
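The argument can be illustrated with a small simulation. This is a sketch with made-up parameters, not the paper's data or code: it uses a single binary treatment (standing in for one lane versus the baseline) and relies on the fact that, with one binary regressor and an intercept, the OLS slope equals the difference in group means.

```python
import random
import statistics

def simulate_lane_effect(n=20000, true_effect=-0.10, seed=0):
    """Estimate a treatment effect under random assignment.

    All numbers are hypothetical. `ability` is never observed by the
    estimator; random assignment makes it independent of treatment.
    """
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        ability = rng.gauss(11.0, 0.5)   # unobserved baseline time, seconds
        in_lane = rng.random() < 0.5     # coin-flip assignment
        time = ability + (true_effect if in_lane else 0.0) + rng.gauss(0.0, 0.1)
        (treated if in_lane else control).append(time)
    # Difference in means = OLS coefficient on the treatment dummy.
    return statistics.mean(treated) - statistics.mean(control)

estimate = simulate_lane_effect()
```

Because assignment is independent of ability, the estimate lands close to the true effect even though ability is omitted; if faster runners were instead steered toward particular lanes, the same calculation would be biased.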

3) Statistical model diagnostics are completely missing. They should be performed and reported in order to appropriately evaluate the estimated statistical models.

Response: As is now clarified in the paper, there is no functional form assumed in estimating the average treatment effects of lanes. As a result, in terms of model diagnostics, there is nothing to test regarding the functional form or homoscedasticity for the main variables of interest (lane effects). In response to the other reviewer’s suggestions, I have also added results estimated from two different regression specifications.

4) Tables no.1-no.8: in all the tables, the estimated coefficients are reported incorrectly as Lane 1, Lane 3, Wind, etc. For example, the author should write $\hat{\beta}_1$ rather than Lane 1, $\hat{\beta}_3$ rather than Lane 3, $\hat{\alpha}_1$ rather than Wind, and so on. Moreover, $R^2$ is erroneously reported as “R ^ 2”, and furthermore it should be discussed.

Response: Thanks for this comment. I have raised “2” to a superscript in “R^2”. For readability’s sake, I feel “Lane 1” is preferable to “\beta_1”. It seems very straightforward to understand that “Lane 1” in these tables refers to the coefficient estimate for Lane 1.

Minor comments:

1) To the best of my knowledge, the PLOS One Guidelines for authors require that the references in the text are reported by numbers, and footnotes are not permitted. Please, correct.

Response: Thank you for pointing this out; I have made these changes.

Attachment

Submitted filename: PLOS ONE Response to reviewer comments.pdf

Decision Letter 1

Roy Cerqueti

14 Apr 2022

PONE-D-21-35820R1

Are there lane advantages in track and field?

PLOS ONE

Dear Dr. Munro,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by May 29 2022 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Roy Cerqueti, Ph.D.

Academic Editor

PLOS ONE

Journal requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf.

2. Please update your submission to use the PLOS LaTeX template. The template and more information on our requirements for LaTeX submissions can be found at http://journals.plos.org/plosone/s/latex.

3. We note that you have stated that you will provide repository information for your data at acceptance. Should your manuscript be accepted for publication, we will hold it until you provide the relevant accession numbers or DOIs necessary to access your data. If you wish to make changes to your Data Availability statement, please describe these changes in your cover letter and we will update your Data Availability statement to reflect the information you provide.

Additional Editor Comments (if provided):

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #2: (No Response)

Reviewer #3: (No Response)

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #2: No

Reviewer #3: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #2: No

Reviewer #3: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #2: Yes

Reviewer #3: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #2: Yes

Reviewer #3: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #2: As already stated in my previous report, the topic of the paper is undoubtedly interesting and appealing. However, the concerns raised in my previous report remain, mainly because almost all of the major comments/suggestions I made to improve the paper have not been taken into account. The statistical methodology applied in the paper is still not clearly described in a general and rigorous way; the statistical models are not reported accurately, for instance formulas (1) and (2). From some of the author’s clarifications, I was able to understand that the author deals with a causal inference framework, but a general and clear description of it is missing, as is the relevant literature in this field. The details of the review are reported below as major comments.

Major comments:

1. The general description of the statistical methodology applied in the paper is still completely missing. This causes a misunderstanding of the applied statistical methodology, and it does not allow the validity of the results reported in the paper to be appropriately evaluated. From what I was able to understand from the author’s corrections to the manuscript, and from some of the responses, the author deals with a causal inference framework rather than with a classical linear regression model. Depending on the field in which one works, in my opinion, the causal inference framework the author deals with is anything but “unnecessarily/obvious to discuss the theory/assumptions underlying the regression analysis.” I also wish to point out that it is not just a regression analysis, but a causal inference framework, which is something different from classical linear regression modelling. I still suggest that the author carefully describe it in detail in a separate section, reporting in a rigorous way the main theory (formulas, assumptions, and also the most relevant literature in the causal inference framework, which is completely missing). Regarding the statistical models in formulas (1) and (2), they are still reported inaccurately: Time_{i,j} to indicate the response variable? Statistical models should be reported accurately in general, for instance using y_ij for the response variable. What about the subscripts: i=1,…?, j=1,…? Also, some statistical terminology: for instance, why use just “specification” throughout the paper to refer to a statistical model?

2. Again, it is well known that we cannot speak about a causal effect when considering the “classical” linear regression model. Correlation does not imply causality! Using OLS regression to estimate average treatment effects with random assignment is a causal inference framework, not “classical” linear regression modelling. This is also why I kindly invite the author to briefly describe, in a rigorous way, the applied statistical methodology, in order to make the paper clear for potential readers.

3. Now that it is clear that the author uses a causal inference framework, and not just classical linear regression modelling, I have a question. More precisely, the random assignment assumption is crucial for the validity of the results. The author checks this assumption through the statistical model in formula (2): do you think that this is enough to confirm it, and why? What about existing approaches in the literature to deal with this issue; for instance, just to mention one (i.e., not limited to), the propensity score approach? Furthermore, in Section 3.1, in the sentence “…only lane 1 has a statistically significant relationship with SB, which, again, is likely an effect of the low number of observations.” Why should it be due to “an effect of the low number of observations”?

4. Regarding the results for the “pooled” data: why a covariate for gender is not included in the statistical models? How results change if you include also “gender” as a covariate in an appropriate way?

5. Again, in my opinion, suitable models diagnostics should be performed in order to appropriately evaluate the estimated statistical models. They are not just limited to evaluate the “functional form”. Just a clarification: the homoscedasticity assumption relates to the error component, and not to the “main variables of interest (lane effects)”.

6. Again, Tables no.1-no.8: in all the tables, the estimated coefficients are reported incorrectly as Lane 1, Lane 3, Wind, etc. For example, the author should write \hat{\beta}_1 rather than Lane 1, \hat{\beta}_3 rather than Lane 3, \hat{\alpha}_1 rather than Wind, and so on. It could seem very straightforward to understand, but the problem is that this is not correct, and it’s an error.

Reviewer #3: The paper focuses on the common belief that some lanes on the track, in particular the middle ones such as 3-6, are advantageous with respect to the others. Using a sample of randomly assigned lanes in the first round of events, the author finds no evidence supporting the common belief and, in some cases, even evidence to the contrary. The work concludes that the common belief is a folk tale.

General comments

The paper answers an interesting and “fanciful” question about an alleged benefit in racing in the middle lanes. The paper is well motivated and the results are likely to have a great echo in non-academic fields. The econometric analysis is correctly conducted, and the results obtained seem sound. Notwithstanding, there are some points that deserve to be refined. In particular:

1) notation of eq (1) and (2). As far as I’ve understood, the dummies L_k represent Lane k for runner i in heat j and their value is supposed to change according to the runner and to the heat. Therefore, they should also report the subscript ij and the summation is up to k-1 given the collinearity problem.

2) When the pooled sample is used, a gender dummy should be included.

3) Quoting from page 7 “only lane 1 has a statistically significant relationship with SB, which, again, is likely an effect of the low number of observations”. It is difficult to understand this claim, it seems the other way round. When the number of observations increases, the standard error (SE) decreases, and consequently the t-stat increases as t=beta/SE(beta), therefore when the number of observations is low one over accepts the null of non significant statistical effects, i.e. weak power. I think the claim should be revised and another piece of explanation for that significance should be put forth.

4) Power problems. I wonder whether there are other tests that can be implemented to analyse whether the results are driven by low power or whether they can be read as a pure lack of statistical significance. I am not an expert in this specific field, but one idea could be to adapt the tests proposed by Cattaneo, Titiunik and Vazquez-Bare (2019) to the case under scrutiny.

Minor issues

The references must be reported according to the common practice followed in the literature. In the current version, no contribution reports the year.

References

Cattaneo M-D., Titiunik R., and Vazquez-Bare G., 2019. Power calculations for regression-discontinuity designs. The Stata Journal, 19(1): 210-245.

**********

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #2: No

Reviewer #3: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

Attachment

Submitted filename: Referee Report_PONE-D-21-35820.pdf

PLoS One. 2022 Aug 3;17(8):e0271670. doi: 10.1371/journal.pone.0271670.r004

Author response to Decision Letter 1


18 May 2022

Response to Reviewers:

Reviewer #1

Thank you for your positive assessment of my original manuscript and helpful comments. I hope my responses and changes to the manuscript were satisfactory.

Reviewer #3

Thank you for your time and effort in reviewing my manuscript along with your helpful suggestions. Below I’ve responded to each of your comments and detailed the changes made to the paper in bold below each.

The paper answers an interesting and “fanciful” question about an alleged benefit in racing in the middle lanes. The paper is well motivated and the results are likely to have a great echo in non-academic fields. The econometric analysis is correctly conducted, and the results obtained seem sound. Notwithstanding, there are some points that deserve to be refined. In particular:

1) notation of eq (1) and (2). As far as I’ve understood, the dummies L_k represent Lane k for runner i in heat j and their value is supposed to change according to the runner and to the heat. Therefore, they should also report the subscript ij and the summation is up to k-1 given the collinearity problem.

Response: Thank you for pointing this out, I have made these suggested changes. I modified the summation up to n-1, as the notation should be distinct from the index variable k.

2) When the pooled sample is used, a gender dummy should be included.

Response: Thanks for this suggestion. A gender dummy wasn’t included originally because Personal and Seasonal Bests should pick up a lot of the gender differences. But there certainly could be added information controlled for with a gender dummy, so I have included it in the pooled regressions. The general conclusions from the regressions don’t change. If anything, the dummy tightens up some of the standard errors in the pooled regressions, and some of the results are slightly stronger. Again, thanks for this suggestion.

3) Quoting from page 7 “only lane 1 has a statistically significant relationship with SB, which, again, is likely an effect of the low number of observations”. It is difficult to understand this claim, it seems the other way round. When the number of observations increases, the standard error (SE) decreases, and consequently the t-stat increases as t=beta/SE(beta), therefore when the number of observations is low one over accepts the null of non significant statistical effects, i.e. weak power. I think the claim should be revised and another piece of explanation for that significance should be put forth.

Response: Thanks for these comments. The idea is that with low N, a Type-I error (erroneously rejecting the null of no effect) is more likely. I have added a note about this and a reference to Leppink et al. 2016. The pooled results for the randomization checks have changed to some degree with the gender dummy, so lane 1 is no longer significant. In the 100m, only lane 4 is significant; however, with multiple lanes/hypothesis tests, a more appropriate test is to examine whether the lane assignments are jointly significant. As such, to strengthen these randomization checks, I have also added F-tests to examine the joint significance of the lanes, and by and large they fail significance at the 5% level. Only in the women’s 100m are the F-tests statistically significant, but they fail 5% significance when the data are pooled with the men’s results. In the 200m, 400m and 800m all the F-test results are highly insignificant. These results suggest that, collectively, the lane assignments are unrelated to the prior performance of runners, i.e. randomly assigned.
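The logic of a joint randomization check can be sketched in a few lines. This is a minimal illustration on simulated data, not the paper's code or data: it computes a one-way F statistic of Season's Best across lanes and, as a swapped-in technique, obtains the p-value by permutation rather than from the F distribution with robust standard errors used in the paper. All names (`f_statistic`, `permutation_p_value`, `season_best`) and all numbers are hypothetical.

```python
import random

def f_statistic(values, groups):
    """One-way F statistic: between-lane over within-lane mean square."""
    by_group = {}
    for v, g in zip(values, groups):
        by_group.setdefault(g, []).append(v)
    n, k = len(values), len(by_group)
    grand_mean = sum(values) / n
    ssb = ssw = 0.0
    for vs in by_group.values():
        mean = sum(vs) / len(vs)
        ssb += len(vs) * (mean - grand_mean) ** 2
        ssw += sum((v - mean) ** 2 for v in vs)
    return (ssb / (k - 1)) / (ssw / (n - k))

def permutation_p_value(values, groups, draws=1000, seed=0):
    """Share of random relabelings whose F is at least the observed F."""
    rng = random.Random(seed)
    observed = f_statistic(values, groups)
    shuffled = list(groups)
    exceed = 0
    for _ in range(draws):
        rng.shuffle(shuffled)
        if f_statistic(values, shuffled) >= observed:
            exceed += 1
    # Add-one correction keeps the p-value strictly positive.
    return observed, (exceed + 1) / (draws + 1)

# Simulated data (hypothetical): 240 season's bests, 30 runners per lane.
rng = random.Random(1)
season_best = [rng.gauss(10.6, 0.3) for _ in range(240)]
random_lanes = [i % 8 + 1 for i in range(240)]
rng.shuffle(random_lanes)

# A deliberately non-random draw: the fastest runners get the lowest lanes.
order = sorted(range(240), key=lambda i: season_best[i])
seeded_lanes = [0] * 240
for rank, i in enumerate(order):
    seeded_lanes[i] = rank // 30 + 1

f_random, p_random = permutation_p_value(season_best, random_lanes)
f_seeded, p_seeded = permutation_p_value(season_best, seeded_lanes)
print(f"random draw: F={f_random:.2f}, p={p_random:.3f}")
print(f"seeded draw: F={f_seeded:.2f}, p={p_seeded:.3f}")
```

A single joint test avoids reading too much into one lane out of eight crossing a 5% threshold by chance. In the seeded draw, where lanes depend on ability by construction, the joint test rejects decisively.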

4) Power problems. I wonder whether there are other tests that can be implemented to analyse whether the results are driven by low power or whether they can be read as a pure lack of statistical significance. I am not an expert in this specific field, but one idea could be to adapt the tests proposed by Cattaneo, Titiunik and Vazquez-Bare (2019) to the case under scrutiny.

Response: Thanks for this suggestion. I looked at the suggested paper, and it’s tailored to regression discontinuity research designs, which, unfortunately, are a different identification strategy from the one I take. I’m not an expert in power calculations, but I did some more research and it seems that reporting MDEs, which I do in the paper, is a common approach to discussing ex-post statistical power (e.g. McKenzie and Ozier 2019). I have added more discussion and a citation motivating the use of MDEs.
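For readers unfamiliar with MDEs, the standard calculation can be sketched as follows. This is a minimal illustration under assumed conventions (two-sided test, normal approximation, 5% significance, 80% power), not necessarily the exact choices made in the paper. The function name and the example standard error are hypothetical.

```python
from statistics import NormalDist

def minimum_detectable_effect(std_error, alpha=0.05, power=0.80):
    """Smallest true effect detectable with the given power.

    Uses the normal approximation for a two-sided test:
    MDE = (z_{1 - alpha/2} + z_{power}) * SE.
    """
    z = NormalDist()
    return (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) * std_error

# Hypothetical standard error of a lane coefficient, in seconds.
example_se = 0.05
print(f"MDE: {minimum_detectable_effect(example_se):.3f} seconds")
```

Under these conventions the MDE is roughly 2.8 times the standard error, so it depends only on the precision of the estimate and not on the estimated effect itself.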

Minor issues

The references must be reported according to the common practice followed in the literature. In the current version, no contribution reports the year.

Response: Reporting the years is also the practice I’m familiar and comfortable with, however the PLOS ONE guidelines state: “References are listed at the end of the manuscript and numbered in the order that they appear in the text. In the text, cite the reference number in square brackets (e.g., “We used the techniques developed by our colleagues [19] to analyze the data”).”

References

Cattaneo M-D., Titiunik R., and Vazquez-Bare G., 2019. Power calculations for regression-discontinuity designs. The Stata Journal, 19(1): 210-245.

References:

Leppink, Jimmie, Kal Winston, and Patricia O’Sullivan. "Statistical significance does not imply a real effect." Perspectives on medical education 5.2 (2016): 122-124.

McKenzie, David, and Owen Ozier. "Why ex-post power using estimated effect sizes is bad, but an ex-post MDE is not." World Bank Development Impact Blog (2019).

Reviewer #2:

I appreciate your time and effort in reviewing my manuscript. Please see my responses to your comments below.

As already stated in my previous report, the topic of the paper is undoubtedly interesting and appealing. However, the concerns raised in my previous report remain, mainly because almost all of the major comments/suggestions I made to improve the paper have not been taken into account. The statistical methodology applied in the paper is still not clearly described in a general and rigorous way; the statistical models are not reported accurately, for instance formulas (1) and (2). From some of the author’s clarifications, I was able to understand that the author deals with a causal inference framework, but a general and clear description of it is missing, as is the relevant literature in this field. The details of the review are reported below as major comments.

Major comments:

1. The general description of the statistical methodology applied in the paper is still completely missing. This causes a misunderstanding of the applied statistical methodology, and it does not allow the validity of the results reported in the paper to be appropriately evaluated. From what I was able to understand from the author’s corrections to the manuscript, and from some of the responses, the author deals with a causal inference framework rather than with a classical linear regression model. Depending on the field in which one works, in my opinion, the causal inference framework the author deals with is anything but “unnecessarily/obvious to discuss the theory/assumptions underlying the regression analysis.” I also wish to point out that it is not just a regression analysis, but a causal inference framework, which is something different from classical linear regression modelling. I still suggest that the author carefully describe it in detail in a separate section, reporting in a rigorous way the main theory (formulas, assumptions, and also the most relevant literature in the causal inference framework, which is completely missing). Regarding the statistical models in formulas (1) and (2), they are still reported inaccurately: Time_{i,j} to indicate the response variable? Statistical models should be reported accurately in general, for instance using y_ij for the response variable. What about the subscripts: i=1,…?, j=1,…? Also, some statistical terminology: for instance, why use just “specification” throughout the paper to refer to a statistical model?

Response: Thanks for these comments. I had a hard time interpreting what is being requested here as there are no specific references to relevant literature. The identification strategy is random assignment to treatment. In this case, understanding causal effects amounts to reporting average treatment effects. While I implement a regression-based approach, all the regressions are doing is computing the difference in average times by lane number. One could just compute raw means and do this, but the regression framework is convenient because it allows me to control for other correlates, which helps improve precision. In an attempt to respond to the request for more discussion of causal inference, I have added some discussion about what the random assignment buys you (i.e. the equivalence of characteristics of the treatment groups). Thank you for the suggestion regarding subscripts. Following your suggestion and one from the other reviewer I have made some modifications to the notation. I have also changed the language to “regression specification” to avoid any confusion.
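The equivalence between a regression on lane dummies and differences in lane means can be checked directly. The sketch below uses simulated data, not the paper's, with no additional controls. The omitted reference lane (lane 4 here) is an arbitrary assumption, and the helper names are hypothetical.

```python
import random

def lane_means(times, lanes):
    """Average time per lane."""
    sums, counts = {}, {}
    for t, k in zip(times, lanes):
        sums[k] = sums.get(k, 0.0) + t
        counts[k] = counts.get(k, 0) + 1
    return {k: sums[k] / counts[k] for k in sums}

def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols_lane_dummies(times, lanes, reference):
    """OLS of time on an intercept plus one dummy per non-reference lane."""
    others = sorted(set(lanes) - {reference})
    X = [[1.0] + [1.0 if k == o else 0.0 for o in others] for k in lanes]
    p = len(others) + 1
    # Normal equations: (X'X) beta = X'y.
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(X, times)) for i in range(p)]
    beta = solve(xtx, xty)
    return beta[0], dict(zip(others, beta[1:]))

# Simulated race times (hypothetical): 50 runners in each of 8 lanes.
rng = random.Random(0)
lanes = [i % 8 + 1 for i in range(400)]
rng.shuffle(lanes)
times = [10.5 + rng.gauss(0, 0.3) for _ in lanes]

means = lane_means(times, lanes)
intercept, coefs = ols_lane_dummies(times, lanes, reference=4)
for lane, beta in coefs.items():
    print(lane, round(beta, 4), round(means[lane] - means[4], 4))
```

Each printed pair matches: the coefficient on a lane dummy is that lane's mean time minus the reference lane's mean time, and the intercept is the reference lane's mean. Adding covariates changes the arithmetic but not the interpretation under random assignment.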

2. Again, it is well known that we cannot speak about a causal effect when considering the “classical” linear regression model. Correlation does not imply causality! Using OLS regression to estimate average treatment effects with random assignment is a causal inference framework, not “classical” linear regression modelling. This is also why I kindly invite the author to briefly describe, in a rigorous way, the applied statistical methodology, in order to make the paper clear for potential readers.

Response: Thanks for these comments. However, I respectfully disagree. The word “classical” simply refers to the case when the assumptions of OLS are met. There is nothing inherently wrong with using regression for causal inference *if* you are confident that assignment to treatment is random. This is the whole point of the paper-- to leverage the random assignment to lanes to estimate a causal effect. In other words, OLS is simply being used as a statistical method to test a null hypothesis of differences in the data that are generated from random variation. For more discussion on using regression to estimate causal treatment effects with random assignment see Ch. 9 of Gelman and Hill (2006). I have added some discussion about random assignment and causal inference to help clarify the identification strategy.

3. Now that it is clear that the author uses a causal inference framework, and not just classical linear regression modelling, I have a question. More precisely, the random assignment assumption is crucial for the validity of the results. The author checks this assumption through the statistical model in formula (2): do you think that this is enough to confirm it, and why? What about existing approaches in the literature to deal with this issue; for instance, just to mention one (i.e., not limited to), the propensity score approach? Furthermore, in Section 3.1, in the sentence “…only lane 1 has a statistically significant relationship with SB, which, again, is likely an effect of the low number of observations.” Why should it be due to “an effect of the low number of observations”?

Response: Thanks for these comments. I’m sympathetic to the concern about random assignment to treatment; this is an important assumption. In terms of why we should believe it: as stated in the paper, it is in the competition rulebook of the IAAF. It is, of course, possible that they don’t follow the rules, so the randomization checks were done as an attempt to confirm adherence to this rule, and they are generally supportive. To conduct them, I look at a runner’s Season’s Best listed in the startlist for each heat. This is the only observable information on the runners I have available (besides gender, but I also group gender separately). Propensity score matching is useful when you are exploring differences across a number of characteristics. But, unfortunately, I only have one characteristic. For the question at hand though -- how race times vary by lane -- SB is arguably the most relevant characteristic as it proxies very well for a runner’s ability. To strengthen the randomization check results I have also added F-tests to examine the joint significance of the lanes. In terms of the low number of observations, the concern is that with smaller sample sizes a Type-I error (incorrectly rejecting the null of no effect) is more likely. I have added a note about this and a reference to Leppink et al. 2016.

4. Regarding the results for the “pooled” data: why a covariate for gender is not included in the statistical models? How results change if you include also “gender” as a covariate in an appropriate way?

Response: Thanks for this comment, I have added gender as a control in the pooled regressions.

5. Again, in my opinion, suitable models diagnostics should be performed in order to appropriately evaluate the estimated statistical models. They are not just limited to evaluate the “functional form”. Just a clarification: the homoscedasticity assumption relates to the error component, and not to the “main variables of interest (lane effects)”.

Response: Thanks for these comments. As noted in the results tables, the standard errors reported are all heteroscedasticity-consistent (i.e. robust standard errors).

6. Again, Tables no.1-no.8: in all the tables, the estimated coefficients are reported incorrectly as Lane 1, Lane 3, Wind, etc. For example, the author should write \hat{\beta}_1 rather than Lane 1, \hat{\beta}_3 rather than Lane 3, \hat{\alpha}_1 rather than Wind, and so on. It could seem very straightforward to understand, but the problem is that this is not correct, and it’s an error.

Response: Thanks for these comments. I’m sympathetic to your point, yet the norm is to label the coefficient estimates with the names of the independent variables, and not, e.g., \beta_1. Indeed, this is how statistical software (e.g. R, Stata, etc.) reports regression results. I think it’s important to follow the norm, so I have left the labelling as is. This practice also avoids the need for readers to keep referring back and forth between the regression specification and the results tables to understand what each \beta represents.

References:

Gelman, Andrew, and Jennifer Hill. Data analysis using regression and multilevel/hierarchical models. Cambridge university press, 2006.

Attachment

Submitted filename: PLOS ONE Response to reviewer comments RR.pdf

Decision Letter 2

Roy Cerqueti

17 Jun 2022

PONE-D-21-35820R2

Are there lane advantages in track and field?

PLOS ONE

Dear Dr. Munro,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

ACADEMIC EDITOR: Dear David, one reviewer accepts the paper, while the other one raises criticisms on some crucial aspects of the study. Please provide a detailed response to the points raised by the reviewer. I want to stress that only a satisfactory treatment of such points might lead to the acceptance of the paper. Thank you for your effort, I wish you all the best. Yours, Roy

Please submit your revised manuscript by Aug 01 2022 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

We look forward to receiving your revised manuscript.

Kind regards,

Roy Cerqueti, Ph.D.

Academic Editor

PLOS ONE

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #2: (No Response)

Reviewer #3: (No Response)

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #2: (No Response)

Reviewer #3: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #2: (No Response)

Reviewer #3: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #2: Yes

Reviewer #3: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #2: Yes

Reviewer #3: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #2: The topic of the paper is undoubtedly appealing and interesting. However, in this third round of revision, my concerns remain the same, since most of my major comments/suggestions to improve the paper are again not taken fully and/or appropriately into account. To this end, in what follows, once again I report almost the same major comments present in my two previous reports.

Major comments:

1. The general description of the statistical methodology applied in the paper, as well as the relevant literature in this field are still completely missing. I still kindly suggest to the author to carefully describe in details in a separate section and/or subsection the main statistical theory (i.e., the statistical methodology used in this paper) in a general and rigorous way, i.e., main formulas, assumptions and so on. Moreover, the most relevant literature in the causal inference framework, which is still completely missing, should be also reported by the author and cited where necessary. Regarding the statistical models in formula (1) and (2), they are still reported inaccurately: again, why Time_{i,j} is used to indicate the response variable? Statistical models should be reported accurately in general, for instance using y_ij for the response variable. Similar issues also apply to the independent variables included in the model. Again, what about the subscripts: i=1,…?,j=1,…? That is, the subscripts “i” and “j” goes from 1 to what? About the subscript “k”: why it goes from 1 to n? Why not to “K” for example, since the letter “n” in statistics is usually used to indicate the sample size. Also, I think it is confused and not in line with the basic statistical terminology to use always and everywhere “regression specification” only to refer to a statistical model.

2. In fact, it is well-known that in a causal inference framework with random assignment to treatment, one can use a regression model to estimate causal effects. But, please note that when one deals with classical linear regression modelling (NOT in a causal inference framework with the well-known assumptions on the treatment assignments mechanism), one cannot state that this is a causal effect, because correlation does not imply causality. I understood that the author deals with causal inference framework, but in my opinion, it should be accurately described in a separate section and/or subsection through a general description of the main statistical theory, as I already suggested at my previous major comment #1.

3. Please, justify and explain better your statement on the use of propensity score matching in your specific case-study. Furthermore, my previous question was not just limited to propensity score matching: what about other existing approaches in the literature to deal with this issue? The statements along the paper which rely on “low number of observations” are still very confused, and, at least in this current form, they do not seem appropriate. This is because from what I see in all the tables, the values reported for “N” are high rather than low. Please, justify.

4. Again, in my opinion, suitable models diagnostics should be performed in order to appropriately evaluate the estimated statistical models. Please, note that they are definitely not just limited to standard errors.

5. Again, Tables no.1-no.8: in all the tables, the estimated coefficients are reported incorrectly as Lane 1, Lane 3, Wind, etc. For example, the author should write \hat{\beta}_1 rather than Lane 1, \hat{\beta}_3 rather than Lane 3, \hat{\alpha}_1 rather than Wind, and so on. It could seem very straightforward to understand, it is how R, Stata reports them, but the problem is that this is not correct, it’s not the norm and it’s an error.

Reviewer #3: (No Response)

**********

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #2: No

Reviewer #3: No

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2022 Aug 3;17(8):e0271670. doi: 10.1371/journal.pone.0271670.r006

Author response to Decision Letter 2


1 Jul 2022

Response to Reviewers:

Reviewer #2: The topic of the paper is undoubtedly appealing and interesting. However, in this third round of revision, my concerns remain the same, since most of my major comments/suggestions to improve the paper are again not taken fully and/or appropriately into account. To this end, in what follows, once again I report almost the same major comments present in my two previous reports.

Major comments:

1. The general description of the statistical methodology applied in the paper, as well as the relevant literature in this field are still completely missing. I still kindly suggest to the author to carefully describe in details in a separate section and/or subsection the main statistical theory (i.e., the statistical methodology used in this paper) in a general and rigorous way, i.e., main formulas, assumptions and so on. Moreover, the most relevant literature in the causal inference framework, which is still completely missing, should be also reported by the author and cited where necessary. Regarding the statistical models in formula (1) and (2), they are still reported inaccurately: again, why Time_{i,j} is used to indicate the response variable? Statistical models should be reported accurately in general, for instance using y_ij for the response variable. Similar issues also apply to the independent variables included in the model. Again, what about the subscripts: i=1,…?,j=1,…? That is, the subscripts “i” and “j” goes from 1 to what? About the subscript “k”: why it goes from 1 to n? Why not to “K” for example, since the letter “n” in statistics is usually used to indicate the sample size. Also, I think it is confused and not in line with the basic statistical terminology to use always and everywhere “regression specification” only to refer to a statistical model.

Response: see response to 2.

2. In fact, it is well-known that in a causal inference framework with random assignment to treatment, one can use a regression model to estimate causal effects. But, please note that when one deals with classical linear regression modelling (NOT in a causal inference framework with the well-known assumptions on the treatment assignments mechanism), one cannot state that this is a causal effect, because correlation does not imply causality. I understood that the author deals with causal inference framework, but in my opinion, it should be accurately described in a separate section and/or subsection through a general description of the main statistical theory, as I already suggested at my previous major comment #1.

Response: I agree that without random assignment linear regression can’t speak about causal effects. In relation to your comments 1 and 2 I have added a separate section detailing the causal inference framework, which highlights the importance of the independence assumption (treatment status independent of outcomes). This emphasizes the importance of utilizing the heats which implement random assignment to lanes.

As per your request, I have changed the notation to Y_{i,j} instead of “Time.” The i and j notation is not over sums, so they are just denoting unique observations for different “i” and “j”, so it is not necessary to stipulate the limits of these indexes. In relation to the 1 to n notation, I changed the notation in rewriting the causal inference section and no longer use “n.”

As per your request, I have also changed “specification” to “model.”
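For readers following this exchange, the independence assumption mentioned in the response above can be written in standard potential-outcomes notation. This is a generic textbook sketch and not necessarily the notation of the paper's added section: D_i (a binary indicator for assignment to a given lane) and Y_{1i}, Y_{0i} (the potential race times with and without that assignment) are symbols assumed here purely for illustration.

```latex
\begin{align}
E[Y_i \mid D_i = 1] - E[Y_i \mid D_i = 0]
  &= \underbrace{E[Y_{1i} - Y_{0i} \mid D_i = 1]}_{\text{average effect on the treated}}
   + \underbrace{E[Y_{0i} \mid D_i = 1] - E[Y_{0i} \mid D_i = 0]}_{\text{selection bias}} \\
(Y_{0i}, Y_{1i}) \perp\!\!\!\perp D_i
  &\;\Longrightarrow\;
  E[Y_i \mid D_i = 1] - E[Y_i \mid D_i = 0] = E[Y_{1i} - Y_{0i}]
\end{align}
```

Under random lane assignment the selection-bias term is zero, so a simple comparison of observed mean times identifies the average causal effect. This is why the response stresses restricting attention to heats where lanes are drawn at random.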

3. Please, justify and explain better your statement on the use of propensity score matching in your specific case-study. Furthermore, my previous question was not just limited to propensity score matching: what about other existing approaches in the literature to deal with this issue? The statements along the paper which rely on “low number of observations” are still very confused, and, at least in this current form, they do not seem appropriate. This is because from what I see in all the tables, the values reported for “N” are high rather than low. Please, justify.

Response: I believe your comment is referring to propensity score matching in relation to the randomization checks. Normally, propensity score matching is a way to match individuals in the treatment and control groups based on similar covariates (the dimensionality of this can be reduced by summarizing similar individuals by their propensity scores). This matching is typically done when there are concerns that individuals in the treatment/control groups are systematically different. In this sense, the matching part only really makes sense when propensities vary systematically conditional on treatment status. However, one can use the first step (estimating propensities) to see if the randomization successfully balanced treatment and control groups. I think this is what you may be asking for. In the appendix I have added probit regressions where I estimate how the probability of being assigned to a lane is a function of runner ability (season’s best). If these treatment probabilities varied systematically by runner ability, that would raise concerns about the randomization. Thankfully, none of the regressions show that season’s best is significantly related to treatment status. I hope I have interpreted your comment appropriately.
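The randomization check described above (a probit of lane assignment on season's best) can be illustrated with a minimal sketch. It is not the author's replication code: the function `fit_probit`, the simulated data, and the choice of lane 4 are all hypothetical, and the fit uses a plain Fisher-scoring loop from the Python standard library rather than a statistics package.

```python
import math
import random
from statistics import NormalDist, mean, pstdev

_STD_NORMAL = NormalDist()


def fit_probit(x, y, iterations=50, tol=1e-10):
    """Fit P(y=1 | x) = Phi(a + b*x) by Fisher scoring; return (a, b, se_b)."""
    a = b = 0.0
    for _ in range(iterations):
        g0 = g1 = 0.0        # score: gradient of the log-likelihood
        s0 = s1 = s2 = 0.0   # entries of the 2x2 expected information matrix
        for xi, yi in zip(x, y):
            z = a + b * xi
            # Clamp the fitted probability away from 0 and 1 to avoid division by zero.
            p = min(max(_STD_NORMAL.cdf(z), 1e-12), 1 - 1e-12)
            d = _STD_NORMAL.pdf(z)
            lam = d * (yi - p) / (p * (1 - p))
            w = d * d / (p * (1 - p))
            g0 += lam
            g1 += lam * xi
            s0 += w
            s1 += w * xi
            s2 += w * xi * xi
        det = s0 * s2 - s1 * s1
        # Scoring step: inverse information matrix times the score.
        step_a = (s2 * g0 - s1 * g1) / det
        step_b = (s0 * g1 - s1 * g0) / det
        a += step_a
        b += step_b
        if abs(step_a) + abs(step_b) < tol:
            break
    se_b = math.sqrt(s0 / det)  # slope standard error from the inverse information
    return a, b, se_b


random.seed(0)
# Simulated, purely illustrative data: season's best (seconds) and a lane drawn at random.
season_best = [random.gauss(10.5, 0.3) for _ in range(400)]
lane = [random.randint(1, 9) for _ in season_best]

# Standardize the covariate so the scoring iterations are well conditioned.
mu, sd = mean(season_best), pstdev(season_best)
x = [(s - mu) / sd for s in season_best]
in_lane_4 = [1 if assigned == 4 else 0 for assigned in lane]

a, b, se_b = fit_probit(x, in_lane_4)
print(f"slope on season's best: {b:.3f} (z = {b / se_b:.2f})")
```

If lanes are truly assigned at random, the slope on season's best should usually be statistically indistinguishable from zero; a large |z| would suggest that faster or slower runners were systematically placed in that lane.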

In relation to the sample sizes, what I’m trying to highlight is that readers should be cautious about any statistical significance being derived from small sample sizes. For example, in the Women’s 100m randomization check, there are statistically significant results for lanes 1 and 9. However, the sample sizes in these lanes are less than 40, relative to around 100 in the other lanes. These are relatively small sample sizes. The concern is that with smaller sample sizes it’s more likely to have Type-1 error (incorrectly rejecting the null of no effect) (Leppink et al., 2016). This also relates to Type-M error (Gelman and Carlin, 2014). They emphasize that significant results from small sample sizes often overstate the magnitude of the true effect: “The problem, though, is that if sample size is too small, in relation to the true effect size, then what appears to be a win (statistical significance) may really be a loss (in the form of a claim that does not replicate).” My point is that in lanes with small numbers of observations, while there are occasionally significant results, readers should be skeptical of the replicability of those results. I have attempted to clarify this in the text.
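The Type-M point above can be made concrete with a small simulation. This is only a sketch under assumed numbers: the true effect (0.05 s), the standard deviation (0.5 s), and the function name `exaggeration_ratio` are hypothetical and are not taken from the paper or from Gelman and Carlin; only the sample size of 40 echoes the lane counts mentioned in the response.

```python
import random
from statistics import mean


def exaggeration_ratio(true_effect, sd, n, sims=20000, seed=0):
    """Average |estimate| / true_effect among estimates that reach |z| > 1.96."""
    rng = random.Random(seed)
    se = sd / n ** 0.5  # standard error of a sample mean with n observations
    significant = []
    for _ in range(sims):
        estimate = rng.gauss(true_effect, se)  # one draw from the sampling distribution
        if abs(estimate) > 1.96 * se:          # "statistically significant" at the 5% level
            significant.append(abs(estimate))
    return mean(significant) / true_effect


small = exaggeration_ratio(true_effect=0.05, sd=0.5, n=40)
large = exaggeration_ratio(true_effect=0.05, sd=0.5, n=1000)
print(f"n=40:   significant estimates overstate the effect by a factor of about {small:.1f}")
print(f"n=1000: significant estimates overstate the effect by a factor of about {large:.1f}")
```

With the small sample, an estimate can only clear the significance threshold by landing far from the true effect, so the significant estimates are inflated by construction. With the larger sample the threshold sits below the true effect and the inflation largely disappears.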

4. Again, in my opinion, suitable models diagnostics should be performed in order to appropriately evaluate the estimated statistical models. Please, note that they are definitely not just limited to standard errors.

Response: There are several diagnostics/robustness checks related to the model already in the paper: R^2, F-statistics, removing outliers, robust standard errors, and three different statistical models. None of the results are sensitive to these different checks. It’s also important to emphasize that the coefficients of interest are on indicator variables (treatment indicators), and there is no functional form assumed here. It may also be relevant to note that with random assignment, choosing the appropriate model is not really necessary for a causal interpretation of regression. For more discussion of this, see Chapter 3 in Angrist and Pischke (2009).

Without specific guidance on what diagnostic(s) you think would be valuable to add and why, I don’t know how else to respond to this comment. I apologize.
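The remark in the response above that the coefficients of interest sit on indicator variables, with no functional form assumed, can be illustrated as follows. This is a simplified sketch: it includes only an intercept and lane indicators, whereas the paper's models also carry controls such as wind, and the race times, the reference lane, and the helper names `solve` and `ols_on_lane_dummies` are all made up for the example.

```python
from statistics import mean


def solve(a, b):
    """Solve the linear system a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols_on_lane_dummies(times, lanes, reference):
    """OLS of time on an intercept plus one indicator per non-reference lane."""
    others = sorted(set(lanes) - {reference})
    rows = [[1.0] + [1.0 if lane == k else 0.0 for k in others] for lane in lanes]
    p = len(rows[0])
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(rows, times)) for i in range(p)]
    coefs = solve(xtx, xty)
    return coefs[0], dict(zip(others, coefs[1:]))


# Made-up race times (seconds) for three lanes; lane 5 is the omitted reference category.
times = [21.10, 21.30, 21.20, 20.90, 21.00, 20.80, 20.70, 20.90]
lanes = [5, 5, 5, 2, 2, 8, 8, 8]
intercept, betas = ols_on_lane_dummies(times, lanes, reference=5)

ref_mean = mean(t for t, l in zip(times, lanes) if l == 5)
for k, beta in betas.items():
    lane_mean = mean(t for t, l in zip(times, lanes) if l == k)
    print(f"lane {k}: beta = {beta:.4f}, difference in means = {lane_mean - ref_mean:.4f}")
```

In this stripped-down case each lane coefficient is exactly the difference between that lane's mean time and the reference lane's mean time, so the regression imposes no shape on how lane effects vary across the track.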

5. Again, Tables no.1-no.8: in all the tables, the estimated coefficients are reported incorrectly as Lane 1, Lane 3, Wind, etc. For example, the author should write \hat{\beta}_1 rather than Lane 1, \hat{\beta}_3 rather than Lane 3, \hat{\alpha}_1 rather than Wind, and so on. It could seem very straightforward to understand, it is how R, Stata reports them, but the problem is that this is not correct, it’s not the norm and it’s an error.

Response: I’ve changed the notation in the tables to be \beta_1 etc., as you have requested. However, as a search of papers in PLOS ONE will highlight, I do not think this is a standard practice when reporting regression results, at least in the fields I am familiar with. To ease interpretation in light of this change, I have also included the independent variable in parentheses beside each coefficient, so readers know what each coefficient represents. I hope you find this to be a reasonable compromise.

Attachment

Submitted filename: Reviewer 2 Comments- RR2.pdf

Decision Letter 3

Roy Cerqueti

6 Jul 2022

Are there lane advantages in track and field?

PONE-D-21-35820R3

Dear Dr. Munro,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Roy Cerqueti, Ph.D.

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Dear David,

I'm satisfied with your revision strategy. No further revision rounds are required on my side.

Thanks a lot, yours,

Roy


Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 File

    (ZIP)

    S1 Appendix

    (PDF)

    Attachment

    Submitted filename: PLOS ONE Response to reviewer comments.pdf

    Attachment

    Submitted filename: Referee Report_PONE-D-21-35820.pdf

    Attachment

    Submitted filename: PLOS ONE Response to reviewer comments RR.pdf

    Attachment

    Submitted filename: Reviewer 2 Comments- RR2.pdf

    Data Availability Statement

    All data is publicly available and can be accessed via: https://www.worldathletics.org/competitions. A full replication package is included in my submission materials and, in addition, can be located here: https://github.com/dmunro-git/Lane-Advantages.


    Articles from PLoS ONE are provided here courtesy of PLOS
