PLOS One. 2021 Jul 15;16(7):e0254538. doi: 10.1371/journal.pone.0254538

Predicting performance in 4 x 200-m freestyle swimming relay events

Paul Pao-Yen Wu 1,2,*, Toktam Babaei 1,2, Michael O’Shea 1,2, Kerrie Mengersen 1,2, Christopher Drovandi 1,2, Katie E McGibbon 3, David B Pyne 4, Lachlan J G Mitchell 3, Mark A Osborne 5
Editor: Dalton Müller Pessôa Filho 6
PMCID: PMC8282077  PMID: 34265006

Abstract

Aim

The aim was to predict and understand variations in swimmer performance between individual and relay events, and develop a predictive model for the 4x200-m swimming freestyle relay event to help inform team selection and strategy.

Data and methods

Race data for 716 relay swims in 4 x 200-m freestyle finals from 14 international competitions between 2010 and 2018 were analysed. The individual 200-m freestyle season’s best time for the same year was located for each swimmer. Linear regression and machine learning were applied to 4 x 200-m swimming freestyle relay events.

Results

Compared to the individual event, the lowest ranked swimmer in the team (-0.62 s, CI = [−0.94, −0.30]) and American swimmers (−0.48 s [−0.89, −0.08]) typically swam faster 200-m times in relay events. Random forest models predicted gold, silver, bronze and non-medal with 100%, up to 41%, up to 63%, and 93% sensitivity, respectively.

Discussion

Team finishing position was strongly associated with the differential time to the fastest team (mean decrease in Gini (MDG) when this variable was omitted = 31.3), world rankings of team members (average ranking MDG of 18.9), and the order of swimmers (MDG = 6.9). Differential times are based on the sum of individual swimmers’ season’s best times and, along with world rankings, reflect team strength. In contrast, the order of swimmers reflects strategy. This type of analysis could assist coaches and support staff in selecting swimmers and team orders for relay events to enhance the likelihood of success.

Introduction

A key challenge of relay events in sporting competitions is team selection and the order of athletes, as they can impact race outcomes [1–3]. In the 4 x 100-m track running relay, factors such as the preferred hand for giving and receiving a baton, athlete skills in running bends, and the lane drawn, should all be considered when selecting team order [1]. In swimming, key considerations for relay team performance include flying start technique and exchange block times [4–6]. As a result, much of the relay literature in swimming focuses on technical components related to the dive start, including differences between the flat start performed in individual events and the flying start performed by swimmers assigned to the second to fourth relay positions [7, 8]. However, in both track running and swimming, it appears that selecting the fastest athlete for the first or lead-off relay leg is popular and successful [1–3], although further research is required to determine how this impacts team performance.

Recently, pacing differences between individual and relay events in swimming have been examined, indicating that some swimmers alter their pacing strategy during relay events [9]. This difference in pacing strategy between individual and relay swims may be attributed to the relay leg assignment as well as the added pressure to perform well for the team [9]. Extensive research on team dynamics and behavioural aspects of competitive relay swimming is described in the literature [10–13]. Compared to individual events, swimming performance is typically faster in relays, which may be attributed to elevated motivation and effort [13, 14]. However, there is conflicting evidence of differences in starts, turns and swimming speed between individual and relay events [8]. In addition to the motivational effects of relay swimming, the order of swimmers in the relay team can also potentially impact the effort exerted by each swimmer. Swimmers positioned in later relay leg positions were found to be more likely in certain contexts to swim faster than those in earlier positions relative to individual event times [11, 12]. Contextually, the positive influence of relay leg positioning has been ascribed to an increase in the perceived importance of individual contributions to the team outcome [12, 14].

During FINA-sanctioned events including the biennial World Championships, relay teams must nominate their four selected swimmers, and the team order, one hour prior to the start of the heats or finals session in which the relay occurs [15]. However, swimmers are typically selected for the national squad a few weeks to several months prior to the competition based on their performance in the corresponding individual event. In addition, due to the complex interactions between physiological, psychological and team-based dynamics [10], predictions of individual performance in relays and overall team outcomes are challenging. Therefore, there is a need for effective predictive tools that could support coaches in the decision-making process to maximise the performance of the relay team as a whole, as well as each individual swimmer.

With an abundance of performance data now available in many sports as a result of advances in technology, data-driven models are becoming increasingly popular in sports science [16–19]. These models have been applied to swimming in an attempt to predict the performances of individual events based on training and competition data, as well as anthropometric, physiological and biomechanical characteristics. Despite extensive analyses of behavioural and other components of relay swimming, limited work to date has brought together the various components to predict race outcomes of relay events. The 4 x 200-m freestyle relay is currently the longest relay in the FINA competition schedule. With each swimmer required to complete 4 x 50-m laps, the event requires well-developed speed-endurance, technical skills in starts, turns and finishes, and pacing, and it provides sufficient data to model pacing effects [9]. This complexity makes the 4 x 200-m freestyle ideal as a starting point for developing and testing predictive and analytical tools. Therefore, the aim of this study was to predict and better understand contextual factors contributing to relay team performance in light of individual swimmer performance, and develop predictive models to analyse the relationship between a team’s finishing position and these factors for the 4 x 200-m swimming freestyle relay.

Methods

Data collection

Race data for 4 x 200-m freestyle relay finals from 14 international long course competitions between 2010 and 2018 were analysed retrospectively, comprising Olympic Games (2012, 2016), Pan Pacific Championships (2010, 2014), World Championships (2011, 2013, 2015, 2017), Commonwealth Games (2014, 2018) and European Championships (2010, 2012, 2014, 2016). This data set has been used previously [9] and included the world ranking for each swimmer in the year of that competition, reaction times, 50-m splits and overall times. For each relay swimmer, the individual 200-m freestyle season’s best time for the same season (typically beginning around September and concluding around July-August) was located using FINA world rankings (https://www.fina.org/content/swimming-world-ranking). A total of 716 relay swims, divided approximately evenly across the four relay legs, and corresponding individual event swims were analysed across 348 different swimmers (175 males and 173 females). To allow comparisons between individual and relay events, exchange block times for relay swimmers positioned on the second to fourth leg were adjusted to equal reaction times from the individual event using the methodology developed by [9] and [20].

The team average ranking was calculated as the average world ranking of the four swimmers in the team, where world rankings for the year of the relay competition were used. In our dataset, the swims contributing to the rankings were coincidentally prior to the major competition of that year. The best or highest ranking and the worst or lowest ranking within the team were used as indicators of team depth. Relay team data were only considered when individual season’s best times for all four swimmers within the team were available, resulting in the analysis of 121 of a total of 188 teams. Relay teams were classified into four categories by finishing position: (1) gold, (2) silver, (3) bronze, and (4) a non-medal position (4th to 8th placed teams). There were 20, 17, 16, and 68 data points for each of these finishing positions, respectively. Team characteristics (i.e. explanatory variables) included world ranking of individual swimmers and team average ranking, the order of swimmers in the team, individual season’s best 200-m freestyle time, start lap strategy, and pacing strategy from the relay race (Table 1). To capture the potential differences in preparation, performance level and the number of nations competing at the World Championships and Olympic Games compared to other events such as Commonwealth Games, a variable to capture competition effects was included in the model. Furthermore, to better understand trends across nations while preserving model identifiability [21], we included a categorical variable to specify two major competitor nations, namely the USA and Australia, whereas other nations were grouped into a third category of ‘Other’.

Table 1. Description of the explanatory variables.

Variable | Description
Gender | Female or male
Nation | Nationality of the team/swimmers. There were a total of 33 nations categorised into three groups: USA, Australia and Other
Season’s Best time | Fastest individual 200-m freestyle time within the same season
Relay time | Time taken for an individual swimmer to complete their 200-m freestyle relay leg
Team performance time | Sum of the relay times for the four swimmers on each relay team. Total time taken to complete the race by the team
World Ranking | Ranking according to FINA based on the individual season’s best time for the 200-m freestyle
Relay Leg | The position of the swimmer within the relay team, e.g. 1, 2, 3 or 4
Order of the team | Chronology of swimmers according to relative world ranking within the team
Team average ranking | The average of the world ranking of the four swimmers in the team
Best Ranking | The highest world ranking in the team (fastest swimmer)
Worst Ranking | The lowest world ranking in the team (slowest swimmer)
Finishing Position | Finishing place of the team in a relay final (1–8). Classified into four categories: (1) Gold, (2) Silver, (3) Bronze, (4) Non-medal position (4th–8th position)
Start lap strategy | Percentage of race time spent in lap 1, categorised as average, fast or slow
Pacing strategy | Slope of laps 2–4 and laps 3–4, categorised as even, negative or positive

The order of swimmers in the relay was encoded according to the relative world ranking of each swimmer within a team. For example, a relay order of “2-1-3-4” indicates that the second fastest swimmer (i.e. second highest world ranking) swam the lead-off or first leg, the fastest swimmer swam the second leg, and so on. However, given the large number of swimmer permutations, including the full order directly as a categorical variable leads to problems with model identifiability. As a result, the order categorisation only considered the unique positions for the first and second fastest swimmers. In this scheme, swimming orders of “2-1-3-4” and “2-1-4-3” are both represented as “21xx”. In view of the many permutations of ordering that have not been widely used, we focused on the five most frequently employed permutations (“1x2x”, “1xx2”, “21xx”, “2xx1”, “12xx”) and included a category ‘other’ for the remaining permutations. Pacing was determined from the swimmer’s best individual performance that season by converting split times into a percentage of overall race time spent in each 50-m lap, and characterised by the start lap strategy (lap 1) and pacing strategy (laps 2–4) [9].
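The order encoding described above can be sketched in a few lines of Python. This is only an illustration of the scheme as described in the text, not the authors' code; the function name `encode_order` and the constant `FREQUENT_ORDERS` are hypothetical.

```python
# Illustrative sketch only; names are hypothetical, not taken from the study.
# The five retained patterns are those listed in the text.
FREQUENT_ORDERS = {"1x2x", "1xx2", "21xx", "2xx1", "12xx"}


def encode_order(order):
    """Collapse a full relay order such as '2-1-3-4' to a pattern such as '21xx'."""
    ranks = order.split("-")
    if sorted(ranks) != ["1", "2", "3", "4"]:
        raise ValueError("expected a permutation of 1-4, got %r" % order)
    # Keep the legs swum by the two fastest swimmers; mask the other two legs.
    pattern = "".join(r if r in ("1", "2") else "x" for r in ranks)
    # Patterns outside the five frequent ones fall into the 'other' category.
    return pattern if pattern in FREQUENT_ORDERS else "other"


print(encode_order("2-1-3-4"))  # 21xx
print(encode_order("2-1-4-3"))  # 21xx
print(encode_order("3-1-2-4"))  # other
```

Masking the third and fourth ranked swimmers reduces the 24 possible orders to a small number of categories, which is the identifiability concern raised in the text.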

Statistical analysis

Two main types of methods were used: (i) linear regression, to study individual swimmer’s relay performances, and (ii) random forests, to predict race outcomes given team configurations. This section describes the two methods, model fitting and model validation.

Linear regression

Multiple linear regression [21] was used to estimate the relationships between an individual swimmer’s performance in a relay and the explanatory variables (Eq 2). An explanatory variable was deemed to have a significant effect if p≤0.05.

Random forests

Random forests were used to predict team finishing positions based on explanatory variables as they are ideally suited for a mixture of numeric and categorical variables with potentially highly non-linear relationships. Random forests are an ensemble modelling extension of simple decision trees, which recursively partition the space of explanatory variables to minimise some dispersion criterion (i.e. measure of variability) in the resultant partitions [22]. Random forests have also demonstrated high predictive sensitivity and specificity for complex problems in many domains [22]. This method helps to overcome the overfitting problem encountered in decision trees by building many shallow trees using data subsets sampled through bagging. We built a random forest model, referred to as RF1, to predict gold, silver, bronze or non-medal finishing positions. To assist with better prediction of medal colour, we also trialled a model that only predicts medal colour, RF2. We developed a predictor variable based on the observation that team performance in a relay is the sum of the individual performance times of the four swimmers within the team. Therefore, based on the sum of the season’s best individual times, we constructed a theoretical performance measure of each team relative to the theoretical performance of the fastest team based on differential time (Diff.Time), defined as follows:

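$$\mathrm{Diff.Time}_j = \sum_{i=1}^{4} s_{ij} - \min_{j'} \sum_{i=1}^{4} s_{ij'} \qquad (1)$$

where, for team j and individual i, s_{ij} is the season’s best time for that swimmer, and the minimum over teams j' identifies the theoretically fastest team.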
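As a minimal sketch of Eq 1, the differential time can be computed from each team's four season's best times. The team names and times below are made up for illustration, and the function name `diff_times` is hypothetical.

```python
# Illustrative sketch of Eq 1; team names and times are invented example data.
def diff_times(season_bests):
    """Map each team to the gap (in seconds) between its summed season's best
    times and the smallest such sum among the teams supplied."""
    for team, times in season_bests.items():
        if len(times) != 4:
            raise ValueError("team %r needs exactly four swimmers" % team)
    totals = {team: sum(times) for team, times in season_bests.items()}
    fastest = min(totals.values())
    return {team: total - fastest for team, total in totals.items()}


example = {
    "Team A": [105.0, 106.0, 106.5, 107.0],  # sums to 424.5 s
    "Team B": [105.5, 106.5, 107.0, 107.5],  # sums to 426.5 s
}
print(diff_times(example))  # {'Team A': 0.0, 'Team B': 2.0}
```

By construction the theoretically fastest team has a differential time of zero, and every other team has a positive value.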

Model fitting

All statistics were calculated using R software [23], with the base and randomForest packages used to fit the linear regression and random forest models, respectively. The parameters of the random forest were tuned using a cross-validation based technique. Five-fold cross-validation was run 100 times in conjunction with a grid search for selecting model parameters, including the number of variables sampled as candidates at each split in the tree. Given the randomly sampled nature of random forests, repeated evaluations provide a more robust selection for the tuning parameters [24].
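The tuning procedure can be outlined as repeated k-fold cross-validation wrapped around a grid of candidate parameter values. The sketch below is written in Python rather than R and is not the authors' implementation; `score_fn` is a hypothetical callback that would fit a model with a given parameter value on the training indices and return its accuracy on the test indices, and `toy_score` is a made-up stand-in so that the example runs without a model.

```python
import random
from statistics import mean


def kfold_splits(n, k, rng):
    """Yield (train_idx, test_idx) pairs for one shuffled k-fold partition."""
    idx = list(range(n))
    rng.shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    for f in range(k):
        train = [i for g in range(k) if g != f for i in folds[g]]
        yield train, folds[f]


def repeated_cv_grid_search(n, grid, score_fn, k=5, repeats=100, seed=0):
    """Average score_fn over k * repeats folds for each candidate value and
    return the best candidate together with all mean scores."""
    rng = random.Random(seed)
    scores = {p: [] for p in grid}
    for _ in range(repeats):
        for train, test in kfold_splits(n, k, rng):
            for p in grid:  # every candidate sees the same splits
                scores[p].append(score_fn(p, train, test))
    mean_scores = {p: mean(v) for p, v in scores.items()}
    return max(mean_scores, key=mean_scores.get), mean_scores


def toy_score(mtry, train, test):
    """Made-up score that peaks at mtry = 2; a real score_fn would fit a model."""
    return 1.0 - 0.1 * abs(mtry - 2)


best, mean_scores = repeated_cv_grid_search(121, [1, 2, 3, 4], toy_score, repeats=3)
print(best)
```

Evaluating every candidate on identical splits keeps the comparison between parameter values fair, and repeating the partitioning averages out the randomness of any single split.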

Model validation

For the linear regression model, goodness of fit is sufficient to give confidence that the model is reasonable, and the model can be interrogated to ascertain the impact of different explanatory variables on individual performance [21]. In comparison, the random forest was employed to predict race finishing position, so we validated model performance using leave-one-out cross-validation. In this scheme, we iterated over each data point, trained with all other data points and tested with the current data point.
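Leave-one-out cross-validation as described above can be sketched generically. The classifier used here is a deliberately simple stand-in (one-nearest-neighbour on a single made-up feature), not a random forest, and all names and data values are hypothetical.

```python
def leave_one_out(X, y, fit, predict):
    """Return the held-out prediction for every data point in turn."""
    preds = []
    for i in range(len(X)):
        # Train on all points except i, then test on point i.
        model = fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        preds.append(predict(model, X[i]))
    return preds


# Stand-in classifier (NOT a random forest): 1-nearest neighbour on one feature.
def fit_1nn(X_train, y_train):
    return list(zip(X_train, y_train))


def predict_1nn(model, x):
    return min(model, key=lambda pair: abs(pair[0] - x))[1]


X = [0.0, 0.4, 2.1, 2.5, 5.0, 5.6]  # made-up differential times in seconds
y = ["medal", "medal", "medal", "non-medal", "non-medal", "non-medal"]

preds = leave_one_out(X, y, fit_1nn, predict_1nn)
accuracy = sum(p == t for p, t in zip(preds, y)) / len(y)
print(preds, accuracy)
```

With only 121 teams, this scheme uses nearly all of the data for training in each iteration, at the cost of fitting the model once per data point.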

We used a 4x4 confusion matrix to show the number of times a recorded gold, silver, bronze or non-medal result (corresponding to rows) was classified by the model as a gold, silver, bronze or non-medal outcome (columns corresponded to predictions). We computed model sensitivity, also referred to as producer’s accuracy when there are more than two categories, which is the rate at which the model correctly classifies a result as a member of a certain category [24]. Note that there is no direct analogue for specificity when there are more than two categories. The randomForest package uses the Gini index as one approach to capture both sensitivity and specificity [22]. This index is useful for assessing both the validity of the model, and for quantifying the relative influence of explanatory variables based on the decrease in the Gini index when a variable is removed from the model.

Finally, the utility of the random forest was demonstrated by applying it to a case study analysis of the 2019 World Championships.

Results

Variables affecting swimmer performance in the relay

The multiple linear regression model had an R² goodness of fit value of 0.97, i.e. it explained 97% of the variation in relay time. The model formulation was:

$$\mathrm{Relay.Time}_i = \mathrm{Nation} + \mathrm{Gender} + \mathrm{Seasons.Best}_i + \mathrm{Relay.Leg}_i + \mathrm{Team.Rank}_i + \mathrm{Pacing.Start}_i + \mathrm{Pacing}_i + \epsilon \qquad (2)$$

where swimmer i’s relay swim time is explained by nation, gender, season’s best individual 200-m freestyle event time, relay leg assignment, relative world ranking within the team (1 for highest, 4 for lowest), start lap strategy and pacing strategy corresponding to the individual’s season’s best performance, and ϵ is a normally distributed error term. The predictor variables were selected to investigate team selection (e.g. swimming rankings), relay leg and ordering, and pacing effects as discussed in the Introduction. Noting that all covariates are categorical with the exception of one, assumptions of normality and independence were checked with a residual plot (S1 Fig).

The resulting model coefficients are presented in terms of their mean effect and 95% confidence interval (CI), on a scale of seconds (Fig 1). The model coefficients show that males are on average 2.34 s faster than females per leg in the 4 x 200-m relay event. Having accounted for other variables in the model such as gender and nation, the fourth ranked (slowest) swimmer appears to perform faster than expected by 0.62 s (coefficient = -0.62 s, CI = [-0.94,-0.30], p < 0.01) compared to the first ranked swimmer. In addition, compared to the first leg, swimmers in the third leg tend to swim slower than expected by 0.24 s (CI = [-0.05,0.54], p = 0.10), although this effect was not significant. While start lap strategy in the individual 200-m freestyle event did not appear to impact relay time, swimmers who displayed a parabolic pacing strategy in their individual event can potentially swim faster than those with an even (-0.19 s, CI = [-0.43,0.05], p = 0.11) and positive (0.12 s, CI = [-0.12,0.35], p = 0.31) pacing strategy; these effects were again not significant, but potentially of interest for future investigation. Swimmers from the United States typically swam 0.48 s faster than expected (coefficient = -0.48 s, CI = [-0.89,-0.08], p = 0.02) compared to swimmers from all other nations. Overall, there do not appear to be any significant effects attributed to the relay leg.

Fig 1. Coefficients for explanatory variables and individual swim performance in the relay.

Note that the baseline categorical variable for gender is female, for relay leg is the first leg, team ranking is rank one, pacing start lap strategy is average, nation is Australia, and pacing strategy is even. If the confidence interval does not overlap with 0, then the effect is considered significant.
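The significance rule stated in the note (a 95% confidence interval that excludes zero) can be applied mechanically to the reported coefficients. The short sketch below uses three effects quoted in the Results; the dictionary labels and the function name `is_significant` are hypothetical.

```python
# Coefficients (seconds) and 95% CIs quoted in the Results text.
EFFECTS = {
    "Team rank 4 vs rank 1": (-0.62, -0.94, -0.30),
    "Relay leg 3 vs leg 1": (0.24, -0.05, 0.54),
    "Nation: USA": (-0.48, -0.89, -0.08),
}


def is_significant(lower, upper):
    """An effect is treated as significant when its CI does not contain 0."""
    return lower > 0 or upper < 0


flags = {name: is_significant(lo, hi) for name, (_, lo, hi) in EFFECTS.items()}
for name, flag in flags.items():
    print(name, "significant" if flag else "not significant")
```

Negative coefficients correspond to faster relay times, so the two significant effects here both indicate swimmers performing faster than expected.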

Predicting team finishing position

Through parameter tuning, a random forest model that sampled two variables at each split was selected with the following formulation:

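$$\mathrm{Estimated.Finishing.Position} = f(\mathrm{Gender}, \mathrm{Nation}, \mathrm{Order}, \mathrm{Team.Avg.Rank}, \mathrm{Best.Rank}, \mathrm{Worst.Rank}, \mathrm{Diff.Time}, \mathrm{Elite.Competition}) \qquad (3)$$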

where a team’s finishing position is predicted using gender, nation, order of swimmers, team average ranking, best and worst world ranking, Diff.Time (differential time) and race type (elite competition).

Cross-validation results for RF1 show that gold medal predictions achieved 100% sensitivity (producer’s accuracy) and non-medalling results were predicted with 93% sensitivity (Table 2). However, it was markedly harder to predict silver and bronze race outcomes, for which RF1 achieved sensitivities of 35% and 13%, respectively. In contrast, cross-validation of RF2 predictions of medal colour achieved sensitivities of 100%, 41% and 63% for gold, silver and bronze finishing positions, respectively (Table 3). The most influential variables in the model, with their impact on cross-validated predictions measured using Mean Decrease Gini (MDG) [22], were: differential time (MDG = 31.3), team average ranking (18.9), best ranking (12.6), order of swimmers (6.9), nation (2.6) and gender (1.8).

Table 2. RF1 misclassification matrix of medalling teams: Gold, Silver, Bronze; and non-medalling teams.

True finishing position | Predicted Gold | Predicted Silver | Predicted Bronze | Predicted Non-Medal | Total | Sensitivity
Gold | 20 | 0 | 0 | 0 | 20 | 1.00
Silver | 3 | 6 | 2 | 6 | 17 | 0.35
Bronze | 1 | 2 | 2 | 11 | 16 | 0.13
Non-Medal | 0 | 1 | 4 | 63 | 68 | 0.93
Total | 24 | 9 | 8 | 80 | |
User’s Accuracy | 0.83 | 0.67 | 0.25 | 0.79 | |
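The sensitivity (producer's accuracy) and user's accuracy values in Table 2 follow directly from the counts: sensitivity divides each diagonal entry by its row total, and user's accuracy divides it by its column total. A minimal sketch, with the counts copied from Table 2 and hypothetical function names:

```python
LABELS = ["Gold", "Silver", "Bronze", "Non-Medal"]

# Rows: true finishing position; columns: predicted (counts from Table 2).
CONFUSION = [
    [20, 0, 0, 0],
    [3, 6, 2, 6],
    [1, 2, 2, 11],
    [0, 1, 4, 63],
]


def sensitivity(matrix):
    """Correct predictions divided by the number of true cases (row totals)."""
    return [row[i] / sum(row) for i, row in enumerate(matrix)]


def users_accuracy(matrix):
    """Correct predictions divided by the number of predictions (column totals)."""
    col_totals = [sum(col) for col in zip(*matrix)]
    return [matrix[i][i] / col_totals[i] for i in range(len(matrix))]


sens = sensitivity(CONFUSION)
user = users_accuracy(CONFUSION)
for label, s, u in zip(LABELS, sens, user):
    # Bronze sensitivity is 2/16 = 0.125, which Table 2 reports as 0.13.
    print(label, s, u)
```

The two measures answer different questions: sensitivity asks how often a true bronze finish is recognised as such, whereas user's accuracy asks how often a predicted bronze turns out to be correct.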

Table 3. RF2 misclassification matrix of medalling teams into three groups of Gold, Silver and Bronze.

True finishing position | Predicted Gold | Predicted Silver | Predicted Bronze | Total | Sensitivity
Gold | 20 | 0 | 0 | 20 | 1.00
Silver | 3 | 7 | 7 | 17 | 0.41
Bronze | 1 | 5 | 10 | 16 | 0.63
Total | 24 | 12 | 17 | |
User’s Accuracy | 0.83 | 0.58 | 0.59 | |

The practical validity of the prediction model was tested by predicting the finishing positions of the top four male and female teams at the 2019 FINA World Championships. The probability of each team achieving a gold, silver or bronze medal was predicted in model 1, and the probability of finishing in a medal or non-medal position in model 2 (Table 4). We also modelled the probability of various team orders, showcasing the battle for the bronze medal position in the women’s 4 x 200-m freestyle relay (Table 5). Canada was successful in winning bronze despite having a lower predicted probability of bronze than China for every team order considered.

Table 4. Probability output of the random forest model for the top four female and male 4 x 200-m freestyle relay teams in the 2019 FINA World Championships.

Finish Position | Nation | Nation Category | Time (min:sec) | Gender | Team Avg Ranking | Best Ranking | Worst Ranking | Order | RF1 Pr (Gold) | RF1 Pr (Silver) | RF1 Pr (Bronze) | RF1 Pr (Non-Medalling)
Gold | Australia | Australia | 7:41.50 | F | 9 | 2 | 17 | 1xx2 | 0.84 (+) | 0.10 | 0.06 | 0.00
Silver | USA | USA | 7:41.87 | F | 18 | 7 | 43 | 21xx | 0.08 | 0.44 (+) | 0.35 | 0.13
Bronze | Canada | Other | 7:44.35 | F | 24 | 12 | 41 | Other | 0.02 | 0.17 | 0.55 (-) | 0.27
4th | China | Other | 7:46.22 | F | 24 | 6 | 36 | 1x2x | 0.02 | 0.14 | 0.66 (-) | 0.19
Gold | Australia | Australia | 7:00.85 | M | 17 | 2 | 36 | 12xx | 0.71 (+) | 0.13 | 0.17 | 0.00
Silver | Russia | Other | 7:01.81 | M | 14 | 6 | 23 | 2xx1 | 0.33 | 0.40 (-) | 0.25 (-) | 0.02
Bronze | USA | USA | 7:01.98 | M | 19 | 11 | 27 | 1xx2 | 0.23 | 0.51 (-) | 0.20 (-) | 0.07
4th | Italy | Other | 7:02.01 | M | 39 | 10 | 70 | 12xx | 0.01 | 0.02 | 0.15 | 0.81 (+)

F = female, M = male, Pr = probability.

Note: Correct predictions are annotated with (+) and incorrect predictions with (-).

Table 5. Effect of team order on the probability (Pr) of Gold, Silver, Bronze or no medal for Canada and China who finished 3rd and 4th, respectively, in the women’s 4 x 200-m freestyle relay at the 2019 FINA World Championships.

Team order | Pr (Gold) | Pr (Silver) | Pr (Bronze) | Pr (NoMedal)
Bronze (3rd Place): Canada
1xx2 | 0.004 | 0.247 | 0.385 | 0.364
21xx | 0.002 | 0.164 | 0.447 | 0.387
other † | 0.015 | 0.173 | 0.547 * | 0.265
2xx1 | 0.009 | 0.267 | 0.334 | 0.390
12xx | 0.002 | 0.216 | 0.375 | 0.407
1x2x | 0.006 | 0.188 | 0.523 | 0.283
4th Place: China
1xx2 | 0.001 | 0.219 | 0.542 | 0.229
21xx | 0.005 | 0.098 | 0.712 * | 0.185
other | 0.021 | 0.135 | 0.667 | 0.177
2xx1 | 0.012 | 0.250 | 0.453 | 0.285
12xx | 0.006 | 0.147 | 0.621 | 0.226
1x2x † | 0.017 | 0.137 | 0.659 | 0.187

Note: * indicates the team order with the highest probability of winning the bronze medal and † indicates the team order used in the race.

Discussion

The statistical approaches developed in this study were useful in identifying the variables affecting relay swimming performance given individual swimmer performance, and predicting relay team finishing positions for the 4x200-m freestyle relay. Results indicate that swimmers from the USA, and those swimmers who were the slowest within their teams according to ranking, typically performed better in relays than in individual events. The random forest model RF1 was highly effective at correctly predicting gold medal winning teams (100% sensitivity), and whether a team will medal or not (non-medalling sensitivity of 93%). However, the models were less accurate in distinguishing between silver (35% using RF1, 41% using RF2) and bronze (13% using RF1, 63% using RF2). This outcome might be due to small differential times between these positions for some swimming competitions. In contrast, the differential times between the bronze medal position and non-medal positions for all competitions tended to be much larger. The RF2 model could be used by decision makers to evaluate silver and bronze medal scenarios assuming that a team will win a medal. These models enable coaches and support staff to simulate different relay race scenarios to determine the optimal relay team configuration by using swimmer characteristics, anticipated opponent swimmers and team order.

Differentiating psychological from technical effects

Among the many variables that may impact relay swimming performance, the psychology of team competition is important [13, 14]. Note that we have adjusted for the effect of the flying start in relay legs two through four by setting exchange block times equal to individual reaction time [8]. Any residual differences between legs were captured via the relay leg term; thus, we were able to discern potential psychological effects from technical effects.

Our results indicate that the largest effect of the variables modelled in this study was due to the worst-ranked or slowest swimmer in a team. These swimmers typically swam 0.62 s faster in the relay than in the corresponding individual event. Peer effects can have a positive impact on individual performance within a team, and these psychosocial effects may help explain the improved performance of some swimmers in relays relative to their individual times in the present study [10, 25]. However, our findings differ from those of Hüffmeier and Hertel [12] who reported on the effects of relay leg assignment (i.e. going first, second, third or last). In contrast, we found the relative ranking of the swimmer within the team (i.e. worst-ranked swimmer) to be a larger effect, and relay leg assignment to be generally not significant. Motivating group effects are typically greater when an individual perceives their contribution as important to the overall team outcome [12, 14]. Therefore, it is possible that the slower swimmers within the team felt more pressure and motivation to step up and put their team in a good position. In contrast, relay teams comprised of higher ranking athletes are more likely to underperform relative to their individual performance [25]. Such psychological impacts could be an area for further study to help motivate and develop swimmers in relay and non-relay contexts.

Nationality impacts

Swimmer nationality also impacted performance as individual swimmers from the USA tended to swim 0.48 s faster during the relays than their predicted individual swim times. This outcome could be attributed to the competition structure of the National Collegiate Athletic Association (NCAA) which allows for the frequent practice of relay swimming in competitive races. In contrast, Australian swimmers (and those of many other nations) may only swim in a limited number of relay events throughout the season prior to the major international competition, and rarely get the opportunity to practise with potential teammates. Team cohesiveness may play a role as social loafing is less likely to occur in highly cohesive teams [26]. However, further research is required to determine the underlying nature of differences between nations.

Relative influence of variables

The ability to accurately predict team finishing position based on a set of explanatory variables would support coaches in making an evidence-based decision when selecting relay team swimmers and leg assignments, potentially weeks to months ahead of competition. Random forest models were used to make these predictions and the most influential variables were identified based on cross-validation, and the mean decrease in sensitivity and specificity as measured by MDG [22]. As might be expected, the strength of the team, as captured by rankings and individual season’s best times, was the leading contributor to finishing position (Results). However, team strategy, in terms of the order of swimmers, was the next most influential factor. The dataset used for modelling consisted primarily of high-calibre international events including Olympics and World Championships. Typically, these are the pinnacle events that athletes train and prepare for. We identified that medal outcomes were highly influenced by differential time (MDG of 31.3), which is based on the sum of individual swimmers’ season’s best times. This outcome suggests that individuals are performing at or near their best at these international relay competitions and, equivalently, that season’s best times are useful in predicting individual swimmers’ performance at pinnacle relay events.

Illustrative case study

To illustrate how the model could be used to support decision making, we present a case study predicting the finishing positions for the top four teams at the 2019 FINA World Championships. These data, which included world rankings and season’s best times coming into the competition, were not included in the original dataset. Although the gold medal predictions were correct, the model incorrectly predicted the bronze and 4th positions for females, and silver and bronze positions for males. Team average ranking for the two female teams was identical with a similar differential between the highest and lowest ranked swimmer. However, the fourth placed team had the best ranking swimmer overall, which may indicate that this team underperformed relative to their expected team performance time. This explanation may also serve as a reason for the incorrect model prediction here for both medal colour and medal or non-medal. Similarly, the USA men’s team was predicted to finish in the silver medal position, but Russia outperformed them by just 0.17 s. However, the model was able to correctly predict a medal and non-medal position.

These models can also be used in a predictive decision support scenario where the impact of different swimmer orders on finishing position can be evaluated in a risk-informed, probabilistic manner. For the Canadian women’s team in the 2019 FINA World Championships, the order used in the race provided the highest chance of bronze and lowest chance of a non-medal finish. A 2xx1 order could have increased the chance of a silver medal by 9.4 percentage points, but would also have increased the chance of missing out on a medal by 12.5 percentage points. According to the model, China would have increased their chance of a bronze medal and slightly decreased their chance of a non-medal finish if they had used an ‘other’ swimmer order or 21xx. However, these scenarios only serve as illustrations, and should be seen as observations given limitations of the data, the numerous possible swimmer order and ranking combinations, and the many other factors influencing medal finishes that were not included in the model.
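The order comparison above amounts to differencing rows of Table 5. The sketch below uses the Canadian probabilities from Table 5 and the race order listed in Table 4 ("Other"); the function and variable names are hypothetical, and the model outputs are simply copied rather than recomputed.

```python
OUTCOMES = ("gold", "silver", "bronze", "no_medal")

# Predicted probabilities for Canada by team order, copied from Table 5.
CANADA = {
    "1xx2": (0.004, 0.247, 0.385, 0.364),
    "21xx": (0.002, 0.164, 0.447, 0.387),
    "other": (0.015, 0.173, 0.547, 0.265),
    "2xx1": (0.009, 0.267, 0.334, 0.390),
    "12xx": (0.002, 0.216, 0.375, 0.407),
    "1x2x": (0.006, 0.188, 0.523, 0.283),
}


def change_vs_baseline(table, baseline, alternative):
    """Change in each outcome probability when switching team order."""
    return {
        outcome: round(alt - base, 3)
        for outcome, alt, base in zip(OUTCOMES, table[alternative], table[baseline])
    }


def best_order_for(table, outcome):
    """Team order with the highest predicted probability of the given outcome."""
    col = OUTCOMES.index(outcome)
    return max(table, key=lambda order: table[order][col])


delta = change_vs_baseline(CANADA, baseline="other", alternative="2xx1")
print(delta)                             # silver +0.094, no_medal +0.125
print(best_order_for(CANADA, "bronze"))  # other
```

Reporting the full set of changes, rather than only the targeted medal, keeps the trade-off visible: the same switch that raises the silver probability lowers the bronze probability and raises the risk of no medal.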

Limitations and future work

While these statistical approaches were successfully applied to enhance our understanding of the variables impacting both individual and team performance in relay swimming events, there are some limitations. First, only teams with available data for all four swimmers were analysed, resulting in partial data for some races which is a potential source of misclassification errors. However, increasingly more data are becoming available as demonstrated by the availability of each swimmer’s season’s best time and world ranking going into the 2019 World Championships. Data about individual swimmers’ physical status or performance characteristics (such as individual and relay block times [20]) could be used to extend this work and improve the predictive performance of the model. Currently, fewer than 25% of the swimmers in this dataset had 3 or more relay races recorded and the majority only had 1; this limited our ability to model individual characteristics.

We also assumed that all relay teams had an equal chance of finishing in each position, whereas in reality some teams are more likely than others to be chasing medal positions. Incorporating such prior knowledge into the model, for example through a Bayesian formulation, could improve prediction accuracy. Moreover, the individual analyses could be combined through a hierarchical model to enable ‘borrowing of strength’ between events to improve estimates and extend insights. Additionally, intermediate positions, and whether the team is still in medal contention at the end of each leg, could also affect individual swimmer performance. Multivariate extensions of the proposed models could be developed, with sufficient data, for this more complex task. Finally, hybrids of machine learning and statistical methods could be used in future research to build similar predictive models for other swimming relay events, such as the mixed relays [3].
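As a simple illustration of how prior knowledge might be combined with model output, the sketch below reweights a set of class probabilities by a non-uniform prior using Bayes' rule. It assumes the model's probabilities were obtained under equal chances for each finishing position, so that the posterior is proportional to the model probability multiplied by the prior. The probabilities, the prior and the function name are invented for illustration; this is not the hierarchical Bayesian formulation envisaged above, only the most basic form of the idea.

```python
def reweight_with_prior(model_probs, prior):
    """Combine class probabilities from a model that assumed equal class
    chances with a non-uniform prior: posterior is proportional to
    model probability times prior, then normalised to sum to one."""
    unnormalised = {c: model_probs[c] * prior[c] for c in model_probs}
    total = sum(unnormalised.values())
    if total == 0:
        raise ValueError("prior and model probabilities have no overlap")
    return {c: v / total for c, v in unnormalised.items()}


# Invented model output for one team (sums to 1).
model_probs = {"gold": 0.10, "silver": 0.30, "bronze": 0.40, "non-medal": 0.20}

# Invented prior belief that this team is a strong medal contender.
prior = {"gold": 0.40, "silver": 0.30, "bronze": 0.20, "non-medal": 0.10}

posterior = reweight_with_prior(model_probs, prior)
for position, p in posterior.items():
    print(f"{position}: {p:.3f}")
```

With these invented numbers the most probable position shifts from bronze to silver once the prior is applied, which shows how strongly the choice of prior can influence the conclusion and why it would need to be justified.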

Conclusion

The prediction models of this study indicate that the slowest swimmers within a team, and swimmers from the USA, tend to swim faster than expected in relay events. These swimmers can step up and perform above expectations in relay events relative to their season’s best individual event performance. Gold medal and non-medal finishing positions can be accurately predicted using random forest models; however, these models are less accurate in differentiating between silver and bronze medal positions.

The outputs of machine learning models developed in this study can be used by coaches and support staff to assist with decision-making processes around team selection, and for determining the best combination of swimmers to maximise team performance. Different team configurations can be inserted into the model to examine how the probability of finishing in each position changes with different swimmers and team orders. The models can be integrated into decision-making algorithms and expert systems which can also be updated with new data. These prediction models could also be applied to other sports such as track running and cycling where athlete selection and order likely influence team performance.

Supporting information

S1 Fig. Residual and normality plot for the linear regression model reported in results.

Note that the residuals are mostly normal and randomly distributed. Although there is a little deviation from normality in the right tail of the distribution (ignoring the outlier), there are relatively few data points here and these slow swim times are not as relevant in the context of predicting medalling performances.

(DOCX)

S1 File

(CSV)

Data Availability

All relevant data are included within the manuscript and its supporting information files.

Funding Statement

This research was conducted by the Australian Research Council Centre of Excellence for Mathematical and Statistical Frontiers (project number CE140100049) and funded in part by the Australian Government. It was also supported by the Queensland Academy of Sport's Sport Performance Innovation and Knowledge Excellence unit, and by Swimming Australia Limited. Funding was awarded for the project, not to authors. Grant numbers: NA. URLs: https://acems.org.au/home, https://www.qld.gov.au/recreation/sports/academy/services/spike, https://www.swimming.org.au/. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1. Ward-Smith AJ, Radford PF. A mathematical analysis of the 4x100 m relay. J Sports Sci. 2002;20(5):369–81. doi: 10.1080/026404102317366627
  • 2. McGibbon KE, Pyne DB, Thompson KG, Osborne MA, Shephard ME. Pacing and team strategy in relay events. XIIIth International Symposium on Biomechanics and Medicine in Swimming; Tsukuba, Japan: 2018.
  • 3. Veiga S, del Cerro JS, Rodriguez L, Trinidad A, González-Ravé JM. How Mixed Relay Teams in Swimming Should Be Organized for International Championship Success. Front Psychol. 2021;12(421). doi: 10.3389/fpsyg.2021.573285
  • 4. Saavedra JM, Garcia-Hermoso A, Escalante Y, Dominguez AM, Arellano R, Navarro F. Relationship between exchange block time in swim starts and final performance in relay races in international championships. J Sports Sci. 2014;32(19):1783–9. doi: 10.1080/02640414.2014.920099
  • 5. Kibele A, Fischer S. Relay starts in swimming- A review of related issues. The science of swimming and aquatic activities: Nova Publisher; 2018.
  • 6. McLean SP, Holthe MJ, Vint PF, Beckett KD, Hinrichs RN. Addition of an approach to a swimming relay start. J Appl Biomech. 2000;16(4):342–55.
  • 7. Qiu X, Veiga S, Lorenzo A, Kibele A, Navarro E. Differences in the key parameters of the individual versus relay swimming starts. Sports Biomechanics. 2021:1–13. doi: 10.1080/14763141.2021.1878262
  • 8. Skorski S, Etxebarria N, Thompson KG. Breaking the myth that relay swimming is faster than individual swimming. International journal of sports physiology and performance. 2016;11(3):410–3. doi: 10.1123/ijspp.2014-0577
  • 9. McGibbon KE, Shephard ME, Osborne MA, Thompson K, Pyne DB. Pacing and performance in swimming: Differences between individual and relay events. Int J Sport Physiol Perform. 2020; Ahead of print:1–8. doi: 10.1123/ijspp.2019-0381
  • 10. Jane W-J. Peer Effects and Individual Performance: Evidence From Swimming Competitions. J Sports Econ. 2015;16(5):531–9. doi: 10.1177/1527002514521429
  • 11. Neugart M, Richiardi MG. Sequential teamwork in competitive environments: Theory and evidence from swimming data. European Economic Review. 2013;63:186–205. doi: 10.1016/j.euroecorev.2013.07.006
  • 12. Hüffmeier J, Hertel G. When the whole is more than the sum of its parts: Group motivation gains in the wild. J Exp Soc Psychol. 2011;47(2):455–9. doi: 10.1016/j.jesp.2010.12.004
  • 13. Williams KD, Nida SA, Baca LD, Latane B. Social loafing and swimming—Effects of identifiability on individual and relay performance of intercollegiate swimmers. Basic Appl Soc Psych. 1989;10(1):73–81. doi: 10.1207/s15324834basp1001_7
  • 14. Hüffmeier J, Krumm S, Kanthak J, Hertel G. “Don’t let the group down”: Facets of instrumentality moderate the motivating effects of groups in a field experiment. Eur J Soc Psychol. 2012;42(5):533–8. doi: 10.1002/ejsp.1875
  • 15. FINA. Rules and Regulations. https://www.fina.org/sites/default/files/general/css2019_rr_v1_20190325.pdf. 2019.
  • 16. Silva AJ, Costa AM, Oliveira PM, Reis VM, Saavedra J, Perl J, et al. The use of neural network technology to model swimming performance. J Sports Sci Med. 2007;6(1):117–25.
  • 17. Hoffmann M, Moeller T, Seidel I, Stein T. Predicting elite triathlon performance: A comparison of multiple regressions and artificial neural networks. International Journal of Computer Science in Sport. 2017;16(2):101. doi: 10.1515/ijcss-2017-0009
  • 18. Edelmann-Nusser J, Hohmann A, Henneberg B. Modeling and prediction of competitive performance in swimming upon neural networks. Eur J Sport Sci. 2002;2. doi: 10.1080/17461390200072201
  • 19. Mitchell LJG, Rattray B, Fowlie J, Saunders PU, Pyne DB. The impact of different training load quantification and modelling methodologies on performance predictions in elite swimmers. Eur J Sport Sci. 2020:1–10. doi: 10.1080/17461391.2020.1719211
  • 20. Saavedra JM, García-Hermoso A, Escalante Y, Dominguez AM, Arellano R, Navarro F. Relationship between exchange block time in swim starts and final performance in relay races in international championships. Journal of Sports Science. 2014;32(19):1783–9. doi: 10.1080/02640414.2014.920099
  • 21. Casella G, Berger RL. Statistical inference: Duxbury: Pacific Grove, CA; 2002.
  • 22. Breiman L. Manual On Setting Up, Using, And Understanding Random Forests V3.1. https://www.stat.berkeley.edu/~breiman/Using_random_forests_V3.1.pdf. 2002.
  • 23. R Core Team. R: A language and environment for statistical computing. R Foundation for Statistical Computing. Vienna, Austria: 2013.
  • 24. Hastie T, Tibshirani R, Friedman J. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Second ed. New York: Springer; 2009.
  • 25. Depken CA, Haglund LE. Peer Effects in Team Sports: Empirical Evidence From NCAA Relay Teams. J Sports Econ. 2011;12(1):3–19. doi: 10.1177/1527002509361192
  • 26. Hoigaard R, Tofteland I, Ommundsen Y. The effect of team cohesion on social loafing in relay teams. Int J Appl Sports Sci. 2006;18(1):59–73.

Decision Letter 0

Dalton Müller Pessôa Filho

7 Apr 2021

PONE-D-21-03614

Predicting performance in 4 x 200-m freestyle swimming relay events

PLOS ONE

Dear Dr. Wu,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by May 22 2021 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Dalton Müller Pessôa Filho, Ph.D.

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. In your Data Availability statement, you have not specified where the minimal data set underlying the results described in your manuscript can be found. PLOS defines a study's minimal data set as the underlying data used to reach the conclusions drawn in the manuscript and any additional data required to replicate the reported study findings in their entirety. All PLOS journals require that the minimal data set be made fully available. For more information about our data policy, please see http://journals.plos.org/plosone/s/data-availability.

Upon re-submitting your revised manuscript, please upload your study’s minimal underlying data set as either Supporting Information files or to a stable, public repository and include the relevant URLs, DOIs, or accession numbers within your revised cover letter. For a list of acceptable repositories, please see http://journals.plos.org/plosone/s/data-availability#loc-recommended-repositories. Any potentially identifying patient information must be fully anonymized.

Important: If there are ethical or legal restrictions to sharing your data publicly, please explain these restrictions in detail. Please see our guidelines for more information on what we consider unacceptable restrictions to publicly sharing data: http://journals.plos.org/plosone/s/data-availability#loc-unacceptable-data-access-restrictions. Note that it is not acceptable for the authors to be the sole named individuals responsible for ensuring data access.

We will update your Data Availability statement to reflect the information you provide in your cover letter.

3. Please include captions for your Supporting Information files at the end of your manuscript, and update any in-text citations to match accordingly. Please see our Supporting Information guidelines for more information: http://journals.plos.org/plosone/s/supporting-information.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Partly

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: I Don't Know

Reviewer #2: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: General comments

Authors are to be accomplished by selecting a very interesting topic in competitive swimming and by applying an in-depth data analysis to understand variations between individual and relay swimming performances. Manuscript is well contextualized, well written from a formal point of view and provides some interesting insights on relay performances.

Still, there are some conceptualization aspects that authors should justify and some arguments that should be further explained. Also, some of the main study limitations should be acknowledged. Please see specific comments for details.

Lastly, authors should reconsider if seven tables of results are needed to present the main results to achieve the proposed aims.

Specific comments

Introduction

Line 60-61: Is there previous research on differences between the flat start performed in individual events and the flying start performed by swimmers assigned to the second to fourth relay positions? In this case this research should be referenced. A very recent research on the topic has been published days ago (DOI: 10.1080/14763141.2021.1878262), but not many more examples of this are available…from what the present reviewer knows.

Lines 69-70: contrary to what is stated by authors, other research (http://dx.doi.org/10.1123/ijspp.2014-0577) indicates that “Highly trained swimmers do not swim (or turn) faster in relay events than in their individual races. Relay exchange times account for the difference observed in individual vs relay performance.” Maybe this should be also acknowledged on introduction.

Lines 64-76: In the present paragraph, the present reviewer considers authors are pretty “optimistic” about findings of cited studies. For example, is there enough evidence to state that relay swimmers “exert more effort than those in earlier positions”? Authors are suggested to revise paragraph and to stick to evidence by previous studies. If additional interpretations should be inserted, authors are encouraged to use terms like “probable”, “may be”, “it is supposed”, ...

Line 99. Why 4x200m freestyle event was selected for the research purposes? This should be justified in introduction.

Methods

Line 108: what exact dates were selected for “season best time”? Natural year? September 1st to August 31st? Please specify. In the opinion of the present reviewer, one of the main weaknesses of the present research is that individual times could be obtained in a different season period than the major competition. Previous research has highlighted 1) the great proportion of swimmers who do not swim best times in major competitions and 2) % changes between different season periods (https://doi.org/10.1123/ijspp.2018-0782). Therefore, differences between individual and relay leg performances may be caused not by the specific relay conditions but by the different physical status of swimmers. This could be minimized by comparing the individual and relay performances within the same major competition. Race conditions would not be equal, but authors would ensure similar physical status of swimmers.

Line 112-114: The present reviewer considers it would be interesting to include exchange block times. Do these times change according to the race status or ranking for each leg? Are differences between individual and relay performances based on differences on the remaining of the race (beyond block times) or based on both block times plus the swimming laps? Considering the present research aims to “to predict and understand variations in swimmer performance between individual and relay events, and the contextual factors affecting relay team finishing positions”… does it make sense to exclude one of the main variables affecting relay races result? Authors should at least acknowledge this as a study limitation.

Line 116: what date was considered for world ranking of the swimmers in the relay? Day of major competition beginning? World ranking of the complete year? Please specify.

Line 119: 121 teams of a total of … (179 according to what indicated in line 110)?

Results

Line 213: “in the third leg the swimmers tend to swim slower than expected by 0.24 s (CI=[-0.05,0.54], p=0.10) than swimmers on the first leg”. Please rephrase.

Lines 214-216: are authors referring here to the individual or relay performance? Should “individual event” be substituted by “individual relay leg”?

Table 2: Is this table really needed in the present results section? what is the utility of the present table within the manuscript?

Table 4: Could Table 4 be expressed in the text instead of a table? Having seven tables overall could distract readers from the main findings of the present research.

Discussion

The present reviewer would expect “race partial positioning” as an important variable to be included in the model to predict variations between individual and relay performances. Indeed, team tactics are usually developed according to expected partial positioning after the first, second, third leg. Are differences between individual and relay performances related to the partial positioning of relay swimmers at the beginning of their relay leg?

References

A recent reference on relay tactics (doi: 10.3389/fpsyg.2021.573285) seems to be adequate to support and discuss some of the ideas explained in the present manuscript.

Reviewer #2: General comments

The work is of interest to PLOSONE readers and a novel approach. However, some parts of the manuscript are confusing and hard to read. I recommend that you do the proposed changes and re-review it.

Specific Comments

1. Abstract. It needs to be rewritten in its entirety. Participants/sample, statistical analysis and results are mixed. They are hard to read. Please re-write it with this in mind.

1.a) I recommend impersonal wording throughout the Abstract. Instead of "Our aim...", "The aim was..." etc. Please re-write it with this in mind.

1.b) Lines 33-35. The objective should coincide with the objective at the end of the Introduction and the beginning of the Discussion. The objective must be in the past tense as the study has already been carried out. The objective should include the term “4x200 m swimming freestyle relay events”.

1.c) Lines 35-36. “We applied linear regression and machine learning to 4 x 200-m swimming freestyle relay events”. This should be in the sentences about the statistical analysis (after participants/sample).

1.d) Line 40. “…American swimmers...” Is nationality of swimmers a studied variable? It is confusing. Information about table 1 could be included.

2. Line 82-88. It is too speculative. Please, re-write.

3. Line 98-99. Please, delete it. The objective should be the last sentence of the Introduction Section.

4. Line 135 and following. “…a relay order of “2-1-3-4” indicates that the second fastest swimmer swam the lead-off or first leg…”. Is this second fastest swimmer the second fastest according to “the start time” (before the relay was swum) or “the final time” (after the relay was swum)? It can be inferred, but it needs clarification.

5. Line 155-157. Was the stepwise selection procedure used? Please, clarify

6. Please, first explain what is a “random forest” (lines 163-168) and after why was it used (lines 158-162).

7. Line 258-264. Why was the 2019 FINA World Championships used to test the model? Why not the 2012 or 2016 Olympic Games? Why was only the “battle for the 3rd place” analyzed in female? Would the results be different if other Championship was analyzed? These questions are really relevant. This information should be clarified and included in the Statistical Analysis Section.

8. The paragraphs of the Discussion section are a bit unconnected and repetitive. Please, try to make it more “readable”.

9. Line 321-324. This is a repetition of Results. Please, re-write.

10. The team position at the moment that swimmer swims could influence (very probably) in his/her time. Please, include this as a limitation.

Minor comments

11. Too much “Given…” Line 81, 84, 89… Please, re-write.

12. The Statistical Analysis Section is a bit hard to read. Please, consider to re-write it and make it more “readable”.

13. Abbreviations are used to avoid repeating words… Mean Decrease in Gini (MDG) in line 243, 252, 321…

**********

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Santiago Veiga

Reviewer #2: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2021 Jul 15;16(7):e0254538. doi: 10.1371/journal.pone.0254538.r002

Author response to Decision Letter 0


17 May 2021

Dear Prof. Filho

Thank you to you and the reviewers for your valuable feedback. We have worked on the feedback with responses highlighted in yellow and excerpts from the updated paper highlighted in gray. Please note that the line numbers in the response refer to Manuscript.docx. Also, we have uploaded the data to supporting information.

Please see the attached response to reviewers document for the formatted response.

Thanks

Paul

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

We have revised the format according to those guidelines.

2. In your Data Availability statement, you have not specified where the minimal data set underlying the results described in your manuscript can be found. PLOS defines a study's minimal data set as the underlying data used to reach the conclusions drawn in the manuscript and any additional data required to replicate the reported study findings in their entirety. All PLOS journals require that the minimal data set be made fully available. For more information about our data policy, please see http://journals.plos.org/plosone/s/data-availability.

We have uploaded the dataset used in our analysis as supporting information and updated the data availability statement.

3. Please include captions for your Supporting Information files at the end of your manuscript, and update any in-text citations to match accordingly. Please see our Supporting Information guidelines for more information: http://journals.plos.org/plosone/s/supporting-information.

We have added the caption to the end of the manuscript as requested.

Reviewers' comments:

Reviewer #1: General comments

Authors are to be commended for selecting a very interesting topic in competitive swimming and for applying an in-depth data analysis to understand variations between individual and relay swimming performances. The manuscript is well contextualized, well written from a formal point of view and provides some interesting insights on relay performances.

Still, there are some conceptualization aspects that authors should justify and some arguments that should be further explained. Also, some of the main study limitations should be acknowledged. Please see specific comments for details.

Lastly, authors should reconsider if seven tables of results are needed to present the main results to achieve the proposed aims.

We have distilled the results and discussion now into five tables

Specific comments

Introduction

Line 60-61: Is there previous research on differences between the flat start performed in individual events and the flying start performed by swimmers assigned to the second to fourth relay positions? In this case this research should be referenced. A very recent research on the topic has been published days ago (DOI: 10.1080/14763141.2021.1878262), but not many more examples of this are available…from what the present reviewer knows.

Thank you for the reference, we have added that and this one: https://doi.org/10.1123/ijspp.2014-0577 to discuss this.

Lines 69-70: contrary to what is stated by authors, other research (http://dx.doi.org/10.1123/ijspp.2014-0577) indicates that “Highly trained swimmers do not swim (or turn) faster in relay events than in their individual races. Relay exchange times account for the difference observed in individual vs relay performance.” Maybe this should be also acknowledged on introduction.

Thank you for the reference, we have added a discussion about it in that paragraph (line76).

However, there is conflicting evidence of differences in starts, turns and swimming speed between individual and relay events [7].

Lines 64-76: In the present paragraph, the present reviewer considers authors are pretty “optimistic” about findings of cited studies. For example, are there enough evidence to state that relay swimmers “exert more effort than those in earlier positions”?? Authors are suggested to revise paragraph and to stick to evidence by previous studies. If additional interpretations should be inserted, authors are encouraged to used terms like “probable”, “may be”, “it is supposed”, ...

Thank you for this comment – we have re-phrased that entire paragraph as suggested (line 69):

Recently, pacing differences between individual and relay events in swimming have been examined, indicating that some swimmers alter their pacing strategy during relay events [8]. This difference in pacing strategy between individual and relay swims may be attributed to the relay leg assignment as well as the added pressure to perform well for the team [8]. Extensive research on team dynamics and behavioural aspects of competitive relay swimming is described in the literature [9-12]. Compared to individual events, swimming performance is typically faster in relays, which may be attributed to elevated motivation and effort [12, 13]. However, there is conflicting evidence of differences in starts, turns and swimming speed between individual and relay events [7]. In addition to the motivational effects of relay swimming, the order of swimmers in the relay team can also potentially impact the effort exerted by each swimmer. Swimmers positioned in later relay leg positions were found to be more likely in certain contexts to swim faster than those in earlier positions relative to individual event times [10, 11]. Contextually, the positive influence of relay leg positioning has been ascribed to an increase in the perceived importance of individual contributions to the team outcome [11, 13].

Line 99. Why 4x200m freestyle event was selected for the research purposes? This should be justified in introduction.

We have clarified this on line 100:

The 4 x 200-m freestyle relay is currently the longest relay in the FINA competition schedule. With each swimmer required to complete 4 x 50 m laps the event requires well-developed speed-endurance, technical skills in starts, turns and finishes, and the element of pacing and sufficient data to model pacing effects [8]. Therefore, the aim of this study was to enhance our understanding of contextual factors contributing to relay team performance in light of individual swimmer performance, and develop predictive models to analyse the relationship between a team’s finishing position and these factors for the 4x200-m swimming freestyle relay.

Methods

Line 108: what exact dates were selected for “season best time”? Natural year? September 1st to August 31st? Please specify. In the opinion of the present reviewer, one of the main weaknesses of the present research is that individual times could be obtained in a different season period than the major competition. Previous research has highlighted 1) the great proportion of swimmers who do not swim best times in major competitions and 2) % changes between different season periods (https://doi.org/10.1123/ijspp.2018-0782). Therefore, differences between individual and relay leg performances could be caused not by the specific relay conditions but by the different physical status of swimmers. This could be minimized by comparing the individual and relay performances within the same major competition. Race conditions would not be equal, but authors would ensure similar physical status of swimmers.

This timeframe has been clarified to be the same season, typically beginning around September and ending around July/August (line 116).

For each relay swimmer, the individual 200-m freestyle season's best time for the same season (typically beginning around September and concluding around July-August) was located using FINA world rankings (https://www.fina.org/content/swimming-world-ranking).

Although the use of the best time within the competition could improve predictions, that would limit the ability to use the model to assist in team selection, training and preparation, as those activities occur weeks to months prior to the competition (Introduction line 86). In addition, each country is typically only allowed two competitors per individual event at major competitions such as the Olympics and World Championships, thus the best swim time in the same competition would only be available for half of the relay team.

(line 341) The ability to accurately predict team finishing position based on a set of explanatory variables would support coaches in making an evidence-based decision when selecting relay team swimmers and leg assignments weeks to months ahead of competition.

Season best time as currently construed represents more so the calibre of the swimmer rather than the current physical status; however, understanding physical status is a key area for future work. Both points have been clarified in the discussion.

(line 387) However, increasingly more data are becoming available as demonstrated by the availability of each swimmer’s season’s best time and world ranking going into the 2019 World Championships. Potentially, data about individual swimmer’s physical status or performance characteristics (such as individual and relay block times [19]) could be used to extend this work and improve the predictivity of the model.

Line 112-114: The present reviewer considers it would be interesting to include exchange block times. Do these times change according to the race status or ranking for each leg? Are differences between individual and relay performances based on differences on the remaining of the race (beyond block times) or based on both block times plus the swimming laps? Considering the present research aims to “to predict and understand variations in swimmer performance between individual and relay events, and the contextual factors affecting relay team finishing positions”… does it make sense to exclude one of the main variables affecting relay races result? Authors should at least acknowledge this as a study limitation.

In our study, differences between individual and relay performances are effectively based on swimming time only (i.e. without block times) (Methods 121).

We agree that investigating how relay exchange block times might change with race status, or ranking, might be interesting and we have reflected that in the discussion for future investigation. However, such a model should capture the inherent dependency of block times on individual swimmers, which requires multiple measurements per individual. This is challenging with the current dataset as the median number of relay races per swimmer is 1, and less than 25% of swimmers in this dataset had 3 or more recorded relay races. More data is needed to be able to analyse block times in a statistically rigorous manner. This point has been clarified in the Discussion section (line 389).

Data about individual swimmer’s physical status or performance characteristics (such as individual and relay block times [19]) could be used to extend this work and improve the predictivity of the model. Currently, less than 25% of the swimmers in this dataset had 3 or more relay races recorded and the majority only had 1; this limited our ability to model individual characteristics.

Line 116: what date was considered for world ranking of the swimmers in the relay? Day of major competition beginning? World ranking of the complete year? Please specify.

The world ranking has been clarified as follows (line 125):

The team average ranking was calculated as the average world ranking of the four swimmers in the team, where world rankings for the year of the relay competition were used. In our dataset, the swims contributing to the rankings were coincidentally prior to the major competition of that year.

Line 119: 121 teams of a total of … (179 according to what indicated in line 110)?

121 teams of a total of 188. (line 131)

Results

Line 213: “in the third leg the swimmers tend to swim slower than expected by 0.24 s (CI=[-0.05,0.54], p=0.10) than swimmers on the first leg”. Please rephrase.

(line 234) In addition, compared to the first leg, swimmers in the third leg tend to swim slower than expected by 0.24 s (CI=[-0.05,0.54], p=0.10).

Lines 214-216: are authors referring here to the individual or relay performance? Should “individual event” be substituted by “individual relay leg”?

We are referring to the individual 200m freestyle event (clarified, line 236)

Table 2: Is this table really needed in the present results section? what is the utility of the present table within the manuscript?

We agree that Table 2 is not essential and have removed it.

Table 4: Could be table 4 expressed in the text instead a table? Considering overall seven tables could distract readers from the main findings of the present research….

We have removed Table 4 and placed this as text in the Results (line 259).

Discussion

The present reviewer would expect “race partial positioning” as an important variable to be included in the model to predict variations between individual and relay performances. Indeed, team tactics are usually developed according to expected partial positioning after the first, second, third leg. Are differences between individual and relay performances related to the partial positioning of relay swimmers at the beginning of their relay leg?

We agree that race partial positioning is an important variable; however, it presents an additional layer of modelling complexity, predicting three intermediary and one final position, which typically requires more data for fitting and validation.

This point has been added to the Discussion line 399:

Additionally, intermediary positions and whether the team is still in medal contention or not at the end of each leg could also affect individual swimmer performance. Multivariate extensions of the proposed models could be developed, with sufficient data, for this more complex prediction task.

References

A recent reference on relay tactics (doi: 10.3389/fpsyg.2021.573285) seems to be adequate to support and discuss some of the ideas explained in the present manuscript.

Thank you for this recommendation. We have incorporated this recent work into the manuscript in the following areas:

(line 58) A key challenge of relay events in sporting competitions is team selection and the order of athletes, as they can impact race outcomes [1-3].

(line 66) However, in both track running and swimming, it appears that selecting the fastest athlete for the first or lead-off relay leg is popular and successful [1-3], although further research is required to determine how this impacts team performance.

(line 402) Finally, hybrids of machine learning and statistical methods could be used in future research to build similar predictive models for other swimming relay events, such as the mixed relays [3].

Reviewer #2: General comments

The work is of interest to PLOSONE readers and a novel approach. However, some parts of the manuscript are confusing and hard to read. I recommend that you do the proposed changes and re-review it.

Specific Comments

1. Abstract. It needs to be rewritten in its entirety. Participants/sample, statistical analysis and results are mixed. They are hard to read. Please re-write it with this in mind.

1.a) I recommend impersonal wording throughout the Abstract. Instead of "Our aim...", "The aim was..." etc. Please re-write it with this in mind.

We took the opportunity to review carefully the content and wording of the abstract with reference to the reviewer’s comment – our abstract has the following structure:

Aim (sentence 1)

Method (sentence 2)

Data (sentence 3, 4)

Results (sentence 5, 6)

Discussion (sentence 7-10)

We have added headings in the abstract to help separate out the sections, and made the abstract impersonal as suggested:

Abstract

Aim

The aim was to predict and understand variations in swimmer performance between individual and relay events, and develop a predictive model for the 4x200-m swimming freestyle relay event to help inform team selection and strategy.

Data and Methods

Race data for 716 relay finals (4 x 200-m freestyle) from 14 international competitions between 2010-2018 were analysed. Individual 200-m freestyle season best time for the same year was located for each swimmer. Linear regression and machine learning was applied to 4 x 200-m swimming freestyle relay events.

Results

Compared to the individual event, the lowest ranked swimmer in the team (-0.62 s, CI=[-0.94,-0.30]) and American swimmers (-0.48 s [-0.89,-0.08]) typically swam faster 200-m times in relay events. Random forest models predicted gold, silver, bronze and non-medal with 100%, up to 41%, up to 63%, and 93% sensitivity, respectively.

Discussion

Team finishing position was strongly associated with the differential time to the fastest team (mean decrease in Gini (MDG) when this variable was omitted =31.3), world rankings of team members (average ranking MDG of 18.9), and the order of swimmers (MDG=6.9). Differential times are based on the sum of individual swimmer’s season’s best times, and along with world rankings, reflect team strength. In contrast, the order of swimmers reflects strategy. This type of analysis could assist coaches and support staff in selecting swimmers and team orders for relay events to enhance the likelihood of success.

1.b) Lines 33-35. The objective should coincide with the objective at the end of the Introduction and the beginning of the Discussion. The objective must be in the past tense as the study has already been carried out. The objective should include the term “4x200 m swimming freestyle relay events”.

We have ensured consistency of wording across the Abstract, Introduction and Discussion:

Abstract (line 34)

The aim was to predict and understand variations in swimmer performance between individual and relay events, and develop a predictive model for the 4x200-m swimming freestyle relay event to help inform team selection and strategy.

Introduction (line 104)

Therefore, the aim of this study was to enhance our understanding of contextual factors contributing to relay team performance in light of individual swimmer performance, and develop predictive models to analyse the relationship between a team’s finishing position and these factors for the 4x200-m swimming freestyle relay.

Discussion (line 291)

The statistical approaches developed in this study were useful in identifying the variables affecting relay swimming performance given individual swimmer performance, and predicting relay team finishing positions for the 4x200-m freestyle relay.

1.c) Lines 35-36. “We applied linear regression and machine learning to 4 x 200-m swimming freestyle relay events”. This should be in the sentences about the statistical analysis (after participants/sample).

Data (participants/sample) has been placed before methods in the abstract as suggested.

1.d) Line 40. “…American swimmers...” Is nationality of swimmers a studied variable? It is confusing. Information about table 1 could be included.

Yes, in addition to the abstract, nationality is listed as a variable in Table 1, included as a variable in equations 2 and 3, with nationality effects presented in the Results and further analysed in the Discussion.

2. Line 82-88. It is too speculative. Please, re-write.

We have clarified this paragraph by removing the sentence beginning with “Given this long period between selection and the major competition…”, and added a citation for the statement about physiological, psychological and team-based dynamics (line 84):

During FINA-sanctioned events including the biennial World Championships, relay teams must nominate their four selected swimmers, and the team order, one hour prior to the start of the heats or finals session in which the relay occurs [15]. However, swimmers are typically selected for the national squad a few weeks to several months prior to the competition based on their performance in the corresponding individual event. In addition, due to the complex interactions between physiological, psychological and team-based dynamics [10], predictions of individual performance in relays and overall team outcomes are challenging. Therefore, there is a need for effective predictive tools that could support coaches in the decision-making process to maximise the performance of the relay team as a whole, as well as each individual swimmer.

3. Line 98-99. Please, delete it. The objective should be the last sentence of the Introduction Section.

Deleted.

4. Line 135 and following. “…a relay order of “2-1-3-4” indicates that the second fastest swimmer swam the lead-off or first leg…”. Is this second fastest swimmer the second fastest according to “the start time” (before the relay was swum) or “the final time” (after the relay was swum)? It can be inferred, but it needs clarification.

Here, we are using the swimmer’s world ranking and have clarified this point in line 148:

The order of swimmers in the relay was encoded according to the relative world ranking of each swimmer within a team. For example, a relay order of “2-1-3-4” indicates that the second fastest swimmer (i.e. second highest world ranking) swam the lead-off or first leg, the fastest swimmer swam the second leg, and so on.
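As a minimal sketch of this encoding (an illustration only, not the code used in the study), the within-team order string can be derived from the world rankings of the four swimmers listed in leg order. The function name and the example rankings are hypothetical, and ties are assumed to be broken in favour of the earlier leg.

```python
def encode_relay_order(world_rankings_by_leg):
    """Encode a relay order such as "2-1-3-4".

    world_rankings_by_leg lists each swimmer's world ranking in leg order
    (leg 1 first); a lower ranking number means a faster swimmer.
    """
    # Leg indices sorted from best-ranked to worst-ranked swimmer.
    # sorted() is stable, so tied rankings favour the earlier leg (assumption).
    legs_by_strength = sorted(
        range(len(world_rankings_by_leg)),
        key=lambda leg: world_rankings_by_leg[leg],
    )
    within_team_rank = [0] * len(world_rankings_by_leg)
    for rank, leg in enumerate(legs_by_strength, start=1):
        within_team_rank[leg] = rank
    return "-".join(str(rank) for rank in within_team_rank)


# Hypothetical team: the second-best ranked swimmer leads off.
example_order = encode_relay_order([12, 3, 25, 40])
print(example_order)  # prints 2-1-3-4
```

Encoding the order relative to the team, rather than with raw world rankings, keeps the variable comparable across teams of very different strength.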

5. Line 155-157. Was the stepwise selection procedure used? Please, clarify

No it was not, and this point has been clarified on line 166:

Multiple linear regression [21] was used to estimate the relationships between an individual swimmer’s performance in a relay and the explanatory variables (Eq 2).

6. Please, first explain what is a “random forest” (lines 163-168) and after why was it used (lines 158-162).

We have re-ordered the text as suggested (line 170):

Random forests were used to predict team finishing positions based on explanatory variables as they are ideally suited for a mixture of numeric and categorical variables with potentially highly non-linear relationships. Random forests are an ensemble modelling extension of simple decision trees, which recursively partition the space of explanatory variables to minimise some dispersion criteria (i.e. measure of variability) in the resultant partitions [22]. Random forests have also demonstrated high predictive sensitivity and specificity for complex problems in many domains [22]. This method helps to overcome the overfitting problem encountered in decision trees by building many shallow trees using data subsets sampled through bagging. We built a random forest model, referred to as RF1, to predict gold, silver, bronze or non-medal finishing positions. To assist with better prediction of medal colour, we also trialled a model that only predicts medal colour, RF2.

7. Line 258-264. Why was the 2019 FINA World Championships used to test the model? Why not the 2012 or 2016 Olympic Games? Why was only the “battle for the 3rd place” analyzed in female? Would the results be different if other Championship was analyzed? These questions are really relevant. This information should be clarified and included in the Statistical Analysis Section.

There has been a misunderstanding. The random forest model was tested using leave-one-out cross-validation on all of the competitions available in the dataset, including the 2012 and 2016 Olympic Games. Leave-one-out cross-validation ensures that we train the model on all other competitions, and test on a competition that was not used for training the model. The aggregated result was a model (RF1) that was highly effective at correctly predicting gold medal winning teams (100% sensitivity) and whether a team will medal or not (non-medalling sensitivity of 93%) (line 295). For further details, please refer to Statistical Analysis line 200, Results line 257, Table 3, and Discussion line 295.

However, to help illustrate how the model could be used, a case study was performed on particular aspects of the 2019 FINA World Championships as they were the most recent competition. We have clarified these points in the Statistical Analysis (line 215)

Finally, the utility of the model was demonstrated by applying it to a case study analysis of the 2019 World Championships.

and the Discussion (line 358):

To illustrate how the model could be used to support decision making, we demonstrate with a case study of predicting the finishing positions for the top four teams at the 2019 FINA World Championships.
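The leave-one-out scheme described in this response, in which each competition is held out in turn and the model is trained on all the others, can be sketched as follows. This is an illustration under stated assumptions rather than the study's implementation: the record layout, field names and toy data are hypothetical, and model fitting is left out so that only the splitting logic is shown.

```python
from collections import defaultdict


def leave_one_competition_out(records):
    """Yield (held_out, train, test) triples.

    test contains every record from one competition; train contains the
    records from all remaining competitions.
    """
    by_competition = defaultdict(list)
    for record in records:
        by_competition[record["competition"]].append(record)

    for held_out in sorted(by_competition):
        train = [
            record
            for name, rows in by_competition.items()
            if name != held_out
            for record in rows
        ]
        yield held_out, train, by_competition[held_out]


# Hypothetical toy records; the field names are illustrative only.
records = [
    {"competition": "2012 Olympics", "team": "A", "finish": "gold"},
    {"competition": "2012 Olympics", "team": "B", "finish": "silver"},
    {"competition": "2016 Olympics", "team": "A", "finish": "silver"},
    {"competition": "2016 Olympics", "team": "C", "finish": "gold"},
    {"competition": "2017 Worlds", "team": "B", "finish": "bronze"},
]

for held_out, train, test in leave_one_competition_out(records):
    # A model would be fitted on train and evaluated on test at this point.
    print(held_out, len(train), len(test))
```

Holding out whole competitions, rather than individual teams, prevents teams from the same race appearing in both the training and the test data.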

8. The paragraphs of the Discussion section are a bit unconnected and repetitive. Please, try to make it more “readable”.

We took the opportunity to revisit the sequence of paragraphs in the Discussion section, removing repetitive statements and clarifying throughout. There are 6 sub-sections within the Discussion, which have been explicitly labelled with sub-headings as follows: 1) opening paragraph summarising main outcomes and applications, 2) differentiating psychological from technical effects, 3) nationality issues, primarily USA, 4) relative influence of variables, especially the issue of swimmer ranking with cross-reference to the literature and individual performances, 5) an illustrative case study of how the model could be applied, 6) limitations and future work.

Discussion

The statistical approaches developed in this study were useful in identifying the variables affecting relay swimming performance given individual swimmer performance, and predicting relay team finishing positions for the 4x200-m freestyle relay. Results indicate that swimmers from the USA, and those swimmers who were the slowest within their teams according to ranking, typically performed better in relays than in individual events. The random forest model RF1 was highly effective at correctly predicting gold medal winning teams (100% sensitivity), and whether a team will medal or not (non-medalling sensitivity of 93%). However, the models were less accurate in distinguishing between silver (35% using RF1, 41% using RF2) and bronze (13% using RF1, 63% using RF2). This outcome might be due to small differential times between these positions for some swimming competitions. In contrast, the differential times between the bronze medal position and non-medal positions for all competitions tended to be much larger. The RF2 model could be used by decision makers to evaluate silver and bronze medal scenarios assuming that a team will win a medal. These models enable coaches and support staff to simulate different relay race scenarios to determine the optimal relay team configuration by using swimmer characteristics, anticipated opponent swimmers and team order.
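For readers less familiar with the metric, the per-class sensitivities quoted above can be understood as the fraction of teams that truly finished in a given position and were also predicted to finish there. The sketch below assumes this standard definition (true positives divided by all actual members of the class); the labels and predictions are hypothetical toy values, not results from the study.

```python
def per_class_sensitivity(actual, predicted):
    """Return {label: sensitivity} for every label present in actual.

    Sensitivity for a label is the share of teams with that actual finishing
    position whose predicted position matches it.
    """
    sensitivities = {}
    for label in sorted(set(actual)):
        total = sum(1 for a in actual if a == label)
        hits = sum(
            1 for a, p in zip(actual, predicted) if a == label and p == label
        )
        sensitivities[label] = hits / total
    return sensitivities


# Hypothetical toy outcomes for six teams.
actual = ["gold", "silver", "bronze", "non-medal", "non-medal", "silver"]
predicted = ["gold", "bronze", "bronze", "non-medal", "non-medal", "silver"]

toy_sensitivity = per_class_sensitivity(actual, predicted)
print(toy_sensitivity)
```

In this toy example one of the two silver teams is misclassified as bronze, so the silver sensitivity is 0.5 while the other classes are 1.0.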

Differentiating Psychological from Technical Effects

Among the many variables that may impact relay swimming performance, the psychology of team competition is important [13, 14]. Note that we have adjusted for the effect of the flying start in relay legs two through four by setting exchange block times equal to individual reaction time [8]. Any residual differences between legs were captured via the relay leg term; thus, we were able to discern potential psychological effects from technical effects.

Our results indicate that the largest effect of the variables modelled in this study was due to the worst-ranked or slowest swimmer in a team. These swimmers typically swam 0.62 s faster in the relay than in the corresponding individual event. Peer effects can have a positive impact on individual performance within a team, and these psychosocial effects may help explain the improved performance of some swimmers in relays relative to their individual times in the present study [10, 25]. However, our findings differ from those of Hüffmeier and Hertel [12] who reported on the effects of relay leg assignment (i.e. going first, second, third or last). In contrast, we found the relative ranking of the swimmer within the team (i.e. worst-ranked swimmer) to be a larger effect, and relay leg assignment to be generally not significant. Motivating group effects are typically greater when an individual perceives their contribution as important to the overall team outcome [12, 14]. Therefore, it is possible that the slower swimmers within the team felt more pressure and motivation to step up and put their team in a good position. In contrast, relay teams comprised of higher ranking athletes are more likely to underperform relative to their individual performance [25]. Such psychological impacts could be an area for further study to help motivate and develop swimmers in relay and non-relay contexts.

Nationality Impacts

Swimmer nationality also impacted performance as individual swimmers from the USA tended to swim 0.48 s faster during the relays than their predicted individual swim times. This outcome could be attributed to the competition structure of the National Collegiate Athletic Association (NCAA) which allows for the frequent practice of relay swimming in competitive races. In contrast, Australian swimmers (and those of many other nations) may only swim in a limited number of relay events throughout the season prior to the major international competition, and rarely get the opportunity to practise with potential teammates. Team cohesiveness may play a role as social loafing is less likely to occur in highly cohesive teams [26]. However, further research is required to determine the underlying nature of differences between nations.

Relative Influence of Variables

The ability to accurately predict team finishing position based on a set of explanatory variables would support coaches in making an evidence-based decision when selecting relay team swimmers and leg assignments, potentially weeks to months ahead of competition. Random forest models were used to make these predictions and the most influential variables were identified based on cross-validation, and the mean decrease in sensitivity and specificity as measured by MDG [22]. As might be expected, the strength of the team, as captured by rankings and individual season’s best times, was the leading contributor to finishing position (Results). However, team strategy, in terms of the order of swimmers, was the next most influential factor. The dataset used for modelling consisted primarily of high calibre, international events including Olympics and World Championships. Typically, these are the pinnacle events that athletes train and prepare for. We identified that medal outcomes were highly influenced by differential time (MDG of 31.3), which is based on the sum of individual swimmers’ season’s best times. This outcome suggests that individuals are performing at or near their best at these international relay competitions and, equivalently, that season’s best times are useful in predicting individual swimmers’ performance at pinnacle relay events.
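The two team-strength variables discussed here can be illustrated with a short sketch. It assumes that differential time is a team's summed season's best times minus the smallest such sum in the same race, and that team average ranking is the mean world ranking of the four swimmers. The function name, data layout and numbers are hypothetical and are not taken from the study's dataset.

```python
def team_strength_features(teams):
    """Compute differential time and average ranking for each team in a race.

    teams maps a team name to a list of (season_best_seconds, world_ranking)
    tuples, one per swimmer.
    """
    summed_times = {
        name: sum(season_best for season_best, _ in swimmers)
        for name, swimmers in teams.items()
    }
    fastest_sum = min(summed_times.values())  # strongest team on paper

    features = {}
    for name, swimmers in teams.items():
        features[name] = {
            # Seconds behind the fastest team, based on season's best times.
            "differential_time": summed_times[name] - fastest_sum,
            # Mean world ranking across the team's swimmers.
            "average_ranking": sum(rank for _, rank in swimmers) / len(swimmers),
        }
    return features


# Hypothetical race with two teams (times in seconds).
race = {
    "Team A": [(105.0, 1), (106.0, 5), (106.5, 9), (107.0, 17)],
    "Team B": [(106.0, 4), (106.5, 10), (107.0, 16), (107.5, 30)],
}

race_features = team_strength_features(race)
print(race_features)
```

Because both features rely only on season's best times and rankings, they can be computed before the competition, which is what makes them usable for selection decisions.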

Illustrative Case Study

To illustrate how the model could be used to support decision making, we demonstrate with a case study of predicting the finishing positions for the top four teams at the 2019 FINA World Championships. These data, which included world rankings and season best times coming into the competition, were not included in the original dataset. Although the gold medal predictions were correct, the model incorrectly predicted the bronze and 4th positions for females, and silver and bronze positions for males. Team average ranking for the two female teams was identical with a similar differential between the highest and lowest ranked swimmer. However, the fourth placed team had the best ranking swimmer overall, which may indicate that this team underperformed relative to their expected team performance time. This explanation may also serve as a reason for the incorrect model prediction here for both medal colour and medal or non-medal. Similarly, the USA Men’s team was predicted to finish in the silver medal position, but Russia outperformed them by just 0.17 s. However, the model was able to correctly predict a medal and non-medal position.

These models can also be used in a predictive decision support scenario where the impact of different swimmer orders on finishing position can be evaluated in a risk-informed, probabilistic manner. For the Canadian women’s team in the 2019 FINA World Championships, the order used in the race provided the highest chance of bronze and lowest chance of a non-medal finish. A 2xx1 order could have increased the chance of a silver medal by 9.4%, but also increased the chance of missing out on a medal by 12.5%. According to the model, China would have increased their chance of a bronze medal and slightly decreased their chance of a non-medal finish if they had applied a different swimmer order of 21xx. However, these scenarios only serve as illustrations, and should be seen as observations given limitations of the data, the numerous possible swimmer order and ranking combinations, and the many other factors influencing medal finishes that were not included in the model.

Limitations and Future Work

While these statistical approaches were successfully applied to enhance our understanding of the variables impacting both individual and team performance in relay swimming events, there are some limitations. First, only teams with available data for all four swimmers were analysed, resulting in partial data for some races, which is a potential source of misclassification errors. However, more data are increasingly becoming available, as demonstrated by the availability of each swimmer’s season’s best time and world ranking going into the 2019 World Championships. Potentially, data about individual swimmers’ physical status or performance characteristics (such as individual and relay block times [20]) could be used to extend this work and improve the predictivity of the model. Currently, fewer than 25% of the swimmers in this dataset had 3 or more relay races recorded and the majority only had 1; this limited our ability to model individual characteristics.

We also assumed that all relay teams had an equal chance of finishing in each position, whereas in reality some teams are more likely to be chasing medal positions than others. Incorporating such prior knowledge into the model, such as through a Bayesian formulation, could strengthen the prediction accuracy. Moreover, the individual analyses could be combined through a hierarchical model to enable ‘borrowing of strength’ between events to improve estimates and extend insights. Additionally, intermediary positions and whether the team is still in medal contention or not at the end of each leg can also potentially affect individual swimmer performance. Multivariate extensions of the proposed models could potentially be developed, with sufficient data, for this, more complex, task. Finally, hybrids of machine learning and statistical methods could be used in future research to build similar predictive models for other swimming relay events, such as mixed relay events [3].

9. Line 321-324. This is a repetition of Results. Please, re-write.

We have refocused this passage on the importance of team strength, followed by the order of swimmers (line 346):

As might be expected, the strength of the team, as captured by rankings and individual season’s best times, was the leading contributor to finishing position (Results). However, team strategy, in terms of the order of swimmers was the next most influential factor.

10. The team position at the moment that swimmer swims could influence (very probably) in his/her time. Please, include this as a limitation.

This point has been added to the discussion line 399:

Additionally, intermediary positions and whether the team is still in medal contention or not at the end of each leg can also potentially affect individual swimmer performance. Multivariate extensions of the proposed models could potentially be developed, with sufficient data, for this, more complex, prediction task.

Minor comments

11. Too much “Given…” Line 81, 84, 89… Please, re-write.

We have replaced the repetitions with “due to the…” and “with an…” (line 88 and line 94).

12. The Statistical Analysis Section is a bit hard to read. Please, consider to re-write it and make it more “readable”.

This section has been clarified and re-structured with subheadings and an opening text outlining the structure has been provided:

Statistical Analysis

Two main types of methods were used: (i) linear regression, to study individual swimmer’s relay performances, and (ii) random forests, to predict race outcomes given team configurations. This section describes the two methods, model fitting and model validation.

Linear Regression

Multiple linear regression [21] was used to estimate the relationships between an individual swimmer’s performance in a relay and the explanatory variables (Eq 2). An explanatory variable was deemed to have a significant effect if p≤0.05.

Random Forests

Random forests were used to predict team finishing positions based on explanatory variables as they are ideally suited for a mixture of numeric and categorical variables with potentially highly non-linear relationships. Random forests are an ensemble modelling extension of simple decision trees, which recursively partition the space of explanatory variables to minimise some dispersion criterion (i.e. measure of variability) in the resultant partitions [22]. Random forests have also demonstrated high predictive sensitivity and specificity for complex problems in many domains [22]. This method helps to overcome the overfitting problem encountered in decision trees by building many shallow trees using data subsets sampled through bagging. We built a random forest model, referred to as RF1, to predict gold, silver, bronze or non-medal finishing positions. To assist with better prediction of medal colour, we also trialled a model that only predicts medal colour, RF2. We developed a predictor variable based on the observation that team performance in a relay is the sum of the individual performance times of the four swimmers within the team. Therefore, based on the sum of the season’s best individual times, we constructed a theoretical performance measure of each team relative to the theoretical performance of the fastest team based on differential time (Diff.Time), defined as follows:

$$\mathrm{Diff.Time}_j = \sum_{i=1}^{4} s_{ij} - \min_{\forall j} \sum_{i=1}^{4} s_{ij} \tag{1}$$

where for team j and individual i, s_ij is the season’s best time for that swimmer.
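
As a minimal sketch of Eq (1), the differential time can be computed by summing each team's four season's best times and subtracting the smallest team total. The team names and times below are hypothetical and chosen only for illustration; the authors' own analysis was carried out in R, so this Python snippet is not their implementation.

```python
# Hypothetical season's best 200-m times (seconds) for four swimmers per team.
season_best = {
    "Team A": [105.2, 106.0, 106.8, 107.5],
    "Team B": [104.9, 106.4, 107.1, 108.3],
    "Team C": [106.1, 106.9, 107.7, 108.0],
}


def diff_times(times_by_team):
    """Return Diff.Time_j for every team j, following Eq (1)."""
    # Theoretical team time: sum of the four individual season's best times.
    totals = {team: sum(times) for team, times in times_by_team.items()}
    # Theoretical time of the fastest team in the race.
    fastest = min(totals.values())
    return {team: total - fastest for team, total in totals.items()}


result = diff_times(season_best)
for team, diff in sorted(result.items(), key=lambda kv: kv[1]):
    print(f"{team}: {diff:.2f} s behind the fastest theoretical team time")
```

By construction the theoretically fastest team has a differential time of zero, and every other team has a positive value.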

Model Fitting

All statistics were calculated using R software [23] and implemented with the base and randomForest packages to fit linear regression and random forest models, respectively. The parameters of the random forest were tuned by making use of a cross-validation based technique. Five-fold cross validation was run 100 times in conjunction with a grid search for selecting model parameters including the number of variables to sample at each split in the tree, and the number of variables sampled as candidates at each split in the tree. Given the randomly sampled nature of random forests, repeated evaluations provide a more robust selection for the tuning parameters [24].
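
The tuning procedure described above (repeated five-fold cross-validation combined with a grid search) can be sketched as follows. This is an illustration under stated assumptions, not the authors' R code: a tiny k-nearest-neighbour classifier stands in for the random forest so that the example needs only the Python standard library, the tuned parameter is the neighbour count rather than any randomForest setting, and the data are hypothetical.

```python
import random


def k_fold_indices(n, k, rng):
    """Shuffle indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[f::k] for f in range(k)]


def knn_predict(train, x, k):
    """Stand-in classifier (NOT a random forest): majority label of k nearest points."""
    neighbours = sorted(train, key=lambda row: abs(row[0] - x))[:k]
    labels = [label for _, label in neighbours]
    return max(sorted(set(labels)), key=labels.count)


def repeated_cv_score(data, param, k_folds=5, repeats=100, seed=0):
    """Mean held-out accuracy over `repeats` runs of k-fold cross-validation."""
    rng = random.Random(seed)
    scores = []
    for _ in range(repeats):
        for fold in k_fold_indices(len(data), k_folds, rng):
            held_out = set(fold)
            train = [row for i, row in enumerate(data) if i not in held_out]
            hits = [knn_predict(train, data[i][0], param) == data[i][1] for i in fold]
            scores.append(sum(hits) / len(hits))
    return sum(scores) / len(scores)


# Hypothetical (Diff.Time in seconds, outcome) observations.
data = [(0.0, "medal"), (0.4, "medal"), (1.1, "medal"), (1.8, "medal"),
        (2.5, "non-medal"), (3.0, "medal"), (4.2, "non-medal"),
        (5.5, "non-medal"), (6.1, "non-medal"), (7.3, "non-medal")]

grid = [1, 3, 5]  # candidate values of the stand-in tuning parameter
scores = {param: repeated_cv_score(data, param) for param in grid}
best = max(grid, key=scores.get)
print(scores, "selected:", best)
```

Averaging over many repetitions reduces the dependence of the selected parameter on any single random split of the data.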

Model Validation

For the linear regression model, goodness of fit is sufficient to give confidence that the model is reasonable, and the model can be interrogated to ascertain the impact of different explanatory variables on individual performance [21]. In comparison, the random forest was employed to predict race finishing position, so we validated model performance using leave-one-out cross-validation. In this scheme, we iterated over each data point, trained with all other data points and tested with the current data point.
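
The leave-one-out scheme can be sketched as a simple loop: each data point is held out once, the model is trained on the remainder, and the held-out point is predicted. In this sketch a one-nearest-neighbour rule is used as a placeholder for the random forest, and the observations are hypothetical; both are assumptions made only to keep the example self-contained.

```python
def fit_1nn(train):
    """Placeholder 'model' (NOT a random forest): memorise the training set."""
    return list(train)


def predict_1nn(model, x):
    """Predict the label of the closest training point."""
    nearest = min(model, key=lambda row: abs(row[0] - x))
    return nearest[1]


def leave_one_out(data, fit, predict):
    """Return (observed, predicted) label pairs, one per held-out point."""
    pairs = []
    for i, (x, y) in enumerate(data):
        train = data[:i] + data[i + 1:]  # all points except the current one
        model = fit(train)
        pairs.append((y, predict(model, x)))
    return pairs


# Hypothetical (Diff.Time in seconds, finishing category) observations.
data = [(0.0, "gold"), (0.3, "gold"), (1.5, "silver"), (1.9, "silver"),
        (6.0, "non-medal"), (7.2, "non-medal")]

pairs = leave_one_out(data, fit_1nn, predict_1nn)
accuracy = sum(obs == pred for obs, pred in pairs) / len(pairs)
print(f"Leave-one-out accuracy: {accuracy:.2f}")
```

The resulting observed and predicted pairs are the raw material for a confusion matrix of the kind described next.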

We used a 4x4 confusion matrix to show the number of times a recorded gold, silver, bronze or non-medal result (corresponding to rows) was classified by the model as a gold, silver, bronze or non-medal outcome (columns corresponded to predictions). We computed model sensitivity, also referred to as producer’s accuracy when there are more than two categories, which is the rate at which the model correctly classifies a result as a member of a certain category [24]. Note that there is no direct analogue for specificity when there are more than two categories. The randomForest package uses the Gini index as one approach to capture both sensitivity and specificity [22]. This index is useful for assessing both the validity of the model, and for quantifying the relative influence of explanatory variables based on the decrease in the Gini index when a variable is removed from the model.
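
A minimal sketch of the confusion matrix and per-category sensitivity (producer's accuracy) described above is given below. Rows are recorded results and columns are predictions, matching the layout in the text; the observed and predicted labels are hypothetical and do not reproduce any result from the study.

```python
CLASSES = ["gold", "silver", "bronze", "non-medal"]


def confusion_matrix(observed, predicted, classes=CLASSES):
    """Rows index the recorded result, columns index the model's prediction."""
    index = {c: k for k, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for obs, pred in zip(observed, predicted):
        matrix[index[obs]][index[pred]] += 1
    return matrix


def sensitivity(matrix, classes=CLASSES):
    """Per-class sensitivity: correct predictions divided by the row total."""
    result = {}
    for k, c in enumerate(classes):
        row_total = sum(matrix[k])
        # A class that was never observed has no defined sensitivity.
        result[c] = matrix[k][k] / row_total if row_total else None
    return result


# Hypothetical recorded and predicted outcomes for ten teams.
observed = ["gold", "gold", "silver", "silver", "bronze", "bronze",
            "non-medal", "non-medal", "non-medal", "non-medal"]
predicted = ["gold", "gold", "silver", "bronze", "silver", "bronze",
             "non-medal", "non-medal", "non-medal", "bronze"]

matrix = confusion_matrix(observed, predicted)
sens = sensitivity(matrix)
for name, row in zip(CLASSES, matrix):
    print(f"{name:>9}: {row}  sensitivity = {sens[name]}")
```

In this toy data, silver and bronze are confused with each other more often than gold or non-medal, which is the kind of pattern a per-class sensitivity makes visible.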

Finally, the utility of the random forest was demonstrated by applying it to a case study analysis of the 2019 World Championships.

13. Abbreviations are used to avoid repeating words… Mean Decrease in Gini (MDG) in line 243, 252, 321…

We have corrected the text to use MDG

________________________________________

6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Santiago Veiga

Reviewer #2: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

Attachment

Submitted filename: Response to Reviewers_v2.docx

Decision Letter 1

Dalton Müller Pessôa Filho

15 Jun 2021

PONE-D-21-03614R1

Predicting performance in 4 x 200-m freestyle swimming relay events

PLOS ONE

Dear Dr. Wu,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Jul 30 2021 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Dalton Müller Pessôa Filho, Ph.D.

Academic Editor

PLOS ONE

Journal Requirements:

Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.

Additional Editor Comments (if provided):

The reviewers congratulate the authors on the improvement of this manuscript after the first round of revision. However, Reviewer #2 has addressed two new comments to the authors. Therefore, the authors need to provide the responses to these other comments before the final decision.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed

Reviewer #2: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: (No Response)

Reviewer #2: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: (No Response)

Reviewer #2: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: (No Response)

Reviewer #2: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: (No Response)

Reviewer #2: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: Authors have adequately addressed the comments raised in a previous round of review. However, it was difficult for the present reviewer to find the authors’ responses within the text. Authors should indicate their responses by bullet points or “response”.

Reviewer #2: General comments

Thank you for accepting the suggestions. I would like to suggest two more things:

The objective should coincide with the Abstract (lines 33-36) and the Introduction (lines 104-107). To predict and to enhance are not the same.

Line 101-104. Please, clarify more why the study was done in 4 x 200-m swimming freestyle relay events

**********

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Santi Veiga

Reviewer #2: No

PLoS One. 2021 Jul 15;16(7):e0254538. doi: 10.1371/journal.pone.0254538.r004

Author response to Decision Letter 1


24 Jun 2021

Dear Prof. Filho

Thank you to you and the reviewers for your feedback. Please see below our responses and attached our formatted response to reviewers, where responses are highlighted in yellow and excerpts from the updated paper highlighted in gray. Please note that the line numbers in the response refer to Manuscript.docx.

Thanks

Authors

Comments to the Author

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: Authors have adequately addressed the comments raised in a previous round of review. However, it was difficult for the present reviewer to find the authors’ responses within the text. Authors should indicate their responses by bullet points or “response”.

Reviewer #2: General comments

Thank you for accepting the suggestions. I would like to suggest two more things:

The objective should coincide with the Abstract (lines 33-36) and the Introduction (lines 104-107). To predict and to enhance are not the same.

We have made the Introduction consistent with the Abstract (line 105):

Therefore, the aim of this study was to predict and better understand contextual factors contributing to relay team performance in light of individual swimmer performance, and develop predictive models to analyse the relationship between a team’s finishing position and these factors for the 4x200-m swimming freestyle relay.

Line 101-104. Please, clarify more why the study was done in 4 x 200-m swimming freestyle relay events

We have clarified as follows (line 101):

With each swimmer required to complete 4 x 50 m laps, the event requires well-developed speed-endurance, technical skills in starts, turns and finishes, and pacing, and it provides sufficient data to model pacing effects [9]. This complexity makes the 4 x 200-m freestyle ideal as a starting point for developing and testing predictive and analytical tools.

Attachment

Submitted filename: Response to Reviewers.docx

Decision Letter 2

Dalton Müller Pessôa Filho

29 Jun 2021

Predicting performance in 4 x 200-m freestyle swimming relay events

PONE-D-21-03614R2

Dear Dr. Wu,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Dalton Müller Pessôa Filho, Ph.D.

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Reviewer #2 has accepted the manuscript for publication in Plos One. Congratulations to the authors!

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #2: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #2: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #2: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #2: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #2: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #2: (No Response)

**********

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #2: No

Acceptance letter

Dalton Müller Pessôa Filho

2 Jul 2021

PONE-D-21-03614R2

Predicting performance in 4 x 200-m freestyle swimming relay events

Dear Dr. Wu:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Prof. Dr. Dalton Müller Pessôa Filho

Academic Editor

PLOS ONE

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 Fig. Residual and normality plot for the linear regression model reported in results.

    Note that the residuals are mostly normal and randomly distributed. Although there is a little deviation from normality in the right tail of the distribution (ignoring the outlier), there are relatively few data points here and these slow swim times are not as relevant in the context of predicting medalling performances.

    (DOCX)

    S1 File

    (CSV)

    Attachment

    Submitted filename: Response to Reviewers_v2.docx

    Attachment

    Submitted filename: Response to Reviewers.docx

    Data Availability Statement

    All relevant data are included within the manuscript and its supporting information files.


    Articles from PLoS ONE are provided here courtesy of PLOS
