Using machine learning to determine the nationalities of the fastest 100-mile ultra-marathoners and identify top racing events

Beat Knechtle; Katja Weiss; David Valero; Elias Villiger; Pantelis T Nikolaidis; Marilia Santos Andrade; Volker Scheer; Ivan Cuk; Robert Gajda; Mabliny Thuany

PLoS One. 2024 Aug 22;19(8):e0303960. doi: 10.1371/journal.pone.0303960



Editor: Stevo Popovic

PMCID: PMC11340887 PMID: 39172797

Abstract

The present study intended to determine the nationality of the fastest 100-mile ultra-marathoners and the country/events where the fastest 100-mile races are held. A machine learning model based on the XG Boost algorithm was built to predict the running speed from the athlete’s age (Age group), gender (Gender), country of origin (Athlete country) and where the race occurred (Event country). Model explainability tools were then used to investigate how each independent variable influenced the predicted running speed. A total of 172,110 race records from 65,392 unique runners from 68 different countries participating in races held in 44 different countries were used for analyses. The model rates Event country (0.53) as the most important predictor (based on data entropy reduction), followed by Athlete country (0.21), Age group (0.14), and Gender (0.13). In terms of participation, the United States leads by far, followed by Great Britain, Canada, South Africa, and Japan, in both athlete and event counts. The fastest 100-mile races are held in Romania, Israel, Switzerland, Finland, Russia, the Netherlands, France, Denmark, Czechia, and Taiwan. The fastest athletes come mostly from Eastern European countries (Lithuania, Latvia, Ukraine, Finland, Russia, Hungary, Slovakia) and also Israel. In contrast, the slowest athletes come from Asian countries like China, Thailand, Vietnam, Indonesia, Malaysia, and Brunei. The difference between male and female predictions is relatively small at about 0.25 km/h. The fastest age group is 25–29 years, but the average speeds of groups 20–24 and 30–34 years are close. Participation, however, peaks for the age group 40–44 years. The model predicts the event location (country of event) as the most important predictor for a fast 100-mile race time. The fastest race courses were located in Romania, Israel, Switzerland, Finland, Russia, the Netherlands, France, Denmark, Czechia, and Taiwan. Athletes and coaches can use these findings for their race preparation to find the most appropriate racecourse for a fast 100-mile race time.

Introduction

Ultra-endurance events lasting more than 6 hours include ultramarathon running races [1]. The popularity of these events has increased significantly in the last 25 years, particularly in ultra-marathon races, where an exponential increase in participation has been observed [2, 3]. The ultra-marathon race distance of 100 miles (161 km) is highly popular, especially in the United States [4]. The high popularity of the 100-mile race distance among ultra-marathoners has led to a high level of scientific interest among researchers [5–9]. The main topics of scientific interest have included fluid and electrolyte metabolism, the heart structure and function of the 100-mile ultra-marathoner, successful race performance, pacing, nutrition, anthropometry, age, mental toughness, sleep, muscle damage, skeletal and renal health, overuse injuries, metabolomics, and pain perception [5–7]. A large number of studies have been performed about the ‘Western States Endurance Run’ [8, 9], where the first paper appeared in 1987 [6].

The largest body of research has been performed on fluid and electrolyte metabolism [10], focusing on specific aspects such as exercise-associated hyponatremia [11, 12] and fluid metabolism [13, 14]. Considering the heart of the 100-mile ultra-marathoners, aspects such as cardiac adaptation [15], the heart rate variability [5], alterations in cardiac mechanics [16], and the right ventricle [17] were of scientific interest. Other aspects of 100-mile ultra-marathoners such as the age of the best performance [18], the sex difference in performance [19], age group performances [18], physiological aspects [20], nutrition [21], mental toughness [7], pacing [22], and the use of non-steroidal anti-inflammatory drugs [23, 24] have also been studied extensively.

Since most of the research on 100-mile ultra-marathoners has been performed in races held in the United States, most of the 100-mile runners studied so far originate from that country. However, a study conducted to identify the best countries competing at 100 miles, using a macro-to-micro analysis, showed that most of the athletes were from the American and European continents, although the fastest were from Africa [25]. An analysis of each continent showed that women from Sweden, Hungary and Russia presented the best performances in the top three, top 10 and top 100, while the fastest men were from Brazil, Russia and Lithuania [25]. However, we do not know (i) which nationality the fastest athletes in 100-mile ultra-marathon running have and (ii) where the fastest 100-mile race courses are located worldwide. In this context, we undertook this research to determine the country of origin of the fastest runners and the location of the fastest race courses. These insights would help athletes and coaches to better plan their race strategy to obtain a fast race time. Based on these recent findings, we hypothesized that most of the 100-mile runners would originate from Europe and America, that the fastest 100-mile runners would originate from Africa, and that the fastest race courses would be situated in Africa, Europe and/or America.

Methods

Ethical approval

This study was approved by the Institutional Review Board of Kanton St. Gallen, Switzerland, with a waiver of the requirement for informed consent of the participants as the study involved the analysis of publicly available data (EKSG 01/06/2010). The study was conducted following recognized ethical standards according to the Declaration of Helsinki adopted in 1964 and revised in 2013.

Data set and data preparation

The race data was obtained from DUV Ultra-Marathon Statistik (statistik.d-u-v.org/geteventlist.php) by the end of 2022. The data were accessed July 11, 2023, for research purposes. The raw 100-mile sample contained 172,394 race records, with the United States accounting for around 70% of the sample, while there were also numerous countries with just one or two race records. Each race record included the athlete’s name, age group, gender and country of origin, the race location and year, the race distance, and the athlete’s race time, from which the race speed was calculated. ISO3 codes were used for the country information. After discarding any incomplete or incorrect instances and filtering out countries with a very low number of records, a total of 172,110 race records from 65,392 unique runners from 68 different countries participating in races held in 44 different countries were used for analyses. To minimize the potential effect of outliers, a minimum of 10 race records was set per country to qualify for the analysis.
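The step from race time to race speed can be illustrated with a minimal sketch. This is not the authors' code: the function name `race_speed_kmh` and the conversion factor of 1.609344 km per mile are assumptions made for illustration, and the example time is the 12:50:26 h:min:s split reported for the Timisoara race in the Discussion.

```python
MILE_IN_KM = 1.609344  # assumed conversion factor (international mile)


def race_speed_kmh(race_time: str, distance_miles: float = 100.0) -> float:
    """Convert a race time given as 'h:mm:ss' into an average speed in km/h."""
    hours, minutes, seconds = (int(part) for part in race_time.split(":"))
    elapsed_hours = hours + minutes / 60 + seconds / 3600
    return distance_miles * MILE_IN_KM / elapsed_hours


speed = race_speed_kmh("12:50:26")
print(round(speed, 3))  # 12.533
```

Under these assumptions, the result of about 12.53 km/h agrees with the maximum speed listed for Romania in Table 2.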

Statistical analysis

First, two independent ranking tables were created, aggregating the race records by country of origin and by event country, and then sorting each list of countries by number of race records. To reduce noise and ensure that the results were statistically representative, race records from athlete countries with fewer than 15 records or fewer than five unique runners were removed, and race records from event countries with fewer than 10 records were removed. Descriptive statistical results for each country are summarized in the ranking tables, where the table index also serves as a key to the Partial Dependence Plots (PDP). We then built and evaluated a non-linear machine learning (ML) predictive regression model and examined the model logic with explainability tools. The algorithm used for building the model is the popular XG Boost. XG Boost (xgboost.readthedocs.io/en/stable/) belongs to the family of gradient-boosting tree-ensemble algorithms and is widely used to solve classification and regression problems in data science.
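The construction of a ranking table can be sketched as follows. This is a simplified illustration using only the Python standard library, not the authors' implementation: the function `rank_countries`, the record layout, and the toy data (including the placeholder country code `XXX`) are hypothetical, while the thresholds of 15 records and five unique runners are those stated for athlete countries.

```python
from collections import defaultdict


def rank_countries(records, key, min_records, min_runners=0):
    """Count records and unique runners per country, drop sparse countries,
    and rank the remaining ones by number of records (index 0 = most records)."""
    counts = defaultdict(int)
    runners = defaultdict(set)
    for rec in records:
        counts[rec[key]] += 1
        runners[rec[key]].add(rec["runner"])
    kept = [c for c in counts
            if counts[c] >= min_records and len(runners[c]) >= min_runners]
    kept.sort(key=lambda c: (-counts[c], c))  # ties broken alphabetically
    return {country: index for index, country in enumerate(kept)}


# Toy data: 20 USA records (10 runners), 16 GBR records (8 runners),
# and 3 records from a single runner of a placeholder country.
records = (
    [{"runner": f"us{i % 10}", "athlete_country": "USA"} for i in range(20)]
    + [{"runner": f"gb{i % 8}", "athlete_country": "GBR"} for i in range(16)]
    + [{"runner": "x1", "athlete_country": "XXX"} for _ in range(3)]
)
athlete_ids = rank_countries(records, "athlete_country",
                             min_records=15, min_runners=5)
print(athlete_ids)  # {'USA': 0, 'GBR': 1}
```

The same function could be reused for event countries with `min_records=10`; the returned index plays the role of the first column in the ranking tables.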

XG Boost regression model

The model was designed to use the following variables as predictors or inputs to the model: “Athlete_gender_ID”, “Age_group_ID”, “Athlete_country_ID”, and “Event_country_ID”. The predicted variable, or algorithm output, was the race (running) speed (km/h). Before the data could be fit in the model, the predictors had to be numerically encoded. The Athlete gender variable was encoded as female = 0 and male = 1. The Age group variable was already numerically encoded in 5-year age groups (except group 18, which represents runners of less than 20 years, and group 75, which represents 75 years and older). The Athlete country and Event country variables were encoded based on their position in the respective ranking tables. Fig 1 illustrates the setup, with the variables used as predictors or inputs and the race (running) speed prediction as the model output.
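The encoding described above can be illustrated with a short sketch. The helper names `age_to_group` and `encode_record` and the gender labels `"F"`/`"M"` are assumptions for illustration; in the source data the age groups were already encoded, so `age_to_group` only shows how the group labels relate to an age in years. The country IDs are taken from the first column of Tables 1 and 2.

```python
def age_to_group(age: int) -> int:
    """Map an age in years to its 5-year group label (e.g. 27 -> 25)."""
    if age < 20:
        return 18  # runners younger than 20 years
    if age >= 75:
        return 75  # runners aged 75 years and older
    return (age // 5) * 5


def encode_record(gender, age_group, athlete_country, event_country,
                  athlete_ids, event_ids):
    """Return [Athlete_gender_ID, Age_group_ID, Athlete_country_ID, Event_country_ID]."""
    gender_id = {"F": 0, "M": 1}[gender]  # female = 0, male = 1
    return [gender_id, age_group,
            athlete_ids[athlete_country], event_ids[event_country]]


# Excerpt of the ranking-table indices (Table 1 and Table 2)
athlete_ids = {"USA": 0, "GBR": 1, "LTU": 54}
event_ids = {"USA": 0, "GBR": 1, "ROU": 34}

features = encode_record("M", age_to_group(37), "LTU", "ROU",
                         athlete_ids, event_ids)
print(features)  # [1, 35, 54, 34]
```

Because the country IDs are rank positions rather than meaningful quantities, a tree-based model treats them only as split points, which is one reason the PDP charts need the ranking tables as a key.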

Model training and evaluation strategy

A hold-out evaluation strategy was used to train and evaluate the model, executing a simulation with different test splits and combinations of numbers of estimators and learning rates. Two evaluation metrics, MAE (Mean Absolute Error) and R², were calculated. In addition, the model’s relative feature importances, partial dependence plots (PDP) and prediction distribution plots were calculated and are displayed in the results section. In addition to the model interpretability analysis, a set of descriptive target plot charts shows the predictor values, group sizes, and the group’s average speed, helping to set expectations for the PDP and prediction charts.

After several iterations and tests, the optimal model parameters and accuracy scores were:

500 estimators (learners or trees)
Learning rate of 0.5
R² score of 0.23 (in-sample test)
MAE of 0.87 km/h
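The two evaluation metrics and the hold-out split can be sketched in plain Python. This is an illustration of the standard definitions only, not the authors' pipeline: it does not train an XG Boost model, and the function names, the 20% test fraction, the seed, and the toy speeds are assumptions. The scikit-learn library provides equivalent `mean_absolute_error` and `r2_score` functions in `sklearn.metrics`.

```python
import random


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between observed and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """Share of the variance in y_true that the predictions explain."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def hold_out_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of the rows and reserve a fraction of them for testing."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Toy speeds in km/h (hypothetical values)
y_true = [6.0, 7.0, 8.0]
y_pred = [6.5, 7.0, 7.5]
mae = mean_absolute_error(y_true, y_pred)  # about 0.333 km/h
r2 = r2_score(y_true, y_pred)
print(r2)  # 0.75

train_rows, test_rows = hold_out_split(range(10))
print(len(train_rows), len(test_rows))  # 8 2
```

Read this way, the reported R² of 0.23 means that the predictions account for roughly a quarter of the variance in race speed, while the MAE of 0.87 km/h is the typical absolute prediction error.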

Model interpretation

The ‘optimal’ model accuracy score of R² = 0.23 indicates an existing but moderately weak effect of the predicting variables on the model output. To assess how each predictor contributed to the model output, we computed the model’s relative feature importances, the PDP plots, and the model prediction distributions. The PDP plots show the relative amount of change in the model output for each value of a predicting variable with respect to a reference value (value 0). The prediction distribution plots use boxplots to show the distribution of the model predictions of average race speed. Descriptive statistical values are given in terms of frequencies (counts), mean, standard deviation (std), minimum values (min), and maximum values (max), and also with median values (in the box plots). All computation and analysis were done using a Jupyter Notebook (Google Colab) and Python and associated libraries (pandas, numpy, xgboost, pdpbox, sklearn, matplotlib, seaborn).
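The idea behind a partial dependence value relative to a reference can be sketched by hand. This is a generic illustration, not the pdpbox implementation used in the study: `partial_dependence` and `toy_predict` are hypothetical names, and `toy_predict` is a made-up stand-in for the trained model with arbitrary coefficients.

```python
def partial_dependence(predict, rows, feature, values, reference=0):
    """Average prediction when `feature` is forced to each value,
    reported relative to the average at the reference value."""
    def average_at(value):
        forced = [{**row, feature: value} for row in rows]
        return sum(predict(row) for row in forced) / len(forced)

    baseline = average_at(reference)
    return {value: average_at(value) - baseline for value in values}


def toy_predict(row):
    """Hypothetical stand-in for the trained model (speed in km/h)."""
    event_bonus = 1.0 if row["event_country"] == 34 else 0.0
    return 6.0 + 0.25 * row["gender"] + event_bonus


rows = [
    {"gender": 0, "event_country": 0},
    {"gender": 1, "event_country": 0},
    {"gender": 0, "event_country": 34},
    {"gender": 1, "event_country": 34},
]
pdp_gender = partial_dependence(toy_predict, rows, "gender", values=[0, 1])
print(pdp_gender)  # {0: 0.0, 1: 0.25}
```

Each value is obtained by overwriting one predictor in every row while leaving the others untouched, so the curve isolates the average effect of that predictor on the model output.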

Results

The qualifying sample used for analysis consists of 172,110 race records from 65,392 unique runners from 68 different countries participating in races held in 44 different countries. Table 1 presents the country rankings by number of race records and unique runners. The mean race speed is color-coded, with darker colors corresponding to higher values (faster running speed). The first column in the ranking tables is the index to interpret the PDP charts. The United States accounted for the highest participation in both athlete country and event country rankings, followed by Great Britain, Canada, South Africa, Japan, Germany, and Australia.

Table 1. Athlete country ranking table.

ID	Athlete country	Mean speed (km/h)	Std (km/h)	Min (km/h)	Max (km/h)	Race records	Unique runners
0	USA	6.053	1.147	1.176	15.409	117583	37833
1	GBR	6.428	1.479	2.222	13.977	11193	5501
2	CAN	6.211	1.127	2.572	16.974	5553	2080
3	RSA	7.314	1.349	2.410	13.469	5547	2657
4	JPN	5.349	1.348	2.443	12.428	5084	2913
5	GER	6.710	1.378	2.338	12.520	4439	2001
6	AUS	6.288	1.554	2.252	13.449	2894	1464
7	SWE	6.963	1.387	2.072	12.839	2094	895
8	CHN	4.381	1.155	2.715	12.087	1700	1491
9	ITA	6.467	1.891	2.700	12.363	964	568
10	DEN	7.296	1.325	3.379	11.851	958	493
11	FRA	6.462	1.946	2.403	12.013	953	644
12	PHI	4.989	0.972	3.025	9.984	919	384
13	NZL	6.082	1.527	3.382	12.389	896	526
14	GRE	5.121	1.344	3.696	13.665	808	425
15	POL	6.924	1.716	3.445	12.960	799	510
16	BEL	6.567	1.424	2.862	11.233	780	435
17	TPE	7.194	1.583	3.362	11.109	768	339
18	NOR	6.952	1.714	3.216	11.895	738	364
19	NED	6.807	1.361	3.338	11.777	728	369
20	IRL	6.913	1.518	2.735	11.667	606	325
21	MEX	6.242	1.487	3.136	12.600	558	239
22	HUN	7.746	1.563	1.277	12.745	503	289
23	RUS	7.717	1.636	3.597	14.034	424	294
24	CZE	7.136	1.850	2.440	12.905	376	217
25	FIN	7.697	1.355	4.014	11.666	346	169
26	SUI	6.584	1.532	2.797	11.819	331	160
27	THA	4.052	0.784	2.982	8.778	321	273
28	ESP	6.881	2.012	2.281	12.256	320	216
29	KOR	6.479	1.116	3.529	9.944	299	204
30	ARG	5.466	1.181	3.278	10.530	282	235
31	MAS	4.307	1.001	3.096	8.172	236	177
32	HKG	5.424	1.719	3.358	10.761	166	116
33	AUT	6.660	1.733	2.849	12.096	163	114
34	IND	6.207	1.330	3.741	11.178	156	101
35	SGP	5.307	1.047	3.186	9.297	149	104
36	BRA	6.496	1.902	3.428	11.654	110	65
37	CRO	5.882	2.067	3.355	10.937	103	88
38	COL	4.820	1.535	3.183	9.237	103	66
39	SLO	6.310	2.162	3.355	11.785	90	67
40	SVK	7.488	1.797	3.298	12.184	80	42
41	VIE	4.209	0.656	3.612	6.902	66	41
42	ISR	7.353	1.543	4.010	10.487	66	66
43	ROU	6.908	2.153	3.209	13.411	62	42
44	CHI	5.700	0.949	2.779	8.677	62	39
45	LAT	7.626	1.796	4.478	11.462	62	36
46	UKR	7.900	1.974	4.738	12.831	59	32
47	PAN	5.877	1.058	4.139	9.461	58	17
48	POR	6.726	1.722	3.501	11.723	46	28
49	SRB	6.183	2.036	3.650	11.825	44	29
50	PER	5.854	1.351	3.250	9.649	43	17
51	BUL	6.749	2.046	4.196	12.568	37	22
52	GUA	6.332	1.251	4.480	9.551	37	12
53	CRC	6.010	1.095	3.392	8.613	36	16
54	LTU	8.738	2.352	4.311	14.365	35	21
55	TUR	6.647	1.782	3.781	9.976	30	14
56	EST	7.266	1.635	3.713	10.205	28	17
57	BLR	6.988	1.133	4.504	9.166	27	15
58	VEN	5.894	1.378	3.214	7.706	25	11
59	PUR	6.563	1.086	4.721	8.854	23	9
60	IRI	5.765	0.899	3.691	7.062	21	5
61	INA	4.492	0.592	3.473	5.539	20	12
62	BRU	3.978	0.679	3.203	5.437	19	12
63	CYP	5.156	0.793	3.860	6.399	18	11
64	ISL	5.664	0.981	4.180	8.039	18	9
65	BIH	6.621	1.754	4.276	9.889	16	9
66	ECU	6.305	1.573	3.416	9.223	16	9
67	ZIM	7.121	2.491	4.417	11.883	16	8


Std (standard deviation); min (minimum value); max (maximum value)

Event country ranking

The country of event ranking table, with 44 countries, is shown in Table 2. Most runners competed in races held in the United States, Great Britain, South Africa, Japan, and Germany.

Table 2. Event country ranking table (event countries sorted by number of race records).

ID	Event country	Mean speed (km/h)	Std (km/h)	Min (km/h)	Max (km/h)	Race records	Unique runners
0	USA	6.061	1.152	1.176	16.974	124323	40565
1	GBR	6.449	1.439	2.656	14.307	11343	6002
2	RSA	7.314	1.341	2.410	13.469	5581	2717
3	JPN	5.126	1.187	2.443	10.922	4397	2878
4	GER	6.813	1.251	2.836	12.600	4033	2239
5	CAN	6.165	1.229	2.222	13.421	2981	1725
6	AUS	6.281	1.544	2.252	13.449	2616	1482
7	SWE	6.916	1.327	2.072	12.839	2075	978
8	CHN	4.300	1.064	2.715	12.087	1638	1487
9	NZL	5.795	1.478	3.140	12.776	1077	808
10	TPE	7.569	1.634	3.362	12.960	903	456
11	GRE	4.867	0.738	3.868	9.180	851	473
12	ITA	7.067	2.467	2.718	14.365	782	626
13	BEL	6.236	1.258	3.734	10.462	779	519
14	PHI	4.697	0.712	3.025	8.425	772	401
15	DEN	7.498	1.263	4.910	11.947	767	471
16	POL	7.328	1.764	3.468	13.029	744	590
17	NOR	6.832	1.707	3.216	11.868	692	411
18	NED	7.798	1.020	5.060	12.100	653	408
19	FRA	7.385	2.357	3.338	13.549	651	605
20	HUN	7.087	1.875	1.277	12.219	441	289
21	THA	4.189	0.934	2.982	8.778	403	351
22	IRL	7.247	1.534	4.605	11.667	339	212
23	SUI	8.190	1.546	3.586	12.905	330	267
24	RUS	7.787	1.137	5.693	13.318	321	250
25	CZE	7.385	1.475	3.298	11.558	302	226
26	FIN	8.111	1.059	6.480	11.666	287	210
27	CRO	4.876	1.016	3.355	8.116	272	256
28	ARG	5.114	0.783	4.063	8.281	253	240
29	KOR	6.483	0.855	4.623	9.944	233	185
30	MAS	4.193	0.980	3.096	7.645	195	180
31	ESP	7.344	2.004	3.953	13.411	184	148
32	SGP	5.685	0.685	4.816	7.842	130	102
33	MEX	5.635	1.067	4.031	9.070	117	83
34	ROU	9.299	1.157	6.938	12.533	116	116
35	COL	4.042	0.743	3.183	6.350	96	77
36	AUT	5.664	2.285	3.454	11.333	92	92
37	VIE	4.175	0.641	3.612	6.902	75	75
38	LUX	5.210	0.659	4.472	7.520	70	60
39	IND	5.978	0.895	4.189	8.587	70	70
40	HKG	4.410	0.762	3.768	7.421	52	52
41	CHI	5.226	1.182	3.416	8.013	44	41
42	SRB	6.768	2.075	3.716	9.808	20	17
43	ISR	8.523	0.929	6.870	10.287	10	10


Std (standard deviation); min (minimum value); max (maximum value)

Model features relative importances

The ’optimal’ model can only explain 23% of the race speed variability through the four predictors at best, indicating that additional predicting variables should be added to the model in order to improve its accuracy. The model (Fig 2) rates Event country (0.49) as the most important predictor (based on data entropy reduction), followed by Athlete country (0.24), Age group (0.15), and Gender (0.13).

Partial dependence plots (PDP)

The PDP plot shows the following: Model outputs are around 0.26 km/h higher for males than for females (Fig 3). The highest model outputs are given to runners in age groups 25–29 years and 30–34 years (Fig 4). Athlete country ID 54 (Lithuania) shows a distinct peak, matching the highest mean speed in the ranking table (Fig 5). Event country IDs 16, 19, 23 and 34 obtain the highest peaks in the corresponding PDP chart, although only 23 (Switzerland) and 34 (Romania) are among the fastest in the ranking table (Fig 6).

Prediction distributions and target plots

The target plots represent a descriptive visualization of the 100-mile race dataset by predictor and show the groups’ sizes and average speeds. The prediction plots show the distribution of the XG Boost model output (the predicted race speed) by predictor value through a set of boxplots. The difference between male and female predictions is relatively small at about 0.23 km/h (Fig 7). The fastest age group is 25–29 years, but the average speeds of groups 20–24 years and 30–34 years stay close (Fig 8). Participation, however, peaks in the age group 40–44 years. In the country-based charts, the model predictions loosely follow the average speed curve. In terms of participation, the United States leads by far, followed by Great Britain, Canada, South Africa, and Japan, in both athlete and event countries. The fastest athletes come mostly from Eastern European countries (Lithuania, Latvia, Ukraine, Finland, Russia, Hungary, Slovakia) and also Israel, while the slowest athletes come from Asian countries like China, Thailand, Vietnam, or Malaysia (Fig 9). The fastest 100-mile races are held in Romania, Israel, Switzerland, Finland, Russia, the Netherlands, France, Denmark, Czechia, and Taiwan (Fig 10).

Discussion

The present study aimed to determine the country of origin of the fastest 100-mile runners and the countries hosting the fastest 100-mile race courses using an XG Boost regression model. We found that the event location (i.e. the country where the race is held) was the most important predictor for a fast 100-mile race time, with the fastest race courses offered in Romania, Israel, Switzerland, Finland, Russia, the Netherlands, France, Denmark, Czechia, and Taiwan. Regarding the first aim, the fastest athletes come mostly from Eastern European countries (i.e., Lithuania, Latvia, Ukraine, Finland, Russia, Hungary, and Slovakia).

The fastest race courses

The first important finding was that the country of the event was the most important feature concerning the XG Boost model’s predictive power. The countries with the fastest 100-mile events were Romania, Israel, Switzerland, Finland, Russia, the Netherlands, France, Denmark, Czechia, and Taiwan. Therefore, we could confirm our hypothesis only for Europe, not for Africa and/or America. Common to these races or race courses was the fact that they were road-based flat courses on small loops. In some instances, the races recorded the 100-mile split times in a longer race, such as a 24-hour race. In other instances, the races were held as indoor races. In a few instances, the races were held as Championships, such as European or World Championships. In more detail, in Romania, the ‘IAU 24 h European Championship’ was held in Timisoara in 2018, where the 100-mile split times were taken. The race is a road-based ultra-marathon held on a 1,236 m long asphalt loop (http://s24h.ro/). Importantly, Aleksandr Sorokin from Lithuania passed the 100 miles in 12:50:26 h:min:s. In Israel, the ‘Spartanion 100 Miles Race’ has been held since 2020 in Ganei Yehoshua Park, Tel Aviv, on a 1,459 m long circular, fast and clean course (https://spartanion.com/). In Switzerland, the ‘24 heures de Lausanne’ recorded a 100-mile split time of 12:28:16 h:min:s in 1981. Furthermore, the ‘24-Stundenlauf Aare-Insel Brugg’ (www.24stundenlauf.ch) and the ‘Self-Transcendence 24h Lauf Basel’ (https://ch.srichinmoyraces.org/self-transcendence-1224-stunden-lauf-basel) recorded 100-mile split times. In addition, in 1993, the 2nd ‘IAU 24h EC Basel’ was held with 100-mile split times. In Finland, the ‘Endurance 24 h Ultrarun Espoo’ has been held since 2010 and the 100-mile split times were taken. The course is a 390.04 m mondo-surfaced indoor track at Esport Ratiopharm Arena in Tapiola Sports Center, Espoo (https://endurance.fi/e24).
Different 100-mile races have been held in Russia, such as the ‘24h ‘Sutki Begom’ Moskau’ with a 100-mile split, apart from trail races (Vottovaara Mountain Race and Elton Ultra-Trail). In the Netherlands, the first 100-mile race was held in 1983, with the ‘Sint Oedenrode Wandelevenement’ held as a walking event. Later, the ‘24 uurs Apeldoorn’ recorded a 100-mile split time from 1989 to 1997. Furthermore, the 24-hour races ‘24uur van Steenbergen’ and the ‘LangsteNachtLoop 24 uurs’ recorded 100-mile split times. In France, 100-mile split times were recorded in ‘48 Heures Pedestre a Montauban’, where Yiannis Kouros passed the 100 miles in 1985 in 11:52:40 h:min:s. In 2019, a 100-mile split was recorded in ‘IAU 24h WC, 24 heures d’Albi’ where Aleksandr Sorokin passed in 13:12:39 h:min:s. The first 100-mile race in Denmark started in 2007 with the ‘Mors 100 miles’ as a flat road race. In 2009, the ‘100 Miles - Around the isle of Mors’ also started as a flat road race. In Czechia, a 100-mile split time was recorded in the ‘Brno Spring 48 Hour Indoor’ as an indoor run. Later, split times were recorded in the ‘Self-Transcendence Race 24h Kladno’ and the ‘Běh na 24 hodin Pilsen’ as an indoor run. The finding that the country of the race is the most important predictor of performance might be attributed to topographic characteristics, environmental conditions, and runners’ preference for specific races to achieve optimal performance. Concerning topographic characteristics, most countries with the fastest races share the common feature of flat terrain. Concerning environmental conditions, most of these countries have a continental climate favoring the achievement of fast race times. It is also well known that the training process follows the principle of periodization [26, 27], according to which the training is divided into specific phases where the characteristics of exercise (e.g., intensity, volume, recovery and mode) are manipulated so that performance peaks at a certain time.
In this context, runners are assumed to participate in a race that fits within their training plan. In addition, a specific race may be selected for its reputation (a race can be considered more important than another), especially when it is already known that other high-level runners intend to participate. This leads to a sequence of reciprocal cause and effect in which fast runners choose fast races to compete in and, in turn, the participation of fast runners ensures that a fast race remains fast.

The fastest runners

In contrast to a recent study reporting that the fastest 100-mile ultra-marathoners were women from Sweden, Hungary and Russia and men from Brazil, Russia and Lithuania [25], we found that runners from Lithuania, Latvia, Ukraine, Finland, Russia, Hungary, and Slovakia obtained the fastest running speeds. In the first instance, we found that the 35 race records of runners from Lithuania were among the fastest. Although it might be possible that one or a few runners from the same country could bias the result, the best Lithuanian ultra-marathoner, Aleksandr ‘Sania’ Sorokin, has finished only four 100-mile races, albeit including the world records for 100 miles on the track in 2021 in the ‘Centurion Running Track 100 Mile’ in the United Kingdom and for 100 miles on the road in the ‘Spartanion Race’ in 2022 in Israel (www.irunfar.com/aleksandr-sorokin-150-kilometer-100-mile-and-12-hour-world-record-holder-interview). Therefore, 31 race records must be from other fast Lithuanian ultra-marathoners. It should be highlighted that the fastest runners in the present study originated from countries that shared geographical, cultural, and socio-economic characteristics. Furthermore, a recent review reported a dominance of Russian athletes in ultra-marathon running and suggested as potential explanations a possible misuse of performance-enhancing substances as well as historical, climate-geographical, and psychophysiological (e.g., a combination of genetic and social) factors [28]. Although most 100-mile runners were from the United States, US runners are not among the fastest. In the US, plenty of 100-mile races are held, and most are trail runs (https://runningintheusa.com/classic/list/map/100m). One of the most traditional 100-mile races is the ‘Western States 100 Mile Endurance Run’ held since 1976 (www.ws100.com/). Another 100-mile race with a long tradition is the ‘Old Dominion 100 Mile Endurance Run’, which started in 1979 (www.olddominionrun.org/).
The greater participation of US runners may lower their average speed, which does not necessarily mean that the best US runners are slower than those of other nationalities. The relatively large number of US-American finishers at this race distance indicates that these runners could be more ‘recreational’ than those from other countries (who, in turn, could be considered more ‘selective’), which might partially explain why they were not among the fastest nationalities.

The age of peak performance

We also found that athletes in the age group 25 (25–29 years) were the fastest in the 100-mile race distance. This age is considerably lower than that found in a study of 35,956 finishes (6,862 women and 29,094 men) in 100-mile ultramarathons between 1998 and 2011, in which the annual top ten fastest runners had an average age of ~39 and ~37 years for women and men, respectively [29]. The difference to the present results might arise because the present study considered all athletes, whereas the existing study was restricted to the annual ten fastest. Furthermore, the relatively young age of the fastest finishers in our study might be explained in terms of ‘selectiveness’ variation by age group. The number of finishers in the 25–29 age group is three to four times smaller than that in the age groups 35–39, 40–44 and 45–49, suggesting that the athletes in the former might be considered more ‘selected’ compared to the more ‘recreational’ athletes of the latter groups. In any case, these are interesting findings, indicating that young runners, when well-trained, can perform well in ultramarathon events.

Limitations

Although this study uses a very large data set and highly sophisticated analyses, we must acknowledge some limitations. We found that the fastest running speeds were obtained by runners from Lithuania, Latvia, Ukraine, Finland, Russia, Hungary, and Slovakia, some of which are countries with low numbers of runners. Since we did not account for repeated measures, one or two outstanding athletes from these countries could be responsible for the country’s performance. However, as described for Aleksandr ‘Sania’ Sorokin, a single athlete cannot account for all the best race results of one country. Aspects such as training, previous experience [30], motivation [31], drafting [32], pre-race nutrition [33], and environmental conditions [34] could not be considered. We must also be aware that these race courses might not all have been exactly measured, so some very fast race courses might not have the full length of 100 miles (161 km). Another limitation is associated with the available information. With only four predictors, the model could only be very general. More realistic models could be built by collecting additional runner-specific data and combining it with the available data.

Conclusion

In summary, the event location (i.e. the country where the race is held) is the most important predictor for a fast 100-mile race time, according to our XG Boost regression model. The fastest race courses were located in Romania, Israel, Switzerland, Finland, Russia, the Netherlands, France, Denmark, Czechia, and Taiwan. Common to these races or race courses is the fact that they are held on a road-based flat course on small loops. In some instances, the races took the 100-mile split times in a longer race, such as a 24-hour race or longer. In other instances, the races were held as indoor races. In a few instances, the races were held as European or World Championships. Athletes and coaches can use these findings for their race preparation to find the most appropriate race course for a fast 100-mile race time. For example, running a 24-hour race (often flat and circular), thus combining two "races" in one, might be a better way to try to break a 100-mile personal best time than running a challenging 100-mile race.

Supporting information

S1 Data (XLSX): pone.0303960.s001.xlsx (3.7 MB)

S2 Data (XLSX): pone.0303960.s002.xlsx (4.5 MB)

Data Availability

For this study, we have included official results and split times from the DUV Ultra-Marathon Statistik (statistik.d-u-v.org/geteventlist.php). The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.

Funding Statement

The author(s) received no specific funding for this work.

PLoS One. doi: 10.1371/journal.pone.0303960.r001

Decision Letter 0

Stevo Popovic

12 Feb 2024

PONE-D-23-43701

Using Machine Learning to Determine the Nationalities of the Fastest 100-Mile Ultra-Marathoners and Identify Top Racing Events

PLOS ONE

Dear Dr. Knechtle,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Mar 28 2024 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.
A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.
An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Stevo Popovic, Ph.D.

Academic Editor

PLOS ONE

Journal requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. Please note that PLOS ONE has specific guidelines on code sharing for submissions in which author-generated code underpins the findings in the manuscript. In these cases, all author-generated code must be made available without restrictions upon publication of the work. Please review our guidelines at https://journals.plos.org/plosone/s/materials-and-software-sharing#loc-sharing-code and ensure that your code is shared in a way that follows best practice and facilitates reproducibility and reuse.

3. In the online submission form you indicate that your data is not available for proprietary reasons and have provided a contact point for accessing this data. Please note that your current contact point is a co-author on this manuscript. According to our Data Policy, the contact point must not be an author on the manuscript and must be an institutional contact, ideally not an individual. Please revise your data statement to a non-author institutional point of contact, such as a data access or ethics committee, and send this to us via return email. Please also include contact information for the third-party organization, and please include the full citation of where the data can be found.

4. Your ethics statement should only appear in the Methods section of your manuscript. If your ethics statement is written in any section besides the Methods, please delete it from any other section.

5. Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.

Additional Editor Comments:

I am pleased to inform you that I have collected two reviews and we are ready to go ahead with the evaluation process. It is now your turn to read the reviews and carefully revise the manuscript according to the reviewers' requirements. I would appreciate it if you would respond to the comments of both reviewers with arguments and adequate justifications.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Partly

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: N/A

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: Manuscript Number: PONE-D-23-43701

Manuscript Title: Using Machine Learning to Determine the Nationalities of the Fastest 100-Mile Ultra-Marathoners and Identify Top Racing Events

The manuscript is interesting and well written with several strengths.

I just have a few minor comments to make.

- Overall, the different sections of the manuscript are well written.

- Line 50: delete “(ML)”.

- Line 60: Please, change “offered” to “occurred”.

- Lines 79 – 80: Missed reference.

- Lines 103 – 105: Change “In this study, therefore, we sought to determine the country of origin of the fastest runners and the location of the fastest race courses.” to “In this context, we undertook this research to determine the country of origin of the fastest runners and the location of the fastest race courses.”

- Line 142: Change “ML” to “machine learning (ML)”

- Line 165: What is “MAE”

- Line 257 - 269: The first paragraph of the discussion is too large, I would suggest creating a small paragraph.

- Line 258 – 260: Replace “Based on a recent study, we assumed that the fastest athletes would be found to originate from Sweden, Hungary, Russia, Brazil, or Lithuania” to “Based on a recent study, we assumed that the fastest athletes would come from Sweden, Hungary, Russia, Brazil, or Lithuania.”

- Line 374: delete “had”.

- Line 376: delete “Better”.

- Line 384: Change “offered” to “occurred”.

- The limitations are properly discussed.

- Figures 3, 4, 5, and 6: change “PDP” to “partial dependence plots (PDP)”.

Reviewer #2: The work represents an interesting attempt to apply the XGBoost algorithm to determine the country of origin of the fastest runners and to identify the fastest races. The input characteristics under consideration are: athletes nationality, age group, gender, as well as the event country, while the output is the estimation of speed. The mentioned application has several limitations, the most important of which is the selection of input characteristics, i.e., the ability to obtain a good prediction from them. Another limitation is the dataset, which contains data from different types of races (championships, recreational races), undoubtedly resulting in a biased estimator. Additionally, for certain countries, there is a very small number of samples, and samples are with a high standard deviation. All of this does not lead to a high-quality estimator. However, my recommendation is for the work to be published, considering that the authors are aware of all the shortcomings of the estimator they have made, they have adequately addressed them in the paper and addressed potential risks, so the work can be a good guide for future research in this area.

I suggest incorporating an analysis of the median as it often provides a more robust measure than the mean, particularly when dealing with datasets containing significant deviations in extreme values.

**********

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Nejmeddine Ouerghi

Reviewer #2: No

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2024 Aug 22;19(8):e0303960. doi: 10.1371/journal.pone.0303960.r002

Author response to Decision Letter 0

31 Mar 2024

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

Reviewer #1: Yes

Reviewer #2: Partly

________________________________________

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: N/A

________________________________________

3. Have the authors made all data underlying the findings in their manuscript fully available?

Reviewer #1: Yes

Reviewer #2: Yes

________________________________________

4. Is the manuscript presented in an intelligible fashion and written in standard English?

Reviewer #1: Yes

Reviewer #2: Yes

________________________________________

5. Review Comments to the Author

Reviewer #1: Manuscript Number: PONE-D-23-43701

Manuscript Title: Using Machine Learning to Determine the Nationalities of the Fastest 100-Mile Ultra-Marathoners and Identify Top Racing Events

The manuscript is interesting and well written with several strengths.

I just have a few minor comments to make.

- Overall, the different sections of the manuscript are well written.

- Line 50: delete “(ML)”.

Answer: We agree with the expert reviewer and deleted as suggested.

- Line 60: Please, change “offered” to “occurred”.

Answer: We agree with the expert reviewer and changed as suggested

- Lines 79 – 80: Missed reference.

Answer: We agree with the expert reviewer and added as suggested

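- Lines 103 – 105: Change “In this study, therefore, we sought to determine the country of origin of the fastest runners and the location of the fastest race courses.” to “In this context, we undertook this research to determine the country of origin of the fastest runners and the location of the fastest race courses.”

Answer: We agree with the expert reviewer and changed as suggested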

- Line 142: Change “ML” to “machine learning (ML)”

Answer: We agree with the expert reviewer and changed as suggested

- Line 165: What is “MAE”

Answer: We agree with the expert reviewer and changed to MAE (Mean Absolute Error)

- Line 257 - 269: The first paragraph of the discussion is too large, I would suggest creating a small paragraph.

Answer: We agree with the expert reviewer and changed as suggested to: The present study aimed to determine the country of origin of the fastest 100-mile runners and the countries hosting the fastest 100-mile race courses using an XG Boost regression model. We found that the event location (i.e. the country where the race is held) was the most important predictor for a fast 100-mile race time where the fastest race courses are offered in Romania, Israel, Switzerland, Finland, Russia, the Netherlands, France, Denmark, Czechia, and Taiwan. Regarding the first aim, the fastest athletes come mostly from Eastern European countries (i.e., Lithuania, Latvia, Ukraine, Finland, Russia, Hungary, and Slovakia).

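- Line 258 – 260: Replace “Based on a recent study, we assumed that the fastest athletes would be found to originate from Sweden, Hungary, Russia, Brazil, or Lithuania” to “Based on a recent study, we assumed that the fastest athletes would come from Sweden, Hungary, Russia, Brazil, or Lithuania.”

Answer: Since we had to shorten that section, this sentence was deleted.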

- Line 374: delete “had”.

Answer: We agree with the expert reviewer and deleted as suggested.

- Line 376: delete “Better”.

Answer: We agree with the expert reviewer and deleted as suggested.

- Line 384: Change “offered” to “occurred”.

Answer: We agree with the expert reviewer and changed as suggested.

- The limitations are properly discussed.

Answer: no changes are required

- Figures 3, 4, 5, and 6: change “PDP” to “partial dependence plots (PDP)”.

Answer: We agree with the expert reviewer and changed as suggested.

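Reviewer #2: The work represents an interesting attempt to apply the XGBoost algorithm to determine the country of origin of the fastest runners and to identify the fastest races. The input characteristics under consideration are: athletes nationality, age group, gender, as well as the event country, while the output is the estimation of speed. The mentioned application has several limitations, the most important of which is the selection of input characteristics, i.e., the ability to obtain a good prediction from them. Another limitation is the dataset, which contains data from different types of races (championships, recreational races), undoubtedly resulting in a biased estimator. Additionally, for certain countries, there is a very small number of samples, and samples are with a high standard deviation. All of this does not lead to a high-quality estimator. However, my recommendation is for the work to be published, considering that the authors are aware of all the shortcomings of the estimator they have made, they have adequately addressed them in the paper and addressed potential risks, so the work can be a good guide for future research in this area.

I suggest incorporating an analysis of the median as it often provides a more robust measure than the mean, particularly when dealing with datasets containing significant deviations in extreme values.

Answer: We recognize the limitations of this analysis, but we could only use the data available. As a predictive model, there is no question about our XGB model's ability (R² = 0.23, tested in-sample). The main strength of our study therefore lies in using model interpretability tools (reverse engineering), such as the PDP charts and the prediction distribution charts, to understand the 23% of explained output variability. Interestingly, but not surprisingly, the XGB model mostly learnt the descriptive statistical structure of the dataset.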
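The two evaluation metrics named in this response, MAE and R², can be illustrated with a minimal sketch. The speeds below are hypothetical values chosen for easy arithmetic, not study data, and the function names are illustrative rather than taken from the authors' code.

```python
from statistics import mean

def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between observed and predicted values."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))

def r_squared(y_true, y_pred):
    """Share of the variance in y_true that the predictions account for."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # unexplained part
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)              # total variability
    return 1.0 - ss_res / ss_tot

# Hypothetical running speeds in km/h (illustrative only, not study data).
observed = [6.0, 7.0, 8.0, 9.0]
predicted = [7.0, 7.0, 8.0, 8.0]

mae = mean_absolute_error(observed, predicted)
r2 = r_squared(observed, predicted)
print(f"MAE = {mae:.2f} km/h, R2 = {r2:.2f}")
```

Read this way, an R² of 0.23 means that the predictions account for 23% of the variability in the observed speeds, which matches the wording used in the answer above.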

On the comment about the median, I would like to emphasize that the charts indeed contain a mix of mean and median values. I'll explain this in detail here so that there are no doubts.

The country ranking tables (sorted by number of race records) are created simply by aggregating race records and then calculating the different stats. So they show the basic descriptive statistics (mean, std, min and max and the counts).
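The aggregation described above can be sketched as follows. This is a minimal illustration under stated assumptions: the records, the country codes, and the `country_table` helper are hypothetical, and the use of the sample standard deviation is an assumption, since the response does not say which variant was used.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical race records: (athlete country, running speed in km/h).
records = [
    ("USA", 6.0), ("USA", 7.0), ("USA", 8.0),
    ("GBR", 7.0), ("GBR", 9.0),
]

def country_table(rows):
    """Group race records by country and compute basic descriptive statistics."""
    speeds_by_country = defaultdict(list)
    for country, speed in rows:
        speeds_by_country[country].append(speed)

    table = []
    for country, speeds in speeds_by_country.items():
        table.append({
            "country": country,
            "count": len(speeds),
            "mean": mean(speeds),
            # Sample standard deviation (assumption); undefined for a single record.
            "std": stdev(speeds) if len(speeds) > 1 else 0.0,
            "min": min(speeds),
            "max": max(speeds),
        })
    # Sort by number of race records, largest first.
    table.sort(key=lambda row: row["count"], reverse=True)
    return table

ranking = country_table(records)
for row in ranking:
    print(row)
```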

The PDP charts are produced after the model has been trained and represent the relative value of the model output for a predictor. The curve highlights the mean values.
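For readers unfamiliar with partial dependence, the standard calculation can be sketched as below. This is not the authors' code: `toy_model` is a hypothetical stand-in for a trained model, the effect sizes and rows are invented, and we assume the usual definition in which each point on the curve is the average prediction after one predictor is fixed to a given value in every row.

```python
from statistics import mean

# Hypothetical stand-in for a trained model: predicts speed (km/h) from a row.
EVENT_EFFECT = {"A": 1.0, "B": 0.0}
GENDER_EFFECT = {"M": 0.25, "F": 0.0}

def toy_model(row):
    return 6.0 + EVENT_EFFECT[row["event_country"]] + GENDER_EFFECT[row["gender"]]

def partial_dependence(model, rows, feature, values):
    """For each candidate value, overwrite `feature` in every row and average the predictions."""
    curve = {}
    for value in values:
        modified = [{**row, feature: value} for row in rows]
        curve[value] = mean(model(r) for r in modified)
    return curve

# Hypothetical dataset with two predictors.
rows = [
    {"event_country": "A", "gender": "M"},
    {"event_country": "B", "gender": "F"},
    {"event_country": "B", "gender": "M"},
    {"event_country": "B", "gender": "F"},
]

pdp_event = partial_dependence(toy_model, rows, "event_country", ["A", "B"])
print(pdp_event)
```

Because every other predictor keeps its observed values, the gap between two points on the curve isolates the average effect the model attributes to the predictor being varied.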

Finally, the so-called “model interpretability charts” show group average values (means) in the middle chart (red curve) and median values in the top chart (boxplot with blue labels). In practice, the two are nearly the same.
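A small sketch shows when group means and medians agree and when they diverge, which is the concern behind the reviewer's suggestion. The speed values are hypothetical and chosen only to make the contrast visible.

```python
from statistics import mean, median

# Hypothetical speeds in km/h for two groups (illustrative only).
symmetric = [6.0, 6.5, 7.0, 7.5, 8.0]       # roughly symmetric distribution
with_outlier = [6.0, 6.5, 7.0, 7.5, 13.0]   # one extreme value

for label, speeds in (("symmetric", symmetric), ("with outlier", with_outlier)):
    print(f"{label}: mean = {mean(speeds):.2f}, median = {median(speeds):.2f}")
```

When the two statistics are close, as the authors report for their charts, extreme values are not dominating the group averages.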

Attachment

Submitted filename: PONE-D-23-43701_Revision 1_Comments for Reviewers.docx

pone.0303960.s003.docx (588KB, docx)

PLoS One. doi: 10.1371/journal.pone.0303960.r003

Decision Letter 1

Stevo Popovic

6 May 2024

Using Machine Learning to Determine the Nationalities of the Fastest 100-Mile Ultra-Marathoners and Identify Top Racing Events

PONE-D-23-43701R1

Dear Dr. Knechtle,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice will be generated when your article is formally accepted. Please note, if your institution has a publishing partnership with PLOS and your article meets the relevant criteria, all or part of your publication costs will be covered. Please make sure your user information is up-to-date by logging into Editorial Manager® and clicking the ‘Update My Information' link at the top of the page. If you have any questions relating to publication charges, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Stevo Popovic, Ph.D.

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed

Reviewer #2: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: (No Response)

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

Reviewer #1: (No Response)

Reviewer #2: (No Response)

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

Reviewer #1: Yes

Reviewer #2: (No Response)

**********

6. Review Comments to the Author

Reviewer #1: Manuscript Number: PONE-D-23-43701R1

Manuscript Title: Using Machine Learning to Determine the Nationalities of the Fastest 100-Mile Ultra-Marathoners and Identify Top Racing Events

The manuscript is interesting and nicely written.

The authors have corrected and modified the manuscript well according to my comments.

I recommend accepting the manuscript for publication.

Reviewer #2: (No Response)

**********

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Nejmeddine Ouerghi

Reviewer #2: No

**********

PLoS One. doi: 10.1371/journal.pone.0303960.r004

Acceptance letter

Stevo Popovic

20 May 2024

PONE-D-23-43701R1

PLOS ONE

Dear Dr. Knechtle,

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now being handed over to our production team.

At this stage, our production department will prepare your paper for publication. This includes ensuring the following:

* All references, tables, and figures are properly cited

* All relevant supporting information is included in the manuscript submission,

* There are no issues that prevent the paper from being properly typeset

If revisions are needed, the production department will contact you directly to resolve them. If no revisions are needed, you will receive an email when the publication date has been set. At this time, we do not offer pre-publication proofs to authors during production of the accepted work. Please keep in mind that we are working through a large volume of accepted articles, so please give us a few weeks to review your paper and let you know the next and final steps.

Lastly, if your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

If we can help with anything else, please email us at customercare@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Professor Stevo Popovic

Academic Editor

PLOS ONE

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

S1 Data

(XLSX)

pone.0303960.s001.xlsx^{(3.7MB, xlsx)}

S2 Data

(XLSX)

pone.0303960.s002.xlsx^{(4.5MB, xlsx)}

Attachment

Submitted filename: PONE-D-23-43701_Revision 1_Comments for Reviewers.docx

pone.0303960.s003.docx^{(588KB, docx)}

Data Availability Statement
