An analysis of pacing profiles in sprint kayak racing using functional principal components and hidden Markov models

Harry Estreich; Nicola Bullock; Mark Osborne; Edgar Santos-Fernandez; Paul Pao-Yen Wu

doi:10.1371/journal.pone.0326375

2025 Jul 2;20(7):e0326375.

Editor: Matteo Vandoni

PMCID: PMC12221036 PMID: 40601604

Abstract

This study analysed sprint kayak pacing profiles in order to categorise and compare an athlete’s race profile throughout their career. We used functional principal component analysis of normalised velocity data for 500m and 1000m races to quantify pacing. The first four principal components explained 90.77% of the variation over 500m and 78.80% over 1000m. These principal components were then associated with unique pacing characteristics with the first component defined as a dropoff in velocity and the second component defined as a kick. All other defined characteristics were a variation of these two, i.e., late kick. We then applied a Hidden Markov model to categorise each profile over an athlete’s career, using the PC scores, into different types of race profiles. This model included age and event type and identified a trend for a higher dropoff in development pathway athletes. Using the four different race profile types, four athletes had all their race profiles throughout their careers analysed. It was identified that an athlete’s pacing profile changes throughout their career as an athlete matures. This information provides coaches, practitioners and athletes with expectations as to how pacing profiles change across the course of an athlete’s career.

Introduction

Predictive models of individual athlete pacing in competitive kayak races, and of how pacing changes over a career, can be used by coaches and sports scientists to better understand athlete progression and optimise strategies for peak performance. The pacing profile refers to the actual changes in speed throughout a race, whereas a pacing strategy is the planned changes in speed over the race. Three main pacing strategies have emerged in athletic competition based on the analysis of split times: (i) negative pacing, where speed increases over the course of the event, (ii) positive pacing, where speed decreases, and (iii) even pacing, where speed remains similar over an event [1].

However, when higher resolution timing data with more frequent splits are used, additional pacing profiles are observed [1]. These include an all-out pacing profile, where athletes quickly accelerate to their maximum speed before a continuous decrease in speed throughout the rest of the event. Parabolic-shaped race profiles have three forms: (i) U-shaped profiles have the slowest section in the middle of the event, (ii) J-shaped profiles have the slowest section at the start before a gradual increase in pace, and (iii) reverse J-shaped profiles have a gradual decrease in speed followed by an increase in speed before the finish. A similar type of profile is the seahorse-shaped pacing strategy, where athletes increase pace near the finish and then drop off just before the end [2]. Lastly, a variable pacing profile occurs when an athlete has a distinct change in pace at certain points, often due to external factors such as weather events or changes in conditions.

Typically, the sprint kayak pacing profile and strategy tend to vary by distance (200m, 500m, 1000m) and event (K1, K2, K4). Over short distance events (e.g., 200m), all-out or positive pacing profiles were evident in para-canoe [3] and able-bodied kayaking [2]. Positive profiles were also evident over 500m; however, this differed over the longer 1000m event. For the 1000m events, two different pacing strategies have been identified: a seahorse-shaped profile [2] and a reverse J-shaped profile [4]. The reverse J-shaped profile is also a dominant profile in rowing, where athletes compete over 2000m [5]. Both of the papers describing pacing profiles in sprint kayak compared split times, and another method of comparison was to use ANOVAs to determine whether there is a statistically significant difference in average speed between splits [2]. Another method for identifying pacing profiles, used in swimming [6], was to calculate the slope of a linear regression line for certain sections of a race and then define each profile as positive, even or negative pacing using set criteria. Additionally, these studies normalised their data sets because competition occurs in an outdoor environment with varying water temperature, water salinity, and weather conditions. Normalising the data is required in order to compare races held under varying environmental and water conditions. However, none of the reviewed approaches considered the longitudinal evolution of pacing profiles over individual athlete careers. In addition, the use of different and infrequent time splits imposes an artificial interpretation of race profile velocities, which can lead to differing interpretations of the same race.
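As an illustration of the slope-based approach described above for swimming [6], the following sketch fits a least-squares line to segment velocities and labels the profile by the sign of the slope. It is written in Python for illustration only; the function names, the use of segment midpoints as distances, the example velocities and the `tolerance` cut-off are all assumptions, not the criteria used in the cited study.

```python
def pacing_slope(velocities, segment_length=50.0):
    """Least-squares slope of velocity against distance (per metre)."""
    n = len(velocities)
    # Assumption: each segment is located at its midpoint along the course.
    xs = [segment_length * (i + 0.5) for i in range(n)]
    x_mean = sum(xs) / n
    v_mean = sum(velocities) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxv = sum((x - x_mean) * (v - v_mean) for x, v in zip(xs, velocities))
    return sxv / sxx


def classify_pacing(velocities, tolerance=1e-4, segment_length=50.0):
    """Label a profile as 'positive', 'negative' or 'even' pacing.

    `tolerance` is a hypothetical cut-off chosen for this example.
    """
    slope = pacing_slope(velocities, segment_length)
    if slope > tolerance:
        return "negative"  # speed increases over the event
    if slope < -tolerance:
        return "positive"  # speed decreases over the event
    return "even"


# Hypothetical normalised velocities for ten 50m segments of a 500m race.
slowing = [1.10, 1.08, 1.04, 1.00, 0.98, 0.97, 0.96, 0.95, 0.96, 0.96]
label = classify_pacing(slowing)
```

A single slope cannot distinguish U-shaped, J-shaped or seahorse-shaped profiles from one another, which is one motivation for the curve-based methods discussed in the remainder of the paper.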

Modelling of pacing profiles and their evolution over an athlete’s career can not only support targeted interventions for coaching, but also help identify groups of similar athletes for mutual learning. Additionally, a longitudinal model of an athlete’s pacing profile throughout their career enables the incorporation of additional explanatory factors, including age (U18, U21, U23, Open) and event significance (Domestic competitions, World Cups, World Championships/Olympics), and of the uncertainty in their effects on the profiles. The effect of these explanatory factors has been explored in previous sports-based literature. One study identified that older athletes had a more even pacing profile than younger athletes in marathon running [7], and another observed a difference in pacing pattern depending on race type [8]. Therefore, analysing both the age and event significance of each race profile is a key focus of this paper.

Statistical models such as principal component analysis (PCA) have been identified as an alternative technique for analysing pacing profiles. PCA is a dimension reduction technique that can be utilised to quantify the variation in split times for each race profile [9]. One approach to classifying race profiles that is more robust to differences in split time resolution is to use a curve to approximate the race profile (speed over time). Known as functional principal component analysis (fPCA), this approach has been used to identify the statistically significant variation between the top 3 and bottom 3 athletes in 1000m finals at international sprint kayak competitions based on normalised velocity data [10]. fPCA fits curves to discrete split times and transforms these curves onto principal components (PCs), which are also curves. The algorithm fits PCs with the goal of explaining the majority of the variation in the data using the first few PCs, typically sorted in decreasing order of variance explained. As a result, each PC can potentially map to key pacing profile characteristics, derived organically from the data (real-world race performances). As an extension to fPCA, several studies used cluster analysis to categorise different progression curves in swimming [11] and to categorise different positions in rugby league [12]. These studies used a variety of two-step and model-based clustering techniques to categorise their data sets. It was identified that the inclusion of key variables, including event type and race type, within the clustering process was necessary in order to identify the effects of certain variables. Additionally, a key consideration is the ability of the model to capture an athlete’s trends throughout their career.

Hidden Markov modelling (HMM) has widespread use in sports data analysis and has been identified as an alternative to clustering that explicitly captures changes in cluster, referred to as a state in this framework, over time. HMMs have been used to model different levels of control in football [13] and to infer stroke phase in swimming [14]. HMMs have also been used to model longitudinal data and to infer an unobserved state that changes over time. For example, in basketball [15] an HMM was used to infer athlete streakiness (streaky or non-streaky performance) and in baseball [16] one was used to predict future home run totals. Notably, the latter paper used a vector of covariates (home ballpark and position) within the HMM framework to capture how the impact of these covariates differs by state. This concept is applicable to the HMM of this paper, as several variables (e.g., age, race type) influence the type of pacing profile. HMMs have also been used to quantify different types of physical activity using accelerometer data [17]. This concept of classifying data points into different categories is directly applicable here, as the goal of this paper is to identify and classify different types of race profiles. Therefore, HMMs are explored as a potential method for modelling the evolution of an athlete’s pacing profile.

The aim of this study was to derive categories of pacing profiles and track their change over time for individual athletes. There are two parts to this: (i) using fPCA to derive key patterns in sprint kayak pacing profiles from the data, and (ii) using an HMM to model clusters (or states) of pacing profiles and their change over time for individual athletes, given covariates of age and event, while capturing the uncertainty inherent in race performances. The latter allows for the analysis and prediction of transitions in pacing profile for individual athletes.

Methods

Data

Race performance data were collected across a large set of athletes in domestic (National domestic regattas, National Championships and Selection Regattas) and international competitions (World Cups, World Championships and Olympic Games, Junior World Championships, U23 World Championships) between 2010 and 2023. The data set included racing across two different events, Women’s K1 500m and Men’s K1 1000m. Within each data set, some athletes were included even though they had only a few races. These athletes were kept in order to include a more expansive and diverse set of race types; however, only athletes who had a significant number of races over a long period of time were analysed individually. Additionally, although the initial data set included different event phases (heats, semi-finals and finals), heat races from domestic competitions were removed, as it was determined that they were often not a true reflection of an athlete’s pacing profile due to a lack of competitive depth in some of these events. Outlier removal was conducted based on input from domain experts, who leveraged their deep understanding of the data-generating process to identify observations that were inconsistent with the expected patterns or operational constraints of the system. This expert-driven approach ensured that the outliers identified were truly anomalous and not representative of the underlying phenomena being studied.

In the Women’s K1 500m data set, there were 70 athletes across a variety of age groups (U18, U21, U23, Open) and event phases (heat, semi-finals, final). Each race is classified as either a domestic event, a non-Championship international race (World Cup, Junior and U23 World Championships) or a Championship international race (World Championships, Olympics). Of the 70 athletes, 13 athletes had at least 10 races and 9 athletes had more than 30 races, with two athletes, Athlete A (90 races) and Athlete B (75 races), having the most. The Men’s K1 1000m data set had 67 athletes, with 18 athletes having at least 10 races. Two athletes, Athlete C (59 races) and Athlete D (32 races), were identified as having a large number of races across many levels of competition and multiple age groups.

From each race, pacing profiles were created from athlete velocity data, which included interval times for each 50m segment. The initial unnormalised data for the Women’s K1 500m had an average velocity of 4.29m/s (SD 0.23m/s) and an average maximum velocity of 4.76m/s (SD 0.29m/s), whilst the Men’s K1 1000m had an average velocity of 4.8m/s (SD 0.24m/s) and an average maximum velocity of 5.49m/s (SD 0.3m/s). Each 500m race profile had 10 data points describing an athlete’s velocity and each 1000m race had 20 data points. As each race was conducted under different environmental conditions, it was necessary to normalise the velocity data by the average boat speed for the entire race. After normalisation, the athlete’s overall boat speed is no longer comparable between races, and it is the athlete’s strategy or pacing performance that is analysed. In Fig 1, the mean pacing profile for the Women’s K1 500m shows that athletes tend to increase their velocity until reaching their peak segment velocity at approximately 80m before continuously decreasing through to the end. In Fig 2, the mean pacing profile for the Men’s K1 1000m shows that athletes tend to increase speed and reach their peak velocity at approximately 100m into the race before decreasing in speed until the 500m mark. For the second half of the race, athletes tend to maintain most of their speed except for a small kick in the last 250m of the race. The data were collected using a 10 Hz GPS device (Catapult Sport, Melbourne, Australia), with each 50m split identified and the elapsed time for the split used to calculate the velocity for each segment.

Fig 1 — The mean normalised segment velocity for each split time is calculated from all athletes within the data set.
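The normalisation described above can be sketched as follows. This is an illustrative Python example rather than the authors' code: the function names and the split times are hypothetical, and it assumes that the "average velocity for the entire race" means total distance divided by total time.

```python
def segment_velocities(split_times, segment_length=50.0):
    """Velocity (m/s) for each segment from its elapsed time in seconds."""
    return [segment_length / t for t in split_times]


def normalise_profile(split_times, segment_length=50.0):
    """Divide each segment velocity by the average velocity of the race.

    Assumption: average velocity = total distance / total time.
    """
    total_distance = segment_length * len(split_times)
    average_velocity = total_distance / sum(split_times)
    velocities = segment_velocities(split_times, segment_length)
    return [v / average_velocity for v in velocities]


# Hypothetical 500m race: elapsed time in seconds for each 50m segment.
splits = [12.5, 10.4, 10.8, 11.2, 11.5, 11.7, 11.9, 12.0, 12.1, 12.2]
profile = normalise_profile(splits)
```

A value above 1 in `profile` marks a segment that was faster than the race average, so two races paddled in very different conditions can be compared on the same scale.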

Statistical methods

The statistical methods are split into two sections. The first section applies fPCA to quantify the variation in each data set using principal components (PCs); the resulting principal component scores describe each race profile in terms of each principal component. The computation in this section was done using R [18], specifically the fda package [19]. The second section uses the PC scores calculated in the first section to globally cluster different types of race profiles using a hidden Markov model (HMM), fitted with the depmixS4 package [21]. Both sections were completed independently for both data sets.

Functional PCA

In our framework, the discrete data set of split times must first be transformed into functional data [20]. To do this, each of the pacing profiles is represented using a B-spline basis and weighted eigenfunctions, calculated from the discrete segment split times. Let $x$ be the distance along the race in metres and let $f (x)$ be the smoothed version of the normalised race velocity calculated using the B-spline basis. The set of principal component scores for the first PC, $β_{1}$ , is computed using:

β_{1} = \int_{x_{1}}^{x_{p}} Φ_{1} (x) f (x) d x

(1)

where $Φ_{1} (x)$ is the first principal component and $f (x) \in [f_{1} (x), \dots, f_{n} (x)]$ is the set of smoothed pacing profiles. $x_{1}$ is 0, representing the start of the race in metres, and $x_{p}$ is 500 or 1000, the end of the race in metres (this differs for the two data sets). The first principal component is calculated by maximising the variance of the set of first principal component scores, subject to the constraint that its squared norm equals one:

‖ Φ_{1} ‖^{2} = \int_{x_{1}}^{x_{p}} Φ_{1}^{2} (x) d x = 1

(2)

Each successive principal component is then calculated using the same method, with the additional requirement that it is orthogonal to all previous components. The percentage of variation described by each principal component can be calculated, and generally a large share of the variation can be described by the first few principal components.
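The idea of extracting successive orthogonal components and their share of the variation can be sketched on a discretised version of the problem. The example below is an assumption-laden Python illustration, not the authors' method: it applies ordinary PCA by power iteration with deflation to profiles sampled on a grid, instead of the B-spline based fPCA from the fda package, and the two "dropoff" and "kick" patterns and their weights are hypothetical.

```python
import math
import random


def covariance_matrix(profiles):
    """Sample covariance between grid points across all profiles."""
    n, p = len(profiles), len(profiles[0])
    means = [sum(row[j] for row in profiles) / n for j in range(p)]
    centred = [[row[j] - means[j] for j in range(p)] for row in profiles]
    return [[sum(r[a] * r[b] for r in centred) / (n - 1) for b in range(p)]
            for a in range(p)]


def leading_eigenpair(matrix, iterations=500, seed=0):
    """Largest eigenvalue and unit eigenvector by power iteration."""
    rng = random.Random(seed)
    p = len(matrix)
    vec = [rng.random() + 0.1 for _ in range(p)]
    for _ in range(iterations):
        new = [sum(matrix[a][b] * vec[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in new))
        vec = [x / norm for x in new]
    value = sum(vec[a] * sum(matrix[a][b] * vec[b] for b in range(p))
                for a in range(p))
    return value, vec


def principal_components(profiles, k):
    """First k components as (share of variance, unit vector) pairs.

    k must not exceed the rank of the covariance matrix.
    """
    cov = covariance_matrix(profiles)
    p = len(cov)
    total = sum(cov[a][a] for a in range(p))
    components = []
    for _ in range(k):
        value, vec = leading_eigenpair(cov)
        components.append((value / total, vec))
        # Deflation: remove the component just found, so that the next
        # one is orthogonal to it.
        cov = [[cov[a][b] - value * vec[a] * vec[b] for b in range(p)]
               for a in range(p)]
    return components


# Hypothetical profiles built from a "dropoff" and a "kick" pattern.
dropoff = [0.10, 0.05, -0.05, -0.10]
kick = [0.00, -0.03, -0.03, 0.06]
weights = [(2.0, 0.5), (-2.0, 0.5), (1.0, -1.0), (-1.0, -1.0), (0.0, 1.0)]
profiles = [[1.0 + a * d + b * k for d, k in zip(dropoff, kick)]
            for a, b in weights]
components = principal_components(profiles, 2)
```

Because the example data are generated from only two patterns, the two components together account for essentially all of the variation; real race data would leave a remainder for higher-order components.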

Each of these principal components, also known as eigenfunctions, map to the data set and therefore each principal component can be analysed to determine what characteristics of a pacing profile are evident within each eigenfunction. Therefore, from each pacing profile the PC scores can be used to indicate how each principal component describes each profile with each fPC (functional Principal Component) score calculated from a linear combination of the eigenfunction and the mean-centred pacing profile. Therefore, a more positive fPC score will often indicate that the mean-centred pacing profile is more similar to the eigenfunction.
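A score of this kind can be approximated numerically by integrating the product of an eigenfunction and a mean-centred profile over the race distance, as in Equation 1. The Python sketch below uses the trapezoidal rule on a 50m grid; the eigenfunction values, the profiles and the function names are hypothetical, and the centring step follows the description in the text rather than Equation 1 as written.

```python
def trapezoid(xs, ys):
    """Trapezoidal approximation of the integral of ys over xs."""
    return sum((xs[i + 1] - xs[i]) * (ys[i] + ys[i + 1]) / 2.0
               for i in range(len(xs) - 1))


def fpc_score(distances, eigenfunction, profile, mean_profile):
    """Approximate the integral of eigenfunction(x) * (f(x) - mean(x)) dx."""
    centred = [f - m for f, m in zip(profile, mean_profile)]
    integrand = [phi * c for phi, c in zip(eigenfunction, centred)]
    return trapezoid(distances, integrand)


# Hypothetical 500m example evaluated every 50m.
distances = [50.0 * i for i in range(11)]
# A dropoff-like eigenfunction: positive early, negative late.
eigenfunction = [0.06, 0.05, 0.04, 0.02, 0.01, 0.0,
                 -0.01, -0.02, -0.04, -0.05, -0.06]
mean_profile = [1.0] * 11
fast_start = [1.05, 1.04, 1.03, 1.02, 1.01, 1.0,
              0.99, 0.98, 0.97, 0.96, 0.95]
slow_start = list(reversed(fast_start))

score_fast = fpc_score(distances, eigenfunction, fast_start, mean_profile)
score_slow = fpc_score(distances, eigenfunction, slow_start, mean_profile)
```

In this example the fast-starting profile deviates from the mean in the same direction as the eigenfunction and receives a positive score, while the mirrored profile receives a negative score of the same magnitude.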

The application of fPCA to the two data sets produced four principal components in each data set that together describe most of the variation. In the Women’s K1 500m data set, PC1 and PC2 explained 75.58% of the variation, and with the addition of PC3 and PC4, 90.77% of the variation was explained by the first four principal components. In the Men’s K1 1000m data set, 62.88% of the variation was explained by PC1 and PC2 and 78.8% by the first four principal components. In both cases, the first four PCs explain the vast majority of the variation in the data set. Additionally, utilising four instead of two PCs improved the practical interpretation of the model by capturing more distinct pacing profiles. Therefore, by using fPCA, each pacing profile can be reduced to a few fPC scores that still account for the majority of the variation in the data set. These fPC scores can then be used to categorise different types of profiles with an HMM, which in turn can be used to analyse pacing profile trajectories over a career.

Hidden Markov model

The hidden Markov model is built using a set of known observations (the PC scores), explanatory variables (age group and event type), and a set of unknown hidden states, which represent clusters of pacing profiles with similar characteristics (see Discussion for characteristics derived from principal components). Each state describes a certain type of pacing profile, and the way an athlete’s career moves between these states can be used to identify changes in pacing profile throughout an athlete’s career.

As there are multiple observed variables, a multivariate hidden Markov model is necessary, and the model is set up such that each athlete is its own independent continuous time-based stochastic process. It was also determined that a four-state model was the most appropriate, as the AIC (Akaike Information Criterion) value, an estimator of model fit and prediction error, did not improve by increasing the number of states beyond 4 (see Appendix 1 in S1 File). Additionally, as the starting values can change the results of the hidden Markov model, the model was fitted 200 times and the fit with the highest log-likelihood was retained. Only the first four principal components were used to model the response (observations). There are two main sets of parameters that are fitted using the data sets [21]. Firstly, there is a constant transition matrix, which is a 4x4 matrix (given it is a four-state model) containing the transition probabilities between each state, as well as an initial state probability vector; both are independent of athlete. This can be modelled using the joint likelihood of observations $O_{1 : T}$ , which follow a multivariate normal distribution with a 4x1 mean vector and 4x4 covariance matrix, and latent states $S_{1 : T}$ , given model parameters $θ$ and covariates $z_{1 : T} = (z_{1}, \dots, z_{T})$ [22]

P (O_{1 : T}, S_{1 : T} | θ, z_{1 : T}) = π_{1} b_{S_{t}} (O_{1} | z_{1}) \prod_{t = 1}^{T - 1} a_{i j} b_{S_{t}} (O_{t + 1} | z_{t + 1})

(3)

where $S_{t}$ is an element of $S = \{1, \dots, n\}$ , a set of $n$ latent states. $π_{1} = P (S_{1})$ is a vector of the initial probabilities. $a_{i j} = P (S_{t + 1} = j | S_{t} = i)$ provides the probability of a transition from state $i$ to state $j$ . $b_{S_{t}}$ is a vector of observation densities $b^{k}_{S_{t}} = P (O^{k}_{t} | S_{t} = j, z_{t})$ that provides the conditional densities of observations $O^{k}_{t}$ associated with latent state $j$ and covariate $z_{t}$ . In this hidden Markov model these observation densities were modelled using Gaussian distributions. This was verified by checking the distributions of each principal component using the fitdistrplus package [22] (see Appendix 3 in S1 Fig). Each Gaussian distribution also has a state-specific variance (see Appendix 2 in S1 File). Residual analysis also confirmed that the model is suitable for the data (see Appendix 5 in S1 Fig). Each of these parameters was estimated using the depmixS4 package [23], which uses an expectation-maximisation algorithm that iteratively maximises the expected joint log-likelihood of the parameters. Additionally, the inferred state for each data point was obtained through global decoding.
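To make the joint likelihood in Equation 3 and the idea of global decoding concrete, the following Python sketch evaluates the log of the joint probability for a given state sequence and finds the most probable sequence with the Viterbi algorithm. It is a simplified illustration under stated assumptions, not the depmixS4 implementation: it uses two states, a single univariate Gaussian observation per time point, no covariates, strictly positive transition probabilities, and hypothetical parameter values.

```python
import math


def gaussian_log_density(x, mean, sd):
    """Log density of a univariate normal distribution."""
    return (-0.5 * math.log(2.0 * math.pi * sd * sd)
            - (x - mean) ** 2 / (2.0 * sd * sd))


def joint_log_probability(obs, states, initial, transition, means, sds):
    """Log of the joint probability of observations and a state path."""
    first = states[0]
    total = math.log(initial[first])
    total += gaussian_log_density(obs[0], means[first], sds[first])
    for t in range(1, len(obs)):
        i, j = states[t - 1], states[t]
        total += math.log(transition[i][j])
        total += gaussian_log_density(obs[t], means[j], sds[j])
    return total


def viterbi(obs, initial, transition, means, sds):
    """Most probable state path (global decoding)."""
    n = len(initial)
    score = [math.log(initial[s])
             + gaussian_log_density(obs[0], means[s], sds[s])
             for s in range(n)]
    back = []
    for t in range(1, len(obs)):
        new_score, pointers = [], []
        for j in range(n):
            # Best predecessor state for arriving in state j at time t.
            best_i = max(range(n),
                         key=lambda i: score[i] + math.log(transition[i][j]))
            pointers.append(best_i)
            new_score.append(score[best_i] + math.log(transition[best_i][j])
                             + gaussian_log_density(obs[t], means[j], sds[j]))
        score = new_score
        back.append(pointers)
    # Trace the best path backwards from the best final state.
    state = max(range(n), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


# Hypothetical two-state model and a short sequence of PC scores.
initial = [0.6, 0.4]
transition = [[0.9, 0.1], [0.2, 0.8]]
means = [-0.4, 0.5]
sds = [0.3, 0.3]
obs = [-0.5, -0.3, -0.4, 0.6, 0.4, 0.5]
path = viterbi(obs, initial, transition, means, sds)
```

Global decoding returns the single state sequence with the highest joint probability, which can differ from choosing the most probable state separately at each time point.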

Additionally, for each state, S1 to S4, there is a linear model for the emission probabilities relating the unobserved latent state (i.e., pacing profile pattern cluster) to the observed PC score variables (i.e., pacing profile pattern). Each linear model takes into account covariates of event type and age group. Each of these is modelled using a multivariate regression equation (Equation 4), where the mean responses corresponding to PC1–4 are $μ_{1}, μ_{2}, μ_{3}, μ_{4}$ and $β_{i j k}$ is the coefficient for each variable, where $i$ = principal component, $j$ = state and $k$ = variable. The covariate values (either 0 or 1) are defined as $x_{1}$ = Age Group U21, $x_{2}$ = Age Group U23, $x_{3}$ = World Cup/ Juniors Event, $x_{4}$ = World Championships/ Olympics. Equation 4 is an example equation for the Women’s K1 500m data set; the Men’s K1 1000m HMM has an additional variable for Age Group U18.

μ_{1} = P (S_{1}) * (β_{110} + β_{111} * x_{1} + β_{112} * x_{2} + β_{113} * x_{3} + β_{114} * x_{4}) + \dots + P (S_{4}) * (β_{140} + β_{141} * x_{1} + β_{142} * x_{2} + β_{143} * x_{3} + β_{144} * x_{4}) + \dots

(4)

Results

fPCA

fPCA Eigenfunctions.

The fPCA model was fit to both data sets, and the principal component eigenfunctions, PC1 through PC4, are shown in Figs 3 and 4. As principal component scores are a linear combination of each eigenfunction and each pacing profile relative to the mean pacing profile, a pacing profile whose relative segment velocities have the same sign as the eigenfunction shown will have more positive PC scores. Additionally, sections with higher absolute eigenfunction values have a larger effect on PC scores and are therefore the most important when identifying which pacing characteristic a particular principal component corresponds to.

In Fig 3, each principal component in the Women’s K1 500m can be analysed to determine what each PC score indicates. The PC1 eigenfunction indicates that a positive PC1 score corresponds to an above average peak normalised segment velocity in the first half of the race, peaking at approximately 125m, and a large contrast between the peak and minimum segment velocity. The pacing characteristic that PC1 corresponds to is therefore defined as dropoff. A positive PC2 score indicates a velocity profile where an athlete’s velocity dips between 100m and 250m and begins increasing to above average velocity from the 300m mark; this is defined as kick. A positive PC3 score corresponds to a sharp contrast between the peak at approximately 80m and a low trough at approximately 120m, followed by sinusoidal trends for the rest of the race; this is defined as early dropoff. A positive PC4 score corresponds to a significant drop in speed between the 200m and 300m marks before a significant increase over the last 200m; this is defined as a late kick.

In Fig 4, the PC1 eigenfunction is similar to the PC1 eigenfunction in Fig 3, as a positive PC1 score indicates above average normalised segment velocity in the first half of the race before consistently decreasing through the rest of the race. PC2 is again similar; however, the minimum point in the eigenfunction is later in the race, at approximately the 600m mark. PC3 indicates a consistent increase in speed between the 250m and 750m marks before a dropoff in speed in the last 250m. This is considerably different to the Women’s K1 500m data set and therefore a new definition, late dropoff, is used for this data set. PC4 has a significant dropoff at the 500m mark and a kick to end the race. PC1, PC2, and PC4 have similar characteristics across both data sets and therefore their definitions are the same across both data sets.

Hidden Markov model

The PC scores were fit to an HMM and a summary of the results is shown in Tables 1–4. Table 1 contains the intercept coefficients for the emission equation for each combination of state and principal component. These values represent the centroid for each state in 4-dimensional space at the baseline values of the covariates, which are the domestic event type and the Open age group. These intercept values can be used to determine the mean race profile characteristics for each state as described by the PC scores. Therefore, a higher PC score indicates that a state is more likely to exhibit the particular characteristic of that principal component.

Table 1. Intercept coefficients for each distribution describing the mean PC scores of each state (row) and principal component (column) for Women’s K1 500m and Men’s K1 1000m. These values indicate the centroid for each state when using the baseline variables.

Women’s K1 500m					Men’s K1 1000m
	PC1	PC2	PC3	PC4	PC1	PC2	PC3	PC4
State 1	0.173	0.230	−0.003	−0.028	−0.871	0.182	−0.126	−0.137
State 2	−0.418	0.552	−0.136	0.168	0.979	0.287	0.030	−0.068
State 3	−0.373	−0.105	0.070	0.050	−0.107	0.654	0.080	0.185
State 4	0.063	−0.045	−0.045	−0.039	0.118	0.007	−0.023	0.030

Table 2. Effect of event type on emission probability coefficients obtained from the HMM emission equation (Equation 4). Rows are states and columns are each combination of principal component (PC1 through PC4) and event type (World Cup/ Juniors, World Championships/ Olympics; Domestic is the baseline). Each event type coefficient is relative to the baseline Domestic event and a more positive coefficient indicates a higher probability for the given state.

	PC1		PC2		PC3		PC4
	World Cup/ Juniors	World Champs	World Cup/ Juniors	World Champs	World Cup/ Juniors	World Champs	World Cup/ Juniors	World Champs
Women’s K1 500m
State 1	−0.570	0.045	−0.273	−0.176	0.190	0.262	−0.016	0.073
State 2	0.051	0.246	−0.045	0.154	−0.108	−0.154	−0.009	−0.012
State 3	0.363	0.618	−0.241	−0.384	−0.092	0.037	−0.012	−0.277
State 4	0.078	0.060	−0.051	−0.047	0.018	0.048	0.014	−0.010
Men’s K1 1000m
State 1	0.061	0.023	−0.088	−0.119	0.088	0.120	0.084	0.062
State 2	0.585	1.355	0.285	−0.338	0.104	0.647	−0.664	−0.422
State 3	0.134	0.310	−0.493	0.404	−0.193	−0.547	−0.225	0.174
State 4	0.060	0.424	−0.082	0.065	−0.042	−0.062	−0.096	−0.047

Table 3. Effect of age group on emission probability coefficients obtained from the HMM emission equation (Equation 4). Rows are states and columns are each combination of principal component (PC1 through PC4) and age group (U18, U21, U23; Open is the baseline). Each age group coefficient is relative to the baseline Open age group and a more positive coefficient indicates a higher probability for the given state.

	PC1			PC2			PC3			PC4
Women’s K1 500m
	U21	U23		U21	U23		U21	U23		U21	U23
State 1	−0.016	−0.024		−0.098	−0.040		0.028	−0.055		0.100	0.040
State 2	0.287	−0.384		0.339	0.103		−0.041	−0.277		−0.160	−0.148
State 3	0.206	0.288		−0.645	0.200		0.243	0.376		−0.176	−0.280
State 4	−0.383	−0.136		0.090	0.103		−0.023	−0.010		0.041	−0.021
Men’s K1 1000m
	U18	U21	U23	U18	U21	U23	U18	U21	U23	U18	U21	U23
State 1	1.008	0.895	1.216	0.213	−0.029	−0.061	−0.037	0.008	0.059	0.124	0.022	−0.058
State 2	−1.710	−1.656	−1.529	−0.291	−0.410	−0.238	−0.022	0.238	0.234	0.326	0.312	−0.049
State 3	0.285	−0.285	−0.090	−0.231	−0.788	−0.414	−0.258	−0.442	−0.283	−0.217	−0.256	0.099
State 4	0.538	0.119	0.026	−0.728	0.118	0.049	0.241	0.148	−0.052	0.217	0.049	−0.168

Table 4. The transition matrices for HMMs.

Women’s K1 500m					Men’s K1 1000m
	To State 1	To State 2	To State 3	To State 4	To State 1	To State 2	To State 3	To State 4
From State 1	0.976	0	0.024	0	0.805	0.081	0	0.114
From State 2	0	0.917	0.027	0.056	0.107	0.815	0.078	0
From State 3	0	0.022	0.853	0.125	0.064	0.099	0.838	0
From State 4	0.026	0.052	0.019	0.903	0.167	0.234	0.124	0.475

The connection between each state and the principal component scores is shown in Fig 5. This diagram also utilises the definitions for each principal component, which are revisited in the Discussion, to allow for an easy connection between HMM state, principal component and pacing characteristic. For example, in the Women’s K1 500m data set, State 1 has positive mean values for PC1 and PC2. Additionally, the thicker the line, the larger the absolute value of the corresponding PC score in Table 1.

The model coefficients in Table 2 indicate the effect that each event type variable has on the mean PC scores when in a given state, relative to the baseline category, Domestic. Similarly, the model coefficients in Table 3 indicate the effect that each age group has on the mean PC scores when in a given state, relative to the baseline category, Open. As per Equation 4, the predicted score is a weighted mixture of the state probabilities and coefficient values. Therefore, if the race profile was from a World Championships event in the U23 age group in the Women’s K1 500m, using Equation 4, the predicted mean for PC1 would be $μ_{1} = P (S_{1}) * (0.173 − 0.024 * 1 + 0.045 * 1) + \dots + P (S_{4}) * (0.063 − 0.136 * 1 + 0.060 * 1) + \dots$ with all coefficients retrieved from Tables 1–3.
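The worked example can be reproduced for all four states with a short calculation. The Python sketch below takes the PC1 coefficients for the Women’s K1 500m as tabulated in Tables 1–3 (where the U23 coefficients for States 1 and 4 are negative) and combines the state-specific means using a set of state probabilities; the state probabilities and the function names are hypothetical and are included only to show how the weighting in Equation 4 works.

```python
# PC1 coefficients for the Women's K1 500m, one entry per state (1 to 4).
intercept = [0.173, -0.418, -0.373, 0.063]          # Table 1
u23_effect = [-0.024, -0.384, 0.288, -0.136]        # Table 3, U23 column
world_champs_effect = [0.045, 0.246, 0.618, 0.060]  # Table 2, World Champs


def state_means(intercepts, age_effects, event_effects):
    """Mean PC score in each state for a U23 World Championships race."""
    return [b0 + b_age + b_event
            for b0, b_age, b_event in zip(intercepts, age_effects, event_effects)]


def mixture_mean(state_probs, means):
    """Probability-weighted mean across states, as in Equation 4."""
    return sum(p * m for p, m in zip(state_probs, means))


means = state_means(intercept, u23_effect, world_champs_effect)

# Hypothetical state probabilities for one race profile (must sum to 1).
state_probs = [0.1, 0.2, 0.6, 0.1]
predicted_pc1 = mixture_mean(state_probs, means)
```

With these coefficients the State 1 mean is 0.173 − 0.024 + 0.045 = 0.194 and the State 4 mean is 0.063 − 0.136 + 0.060 = −0.013.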

The transition matrices in Table 4 show that every state can be reached from every other state, either directly or through an intermediate state, although several direct transition probabilities are estimated as 0. Remaining in the same state between time points is the most likely outcome for all states. Notably, in the Women’s K1 500m HMM there is a 97.6% probability of remaining in State 1, whereas there is just a 47.5% probability of remaining in State 4 in the Men’s K1 1000m HMM.
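The persistence of each state can be summarised from Table 4 with a small calculation. The Python sketch below assumes a first-order Markov chain in which each time point is one race, so the number of consecutive races spent in a state is geometrically distributed with mean 1 / (1 − a_ii); it also checks whether every state can be reached from every other state. The function names are hypothetical and the matrices are copied from Table 4.

```python
WOMEN_500 = [
    [0.976, 0.0, 0.024, 0.0],
    [0.0, 0.917, 0.027, 0.056],
    [0.0, 0.022, 0.853, 0.125],
    [0.026, 0.052, 0.019, 0.903],
]
MEN_1000 = [
    [0.805, 0.081, 0.0, 0.114],
    [0.107, 0.815, 0.078, 0.0],
    [0.064, 0.099, 0.838, 0.0],
    [0.167, 0.234, 0.124, 0.475],
]


def expected_sojourn(matrix):
    """Expected number of consecutive time points spent in each state."""
    return [1.0 / (1.0 - row[i]) for i, row in enumerate(matrix)]


def all_states_communicate(matrix):
    """True if every state can be reached from every other state."""
    n = len(matrix)
    for start in range(n):
        seen = {start}
        frontier = [start]
        while frontier:
            i = frontier.pop()
            for j in range(n):
                if matrix[i][j] > 0 and j not in seen:
                    seen.add(j)
                    frontier.append(j)
        if len(seen) < n:
            return False
    return True


women_sojourn = expected_sojourn(WOMEN_500)
men_sojourn = expected_sojourn(MEN_1000)
```

Under these assumptions, an athlete in State 1 of the Women’s K1 500m model would be expected to remain there for roughly 42 consecutive races, compared with roughly 2 races for State 4 of the Men’s K1 1000m model.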

The HMM output provides a probability that each pacing profile is in a given state. By identifying the most likely state for each pacing profile, as shown in Figs 6 and 7, a career-long trend can be analysed to evaluate how an athlete’s pacing profile changes throughout their career. The case study analysis looks into the career-long trends of four athletes (Athletes A to D).

Discussion

The initial findings indicated that for the Women’s K1 500m, the average race profile reflected an all-out race strategy where athletes reach their maximum speed as fast as possible and then gradually slow down as fatigue sets in. In the Men’s K1 1000m, the previously defined seahorse profile was identified in the average race profile, as segment velocity began to increase at 750m before dropping through to the end of the race. This suggests that over the longer distance, athletes have the ability to plan their strategy more appropriately by attempting to maintain as much velocity as possible during the middle of the race before increasing speed at the end of the race. Alternatively, over the shorter 500m the all-out profile is the apparent default strategy: a final acceleration is not viable because maintaining a constant near-maximum pace requires less energy than reaccelerating [1]. This race strategy identification was done visually using Figs 1 and 2 and backs up previously identified strategies [2]. Although these figures help provide an overall interpretation of a race profile, extending to fPCA allows the key sources of variation in a pacing strategy to be identified.

The principal component eigenfunctions, shown in Figs 3 and 4, have previously been defined using pacing characteristics. It was identified that the pacing characteristics differ between the Women’s K1 500m and the Men’s K1 1000m, in part because longer distances require different race strategies. The key identified difference is PC3, which is characterised by a dropoff much later in the race in the Men’s K1 1000m compared to the Women’s K1 500m. A common attribute of PC2 across both distances is that the start of the kick, indicated by the increasing eigenfunction curve, is approximately 300m to 400m from the end. This suggests that when an athlete increases their velocity for the kick, it occurs a similar distance from the end regardless of the race distance. These identified characteristics allow coaches and athletes to identify the core elements of a pacing profile and provide information on how each athlete competes in terms of these four characteristics.

It should be noted that when implementing this two-step fPCA-HMM approach, there may be issues including loss of sensitivity and bias towards low-order PCA components [24], and thus there is potential for misleading inference. Several papers have identified methods for including the feature analysis within the framework of an HMM [25–27]. However, the approach of first applying feature extraction techniques and subsequently classifying using a model is also common [28,29]. In this case study, preliminary validation indicated that the proposed model was robust and effective in explaining the variation in the data. The RMSE was less than 0.5 for each principal component, which was less than the standard deviation of that component. Fitting both pacing curves and clusters together in a single framework could potentially lead to more robust results with better generalisability and propagation of uncertainty [24]. However, as this is less explored with fPCA and HMMs, it is an avenue for future investigation and outside the scope of this study due to the significant increase in complexity and issues with interpretability. Additionally, we calculated the empirical distribution of the sojourn times for each hidden state and compared them to the geometric distribution using goodness-of-fit tests with the R package fitdistrplus (see Appendix 4 in S1 Fig). Degeneracy issues were also checked, and there were no collapsed states, as the transition matrix (Table 4) allows for transitions out of every state and the emission matrices (Tables 1–3) show that no two states are the same (i.e., none is redundant). A potential extension of this work would be to determine whether a semi-Markov model would be more appropriate; however, the current model appears justifiable.
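The sojourn-time check described above was carried out in R with fitdistrplus. As a language-neutral illustration of the idea only, the following Python sketch extracts run lengths from a decoded state sequence and estimates the self-transition probability implied by a geometric sojourn distribution. The state sequence and all function names are hypothetical, and for simplicity the final run is treated as complete even though in practice it is right-censored.

```python
from itertools import groupby

def sojourn_times(states):
    """Map each state to the lengths of its consecutive runs."""
    runs = {}
    for state, group in groupby(states):
        runs.setdefault(state, []).append(len(list(group)))
    return runs

def stay_prob_mle(run_lengths):
    """Geometric MLE of the self-transition probability: 1 - 1/mean run."""
    mean_run = sum(run_lengths) / len(run_lengths)
    return 1.0 - 1.0 / mean_run

def geometric_pmf(duration, stay_prob):
    """P(run length = duration) for duration = 1, 2, ..."""
    return stay_prob ** (duration - 1) * (1.0 - stay_prob)

# Hypothetical decoded state sequence for one athlete.
decoded = [2, 2, 2, 4, 4, 4, 4, 4, 3, 4, 4, 4]

runs = sojourn_times(decoded)
p4 = stay_prob_mle(runs[4])

# Fitted probabilities to set against the empirical run-length histogram.
fitted = {d: geometric_pmf(d, p4) for d in range(1, 6)}
print(runs, p4, fitted)
```

Comparing the fitted probabilities with the observed run-length frequencies, formally or graphically, is one way to judge whether the geometric assumption is plausible or whether a semi-Markov extension is worth exploring.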

After defining the characteristics exhibited in each principal component, the next step is to describe each state from the hidden Markov model in terms of each characteristic. As shown in Fig 6, the baseline variables from Table 1 can be used to synthesise the relationships between each state and each principal component. For example, in the Women’s K1 500m data set, in State 1, the dominant principal components are PC1 (0.173) and PC2 (0.230). Therefore, State 1 can be defined as indicating high dropoff (PC1) and high kick (PC2) with negligible PC3 and PC4 scores. For States 1, 2 and 3 in both data sets, each state mean centroid is predominantly a combination of PC1 and PC2 scores, with PC3 and PC4 having coefficients close to zero, which indicates average characteristics. However, State 4, for both data sets, has average scores for all principal components.

The hidden Markov model also has variable coefficients describing the effect of event type (Domestic, World Cup/Juniors, World Championships/Olympics) and age group (U18, U21, U23, Open). In the model, these coefficients show the effect that each variable has on the response distribution compared to the baseline variables (Domestic event type and Open age group). Several key trends were identified in Tables 2 and 3 for the Women’s K1 500m data set. First, PC1 is higher in international events in every state (except for State 1 and World Cup/Juniors), which indicates that athletes have a higher dropoff in international events. Additionally, PC2 is lower in international events in every state (except for State 2 and World Championships/Olympics), which indicates that athletes have a lower kick in international events. There are no consistent trends in the age group coefficients; however, for the Women’s K1 500m data set, in State 4, PC1 is lower and PC2 is higher in development age groups. This indicates that those in State 4 have lower dropoff and higher kicks in development age groups.

Longitudinal modelling case study

As defined previously, Athletes A, B, C and D are four athletes who were identified for in-depth analysis due to their significant number of races within the data set. With the definitions of each principal component and the implications of positive and negative PC scores identified, the calculated PC scores for each athlete can be analysed to identify athlete-specific trends in race strategies. By building two hidden Markov models, one for the Women’s K1 500m data set and one for the Men’s K1 1000m data set, four identifiable clusters were calculated for each. The states for all four chosen athletes are shown for all races throughout their careers in Figs 6 and 7. The state plots can be used to identify and analyse the trends that an athlete’s pacing profile goes through throughout their career.

As shown in Fig 6, Athlete A begins their career in State 2, displaying signs of a low dropoff/high kick pacing profile, before transitioning to State 4, average profile characteristics, when they move from U21s to U23s. Athlete B, by comparison, starts with a few races in State 3, low dropoff/high kick, before transitioning to State 4 for the majority of their career. This appears to indicate that both of these athletes, who have the most pacing profiles in the data set, trend towards an average profile throughout their career.

As shown in Fig 7, Athlete C appears to begin their career in State 1, low dropoff/high kick, during U18s and U21s before beginning to shift towards State 4, average profile characteristics, throughout U23 and mostly maintaining State 4 throughout the Open age group. Notably, Athlete C is less consistent than Athletes A and B (Fig 6), with several races switching to State 3, low dropoff/low kick, and therefore their race profile appears to be inconsistent. Athlete D differs in that they start in State 4 before transitioning to State 1. However, there is a lack of data beyond this point, and whether this change persists in the long term is yet to be seen, as the athlete is still early in their Open career.

When comparing the trends of the two female athletes with those of the two male athletes, the state consistency appears to differ markedly: both female athletes consistently stay in State 4 for the majority of their Open careers, whereas both male athletes transition between states regularly, although they predominantly stay in State 4. It should be noted that both female athletes have considerably more data points in the Open age group, and they tended to change states throughout the development pathway age categories in a manner similar to that observed with the two male athletes. Therefore, it is possible that the male athletes will tend to be more consistent once they have a larger data set in the Open age group, as both female athletes did. Lastly, it is clear that all four athletes underwent transitions between different states, and these transitions often occurred throughout the development pathway, although both Athletes A and B were fairly consistent from the U23 age group onwards. Therefore, as athletes mature and their physical characteristics change, their pacing profiles change accordingly. This confirms that an athlete’s pacing profile can change throughout their career.
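The notion of state consistency used in this comparison is qualitative. One simple way it could be quantified, offered here only as an assumption and not as a measure used in the study, is the fraction of consecutive races in which the most likely state is unchanged. The state sequences and the function name `state_consistency` below are hypothetical.

```python
def state_consistency(states):
    """Fraction of consecutive race pairs in which the state is unchanged."""
    if len(states) < 2:
        raise ValueError("need at least two races")
    unchanged = sum(1 for a, b in zip(states, states[1:]) if a == b)
    return unchanged / (len(states) - 1)

# Hypothetical decoded state sequences (NOT taken from Figs 6 and 7).
steady_athlete = [4, 4, 4, 4, 3, 4, 4, 4, 4]
variable_athlete = [1, 1, 4, 3, 4, 1, 4, 4, 3]

steady_score = state_consistency(steady_athlete)
variable_score = state_consistency(variable_athlete)
print(steady_score, variable_score)
```

Computing such a score separately for each age group would allow the consistency of athletes with different numbers of Open races to be compared on a like-for-like basis.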

Future work

This model was able to categorise the different types of race strategies into four different states, described using four different pacing characteristics, which allowed the career trends of athletes to be analysed. However, there is an opportunity for further analysis to better understand how an athlete’s race profile changes throughout their career and to potentially forecast how it will change in the future. Firstly, a decision was made to normalise each race profile in order to remove the overall effect of environmental conditions. However, by doing this, the overall speed of an athlete, which is likely to change throughout an athlete’s career, can no longer be considered or evaluated. Therefore, a possible extension of the model is to include race time in order to evaluate whether ability has an effect on an athlete’s pacing strategy. Additionally, a key assumption of the modelling is that the environmental conditions are consistent throughout an entire race, which is unlikely to be true; without weather data for every race, this is a necessary limitation of the data set. A possible future study could analyse a large data set from one event with accompanying environmental mapping data that can be incorporated into the modelling to evaluate the effects of weather on each race profile.

Practical applications

In this manuscript, several example athletes were analysed using the hidden Markov model, and a number of key observations were made about how their pacing strategy changed throughout their careers. These include that pacing strategy is not fixed and will change across the course of a sprint kayak athlete’s career. Additionally, athletes were more prone to changes throughout the development pathway and appeared to find more consistency once reaching the Open age group. Understanding how an athlete’s pacing profile changes could provide many benefits to coaches and athletes, and the way in which an athlete’s strategy affects their performance can be determined. These conclusions could apply to a range of sprint kayak disciplines, multi-athlete disciplines and para events, as well as to other sports where pacing is a key component, such as rowing and swimming. Additionally, the findings identified four different clusters of pacing profiles for both the Women’s K1 500m and the Men’s K1 1000m, which could be used to help determine an athlete’s strategy consistency. The model also identified several key trends with respect to age group and event type, and knowledge of these will help provide development pathway coaches and athletes with expectations as to how their strategy could or should be expected to change in the future.

Conclusion

In conclusion, the goal of this manuscript was to identify methods for quantifying and comparing athlete pacing profiles. This was done using a combination of fPCA and an HMM, identifying four principal components for each data set. Each of these four PCs was associated with unique pacing characteristics, which were then used to categorise each pacing profile into four different states using an HMM. Using the four HMM states, an athlete’s pacing profile can be analysed throughout their career. This analysis concluded that an athlete’s pacing profile can change throughout their career, and this insight can provide many benefits to coaches and athletes, particularly to development coaches. The longitudinal case study shows the benefits of fPCA and HMMs in categorising athletes’ pacing profiles in sprint kayak.

Supporting information

S1 File

Appendix 1. AIC Values calculated for different number of states in HMM. Appendix 2. Standard Deviation for each state and principal component in the HMM.

(DOCX)

pone.0326375.s001.docx (31.9KB, docx)

S1 Fig. Appendix Figures.

Appendix 3. Principal Component distributions for both the Men’s and Women’s datasets. There is also an example fitdistrplus plot for Men’s PC4 validating that the distribution is Geometric. Appendix 4. Histogram of Sojourn times and fitdistrplus plot for both the Men’s and Women’s datasets. Appendix 5. Example histogram of residuals for State 1 in the Men’s HMM and an example QQ plot analysis for PC4.

(ZIP)

pone.0326375.s002.zip (243.8KB, zip)

Acknowledgments

This research was supported by the Centre for Data Science at QUT and Paddle Australia.

Data Availability

A de-identified data set has been uploaded to GitHub: https://github.com/harryestreich1/pacingprofilesanalysis.

Funding Statement

The author(s) received no specific funding for this work.

References

1. Abbiss CR, Laursen PB. Describing and understanding pacing strategies during athletic competition. Sports Med. 2008;38(3):239–52. doi: 10.2165/00007256-200838030-00004
2. Goreham JA, Miller KB, Frayne RJ, Ladouceur M. Pacing strategies and relationships between speed and stroke parameters for elite sprint kayakers in single boats. J Sports Sci. 2021;39(19):2211–8. doi: 10.1080/02640414.2021.1927314
3. Redwood-Brown AJ, Brown HL, Oakley B, Felton PJ. Determinants of Boat Velocity during a 200 m Race in Elite Paralympic Sprint Kayakers. International Journal of Performance Analysis in Sport. 2021;21(6):1178–90. doi: 10.1080/24748668.2021.1986351
4. Borges TO, Bullock N, Coutts JA. Pacing characteristics of international Sprint Kayak athletes. International Journal of Performance Analysis in Sport. 2013;13(2):353–64. doi: 10.1080/24748668.2013.11868653
5. Garland SW. An analysis of the pacing strategy adopted by elite competitors in 2000 m rowing. Br J Sports Med. 2005;39(1):39–42. doi: 10.1136/bjsm.2003.010801
6. McGibbon KE, Pyne DB, Heidenreich LE, Pla R. A Novel Method to Characterize the Pacing Profile of Elite Male 1500-m Freestyle Swimmers. Int J Sports Physiol Perform. 2021;16(6):818–24. doi: 10.1123/ijspp.2020-0375
7. Nikolaidis PT, Knechtle B. Effect of age and performance on pacing of marathon runners. Open Access J Sports Med. 2017;8:171–80. doi: 10.2147/OAJSM.S141649
8. Muehlbauer T, Melges T. Pacing patterns in competitive rowing adopted in different race categories. J Strength Cond Res. 2011;25(5):1293–8. doi: 10.1519/JSC.0b013e3181d6882b
9. Abdi H, Williams LJ. Principal component analysis. WIREs Computational Stats. 2010;2(4):433–59. doi: 10.1002/wics.101
10. Goreham JA, Landry SC, Kozey JW, Smith B, Ladouceur M. Using principal component analysis to investigate pacing strategies in elite international canoe kayak sprint races. Sports Biomechanics. 2020;1–16.
11. Leroy A, Marc A, Dupas O, Rey JL, Gey S. Functional Data Analysis in Sport Science: Example of Swimmers’ Progression Curves Clustering. Applied Sciences. 2018;8(10):1766. doi: 10.3390/app8101766
12. Wedding C, Woods CT, Sinclair WH, Gomez MA, Leicht AS. Examining the evolution and classification of player position using performance indicators in the National Rugby League during the 2015-2019 seasons. J Sci Med Sport. 2020;23(9):891–6. doi: 10.1016/j.jsams.2020.02.013
13. Ötting M, Langrock R, Maruotti A. A copula-based multivariate hidden Markov model for modelling momentum in football. AStA Advances in Statistical Analysis. 2021;1–19.
14. Dadashi F, Arami A, Crettenand F, Millet G, Komar J, Seifert L, et al. A hidden Markov model of the breaststroke swimming temporal phases using wearable inertial measurement units. 2013.
15. Wetzels R, Tutschkow D, Dolan C, van der Sluis S, Dutilh G, Wagenmakers E-J. A Bayesian test for the hot hand phenomenon. Journal of Mathematical Psychology. 2016;72:200–9. doi: 10.1016/j.jmp.2015.12.003
16. Jensen ST, McShane BB, Wyner AJ. Hierarchical Bayesian modeling of hitting performance in baseball. Bayesian Anal. 2009;4(4):631–52, 22. doi: 10.1214/09-ba424
17. Witowski V, Foraita R, Pitsiladis Y, Pigeot I, Wirsik N. Using hidden markov models to improve quantifying physical activity in accelerometer data - a simulation study. PLoS One. 2014;9(12):e114089. doi: 10.1371/journal.pone.0114089
18. R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing; 2022.
19. Ramsay JO, Graves S, Hooker G. fda: Functional Data Analysis. 2022.
20. Shang HL. A survey of functional principal component analysis. AStA Advances in Statistical Analysis. 2011;98.
21. Zucchini W, MacDonald IL. Hidden Markov models for time series: an introduction using R. Chapman and Hall/CRC; 2009.
22. Visser I, Speekenbrink M. depmixS4: An R Package for Hidden Markov Models. Journal of Statistical Software. 2010;36(7):1–21.
23. Delignette-Muller ML, Dutang C. Fitdistrplus: An R package for fitting distributions. J Stat Soft. 2015;64(4):1–34.
24. Vidaurre D. A new model for simultaneous dimensionality reduction and time-varying functional connectivity estimation. PLoS Comput Biol. 2021;17(4):e1008580. doi: 10.1371/journal.pcbi.1008580
25. Rosti A, Gales MJF. Factor Analysed Hidden Markov Models. 2004.
26. Yao K, Paliwal KK, Lee T-W. Generative factor analyzed HMM for automatic speech recognition. Speech Communication. 2005;45(4):435–54. doi: 10.1016/j.specom.2005.01.002
27. Field M, Stirling D, Pan Z, Naghdy F. Learning Trajectories for Robot Programing by Demonstration Using a Coordinated Mixture of Factor Analyzers. IEEE Trans Cybern. 2016;46(3):706–17. doi: 10.1109/TCYB.2015.2414277
28. Saraçoğlu R. Hidden Markov model-based classification of heart valve disease with PCA for dimension reduction. Engineering Applications of Artificial Intelligence. 2012;25(7):1523–8. doi: 10.1016/j.engappai.2012.07.005
29. Varma S, Shinde M, Chavan SS, editors. Analysis of PCA and LDA Features for Facial Expression Recognition Using SVM and HMM Classifiers. Techno-Societal 2018; 2020. Cham: Springer International Publishing; 2020.

PLoS One. doi: 10.1371/journal.pone.0326375.r001

Decision Letter 0

Matteo Vandoni

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Aug 02 2024 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org . When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.
A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.
An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols . Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols .

We look forward to receiving your revised manuscript.

Kind regards,

Matteo Vandoni

Academic Editor

PLOS ONE

Journal requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. Thank you for stating the following in your Competing Interests section:

[N/A].

Please complete your Competing Interests on the online submission form to state any Competing Interests. If you have no competing interests, please state "The authors have declared that no competing interests exist.", as detailed online in our guide for authors at http://journals.plos.org/plosone/s/submit-now

This information should be included in your cover letter; we will change the online submission form on your behalf.

3. In the online submission form, you indicated that [The data is owned by Paddle Australia. If the reviewers require access to the full-dataset as part of the peer review process, a de-identified data set can forwarded upon request.].

All PLOS journals now require all data underlying the findings described in their manuscript to be freely available to other researchers, either 1. In a public repository, 2. Within the manuscript itself, or 3. Uploaded as supplementary information.

This policy applies to all data except where public deposition would breach compliance with the protocol approved by your research ethics board. If your data cannot be made publicly available for ethical or legal reasons (e.g., public availability would compromise patient privacy), please explain your reasons on resubmission and your exemption request will be escalated for approval.

Additional Editor Comments:

Dear Authors,

The reviewers highlighted several points to improve. Please carefully review the manuscript that is not acceptable in the present form.

Kind regards

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

Reviewer #1: Partly

Reviewer #2: Partly

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: No

Reviewer #2: No

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

Reviewer #1: Yes

Reviewer #2: No

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

Reviewer #1: Yes

Reviewer #2: No

**********

Reviewer #1: The manuscript addresses an interesting topic. The data are original and the results could be used to further researches on the topic. The use of the hidden Markov models (HMMs) is in general sound, but the employed methods require a revision. Detailed comments follow.

1. The review of the literature is rather poor. With respect to the empirical analysis, HMMs have been widely used in sport-data analysis and several extensions of the basic model are provided. Similarly, it is well-known that the two-step analysis leads to misleading inference; thus, dimensionality reduction and clustering should be performed simultaneously. See e.g. ROSTI, A. V. I. and GALES, M. J. F. (2002). Factor analysed hidden Markov models. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing 949–952. YAO, K., PALIWAL, K. K. and LEE, T. W. (2005). Generative factor analyzed HMM for automatic speech recognition. Speech Commun. 45 435–454. FIELD, M., STIRLING, D., PAN, Z. and NAGHDY, F. (2016). Learning trajectories for robot programming by demonstration using a coordinated mixture of factor analyzers. IEEE Trans. Cybern. 46 706–717. A. Maruotti, J. Bulla, F. Lagona, M. Picone, F. Martella. "Dynamic mixtures of factor analyzers to characterize multivariate air pollutant exposures." Ann. Appl. Stat. 11 (3) 1617–1648.

2. The HMM model is not well defined. It is rather unclear how the linear predictor looks like and how the parameters were estimated. I guess the Gaussian distribution is considered, but no information about the variance is given. Moreover, no info about outliers are given. Model fitting and performance, residuals analysis, etc are not given. Indeed, neither the likelihood is specified. Overall, the manuscript lacks of formal definition of the modelling; thus, it cannot be accepted as the methods are not well introduced, described, etc. It is completely unclear if a multivariate model is considered or if PCs are analysed independently. Is the clustering obtained via local or global decoding?

3. The HMM assumes that the sojourn distribution is geometrically distributed. Please, provide evidence that this is plausible for the analysed data; extend the model to a flexible sojourn if the case.

4. I am wondering if 4 PCs are really needed and which differences arise if less or more PCs were considered.

5. Please, provide the code used to estimate the parameters, to ensure the reproducibility of the results, and more results of the software used.

Reviewer #2: This reviewer appreciates the time and effort invested by the authors in reporting their study. I will present below some suggestions for revising the text and some questions about the research.

Specific comments:

1) The abstract states the existence of four main components, but only two are defined. I suggest defining the remaining two.

2) In the introduction to the article (lines 40-42), it is described: "Predictive models of individual athlete pacing in competitive kayak races and their change over a career can be used by coaches and sports scientists to better understand athlete progression and optimise strategies for peak performance". HMMs are also known for their use in predictive models. In their work, could HMMs predict changes in an athlete's pace during a race or throughout their career?

3) Was any form of data selection or exclusion applied (for example, incomplete sequences)?

4) I suggest presenting averages and deviations of the unnormalized speed data. I also suggest presenting statistical power.

5) (Line 137) Check for discrepancies between figure descriptions and their mention in the text.

6) (line 174) The term "fPC" appears in this line, but it is not described or defined earlier in the text. I suggest providing a brief explanation or definition of the "fPC" acronym when it is first introduced.

7) I suggest that the particular pacing characteristics or interpretations described for PC1, PC2, PC3 and PC4 be inserted into the text, in a clear and concise way, as soon as they are obtained.

8) (Line 196) State the meaning of 'AIC' in full, as it was not mentioned or defined earlier in the text. Is the number of states chosen for the research more closely related to the AIC (Akaike Information Criterion) than to other factors?

9) (Lines 255-258) How can the values described in Table 1, centroids for each state, be explained as being equal for women (K1 500m) and men (K1 1000m)?

10) (Lines 255-258)Table 1 presents all states (1 to 4) related to PCs (1 to 4). However, in Figure 5, there are PCs not linked to all states. Is this correct?

11) There appears to be a distinction between “states” and “predicted states”. At times, they seem to be synonyms, while in other instances they are not. I request that you observe this and suggest standardizing if necessary.

12) The text mentions 4 states, but in Figure 6, the scale for Athlete A (Predicted State) varies from 3 to 7 (with data variation in 4 states), and the scale for Athlete B varies from possibly 1 to 7 (with data variation in 5 states). In Figure 7, the scales for Athletes B and C vary from 1 to 4. Is this correct? Please explain.

13) (Lines 349-353) Is paragraph "Athlete A begins their career in State 2, displaying signs of a low dropoff/high kick pacing profile transitioning to State 4, average profile characteristics, when they move from U21s to U23s. Alternatively, Athlete B starts with a few races in State 3, low dropoff/high kick before transitioning to State 4 for the majority of their career. This appears to indicate that both of these athletes, who have the most pacing profiles in the data set, trends towards an average profile throughout their career." related to Figure 6? Verify the paragraph if it is indeed related to the figure or clarify the origin of the state values mentioned.

14) (Lines 356-358) Check the sentence regarding Figure 6:"Noticeably, this athlete is less consistent than Athlete A and B with several races identified as State 3, low dropoff/low kick, and therefore there kick appears to be inconsistent".

15) (Lines 367-368) "Therefore, it is possible that the male athletes will tend to be more consistent once they have a large data set in the Open age group." Is this conclusion based solely on Athletes C and D? Athlete D does not have data in the Open age group (Figure 7).

16) (Line 441) Check the formatting of reference number 11.

**********

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy

Reviewer #1: No

Reviewer #2: No

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/ . PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org . Please note that Supporting Information files do not need this step.

PLoS One. 2025 Jul 2;20(7):e0326375. doi: 10.1371/journal.pone.0326375.r002

Author response to Decision Letter 1

2 Oct 2024

We have attached a file containing all responses to the reviewers. All editor requirements have been addressed and a competing interests statement has been added to the cover letter.

PLoS One. doi: 10.1371/journal.pone.0326375.r003

Decision Letter 1

Matteo Vandoni

Dear Dr. Estreich,

Please submit your revised manuscript by Jan 10 2025 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org . When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

* A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

* A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

* An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

We look forward to receiving your revised manuscript.

Kind regards,

Matteo Vandoni

Academic Editor

PLOS ONE

Journal Requirements:

Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.

Additional Editor Comments:

Dear authors,

As you can see, Reviewer 1 asked you to carefully revise some points. Please provide a point-by-point response addressing these observations.

Kind regards


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

Reviewer #1: (No Response)

Reviewer #2: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

Reviewer #1: Partly

Reviewer #2: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: No

Reviewer #2: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?


Reviewer #1: No

Reviewer #2: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

Reviewer #1: Yes

Reviewer #2: Yes

**********

Reviewer #1: Thank you very much for the efforts to reply to my comments.

Nevertheless, there are still several parts deserving clarifications and/or investigation.

1. As I mentioned before, the two-step analysis leads to misleading inference. Acknowledging this as a limitation of the study is not sufficient, as results may be unreliable if a joint approach is neglected. Moreover, it is unclear to me what the authors mean by "Our results have been shown to be robust in preliminary sensitivity analyses"; more details on this are required.

2. I appreciate that more details on the HMM specification have been added to the main text. Do the Gaussian conditional densities have state-specific variances? If so, did you encounter any degeneracy issues? Moreover, please provide evidence that the Gaussian distribution is suited to the data at hand; the idea of removing outliers is questionable.

3. Residual analysis, Q-Q plots, etc. should be shown to ensure that the model is suitable for the data at hand. The AIC and other model selection criteria are useful for selecting the number of clusters (and for comparing different model specifications), but not for guaranteeing that the model is adequate for the data at hand.

4. Lastly, one further point must be discussed and investigated via model comparison. The HMM implicitly assumes that the sojourn distribution is geometric. Please check that this assumption is met and, if it is not, relax it by assuming, e.g., shifted negative binomial or logarithmic sojourns. Please provide evidence that the geometric sojourn is chosen according to a model selection criterion such as the AIC.
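As background to comment 4: in a first-order HMM, a state with self-transition probability p is occupied for exactly d consecutive steps with probability p^(d-1)(1-p), so the expected dwell time is 1/(1-p). The Python sketch below is illustrative only and comes from neither the manuscript nor the review; the function names and the example state sequence are hypothetical. It computes this implied distribution and extracts observed run lengths from a decoded state sequence, which is one simple way the two might be compared.

```python
from itertools import groupby


def geometric_sojourn_pmf(p_stay, d):
    """P(dwell time = d) implied by a first-order HMM state
    with self-transition probability p_stay."""
    if d < 1:
        return 0.0
    return p_stay ** (d - 1) * (1.0 - p_stay)


def expected_sojourn(p_stay):
    """Mean dwell time of the geometric sojourn distribution."""
    return 1.0 / (1.0 - p_stay)


def run_lengths(states):
    """Group a decoded state sequence into consecutive runs and
    return {state: [run length, ...]} in order of appearance.
    Note: the final run is right-censored (the series ends before
    the state is left), so it understates the true dwell time."""
    runs = {}
    for state, group in groupby(states):
        runs.setdefault(state, []).append(sum(1 for _ in group))
    return runs


# Hypothetical decoded state sequence for one athlete (assumption).
decoded = [1, 1, 2, 2, 2, 1]
observed = run_lengths(decoded)

# Implied dwell-time probabilities for a hypothetical p_stay of 0.8.
implied = [geometric_sojourn_pmf(0.8, d) for d in range(1, 6)]
```

Comparing the empirical frequencies of `observed` run lengths with `implied` gives an informal check; a formal comparison would fit a hidden semi-Markov model with an alternative sojourn distribution and compare criteria such as the AIC, as the reviewer requests.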

Reviewer #2: I thank the authors for their responses to the queries raised. After careful analysis, I am pleased to confirm that all of the suggestions made in the previous review have been addressed. The changes implemented have improved the clarity and robustness of the work.

**********


If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy

Reviewer #1: No

Reviewer #2: No

**********

PLoS One. 2025 Jul 2;20(7):e0326375. doi: 10.1371/journal.pone.0326375.r004

Author response to Decision Letter 2

22 May 2025

I have attached a Response to Reviewers document that addresses all corrections to the paper.

Attachment

Submitted filename: Response to Reviewers.docx

pone.0326375.s005.docx (173.1KB, docx)

PLoS One. doi: 10.1371/journal.pone.0326375.r005

Decision Letter 2

Matteo Vandoni

An analysis of pacing profiles in sprint kayak racing using functional principal components and Hidden Markov Models

PONE-D-24-05983R2

Dear Dr. Estreich,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice will be generated when your article is formally accepted. Please note, if your institution has a publishing partnership with PLOS and your article meets the relevant criteria, all or part of your publication costs will be covered. Please make sure your user information is up-to-date by logging into Editorial Manager® and clicking the 'Update My Information' link at the top of the page. If you have any questions relating to publication charges, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Matteo Vandoni

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

PLoS One. doi: 10.1371/journal.pone.0326375.r006

Acceptance letter

Matteo Vandoni

PONE-D-24-05983R2

PLOS ONE

Dear Dr. Estreich,

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now being handed over to our production team.

At this stage, our production department will prepare your paper for publication. This includes ensuring the following:

* All references, tables, and figures are properly cited

* All relevant supporting information is included in the manuscript submission

* There are no issues that prevent the paper from being properly typeset

You will receive further instructions from the production team, including instructions on how to review your proof when it is ready. Please keep in mind that we are working through a large volume of accepted articles, so please give us a few days to review your paper and let you know the next and final steps.

Lastly, if your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

If we can help with anything else, please email us at customercare@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Matteo Vandoni

Academic Editor

PLOS ONE

Associated Data


Supplementary Materials

S1 File

Appendix 1. AIC Values calculated for different number of states in HMM. Appendix 2. Standard Deviation for each state and principal component in the HMM.

(DOCX)

pone.0326375.s001.docx (31.9KB, docx)

S1 Fig. Appendix Figures.

(ZIP)

pone.0326375.s002.zip (243.8KB, zip)

Attachment

Submitted filename: Response to Reviewers.docx

pone.0326375.s005.docx (173.1KB, docx)

Data Availability Statement

A de-identified data set has been uploaded to GitHub: https://github.com/harryestreich1/pacingprofilesanalysis.

1. Abbiss CR, Laursen PB. Describing and understanding pacing strategies during athletic competition. Sports Med. 2008;38(3):239–52. doi: 10.2165/00007256-200838030-00004

2. Goreham JA, Miller KB, Frayne RJ, Ladouceur M. Pacing strategies and relationships between speed and stroke parameters for elite sprint kayakers in single boats. J Sports Sci. 2021;39(19):2211–8. doi: 10.1080/02640414.2021.1927314

3. Redwood-Brown AJ, Brown HL, Oakley B, Felton PJ. Determinants of Boat Velocity during a 200 m Race in Elite Paralympic Sprint Kayakers. International Journal of Performance Analysis in Sport. 2021;21(6):1178–90. doi: 10.1080/24748668.2021.1986351

4. Borges TO, Bullock N, Coutts JA. Pacing characteristics of international Sprint Kayak athletes. International Journal of Performance Analysis in Sport. 2013;13(2):353–64. doi: 10.1080/24748668.2013.11868653

5. Garland SW. An analysis of the pacing strategy adopted by elite competitors in 2000 m rowing. Br J Sports Med. 2005;39(1):39–42. doi: 10.1136/bjsm.2003.010801

6. McGibbon KE, Pyne DB, Heidenreich LE, Pla R. A Novel Method to Characterize the Pacing Profile of Elite Male 1500-m Freestyle Swimmers. Int J Sports Physiol Perform. 2021;16(6):818–24. doi: 10.1123/ijspp.2020-0375

7. Nikolaidis PT, Knechtle B. Effect of age and performance on pacing of marathon runners. Open Access J Sports Med. 2017;8:171–80. doi: 10.2147/OAJSM.S141649

8. Muehlbauer T, Melges T. Pacing patterns in competitive rowing adopted in different race categories. J Strength Cond Res. 2011;25(5):1293–8. doi: 10.1519/JSC.0b013e3181d6882b

9. Abdi H, Williams LJ. Principal component analysis. WIREs Computational Stats. 2010;2(4):433–59. doi: 10.1002/wics.101

10. Goreham JA, Landry SC, Kozey JW, Smith B, Ladouceur M. Using principal component analysis to investigate pacing strategies in elite international canoe kayak sprint races. Sports Biomechanics. 2020;1–16.

11. Leroy A, Marc A, Dupas O, Rey JL, Gey S. Functional Data Analysis in Sport Science: Example of Swimmers’ Progression Curves Clustering. Applied Sciences. 2018;8(10):1766. doi: 10.3390/app8101766

12. Wedding C, Woods CT, Sinclair WH, Gomez MA, Leicht AS. Examining the evolution and classification of player position using performance indicators in the National Rugby League during the 2015-2019 seasons. J Sci Med Sport. 2020;23(9):891–6. doi: 10.1016/j.jsams.2020.02.013

13. Ötting M, Langrock R, Maruotti A. A copula-based multivariate hidden Markov model for modelling momentum in football. AStA Advances in Statistical Analysis. 2021;1–19.

14. Dadashi F, Arami A, Crettenand F, Millet G, Komar J, Seifert L, et al. A hidden Markov model of the breaststroke swimming temporal phases using wearable inertial measurement units. 2013.

15. Wetzels R, Tutschkow D, Dolan C, van der Sluis S, Dutilh G, Wagenmakers E-J. A Bayesian test for the hot hand phenomenon. Journal of Mathematical Psychology. 2016;72:200–9. doi: 10.1016/j.jmp.2015.12.003

16. Jensen ST, McShane BB, Wyner AJ. Hierarchical Bayesian modeling of hitting performance in baseball. Bayesian Anal. 2009;4(4):631–52, 22. doi: 10.1214/09-ba424

17. Witowski V, Foraita R, Pitsiladis Y, Pigeot I, Wirsik N. Using hidden markov models to improve quantifying physical activity in accelerometer data - a simulation study. PLoS One. 2014;9(12):e114089. doi: 10.1371/journal.pone.0114089

18. R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing; 2022.

19. Ramsay JO, Graves S, Hooker G. fda: Functional Data Analysis. 2022.

20. Shang HL. A survey of functional principal component analysis. AStA Advances in Statistical Analysis. 2011;98.

21. Zucchini W, MacDonald IL. Hidden Markov models for time series: an introduction using R. Chapman and Hall/CRC; 2009.

22. Visser I, Speekenbrink M. depmixS4: An R Package for Hidden Markov Models. Journal of Statistical Software. 2010;36(7):1–21.

23. Delignette-Muller ML, Dutang C. Fitdistrplus: An R package for fitting distributions. J Stat Soft. 2015;64(4):1–34.

24. Vidaurre D. A new model for simultaneous dimensionality reduction and time-varying functional connectivity estimation. PLoS Comput Biol. 2021;17(4):e1008580. doi: 10.1371/journal.pcbi.1008580

25. Rosti A, Gales MJF. Factor Analysed Hidden Markov Models. 2004.

26. Yao K, Paliwal KK, Lee T-W. Generative factor analyzed HMM for automatic speech recognition. Speech Communication. 2005;45(4):435–54. doi: 10.1016/j.specom.2005.01.002

27. Field M, Stirling D, Pan Z, Naghdy F. Learning Trajectories for Robot Programing by Demonstration Using a Coordinated Mixture of Factor Analyzers. IEEE Trans Cybern. 2016;46(3):706–17. doi: 10.1109/TCYB.2015.2414277

28. Saraçoğlu R. Hidden Markov model-based classification of heart valve disease with PCA for dimension reduction. Engineering Applications of Artificial Intelligence. 2012;25(7):1523–8. doi: 10.1016/j.engappai.2012.07.005

29. Varma S, Shinde M, Chavan SS, editors. Analysis of PCA and LDA Features for Facial Expression Recognition Using SVM and HMM Classifiers. Techno-Societal 2018; 2020. Cham: Springer International Publishing; 2020.
