Abstract
This study aims to predict individual Acceleration-Velocity profiles (A-V) from Global Navigation Satellite System (GNSS) measurements in real-world situations. Data were collected from professional players in the Superleague division during a 1.5 season period (2019–2021). A baseline modeling performance was provided by time-series forecasting methods and compared with two multivariate modeling approaches using ridge regularisation and long short term memory neural networks. The multivariate models considered commercial features and new features extracted from GNSS raw data as predictor variables. A control condition in which profiles were predicted from predictors of the same session outlined the predictability of A-V profiles. Multivariate models were fitted either per player or over the group of players. Predictor variables were pooled according to the mean or an exponential weighting function. As expected, the control condition provided lower error rates than other models on average (p = 0.001). Reference and multivariate models did not show significant differences in error rates (p = 0.124), regardless of the nature of predictors (commercial features or extracted from signal processing methods) or the pooling method used. In addition, models built over a larger population did not provide significantly more accurate predictions. In conclusion, GNSS features seemed to be of limited relevance for predicting individual A-V profiles. However, new signal processing features open up new perspectives in athletic performance or injury occurrence modeling, mainly if higher sampling rate tracking systems are considered.
Subject terms: Computer science, Mechanical engineering, Biomarkers
Introduction
Global Navigation Satellite System (GNSS) is one of the gold standard systems in position measurements in field sports. Widely used for athlete monitoring purposes1–11, GNSS permits discriminating the physical demand at exercise through objective mechanical parameters, computed from GNSS and Inertial Measurement Units (IMU) signals12. Data collected from these wearable devices provide useful insights for understanding a player’s activity and its relationship with performance outcomes or injury occurrences during practice6,13–16.
In most cases, the information provided by wearable GNSS devices consists of features summarised over a session or a period (e.g. distance covered at different speed intervals, averaged pace for a given interval, acceleration, and deceleration counts). Beyond a standard set of simple and easily comprehensible features, extra information of mechanical and energetic nature that is derived from the players’ position may be available for customers under the manufacturer’s policy17–19. However, the validity of GNSS sport receivers should be considered regarding their technical properties, such as sampling frequency. Besides a lack of accuracy for quantifying exercise demand over short distances covered at high speed, including sharp turns, GNSS suffers from error rates related to its relatively low sampling frequency20,21. The signal quality of receivers also depends on the spatial configuration of satellites locked for recording (i.e. the number of satellites and their geometrical distribution in the sky)22. Nonetheless, GNSS with embedded IMU remains of interest and open to further technological improvements. Beyond technological aspects, practitioners mostly use summarised GNSS statistics or metrics, whereas raw data are seldom considered for player analysis. The usual data fed back from the GNSS units might be elementary, while new features extracted from raw data could be more insightful for monitoring the physical demand of exercise and the related athlete’s response.
Using GNSS data for predicting athletic performance in team sports remains challenging. First, it implies defining an athletic performance in which interactions with opponents and the environment are sufficiently lowered. Usually, assessing athletic performances requires specific testing sessions performed in controlled conditions. It comes with challenging issues due to time and investigation costs, injury exposure, psychological state disruptions, and adjustments to training plans. Nevertheless, Morin et al.23 recently proposed a timely method for assessing a player’s athletic performance while practicing football without performing any specific tests. In brief, the method determines individual acceleration-velocity profiles (A-V) from continuous GNSS measurements for in-game and post hoc analysis. Such profiles come with practical meanings, notably for monitoring changes in athletic properties (by analogy to force-velocity profiles). They could be used for optimising training plans or proceeding to in-game tactical adjustments in case of significant profile impairments. However, determining in situ A-V profiles for monitoring athletic performances remains at a proof-of-concept stage23. It should be further validated for athletic performance modeling and injury explanation purposes.
On this basis and according to the literature, the present study considers three research issues:
- The predictability of A-V profiles using only related GNSS features
- The value of common metrics (summarised statistics) and aggregated features delivered by GNSS sensor manufacturers for predictive applications
- The use of raw GNSS data for extracting new features for prediction purposes
In order to investigate these issues, we attempted to predict A-V profiles using data from an elite football team through different modeling approaches. A baseline approach that only considers A-V profiles and dismisses any potential predictors other than historical profiles was carried out. Then, we compared it with two distinct tasks that used commercial GNSS features and features extracted from raw GNSS data.
The rest of the manuscript is organized as follows. We introduce the data set that highlights the predictor and outcome variables. Next, we introduce the proposed models and their variants. Accordingly, we present the obtained results followed by exhaustive discussions of these results before concluding our study in the last section.
Methods
This section introduces a descriptive analysis of the data set used in our experiments. We define the predictor and outcome variables and the problem formulations considered before detailing the proposed models. For clarity, pseudo-code of the modeling methodology is provided through Algorithms 1, 2 and 3.
Data set
Population studied
Data from the FC Lucerne football club were collected over a 1.5 season period (2019–2021). The team plays in the Superleague division, the highest division in Swiss professional football. A total of 196 training sessions and 74 games were stored in a cloud-hosted multi-model database (ArangoDB, CA, USA). For each session, raw GNSS data (Fieldwiz V1, CH, with concurrent reception of Global Positioning System, Galileo, GLONASS, and BeiDou systems) and summarised features (see Appendix A for details) of each player were stored in the database as JSON files. A total of 42 players were initially recorded, including regular professional players and young prospects. Participants were fully informed about data collection, and their written consent was obtained. The study was performed in agreement with the standards set by the Declaration of Helsinki (2013) involving human subjects. The protocol was reviewed and approved by the local research Ethics Committee (EuroMov, University of Montpellier, France). The present retrospective study relied on the collected data without causing any changes in the training plans of football players.
Predictor variables
Predictors are summarised in Appendix 1, Table A1. Let $\mathcal{X}$ with $\mathcal{X} \subseteq \mathbb{R}^{d}$ be the domain of definition of the random variable $X$. The variable X is thus defined as a vector of d dimensions, composed of aggregations of the summarised features given by the GNSS software (Fieldwiz, ASI, CH). Aggregated features can take two forms:
- The average of summarised features $x_i$ over the L pooled sessions, such that:
$$\bar{x} = \frac{1}{L} \sum_{i=1}^{L} x_i \qquad (1)$$
- An exponential weight according to a softmax function (see Eq. 2), such that:
$$\tilde{x} = \sum_{i=1}^{L} w_i \, x_i, \qquad w_i = \frac{\exp(-t_i / \tau)}{\sum_{j=1}^{L} \exp(-t_j / \tau)} \qquad (2)$$
In Eq. 2, $\tilde{x}$ denotes an aggregated feature weighted by a scaling factor $w$. $w$ is determined by a softmax function in which $t$ is a time vector describing the distance of events to the game of interest and $\tau$ denotes a scale parameter that sets the sensitivity of the exponential decay weighting function.
For both aggregation methods, we arbitrarily set a window L of size . It refers to the summarised predictor sets given by the GNSS software that are pooled according to the last L sessions (either training or game) preceding the game of interest. Since the frequencies of sessions are heterogeneous, the number of days preceding the game to be predicted may differ over the weeks.
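The two pooling schemes referred to as Eqs. 1 and 2 can be illustrated with a minimal sketch. It assumes that the softmax is applied to the negative distance-to-game divided by a scale parameter (named `tau` here), so that sessions closer to the game receive larger weights. The function names, the feature values, and the choice of days as the distance unit are hypothetical and are not taken from the study.

```python
import math

def mean_pool(sessions):
    """Average each feature over the sessions of the window (cf. Eq. 1)."""
    n = len(sessions)
    d = len(sessions[0])
    return [sum(s[k] for s in sessions) / n for k in range(d)]

def softmax_weights(distances, tau):
    """Softmax of -distance / tau (assumed sign): closer sessions weigh more."""
    scores = [-t / tau for t in distances]
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def exp_pool(sessions, distances, tau):
    """Softmax-weighted sum of each feature over the window (cf. Eq. 2)."""
    w = softmax_weights(distances, tau)
    d = len(sessions[0])
    return [sum(wi * s[k] for wi, s in zip(w, sessions)) for k in range(d)]

# Hypothetical window of three sessions with two summarised features each
sessions = [[5000.0, 12.0], [7000.0, 20.0], [6000.0, 16.0]]
distances = [5.0, 3.0, 1.0]  # assumed unit: days before the game of interest

print(mean_pool(sessions))
print(exp_pool(sessions, distances, tau=2.0))
```

With a very large `tau` the weights become nearly uniform and the exponential pooling approaches the simple mean, which is one way to read the role of the scale parameter.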
Outcome variables
In order to investigate the effect of training on athletic performances, we relied on A-V profiles as proposed by Morin et al.23, but computed in a slightly different way. Individual A-V profiles were modeled for each game. From the raw velocity V, recorded at a sampling frequency of 10 Hz, we derived the acceleration A as the discrete-time derivative of V, each signal being treated as the discrete formulation of the continuous signal x(t).
Then, a first-order Butterworth filter was applied to the acceleration signal with a cut-off frequency of 1 Hz. Velocity observations were grouped into fixed-width bins, and the maximal acceleration value of each bin was retained. We then modeled A-V profiles over velocities above 3 m s−1 using a linear regression between acceleration and velocity (see Fig. 1). A total of 1032 profiles were modeled. The number of profiles per player varied widely because occasional players (e.g. young players) only played a few games through the season.
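The processing chain described above (differentiation, low-pass filtering, binning, linear fit) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: it uses a backward first difference, a single forward pass of a low-pass filter (no zero-phase filtering), a hypothetical bin width of 0.2 m s−1, and pairs each retained acceleration with the velocity at which it occurred. All function names are hypothetical.

```python
import math

def differentiate(v, fs):
    """Backward first difference of velocity, scaled by the sampling frequency."""
    return [(v[i] - v[i - 1]) * fs for i in range(1, len(v))]

def butter1_lowpass(x, fc, fs):
    """First-order Butterworth low-pass (bilinear transform), forward pass only."""
    k = math.tan(math.pi * fc / fs)
    b = k / (1.0 + k)          # feed-forward coefficient (b0 = b1)
    a = (k - 1.0) / (k + 1.0)  # feedback coefficient (a1)
    y = [x[0]]                 # start at the first sample to limit the transient
    for n in range(1, len(x)):
        y.append(b * (x[n] + x[n - 1]) - a * y[n - 1])
    return y

def max_accel_per_bin(v, a, width, v_min):
    """Keep the (velocity, acceleration) pair with maximal acceleration per bin."""
    best = {}
    for vi, ai in zip(v, a):
        if vi <= v_min:
            continue  # only velocities above the threshold are used
        idx = int((vi - v_min) // width)
        if idx not in best or ai > best[idx][1]:
            best[idx] = (vi, ai)
    return [best[i] for i in sorted(best)]

def fit_line(points):
    """Ordinary least squares fit: acceleration = intercept + slope * velocity."""
    n = len(points)
    mv = sum(p[0] for p in points) / n
    ma = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mv) ** 2 for p in points)
    sxy = sum((p[0] - mv) * (p[1] - ma) for p in points)
    slope = sxy / sxx
    return slope, ma - slope * mv

def av_profile(v, fs=10.0, fc=1.0, width=0.2, v_min=3.0):
    """Return (slope, intercept) of the A-V profile; width is an assumed value."""
    acc = butter1_lowpass(differentiate(v, fs), fc, fs)
    points = max_accel_per_bin(v[1:], acc, width, v_min)
    return fit_line(points)

# Synthetic velocity ramp: constant acceleration of about 0.5 m/s^2 at 10 Hz
velocity = [3.0 + 0.05 * k for k in range(60)]
print(av_profile(velocity))
```

On real recordings the fitted slope is expected to be negative, since the maximal acceleration a player can produce decreases as running velocity increases.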
Figure 1.

Example of A-V profile modeled for a given player and a randomly selected game. Only plain dots (velocities above 3 m s−1) were used for fitting the linear regression.
The performance definition is given by $Y = (s, a_0)$ such that $s$ and $a_0$ refer to the corresponding slope and the intercept of individual A-V profiles, respectively. Therefore, each observation $y_{j,t}$ in the ensemble $Y$ is related to both an athlete j and the day of realisation t. A sample of fitted coefficients is presented in Figure 2. To formalize, letting $(X, Y)$ be a random pair with a density function f, the built data set is a sample $\{(x_i, y_i)\}_{i=1}^{n}$ drawn from f.
Figure 2.
Evolution of the fitted intercepts and slopes of A-V profiles over the 1.5 season period for three randomly selected players.
A descriptive analysis of A-V coefficients reported mean values and standard deviation as a noise estimate. Accordingly, we have and for and , respectively.
Definition of models and multivariate approaches
Time-series forecasting
Time-series forecasting problems are widespread in sports, with applications to game outcomes24–27 (generally intended for tipsters or bookmakers), sports popularity28, or performance monitoring29. In our study, we first considered that the auto-regressive component of the target variable may be influential in the prediction of individual athletic performances. Therefore, we started the modeling by defining baseline prediction performances from time-series forecasting, using only game observations (excluding training sessions).
In time-series forecasting, models without covariates use a restricted data set in which predictors are merely time information. The forecast is deduced from the information in trend and seasonality components. In order to find the best-performing models for time-series forecasting, we proceeded with a model selection using a simple holdout procedure, with a split ratio of 0.8 between the training and testing data sets. We can reasonably expect a linear relationship between changes in theoretical maximal acceleration and maximal running velocity. Consequently, the two A-V coefficients (slope and intercept) were predicted in two different ways: sequentially (uni-modal forecasting) and concurrently (multi-modal forecasting).
Afterwards, we combined the selected forecasting models into a weighted average ensemble, which is expected to perform better on average than a randomly selected single model.
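As an illustration of the holdout procedure and of a weighted average ensemble, the sketch below combines two deliberately simple forecasters. It assumes a chronological split with the first 80% of games used for training and weights inversely proportional to each model's MAPE, the error metric used in the study's statistical analysis. The weighting scheme actually used in the study is not specified in this section, and the series values and function names are hypothetical.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, expressed as a fraction."""
    return sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

def holdout_split(series, ratio=0.8):
    """Chronological holdout: the first `ratio` of the series is for training."""
    cut = int(len(series) * ratio)
    return series[:cut], series[cut:]

def inverse_error_weights(errors):
    """Weights proportional to 1 / error, normalised to sum to one (assumed)."""
    inv = [1.0 / e for e in errors]
    total = sum(inv)
    return [w / total for w in inv]

def ensemble_forecast(forecasts, weights):
    """Weighted average of the member forecasts at each forecast step."""
    steps = len(forecasts[0])
    return [sum(w * f[t] for w, f in zip(weights, forecasts)) for t in range(steps)]

# Hypothetical series of A-V intercepts over ten consecutive games
series = [6.0, 6.2, 5.9, 6.1, 6.3, 6.0, 5.8, 6.1, 6.2, 6.0]
train, test = holdout_split(series)

# Two naive members: last observed value and training mean
naive = [train[-1]] * len(test)
average = [sum(train) / len(train)] * len(test)

errors = [mape(test, naive), mape(test, average)]
weights = inverse_error_weights(errors)
combined = ensemble_forecast([naive, average], weights)
print(weights, mape(test, combined))
```

In practice the weights would be estimated on a separate validation split rather than on the test split, to avoid an optimistic error estimate; the single split is kept here only for brevity.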
Ridge regularisation
Then, we addressed the problem of predicting the acceleration-velocity profile from GNSS summarised features. Changes in A-V fitted parameters were investigated using a predictive regression approach. For this, a linear model with ridge regularisation used features pooled according to the two aforementioned aggregation methods (see Eqs. 1 and 2) and was compared to a Long Short-Term Memory (LSTM) neural network, a particular case of recurrent neural networks (RNN).
Using a ridge regularisation was motivated by the high-dimensional context, which may lead to unstable multivariate linear models that are excessively sensitive to an expanded space of solutions. Accordingly, ridge regularisation reduces the space of solutions while handling collinearity problems, which remain common in sports30–32. It thus prevents unstable estimates by penalising the coefficients of correlated features33,34. According to the two aggregation methods, the multivariate linear models $M_{\bar{X}}$ and $M_{\tilde{X}}$ take the general formulation
$$Y = \bar{X}\theta + \varepsilon \quad \text{and} \quad Y = \tilde{X}\theta + \varepsilon \qquad (3)$$
where $\bar{X}$ denotes the pooled predictors according to the mean function (see Eq. 1) and $\tilde{X}$ refers to the pooled predictors according to the exponential weighting function (see Eq. 2) for $M_{\bar{X}}$ and $M_{\tilde{X}}$, respectively. Also, $\theta$ denotes the parameters of the model and $\varepsilon$ the random error term.
In addition, we defined a control task in which we attempted to predict the A-V coefficients of a game from the commercial features of that same game. Using commercial features of the day of A-V profile realisation should be, in theory, the simplest regression task and provide the lowest error rates in prediction.
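For reference, the standard ridge estimator can be written as follows. The notation is an assumption made for illustration: $X$ stands for the matrix of pooled predictors, $Y$ for the A-V coefficient to predict, $\theta$ for the model parameters, and $\lambda > 0$ for the penalty strength, whose selection procedure is not described in this section.

$$\hat{\theta}_{\lambda} = \arg\min_{\theta} \; \lVert Y - X\theta \rVert_2^2 + \lambda \lVert \theta \rVert_2^2 = \left( X^{\top} X + \lambda I_d \right)^{-1} X^{\top} Y$$

Adding $\lambda I_d$ makes the matrix to invert well conditioned even when predictors are strongly correlated, which is why the penalty helps with collinearity, at the cost of shrinking the coefficients toward zero.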
Long short term memory neural network
Recurrent neural networks are a class of neural networks that use past information as inputs while preserving hidden states. Let us consider a multidimensional input of fixed length l and dimension d, which includes unpooled summarised features as the model’s input. In essence, from an $l \times d$ input matrix, a recurrent unit successively combines the current values $x_t$ of size d with the value predicted at time $t-1$ to return an output $h_t$, defined by a function $f$ (see Fig. 3a). This procedure is repeated as many times as the number of training sessions preceding a game in a multi-layered structure. However, RNNs suffer from short-term memory due to the vanishing gradient problem. Indeed, the gradient used for updating neural network weights shrinks as it back-propagates through time, which stops some layers from learning. These layers may thus cause a loss of past information, particularly with long sequences.
Figure 3.
Simplified diagram of (a) an RNN cell and (b) an LSTM cell.
Introduced by Hochreiter et al.35, LSTM neural networks are designed to conserve long-term information through extended internal mechanisms. LSTM architecture benefits from a cell state and various gates that regulate the flow of information. As shown in Fig. 3b the cell state maps the previous cell state to a new cell state in which all the relevant information is carried throughout the sequence and where gates add or remove information to/from it. More details about LSTM dynamics in handling recurrent sequences are available in the original reference35. In sports, the use of LSTM remains quite recent with applications among action and activity recognition36–40, game outcomes41 and sports related concussion42.
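The gating mechanism can be summarised by the commonly used LSTM formulation below. It includes a forget gate and may therefore differ from the original reference and from the exact architecture used in this study. The symbols are assumptions for illustration: $x_t$ is the input at step $t$, $h_t$ the hidden state, $c_t$ the cell state, $\sigma$ the logistic function, $\odot$ the element-wise product, and $W$, $U$, $b$ the learned weights and biases.

$$
\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) \\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
$$

Because the cell state $c_t$ is updated additively, gradients can flow through long sequences with less attenuation than in a plain recurrent unit.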
Multivariate modeling approaches
Using commercial features
In order to investigate the effect of training sessions on changes in athletic performances, a multivariate analysis that includes data from training sessions is required.
We aimed at predicting the A-V coefficients using the two sets of aggregated predictors, pooled by the mean (Eq. 1) and by the exponential weighting function (Eq. 2), from the original features displayed in Appendix A, Table A1. Since models rely on several predictors, we refer to this as the multivariate modeling approach.
For multivariate models, we performed a feature selection based on F-statistics and p-values converted from the cross-correlation between each feature of interest and the target through univariate sequential linear regression tests. Accordingly, we held the ten most meaningful features for making further predictions.
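A univariate F-test ranking of this kind can be sketched as follows. The sketch assumes that each feature is scored by the F statistic of a simple linear regression on the target, obtained from the squared Pearson correlation as F = r² / (1 − r²) × (n − 2). Since all features share the same degrees of freedom, ranking by F is equivalent to ranking by p-value. The feature names and values are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def f_statistic(x, y):
    """F statistic of the regression of y on x, with (1, n - 2) degrees of freedom.

    Not defined for a perfect correlation (r^2 = 1), which is not handled here.
    """
    r2 = pearson_r(x, y) ** 2
    return r2 / (1.0 - r2) * (len(x) - 2)

def select_top_k(features, target, k=10):
    """Return the names of the k features with the largest F statistic."""
    scores = {name: f_statistic(col, target) for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Hypothetical pooled features and target (e.g. A-V intercepts of five games)
target = [1.0, 2.0, 3.0, 4.0, 5.0]
features = {
    "high_intensity_accel_count": [1.1, 1.9, 3.2, 3.9, 5.1],
    "unrelated_feature": [2.0, 1.0, 2.0, 1.0, 2.0],
}
print(select_top_k(features, target, k=1))
```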
Extracting new features from raw global navigation satellite system data
In the previous formulation, the player position was recorded by GNSS at a sampling frequency of 10 Hz. Timestamp, player position (i.e. latitude, longitude), and velocity were available. Since commercial features were computed from the raw velocity vector V and its derivative A, we proposed to extract new features directly from V.
First, we consider $V$ as a stationary time series $(V_t)_{t}$. Formally, a time series is stationary if the law of a generated vector is time translation invariant. That is, we consider a law $\mathcal{L}$ such that $\mathcal{L}(V_{t_1}, \dots, V_{t_k}) = \mathcal{L}(V_{t_1 + h}, \dots, V_{t_k + h})$, with $t_1, \dots, t_k$ being time values and $h \in \mathbb{R}$ a time shift43. The stationarity of the time series was checked using a Dickey-Fuller test44.
Several features were extracted from the time series in both time and frequency domains through Discrete Fourier Transform. For this purpose, we used the tsfresh Python module45. The feature extraction from both domains provided 779 categorized features46. A feature selection similar to that performed in the previous tasks allowed us to retain only the ten most relevant features, according to their significance level (F statistic and p value).
In summary, pseudo-code of the algorithms used in the methods is provided in Appendix 1, Section A.3.
Statistical analysis
In prediction, model performances were characterised by the mean absolute percentage error (%, MAPE) computed on test data sets. Repeated measures analysis of variance (ANOVA) and post-hoc analysis highlighted the significance of differences in MAPE distribution between models. Depending on whether a reference model was used for comparison, Tukey’s or Dunnett’s p value adjustment was applied. The marginal mean difference was reported along with 95 % confidence intervals. Partial $\eta^2$ (or $\eta^2$ for one-way ANOVA) values were reported as a measure of effect size in ANOVAs. The significance level was set at 0.05 and consistently reported within the analysis.
Results
Predicting A-V profiles from games: reference models
The first baseline prediction was described by error rates observed in the control task. Using a set of predictors to predict A-V coefficients of the same session using a ridge regularisation returned an average MAPE of 0.066% and 0.102% for intercept and slope, respectively.
As shown in Fig. 4, we observed likely different MAPE values between intercept and slope predictions of A-V profiles. For this reason, we considered linearly re-scaled coefficients due to range and variance differences (averaged range = 0.325 and range = 3.98; and for the slope and the intercept, respectively). Accordingly, a two-way repeated measure ANOVA showed a slight trend in favour of an easier prediction task on A-V intercept ().
Figure 4.

Distributions of MAPE regarding multi-modal and uni-modal ensemble forecasting models.
Average ensembles were built following a model selection of a large set of time-series forecasting models . In the uni-modal approach, the forecasting models which provided the lowest MAPE in prediction were Prophet47, Theta, FourTheta48, and Fast Fourier Transform based. As expected, the combination of these models into an averaged ensemble provided the best performances (see Fig. 5 for examples). In the multi-modal approach, the retained forecasting models were VARIMA49, RNN-LSTM, and auto-regressive encoder-decoder Transformer50. In this case, RNN-LSTM and the averaged ensemble provided the best performances for predicting A-V slopes and intercepts, respectively. However, multi-modal averaged ensembles provided only a slight trend for a greater accuracy and were not significantly more accurate than univariate ensembles models on average (). Overall synthesis of the selected forecasting models and their performances are presented in Table 1.
Figure 5.
Example A-V profiles slopes forecasting using the uni-modal averaged ensemble. (a) represents the best prediction, (b) is the median prediction. Note that the red line represents the prediction made on the testing data set.
Table 1.
Average MAPE for each selected model.
| Models | MAPE (slope) | MAPE (intercept) | Multi-modality (b) |
|---|---|---|---|
| Prophet | 0.134 | 0.095 | × |
| Theta | 0.150 (a) | 0.096 | × |
| FourTheta | 0.120 | 0.085 | × |
| FFT | 0.161 | 0.121 | × |
| Ensemble | 0.115 | 0.081 | × |
| VARIMA | 0.162 | 0.127 | ✓ |
| RNN-LSTM | 0.111 | 0.099 | ✓ |
| Transformers | 0.120 | 0.075 | ✓ |
| Ensemble | 0.113 | 0.072 | ✓ |
Significant values are in [bold].
(a) Additive seasonality.
(b) Multi-modal models required longer time-series. We limit the study of these models to time-series longer than 40 observations.
In comparison to the simplest scenario in which A-V profile coefficients are predicted from a set of commercial features from the day of A-V realisation (the control task), averaged ensemble forecasting models tended to be less accurate ().
Time series forecasting models are considered as a reference for further performance predictions and model comparisons.
A multivariate modeling using data from past training sessions and games
Analysis of re-scaled MAPE showed a lower error rate when predicting the intercept coefficients (). Post-hoc comparisons showed that multivariate LSTM and ridge regression that used data from past training and game sessions provided a higher error rate than the ridge regression of the control scenario. However, no significant differences in MAPE were observed between multivariate time-series forecasting models and regularised regression (LSTM and ridge regularisation). On average, individually fitted models did not provide lower prediction errors than those fitted on the group (p = 0.381). Except for univariate time-series forecasting models which only considered data from games and multivariate LSTM, there was no advantage of using the exponentially weighted aggregation (refer to Eq. 2 for details) over a simple aggregation according to the mean ().
No significant difference was reported between averaged intercept and slope predictions in models that used features extracted from raw data. Only a slight trend for a lower MAPE was attributed to intercept predictions (). In addition, individual and group computed LSTM provided similar performances in terms of accuracy (p = 0.775). That discarded any advantage of building models per player for predictions.
An overview of model performances showed that, on average, the control task (the prediction of A-V profiles from commercial features of the same game) provided a lower error rate than any modeling task using past data (). In addition, neither models that used commercial features nor models that considered new features extracted from raw data outperformed the time-series forecasting ensembles (p = 0.124, see Fig. 6). No significant differences in error rate distribution were found between the sources of features (p = 0.453).
Figure 6.

Distributions of models’ MAPE.
Discussion
This study compared two multivariate modeling approaches that use commercial features and features extracted from GNSS raw data to a time-series forecasting approach. We considered the latter as reference models that account for past events for predicting the A-V profiles of the next game. Beforehand, predictions made using predictors of the game of A-V profile realisation (i.e. control models) informed their predictability in ecological conditions.
Concerning the reference models, performing multi-modal forecasting might provide better results, but it also requires a larger sample size than uni-modal forecasting methods to estimate model parameters correctly. Accordingly, we filtered out players who played fewer than 40 games for computing multi-modal forecasting models. Only nine players (out of 42) were retained for prediction, whereas the uni-modal task included data from a larger population (19 players). Therefore, the sample size heterogeneity should be considered when interpreting the forecasting results since a larger sample size might reasonably provide different, if not better, forecasting performances.
A practical limitation of univariate forecasting models is that we only consider game data for prediction. Hence, interpretations drawn from each forecast are restricted to the effect of preceding games on the next game, and the contribution of training sessions preceding a performance remains unclear. Accordingly, technical and medical staff around players should exploit multivariate models for detecting key performance indicators (KPIs) of A-V changes, or any other outcome51.
Whether based on commercial summarised statistics (i.e. returned by the manufacturer, see Appendix 1, Table A1) or on features extracted from the velocity vector, models likely predicted the intercepts of A-V profiles more easily than the slopes. A greater variance allocated to this parameter may reasonably explain that finding, easing the estimation of the coefficient relative to the random error. In practice, a small change in the A-V profile may result in a substantial modification of the theoretical normalised force output (i.e. the acceleration) at the onset of maximal locomotor activities.
When comparing multivariate to reference forecasting models (uni-modal and multi-modal), no significant differences in error rates suggest that features describing past sessions were not informative enough to improve predictions. Accordingly, time stands as a significant predictor variable of subsequent events.
In addition, ridge regularisation used pooled features according to a simple aggregation by the mean or exponential smoothing. However, changing one pooling method to another did not lower prediction error rates. At first glance, that indicates either a limited relevance of the explanatory features used in the model or a lack of A-V profile predictability. However, the low error rates of predictions provided by control models allow us to support a reasonable A-V profile predictability.
In a small sample context, using a larger population may lead to more robust estimates of parameter coefficients. One possible way could be to build models over a group of players instead of a model per player32. Our results did not confirm such benefits: with the current data, group-level models were not significantly more accurate than player-specific models.
An overall analysis and model comparison highlight that despite slight differences between top and bottom model ranking (see Table 2), no significant differences in prediction errors were reported. Accordingly, neither the source of features (commercial or extracted from time and frequency domain analysis), nor the pooling method, nor the model framework (time-series forecasting or multivariate regressions) led to significantly better prediction of A-V parameters. Once again, it questions the relevance of GNSS-based features for modeling physiological adaptations to training52 or their value for explaining outcomes under a substantial opponent influence. It is essential to point out the lack of information about the GNSS signal quality. As mentioned in the introduction, GNSS signal accuracy relies on time/frequency and spatial parameters. The receiver used in our study (Fieldwiz V1, CH) did not store any spatial accuracy factor such as horizontal dilution of precision. Therefore, we recommend that manufacturers report signal quality details for practical use and research purposes53,54. Nevertheless, using features not based on expert hypotheses but fully extracted from signal processing methods appeared to be as valuable as the commercial ones. It leverages information that could be drawn from GNSS data and opens the way to future works on data mining and knowledge discovery in the sports field. However, this perspective comes with feature interpretability issues, particularly those related to the frequency domain.
Table 2.
Summary of models performances according to intercept and slope coefficients.
| Model | Target | Population | Aggregation | Mean MAPE |
|---|---|---|---|---|
| Multi-modal Ensemble | Intercept | I | N/A | 0.076 |
| LSTM (raw) | Intercept | G | N/A | 0.077 |
| Ridge | Intercept | G | Exponential | 0.080 |
| Ridge | Intercept | G | Mean | 0.080 |
| Uni-modal Ensemble | Intercept | I | N/A | 0.080 |
| LSTM | Intercept | G | Exponential | 0.084 |
| LSTM | Intercept | G | Mean | 0.084 |
| Ridge | Intercept | I | Mean | 0.085 |
| Ridge | Intercept | I | Exponential | 0.085 |
| LSTM (raw) | Intercept | I | N/A | 0.088 |
| LSTM | Intercept | I | Mean | 0.089 |
| LSTM | Intercept | I | Exponential | 0.090 |
| LSTM | Slope | G | Mean | 0.114 |
| LSTM | Slope | G | Exponential | 0.114 |
| Uni-modal Ensemble | Slope | I | N/A | 0.115 |
| Multi-modal Ensemble | Slope | I | N/A | 0.116 |
| Ridge | Slope | G | Mean | 0.116 |
| Ridge | Slope | G | Exponential | 0.116 |
| LSTM (raw) | Slope | G | N/A | 0.119 |
| LSTM (raw) | Slope | I | N/A | 0.121 |
| LSTM | Slope | I | Mean | 0.126 |
| Ridge | Slope | I | Mean | 0.128 |
| LSTM | Slope | I | Exponential | 0.128 |
| Ridge | Slope | I | Exponential | 0.129 |
Mean MAPE (last column) represents the MAPE averaged over individuals and validation folds. The population represents either models computed over the group of players (G) or individually computed models (I).
Significant values are in [bold].
Features retained for regression after the feature selection procedure reveal KPIs of A-V profile changes. Based on a top ten representation (see Appendix 1, Table A2), we could state that the distance covered at high intensity is not necessarily the most valuable feature compared with other variables, such as the number of accelerations and decelerations for specific intensity bands. Such KPIs should help guide field and resistance training regarding individual objectives. However, since multivariate models lacked additional explanatory power relative to reference models (i.e. univariate forecasting models), interpretation of the selected features for practical application should be made with caution.
Finally, when considering re-scaled MAPE, prediction errors varied between 7% and 10%. We believe this is an acceptable accuracy since the A-V profile depends on unmeasured and uncontrolled factors, namely the opponent activity as well as psychological, environmental, or nutritional aspects. Therefore, GNSS wearable sensors could be of value, though limited, for prediction purposes and more generally be included in athlete monitoring systems while estimating external training loads9. Regardless, in light of the above limits, monitoring processes should be carried out under a data-informed rather than a data-driven approach, in which external training indicators are monitored along with internal markers and environmental factors55.
In our study, we provided a simple estimation of the A-V predictability through the control task, which benefits from the relationship between the commercial features and the modelled profiles of a given game. However, a deeper analysis of the A-V estimator noise and heteroscedasticity of the outcome variables should be carried out in a future study.
Beyond predictive applications, A-V profiles provide relevant insights regarding the theoretical maximal isometric force of hip extensors (i.e. through the profile intercept) and the capacity to produce a significant level of horizontal force at high velocities (i.e. according to the slope of A-V relationship). These mechanical factors may be key determinants of soft-tissue injury occurrence56, short sprint performances57 and could guide individual training prescription.
Technological advances provide systems with higher sampling frequencies than GNSS devices (e.g. IMU and motion capture systems), intended for discriminating exercise and its demand in ecological conditions. A physiological representation of the responses to exercise may therefore be extracted. Besides, going through raw data recorded by these systems may contribute to solving the enigma of injury occurrence, which remains a hot research area in sports science with major economic repercussions58,59.
Conclusion
In this study, we aimed at modeling the coefficients of individual A-V profiles. For this purpose, we first considered time-series forecasting models, which used data from games only, as the baseline of model performance. Then, multivariate modeling approaches were compared to these baseline models on a regression task using a regularised linear regression (ridge) and a neural network architecture (RNN-LSTM). Two distinct functions were employed to aggregate training session predictors: a mean and an exponential weighting function (defined in Eqs. 1 and 2). Finally, we extracted new signal processing features from the GNSS raw data and assessed their contribution to the modeling process. We recall that, except for time-series forecasting, models were fitted either per player or over the group of players. Overall, no method showed significantly better prediction performance than time-series forecasting. Global navigation satellite system features seemed to be of limited relevance for predicting individual in-situ A-V profiles. However, time-domain and frequency-domain features extracted from the raw data outlined the potential of signal processing methods for extracting new information. This opens up new perspectives in athletic performance or injury occurrence modeling, using IMU and movement tracking systems concurrently.
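The two pooling strategies and the ridge shrinkage recalled above can be illustrated with the Python sketch below. The exact forms of Eqs. 1 and 2 are not restated here, so the normalised exponential weights, the decay parameter `tau`, and all function names are assumptions; the ridge example is reduced to a single centred predictor to keep the closed form visible.

```python
import math


def mean_pool(values):
    """Plain average of a feature over the preceding training sessions."""
    return sum(values) / len(values)


def exp_weighted_pool(values, tau):
    """Weighted average in which recent sessions count more.

    Assumed form: w_i = exp(-(n - 1 - i) / tau), with the last item being
    the most recent session, and weights normalised to sum to one.
    """
    n = len(values)
    weights = [math.exp(-(n - 1 - i) / tau) for i in range(n)]
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)


def ridge_slope(x, y, lam):
    """One-predictor ridge estimate on centred data: Sxy / (Sxx + lam)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / (sxx + lam)


# Hypothetical feature values over three sessions, oldest first.
sessions = [100.0, 200.0, 300.0]
pooled_mean = mean_pool(sessions)
pooled_exp = exp_weighted_pool(sessions, tau=1.0)

# Hypothetical pooled predictor and target; the penalty shrinks the slope.
beta_ols = ridge_slope([1.0, 2.0, 3.0], [2.0, 4.0, 6.0], lam=0.0)
beta_ridge = ridge_slope([1.0, 2.0, 3.0], [2.0, 4.0, 6.0], lam=2.0)
```

With an increasing series such as the one above, the exponentially weighted value lies closer to the latest session than the plain mean does, and a larger penalty `lam` pulls the ridge coefficient towards zero.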
Key points
Global navigation satellite systems are valuable for modeling in-situ A-V profiles. However, the predictability of these profiles from GNSS-derived features of training sessions remains limited.
Multivariate modeling highlights key performance indicators of A-V changes among commercial, training-related features. Alternatively, signal processing methods pave the way to new perspectives in performance and injury modeling, mainly if applied to measurement systems with higher sampling rates (e.g. IMUs).
A-V time-derived features are likely as relevant as GNSS-based features for explaining changes in A-V profiles. This emphasizes the need for multidimensional modeling that considers the opponent's activity as well as psychological and environmental factors.
Acknowledgements
We are grateful to Christian Schmidt for collaboration and sharing data sets.
Author contributions
Conceptualisation, F.I., W.R., V.L., R.C. (Romain Chailan); methodology and investigation, F.I., W.R., V.L., R.C. (Romain Chailan); recruitment, S.P.; formal analysis and data curation, F.I., W.R., V.L., R.C. (Romain Chailan); resources, S.P.; writing (original draft preparation), F.I.; writing (review and editing), F.I., W.R., V.L., R.C. (Robin Candau), S.P.; visualisation, F.I.; supervision, W.R., S.P.; project administration, S.P., R.C. (Robin Candau); funding acquisition, F.I. All authors have read and agreed to the published version of the manuscript.
Funding
This research was funded by the Association Nationale de la Recherche et de la Technologie (ANRT) Grant number 2018/0653.
Data availability
The data sets generated and/or analysed during the current study are not publicly available because they are the property of FC Lucerne, but they are available from the corresponding author on reasonable request.
Competing interests
The authors declare no competing interests.
Footnotes
Publisher's note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
These authors contributed equally: Waleed Ragheb and Valentin Leveau.
Supplementary Information
The online version contains supplementary material available at 10.1038/s41598-022-19484-y.
References
- 1. Jennings D, Cormack S, Coutts AJ, Boyd L, Aughey RJ. The validity and reliability of gps units for measuring distance in team sport specific running patterns. Int. J. Sports Physiol. Perform. 2010;5:328–341. doi: 10.1123/ijspp.5.3.328.
- 2. Buchheit M, et al. Monitoring accelerations with gps in football: Time to slow down? Int. J. Sports Physiol. Perform. 2014;9:442–445. doi: 10.1123/ijspp.2013-0187.
- 3. Akenhead R, Nassis GP. Training load and player monitoring in high-level football: Current practice and perceptions. Int. J. Sports Physiol. Perform. 2016;11:587–593. doi: 10.1123/ijspp.2015-0331.
- 4. Bourdon PC, et al. Monitoring athlete training loads: Consensus statement. Int. J. Sports Physiol. Perform. 2017;12:S2–161. doi: 10.1123/ijspp.2016-0095.
- 5. Cardinale M, Varley MC. Wearable training-monitoring technology: Applications, challenges, and opportunities. Int. J. Sports Physiol. Perform. 2017;12:S2–55. doi: 10.1123/ijspp.2016-0095.
- 6. Malone JJ, Lovell R, Varley MC, Coutts AJ. Unpacking the black box: Applications and considerations for using gps devices in sport. Int. J. Sports Physiol. Perform. 2017;12:S2–18. doi: 10.1123/ijspp.2016-0090.
- 7. Coppalle S, et al. Relationship of pre-season training load with in-season biochemical markers, injuries and performance in professional soccer players. Front. Physiol. 2019;10:409. doi: 10.3389/fphys.2019.00409.
- 8. Kupperman N, Hertel J. Global positioning system-derived workload metrics and injury risk in team-based field sports: A systematic review. J. Athl. Train. 2020;55:931–943. doi: 10.4085/1062-6050-473-19.
- 9. Ravé, G., Granacher, U., Boullosa, D., Hackney, A. C. & Zouhal, H. How to use global positioning systems (gps) data to monitor training load in the “real world” of elite soccer. Front. Physiol. 11 (2020).
- 10. Ryan S, Kempton T, Coutts AJ. Data reduction approaches to athlete monitoring in professional australian football. Int. J. Sports Physiol. Perform. 2020;1:1–7. doi: 10.1123/ijspp.2020-0083.
- 11. Theodoropoulos, J. S., Bettle, J. & Kosy, J. D. The use of gps and inertial devices for player monitoring in team sports: A review of current and future applications. Orthop. Rev. 12 (2020).
- 12. Gómez-Carmona, C. D., Bastida-Castillo, A., Ibáñez, S. J. & Pino-Ortega, J. Accelerometry as a method for external workload monitoring in invasion team sports. a systematic review. PloS ONE 15, e0236643 (2020).
- 13. Rossi A, et al. Effective injury forecasting in soccer with gps training data and machine learning. PloS ONE. 2018;13:e0201264. doi: 10.1371/journal.pone.0201264.
- 14. Claudino JG, et al. Current approaches to the use of artificial intelligence for injury risk assessment and performance prediction in team sports: A systematic review. Sports Med. Open. 2019;5:1–12. doi: 10.1186/s40798-019-0202-3.
- 15. Maupin D, Schram B, Canetti E, Orr R. The relationship between acute: Chronic workload ratios and injury risk in sports: A systematic review. J. Sports Med. 2020;11:51. doi: 10.2147/OAJSM.S231405.
- 16. Vallance E, Sutton-Charani N, Imoussaten A, Montmain J, Perrey S. Combining internal-and external-training-loads to predict non-contact injuries in soccer. Appl. Sci. 2020;10:5261. doi: 10.3390/app10155261.
- 17. Osgnach C, Poser S, Bernardini R, Rinaldo R, Di Prampero PE. Energy cost and metabolic power in elite soccer: A new match analysis approach. Med. Sci. Sports Exerc. 2010;42:170–178. doi: 10.1249/MSS.0b013e3181ae5cfd.
- 18. Barrett S, Midgley A, Lovell R. PlayerloadTM: reliability, convergent validity, and influence of unit position during treadmill running. Int. J. Sports Physiol. Perform. 2014;9:945–952. doi: 10.1123/ijspp.2013-0418.
- 19. Di Prampero PE, Botter A, Osgnach C. The energy cost of sprint running and the role of metabolic power in setting top performances. Eur. J. Appl. Physiol. 2015;115:451–469. doi: 10.1007/s00421-014-3086-4.
- 20. Scott MT, Scott TJ, Kelly VG. The validity and reliability of global positioning systems in team sport: A brief review. J. Strength Cond. Res. 2016;30:1470–1490. doi: 10.1519/JSC.0000000000001221.
- 21. Crang, Z. L. et al. The validity and reliability of wearable microtechnology for intermittent team sports: A systematic review. Sports Med. 1–17 (2020).
- 22. Zhang Q, Chen Z, Rong F, Cui Y. Preliminary availability assessment of multi-gnss: A global scale analysis. IEEE Access. 2019;7:146813–146820. doi: 10.1109/ACCESS.2019.2946221.
- 23. Morin J-B, et al. Individual acceleration-speed profile in-situ: A proof of concept in professional football players. J. Biomech. 2021;123:110524. doi: 10.1016/j.jbiomech.2021.110524.
- 24. Forrest D, Simmons R. Forecasting sport: The behaviour and performance of football tipsters. Int. J. Forecast. 2000;16:317–331. doi: 10.1016/S0169-2070(00)00050-9.
- 25. Koopman SJ, Lit R. Forecasting football match results in national league competitions using score-driven time series models. Int. J. Forecast. 2019;35:797–809. doi: 10.1016/j.ijforecast.2018.10.011.
- 26. Do, H. D. et al. Time series forecasting with data transform and its application in sport. In RICE, 29–32 (2021).
- 27. Hsu Y-C. Using convolutional neural network and candlestick representation to predict sports match outcomes. Appl. Sci. 2021;11:6594. doi: 10.3390/app11146594.
- 28. Miller R, Schwarz H, Talke IS. Forecasting sports popularity: Application of time series analysis. Acad. J. Interdiscip. Stud. 2017;6:75. doi: 10.1515/ajis-2017-0009.
- 29. Sands WA, Kavanaugh AA, Murray SR, McNeal JR, Jemni M. Modern techniques and technologies applied to training and performance monitoring. Int. J. Sports Physiol. Perform. 2017;12:S2–63. doi: 10.1123/ijspp.2016-0405.
- 30. Macdonald, B. Adjusted plus-minus for nhl players using ridge regression with goals, shots, fenwick, and corsi. J. Quant. Anal. Sports 8 (2012).
- 31. Kostrzewa M, et al. Significant predictors of sports performance in elite men judo athletes based on multidimensional regression models. Int. J. Environ. Res. Public Health. 2020;17:8192. doi: 10.3390/ijerph17218192.
- 32. Imbach F, Perrey S, Chailan R, Meline T, Candau R. Training load responses modelling and model generalisation in elite sports. Sci. Rep. 2022;12:1–14. doi: 10.1038/s41598-022-05392-8.
- 33. Hoerl AE, Kennard RW. Ridge regression: Biased estimation for nonorthogonal problems. Technometrics. 1970;12:55–67. doi: 10.1080/00401706.1970.10488634.
- 34. Marquardt DW, Snee RD. Ridge regression in practice. Am. Stat. 1975;29:3–20.
- 35. Hochreiter S, Schmidhuber J. Long short-term memory. Neural Comput. 1997;9:1735–1780. doi: 10.1162/neco.1997.9.8.1735.
- 36. Tsunoda, T., Komori, Y., Matsugu, M. & Harada, T. Football action recognition using hierarchical lstm. In Proceedings of the IEEE conference on computer vision and pattern recognition workshops, 99–107 (2017).
- 37. Chen J, Samuel RDJ, Poovendran P. Lstm with bio inspired algorithm for action recognition in sports videos. Image Vis. Comput. 2021;112:104214. doi: 10.1016/j.imavis.2021.104214.
- 38. Guo J, Liu H, Li X, Xu D, Zhang Y. An attention enhanced spatial-temporal graph convolutional lstm network for action recognition in karate. Appl. Sci. 2021;11:8641. doi: 10.3390/app11188641.
- 39. Uddin MZ, Soylu A. Human activity recognition using wearable sensors, discriminant analysis, and long short-term memory-based neural structured learning. Sci. Rep. 2021;11:1–15. doi: 10.1038/s41598-020-79139-8.
- 40. Ullah M, et al. Attention-based lstm network for action recognition in sports. Electron. Imag. 2021;2021:302–1.
- 41. Zhang, Q. et al. Sports match prediction model for training and exercise using attention-based lstm network. Digit. Commun. Netw. (2021).
- 42. Thanjavur K, et al. Recurrent neural network-based acute concussion classifier using raw resting state eeg data. Sci. Rep. 2021;11:1–19. doi: 10.1038/s41598-021-91614-4.
- 43. Cox, D. R. & Miller, H. D. The Theory of Stochastic Processes (Routledge, 2017).
- 44. Fuller, W. A. Introduction to Statistical Time Series, vol. 428 (Wiley, 2009).
- 45. Christ M, Braun N, Neuffer J, Kempa-Liehr AW. Time series feature extraction on basis of scalable hypothesis tests (tsfresh-a python package). Neurocomputing. 2018;307:72–77. doi: 10.1016/j.neucom.2018.03.067.
- 46. Christ, M., Braun, N. & Neuffer, J. Overview on time series feature extraction (tsfresh–a python package).
- 47. Taylor SJ, Letham B. Forecasting at scale. Am. Stat. 2018;72:37–45. doi: 10.1080/00031305.2017.1380080.
- 48. Assimakopoulos V, Nikolopoulos K. The theta model: A decomposition approach to forecasting. Int. J. Forecast. 2000;16:521–530. doi: 10.1016/S0169-2070(00)00066-2.
- 49. Tiao GC, Box GE. Modeling multiple time series with applications. J. Am. Stat. Assoc. 1981;76:802–816.
- 50. Vaswani, A. et al. Attention is all you need. In Advances in neural information processing systems, 5998–6008 (2017).
- 51. Schelling X, Robertson S. A development framework for decision support systems in high-performance sport. Int. J. Comput. Sci. Sport. 2020;19:1–23. doi: 10.2478/ijcss-2020-0001.
- 52. Hader, K. et al. Monitoring the athlete match response: Can external load variables predict post-match acute and residual fatigue in soccer? a systematic review with meta-analysis. Sports Med. Open 5, 1–19 (2019).
- 53. Principe, V. A., Vale, R. G. d. S. & Nunes, R. d. A. M. A systematic review of load control in football using a global navigation satellite system (gnss). Motriz: Revista de Educação Física 26 (2020).
- 54. Rico-González M, Los Arcos A, Clemente FM, Rojas-Valverde D, Pino-Ortega J. Accuracy and reliability of local positioning systems for measuring sport movement patterns in stadium-scale: A systematic review. Appl. Sci. 2020;10:5994. doi: 10.3390/app10175994.
- 55. Montull L, Slapšinskaitė-Dackevičienė A, Kiely J, Hristovski R, Balagué N. Integrative proposals of sports monitoring: Subjective outperforms objective monitoring. Sports Med. Open. 2022;8:1–10. doi: 10.1186/s40798-021-00382-y.
- 56. Clark RA. Hamstring injuries: Risk assessment and injury prevention. Ann. Acad. Med. Singap. 2008;37:341.
- 57. Buchheit M, et al. Mechanical determinants of acceleration and maximal sprinting speed in highly trained young soccer players. J. Sports Sci. 2014;32:1906–1913. doi: 10.1080/02640414.2014.965191.
- 58. McMahon, B. Report estimates the cost of injuries to premier league players at $267m (2019).
- 59. Eliakim E, Morgulev E, Lidor R, Meckel Y. Estimation of injury costs: Financial damage of english premier league teams’ underachievement due to injuries. BMJ Open Sport Exerc. Med. 2020;6:e000675. doi: 10.1136/bmjsem-2019-000675.