Abstract
This study presents the application of regression shrinkage and artificial neural networks in predicting the results of 400-metres hurdles races. The regression models predict the results for suggested training loads in the selected three-month training period. The material of the research was based on training data of 21 Polish hurdlers from the Polish National Athletics Team Association. The athletes were characterized by a high level of performance. To assess the predictive ability of the constructed models a method of leave-one-out cross-validation was used. The analysis showed that the method generating the smallest prediction error was the LASSO regression extended by quadratic terms. The optimal model generated the prediction error of 0.59 s. Otherwise the optimal set of input variables (by reducing 8 of the 27 predictors) was defined. The results obtained justify the use of regression shrinkage in predicting sports outcomes. The resulting model can be used as a tool to assist the coach in planning training loads in a selected training period.
Keywords: Predicting in sport, 400-metres hurdles, Regression shrinkage, Neural modelling
INTRODUCTION
In times of ubiquitous information technology coaches and athletes are able to use advanced mathematical methods in modelling the training process. These methods certainly include advanced regression models and artificial neural networks (ANN) and they are applied in various aspects of optimization of sports training [1, 2, 3].
One of the most complex athletic competitions is the 400-metres hurdles. It combines the elements of all motor abilities [4]. Research on hurdles is mainly concerned with kinematics [5], physiology and biochemistry of performance [6, 7] and the impact of external factors (e.g., wind) on the results in the 400-metres hurdles [8]. The problem of training loads is a topic rarely discussed in the context of training hurdlers. In many sport events, the planning of the training process is conducted on the basis of the practical experience of the coach, and it lacks a scientific background. Therefore, it is very important to find and verify different ways of training plan optimization. Thanks to the advanced computational methods, the expected level of athlete’s development can be modelled and the optimal training load can be generated, so that the athlete can achieve the desired result [9, 10, 11]. Selection of a suitable training load taking into account the intensity and amount of work to be done should also include the individual capabilities of the athlete’s body, as well as its reaction to the applied load. If this principle is not followed, the body can be overloaded (overtrained), and the further development of the athlete’s abilities may be inhibited.
There is a lot of research describing the use of complex linear and non-linear models (including ANN) in sport training [12]. An example of a practical solution is the research by Maszczyk [13], which shows the linear regression models and ANN used for selecting future competitors in javelin throw. Many applications concern the planning and optimization of training loads [2, 9, 14]. In the paper by Przednowek and Wiktorowicz [10] the shrinkage model performed the tasks of predicting results in race walking. The rise in levels of achievement in sports leads to the creation of new opportunities for coaches and athletes through the modelling of sports training.
The aim of this study is to examine regression shrinkage models and artificial neural networks in predicting the results of 400-metres hurdles races in a three-month training period. This examination is based on training data of athletes whose level of sport abilities was very high.
MATERIALS AND METHODS
The material of the research was based on training data of 21 Polish hurdlers taking part in competitions between 1989 and 2011, who were distinguished by their high level of performance (score for 400-metres hurdles: 51.26±1.24 s). The competitors were members of the Polish National Team representing Poland at the Olympic Games, and World and European Championships in the age categories of juniors, colts and seniors. To build the models of predicting the result of 400-metres hurdles, 28 variables were used. The set of variables included one dependent variable and 27 independent variables. The dependent variable was the result of run (y 1). Independent variables included athlete’s parameters (x1 - x3), variables representing training periods (x4 - x5), training loads developing speed (x6 - x8), training loads developing endurance (x9 - x12), training loads developing run endurance (x13 - x14), training loads developing strength (x15 - x21) and training loads developing technique and rhythm (x22 - x27). Table 1 presents the variables under consideration and their basic statistics.
TABLE 1.
Characteristics of the variables used to construct the models.
| Variable | Description | xmin | xmax | sd | V | |
|---|---|---|---|---|---|---|
| y | Expected result in 500 m (s) | 65.2 | 60.9 | 71.2 | 2.1 | 3.2 |
| x1 | Age (years) | 22.3 | 19.0 | 27.0 | 2.0 | 8.8 |
| x2 | BMI | 21.7 | 19.7 | 24.1 | 1.0 | 4.7 |
| x3 | Current result in 500 m (s) | 66.4 | 61.5 | 72.1 | 2.0 | 3.0 |
| x4 | General preparation period* | - | - | - | - | - |
| x5 | Special preparation period* | - | - | - | - | - |
| x6 | Maximal speed (m) | 1395 | 0 | 4300 | 799 | 57.3 |
| x7 | Technical speed (m) | 1748 | 0 | 7550 | 1293 | 74.0 |
| x8 | Technical and speed exercises (m) | 1418 | 0 | 5100 | 840 | 59.2 |
| x9 | Speed endurance (m) | 4218 | 0 | 93670 | 7985 | 189.3 |
| x10 | Specific hurdle endurance (m) | 4229 | 0 | 13700 | 2304 | 54.5 |
| x11 | Pace runs (m) | 54599 | 0 | 211400 | 37070 | 67.9 |
| x12 | Aerobic endurance (m) | 121086 | 4800 | 442100 | 75661 | 62.5 |
| x13 | Strength endurance I (m) | 8690 | 0 | 31300 | 6806 | 78.3 |
| x14 | Strength endurance II (amount) | 1999.8 | 0 | 21350 | 2616 | 130.8 |
| x15 | General strength of lower limbs (kg) | 41353 | 0 | 216100 | 35566 | 86.0 |
| x16 | Directed strength of lower limbs (kg) | 19460 | 0 | 72600 | 12540 | 64.4 |
| x17 | Specific strength of lower limbs (kg) | 13887 | 0 | 156650 | 16096 | 115.9 |
| x18 | Trunk strength (amount) | 15480 | 0 | 200000 | 21921.4 | 141.6 |
| x19 | Upper body strength (kg) | 1102 | 0 | 24960 | 2121 | 192.5 |
| x20 | Explosive strength of lower limbs (amount) | 274.7 | 0 | 1203 | 190.4 | 69.3 |
| x21 | Explosive strength of upper limbs (amount) | 147.9 | 0 | 520 | 116.1 | 78.5 |
| x22 | Technical exercises – walking pace (min) | 141.6 | 0 | 420 | 109.7 | 77.5 |
| x23 | Technical exercises – running pace (min) | 172.9 | 0 | 920 | 135.7 | 78.4 |
| x24 | Runs over 1-3 hurdles (amount) | 31.9 | 0 | 148 | 30.3 | 95.0 |
| x25 | Runs over 4-7 hurdles (amount) | 56.5 | 0 | 188 | 51.6 | 91.3 |
| x26 | Runs over 8-12 hurdles (amount) | 50.5 | 0 | 232 | 52.7 | 104.3 |
| x27 | Hurdle runs in varied rhythm (amount) | 285.7 | 0 | 1020 | 208.7 | 73.0 |
– in accordance with the rule of introducing a qualitative variable of a “training period type” with the value of general preparation period, special preparation period and starting period was replaced with two variables, x4 and x5, holding the value of 1 or 0.
The collected data involved 144 training programmes. The registered training was used in one of the three periods during the annual cycle of training, lasting three months each (general preparation, special preparation and starting period). As the inputs, in addition to training loads, parameters and the current result of the competitor are used (Fig. 1). The system generates the predicted result for a 500 m run which will be obtained by a competitor after the completion of the proposed training. This allows the coach to observe the possible effects of the modification of individual values of training loads.
FIG. 1.
The model of predicting the result for 400-metres hurdles races.
Prediction of the result in terms of the period of training for the 400-metres hurdles requires a definition of the training indicator. This is due to the fact that the use of a track test for the 400-metres hurdles in each of the analysed periods of the annual cycle is technically impossible (e.g., in winter). Therefore the result of a 500 m flat run was used as the criterion over the selected periods [15, 16]. The correlation between the results obtained for 500 m and 400-metres hurdles races during the starting period is very strong (rxy = 0.84), demonstrating the statistic significance level α = 0.001 (Fig. 2). Therefore it confirms the validity of choosing the result for 500 m as a dependent variable in the construction of models.
FIG. 2.
Correlation between 500 m flat run and 400-metres hurdle races.
Methods of regression shrinkage
While the number of input variables is greater than the number of patterns, or the input variables are correlated, we say that the problem is ill-posed or ill-conditioned. There are many methods used to improve the conditioning of the problem, one of them being regression shrinkage [17], which includes ridge, LASSO and elastic net regression.
In ridge regression models [18] a parameter λ is selected, which determines the additional penalty associated with the regression coefficients. The larger the λ parameter, the greater is the penalty imposed on the weights. The solution is obtained solving:
In LASSO regression as well as in ridge regression a penalty is added to the quality criterion. The difference is that in ridge regression the penalty is the sum of squares, while in the LASSO model the penalty has the form of the sum of absolute values [19]. The solution is obtained solving
To implement the LASSO regression the LARS algorithm is used [20]. In this algorithm the penalty depends on the parameter s, whose value ranges from 0 to 1. Using these methods we can also select the optimal input set.
Elastic net regression combines the functionality of LASSO and ridge regression [21]. In this regression, there are two parameters (λ1, λ2) determining the penalties imposed on the model parameters. To solve the problem
the LARS-EN algorithm is used, which is a modified LARS algorithm. In this algorithm the penalty is decided by the parameter s (as in LASSO) and the parameter λ.
Extensions of the linear model
In order to obtain better prediction, a model consisting of the linear and non-linear part was calculated. The linear part is equal to the best linear regression, and the nonlinear part has the form of a quadratic function of selected predictors (x1, x2, x3 – parameters of the athlete). The extended model has the following form:
All regression models (along with modification and evaluation) were calculated in the R programming language using extra packages [22].
Neural models
In this study the Multilayer Perceptron (MLP) is used, which is the most common type of artificial neural networks [23]. It requires iterative learning that is sometimes time-consuming, but the resulting networks are small (in structure) and quick, and give satisfactory results. Learning of networks was implemented using the BFGS (Broyden-Fletcher-Goldfarb-Shanno) algorithm, and exponential (exp) and hyperbolic tangent (tanh) functions were used as the activation functions of hidden neurons. All the analysed networks have only one hidden layer. The Statistica 10 software was used for calculations [24].
Selection and validation of models
For selecting and evaluating calculated models the method of crossvalidation (CV) [25] was applied. In this method, the data set is divided into two subsets: learning and testing (validation). The first of them is used to build the model, and the second one to evaluate its prediction ability. There are a few types of cross-validation; in this paper leave-one-out cross-validation (LOOCV) was chosen. In LOOCV the test set is composed of a selected pair of data (xi, xi), and the number of tests is equal to the number of data n. During the cross-validation process two errors were calculated:
where: n – number of patterns, y–1 – the output value of the model built in the i-th step of cross-validation based on a data set containing no testing pair (x1, y1), – the output value of the model built in the i-th step based on the full data set, RMSECV – root mean square error of prediction, RMSET – root mean square error of training. In addition to the prediction error, which was the main criterion, the training error was calculated. This error describes how the model matches the data.
RESULTS
Regression shrinkage
The ordinary least squares (OLS) method generates the prediction error RMSECV = 0.72 s and training error RMSET = 0.57 s. The calculated model has the following form:
In order to obtain a smaller prediction error, the nonlinear terms, in the form of square functions, were added to the OLS regression. After the extension, cross-validation was conducted again. The extended model has the errors RMSECV = 0.63 and RMSE_T = 0.62. Thus it can be assumed that non-linearity can improve the predictive ability of the OLS model. The coefficients of the non-linear part are
The first model of regression shrinkage type is ridge regression. This model, is calculated to find a value of λ for which the model obtains the smallest prediction error. The ridge regression models were calculated for parameter λ in the range from 0 to 20 with steps of 0.1 (Fig. 3). Based on these results, it was found that the best ridge regression is achieved for λ = 3 with RMSECV = 0.71. The training error was RMSECV = 0.57. Similarly as in the case of OLS regression, all weights are non-zero, and thus all the input variables are involved in determining the output of the model.
FIG. 3.
Prediction and training errors for ridge regression; red line marks the best model. inner axis represents the errors
The obtained ridge regression model for predicting the result of 400-metres hurdles races has the form of:
As in the case of OLS regression, the obtained ridge regression model was also modified. The modified model has the prediction error RMSECV = 0.61 and training error RMSECV = 0.60. The weights of the non-linear part are
The next model from the regression shrinkage type is LASSO regression. The model was cross-valuated for s equal from 0 to 1 with s = 0.76 steps of 0.01 (Fig. 4). The best LASSO model was obtained for with the error RMSECV = 0.67. The training error is similar as in the case of ridge regression and is equal to RMSET = 0.58. Additionally, the LASSO method eliminated the predictors x2, x5, x8, x11, x15, x16, x23, x25 (coefficients equal 0), so the model has the form:
FIG. 4.
Prediction and training errors for LASSO regression; red line marks the best model.
The modification of the LASSO model was made involving only two variables (x1, x3). Parameter x2 was not included, because the weight of this parameter in the linear part equals 0. The modified LASSO model generated errors RMSECV = 0.59 and RMSET = 0.59. The coefficients of the non-linear part are equal to
The application of elastic net regression failed to bring any improvement in reducing the prediction error. The best elastic net model was obtained for the pair of parameters s = 0.76 and λ = 0. The results of cross-validation are shown in Figure 5. Due to the fact that the parameter λ is zero, the model is reduced to LASSO regression.
FIG. 5.
Prediction and training errors for elastic net regression.
Artificial neural networks
The neural network models were cross-validated for the number of neurons in the hidden layer changing from 1 to 10. Based on the results presented in Figure 6, the optimal structure of the model was chosen with one neuron in the hidden layer and an exponential function of activation. This network generates the prediction error of RMSECV= 0.72 and training error RMSET= 0.56.
FIG. 6.
Prediction and training errors for MLP with: (a) exp function, (b) tanh function; outer axis represents number of hidden neurons, inner axis represents the errors
For comparison of the models, RMSECV and RMSET errors gener-ated for each method were put together (Table 2).
TABLE 2.
Summary of results.
| Methods | RMSE CV[s] | RMSET[s] |
|---|---|---|
| OLS (ordinary least squares regression) | 0.72 | 0.57 |
| OLS with nonlinear part | 0.63 | 0.62 |
| Ridge regression (λ = 3) | 0.71 | 0.57 |
| Ridge regression with nonlinear part | 0.61 | 0.60 |
| LASSO regression (s = 0.76) | 0.67 | 0.58 |
| LASSO regression with nonlinear part | 0.59 | 0.59 |
| Elastic net regression (s = 0.76, λ = 0) | 0.67 | 0.58 |
| MLP (tanh) 26-1-1* | 0.73 | 0.56 |
| MLP (exp) 26-1-1* | 0.72 | 0.56 |
Note: * – network architecture (number of neurons in the following layers: input-hidden-output).
DISCUSSION
In this article the effectiveness of the use of regression shrinkage and artificial neural networks in predicting the outcome of competitors training for the 400-metres hurdles was verified. The best model validated using LOOCV was the LASSO extended by quadratic terms. The resulting model generates the prediction error RMSECV = 0.59, which confirms the validity of this method in the implementation of the task. Using this model in practice allows for the optimal selection of training loads, and thus supports the achievement of the desired result. The task of predicting outcomes is, from the coach’s point of view, very important in the process of sports training. Using the constructed model, a coach can predict how the training will affect the sports result. These models perform predictions based on the proposed training introduced as the sum of the training loads of each training means applied at a given training phase.
However, the results obtained by neural networks turned out to be disappointing. It is noted that the use of more than one neuron in the hidden layer causes a rapid increase in the prediction error (Fig. 6), in particular, for the exponential activation function (the values outside the range 0-10 s are cut off in order to better illustrate the dependence). Looking at the available studies, it can be stated that neural models are more often used in the implementation of the tasks of predicting sports results than linear regression [1, 2, 9, 12, 13]. In most of these works ANN are characterized by a smaller prediction error. When analyzing the results for the prediction of the outcome for the 400-metres hurdles in terms of the selected training period, it is noted that this situation is not reflected here.
As has been demonstrated in numerous studies the final result is influenced by different factors including coordination capacity, techniques of jumping hurdles and its specific rhythm [26, 27, 28, 29]. All these aspects are important, yet, the results obtained in this paper suggest that the development of a sports result is not directly dependent on the variables BMI and a variable representing the period of special preparation. The eliminated training means included: technical and speed exercises, pace runs, general strength of lower limbs, directed strength of lower limbs, technical exercises – running pace, and runs over 4-7 hurdles (Table 1). All these training means belong to the “target” groups [26]. The results of the analysis confirm the views of researchers of competitive sports, claiming that in highly qualified training these exercises should be limited and the focus should be placed on special training. It is a mistake, however, to draw conclusions leading to statements that the previously mentioned groups of training means should not be used.
Similar conclusions were drawn by Iskra [26] by examining the correlation between the sports level of 400-metres hurdles and the size of training loads. He showed that the previously mentioned training means are characterized by a lack or a very small dependence of the final result for the 400-metres hurdles at a selected stage of the annual cycle. An exception is the exercises strengthening lower limbs and directed strength of lower limbs, where the statistical significance of the correlation with the final result has been demonstrated, unlike in the present study.
In the literature [30, 31], we can find some other attempts to describe the correlation between the training means and the final result for the 400-metres or 110 m hurdles. However, in these analyses data came from a record season. Such an analysis can lead to erroneous conclusions because the sports level of the competitor is often shaped within a long period of time (macrocycle) [32].
In prediction using the resulting LASSO model the result is expected not only on the basis of the suggested training loads but also on the basis of the current result over 500 m, which is, however, shaped within a long macrocycle. Therefore, a small error and the idea of this solution lead to including this model among tools supporting planning training loads in a selected period of the annual cycle. In practice, this allows for prediction of the effects of completing the suggested training and if they are not satisfactory, it enables making necessary corrections before the start of the planned training.
Acknowledgement
This work has been supported by the Polish Ministry of Science and Higher Education within the research project “Development of Academic Sport” in the years 2016-2018, project No. N RSA4 00554.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of the article.
REFERENCES
- 1.Silva A, Costa A, Olivveira P, Reis V, Saavedra J, Perl J, Rouboa A, Marinho D. The use of neural network technology to model swimming performance. Sports Sci Med. 2007;6:117–125. [PMC free article] [PubMed] [Google Scholar]
- 2.Maszczyk A, Roczniok R, Waśkiewicz Z, Czuba M, Mikołajec K, Zając A, Stanula A. Application of regression and neural models to predict competitive swimming performance. Percept Mot Skills. 2012;114(2):610–626. doi: 10.2466/05.10.PMS.114.2.610-626. [DOI] [PubMed] [Google Scholar]
- 3.Roczniok R, Maszczyk A, Stanula A, Czuba M, Pietraszewski P, Kantyka J, Starzyński M. Physiological and physical profiles and on-ice performance approach to predict talent in male youth ice hockey players during draft to hockey team. Isokinet Exerc Sc. 2013;21(2):121–127. [Google Scholar]
- 4.McFarlane B. The science of hurdling and speed. 4th ed. Athletics Canada: Ottawa; 2000. [Google Scholar]
- 5.Chow JW. A panning videographic technique to obtains selected kinematic characteristics of the strides in sprint hurdling. J Appl Biomech. 1998;9:149–159. [Google Scholar]
- 6.Ward-Smith AJ. A mathematical analysis of the bioenergetics of hurdling. J Sport Sci. 1997;15:517–526. doi: 10.1080/026404197367155. [DOI] [PubMed] [Google Scholar]
- 7.Kłapcińska B, Iskra J, Poprzęcki S, Grzesiok K. The effect of sprint (300m) running on plasma lactate, uric acid, creatine kinase and lactate dehydrogenase in competitive hurdlers and untrained men. J Sports Med Phys Fitness. 2001;41:306–311. [PubMed] [Google Scholar]
- 8.Quinn MD. External effects in the 400-m hurdles race. J Appl Biomech. 2010;2:171–179. doi: 10.1123/jab.26.2.171. [DOI] [PubMed] [Google Scholar]
- 9.Ryguła I. Engineering in Medicine and Biology Society 2005. IEEE-EMBS 2005.27th Annual International Conference of the. Proceedings of the 2005 IEEE Engineering in Medicine and Biology 27th Annual Conference. 2005. Artifical Neural Networks As a Tool of Modeling of Training Loads; pp. 2985–2988. [DOI] [PubMed] [Google Scholar]
- 10.Przednowek K, Wiktorowicz K. Prediction of the result in race walking using regularized regression models. Journal of Theoretical and Applied Computer Science. 2013;7(2):45–58. [Google Scholar]
- 11.Przednowek K, Iskra J, Przednowek KH. Predictive Modeling in 400-Metres Hurdles Races. In: Cabri Jan, Correia Pedro Pezarat, Barreiros João., editors. International Congress on Sports Sciences Research and Technology Support. icSports 2014: Proceedings of the 2nd International Congress on Sports Sciences Research and Technology Support. Rome, Italy: 2014. pp. 137–144. [Google Scholar]
- 12.Pfeiffer M, Hohmann A. Application of neural networks in training science. Hum Mov Sci. 2013;31:344–359. doi: 10.1016/j.humov.2010.11.004. [DOI] [PubMed] [Google Scholar]
- 13.Maszczyk A, Zając A, Ryguła I. A neural Network model approach to athlete selection. Sports Eng. 2011;13:83–93. [Google Scholar]
- 14.Iskra J, Ryguła I. The optimization of training loads in high class hurdles. J Hum Kinet. 2001;6:59–72. [Google Scholar]
- 15.Alejo B. Weight training for the 400 hurdler. Track Technique. 1993;123:3915–3918. [Google Scholar]
- 16.Iskra J. W poszukiwaniu czynników decydujących o poziomie wyników w biegu na 400 m przez płotki. Sport Wyczynowy. 1996;(11–12):35–49. [in Polish] [Google Scholar]
- 17.Hastie T, Tibshiranie R, Friedman J. The Elements of Statistical Learning. 2th ed. New York: Springer Series in Statistics; 2009. [Google Scholar]
- 18.Hoerl AE, Kennard RW. Ridge regression: Biased estimation for nonorthogonal problems. Technometrics. 1970;12(1):55–67. [Google Scholar]
- 19.Tibshirani R. Regression Shrinkage and Selection via the Lasso. J R Stat Soc. 1996;58(1):267–288. [Google Scholar]
- 20.Efron B, Hastie T, Johnstone I, Tibshirani R. Least angle regression (with discussion) Ann Stat. 2004;32(2):407–499. [Google Scholar]
- 21.Zou H, Hastie T. Regularization and variable selection via the elastic net. J R Stat Soc. Series B (Statistical Methodology) 2005;67(2):301–320. [Google Scholar]
- 22.James G, Witten D, Hastie T, Tibshirani R. An Introduction to Statistical Learning: with Applications in R. New York: Springer Texts in Statistics; 2013. [Google Scholar]
- 23.Bishop C. Neural Networks for Pattern Recognition. Oxford: Oxford University Press; 1995. [Google Scholar]
- 24.StatSoft, Inc. STATISTICA (data analysis software system), version 10. 2011. [Google Scholar]
- 25.Arlot S, Celisse A. A survey of cross-validation procedures for model selection. Stat Surv. 2010;4:40–79. [Google Scholar]
- 26.Iskra J. Morphological and functional determinants of performance levels in hurdle races. Katowice: Academy of Physical Education; 2001. [Google Scholar]
- 27.Iskra J. Athlete typology and training strategy. New Studies in Athletics. 2012;27:9–26. [Google Scholar]
- 28.Balsalobre-Fernández C, TejeroGonzález C, Campo-Vecino J, Alonso-Curiel D. The Effects of a Maximal Power Training Cycle on the Strength, Maximum Power, Vertical Jump Height and Acceleration of High-Level 400-metres Hurdles. J Hum Kinet. 2013;36:119–126. doi: 10.2478/hukin-2013-0012. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 29.Iskra J, Čoh M. Biomechanical studies on running the 400 m hurdles. Hum Mov. 2011;12(4):315–323. [Google Scholar]
- 30.Iskra J. Characteristic of the training of European 400m hurdles champion. Modern Athlete and Coach. 1999;37(3):20–23. [Google Scholar]
- 31.Adamczyk J. Relation between result and size of training loads in 400-metre hurdle race of men. Education. Physical Training. Sport. 2009;75(4):5–9. [Google Scholar]
- 32.Bompa TO, Haff GG. Theory and Methodology of Training. 5th edition. Champaign. Illinois: Human Kinetics; 2009. Periodization. [Google Scholar]






