Abstract
Background
Time-to-event modelling is underutilised in sports injury research. Still, sports injury researchers have been encouraged to consider time-to-event analyses as a powerful alternative to other statistical methods. Therefore, it is important to shed light on statistical approaches suitable for analysing training load related key-questions within the sports injury domain.
Content
In the present article, we illuminate: (i) the possibilities of including time-varying outcomes in time-to-event analyses, (ii) how to deal with a situation where different types of sports injuries are included in the analyses (ie, competing risks), and (iii) how to deal with the situation where multiple subsequent injuries occur in the same athlete.
Conclusion
Time-to-event analyses can handle time-varying outcomes, competing risk and multiple subsequent injuries. Although powerful, time-to-event analysis has important requirements: researchers are encouraged to consider carefully, prior to any data collection, that at least five injuries per exposure state or transition are needed to avoid conducting statistical analyses on time-to-event data that lead to biased results. This requirement becomes particularly difficult to accommodate when a stratified analysis is required, as the number of variables increases exponentially for each additional stratum included. In future sports injury research, we need stratified analyses if the target of our research is to respond to the question: ‘how much change in training load is too much before injury is sustained, among athletes with different characteristics?’ Responding to this question using multiple time-varying exposures (and outcomes) requires many injuries. This should not be a barrier for future research, but collaborations across borders to collect the amount of data needed seem to be an important step forward.
Keywords: injury, statistics, training load
Introduction
Readers of sports injury science may have heard of survival analysis, which is traditionally used in settings where there is only a single type of mortality-related event (eg, literally ‘survival’ in medical research when dealing with diseases such as cancer). In the sports injury setting, the analytical concept is nearly the same with two differences: (i) the outcome is injury, not death and (ii) unlike death, athletes can sustain multiple sports injuries.
Time-to-event models, of which survival analysis is one type, are frequently used in other research settings. For instance, 57% of all 238 original articles published in the New England Journal of Medicine (NEJM) in 2015 included time-to-event methods.1 This ranks time-to-event methods as the second most frequently used statistical method after calculations of power and sample size.1 In the sports injury context, two authors (MLB and MM) performed an exploratory search and an independent blinded evaluation, similar to the one used in NEJM to identify original articles using time-to-event models published in the British Journal of Sports Medicine (BJSM) in 2017. This search found 66 published original articles of which only 8 (12%) used descriptive and/or comparative time-to-event statistics. Based on this finding, it can be concluded that time-to-event models are rarely used in sports injury research. In corroboration, a total of 103 sports injury-related articles which had used time-to-event models was found in an earlier systematic search between January 1993 and July 2013.2 Of those articles, 88% were published after 2005 perhaps reflecting the increased availability of time-to-event modelling options in common statistical packages and/or a growing recognition of their utility. The overall number of articles, however, remained relatively low given the volume of sports injury data published. Although the reasons for this are unknown, it could be due to either a lack of awareness about the potential utility of time-to-event models, or rather, the perceived difficulty around their application in practice. Alternatively, other approaches such as logistic regression analyses have been prioritised over time-to-event modelling given the traditional pedagogical roots and teachings associated with tertiary education programmes in the health sciences.
In the BJSM article entitled ‘Time-to-Event Analysis for Sports Injury Research Part 1: Time-Varying Exposures’, we argued that time-to-event modelling is well suited to deal with changes in training load as a time-varying exposure to sports injury. Clearly, this is important as it allows researchers to answer the question ‘how much change in training load is ‘too much’ before sports injury is sustained, among athletes with different characteristics?’ In addition, that paper argued that, in contrast to logistic regression analyses and the χ² test, time-to-event modelling enables us to consider censoring and the within-athlete correlation between each athlete’s follow-up data by using delayed entry functions. This makes time-to-event modelling more suitable for examining training load-related questions when compared with the analytical approaches that are frequently used in sports injury research to date. Owing to underutilisation of time-to-event modelling in sports injury research, it is necessary to further discuss the statistical approaches that are readily available and suitable for addressing key research questions relating to training load changes within the sports injury domain. Given the focus of part 1 on time-varying exposures, in part 2 we will give attention to the role of time-to-event modelling in relation to time-varying outcomes. As such, this article will elaborate on the following areas: (i) the possibilities of including time-varying outcomes in the time-to-event analysis; (ii) how to deal with a situation where different types of sports injuries are included (ie, competing risk); (iii) how to deal with the situation whereby multiple subsequent injuries occur in the same athlete and (iv) the theoretical assumptions and statistical requirements underpinning the analyses.
Consequently, the purpose of this paper is to discuss how the concepts of time-varying outcomes, competing risks and subsequent injuries can be used in time-to-event models to investigate sports injury aetiology. In addition, we present the often-overlooked assumptions and requirements relating to the events-per-variable (EPV) condition and number of injuries. In table 1, a brief overview of the key questions addressed in the manuscript as well as the key point associated with each question is provided.
Table 1.
Question 1
How to deal with time-varying outcomes? |
Key question 1: a researcher collected data on sports injury status (in statistical terms: states) over time in a group of athletes to investigate the aetiology of Achilles tendinopathy. In weekly self-assessments, the athletes classified their injury severity into no Achilles injury, moderate Achilles tendinopathy and severe Achilles tendinopathy. The next step for the researcher is to analyse the data. Here, the question remains: is time-to-event a suitable analytical approach to deal with a time-varying outcome?
Key point 1: Time-to-event models allow for the inclusion of time-varying outcomes using the concept of multistate transitions. To date, there is no universally accepted way to classify sports injury into different outcome states. Sports injury researchers have the opportunity to use certain injury definitions, and have a degree of flexibility to choose the cut-offs that separate each injury state. |
Question 2
How to deal with subsequent injuries? |
Key question 2: imagine a researcher having collected data on subsequent injuries (eg, athletes that sustained Achilles tendinopathy three times during the follow-up). The next step for the researcher is to analyse such data. Are there certain analytical approaches needed to deal with this type of data?
Key point 2: in time-to-event modelling, the researcher can consider subsequent injuries using the concept of shared frailty. This allows for correction for selection of ‘less-injury-prone’ athletes over time. |
Question 3
How to deal with competing risk? |
Key question 3: in your dataset, there are data on many different injury types (eg, Achilles tendinopathy, patella-femoral pain, iliotibial band syndrome, patellar tendinopathy). However, you may only be interested in studying Achilles tendinopathy. Should you just omit all other injuries (patella-femoral pain, iliotibial band syndrome, patellar tendinopathy) when analysing the data?
Key point 3: researchers should ‘stick to this world’ by including many injury types into the analysis using a competing risk setup. Excluding injuries of less interest is strongly discouraged as it will generate misleading results because the injury risk is overestimated. |
Question 4
How to deal with assumptions and requirements? |
Key question 4: you may speculate: What are the downsides of time-to-event modelling?
Key point 4: sports injury researchers need to calculate the event/variable ratio to avoid biased results. In addition, sports injury researchers should ensure there are at least five injuries in each exposure state to be analysed. Dealing with (multiple) time-varying exposures requires a considerable number of injuries to avoid violating the requirements underpinning the time-to-event analysis. Analysing data without consideration to number of injuries in each exposure state will easily lead to sparse data bias. |
Question 5
Are there any considerations when designing my study? |
Key question 5: I want to design a new study looking into the association between changes in training load and sports injury. What should I consider when I am designing my data collection?
Key point 5: researchers must consider: am I able to get the number of injuries needed in order to analyse changes in training load as a time-varying exposure to sports injury? How many injuries are likely to occur in each exposure state (or transition)? How many cut-offs to separate the exposure groups are suitable? |
Question 6
Are there any alternative methods? |
Key question 6: it is difficult to collect the amount of data needed to avoid violating the assumptions and requirements needed to perform a robust time-to-event analysis on a change in training load-related question. Accordingly, are there any alternative methods that could be considered?
Key point 6: the use of computational modelling could be considered as a complementary and alternative approach to time-to-event modelling in future sports injury research applications because no consideration to number of injuries is needed. However, unlike traditional statistical modelling, the assumptions underpinning computational models are often based on subject matter knowledge and other various forms of empirical evidence. If these are wrong, the results from the analyses will be questionable. |
Time-to-event and time-varying outcomes
Time-to-event modelling is able to handle outcomes that are dichotomised (eg, yes or no) or categorised (eg, no injury, moderate injury, severe injury).3 Historically, dichotomised outcomes have primarily been used given that death has been the primary clinical end point. Here, only two states are possible: dead or alive. As a corollary, in the sports science context, many researchers have also used dichotomised sports injury definitions to reflect either an injured or injury-free status.4–8 Over the past decade, the understanding of a sports injury being more than a dichotomised time-fixed variable has gained traction.9 Indeed, a sports injury can take on differing severities and can change status over time.10 11 Traditionally, sports injury data in a time-to-event setting has been analysed as time-to-first-injury. This approach forces the researcher to omit data from recovery periods, as well as data up to a potential second injury (and third, fourth, fifth injury). Depending on the research question, the use of data in this way can be questioned. Therefore, the need for statistical methods that are able to handle multiple injury occurrences has been highlighted.9 12–14 The idea of sustaining multiple injuries within a given period of epidemiological surveillance makes sports injury a time-varying outcome as the athlete is able to switch (in statistical terms: transition) between an injured and an injury-free state over time (or between categorised states, eg, none, moderate, severe). Here, we will discuss key questions: (i) how to deal with a time-varying outcome? (see time-to-event outcome question 1); (ii) how to deal with subsequent injuries? (see time-to-event outcome question 2) and (iii) how to deal with competing risks? (see time-to-event outcome question 3).
Time-to-event outcome question 1: how to deal with a time-varying outcome?
Key question 1: a researcher collected data on sports injury status (in statistical terms: states) over time in a group of athletes to investigate the aetiology of Achilles tendinopathy. In weekly self-assessments, the athletes classified their injury severity into no Achilles injury, moderate Achilles tendinopathy and severe Achilles tendinopathy. The next step for the researcher is to analyse the data. Is time-to-event a suitable analytical approach to deal with a time-varying outcome?
In part 1 of the time-to-event series dealing with time-varying exposures, the idea of states, and transitions between states, was presented using Gabbett’s acute:chronic workload ratio (ACWR)15 as an example. In a similar way, sports injury (eg, Achilles tendinopathy) can be handled as a time-varying outcome. Using an example with a weekly injury classification of three states, for example, no Achilles injury (state 1), moderate Achilles tendinopathy (state 2) and severe Achilles tendinopathy (state 3) allows each athlete to be in one of these states at a specific time-point during follow-up. Then, the athlete can transit (in nine different ways) between these states during follow-up. In figure 1, the concepts of states and transitions are visualised using the three-state and nine-transition setup. Naturally, this example was made for illustrative purposes. There are other approaches to classifying injury states; however, no evidence-based guidelines support one classification model as superior over another. Therefore, the decision to use certain classification cut-offs should be based on contemporary sports theory and expert subject matter knowledge that takes into consideration the target athlete population(s). Dealing with time-varying injury-related outcomes is a highly technical and challenging task. Here, we guide the reader to further information about the specifics of multistate transitions in statistical papers.14 15
If injury classification is extended beyond a dichotomised yes/no definition, standardised and/or consensus-based frameworks to systematically classify injuries into substates are needed. In a multistate framework for the analysis of subsequent injury in sport (M-FASIS), Shrier et al 10 presented a 19-state injury model classification scheme that took factors like the activity level and the treatment level into account. Although the challenges of having enough data to perform a robust statistical analysis were highlighted, the model represents an opportunity to include the concept of states (n=19) or transitions between states (n=19×19=361 transitions) using the concept of a time-varying outcome. Another example is the severity-score from the Oslo Sports Trauma Research Center (OSTRC) questionnaire, which is based on categorical data of 74 states between 0 and 100.11 16 Use of the OSTRC to analyse changes in injury severity implies 74 states and 5476 transitions between states. Unfortunately, an extremely large data set with many injuries would be required to analyse data in such a detailed manner. A reduction in the number of states seems necessary. To the best of our knowledge, the M-FASIS and the OSTRC are theoretical models and neither has been used in a study combining time-to-event with the concept of time-varying outcomes. Still, the concepts emphasise that athletes change their injury status over time and this can be dependent on anatomical location and/or diagnoses. For these concepts, time-to-event is an appropriate analytical approach.
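The state and transition counts quoted above follow from simple arithmetic, which the following Python sketch reproduces. The weekly sequence and the function names `n_transitions` and `count_transitions` are hypothetical and chosen for illustration only. Remaining in the same state from one week to the next is counted as a transition here, which is the assumption needed to match the three-state, nine-transition setup.

```python
from collections import Counter

def n_transitions(n_states):
    """Number of ordered state pairs, including remaining in the same state."""
    return n_states ** 2

def count_transitions(weekly_states):
    """Tally observed week-to-week transitions in one athlete's state sequence."""
    return Counter(zip(weekly_states, weekly_states[1:]))

# Hypothetical weekly self-assessments for one athlete:
# 1 = no Achilles injury, 2 = moderate, 3 = severe Achilles tendinopathy.
weeks = [1, 1, 2, 3, 2, 1, 1, 2]
observed = count_transitions(weeks)

# Three-state example, M-FASIS (19 states) and OSTRC severity score (74 states).
for k in (3, 19, 74):
    print(k, "states ->", n_transitions(k), "possible transitions")
print(observed)
```

Even in this short sequence, only five of the nine possible transitions are observed, which hints at why models with many states need very large datasets.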
Importantly, the concept of multistate transition is used for the same injury type. For instance, it can be used for addressing the development of Achilles tendinopathy (eg, no Achilles tendinopathy, moderate Achilles tendinopathy, severe Achilles tendinopathy). The next step is to acknowledge that: (i) athletes can sustain the same injury multiple times during follow-up (subsequent injuries) and (ii) athletes are able to sustain other injuries than just Achilles tendinopathy (competing risk).
Key point 1: time-to-event models allow for the inclusion of time-varying outcomes using the concept of multistate transitions. To date, there is no universally accepted way to classify sports injury into different outcome states. Sports injury researchers have the opportunity to use certain injury definitions, and have a degree of flexibility to choose the cut-offs that separate each injury state.
Time-to-event outcome question 2: how to deal with subsequent injuries?
Key question 2: a researcher has collected data on subsequent injuries (eg, athletes that sustained Achilles tendinopathy three times during the follow-up). The next step for him/her is to analyse such data. Are there certain analytical approaches needed to deal with this type of data?
Finch and Cook17 developed the subsequent injury classification (SIC) model to address multiple, subsequent and exacerbation of injuries.17 During an extended follow-up period, athletes can: (i) never sustain an injury; (ii) sustain one injury or (iii) sustain more than one injury.14 In the latter case, researchers face the analytical challenge of dealing with injuries that are statistically related as they occur in the same athlete.9 12 Most likely, the risk of sustaining subsequent injuries is influenced by previous occurrences,18 hence the correlation between within-athlete subsequent injuries needs to be taken into account when analysing data.17 19 As an example, Finch et al 19 took relationships between injuries into account and found a high level of subsequent (and multiple) injuries leading to missed games in an elite athlete group.
As considerable time and resources are expended to collect these high-quality longitudinal injury data, researchers have been encouraged to make use of all of the available data.12 Unfortunately, most data analyses from these studies do not adequately address repeated injury events on the same athlete, and therefore squander its potential.12 Possibly, a reason for the limited number of analyses including multiple injuries from the same person is the complexity behind the statistical analyses. Researchers need to grasp the concept of frailty, as frailty models have been recommended for studies involving modelling of subsequent sports injury data.9 The main reason for choosing frailty models originates from their ability to correct for selection of ‘less-injury-prone’ athletes over time.20 In a 2012 paper, the frailty model was applied to rugby-related injury data to identify risk factors for contact injuries.21
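The selection of ‘less-injury-prone’ athletes over time can be illustrated with a small simulation. The Python sketch below does not fit a frailty model; it only shows the selection effect that such models are meant to correct for. All numbers are assumptions made for illustration: each athlete is given an unobserved gamma-distributed frailty with mean 1, time to first injury is exponential with a rate proportional to that frailty, and the baseline rate and 10-week cut-off are arbitrary.

```python
import random

def simulate_frailty_selection(n_athletes=5000, base_rate=0.1, week=10, seed=1):
    """Compare mean frailty at baseline with mean frailty among athletes
    who are still injury-free at the given week."""
    rng = random.Random(seed)
    # Unobserved injury-proneness: gamma with shape 2 and scale 0.5 (mean 1).
    frailties = [rng.gammavariate(2.0, 0.5) for _ in range(n_athletes)]
    # Time to first injury: exponential with rate = base_rate * frailty.
    first_injury = [rng.expovariate(base_rate * z) for z in frailties]
    still_free = [z for z, t in zip(frailties, first_injury) if t > week]
    mean_all = sum(frailties) / n_athletes
    mean_free = sum(still_free) / len(still_free)
    return mean_all, mean_free

mean_all, mean_free = simulate_frailty_selection()
print(f"mean frailty at baseline: {mean_all:.2f}")
print(f"mean frailty among athletes injury-free at week 10: {mean_free:.2f}")
```

Under these assumptions, the athletes who remain injury-free are, on average, less injury-prone than the full group at baseline, so an analysis that ignores frailty compares increasingly selected groups as follow-up progresses.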
Now the reader may speculate: are the concepts of multistate transitions (question 1) and subsequent injury (question 2) similar? Importantly, multistate transitions and subsequent injury should be considered two different, but nevertheless related, concepts. In figure 1, the difference between the two concepts is visualised: subsequent injury refers to injury 1 and injury 2 (marked with red), whereas each state is marked with a blue circle, and arrows (representing the transitions) connect the states.
Key point 2: in time-to-event modelling, the researcher can consider subsequent injuries using the concept of shared frailty. This allows for correction for selection of ‘less-injury-prone’ athletes over time.
Time-to-event outcomes question 3: how to deal with competing risks?
Key question 3: in your dataset, there are data on many different injury types (eg, Achilles tendinopathy, patella-femoral pain, iliotibial band syndrome, patellar tendinopathy). However, you may only be interested in studying Achilles tendinopathy. Should you just omit all other injuries (patella-femoral pain, iliotibial band syndrome, patellar tendinopathy) when analysing the data?
In the initial presentation of the M-FASIS, OSTRC and SIC models, the possibility of dealing with multiple injuries was discussed, and the first steps were taken to promote the concept of competing risk.10 17 Standard time-to-event data include the time span from a specified time origin until the occurrence of one type of event (eg, Achilles tendinopathy).22 If there is a simultaneous risk of several types of injuries (eg, patella-femoral pain, iliotibial band syndrome, patellar tendinopathy), then these different types of injuries are considered competing, in the sense that a person can only sustain one of them at a given point in time. In the Achilles tendinopathy example, the competing risks were based on injury diagnoses. In a sports injury setting, competing risks can be assessed based on: (i) classification, for example, tendon-related injury, muscle injuries, joint-related injury and injuries unrelated to sport; (ii) anatomical location, for example, knee injury, foot injury, hip injury or (iii) diagnoses, for example, patella-femoral pain, patellar tendinopathy, Achilles tendinopathy.
To analyse competing risk data, a model describing each of these competing risks is needed. Multistate models generalise competing risk models by also describing transitions to intermediate events. Methods to analyse such models have been developed over the past three decades.23 24 Fortunately, most analyses can be performed within standard statistical software packages, but may require some extra effort with respect to data preparation and programming.22 An example from the sports injury literature that incorporates competing risks can be found elsewhere.7
The Kaplan-Meier plot, or survival curve, is recognised for its ability to visualise the probability of being injury-free (ie, surviving) or the probability of sustaining injuries (ie, not surviving) as a function of time (eg, days, weeks), training sessions or training duration.3 Importantly, Kaplan-Meier curves are no longer valid if models are extended to consider more than one type of outcome.9 23 As an example, in a competing risk setting, researchers may want to examine the probability of sustaining a tendon-related injury, with other types of injuries, such as muscle and joint-related injury and injuries occurring outside of running, being of limited interest. This presents two ways to handle data, neither of which is correct or recommended23 24:
(i) exclude all participants who sustain non-tendon-related injury;
(ii) include all participants who sustain non-tendon-related injury as right-censored observations at time of non-tendon-related injury (ie, ‘survived’ from a tendon injury).
In the first situation, substantial information about injury occurrences is ignored and it could appear (incorrectly) that only one type of injury, in this example, tendon injury, can occur. This is flawed since we already know that athletes are at risk of sustaining many types of injuries and that there is a statistical relationship between subsequent injuries.17–19
In the second situation, the Kaplan-Meier method computes the cumulative injury incidence proportion inclusive of the participants who are right-censored. However, this violates a principle that should be employed in time-to-event analyses: do not consider individuals as being at risk if they are injured, even if the injury is of a different type, as it can still result in them not being ‘exposed’ to risk. This will result in an overestimation of the cumulative incidence proportion as fewer athletes than appears are at risk at any one time. Consequently, statisticians have recommended researchers to include other types of injuries as competing risks and, somewhat provocatively, have also recommended researchers to ‘stick to this world’.23 In the sports science context, this essentially means that sports injury researchers who are willing to accept the assumption that injury-free athletes are only able to sustain one certain type of injury (eg, only Achilles tendinopathy) effectively ‘live in another world’.
In the competing risks setting, the Kaplan-Meier method should be replaced by the Aalen-Johansen estimator, which accounts for competing risks and avoids overestimating the cumulative incidence proportion. The difference between computing the cumulative incidence proportion using the Kaplan-Meier method (figure 2A) and the Aalen-Johansen estimator (figure 2B) can be substantial. Using the incorrect Kaplan-Meier method in the competing risk example in figure 2A, the proportion of athletes sustaining tendon injuries is 82.1% (95% CI 65.5% to 98.8%), muscle injuries is 55.6% (95% CI 38.7% to 73.0%), joint-related injuries is 61.6% (95% CI 35.7% to 88.0%) and injuries not related to running is 29.0% (95% CI 4.2% to 53.6%). By summing these proportions, the total proportion of athletes sustaining a first-time-injury reaches 228%. Clearly, this is impossible, since an athlete only sustains a first-time-injury once and the proportion of athletes sustaining first-time-injuries cannot possibly exceed 100%. Based on this, the proportions calculated using the Kaplan-Meier method are overestimated. Instead, the proportions reported in figure 2B, using the Aalen-Johansen estimator, are unbiased: the proportion of tendon injuries is 42.5% (95% CI 34.2% to 50.7%), muscle injuries is 21.6% (95% CI 14.7% to 28.4%), joint-related injuries is 25.2% (95% CI 17.9% to 32.3%) and injuries not related to running is 8.6% (95% CI 3.9% to 13.3%). Together, these proportions sum to 97.9% and do not exceed 100%. Therefore, it is strongly recommended that researchers dealing with competing risks use the Aalen-Johansen estimator as their preference.
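To make the contrast between the two estimators concrete, the Python sketch below computes both on a small hypothetical dataset. The ten follow-up times and injury codes are invented for illustration and are not the data behind figure 2; the function names are likewise assumptions. The naive approach treats injuries of other types as right-censored, whereas the Aalen-Johansen approach weights each injury by the probability of still being free of any injury.

```python
from collections import Counter

def _risk_table(times, causes):
    """Yield (n_at_risk, Counter of cause codes) per distinct time, ascending."""
    order = sorted(zip(times, causes))
    n, i = len(order), 0
    while i < n:
        j, counts = i, Counter()
        while j < n and order[j][0] == order[i][0]:
            counts[order[j][1]] += 1
            j += 1
        yield n - i, counts
        i = j

def naive_km_incidence(times, causes, cause):
    """1 - Kaplan-Meier, with other injury types treated as censored."""
    surv = 1.0
    for at_risk, counts in _risk_table(times, causes):
        surv *= 1.0 - counts[cause] / at_risk
    return 1.0 - surv

def aalen_johansen_incidence(times, causes, cause):
    """Cumulative incidence of one injury type, accounting for competing types."""
    injury_free, cif = 1.0, 0.0
    for at_risk, counts in _risk_table(times, causes):
        cif += injury_free * counts[cause] / at_risk
        any_injury = sum(c for k, c in counts.items() if k != 0)
        injury_free *= 1.0 - any_injury / at_risk
    return cif

# Hypothetical data: 0 = censored, 1 = tendon, 2 = muscle, 3 = joint injury.
times = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
causes = [1, 2, 1, 3, 1, 2, 0, 1, 3, 2]

naive = {k: naive_km_incidence(times, causes, k) for k in (1, 2, 3)}
aj = {k: aalen_johansen_incidence(times, causes, k) for k in (1, 2, 3)}
print("naive Kaplan-Meier total:", round(sum(naive.values()), 3))
print("Aalen-Johansen total:", round(sum(aj.values()), 3))
```

In this toy dataset, the naive Kaplan-Meier proportions add up to more than 100%, while the Aalen-Johansen proportions do not. In practice, established statistical software should be preferred over a hand-written estimator.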
Key point 3: researchers should ‘stick to this world’ by including many injury types into the analysis using a competing risk setup. Excluding injuries of less interest is strongly discouraged as it will lead to misleading results because the injury risk is overestimated.
Time-to-event models: requirements and considerations
Proportional hazards and right-censored observations are important assumptions to consider when evaluating the appropriateness of time-to-event models. Detailed descriptions of these assumptions have been presented elsewhere.3 Here, we will deal with other theoretical assumptions and statistical requirements underpinning time-to-event analysis. Unfortunately, these assumptions and requirements can be a party pooper for the sports injury researcher willing to analyse training load-related data.
Time-to-event outcome question 4A: how to deal with EPV
Key question 4: in the present article and in ‘Time-to-Event Analysis for Sports Injury Research Part 1: Time-Varying Exposures’, we have been enthusiastic about the potential that time-to-event modelling offers the sports injury researcher. However, in science there are always caveats and limitations. So, what are the downsides of time-to-event modelling?
One of the most important and perhaps lesser known requirements when undertaking statistical modelling of data is the EPV requirement.23 25 26 This is also known as the event/variable ratio,27 which can lead to bias if inappropriate.28 To be precise, as with any regression model, time-to-event modelling can be biased if the number of explanatory variables is large in relation to the number of injuries observed.25 28 29 In an analysis using cumulative risk difference as measure of association, the recommended number of EPVs was 10.25 With an EPV of 10, a three-state version of the ACWR (two states compared against a reference state) requires at least 20 injuries, whereas the nine transitions (eight compared against a reference transition) necessitate at least 80 injuries. Moreover, at least five injuries are required in each state/transition to avoid sparse data bias (see part B below).28 At first glance, 20–80 injuries can appear manageable for most sports injury datasets. However, for this work, we extracted the sample size from 35 studies examining training load and sports injury and identified only 11 studies with a sample size exceeding 150 participants (see table 1 in the accompanying article entitled ‘Time-to-Event Analysis for Sports Injury Research Part 1: Time-Varying Exposures’). In a 150-person study, more than half of the sample must sustain an injury to reach 80 injuries and reduce the risk of bias. Modern wearable technologies, such as sports watches, fitness trackers and internet-based electronic health platforms, offer unprecedented data collection possibilities and make it easier to record large amounts of data.30 When designing studies on changes in training load and injury development in the future, sports injury researchers are advised to consider EPV as a supplement to sample size or power calculations. The researcher could include more athletes into the study. Another (or supplementary) approach would be to extend the follow-up period to capture a greater number of injuries.
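The EPV arithmetic above can be sketched in a few lines of Python. The function names are hypothetical, and the sketch assumes that one state (or transition) serves as the reference, so that the number of variables is the number of categories minus one. This assumption reproduces the 20 and 80 injuries quoted above.

```python
def required_injuries(n_categories, epv=10):
    """Minimum number of injuries when each non-reference category
    counts as one explanatory variable."""
    n_variables = n_categories - 1
    return n_variables * epv

def events_per_variable(n_injuries, n_categories):
    """Observed EPV for a dataset with the given number of injuries."""
    return n_injuries / (n_categories - 1)

print(required_injuries(3))         # three ACWR states -> 20
print(required_injuries(9))         # nine ACWR transitions -> 80
print(events_per_variable(60, 9))   # 60 injuries over nine transitions -> 7.5
```

A study expecting 60 injuries would therefore fall short of an EPV of 10 for the nine-transition model, which could be judged before data collection begins.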
We note that EPV considerations do not account for other contributing factors to sparse data bias such as explanatory variables with narrow distributions or with categories that are very uncommon,28 31 nor do they consider the impact of the commonly used stepwise variable selection approach which requires even more EPV than do models with prespecified variables. A better diagnostic for sparse data bias is to repeat the analysis using mild shrinkage or penalisation methods: substantial changes warn of serious bias in the original estimates.28 31 32
Key point 4A: sports injury researchers need to calculate the event/variable ratio to avoid biased results.
Time-to-event outcome question 4B: how to deal with number of injuries in each exposure state?
In addition to the EPV requirements, all exposure states and/or transitions in the analysis must include at least five events to conduct a robust statistical analysis. In table 2, the cumulative incidence proportions for different states of two exposure variables (change in running distance and change in running intensity) are presented as an example of a result based on a flawed time-to-event analysis. Clearly, the cumulative injury incidence proportions of −7.6% and −18.9% are flawed as an injury incidence proportion can never reach a value below 0%. Consequently, sports injury researchers working with time-to-event analyses are encouraged to show the number of injuries in each exposure state to enable readers to assess the robustness of the models presented. If the number of injuries in a certain state is below five, analysts should carefully consider reclassifying their data based on other cut-offs or reducing the number of states used in the analysis.
Table 2.
Rows: biweekly change in running intensity (states). Columns: biweekly change in running distance (states).
| Intensity \ Distance | Reg>10% | Reg 10%–0% | Prog 0%–10% | Prog>10% |
| Reg>10% | 3.8% (5) | 1.7% (0) | −18.9% (0) | 13.9% (3) |
| Reg 10%–0% | 24.2% (16) | 6.8% (17) | 44.8% (8) | 12.3% (20) |
| Prog 0%–10% | 10.3% (13) | 16.6% (11) | 25.3% (10) | 22.3% (21) |
| Prog>10% | 18.0% (3) | 0.1% (0) | −7.6% (0) | 9.9% (4) |
In reality, cumulative injury incidence proportions range between 0% and 100%. However, some proportions in the example are negative because too few injuries in that state lead to biased estimates.
Number in parentheses represents number of injuries in each exposure state. Results based on a supplementary analysis of the RUNCLEVER dataset.40
Reg, regression; Prog, progression.
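The check recommended above, namely reporting the number of injuries in each exposure state and flagging states with fewer than five, can be sketched in Python using the injury counts given in parentheses in table 2. The variable names and the layout of the check are assumptions made for illustration.

```python
MIN_INJURIES = 5  # minimum number of injuries per exposure state

labels = ["Reg>10%", "Reg 10%-0%", "Prog 0%-10%", "Prog>10%"]

# Injury counts from table 2: rows are states of change in running intensity,
# columns are states of change in running distance.
injuries = [
    [5, 0, 0, 3],
    [16, 17, 8, 20],
    [13, 11, 10, 21],
    [3, 0, 0, 4],
]

# Collect every exposure state that falls below the minimum.
sparse = [
    (labels[i], labels[j], n)
    for i, row in enumerate(injuries)
    for j, n in enumerate(row)
    if n < MIN_INJURIES
]

for intensity, distance, n in sparse:
    print(f"intensity {intensity}, distance {distance}: only {n} injuries")
print(len(sparse), "of 16 exposure states have fewer than", MIN_INJURIES, "injuries")
```

All flagged states lie in the two extreme intensity rows, which suggests that merging those rows with their neighbours, or choosing other cut-offs, is worth considering before modelling.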
With these considerations in mind, time-to-event statistical modelling can offer a range of opportunities for researchers to include exposure variables, such as changes in training load (either as states or transitions), across the course of a study.
The concept of sparse data bias has implications for future research in sports injury aetiology. The requirement can only be fulfilled if sports injuries are reasonably evenly distributed across the states or transitions. As visualised in figure 3, this requirement becomes particularly difficult to accommodate when a stratified analysis is required, as the number of variables increases exponentially for each additional stratum included. Do we consider stratified analysis in a sports injury setting, you may ask? Certainly, stratified analyses are needed if the aim is to answer the question: ‘how much change in training load is too much before injury is sustained, among athletes with different characteristics?' Responding to this question using multiple time-varying exposures (and outcomes) requires many injuries. In figure 3, an example is provided that visualises why many injuries are needed. This should not be a barrier for future research, but collaborations across borders to collect the amount of data needed seem to be an important step forward.33
Key point 4B: sports injury researchers should ensure there are at least five injuries in each exposure state to be analysed. Dealing with (multiple) time-varying exposures requires a considerable number of injuries to avoid violating the requirements underpinning the time-to-event analysis. Analysing data without considering the number of injuries in each exposure state will easily lead to sparse data bias.
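To make the growth in required injuries concrete, the arithmetic can be sketched as follows. This is a rough lower bound under the assumption that every combination of levels forms one exposure state and that injuries are spread perfectly evenly across states, which is a best case. The strata (sex, age group) and the function name `minimum_injuries` are hypothetical and chosen for illustration only.

```python
from math import prod

MIN_INJURIES_PER_STATE = 5  # threshold stated in the text


def minimum_injuries(levels_per_variable, minimum=MIN_INJURIES_PER_STATE):
    """Lower bound on injuries needed when every combination of levels is one state.

    Assumes injuries are spread perfectly evenly across states (a best case).
    """
    return minimum * prod(levels_per_variable)


# Two exposures with four states each, as in table 2, then hypothetical strata.
scenarios = {
    "distance x intensity": [4, 4],
    "+ sex (2 levels, hypothetical)": [4, 4, 2],
    "+ age group (3 levels, hypothetical)": [4, 4, 2, 3],
}
for name, levels in scenarios.items():
    print(f"{name}: {prod(levels)} states, at least {minimum_injuries(levels)} injuries")
```

The bound is optimistic. Table 2 contains 131 injuries across 16 states, well above the lower bound of 80, yet several states still hold fewer than five injuries because the injuries are not evenly distributed.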
Time-to-event outcomes question 5: are there considerations when designing my study?
Key question 5: I want to design a new study looking into the association between changes in training load and sports injury. What should I consider when I am designing my data collection?
In the previous section, we presented important assumptions and requirements underpinning the analysis when dealing with the research question 'how much change in training load is too much before sports injury is sustained, among athletes with different characteristics?' We argued that careful attention to the EPV requirement and having at least five injuries in each exposure state is needed to avoid biased results. From experience, we have learnt that many researchers do not consider these requirements.
In most prospective sports injury studies, fewer than 1000 athletes or participants have been included (see table 1 in the accompanying article entitled ‘Time-to-Event Analysis for Sports Injury Research Part 1: Time-Varying Exposures’). Although the workload needed to logistically handle this number of participants is considerable and time-consuming, advanced data analyses involving multiple time-varying exposures and a time-varying injury outcome can necessitate tens of thousands (or possibly millions) of athletes to obtain a sufficient number of injuries to avoid sparse data bias.28 This issue has been noted in the wider epidemiological literature.34 35 Sports injury researchers cannot always conduct the most rigorous study and/or satisfy the theoretical and practical requirements that are necessary to undertake a robust statistical analysis. However, in situations when it is financially and logistically possible to design and conduct a high-quality, large-scale epidemiological cohort study, researchers should make a concerted effort to consider and implement the principles and directives discussed in this article. Moreover, to appropriately advance the science of sports injury control and prevention, sports injury researchers are expected to question the assumptions underpinning statistical analyses, to ask whether there are better ways of analysing data and to ask ‘the right’ questions, while equally challenging contemporary aetiological theories. In doing so, advanced statistical approaches such as time-to-event analyses, which are widely used in other disciplines and health science contexts, can be raised to the same level of application and scrutiny in sports injury research. Time-to-event analyses offer a range of opportunities regarding modelling approaches (Cox regression vs pseudo-observation method), measures of association and graphical presentations.
As these concepts have been presented elsewhere,3 an extensive description is not provided. However, the informed reader is provided with an overview of the opportunities in table 3.
Table 3.
Method | Description |
Cox regression | |
Measure of association | Hazard rate ratio. An injury rate (hazard rate) in each exposure group is estimated and the rates are compared on a relative scale (ratio). |
Graphical presentation | Individual or average survival curves. |
Main assumptions | Hazard rate ratio has to be constant (proportional hazard rates). The assumptions behind the Cox model can be validated using a log-minus-log plot. Do not condition on the future. |
Time-varying exposure | Inclusion of one or more time-varying exposures is possible. |
Time-varying outcome | Inclusion of a time-varying outcome is possible. |
Advantage | The difference between groups is calculated across all points of the time scale—hence, only one estimate needs to be presented. |
Events per variable | 10 |
Shortcomings | A hazard rate ratio cannot plausibly be interpreted as a relative risk when the injury incidence exceeds 10%, which is mostly the case in sports injury studies. A hazard rate ratio becomes meaningless if the assumption of proportionality is violated. |
Pseudo-observation method | |
Measures of association | An injury proportion (cumulative risk) in each exposure group is estimated and the proportions are compared on an additive scale (cumulative risk difference) or on a relative scale (cumulative relative risk). Alternatively, the area under the Kaplan-Meier curve (restricted mean) or under the Aalen-Johansen curve (number of years/session/time-spent sport lost) can be estimated and the difference can be compared across exposure groups. |
Graphical presentation | Kaplan-Meier graph (single event) or Aalen-Johansen graph (competing risk). |
Main assumptions | Observations are right-censored. Do not condition on the future. |
Time-varying exposure | Inclusion of one or more time-varying exposures is possible. |
Time-varying outcome | Inclusion of a time-varying outcome is possible. |
Advantages | Cumulative risk difference and cumulative relative risk are easier to interpret than a hazard rate ratio because the difference between groups is calculated at a single point on the time scale. |
Events per variable | 10 (risk difference) or 15 (relative risk). |
Shortcomings | Requires a priori selection (and justification) of one or more time points at which comparisons are made. |
Adapted with permission from Nielsen et al.3
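For readers unfamiliar with the pseudo-observation method listed in table 3, the core computation can be sketched as below. This is a minimal illustration under our own assumptions: it uses the Kaplan-Meier estimator for a single event type, the standard jackknife construction (n times the full-sample estimate minus n−1 times the leave-one-out estimate), and hypothetical follow-up data that are not taken from any study. The function names are ours.

```python
def km_cumulative_risk(times, events, t):
    """Kaplan-Meier estimate of the cumulative risk 1 - S(t).

    times:  follow-up time for each athlete
    events: 1 if the athlete was injured at that time, 0 if right-censored
    """
    survival = 1.0
    event_times = sorted({tm for tm, ev in zip(times, events) if ev == 1 and tm <= t})
    for u in event_times:
        at_risk = sum(1 for tm in times if tm >= u)
        injured = sum(1 for tm, ev in zip(times, events) if tm == u and ev == 1)
        survival *= 1.0 - injured / at_risk
    return 1.0 - survival


def pseudo_observations(times, events, t):
    """Jackknife pseudo-observations: n * estimate - (n - 1) * leave-one-out estimate."""
    n = len(times)
    full = km_cumulative_risk(times, events, t)
    pseudo = []
    for i in range(n):
        loo_times = times[:i] + times[i + 1:]
        loo_events = events[:i] + events[i + 1:]
        loo = km_cumulative_risk(loo_times, loo_events, t)
        pseudo.append(n * full - (n - 1) * loo)
    return pseudo


# Hypothetical follow-up data in weeks; the time point t=8 is chosen a priori.
times = [2, 3, 3, 5, 6, 8, 9, 12]
events = [1, 0, 1, 1, 0, 1, 0, 0]
risk_at_8 = km_cumulative_risk(times, events, t=8)
pseudo = pseudo_observations(times, events, t=8)
```

Each athlete receives one pseudo-observation per prespecified time point. These values can then be used as the outcome in a generalised linear model, typically with an identity link for the cumulative risk difference or a log link for the cumulative relative risk. Without censoring, each pseudo-observation reduces to the indicator of whether the athlete was injured by time t.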
Key point 5: researchers must consider: am I able to get the number of injuries needed in order to analyse changes in training load as a time-varying exposure to sports injury? How many injuries are likely to occur in each exposure state (or transition)? How many cut-offs to separate the exposure groups are suitable?
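A rough planning calculation along these lines can be sketched as follows. It is an expectation-based approximation, not a formal sample size method: it asks how many athletes are needed for the rarest exposure state to be expected to contain five injuries. The input values, and the function name `required_athletes`, are hypothetical and would need to be replaced by estimates from pilot data or previous studies.

```python
from fractions import Fraction
from math import ceil


def required_athletes(rarest_state_share, injury_proportion, minimum=5):
    """Athletes needed so the rarest exposure state is expected to hold `minimum` injuries.

    rarest_state_share: expected share of all injuries falling in the rarest state
    injury_proportion:  expected proportion of athletes injured during follow-up
    Both are passed as strings (or Fractions) so the arithmetic stays exact.
    """
    share = Fraction(rarest_state_share)
    proportion = Fraction(injury_proportion)
    return ceil(minimum / (share * proportion))


# Hypothetical planning scenarios with a 25% injury proportion.
print(required_athletes("1/16", "0.25"))   # injuries spread evenly over 16 states
print(required_athletes("0.02", "0.25"))   # rarest state holds 2% of injuries
print(required_athletes("0.001", "0.25"))  # rarest state holds 0.1% of injuries
```

Because the calculation is based on expected counts, the observed number of injuries in the rarest state may still fall below five, so the result is best read as a minimum for planning purposes.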
Time-to-event outcomes versus other methods
This article has introduced the concept of time-varying outcomes, including competing risk and subsequent events, in the context of time-to-event modelling. One condition of time-to-event analysis is that the outcome of interest must be expressed as a dichotomous or categorical variable as opposed to continuous data on a ratio-interval scale.3 Nowadays, most data on sports injury are non-continuous, irrespective of whether the outcome definition is time-loss-based, burden-based, medical-attention-based or based on severity. Consequently, time-to-event analyses are appropriate in most cases. However, if injury data are collected based on a continuous scale (eg, fluctuating symptoms of a pathology such as tendinopathy), other statistical methods are needed.
Time-to-event outcomes question 6: are there alternative methods?
Key question 6: it is difficult to collect the amount of data needed to avoid violating the assumptions and requirements needed to perform a robust time-to-event analysis on a change in training load-related question. Accordingly, are there any alternative methods that could be considered?
Complex systems and computational modelling have received more attention in the sports injury science literature recently.36 These methods are complementary to traditional statistical modelling and time-to-event analyses. In a small sample setting or in the absence of large-scale data, alternative computational systems science methods, including simulation-based techniques, could be considered alongside, or integrated with, traditional statistical approaches.36 For example, the use of agent-based modelling (ABM) has been recently promoted and discussed as a complementary method for sports injury research.37 Specifically, ABM is a form of computational science that involves modelling the behavioural dynamics of individual micro-entities known as ‘agents’. These agents can interact with one another, learn over time based on past experiences, update their internal ‘states’ autonomously and/or create global patterns of behaviour. In relation to both time-to-event modelling and sports injury aetiology, the clear advantage of ABM lies in its capability to model hundreds or thousands of athletes, each of whom can be assigned real-world demographic (eg, age), biological (eg, sex), lifestyle (eg, diet) and/or training-related (eg, primary workload exposure) characteristics.37
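To give a flavour of what an agent-based simulation involves, a deliberately simplified toy sketch is shown below. It is not the model described in the cited work: the agents do not interact or learn, and every rule and number (the range of weekly changes in distance, the 1% and 5% weekly injury probabilities, the 10% progression cut-off used as a trigger) is a hypothetical assumption made purely for illustration.

```python
import random
from dataclasses import dataclass


@dataclass
class Runner:
    """One agent. All attributes and rules here are hypothetical."""
    weekly_km: float
    injured: bool = False

    def step(self, rng):
        """Advance the agent by one week."""
        if self.injured:
            return
        change = rng.uniform(-0.2, 0.3)  # relative change in weekly distance
        self.weekly_km *= 1.0 + change
        # Assumed rule: 1% weekly injury probability, raised to 5% when the
        # progression exceeds 10%. These are not empirical estimates.
        probability = 0.05 if change > 0.10 else 0.01
        self.injured = rng.random() < probability


def simulate(n_agents=1000, n_weeks=24, seed=1):
    """Return the number of injured agents at the end of follow-up."""
    rng = random.Random(seed)
    runners = [Runner(weekly_km=rng.uniform(5, 30)) for _ in range(n_agents)]
    for _ in range(n_weeks):
        for runner in runners:
            runner.step(rng)
    return sum(r.injured for r in runners)


injured = simulate()
print(f"{injured} of 1000 simulated runners injured after 24 weeks")
```

The output of such a simulation reflects nothing more than the rules encoded in it, which is why the assumptions behind any computational model need to be stated explicitly.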
We have demonstrated in this paper that in order to conduct a robust statistical sports injury analysis and avoid sparse data bias, the number of injuries observed in each exposure state (or transition) should be at least five. Accordingly, the flexibility of ABM and other simulation-based techniques could offer a potential workaround to the requirements in traditional statistical analyses, especially when sports injury researchers aim to further stratify samples to prioritise and understand how workloads and other time-varying exposures change status during follow-up.30 38 39 With continued application and ingenuity, computational simulations might be able to capture a sufficient number of sports injuries per explanatory variable modelled, affording theoretical insight into the supposed aetiologic mechanism(s). Despite the versatility of computational methods, a word of caution is advised. Unlike traditional statistical modelling, the assumptions underpinning computational models are often based on subject matter knowledge and other various forms of empirical evidence. Thus, the underlying data-driven assumptions and theoretical causal mechanisms encoded into simulations should be explicitly described as a basis for evaluating model predictions.34 35
Key point 6: the use of computational modelling could be considered as a complementary and alternative approach to time-to-event modelling in future sports injury research applications because no consideration of the number of injuries is needed. However, unlike traditional statistical modelling, the assumptions underpinning computational models are often based on subject matter knowledge and other various forms of empirical evidence. If these are wrong, the results from the analyses will be questionable.
Conclusion
In this paper, we have discussed how the concept of time-varying outcomes, including competing risk and subsequent injuries, can be used in time-to-event modelling to investigate injury aetiology in a sports injury context. First, time-to-event models were described that permit the inclusion of time-varying outcomes using the concept of multistate transitions. Second, researchers can consider subsequent injuries using the concept of shared frailty. Third, competing risk was highlighted as it enables researchers to include all types of injuries in their analyses. Finally, we presented often overlooked requirements related to events per variable and the number of injuries in each exposure state. These requirements need to be considered prior to any data collection to avoid conducting statistical analyses on time-to-event data that lead to biased results.
Footnotes
Contributors: All authors contributed equally in writing the educational review. DR performed the analyses leading to the results in Table 2 and Figure 2.
Funding: The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.
Competing interests: None declared.
Patient consent: Obtained.
Ethics approval: Local ethics committee central Denmark region (N-20140069)
Provenance and peer review: Not commissioned; externally peer reviewed.
References
1. Sato Y, Gosho M, Nagashima K, et al. Statistical Methods in the Journal - An Update. N Engl J Med 2017;376:1086–7. doi:10.1056/NEJMc1616211
2. Mahmood A, Ullah S, Finch CF. Application of survival models in sports injury prevention research: A systematic review. Br J Sports Med 2014;48:630.2–630. doi:10.1136/bjsports-2014-093494.190
3. Nielsen RØ, Malisoux L, Møller M, et al. Shedding light on the etiology of sports injuries: A look behind the scenes of time-to-event analyses. J Orthop Sports Phys Ther 2016;46:300–11. doi:10.2519/jospt.2016.6510
4. Malisoux L, Chambon N, Urhausen A, et al. Influence of the heel-to-toe drop of standard cushioned running shoes on injury risk in leisure-time runners: A randomized controlled trial with 6-month follow-up. Am J Sports Med 2016;44:2933–40. doi:10.1177/0363546516654690
5. Malisoux L, Chambon N, Delattre N, et al. Injury risk in runners using standard or motion control shoes: a randomised controlled trial with participant and assessor blinding. Br J Sports Med 2016;50:481–7. doi:10.1136/bjsports-2015-095031
6. Møller M, Nielsen RO, Attermann J, et al. Handball load and shoulder injury rate: a 31-week cohort study of 679 elite youth handball players. Br J Sports Med 2017;51:231–7. doi:10.1136/bjsports-2016-096927
7. Nielsen RØ, Parner ET, Nohr EA, et al. Excessive progression in weekly running distance and risk of running-related injuries: an association which varies according to type of injury. J Orthop Sports Phys Ther 2014;44:739–47. doi:10.2519/jospt.2014.5164
8. Nielsen RO, Bertelsen ML, Parner ET, et al. Running more than three kilometers during the first week of a running regimen may be associated with increased risk of injury in obese novice runners. Int J Sports Phys Ther 2014;9:338–45.
9. Ullah S, Gabbett TJ, Finch CF. Statistical modelling for recurrent events: an application to sports injuries. Br J Sports Med 2014;48:1287–93. doi:10.1136/bjsports-2011-090803
10. Shrier I, Steele RJ, Zhao M, et al. A multistate framework for the analysis of subsequent injury in sport (M-FASIS). Scand J Med Sci Sports 2016;26:128–39. doi:10.1111/sms.12493
11. Clarsen B, Myklebust G, Bahr R. Development and validation of a new method for the registration of overuse injuries in sports injury epidemiology: the Oslo Sports Trauma Research Centre (OSTRC) overuse injury questionnaire. Br J Sports Med 2013;47:495–502. doi:10.1136/bjsports-2012-091524
12. Finch CF, Marshall SW. Let us stop throwing out the baby with the bathwater: towards better analysis of longitudinal injury data. Br J Sports Med 2016;50:712–5. doi:10.1136/bjsports-2015-094719
13. Finch CF, Fortington LV. So you want to understand subsequent injuries better? start by understanding the minimum data collection and reporting requirements. Br J Sports Med 2017.
14. Fortington LV, van der Worp H, van den Akker-Scheek I, et al. Reporting multiple individual injuries in studies of team ball sports: A systematic review of current practice. Sports Med 2017;47:1103–22. doi:10.1007/s40279-016-0637-3
15. Gabbett TJ. The training-injury prevention paradox: should athletes be training smarter and harder? Br J Sports Med 2016;50:273–80. doi:10.1136/bjsports-2015-095788
16. Yamato TP, Saragiotto BT, Lopes AD. A consensus definition of running-related injury in recreational runners: a modified Delphi approach. J Orthop Sports Phys Ther 2015;45:375–80. doi:10.2519/jospt.2015.5741
17. Finch CF, Cook J. Categorising sports injuries in epidemiological studies: the subsequent injury categorisation (SIC) model to address multiple, recurrent and exacerbation of injuries. Br J Sports Med 2014;48:1276–80. doi:10.1136/bjsports-2012-091729
18. Toohey LA, Drew MK, Cook JL, et al. Is subsequent lower limb injury associated with previous injury? A systematic review and meta-analysis. Br J Sports Med 2017;51:1670–8. doi:10.1136/bjsports-2017-097500
19. Finch CF, Cook J, Kunstler BE, et al. Subsequent injuries are more common than injury recurrences: An analysis of 1 season of prospectively collected injuries in professional australian football. Am J Sports Med 2017;45:1921–7. doi:10.1177/0363546517691943
20. Olesen AV, Parner ET. Correcting for selection using frailty models. Stat Med 2006;25:1672–84. doi:10.1002/sim.2298
21. Gabbett TJ, Ullah S, Finch CF. Identifying risk factors for contact injury in professional rugby league players-application of a frailty model for recurrent injury. J Sci Med Sport 2012;15:496–504. doi:10.1016/j.jsams.2012.03.017
22. Putter H, Fiocco M, Geskus RB. Tutorial in biostatistics: competing risks and multi-state models. Stat Med 2007;26:2389–430. doi:10.1002/sim.2712
23. Andersen PK, Keiding N. Interpretability and importance of functionals in competing risks and multistate models. Stat Med 2012;31(11-12):1074–88. doi:10.1002/sim.4385
24. Andersen PK, Geskus RB, de Witte T, et al. Competing risks in epidemiology: possibilities and pitfalls. Int J Epidemiol 2012;41:861–70. doi:10.1093/ije/dyr213
25. Hansen SN, Andersen PK, Parner ET. Events per variable for risk differences and relative risks using pseudo-observations. Lifetime Data Anal 2014;20:584–98. doi:10.1007/s10985-013-9290-4
26. Peduzzi P, Concato J, Feinstein AR, et al. Importance of events per independent variable in proportional hazards regression analysis. II. Accuracy and precision of regression estimates. J Clin Epidemiol 1995;48:1503–10. doi:10.1016/0895-4356(95)00048-8
27. Abraira V, Muriel A, Emparanza JI, et al. Reporting quality of survival analyses in medical journals still needs improvement. A minimal requirements proposal. J Clin Epidemiol 2013;66:1340–6. doi:10.1016/j.jclinepi.2013.06.009
28. Greenland S, Mansournia MA, Altman DG. Sparse data bias: a problem hiding in plain sight. BMJ 2016;352:i1981. doi:10.1136/bmj.i1981
29. Peduzzi P, Concato J, Kemper E, et al. A simulation study of the number of events per variable in logistic regression analysis. J Clin Epidemiol 1996;49:1373–9. doi:10.1016/S0895-4356(96)00236-3
30. Bertelsen ML, Hulme A, Petersen J, et al. A framework for the etiology of running-related injuries. Scand J Med Sci Sports 2017;27:1170–80. doi:10.1111/sms.12883
31. Greenland S, Mansournia MA. Penalization, bias reduction, and default priors in logistic and related categorical and survival regressions. Stat Med 2015;34:3133–43. doi:10.1002/sim.6537
32. Mansournia MA, Geroldinger A, Greenland S, et al. Separation in logistic regression: Causes, consequences, and control. Am J Epidemiol 2018;187:864–70. doi:10.1093/aje/kwx299
33. van Dyk N, van der Made AD, Timmins RG, et al. There is strength in numbers for muscle injuries: it is time to establish an international collaborative registry. Br J Sports Med 2018;52:1228–9. doi:10.1136/bjsports-2016-097318
34. Hernán MA. Invited commentary: Agent-based models for causal inference—reweighting data and theory in epidemiology. Am J Epidemiol 2015;181:103–5. doi:10.1093/aje/kwu272
35. Marshall BD, Galea S. Formalizing the role of agent-based modeling in causal inference and epidemiology. Am J Epidemiol 2015;181:92–9. doi:10.1093/aje/kwu274
36. Eh I, Rahmandad H, Shoham DA, et al. Reconciling statistical and systems science approaches to public health. Health Educ Behav 2013;40:123S–31.
37. Hulme A, Thompson J, Nielsen RO, et al. Towards a complex systems approach in sports injury research: simulating running-related injury development with agent-based modelling. Br J Sports Med. In Press 2018:bjsports-2017-098871. doi:10.1136/bjsports-2017-098871
38. Nielsen RO, Bertelsen ML, Moller M, et al. Training load and structure-specific load: Applications for sport injury causality and data analyses. Br J Sports Med 2017.
39. Drew MK, Finch CF. The relationship between training load and injury, illness and soreness: A systematic and literature review. Sports Med 2016;46:861–83. doi:10.1007/s40279-015-0459-8
40. Ramskov D, Nielsen RO, Sørensen H, et al. The design of the run Clever randomized trial: running volume, -intensity and running-related injuries. BMC Musculoskelet Disord 2016;17:177. doi:10.1186/s12891-016-1020-0