Time-to-event modelling is underutilised in sports injury research. Still, sports injury researchers have been encouraged to consider time-to-event analyses as a powerful alternative to other statistical methods. Therefore, it is important to shed light on statistical approaches suitable for analysing training load related key-questions within the sports injury domain.
Content
In the present article, we illuminate: (i) the possibilities of including time-varying outcomes in time-to-event analyses, (ii) how to deal with a situation where different types of sports injuries are included in the analyses (ie, competing risks), and (iii) how to deal with the situation where multiple subsequent injuries occur in the same athlete.
Conclusion
Time-to-event analyses can handle time-varying outcomes, competing risks and multiple subsequent injuries. Although powerful, time-to-event analysis has important requirements: before any data collection, researchers are encouraged to consider carefully that at least five injuries per exposure state or transition are needed to avoid biased results. This requirement becomes particularly difficult to accommodate when a stratified analysis is required, as the number of variables increases exponentially with each additional stratum included. In future sports injury research, we need stratified analyses if the target of our research is to respond to the question: ‘how much change in training load is too much before injury is sustained, among athletes with different characteristics?’ Responding to this question using multiple time-varying exposures (and outcomes) requires millions of injuries. This should not be a barrier for future research, but collaboration across borders to collect the amount of data needed seems to be an important step forward.
Readers of sports injury science may have heard of survival analysis, which is traditionally used in settings where there is only a single type of mortality-related event (eg, literally ‘survival’ in medical research when dealing with diseases such as cancer). In the sports injury setting, the analytical concept is nearly the same with two differences: (i) the outcome is injury, not death and (ii) unlike death, athletes can sustain multiple sports injuries.
Time-to-event models, of which survival analysis is one type, are frequently used in other research settings. For instance, 57% of all 238 original articles published in the New England Journal of Medicine (NEJM) in 2015 included time-to-event methods.1 This ranks time-to-event methods as the second most frequently used statistical method after calculations of power and sample size.1 In the sports injury context, two authors (MLB and MM) performed an exploratory search and an independent blinded evaluation, similar to the one used in NEJM, to identify original articles using time-to-event models published in the British Journal of Sports Medicine (BJSM) in 2017. This search found 66 published original articles, of which only 8 (12%) used descriptive and/or comparative time-to-event statistics. This finding suggests that time-to-event models are rarely used in sports injury research. In corroboration, an earlier systematic search covering January 1993 to July 2013 found a total of 103 sports injury-related articles that had used time-to-event models.2 Of those articles, 88% were published after 2005, perhaps reflecting the increased availability of time-to-event modelling options in common statistical packages and/or a growing recognition of their utility. The overall number of articles, however, remained relatively low given the volume of sports injury data published. Although the reasons for this are unknown, it could be due to either a lack of awareness about the potential utility of time-to-event models or the perceived difficulty of applying them in practice. Alternatively, other approaches such as logistic regression analyses may have been prioritised over time-to-event modelling given the traditional pedagogical roots and teachings associated with tertiary education programmes in the health sciences.
In the BJSM article entitled ‘Time-to-Event Analysis for Sports Injury Research Part 1: Time-Varying Exposures’, we argued that time-to-event modelling is well suited to deal with changes in training load as a time-varying exposure to sports injury. Clearly, this is important as it allows researchers to answer the question ‘how much change in training load is ‘too much’ before sports injury is sustained, among athletes with different characteristics?’ In addition, that paper argued that, in contrast to logistic regression analyses and the χ² test, time-to-event modelling enables us to consider censoring and the within-athlete correlation between each athlete’s follow-up data by using delayed entry functions. This makes time-to-event modelling more suitable for examining training load-related questions when compared with the analytical approaches that are frequently used in sports injury research to date. Owing to underutilisation of time-to-event modelling in sports injury research, it is necessary to further discuss the statistical approaches that are readily available and suitable for addressing key research questions relating to training load changes within the sports injury domain. Given the focus of part 1 on time-varying exposures, in part 2 we will give attention to the role of time-to-event modelling in relation to time-varying outcomes. As such, this article will elaborate on the following areas: (i) the possibilities of including time-varying outcomes in the time-to-event analysis; (ii) how to deal with a situation where different types of sports injuries are included (ie, competing risk); (iii) how to deal with the situation whereby multiple subsequent injuries occur in the same athlete and (iv) the theoretical assumptions and statistical requirements underpinning the analyses.
Consequently, the purpose of this paper is to discuss how the concepts of time-varying outcomes, competing risks and subsequent injuries can be used in time-to-event models to investigate sports injury aetiology. In addition, we present the often-overlooked assumptions and requirements relating to the events-per-variable (EPV) condition and number of injuries. In table 1, a brief overview of the key questions addressed in the manuscript as well as the key point associated with each question is provided.
Table 1
Key questions and associated key points that are covered in the article
Question 1: How to deal with time-varying outcomes?
Key question 1: a researcher collected data on sports injury status (in statistical terms: states) over time in a group of athletes to investigate the aetiology of Achilles tendinopathy. In weekly self-assessments, the athletes classified their injury severity into no Achilles injury, moderate Achilles tendinopathy and severe Achilles tendinopathy. The next step for the researcher is to analyse the data. Here, the question remains: is time-to-event a suitable analytical approach to deal with a time-varying outcome?
Key point 1: time-to-event models allow for the inclusion of time-varying outcomes using the concept of multistate transitions. To date, there is no universally accepted way to classify sports injury into different outcome states. Sports injury researchers have the opportunity to use certain injury definitions, and have a degree of flexibility to choose the cut-offs that separate each injury state.
Question 2: How to deal with subsequent injuries?
Key question 2: imagine a researcher having collected data on subsequent injuries (eg, athletes that sustained Achilles tendinopathy three times during the follow-up). The next step for the researcher is to analyse such data. Are there certain analytical approaches needed to deal with this type of data?
Key point 2: in time-to-event modelling, the researcher can consider subsequent injuries using the concept of shared frailty. This allows for correction for selection of ‘less-injury-prone’ athletes over time.
Question 3: How to deal with competing risk?
Key question 3: in your dataset, there are data on many different injury types (eg, Achilles tendinopathy, patella-femoral pain, iliotibial band syndrome, patellar tendinopathy). However, you may only be interested in studying Achilles tendinopathy. Should you just omit all other injuries (patella-femoral pain, iliotibial band syndrome, patellar tendinopathy) when analysing the data?
Key point 3: researchers should ‘stick to this world’ by including many injury types into the analysis using a competing risk setup. Excluding injuries of less interest is strongly discouraged as it will generate misleading results because the injury risk is overestimated.
Question 4: How to deal with assumptions and requirements?
Key question 4: you may speculate: what are the downsides of time-to-event modelling?
Key point 4: sports injury researchers need to calculate the event/variable ratio to avoid biased results. In addition, sports injury researchers should ensure there are at least five injuries in each exposure state to be analysed. Dealing with (multiple) time-varying exposures requires a considerable number of injuries to avoid violating the requirements underpinning the time-to-event analysis. Analysing data without consideration of the number of injuries in each exposure state will easily lead to sparse data bias.
Question 5: Are there any considerations when designing my study?
Key question 5: I want to design a new study looking into the association between changes in training load and sports injury. What should I consider when I am designing my data collection?
Key point 5: researchers must consider: am I able to get the number of injuries needed in order to analyse changes in training load as a time-varying exposure to sports injury? How many injuries are likely to occur in each exposure state (or transition)? How many cut-offs to separate the exposure groups are suitable?
Question 6: Are there any alternative methods?
Key question 6: it is difficult to collect the amount of data needed to avoid violating the assumptions and requirements needed to perform a robust time-to-event analysis on a change in training load-related question. Accordingly, are there any alternative methods that could be considered?
Key point 6: the use of computational modelling could be considered as a complementary and alternative approach to time-to-event modelling in future sports injury research applications because no consideration of the number of injuries is needed. However, unlike traditional statistical modelling, the assumptions underpinning computational models are often based on subject matter knowledge and other various forms of empirical evidence. If these are wrong, the results from the analyses will be questionable.
Time-to-event and time-varying outcomes
Time-to-event modelling is able to handle outcomes that are dichotomised (eg, yes or no) or categorised (eg, no injury, moderate injury, severe injury).3 Historically, dichotomised outcomes have primarily been used given that death has been the primary clinical end point. Here, only two states are possible: dead or alive. As a corollary, in the sports science context, many researchers have also used dichotomised sports injury definitions to reflect either an injured or injury-free status.4–8 Over the past decade, the understanding of a sports injury being more than a dichotomised time-fixed variable has gained traction.9 Indeed, a sports injury can take on differing severities and can change status over time.10 11 Traditionally, sports injury data in a time-to-event setting has been analysed as time-to-first-injury. This approach forces the researcher to omit data from recovery periods, as well as data up to a potential second injury (and third, fourth, fifth injury). Depending on the research question, the use of data in this way can be questioned. Therefore, the need for statistical methods that are able to handle multiple injury occurrences has been highlighted.9 12–14 The idea of sustaining multiple injuries within a given period of epidemiological surveillance makes sports injury a time-varying outcome as the athlete is able to switch (in statistical terms: transition) between an injured and an injury-free state over time (or between categorised states, eg, none, moderate, severe). Here, we will discuss key questions: (i) how to deal with a time-varying outcome? (see time-to-event outcome question 1); (ii) how to deal with subsequent injuries? (see time-to-event outcome question 2) and (iii) how to deal with competing risks? (see time-to-event outcome question 3).
Time-to-event outcome question 1: how to deal with a time-varying outcome?
Key question 1: a researcher collected data on sports injury status (in statistical terms: states) over time in a group of athletes to investigate the aetiology of Achilles tendinopathy. In weekly self-assessments, the athletes classified their injury severity into no Achilles injury, moderate Achilles tendinopathy and severe Achilles tendinopathy. The next step for the researcher is to analyse the data. Is time-to-event a suitable analytical approach to deal with a time-varying outcome?
In part 1 of the time-to-event series dealing with time-varying exposures, the idea of states, and transitions between states, was presented using Gabbett’s acute:chronic workload ratio (ACWR)15 as an example. In a similar way, sports injury (eg, Achilles tendinopathy) can be handled as a time-varying outcome. Using an example with a weekly injury classification of three states, for example, no Achilles injury (state 1), moderate Achilles tendinopathy (state 2) and severe Achilles tendinopathy (state 3) allows each athlete to be in one of these states at a specific time-point during follow-up. Then, the athlete can transit (in nine different ways) between these states during follow-up. In figure 1, the concepts of states and transitions are visualised using the three-state and nine-transition setup. Naturally, this example was made for illustrative purposes. There are other approaches to classifying injury states; however, no evidence-based guidelines support one classification model as superior over another. Therefore, the decision to use certain classification cut-offs should be based on contemporary sports theory and expert subject matter knowledge that takes into consideration the target athlete population(s). Dealing with time-varying injury-related outcomes is a highly technical and challenging task. Here, we guide the reader to further information about the specifics of multistate transitions in statistical papers.14 15
Figure 1
Overview of the concepts of states, transitions and subsequent injury using an n=1 athlete example. Imagine that we register the injury status of one athlete during an 11-week follow-up. On the y-axis, the sports injury (in this case Achilles tendinopathy) can be classified into one of the three following states each week during the 11-week follow-up (marked with blue circle): state 1: no Achilles injury; state 2: moderate Achilles tendinopathy and state 3: severe Achilles tendinopathy. Then, the athlete is able to move/switch/transit between these states between each week. Consequently, the following nine multistate transitions (MST) are possible in the example: MST 1: no Achilles injury and remaining with no Achilles injury; MST 2: no Achilles injury to moderate Achilles tendinopathy; MST 3: no Achilles injury to severe Achilles tendinopathy; MST 4: moderate Achilles tendinopathy to no Achilles injury; MST 5: moderate Achilles tendinopathy and remaining with a moderate Achilles tendinopathy; MST 6: moderate Achilles tendinopathy to severe Achilles tendinopathy; MST 7: severe Achilles tendinopathy to no Achilles injury; MST 8: severe Achilles tendinopathy to moderate Achilles tendinopathy and MST 9: severe Achilles tendinopathy and remaining with a severe Achilles tendinopathy. The concept of states and transitions illustrated in the figure is directly transferable to time-varying exposures (eg, changes in training load) and time-varying effect-measure modifiers. As the athlete is classified into state 1 ‘no Achilles injury’ in week 6 and week 7, the athlete sustains two Achilles tendinopathies: the first one from week 2 to week 5 (injury 1) and the subsequent injury from week 8 to week 10 (injury 2).
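The idea of states, transitions and subsequent injuries can be sketched in a few lines of Python. The weekly state sequence below is hypothetical: it only mirrors the timing described for figure 1 (injury 1 in weeks 2 to 5, no injury in weeks 6 and 7, injury 2 in weeks 8 to 10), and the specific moderate/severe states in each injured week are assumed for illustration. The function names are likewise illustrative.

```python
from collections import Counter

# Hypothetical weekly states for one athlete over 11 weeks
# (1 = no Achilles injury, 2 = moderate, 3 = severe tendinopathy).
weekly_states = [1, 2, 3, 3, 2, 1, 1, 2, 3, 2, 1]


def count_transitions(states):
    """Count each (from_state, to_state) pair between consecutive weeks."""
    return Counter(zip(states, states[1:]))


def injury_episodes(states, injury_free=1):
    """Return (first_week, last_week) for each run of injured weeks."""
    episodes = []
    start = None
    for week, state in enumerate(states, start=1):
        if state != injury_free and start is None:
            start = week  # a new injury episode begins
        elif state == injury_free and start is not None:
            episodes.append((start, week - 1))  # episode ended last week
            start = None
    if start is not None:  # still injured at end of follow-up
        episodes.append((start, len(states)))
    return episodes


transitions = count_transitions(weekly_states)
episodes = injury_episodes(weekly_states)
print("transitions:", dict(transitions))
print("injury episodes (weeks):", episodes)
```

The transition counts correspond to the multistate view of the data, whereas the list of episodes corresponds to the subsequent-injury view of the same athlete.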
If injury classification is extended beyond a dichotomised yes/no definition, standardised and/or consensus-based frameworks to systematically classify injuries into substates are needed. In a multistate framework for the analysis of subsequent injury in sport (M-FASIS), Shrier et al10 presented a 19-state injury model classification scheme that took factors like the activity level and the treatment level into account. Although the challenges of having enough data to perform a robust statistical analysis were highlighted, the model represents an opportunity to include the concept of states (n=19) or transitions between states (n=19×19=361 transitions) using the concept of a time-varying outcome. Another example is the severity score from the Oslo Sports Trauma Research Center (OSTRC) questionnaire, which is based on categorical data of 74 states between 0 and 100.11 16 Use of the OSTRC to analyse changes in injury severity implies 74 states and 5476 transitions between states. Unfortunately, an extremely large data set with many injuries would be required to analyse data in such a detailed manner. A reduction in the number of states seems necessary. To the best of our knowledge, the M-FASIS and the OSTRC are theoretical models and neither has been used in a study combining time-to-event with the concept of time-varying outcomes. Still, the concepts emphasise that athletes change their injury status over time and this can be dependent on anatomical location and/or diagnoses. For these concepts, time-to-event is an appropriate analytical approach.
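The growth in the number of transitions can be checked with a short calculation. This is a minimal sketch that assumes every state can be followed by every state (including itself), as in the three-state example. The lower bound on injuries additionally assumes that at least five injuries are needed per transition and that injuries are spread perfectly evenly, which is optimistic.

```python
def n_transitions(n_states):
    """Number of possible transitions when any state can follow any state."""
    return n_states ** 2


def optimistic_minimum_injuries(n_states, per_transition=5):
    """Lower bound assuming injuries are spread evenly over all transitions."""
    return per_transition * n_transitions(n_states)


# State counts taken from the text; labels are for printing only.
schemes = {"three-state example": 3, "M-FASIS": 19, "OSTRC severity score": 74}
for label, n_states in schemes.items():
    print(label, n_transitions(n_states), optimistic_minimum_injuries(n_states))
```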
Importantly, the concept of multistate transition is used for the same injury type. For instance, it can be used for addressing the development of Achilles tendinopathy (eg, no Achilles tendinopathy, moderate Achilles tendinopathy, severe Achilles tendinopathy). The next step is to acknowledge that: (i) athletes can sustain the same injury multiple times during follow-up (subsequent injuries) and (ii) athletes are able to sustain other injuries than just Achilles tendinopathy (competing risk).
Key point 1: time-to-event models allow for the inclusion of time-varying outcomes using the concept of multistate transitions. To date, there is no universally accepted way to classify sports injury into different outcome states. Sports injury researchers have the opportunity to use certain injury definitions, and have a degree of flexibility to choose the cut-offs that separate each injury state.
Time-to-event outcome question 2: how to deal with subsequent injuries?
Key question 2: a researcher has collected data on subsequent injuries (eg, athletes that sustained Achilles tendinopathy three times during the follow-up). The next step for him/her is to analyse such data. Are there certain analytical approaches needed to deal with this type of data?
Finch and Cook17 developed the subsequent injury classification (SIC) model to address multiple, subsequent and exacerbation of injuries.17 During an extended follow-up period, athletes can: (i) never sustain an injury; (ii) sustain one injury or (iii) sustain more than one injury.14 In the latter case, researchers face the analytical challenge of dealing with injuries that are statistically related as they occur in the same athlete.9 12 Most likely, the risk of sustaining subsequent injuries is influenced by previous occurrences,18 hence the correlation between within-athlete subsequent injuries needs to be taken into account when analysing data.17 19 As an example, Finch et al19 took relationships between injuries into account and found a high level of subsequent (and multiple) injuries leading to missed games in an elite athlete group.
As considerable time and resources are expended to collect these high-quality longitudinal injury data, researchers have been encouraged to make use of all of the available data.12 Unfortunately, most data analyses from these studies do not adequately address repeated injury events on the same athlete, and therefore squander the potential of the data.12 Possibly, a reason for the limited number of analyses including multiple injuries from the same person is the complexity behind the statistical analyses. Researchers need to grasp the concept of frailty, as frailty models have been recommended for studies involving modelling of subsequent sports injury data.9 The main reason for choosing frailty models originates from their ability to correct for selection of ‘less-injury-prone’ athletes over time.20 In a 2012 paper, the frailty model was applied to rugby-related injury data to identify risk factors for contact injuries.21
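The selection of ‘less-injury-prone’ athletes over time can be illustrated with a small simulation. This sketch does not fit a frailty model; it only shows the phenomenon that such models correct for. All numbers are assumptions chosen for illustration: each athlete receives an unobserved frailty drawn from a gamma distribution with mean 1, and the weekly injury hazard is a hypothetical baseline of 0.1 multiplied by that frailty.

```python
import math
import random


def mean_frailty_of_injury_free(n_athletes=5000, weeks=20,
                                base_hazard=0.1, seed=1):
    """Mean frailty of athletes still injury-free at the start of each week."""
    rng = random.Random(seed)
    # Hypothetical gamma frailties with mean 1 (shape 2, scale 0.5).
    injury_free = [rng.gammavariate(2.0, 0.5) for _ in range(n_athletes)]
    means = []
    for _ in range(weeks + 1):
        means.append(sum(injury_free) / len(injury_free))
        # Weekly injury probability increases with the athlete's frailty;
        # athletes who get injured leave the injury-free group.
        injury_free = [z for z in injury_free
                       if rng.random() > 1 - math.exp(-base_hazard * z)]
    return means


means = mean_frailty_of_injury_free()
print(f"mean frailty, week 0: {means[0]:.2f}; week 20: {means[-1]:.2f}")
```

Under these assumptions, the athletes who remain injury-free become progressively less injury-prone on average, so a comparison that ignores frailty would be made among an increasingly selected group.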
Now the reader may wonder: are the concepts of multistate transitions (question 1) and subsequent injury (question 2) similar? Importantly, multistate transitions and subsequent injury should be considered two different, but nevertheless related, concepts. In figure 1, the difference between the two concepts is visualised: subsequent injury refers to injury 1 and injury 2 (marked with red), whereas each state is marked with a blue circle and the arrows connecting the states represent the transitions.
Key point 2: in time-to-event modelling, the researcher can consider subsequent injuries using the concept of shared frailty. This allows for correction for selection of ‘less-injury-prone’ athletes over time.
Time-to-event outcomes question 3: how to deal with competing risks?
Key question 3: in your dataset, there are data on many different injury types (eg, Achilles tendinopathy, patella-femoral pain, iliotibial band syndrome, patellar tendinopathy). However, you may only be interested in studying Achilles tendinopathy. Should you just omit all other injuries (patella-femoral pain, iliotibial band syndrome, patellar tendinopathy) when analysing the data?
In the initial presentation of the M-FASIS, OSTRC and SIC models, the possibility of dealing with multiple injuries was discussed, and the first steps were taken to promote the concept of competing risk.10 17 Standard time-to-event data include the time span from a specified time origin until the occurrence of one type of event (eg, Achilles tendinopathy).22 If there is a simultaneous risk of several types of injuries (eg, patella-femoral pain, iliotibial band syndrome, patellar tendinopathy), then these different types of injuries are considered competing, in the sense that a person can only sustain one of them at a given point in time. In the Achilles tendinopathy example, the competing risks were based on injury diagnoses. In a sports injury setting, competing risks can be assessed based on: (i) classification, for example, tendon-related injury, muscle injuries, joint-related injury and injuries unrelated to sport; (ii) anatomical location, for example, knee injury, foot injury, hip injury or (iii) diagnoses, for example, patella-femoral pain, patellar tendinopathy, Achilles tendinopathy.
To analyse competing risk data, a model describing each of these competing risks is needed. Multistate models generalise competing risk models by also describing transitions to intermediate events. Methods to analyse such models have been developed over the past three decades.23 24 Fortunately, most analyses can be performed within standard statistical software packages, but may require some extra effort with respect to data preparation and programming.22 An example from the sports injury literature that incorporates competing risks can be found elsewhere.7
The Kaplan-Meier plot, or survival curve, is recognised for its ability to visualise the probability of being injury-free (ie, surviving) or the probability of sustaining injuries (ie, not surviving) as a function of time (eg, days, weeks), training sessions or training duration.3 Importantly, Kaplan-Meier curves are no longer valid if models are extended to consider more than one type of outcome.9 23 As an example, in a competing risk setting, researchers may want to examine the probability of sustaining a tendon-related injury, with other types of injuries, such as muscle and joint-related injury and injuries occurring outside of running, being of limited interest. This presents two ways to handle data, neither of which are correct or recommended23 24:
(i) exclude all participants who sustain non-tendon-related injury;
(ii) include all participants who sustain non-tendon-related injury as right-censored observations at time of non-tendon-related injury (ie, ‘survived’ from a tendon injury).
In the first situation, substantial information about injury occurrences is ignored and it could appear (incorrectly) that only one type of injury, in this example, tendon injury, can occur. This is flawed since we already know that athletes are at risk of sustaining many types of injuries and that there is a statistical relationship between subsequent injuries.17–19
In the second situation, the Kaplan-Meier method computes the cumulative injury incidence proportion inclusive of the participants who are right-censored. However, this violates a principle that should be employed in time-to-event analyses: individuals should not be considered at risk while they are injured, even if the injury is of a different type, because that injury can still prevent them from being ‘exposed’ to risk. This results in an overestimation of the cumulative incidence proportion, as fewer athletes than it appears are at risk at any one time. Consequently, statisticians have recommended that researchers include other types of injuries as competing risks and, somewhat provocatively, have also recommended that researchers ‘stick to this world’.23 In the sports science context, this essentially means that sports injury researchers who are willing to accept the assumption that injury-free athletes are only able to sustain one certain type of injury (eg, only Achilles tendinopathy) effectively ‘live in another world’.
In the competing risks setting, the Kaplan-Meier method should be replaced by the Aalen-Johansen estimator, which accounts for competing risks and thereby avoids overestimating the cumulative incidence proportion. The difference between computing the cumulative incidence proportion using the Kaplan-Meier method (figure 2A) and the Aalen-Johansen estimator (figure 2B) can be substantial. Using the incorrect Kaplan-Meier method in the competing risk example in figure 2A, the proportion of athletes sustaining tendon injuries is 82.1% (95% CI 65.5% to 98.8%), muscle injuries is 55.6% (95% CI 38.7% to 73.0%), joint-related injuries is 61.6% (95% CI 35.7% to 88.0%) and injuries not related to running is 29.0% (95% CI 4.2% to 53.6%). Summing these proportions, the total proportion of athletes sustaining a first-time injury reaches 228%. Clearly, this is impossible, since an athlete only sustains a first-time injury once and the proportion of athletes sustaining first-time injuries cannot possibly exceed 100%. Based on this, the proportions calculated using the Kaplan-Meier method are overestimated. In contrast, the proportions reported in figure 2B, using the Aalen-Johansen estimator, are unbiased: the proportion of tendon injuries is 42.5% (95% CI 34.2% to 50.7%), muscle injuries is 21.6% (95% CI 14.7% to 28.4%), joint-related injuries is 25.2% (95% CI 17.9% to 32.3%) and injuries not related to running is 8.6% (95% CI 3.9% to 13.3%), and their sum (97.9%) does not exceed 100%. Therefore, it is strongly recommended that researchers dealing with competing risks use the Aalen-Johansen estimator as their preference.
Figure 2
Kaplan-Meier vs Aalen-Johansen estimator. Comparing outputs from a flawed analysis using the Kaplan-Meier estimator (A) and a more appropriate analysis using the Aalen-Johansen estimator (B). In the former biased scenario, the proportion of athletes sustaining injury is 228%. This is impossible, since the proportion is unable to exceed 100%. In the latter scenario, the injury proportion is close to 100%. RRI, running-related injury.
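The contrast between the two estimators can be reproduced on a toy data set. The sketch below is a plain-Python illustration, not the analysis behind figure 2: the twelve records are hypothetical, `None` marks a right-censored athlete, and the function computes, for one injury type, both the naive estimate (one minus Kaplan-Meier, with other injury types treated as censored) and the Aalen-Johansen cumulative incidence at the end of follow-up.

```python
def cumulative_incidence(records, cause):
    """Return (naive 1 - Kaplan-Meier, Aalen-Johansen) for one injury type.

    records: list of (time, cause); cause is None for right-censoring.
    """
    event_times = sorted({t for t, c in records if c is not None})
    km_cause = 1.0  # Kaplan-Meier survival, other causes treated as censored
    surv_all = 1.0  # probability of being free of any injury just before t
    aj = 0.0        # Aalen-Johansen cumulative incidence for this cause
    for t in event_times:
        at_risk = sum(1 for ti, _ in records if ti >= t)
        d_cause = sum(1 for ti, c in records if ti == t and c == cause)
        d_any = sum(1 for ti, c in records if ti == t and c is not None)
        aj += surv_all * d_cause / at_risk
        km_cause *= 1.0 - d_cause / at_risk
        surv_all *= 1.0 - d_any / at_risk
    return 1.0 - km_cause, aj


# Hypothetical first-injury data: (week, injury type or None if censored).
records = [
    (1, "tendon"), (2, "muscle"), (3, "tendon"), (3, None),
    (4, "joint"), (5, "tendon"), (6, "muscle"), (7, None),
    (8, "tendon"), (9, "joint"), (10, "tendon"), (10, None),
]
causes = ["tendon", "muscle", "joint"]
results = {c: cumulative_incidence(records, c) for c in causes}
naive_total = sum(naive for naive, _ in results.values())
aj_total = sum(aj for _, aj in results.values())
for c, (naive, aj) in results.items():
    print(f"{c}: naive {naive:.3f}, Aalen-Johansen {aj:.3f}")
print(f"totals: naive {naive_total:.3f}, Aalen-Johansen {aj_total:.3f}")
```

For every injury type the naive estimate is at least as large as the Aalen-Johansen estimate, and only the Aalen-Johansen estimates are guaranteed to sum to at most 100%.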
Key point 3: researchers should ‘stick to this world’ by including many injury types into the analysis using a competing risk setup. Excluding injuries of less interest is strongly discouraged as it will lead to misleading results because the injury risk is overestimated.
Time-to-event models: requirements and considerations
Proportional hazards and right-censored observations are important assumptions to consider when evaluating the appropriateness of time-to-event models. Detailed descriptions of these assumptions have been presented elsewhere.3 Here, we will deal with other theoretical assumptions and statistical requirements underpinning time-to-event analysis. Unfortunately, these assumptions and requirements can be a party pooper for the sports injury researcher willing to analyse training load-related data.
Time-to-event outcome question 4A: how to deal with EPV
Key question 4: in the present article and in ‘Time-to-Event Analysis for Sports Injury Research Part 1: Time-Varying Exposures’, we have been enthusiastic about the potential that time-to-event modelling offers the sports injury researcher. However, in science there are always caveats and limitations. So, what are the downsides of time-to-event modelling?
One of the most important and perhaps lesser known requirements when undertaking statistical modelling of data is the EPV requirement.23 25 26 This is also known as the event/variable ratio,27 which can lead to bias if inappropriate.28 To be precise, as with any regression model, time-to-event modelling can be biased if the number of explanatory variables is large in relation to the number of injuries observed.25 28 29 In an analysis using cumulative risk difference as the measure of association, the recommended EPV was 10.25 A three-state version of the ACWR requires at least 20 injuries, whereas the nine transitions necessitate at least 80 injuries. Moreover, at least five injuries are required in each state/transition to avoid sparse data bias (see part B below).28 At first glance, 20–80 injuries can appear manageable for most sports injury datasets. However, for this work, we extracted the sample size from 35 studies examining training load and sports injury and identified only 11 studies with a sample size exceeding 150 participants (see table 1 in the accompanying article entitled ‘Time-to-Event Analysis for Sports Injury Research Part 1: Time-Varying Exposures’). In a 150-person study, more than half of the sample would have to sustain an injury to reach 80 injuries and thereby reduce the risk of bias. Modern wearable technologies, such as sports watches, fitness trackers and internet-based electronic health platforms, offer unprecedented possibilities for data collection and make it easier to record large datasets.30 When designing studies on changes in training load and injury development in the future, sports injury researchers are advised to consider EPV as a supplement to sample size or power calculations. To capture more injuries, the researcher could include more athletes in the study. Another (or supplementary) approach would be to extend the follow-up period to capture a greater number of injuries.
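The figures of 20 and 80 injuries can be reproduced with a short calculation. This sketch rests on an assumption that is not stated explicitly in the text: that a categorical exposure with k states (or transitions) enters the model as k − 1 indicator variables, with one category serving as the reference, and that each variable needs 10 events. The function name is illustrative.

```python
def injuries_required(n_categories, epv=10):
    """Minimum number of injuries for a categorical exposure.

    Assumes one category is the reference, so the model contains
    n_categories - 1 indicator variables, each needing `epv` events.
    """
    if n_categories < 2:
        raise ValueError("need at least two categories")
    return epv * (n_categories - 1)


three_states = injuries_required(3)      # three-state ACWR
nine_transitions = injuries_required(9)  # nine transitions between states
share_of_150 = nine_transitions / 150    # share of a 150-athlete study
print(three_states, nine_transitions, round(share_of_150, 2))
```

The same function can be used at the design stage to see how quickly the required number of injuries grows when further cut-offs or exposure variables are added.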
We note that EPV considerations do not account for other contributing factors to sparse data bias such as explanatory variables with narrow distributions or with categories that are very uncommon,28 31 nor do they consider the impact of the commonly used stepwise variable selection approach which requires even more EPV than do models with prespecified variables. A better diagnostic for sparse data bias is to repeat the analysis using mild shrinkage or penalisation methods: substantial changes warn of serious bias in the original estimates.28 31 32
Key point 4A: sports injury researchers need to calculate the event/variable ratio to avoid biased results.
Time-to-event outcome question 4B: how to deal with number of injuries in each exposure state?
In addition to the EPV requirements, all exposure states and/or transitions in the analysis must include at least five events to conduct a robust statistical analysis. In table 2, the cumulative incidence proportions for different states of two exposure variables (change in running distance and change in running intensity) are presented as an example of a result based on a flawed time-to-event analysis. Clearly, the cumulative injury incidence proportions of −7.6% and −18.9% are flawed, as an injury incidence proportion can never fall below 0%. Consequently, sports injury researchers working with time-to-event analyses are encouraged to show the number of injuries in each exposure state to enable readers to assess the robustness of the models presented. If the number of injuries in a certain state is below five, analysts should carefully consider reclassifying their data based on other cut-offs or reducing the number of states used in the analysis.
Table 2
Examples of flawed cumulative incidence proportions (%) following an analysis of data with fewer than five injuries in a certain state, based on a relative biweekly change in running distance (categorised into four states) and a relative biweekly change in running intensity (categorised into four states)

| Biweekly change in running intensity (states, rows) by biweekly change in running distance (states, columns) | Reg>10% | Reg 10%–0% | Prog 0%–10% | Prog>10% |
| --- | --- | --- | --- | --- |
| Reg>10% | 3.8% (5) | 1.7% (0) | −18.9% (0) | 13.9% (3) |
| Reg 10%–0% | 24.2% (16) | 6.8% (17) | 44.8% (8) | 12.3% (20) |
| Prog 0%–10% | 10.3% (13) | 16.6% (11) | 25.3% (10) | 22.3% (21) |
| Prog>10% | 18.0% (3) | 0.1% (0) | −7.6% (0) | 9.9% (4) |

In reality, cumulative injury incidence proportions range between 0% and 100%. However, some proportions in the example are negative because too few injuries in that state lead to biased estimates.
Number in parentheses represents number of injuries in each exposure state. Results based on a supplementary analysis of the RUNCLEVER dataset.40
Reg, regression; Prog, progression.
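The check recommended in the text can be sketched in a few lines of Python. The injury counts below are the numbers in parentheses from table 2; the helper names (`sparse_cells`, `collapse_rows`) and the particular regrouping into two intensity states are hypothetical choices made only to illustrate the idea of reducing the number of states.

```python
DISTANCE_STATES = ["Reg>10%", "Reg 10%-0%", "Prog 0%-10%", "Prog>10%"]

# Injury counts per state: rows are running intensity states,
# columns follow DISTANCE_STATES (numbers in parentheses in table 2).
INJURIES = {
    "Reg>10%": [5, 0, 0, 3],
    "Reg 10%-0%": [16, 17, 8, 20],
    "Prog 0%-10%": [13, 11, 10, 21],
    "Prog>10%": [3, 0, 0, 4],
}


def sparse_cells(table, minimum=5):
    """Return (intensity state, distance state, injuries) for cells below `minimum`."""
    return [
        (row, DISTANCE_STATES[j], n)
        for row, counts in table.items()
        for j, n in enumerate(counts)
        if n < minimum
    ]


def collapse_rows(table, groups):
    """Merge intensity states; `groups` maps a new label to the old labels it absorbs."""
    return {
        new: [sum(table[old][j] for old in olds) for j in range(len(DISTANCE_STATES))]
        for new, olds in groups.items()
    }


flagged = sparse_cells(INJURIES)

# One possible (hypothetical) regrouping: any regression vs any progression.
collapsed = collapse_rows(
    INJURIES,
    {"Reg": ["Reg>10%", "Reg 10%-0%"], "Prog": ["Prog 0%-10%", "Prog>10%"]},
)
still_flagged = sparse_cells(collapsed)

print(len(flagged), "sparse cells before collapsing;", len(still_flagged), "after")
```

Whether such a regrouping is defensible depends on the research question; the sketch only shows that the five-injury requirement can be checked before any model is fitted.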
With these considerations in mind, time-to-event statistical modelling can offer a range of opportunities for researchers to include exposure variables, such as changes in training load (either as states or transitions), across the course of a study.
The concept of sparse data bias has implications for future research in sports injury aetiology. If the requirements behind the analysis are to be fulfilled, injuries must be distributed fairly evenly across the states or transitions. As visualised in figure 3, this requirement becomes particularly difficult to accommodate when a stratified analysis is required, as the number of variables increases exponentially with each additional stratification variable included. Are stratified analyses relevant in a sports injury setting, you may ask? Certainly, stratified analyses are needed if the aim is to answer the question: ‘how much change in training load is too much before injury is sustained, among athletes with different characteristics?’ Responding to this question using multiple time-varying exposures (and outcomes) requires many injuries. Figure 3 provides an example that visualises why. This should not be a barrier for future research, but collaborations across borders to collect the amount of data needed seem to be an important step forward.33
Figure 3
Stratification requires many injuries. Injury (event) requirements according to (i) a crude analysis (top green) and (ii) when including one (top yellow), two (bottom yellow), three (top red) or four (bottom red) effect-measure modifiers. In the examples, the number of injuries (events) required in a time-to-event analysis is calculated based on a cumulative risk difference (CRD) as measure of association. If other measures of association are used, the numbers could differ. In the crude analysis using acute:chronic workload ratio (ACWR) categorised into three states (<0.8, 0.8–1.3 and >1.3) as primary exposure (top green), a total of 20 injuries is needed since (3 states−1 reference state)×10 injuries (events) per variable (EPV)=20. If the analysis is extended to include one effect-measure modifier (top yellow), 40 injuries are required (20 injuries in each gender stratum). If four effect-measure modifiers are included (bottom red), eg, gender (2 time-fixed groups), age (eg, 5 time-fixed groups), level of training experience (eg, 3 time-fixed groups or time-varying states) and body mass index (eg, 3 time-fixed groups or time-varying states), the total number of injuries required reaches 1800 (20 injuries in each of the 90 substrata).
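The multiplication behind figure 3 can be reproduced with a short Python sketch, assuming an EPV of 10 and the group counts given in the caption (gender 2, age 5, training experience 3, body mass index 3). The function name `injuries_required` is hypothetical, and the order in which modifiers are added for the intermediate values is an assumption, since the caption states only the crude, one-modifier and four-modifier totals.

```python
from math import prod


def injuries_required(n_states, strata_levels=(), epv=10):
    """(states - 1 reference) x EPV injuries, needed in every substratum."""
    return (n_states - 1) * epv * prod(strata_levels)


# Group counts taken from the figure 3 caption; the order is an assumption.
modifiers = [("gender", 2), ("age", 5), ("training experience", 3), ("body mass index", 3)]

totals = [injuries_required(3)]  # crude analysis of three ACWR states
for k in range(1, len(modifiers) + 1):
    levels = [n for _, n in modifiers[:k]]
    totals.append(injuries_required(3, levels))

n_substrata = prod(n for _, n in modifiers)
print(totals, n_substrata)
```

Each additional modifier multiplies the requirement by its number of groups, which is why the total climbs from 20 to 1800 injuries across 90 substrata.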
Key point 4B: sports injury researchers should ensure there are at least five injuries in each exposure state to be analysed. Dealing with (multiple) time-varying exposures requires a considerable number of injuries to avoid violating the requirements underpinning the time-to-event analysis. Analysing data without considering the number of injuries in each exposure state will easily lead to sparse data bias.
Time-to-event outcomes question 5: are there considerations when designing my study?
Key question 5: I want to design a new study looking into the association between changes in training load and sports injury. What should I consider when I am designing my data collection?
In the previous section, we presented important assumptions and requirements underpinning the analysis when dealing with the research question 'how much change in training load is too much before sports injury is sustained, among athletes with different characteristics?' We argued that careful attention to the EPV requirement and having at least five injuries in each exposure state is needed to avoid biased results. From experience, we have learnt that many researchers do not consider these requirements.
In most prospective sports injury studies, fewer than 1000 athletes or participants have been included (see table 1 in the accompanying article entitled ‘Time-to-Event Analysis for Sports Injury Research Part 1: Time-Varying Exposures’). Although the workload needed to logistically handle this number of participants is considerable and time-consuming, advanced data analyses involving multiple time-varying exposures and a time-varying injury outcome can require tens of thousands (or possibly millions) of athletes before a sufficient number of injuries is observed to avoid sparse data bias.28 This issue has been noted in the wider epidemiological literature.34 35 Sports injury researchers cannot always conduct the most rigorous study and/or satisfy the theoretical and practical requirements that are necessary to undertake a robust statistical analysis. However, in situations when it is financially and logistically possible to design and conduct a high-quality, large-scale epidemiological cohort study, researchers should make a concerted effort to consider and implement the necessary principles and directives discussed in this article. Moreover, to appropriately advance the science of sports injury control and prevention, sports injury researchers are expected to question the assumptions underpinning statistical analyses, to ask whether there are better ways of analysing data and to ask ‘the right’ questions, while equally challenging contemporary aetiological theories. In doing so, advanced statistical approaches such as time-to-event analyses that are widely used in other disciplines and health science contexts can be raised to the same level of application and scrutiny for sports injury research. Time-to-event analyses offer a range of opportunities regarding modelling approaches (Cox regression vs pseudo-observation method), measures of association and graphical presentations. As these concepts have been presented elsewhere,3 an extensive description is not provided here. However, an overview of the opportunities is provided in table 3.
Table 3
Differences between two time-to-event approaches, the Cox proportional hazards regression model and the generalised linear model (pseudo-observation method)

| Method | Description |
| --- | --- |
| Cox regression | |
| Measure of association | Hazard rate ratio. An injury rate (hazard rate) in each exposure group is estimated and the rates are compared on a relative scale (ratio). |
| Graphical presentation | Individual or average survival curves. |
| Main assumptions | The hazard rate ratio has to be constant (proportional hazard rates). The assumptions behind the Cox model can be validated using a log-minus-log plot. Do not condition on the future. |
| Time-varying exposure | Inclusion of one or more time-varying exposures is possible. |
| Time-varying outcome | Inclusion of a time-varying outcome is possible. |
| Advantage | The difference between groups is calculated across all points of the time scale; hence, only one estimate needs to be presented. |
| Events per variable | 10 |
| Shortcomings | A hazard rate ratio cannot plausibly be interpreted as a relative risk when the injury incidence exceeds 10%, which is mostly the case in sports injury studies. A hazard rate ratio becomes meaningless if the assumption of proportionality is violated. |
| Pseudo-observation method | |
| Measures of association | An injury proportion (cumulative risk) in each exposure group is estimated and the proportions are compared on an additive scale (cumulative risk difference) or on a relative scale (cumulative relative risk). Alternatively, the area under the Kaplan-Meier curve (restricted mean) or under the Aalen-Johansen curve (number of years/sessions/time spent in sport lost) can be estimated and the difference can be compared across exposure groups. |
| Graphical presentation | Kaplan-Meier graph (single event) or Aalen-Johansen graph (competing risk). |
| Main assumptions | Right-censored observations. Do not condition on the future. |
| Time-varying exposure | Inclusion of one or more time-varying exposures is possible. |
| Time-varying outcome | Inclusion of a time-varying outcome is possible. |
| Advantages | Cumulative risk difference and cumulative relative risk are easier to interpret than a hazard rate ratio because the difference between groups is calculated at a single point on the time scale. |
| Events per variable | 10 (risk difference) or 15 (relative risk). |
| Shortcomings | Requires a priori selection (and justification) of one or more time points at which comparisons are made. |

Adapted with permission from Nielsen et al.3
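To make the pseudo-observation method in table 3 more concrete, the sketch below computes jackknife pseudo-observations for the cumulative injury risk at a prespecified time point from a Kaplan-Meier estimate, and then compares their means between two exposure groups to obtain a crude cumulative risk difference. It is a minimal illustration assuming a single injury type and right censoring only; the follow-up data, the 8-week time point and all function names are hypothetical.

```python
def km_survival(times, events, t):
    """Kaplan-Meier estimate of the probability of being injury-free at time t."""
    surv = 1.0
    event_times = sorted({u for u, e in zip(times, events) if e and u <= t})
    for u in event_times:
        at_risk = sum(1 for v in times if v >= u)
        injured = sum(1 for v, e in zip(times, events) if v == u and e)
        surv *= 1 - injured / at_risk
    return surv


def pseudo_observations(times, events, t):
    """Jackknife pseudo-observations for the cumulative risk at time t:
    n * estimate(all) - (n - 1) * estimate(leaving athlete i out)."""
    n = len(times)
    full = 1 - km_survival(times, events, t)
    pseudo = []
    for i in range(n):
        loo_times = times[:i] + times[i + 1:]
        loo_events = events[:i] + events[i + 1:]
        loo = 1 - km_survival(loo_times, loo_events, t)
        pseudo.append(n * full - (n - 1) * loo)
    return pseudo


def mean(values):
    return sum(values) / len(values)


# Hypothetical follow-up: weeks to injury (event=1) or censoring (event=0).
weeks = [2, 3, 3, 5, 6, 8, 8, 9, 10, 12]
event = [1, 1, 0, 1, 0, 1, 1, 0, 1, 0]
exposed = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

po = pseudo_observations(weeks, event, t=8)
risk_difference = (
    mean([p for p, x in zip(po, exposed) if x == 1])
    - mean([p for p, x in zip(po, exposed) if x == 0])
)
print(round(risk_difference, 3))
```

With a binary exposure and an identity link, the difference in group means corresponds to the regression coefficient a generalised linear model would return; adjusted analyses and robust standard errors would need dedicated statistical software.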
Key point 5: researchers must consider: am I able to get the number of injuries needed in order to analyse changes in training load as a time-varying exposure to sports injury? How many injuries are likely to occur in each exposure state (or transition)? How many cut-offs to separate the exposure groups are suitable?
Time-to-event outcomes versus other methods
This article has introduced the concept of time-varying outcomes, including competing risk and subsequent events, in the context of time-to-event modelling. One condition of time-to-event analysis is that the outcome of interest must be expressed as a dichotomous or categorical variable as opposed to continuous data on a ratio-interval scale.3 Nowadays, most data on sports injury are non-continuous, irrespective of whether the outcome definition is time-loss-based, burden-based, medical-attention-based or based on severity. Consequently, time-to-event analyses are appropriate in most cases. However, if injury data are collected on a continuous scale (eg, fluctuating symptoms of a pathology such as tendinopathy), other statistical methods are needed.
Time-to-event outcomes question 6: are there alternative methods?
Key question 6: it is difficult to collect the amount of data needed to avoid violating the assumptions and requirements needed to perform a robust time-to-event analysis on a change in training load-related question. Accordingly, are there any alternative methods that could be considered?
Complex systems and computational modelling have received more attention in the sports injury science literature recently.36 These methods are complementary to traditional statistical modelling and time-to-event analyses. In a small sample setting or in the absence of large-scale data, alternative computational systems science methods, including simulation-based techniques, could be considered alongside, or integrated with, traditional statistical approaches.36 For example, the use of agent-based modelling (ABM) has recently been promoted and discussed as a complementary method for sports injury research.37 Specifically, ABM is a form of computational science that involves modelling the behavioural dynamics of individual micro-entities known as ‘agents’. These agents can interact with one another and learn over time based on past experiences, update their internal ‘states’ autonomously and/or create global patterns of behaviour. In relation to both time-to-event modelling and sports injury aetiology, the clear advantage of ABM lies in its capability to model hundreds or thousands of athletes, each of whom can be assigned real-world demographic (eg, age), biological (eg, sex), lifestyle (eg, diet) and/or training-related (eg, primary workload exposure) characteristics.37
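A deliberately simplified sketch may help readers unfamiliar with ABM picture what such a simulation involves. In the Python example below, each simulated athlete ('agent') carries a training history, draws a new weekly load, is classified into one of the three ACWR states used earlier (<0.8, 0.8–1.3 and >1.3) and sustains an injury with a state-specific weekly probability. Every number in the sketch (loads, probabilities, cohort size, follow-up) is a hypothetical assumption chosen for illustration and not an empirical estimate; the agents neither interact nor learn, so this is far simpler than the models discussed in the cited literature.

```python
import random

# Hypothetical weekly injury probabilities per ACWR state (assumptions only).
WEEKLY_RISK = {"<0.8": 0.01, "0.8-1.3": 0.01, ">1.3": 0.03}


def acwr_state(ratio):
    """Classify an acute:chronic workload ratio into three states."""
    if ratio < 0.8:
        return "<0.8"
    if ratio <= 1.3:
        return "0.8-1.3"
    return ">1.3"


def simulate(n_agents=1000, n_weeks=20, seed=1):
    """Simulate first injuries and count them by the ACWR state in which they occur."""
    rng = random.Random(seed)
    injuries = {state: 0 for state in WEEKLY_RISK}
    for _ in range(n_agents):
        # Four baseline weeks of training load (km per week, hypothetical range).
        history = [rng.uniform(10, 40) for _ in range(4)]
        for _ in range(n_weeks):
            chronic = sum(history[-4:]) / 4
            acute = chronic * rng.uniform(0.6, 1.6)  # this week's load
            state = acwr_state(acute / chronic)
            if rng.random() < WEEKLY_RISK[state]:
                injuries[state] += 1
                break  # follow-up ends at the agent's first injury
            history.append(acute)
    return injuries


counts = simulate()
print(counts, "total injuries:", sum(counts.values()))
```

Because the injury probabilities are supplied by the modeller, the simulated counts reflect the encoded assumptions rather than new evidence, which is the caution raised in the following paragraph.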
We have demonstrated in this paper that in order to conduct a robust statistical sports injury analysis and avoid sparse data bias, the number of injuries observed in each exposure state (or transition) should be at least five. Accordingly, the flexibility of ABM and other simulation-based techniques could offer a potential workaround to the requirements in traditional statistical analyses, especially when sports injury researchers aim to further stratify samples to prioritise and understand how workloads and other time-varying exposures change status during follow-up.30 38 39 With continued application and ingenuity, computational simulations might be able to capture a sufficient number of sports injuries per explanatory variable modelled, affording theoretical insight into the supposed aetiologic mechanism(s). Despite the versatility of computational methods, a word of caution is advised. Unlike traditional statistical modelling, the assumptions underpinning computational models are often based on subject matter knowledge and various other forms of empirical evidence. Thus, the underlying data-driven assumptions and theoretical causal mechanisms encoded into simulations should be explicitly described as a basis for evaluating model predictions.34 35
Key point 6: the use of computational modelling could be considered as a complementary and alternative approach to time-to-event modelling in future sports injury research applications because no consideration of the number of injuries is needed. However, unlike traditional statistical modelling, the assumptions underpinning computational models are often based on subject matter knowledge and various other forms of empirical evidence. If these are wrong, the results from the analyses will be questionable.
Conclusion
In this paper, we have discussed how the concept of time-varying outcomes, including competing risk and subsequent injuries, can be used in time-to-event modelling to investigate injury aetiology in a sports injury context. First, time-to-event models were described that permit the inclusion of time-varying outcomes using the concept of multistate transitions. Second, researchers can consider subsequent injuries using the concept of shared frailty. Third, competing risk was highlighted as it enables researchers to include all types of injuries in their analyses. Finally, we presented often overlooked requirements related to events per variable and the number of injuries in each exposure state. Consideration of these requirements is needed prior to any data collection to avoid conducting statistical analyses on time-to-event data that lead to biased results.
Contributors: All authors contributed equally to writing the educational review. DR performed the analyses leading to the results in Table 2 and Figure 2.
Funding: The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.
Competing interests: None declared.
Patient consent: Obtained.
Ethics approval: Local ethics committee central Denmark region (N-20140069).
Provenance and peer review: Not commissioned; externally peer reviewed.
References
1. Sato Y, Gosho M, Nagashima K, et al. Statistical Methods in the Journal - An Update.
2. Mahmood A, Ullah S, Finch CF. Application of survival models in sports injury prevention research: A systematic review.
3. Nielsen RØ, Malisoux L, Møller M, et al. Shedding light on the etiology of sports injuries: A look behind the scenes of time-to-event analyses.
4. Malisoux L, Chambon N, Urhausen A, et al. Influence of the heel-to-toe drop of standard cushioned running shoes on injury risk in leisure-time runners: A randomized controlled trial with 6-month follow-up.
5. Malisoux L, Chambon N, Delattre N, et al. Injury risk in runners using standard or motion control shoes: a randomised controlled trial with participant and assessor blinding.
6. Møller M, Nielsen RO, Attermann J, et al. Handball load and shoulder injury rate: a 31-week cohort study of 679 elite youth handball players.
7. Nielsen RØ, Parner ET, Nohr EA, et al. Excessive progression in weekly running distance and risk of running-related injuries: an association which varies according to type of injury.
8. Nielsen RO, Bertelsen ML, Parner ET, et al. Running more than three kilometers during the first week of a running regimen may be associated with increased risk of injury in obese novice runners.
9. Ullah S, Gabbett TJ, Finch CF. Statistical modelling for recurrent events: an application to sports injuries.
10. Shrier I, Steele RJ, Zhao M, et al. A multistate framework for the analysis of subsequent injury in sport (M-FASIS).
11. Clarsen B, Myklebust G, Bahr R. Development and validation of a new method for the registration of overuse injuries in sports injury epidemiology: the Oslo Sports Trauma Research Centre (OSTRC) overuse injury questionnaire.
12. Finch CF, Marshall SW. Let us stop throwing out the baby with the bathwater: towards better analysis of longitudinal injury data.
13. Finch CF, Fortington LV. So you want to understand subsequent injuries better? Start by understanding the minimum data collection and reporting requirements.
14. Fortington LV, van der Worp H, van den Akker-Scheek I, et al. Reporting multiple individual injuries in studies of team ball sports: A systematic review of current practice.
15. Gabbett TJ. The training-injury prevention paradox: should athletes be training smarter and harder?
16. Yamato TP, Saragiotto BT, Lopes AD. A consensus definition of running-related injury in recreational runners: a modified Delphi approach.
17. Finch CF, Cook J. Categorising sports injuries in epidemiological studies: the subsequent injury categorisation (SIC) model to address multiple, recurrent and exacerbation of injuries.
18. Toohey LA, Drew MK, Cook JL, et al. Is subsequent lower limb injury associated with previous injury? A systematic review and meta-analysis.
19. Finch CF, Cook J, Kunstler BE, et al. Subsequent injuries are more common than injury recurrences: An analysis of 1 season of prospectively collected injuries in professional Australian football.
20. Olesen AV, Parner ET. Correcting for selection using frailty models.
21. Gabbett TJ, Ullah S, Finch CF. Identifying risk factors for contact injury in professional rugby league players-application of a frailty model for recurrent injury.
22. Putter H, Fiocco M, Geskus RB. Tutorial in biostatistics: competing risks and multi-state models.
23. Andersen PK, Keiding N. Interpretability and importance of functionals in competing risks and multistate models.
24. Andersen PK, Geskus RB, de Witte T, et al. Competing risks in epidemiology: possibilities and pitfalls.
25. Hansen SN, Andersen PK, Parner ET. Events per variable for risk differences and relative risks using pseudo-observations.
26. Peduzzi P, Concato J, Feinstein AR, et al. Importance of events per independent variable in proportional hazards regression analysis. II. Accuracy and precision of regression estimates.
27. Abraira V, Muriel A, Emparanza JI, et al. Reporting quality of survival analyses in medical journals still needs improvement. A minimal requirements proposal.
28. Greenland S, Mansournia MA, Altman DG. Sparse data bias: a problem hiding in plain sight.
29. Peduzzi P, Concato J, Kemper E, et al. A simulation study of the number of events per variable in logistic regression analysis.
30. Bertelsen ML, Hulme A, Petersen J, et al. A framework for the etiology of running-related injuries.
31. Greenland S, Mansournia MA. Penalization, bias reduction, and default priors in logistic and related categorical and survival regressions.
32. Mansournia MA, Geroldinger A, Greenland S, et al. Separation in logistic regression: Causes, consequences, and control.
33. van Dyk N, van der Made AD, Timmins RG, et al. There is strength in numbers for muscle injuries: it is time to establish an international collaborative registry.
34. Hernán MA. Invited commentary: Agent-based models for causal inference - reweighting data and theory in epidemiology.
35. Marshall BD, Galea S. Formalizing the role of agent-based modeling in causal inference and epidemiology.
36. Ip EH, Rahmandad H, Shoham DA, et al. Reconciling statistical and systems science approaches to public health.
37. Hulme A, Thompson J, Nielsen RO, et al. Towards a complex systems approach in sports injury research: simulating running-related injury development with agent-based modelling.
38. Nielsen RO, Bertelsen ML, Moller M, et al. Training load and structure-specific load: Applications for sport injury causality and data analyses.
39. Drew MK, Finch CF. The relationship between training load and injury, illness and soreness: A systematic and literature review.
40. Ramskov D, Nielsen RO, Sørensen H, et al. The design of the run Clever randomized trial: running volume, -intensity and running-related injuries.