How can we prove that a preventive measure in elite sport is effective when the prevalence of the injury (eg, ACL tear in alpine ski racing) is low? A case for surrogate outcomes
  1. Josef Kröll1,
  2. Jörg Spörri1,
  3. Sophie Elspeth Steenstrup2,
  4. Hermann Schwameder1,
  5. Erich Müller1,
  6. Roald Bahr2
  1. 1Department of Sport Science and Kinesiology, University of Salzburg, Hallein-Rif, Austria
  2. 2Department of Sports Medicine, Oslo Sports Trauma Research Center, Norwegian School of Sports Sciences, Ullevål Stadion, Oslo, Norway
  1. Correspondence to Dr Josef Kröll, Department of Sport Science and Kinesiology, University of Salzburg, Schlossallee 49, 5400 Hallein-Rif, Austria; josef.kroell{at}sbg.ac.at


When dealing with small cohorts, as is typical in elite sport, the well-known four-step ‘sequence of prevention’ described by van Mechelen et al1 (figure 1) potentially represents a vicious circle: When introducing a prevention measure, an otherwise reasonable call for targeting specific subgroups (ie, relevant groups of athletes, injury locations and specific injury causes) may undermine study power, breaking down an already-small baseline cohort into undersized pieces. Consequently, statistical testing becomes impossible.

Figure 1

The four-step ‘sequence of prevention’ as described by van Mechelen et al.1

To illustrate the problem we (1) discuss a recently implemented preventive measure in alpine ski racing as an example, (2) highlight the influence of sample size and effect size on study power and the possibility for statistical hypothesis testing and (3) provide a solution to increase study power for comparable injury prevention initiatives in elite sports.

General effects but underpowered in subgroups

In elite alpine ski racing, we recently tested potential preventive measures (eg, ski equipment changes) that target a specific body part (eg, knee/ACL injuries),2 their specific mechanisms (eg, aggressive ski–snow interaction driven by the skiing equipment)3 4 and specific disciplines (eg, different ski alterations in downhill, super-G and giant slalom).5 Based on this research process,6 the International Ski Federation (FIS) introduced new equipment rules for the 2012–2013 season.

The effect of these changes was assessed by repeating step 1 of the van Mechelen model.7 Pre–post comparison data documented that the overall relative injury rate was reduced substantially and statistically significantly, by 24%, in the three seasons after implementing the new ski regulations compared with the six seasons before (risk ratio 0.76, 95% CI 0.59 to 0.98).7 However, due to limited statistical power within subgroups, the ability to draw conclusions on the effects on specific injury types (eg, knee/ACL injuries) in specific disciplines (eg, giant slalom) is severely restricted.7 This is illustrated in table 1, which includes data before and after changing the ski regulations. The confidence intervals overlap substantially, and no statistically significant differences were detected for ACL injuries within any of the subdisciplines, even though the risk ratio for giant slalom was 0.65.
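To make the subgroup problem concrete, the following minimal sketch computes a rate ratio with a Wald confidence interval on the log scale. This is one common textbook approach and not necessarily the method used in the cited study; the function name `rate_ratio_ci` and all counts are hypothetical and are not the table 1 data.

```python
import math


def rate_ratio_ci(events_pre, runs_pre, events_post, runs_post, z=1.96):
    """Rate ratio (post vs pre) with a Wald CI computed on the log scale.

    Assumes both event counts are greater than zero.
    """
    rate_pre = events_pre / runs_pre
    rate_post = events_post / runs_post
    rr = rate_post / rate_pre
    # Standard error of ln(RR) for two independent Poisson counts
    se_log_rr = math.sqrt(1 / events_pre + 1 / events_post)
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


# Hypothetical counts chosen only to give a ratio of 0.65
rr, lower, upper = rate_ratio_ci(events_pre=20, runs_pre=10_000,
                                 events_post=13, runs_post=10_000)
print(f"RR = {rr:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

With so few events, the interval around a ratio of 0.65 spans 1, so the reduction would not count as statistically significant even though the point estimate suggests a sizeable effect.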

Table 1

Number of ACL ruptures, runs and relative injury rate (injuries per 1000 competition runs with 95% CI) for each discipline

Statistically testing the effectiveness of prevention measures: mission impossible in elite sports?

Statistical hypothesis testing is a dichotomous process: the null hypothesis is either rejected in favour of the alternative hypothesis or it is not. Problems associated with this categorical cut-off of p values were discussed in a recent BJSM editorial.8 If a p value exceeds the predefined significance level, there is a risk of making a type II error, that is, concluding that there is no significant difference between groups when, in fact, such a difference exists. The statistical power is inversely related to the risk of making such a type II error and is determined by (1) effect size (larger effects are easier to detect) and (2) sample size (larger numbers of observations or cases make it easier to detect a difference).
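The dependence of power on effect size and sample size can be sketched with a normal approximation for a two-sided Wald test on the log rate ratio. This is an illustrative approximation, not the calculation used in the article; `approx_power`, its parameters and the event counts are hypothetical.

```python
import math
from statistics import NormalDist


def approx_power(expected_pre, rate_ratio, exposure_ratio=1.0, alpha=0.05):
    """Approximate power of a two-sided Wald test on ln(rate ratio).

    expected_pre   : expected number of events in the pre period
    rate_ratio     : true post/pre rate ratio (must be > 0)
    exposure_ratio : post exposure divided by pre exposure
    """
    expected_post = expected_pre * rate_ratio * exposure_ratio
    se = math.sqrt(1 / expected_pre + 1 / expected_post)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Probability that the test statistic passes the critical value
    # in the direction of the true effect (the opposite tail is ignored)
    return NormalDist().cdf(abs(math.log(rate_ratio)) / se - z_crit)


# Hypothetical scenarios: more events or a larger effect both raise power
for events, rr in [(20, 0.65), (200, 0.65), (20, 0.30)]:
    print(events, rr, round(approx_power(events, rr), 2))
```

Under this approximation, a ratio of 0.65 with about 20 expected pre-period events gives power well below the conventional 80%, whereas ten times as many events or a much stronger effect raises it markedly.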

Figure 2 models how a larger effect would change our example, that is, if the number of postintervention ACL injury cases (after the ski equipment change in 2012) could be reduced to an extremely low level while the number of observations (skier runs) stayed constant. For giant slalom, the difference would reach significance only if ACL injuries were reduced to one or zero cases. To restrict ACL injuries to just one case, the risk ratio would need to be 0.09 (ie, a 91% risk reduction); this is highly unrealistic for an injury with multiple causes.

Figure 2

Observed and hypothetical relative ACL rupture incidences (injuries per 1000 competition runs with 95% CI) for the giant slalom discipline: Pre (2006/2007–2011/2012) represents six seasons before, and Post (2012/2013–2015/2016) represents four seasons after the change in ski regulations. Post (maximised effect) represents simulated data with artificially increased effects (only one case of ACL injury). The incidence 95% CI calculations are based on a Poisson model, as described in table 1.
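The caption states that the intervals are based on a Poisson model but does not say which variant. As an assumption, the sketch below uses the exact (Garwood) interval for a Poisson count, solved by bisection with the standard library only; the helper names and the number of runs are hypothetical.

```python
import math


def poisson_cdf(k, mu):
    """P(X <= k) for X ~ Poisson(mu)."""
    term = math.exp(-mu)
    total = term
    for i in range(1, k + 1):
        term *= mu / i
        total += term
    return total


def _bisect_decreasing(f, lo, hi, iters=200):
    """Root of a decreasing function f with f(lo) > 0 >= f(hi)."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_poisson_ci(k, alpha=0.05):
    """Exact two-sided CI for the mean of a Poisson count k."""
    if k == 0:
        lower = 0.0
    else:
        # Lower limit: P(X >= k | mu) = alpha / 2
        lower = _bisect_decreasing(
            lambda mu: poisson_cdf(k - 1, mu) - (1 - alpha / 2), 0.0, float(k))
    # Upper limit: P(X <= k | mu) = alpha / 2; widen the bracket as needed
    hi = k + 1.0
    while poisson_cdf(k, hi) > alpha / 2:
        hi *= 2
    upper = _bisect_decreasing(
        lambda mu: poisson_cdf(k, mu) - alpha / 2, float(k), hi)
    return lower, upper


def incidence_per_1000(k, runs, alpha=0.05):
    """Incidence per 1000 runs with its exact Poisson CI."""
    lower, upper = exact_poisson_ci(k, alpha)
    return tuple(1000 * x / runs for x in (k, lower, upper))


# Hypothetical exposure: one injury in 5000 runs
print(incidence_per_1000(1, 5000))
```

Even a single observed case carries a wide interval on the count scale, which is why only an extreme reduction in cases separates the pre and post intervals.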

The other option, increasing statistical power by increasing the sample size, would mean either collecting more injury data within the same observation period or extending the observation period. The FIS Injury Surveillance System already captures injury and exposure data on 75% of all World Cup skiers,2 so little additional data can be gained within the same period. What about extending the observation period? For giant slalom, assuming that the ACL injury incidence rate stayed the same (risk ratio 0.65; table 1), we would have to observe for 44 years before and a further 44 years after introducing the new equipment to detect a statistically significant reduction in ACL injuries! This too is unrealistic.
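The logic of extending the observation period can be sketched as follows. Assuming equal pre and post exposure and a Wald test on the log rate ratio, the observed ratio just reaches two-sided significance when the expected pre-period event count E satisfies E ≥ (z / ln RR)² × (1 + 1/RR). This is not the authors' calculation and is not intended to reproduce the 44-year figure; the function names and the events-per-season value are hypothetical.

```python
import math
from statistics import NormalDist


def min_pre_events(rate_ratio, alpha=0.05):
    """Expected pre-period events at which the ratio just reaches
    two-sided significance, assuming equal pre and post exposure."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return (z / math.log(rate_ratio)) ** 2 * (1 + 1 / rate_ratio)


def seasons_needed(rate_ratio, pre_events_per_season, alpha=0.05):
    """Seasons required before (and again after) the intervention."""
    return math.ceil(min_pre_events(rate_ratio, alpha) / pre_events_per_season)


# Hypothetical baseline of 2 ACL injuries per season in one discipline
print(round(min_pre_events(0.65), 1))
print(seasons_needed(0.65, pre_events_per_season=2.0))
```

Because the required number of events is fixed by the effect size, a low baseline injury count per season translates directly into decades of observation on each side of the intervention.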

Between a rock and a hard place: not just in skiing

Our example illustrates that under certain circumstances, a specific and targeted prevention measure cannot be evaluated by classical statistical hypothesis testing. However, it would be inappropriate to state that a preventive program is ineffective if there are non-significant results and the statistical power is low.

Note that the problem of underpowered studies is not restricted to alpine skiing. Consider testing the effect of stricter interpretation of the Laws of the Game in professional football on contact injuries: A priori power calculations showed that to detect an effect on head injury, injury risk would have to be reduced by as much as 70%, which is unrealistic.9 In summary, we have made the case that when dealing with small cohorts and targeting specific injury types, studies are likely to be underpowered. Hence, is step 4 of the van Mechelen model (ie, testing the effectiveness of prevention measures)1 mission impossible in elite sports?

A potential solution

For the above-mentioned football study, the solution proved to be a surrogate outcome measure. Instead of counting only situations where contact injuries actually occurred, the authors counted contact-related match interruptions after head impacts, which were believed to represent a measure of head injury risk.9 This provided a larger number of observations, which narrowed the confidence interval and, in turn, lowered the risk of erroneously concluding that there was no significant difference between groups. Statistical power was thus increased for the surrogate outcome measure (head-contact-related interruptions rather than head injuries alone).

Applying this approach to our example in alpine ski racing would mean assessing all incidents where skiers did not finish their run and/or experienced events similar to those that typically lead to ACL injury.4 We conclude that identifying appropriate surrogate measures of injury risk would increase statistical power when testing the effectiveness of preventive measures in elite sports. This would be valid as long as the observed (surrogate) incidents were frequently associated with injury.

References

Footnotes

  • Contributors JK designed and conceptualised the paper. SES and RB provided the new ACL data. JK and JS wrote the first draft of the paper. All authors contributed to the manuscript and approved the final version.

  • Competing interests None declared.

  • Provenance and peer review Not commissioned; externally peer reviewed.