P values have been the subject of debate for decades. Many researchers tend to think of, or at least describe, their study outcomes as either true or false based solely on p values.1 While a recent British Journal of Sports Medicine editorial provided a primer to help sports medicine researchers understand p values correctly,2 we aim to extend this discussion by reminding researchers to adopt a thoughtful approach to the entire statistical analysis process, using the relevant tools to build their case.
The goal of the sports medicine researcher—a building analogy
Scientific reasoning can be viewed as building a case for the existence (or non-existence) of phenomena. For sports medicine researchers, specific goals may include building the case for or against a treatment’s effectiveness, quantifying the risk of injury associated with certain risk factors, or establishing the ability to predict a certain outcome given a set of criteria.
As with any building process, a number of steps are required. In this analogy, the process should include the following: designing the blueprint (preregistering studies where possible and outlining planned analyses), laying a foundation of sound data collection (appropriate sampling strategy, blinding where possible, ensuring measurement validity/reliability), understanding the flooring and roofing of study power (effect size, sample size and others), and painting the internal/external validity of the study. The entire process takes place in the neighbourhood of the research field, where existing knowledge informs the believability of the hypothesis and the minimally important differences. Finally, for the integrity of the building, researchers should stick with the building plan during the study and follow the relevant reporting guideline when writing up (eg, Strengthening the Reporting of Observational Studies in Epidemiology (STROBE), Consolidated Standards of Reporting Trials (CONSORT) and others).3
What tools do researchers have?
Data analysis and interpretation form an important step in this scientific building process. Critical, thoughtful analysis has too often been replaced with a binary decision of importance based solely on the p value.4 This is not good practice. Instead, a researcher should carefully consider which tools are available, choosing the ones that most closely align with their research question. Figure 1 lists some tools (generally used within the frequentist/null hypothesis significance testing framework) that should be part of the researchers’ toolkit and used on a ‘fit-for-purpose’ basis. The biological concept of ‘form follows function’ is meant for a different setting, but it comes to mind here: the function is key.
Subject matter knowledge
Content knowledge of the field helps researchers know what stage of research is appropriate (eg, exploratory vs confirmatory, aetiology vs effectiveness vs implementation). Further, it informs how substantial an effect must be before it is practically relevant. It is also the substrate within which research findings are interpreted, not by positioning ‘significant’ versus ‘non-significant’ findings against one another, but by critically evaluating various research designs, magnitudes of effect and inferences from significance tests.
In many areas of medical research, effect size carries more weight than statistical significance, because it answers the questions researchers really want answered: ‘how well does a treatment work?’ and ‘how large is the risk?’.5 Effect sizes allow researchers to interpret whether the findings are practically relevant, not just whether they have enough precision to be considered ‘statistically significant’. It is always advisable to accompany statistical tests with some measure of effect size.6 Moreover, relative effects (eg, relative risks, odds ratios) should be reported alongside absolute effects (eg, risk differences), so that readers may understand an effect or association more clearly.
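To illustrate why relative and absolute effects are best read together, the following minimal sketch compares two scenarios. The injury counts are hypothetical and chosen only for illustration, and the function name `risk_summary` is an assumption of this example rather than part of any library.

```python
def risk_summary(events_exposed, n_exposed, events_control, n_control):
    """Return group risks plus one relative and one absolute effect measure."""
    risk_exposed = events_exposed / n_exposed
    risk_control = events_control / n_control
    return {
        "risk_exposed": risk_exposed,
        "risk_control": risk_control,
        # Relative effect: how many times higher the risk is in the exposed group.
        "relative_risk": risk_exposed / risk_control,
        # Absolute effect: how many extra injuries per athlete exposed.
        "risk_difference": risk_exposed - risk_control,
    }


# Hypothetical counts: a common injury and a rare injury.
common = risk_summary(40, 100, 20, 100)
rare = risk_summary(4, 1000, 2, 1000)

for label, result in (("common injury", common), ("rare injury", rare)):
    print(
        f"{label}: relative risk = {result['relative_risk']:.2f}, "
        f"risk difference = {result['risk_difference']:.3f}"
    )
```

In both hypothetical scenarios the risk doubles, yet the absolute difference is 20 extra injuries per 100 athletes in the first and 2 per 1000 in the second, which may lead to quite different practical decisions.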
Along with effect sizes, confidence intervals (CIs) are part of the ‘New Statistics’, which propose estimation as the antidote to significance testing.7 If one calculates, say, 95% CIs repeatedly in valid applications, 95% of them, on average, will contain (ie, include or cover) the true effect size. The 95% is a property of a long sequence of CIs computed from valid statistical models and independent experiments, rather than a property of any single CI. Because CIs are mathematically connected to the p value, they share its sensitivity to sample size,8 but many champion them over p values because they convey information about effect magnitudes and their precision, rather than inviting a binary interpretation based on whether they cross the null value (eg, 0 for a difference, 1 for a ratio).
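The long-run interpretation of a 95% CI can be checked with a small simulation. The sketch below is illustrative only: it assumes normally distributed data with a known SD (so a simple z-interval for the mean is valid), and the true mean, SD, sample size and number of simulated studies are arbitrary choices, not values from any real study.

```python
import math
import random
from statistics import NormalDist, fmean


def coverage(true_mean=0.5, sd=1.0, n=30, n_studies=5000, level=0.95, seed=1):
    """Proportion of simulated studies whose CI contains the true mean."""
    rng = random.Random(seed)
    # Two-sided critical value from the standard normal (about 1.96 for 95%).
    z = NormalDist().inv_cdf(0.5 + level / 2)
    # With a known SD, every study has the same CI half-width.
    half_width = z * sd / math.sqrt(n)
    hits = 0
    for _ in range(n_studies):
        sample = [rng.gauss(true_mean, sd) for _ in range(n)]
        sample_mean = fmean(sample)
        if sample_mean - half_width <= true_mean <= sample_mean + half_width:
            hits += 1
    return hits / n_studies


estimated = coverage()
print(f"Proportion of intervals covering the true mean: {estimated:.3f}")
```

Under these assumptions the proportion should land close to 0.95. Any single interval either contains the true mean or does not; the 95% describes the procedure across repeated studies.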
Minimal important difference
When is a study result important for athletes, clinicians and coaches? The minimal important difference may provide answers, since it offers a more formalised way for researchers to convey their effect sizes and CIs in the context of their specific research area.4 By using a predetermined value for the ‘minimal important difference’, below which all values are deemed ‘trivial’, researchers may avoid the trap of accepting a small, but statistically significant, finding as meaningful in the real world. Magnitude-based inferences are also based on this idea, where qualitative inferences are calculated for the likelihood that the effects are negative, trivial or beneficial.9
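One simple way to read a CI against a predetermined minimal important difference (MID) is sketched below. This is an illustrative rule of thumb assumed for this example, not the magnitude-based inference method and not a procedure prescribed by the editorial; the function name, the three labels and the numeric inputs are all hypothetical.

```python
def interpret_against_mid(ci_lower, ci_upper, mid):
    """Label a CI for a difference relative to a minimal important difference.

    Returns (label, excludes_zero), where excludes_zero mirrors the usual
    'statistically significant' reading of a CI for a difference.
    """
    if mid <= 0:
        raise ValueError("mid must be positive")
    if ci_lower > ci_upper:
        raise ValueError("ci_lower must not exceed ci_upper")

    excludes_zero = ci_lower > 0 or ci_upper < 0

    if ci_lower >= mid or ci_upper <= -mid:
        # The whole interval lies at or beyond the MID in one direction.
        label = "important"
    elif -mid < ci_lower and ci_upper < mid:
        # The whole interval lies inside the band of trivial effects.
        label = "trivial"
    else:
        # The interval spans the MID, so the data cannot settle the question.
        label = "inconclusive"
    return label, excludes_zero


# Hypothetical CIs on an outcome scale where the MID is 1.0 unit.
examples = [(0.2, 0.8), (1.2, 3.0), (-0.5, 2.5)]
results = [interpret_against_mid(lo, hi, mid=1.0) for lo, hi in examples]
for (lo, hi), (label, excludes_zero) in zip(examples, results):
    print(f"CI {lo} to {hi}: {label}, excludes zero: {excludes_zero}")
```

The first hypothetical interval excludes zero yet sits entirely below the MID, which is the 'statistically significant but trivial' situation described above.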
Clearly, the p value is a limited tool.5 However, researchers may find p values helpful in specific situations, reporting them as complementary measures with other tools. For example, some authors may conduct exploratory research, using p values as an ‘unexpectedness test’.5
What tools should sports medicine researchers build with?
There is no ‘correct’ statistical tool, only the ones that fit the authors’ analytical goals. Note that researchers may adopt an entirely different framework for their analysis (eg, Bayesian analyses). When critically selecting from the interpretational tools listed above, or others not listed here, researchers should thoughtfully articulate their findings within their research context, and they may even do so without p values. More power to them. Unthinking, uncritical use of any tool, statistical or otherwise, is dangerous10: it can lead to incorrect interpretations of data, and thus inappropriate patient management and athlete advice. Researchers should avail themselves of the many statistical tools at their disposal, critically selecting and justifying the right tool for each specific analysis, ideally with the help of a statistician and/or an epidemiologist.
Contributors JW wrote the first draft of the manuscript. All three authors contributed to the concept and critical revision of the manuscript.
Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors. At the time this editorial was written, JW was a Vanier Scholar, funded by the Canadian Institutes of Health Research.
Competing interests None declared.
Patient consent Not required.
Provenance and peer review Not commissioned; externally peer reviewed.