Randomised trials are widely considered the ‘gold standard’ for causal inference, because on average randomisation balances covariates between treatment groups, even if those covariates are unobserved. However, trials are not immune to random confounding, selection bias or measurement bias. Therefore, special care is needed in the design and analysis stages of randomised trials. Here, we review some important methodological aspects of randomised controlled trials in the context of a recently published paper, which assessed the effect of the McKenzie method of mechanical diagnosis and therapy on pain and disability in patients with chronic, non-specific, low back pain using a randomised placebo-controlled trial.1
Method of randomisation
Garcia et al state that they randomly assigned 148 participants to two groups of similar sizes, that is, 74 patients per group, using simple randomisation. However, simple (unrestricted) randomisation, equivalent to repeated fair coin tossing, can lead to treatment groups of markedly different sizes in small trials and thus to imprecise effect estimates. In fact, the authors were very fortunate, as the probability of complete balance in their study is just 6.5%, and the probability of an imbalance equal to or greater than 10 (ie, 79 vs 69) is non-negligible (46.0%). Balanced block randomisation with, say, 37 blocks of size 4, would have ensured balance in the number of patients allocated to intervention or placebo in this study, although its sequence is more predictable than that of simple randomisation (to mitigate the latter problem, one can use larger block sizes and randomly vary the block size). More importantly, balanced blocking prevents substantial periodic imbalance and thus is often recommended for assignment in randomised trials.2 Successful randomisation also depends on allocation concealment, which the authors achieved by use of sealed, opaque, numbered envelopes. Failure to conceal the assignments at the point of enrolment is a well-known cause of bias.3
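The two probabilities quoted above follow from the binomial distribution with 148 fair coin tosses, and permuted blocks are simple to generate. The following is a minimal sketch, not the trial's actual procedure; the function names, the arm labels and the seed are ours and purely illustrative.

```python
import random
from math import comb


def prob_exact_balance(n):
    """P(both arms receive n/2 patients) under simple 1:1 randomisation."""
    if n % 2:
        return 0.0  # exact balance is impossible with an odd total
    return comb(n, n // 2) / 2 ** n


def prob_imbalance_at_least(n, d):
    """P(|n1 - n0| >= d), where n1 ~ Binomial(n, 0.5) and n0 = n - n1."""
    favourable = sum(comb(n, k) for k in range(n + 1) if abs(2 * k - n) >= d)
    return favourable / 2 ** n


def permuted_blocks(n_blocks, block_size, rng):
    """Allocation sequence built from balanced blocks, each shuffled independently."""
    sequence = []
    for _ in range(n_blocks):
        half = block_size // 2
        block = ["intervention"] * half + ["placebo"] * half
        rng.shuffle(block)
        sequence.extend(block)
    return sequence


p_balance = prob_exact_balance(148)            # chance of a 74 vs 74 split
p_imbalance = prob_imbalance_at_least(148, 10)  # chance of 79 vs 69 or worse
# 37 blocks of size 4 give 148 allocations; the seed is arbitrary.
allocation = permuted_blocks(37, 4, random.Random(2018))
print(f"{p_balance:.3f} {p_imbalance:.3f} {allocation.count('intervention')}")
```

With a fixed block size of 4, the last allocation in every block is fully predictable once the first three are known, which is why varying the block size at random is usually advised.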
Adjustment for the baseline values of the outcome
Randomised trials are subject to random (chance) confounding as randomisation balances all baseline covariates only in expectation, and a particular allocation could be imbalanced with respect to baseline risk factors.4 An important potential confounder is the outcome at baseline; thus analysis of covariance (ANCOVA), with baseline values of the outcome as a covariate, has been recommended for the analysis of randomised trials with one follow-up visit.5 6 Unknown to many researchers, the analysis of change scores, that is, the difference between follow-up and baseline values, does not adjust for the baseline values, because the differences remain correlated with the baseline values, a phenomenon sometimes known as regression to the mean. Note that the effect estimates from ANCOVA and the analysis of change scores are the same if the outcome baseline values are exactly balanced, although ANCOVA may still be preferred in terms of the precision of the effect estimate. Extending ANCOVA to the analysis of randomised trials with more than one follow-up visit will be discussed in the next point.
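The relation between the two estimators can be made concrete. With a common slope, the ANCOVA estimate of the treatment effect equals the difference in follow-up means minus the pooled within-group slope times the difference in baseline means, whereas the change-score estimate fixes that slope at 1. The sketch below uses small invented data sets (not data from the trial) to show that the two agree when baseline means are exactly balanced and differ otherwise; all names are ours.

```python
from statistics import mean


def pooled_within_slope(baseline, followup, arm):
    """Slope of follow-up on baseline, pooled over the two arms (the ANCOVA slope)."""
    num = den = 0.0
    for g in (0, 1):
        xs = [x for x, a in zip(baseline, arm) if a == g]
        ys = [y for y, a in zip(followup, arm) if a == g]
        mx, my = mean(xs), mean(ys)
        num += sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        den += sum((x - mx) ** 2 for x in xs)
    return num / den


def effect_estimates(baseline, followup, arm):
    """Treatment effect estimated by change scores and by ANCOVA."""
    dx = (mean(x for x, a in zip(baseline, arm) if a == 1)
          - mean(x for x, a in zip(baseline, arm) if a == 0))
    dy = (mean(y for y, a in zip(followup, arm) if a == 1)
          - mean(y for y, a in zip(followup, arm) if a == 0))
    slope = pooled_within_slope(baseline, followup, arm)
    return {
        "change_score": dy - dx,      # implicitly assumes a slope of 1
        "ancova": dy - slope * dx,    # uses the estimated slope
    }


# Invented pain scores: three treated (arm 1) and three control (arm 0) patients.
arm = [1, 1, 1, 0, 0, 0]
followup = [2, 3, 5, 4, 5, 6]
balanced = effect_estimates([4, 6, 8, 5, 6, 7], followup, arm)    # baseline means 6 vs 6
imbalanced = effect_estimates([5, 7, 9, 5, 6, 7], followup, arm)  # baseline means 7 vs 6
print(balanced, imbalanced)
```

In the imbalanced data set the estimated slope is below 1, so the change-score analysis over-corrects for the baseline difference relative to ANCOVA.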
The baseline values of the outcome of pain intensity are not substantially imbalanced (table 1 of the paper), so we are not very concerned about confounding bias, but adjustment for baseline values can still be helpful for increasing efficiency as the model used was linear. The imbalance in baseline values of disability does not seem to be negligible. It is good that the authors did not report significance tests of baseline differences, a still common misuse in the literature; adjustment for variables that differ significantly at baseline is likely to bias the treatment effect estimate.3
Analysis of longitudinal data from randomised trials with more than one follow-up visit
According to the design of Garcia and colleagues’ paper, patients had to attend four follow-up visits: at the end of treatment (5 weeks), and 3, 6 and 12 months after randomisation. In this trial, the prespecified primary outcome was at 5 weeks follow-up, which does not preclude an analysis using all time points. An important point is that the analysis method should account for within-subject correlation in repeated outcome measurements. One possibility is using random-effect models, of which repeated measures analysis of variance is a special case; another is the generalised estimating equations method.7 There is also a simpler approach, called summary measures or response feature analysis, where, in the first step, one removes the within-subject correlation by combining the repeated measures on each individual into a suitable summary measure (eg, average outcome over time), which is then analysed in a second step.8
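The two steps of the summary measures approach can be sketched as follows. This is only an illustration with invented values for six hypothetical patients, each with four follow-up measurements and no missing visits; the summary chosen here is the unweighted mean over visits, and the function and variable names are ours.

```python
from math import sqrt
from statistics import mean, stdev


def summary_measure_comparison(outcomes_by_patient, arm_by_patient):
    """Compare arms on a per-patient summary of repeated measurements."""
    # Step 1: one number per patient, so observations are independent again.
    summaries = {pid: mean(values) for pid, values in outcomes_by_patient.items()}
    treated = [s for pid, s in summaries.items() if arm_by_patient[pid] == 1]
    control = [s for pid, s in summaries.items() if arm_by_patient[pid] == 0]
    # Step 2: an ordinary two-sample comparison of the summaries
    # (difference in means with an unequal-variance standard error).
    diff = mean(treated) - mean(control)
    se = sqrt(stdev(treated) ** 2 / len(treated) + stdev(control) ** 2 / len(control))
    return diff, se


# Invented pain scores at the four follow-up visits.
outcomes = {
    "p1": [2, 2, 4, 4], "p2": [1, 3, 3, 5], "p3": [4, 4, 6, 6],
    "p4": [5, 5, 5, 5], "p5": [4, 6, 6, 8], "p6": [6, 6, 8, 8],
}
arms = {"p1": 1, "p2": 1, "p3": 1, "p4": 0, "p5": 0, "p6": 0}
diff, se = summary_measure_comparison(outcomes, arms)
print(f"difference in mean summary = {diff:.2f}, SE = {se:.2f}")
```

Because the visits are unequally spaced, an area under the curve divided by follow-up time would be another reasonable summary; the choice should be prespecified.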
There are two general approaches to assessing treatment effects. One approach is testing the interaction between treatment group and time, which is a generalisation of the analysis of change scores mentioned in the previous point. Like the analysis of change scores, a disadvantage of this approach is that it does not account for the baseline values of the outcome. In the statistical methods section of the paper, the authors state that they used interaction terms between treatment groups and time in a linear mixed model. Unfortunately, they did not report the p value for the global interaction test. A better approach starts the analysis from the 5-week visit, that is, time is coded as months since the end of treatment (at week 5), and adjusts for the baseline values of the outcome. This approach can estimate the effect of treatment at week 5 and the mean change in the effect of treatment per month after the end of treatment.7
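One way to write the second approach as a model is given below. The notation is ours, and the random intercept is an assumption made for illustration; the paper under discussion does not specify this model.

```latex
% Y_{ij}: outcome of patient i at follow-up visit j
% T_i:    treatment indicator (1 = intervention, 0 = placebo)
% t_j:    months since the end of treatment (t_j = 0 at the week-5 visit)
% Y_{i0}: baseline value of the outcome
% b_i:    patient-specific random intercept (assumed), \varepsilon_{ij}: residual error
\[
Y_{ij} = \beta_0 + \beta_1 T_i + \beta_2 t_j + \beta_3 T_i t_j + \beta_4 Y_{i0} + b_i + \varepsilon_{ij}
\]
```

Under this coding, \(\beta_1\) is the baseline-adjusted treatment effect at week 5 and \(\beta_3\) is the mean change in the treatment effect per month thereafter.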
Clinically important effect size
The effect size estimate for pain intensity was 1.00 (95% CI 0.01 to 2.09) on a 10-point scale. The authors had used a 1-point change in pain intensity in the sample size calculation at the design stage. The authors concluded that ‘We found a small and likely not clinically relevant difference in pain intensity…’, because, based on some references, a 2-point change was considered the minimal detectable change for the numerical pain rating scale.
We draw attention to two points: (1) if a 1-point change in pain intensity is not a clinically important effect size, it should not be used in the sample size calculation section of the paper. Of course, the required sample size with a 2-point change is smaller than the actual sample size calculated for a 1-point change, so there is no concern about the power of the study; and (2) the clinically important effect size should ideally be determined based on clinical considerations, for example, the important consequences of the outcome.9 In the absence of such information, one can gauge the effect size in relation to its SD in the studied population. As an example, the estimated 1-point difference in pain intensity is more than one half of its baseline SD reported in table 1, which would be at least a medium effect size based on Cohen’s rule.10
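Both points can be illustrated with the usual normal-approximation formula for comparing two means, in which the sample size per group is proportional to (SD/difference) squared. The sketch below is not the authors' calculation: the SD of 1.8 is a hypothetical placeholder (the text above implies only that the baseline SD is below 2), and the two-sided 5% significance level and 80% power are our assumptions.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(delta, sd, alpha=0.05, power=0.80):
    """Approximate sample size per arm to detect a mean difference `delta`."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return ceil(2 * (z_alpha + z_beta) ** 2 * sd ** 2 / delta ** 2)


def standardised_effect(difference, sd):
    """Effect size in SD units (Cohen's d with a given SD)."""
    return difference / sd


HYPOTHETICAL_SD = 1.8  # placeholder, NOT the value reported in table 1 of the paper

n_for_1_point = n_per_group(1.0, HYPOTHETICAL_SD)
n_for_2_points = n_per_group(2.0, HYPOTHETICAL_SD)
d = standardised_effect(1.0, HYPOTHETICAL_SD)
print(n_for_1_point, n_for_2_points, round(d, 2))
```

Doubling the target difference cuts the required sample size to roughly a quarter, so a trial sized for a 1-point difference is comfortably powered for a 2-point difference.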
Postrandomisation exclusion and intention-to-treat analysis
The paper states that the analysis was intention-to-treat. However, one participant was excluded after randomisation because he had a diagnosis of cancer during the period of treatment. Any exclusion after randomisation violates the intention-to-treat principle and could introduce selection bias.11 However, when ineligible participants are mistakenly included, investigators could safely remove them without violating the intention-to-treat principle, if the decision only relies on information that reflects the patient’s status before randomisation. We note that here only one patient was excluded after randomisation, so the impact would be negligible irrespective of the reason for exclusion.
It should be noted that the intention-to-treat effect can only be estimated in the absence of censoring and other forms of missing outcomes, as is the case in Garcia and colleagues’ study. In the presence of a non-negligible number of missing outcomes, the analysis of randomised trials requires appropriate adjustment for selection bias using multiple imputation or inverse probability weighting to estimate the intention-to-treat effects.12 13
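To show what inverse probability weighting does, the sketch below weights each patient with an observed outcome by the inverse of the estimated probability of being observed, given arm and a single baseline stratum. It is a toy illustration with invented records, not an analysis of the trial (which had no such missingness); the field names and the stratum variable are hypothetical, and it assumes outcomes are missing at random within each arm-by-stratum cell and that every cell contains at least one observed outcome.

```python
from collections import defaultdict


def ipw_mean_difference(records):
    """Difference in mean outcome (arm 1 minus arm 0), weighted for missing outcomes."""
    # Step 1: estimate P(outcome observed | arm, stratum) from cell counts.
    n_total = defaultdict(int)
    n_observed = defaultdict(int)
    for r in records:
        cell = (r["arm"], r["stratum"])
        n_total[cell] += 1
        if r["outcome"] is not None:
            n_observed[cell] += 1
    # Step 2: weighted mean per arm, weight = 1 / P(observed) for each observed patient.
    weighted_sum = {0: 0.0, 1: 0.0}
    weight_total = {0: 0.0, 1: 0.0}
    for r in records:
        if r["outcome"] is None:
            continue
        cell = (r["arm"], r["stratum"])
        w = n_total[cell] / n_observed[cell]
        weighted_sum[r["arm"]] += w * r["outcome"]
        weight_total[r["arm"]] += w
    return weighted_sum[1] / weight_total[1] - weighted_sum[0] / weight_total[0]


def make(arm, stratum, outcomes):
    return [{"arm": arm, "stratum": stratum, "outcome": y} for y in outcomes]


# Invented data: half of the 'high' stratum in arm 1 is lost to follow-up.
records = (make(1, "high", [6, 6, None, None]) + make(1, "low", [2, 2])
           + make(0, "high", [7, 7, 7, 7]) + make(0, "low", [3, 3]))
ipw_difference = ipw_mean_difference(records)
print(round(ipw_difference, 2))
```

In these invented data a complete-case comparison would overstate the benefit, because the patients lost to follow-up come from the stratum with higher pain scores; the weights restore that stratum's share of the randomised sample.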
Blinding of outcome assessors
One important point about Garcia and colleagues’ paper is that assessment of outcome can be done blind even when the therapy cannot be delivered blinded. The authors assessed the success of blinding by asking the assessor after the trial, which is not a good idea: if the active intervention is indeed beneficial, his/her guesses are expected to be better than those produced by chance.3
As a service to the BJSM community, we have illustrated a few important methods issues by using examples from this paper. The randomised trial is good in terms of methodology; the validity of results would have been strengthened if the authors had addressed the critical issues mentioned in this commentary. As there are many ways in which researchers need to take care in how they design, analyse and interpret trials, BJSM will continue to feature methods advances in stand-alone educational features and alongside relevant papers.3
We thank Rasmus Østergaard Nielsen for helpful comments on an earlier draft of this commentary.
Competing interests None declared.
Provenance and peer review Not commissioned; internally peer reviewed.