Introduction In sports physiotherapy, medicine and orthopaedic randomised controlled trials (RCT), investigators (and readers) focus on the difference between groups in change scores from baseline to follow-up. Mean score changes are difficult to interpret (‘is an improvement of 20 units good?’), and follow-up scores may be more meaningful. We investigated how applying three different responder criteria to change and follow-up scores would affect the ‘outcome’ of RCTs. Responder criteria refer to participants’ perceptions of how the intervention affected them.
Methods We applied three different criteria—minimal important change (MIC), patient acceptable symptom state (PASS) and treatment failure (TF)—to the aggregate Knee injury and Osteoarthritis Outcome Score (KOOS4) and the five KOOS subscales, the primary and secondary outcomes of the KANON trial (ISRCTN84752559). This trial included young active adults with an acute ACL injury and compared two treatment strategies: exercise therapy plus early reconstructive surgery, and exercise therapy plus delayed reconstructive surgery, if needed.
Results MIC: At 2 years, more than 90% in the two treatment arms reported themselves to be minimally but importantly improved for the primary outcome KOOS4. PASS: About 50% of participants in both treatment arms reported their KOOS4 follow-up scores to be satisfactory. TF: Almost 10% of participants in both treatment arms found their outcomes so unsatisfactory that they thought their treatment had failed. There were no statistically significant or meaningful differences between treatment arms using these criteria.
Conclusion We applied change criteria as well as cross-sectional follow-up criteria to interpret trial outcomes with more clinical focus. We suggest researchers apply MIC, PASS and TF thresholds to enhance interpretation of KOOS and other patient-reported scores. The findings from this study can improve shared decision-making processes for people with an acute ACL injury.
- knee ACL
- knee surgery
- randomised controlled trial
Patient-reported outcome measures (PROM) are recommended as the primary outcome in musculoskeletal clinical trials.1 PROM results are often presented as (1) change in and (2) absolute mean scores. These can be difficult to interpret clinically—if a patient improves her PROM score after treatment by 39 points, is that success or failure? What other ways are there to evaluate treatment outcome?
Responder analysis is a simple concept where each participant in a study is classified as either a ‘responder’ or a ‘non-responder’ to the treatment.2 3 There are numerous ways of making that distinction and we will here focus on three clinically relevant ones—(1) minimal important change (MIC), (2) patient acceptable symptom state (PASS) and (3) patient-reported ‘treatment failure’ (TF).
The minimal clinically important difference was defined in 1989.4 Since then, more than 10 different definitions and calculation methods have been introduced, all based on a change in the PROM score.5 We use the term MIC introduced by de Vet et al in 2006.6 MIC is ‘A change that patients would consider important to reach in their situation, dependent on baseline values or severity of disease, on the type of intervention, and on the duration of the follow-up period.’6 An example of an MIC applicable to young adults with a surgically reconstructed knee would be a change of 12 units on a scale from 0 to 100.7
If a patient’s health were to change from ‘terrible’ to ‘mediocre’, this might seem like a ‘success’ to the clinician (and prove statistically significant in a trial) but does the patient feel ‘mediocre’ to be an acceptable state? How patients feel at time of follow-up may be more important than how much they have changed. The PASS identifies patients who consider themselves well.8 Researchers can calculate the PASS threshold for a PROM score by providing the patient with an anchor item (eg, ‘Considering your knee function, do you feel that your current state is satisfactory?’).8 An example of a PASS threshold applicable to young adults with a surgically reconstructed knee would be to reach 72 on a 0–100 worst to best PROM scale.9 The MIC and the PASS complement each other and identify participants who are (1) ‘feeling better’ (who achieved the MIC threshold of 12 for a change score) and (2) those ‘feeling good’ (achieved the PASS threshold of 72 at treatment follow-up).
What about patients at the other end of the result spectrum? TF has largely been glossed over in the literature, often being ‘the element that must not be named.’10 We have introduced a novel anchor-based method to define those participants who find their treatment has failed them (TF).7 9 An example of a TF threshold applicable to young adults with a surgically reconstructed knee would be a score of 28 or less on a 0–100 worst to best scale.9
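The three example thresholds quoted above can be combined into a simple per-patient classification. The Python sketch below is illustrative only: it hard-codes the example values from the text (12, 72 and 28 on a 0–100 worst to best scale) rather than the KOOS cut-offs given in table 2, and the function and field names are hypothetical.

```python
from dataclasses import dataclass

# Example thresholds quoted in the text for young adults with a surgically
# reconstructed knee (0-100 scale, 100 = best). Illustrative assumptions only.
MIC_CHANGE = 12.0  # improvement needed to count as "feeling better"
PASS_SCORE = 72.0  # follow-up score at or above which the state is "feeling good"
TF_SCORE = 28.0    # follow-up score at or below which treatment is felt to have failed


@dataclass
class ResponderStatus:
    mic: bool      # achieved the minimal important change
    pass_: bool    # reached the patient acceptable symptom state
    tf: bool       # score corresponds to treatment failure


def classify(baseline: float, follow_up: float) -> ResponderStatus:
    """Classify one patient from a baseline and a follow-up score."""
    change = follow_up - baseline
    return ResponderStatus(
        mic=change >= MIC_CHANGE,
        pass_=follow_up >= PASS_SCORE,
        tf=follow_up <= TF_SCORE,
    )


if __name__ == "__main__":
    # A patient can improve importantly and still not reach an acceptable state.
    print(classify(baseline=40.0, follow_up=60.0))
```

Note that MIC is applied to the change score while PASS and TF are applied to the follow-up score, so a single patient can meet more than one criterion at once.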
The question ‘What is successful treatment?’ appears simple at first glance but has many complexities. We examined each of the three different approaches discussed above, using data from a clinically relevant, much debated trial. The KANON trial (ISRCTN84752559) was the first and to date only high-quality randomised controlled trial comparing the two treatment strategies: exercise therapy plus early reconstructive ACL surgery, and exercise therapy plus delayed ACL reconstructive surgery, if needed.11 The 2-year and 5-year reports showed that both treatment groups improved and that traditional ‘success measures’ (group mean change in Knee injury and Osteoarthritis Outcome Score, KOOS) could not identify any relevant between-group differences.11 12
We tested whether applying three different responder criteria (MIC, PASS and TF) might reveal previously hidden information in the outcomes of that study.
We applied three different responder criteria: the MIC, the PASS and TF to the 2-year KOOS intention to treat (ITT) data from the KANON trial.
The three responder criteria MIC, PASS and TF used here were previously established in a cohort of about 600 randomly selected subjects having had surgical reconstruction of their ACL and enrolled in the Norwegian Knee Ligament Registry (mean age at time of 6–24 months’ follow-up 29.7 years, 53% females).7 9 The responder criteria were established using anchor-based methods. Six to 24 months after their ACL reconstruction (ACLR), subjects responded to a set of anchor items for the MIC, PASS and TF (table 1). PASS and TF values were calculated as mean summary scores for subjects who responded ‘yes’ to the PASS and TF questions.9 MIC values were derived with the predictive modelling method and represent the smallest change in KOOS score that is deemed important by the average patient.7 13 An in-detail description of the predictive modelling method used to determine the KOOS MIC values is found in the original publication.7 KOOS cut-off scores corresponding to the three responder criteria are given in table 2.
The KANON trial
The KANON trial compared two treatment strategies in 121 young, active adults with acute ACL injury to a previously uninjured knee: physiotherapist-supervised progressed goal-based exercise therapy plus early ACLR (n=62) and physiotherapist-supervised progressed goal-based exercise therapy plus delayed ACLR, if needed (n=59). At 2 years, 60 subjects had had exercise therapy plus early ACLR, 36 subjects had had exercise therapy alone and 23 subjects had had exercise therapy plus delayed ACLR.11 The primary outcome in the KANON trial was the change from baseline to 2 years in the average score for four of the five KOOS subscales, covering pain, symptoms, difficulty in sports and recreational activities, and quality of life (QOL) (KOOS4), with scores ranging from 0 (worst) to 100 (best). Prespecified secondary outcomes included results on all five KOOS subscales (the fifth scale being activities of daily living). The results from both ITT and as treated analyses were previously reported at 2 and 5 years with no differences between groups.11 12 In this study we apply responder criteria to the ITT results at 2 years.
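As described above, KOOS4 is the average of four of the five KOOS subscale scores, leaving out activities of daily living. A minimal Python sketch of that averaging step follows; the dictionary keys and the function name are hypothetical, and the handling of missing subscales is an assumption rather than the trial's documented procedure.

```python
from statistics import mean
from typing import Mapping

# The four subscales that make up KOOS4 (key names are hypothetical).
KOOS4_SUBSCALES = ("pain", "symptoms", "sport_rec", "qol")


def koos4(subscales: Mapping[str, float]) -> float:
    """Average the four KOOS4 subscale scores (each 0 = worst, 100 = best)."""
    missing = [name for name in KOOS4_SUBSCALES if name not in subscales]
    if missing:
        # Assumption: refuse to compute rather than impute missing subscales.
        raise ValueError(f"missing subscale scores: {missing}")
    return mean(subscales[name] for name in KOOS4_SUBSCALES)


if __name__ == "__main__":
    scores = {"pain": 80.0, "symptoms": 70.0, "sport_rec": 50.0,
              "qol": 40.0, "adl": 90.0}  # "adl" is ignored by KOOS4
    print(koos4(scores))
```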
Results are presented as the proportions of patients in the two treatment arms achieving the KOOS scores corresponding to an MIC, PASS and TF at 2 years. For the purpose of the present study, we decided a priori that a difference of 10 percentage points between trial arms in the proportion of patients meeting the MIC and PASS criteria was to be considered clinically meaningful. As TF is considered a more serious outcome than satisfaction, we considered a difference of 5 percentage points to be clinically important. These cut-offs do not rest on solid evidence but are rather operational for the purpose of this paper. In setting them, we took into consideration clinical reasoning about the difference needed to recommend one treatment over another for this patient population.
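The comparison described above amounts to checking whether the absolute difference in responder proportions between arms reaches a prespecified margin. The sketch below assumes the margins are expressed in percentage points (10 for MIC and PASS, 5 for TF); the function names are hypothetical and the counts in the usage example are invented for illustration.

```python
# Prespecified margins, in percentage points, as described in the text.
MEANINGFUL_MARGIN_PP = {"MIC": 10.0, "PASS": 10.0, "TF": 5.0}


def proportion_pct(n_responders: int, n_total: int) -> float:
    """Percentage of an arm meeting a responder criterion."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return 100.0 * n_responders / n_total


def is_clinically_meaningful(criterion: str,
                             n_a: int, total_a: int,
                             n_b: int, total_b: int) -> bool:
    """True if the between-arm difference reaches the margin for the criterion."""
    diff = abs(proportion_pct(n_a, total_a) - proportion_pct(n_b, total_b))
    return diff >= MEANINGFUL_MARGIN_PP[criterion]


if __name__ == "__main__":
    # Invented counts: 30 of 62 versus 40 of 59 reaching PASS.
    print(is_clinically_meaningful("PASS", 30, 62, 40, 59))
```

Because the margin is applied to proportions, in a trial of this size a margin of 5 percentage points corresponds to only a few individuals per arm.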
Minimal important change
Applying the change responder criterion (MIC) to the KANON trial change scores from baseline to 2 years showed that, across the KOOS4 and the five KOOS subscales, 90%–95% of participants in the two treatment arms were at least minimally but importantly improved. The exception was the secondary outcome KOOS QOL, where 75% of those randomised to early reconstructive ACL surgery and 84% of those randomised to optional delayed reconstructive ACL surgery were minimally but importantly improved (figure 1 and online supplementary file 1). There were no statistically significant or clinically meaningful between-group differences.
Patient acceptable symptom state
Applying the cross-sectional PASS responder criteria to the KANON 2-year follow-up scores showed that 53% in both treatment groups reported their KOOS4 scores to be satisfactory (figure 2 and online supplementary appendix table 2). Between 42% and 71% of subjects reported their KOOS subscale scores to be satisfactory (figure 2 and online supplementary appendix table 2). There were no statistically significant or clinically meaningful between-group differences. The greatest between-group difference (9%) was found in the KOOS symptoms subscale where satisfactory results were more frequent among those randomised to exercise therapy plus optional delayed reconstructive surgery compared with those having had early reconstructive surgery in addition to exercise therapy.
Of those 112 individuals who achieved the MIC threshold, only 56% were above the PASS threshold and 5% had KOOS4 scores representing TF (table 3).
Treatment failure
A total of 121 patients were included in the KANON trial, and the 5% considered a meaningful between-group difference in proportions of patients reporting scores representing TF translated into very few individuals. For transparency, we therefore report both proportions (%) and absolute numbers (n) when appropriate. Results are shown in figure 2 and online supplementary appendix table 3.
Applying the TF criterion showed that for KOOS4, 8% and 9% of participants (five patients in each treatment arm) thought their knee status was so unsatisfactory that their treatment had failed them (figure 2 and online supplementary appendix table 3). With the exception of QOL, there were no clinically meaningful between-group differences. For the KOOS subscale QOL, the between-group difference of 5% met our threshold for clinical meaningfulness: 5% (n=3) in the group randomised to exercise therapy plus early reconstructive surgery reported a score representing TF compared with 10% (n=6) in the group randomised to exercise therapy plus optional delayed reconstructive surgery. There were no statistically significant between-group differences.
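To illustrate how a difference of about 5% can meet the clinical threshold without being statistically significant when it rests on so few individuals, the sketch below computes a simple Wald 95% CI for the difference in proportions, using the QOL counts above (n=3 and n=6) and the randomised arm sizes (62 and 59). The choice of a Wald interval is an assumption made for illustration; the text does not state which statistical method was used, and the function name is hypothetical.

```python
from math import sqrt

Z_95 = 1.96  # normal quantile for a two-sided 95% interval


def wald_ci_difference(n_a: int, total_a: int, n_b: int, total_b: int):
    """Return (difference, lower, upper) for p_b - p_a as proportions."""
    if total_a <= 0 or total_b <= 0:
        raise ValueError("arm sizes must be positive")
    p_a = n_a / total_a
    p_b = n_b / total_b
    diff = p_b - p_a
    # Standard error of a difference between two independent proportions.
    se = sqrt(p_a * (1 - p_a) / total_a + p_b * (1 - p_b) / total_b)
    return diff, diff - Z_95 * se, diff + Z_95 * se


if __name__ == "__main__":
    # KOOS QOL scores representing TF: 3 of 62 (early) versus 6 of 59 (delayed).
    diff, lower, upper = wald_ci_difference(3, 62, 6, 59)
    print(f"difference {100 * diff:.1f} points, "
          f"95% CI {100 * lower:.1f} to {100 * upper:.1f}")
```

With counts this small the interval is wide and spans zero, which is consistent with the absence of a statistically significant difference reported above. An exact or score-based interval may be preferable for such small numbers.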
We present three complementary ways of sharing trial results in a clinically meaningful way. This approach is starkly different from the commonly reported group mean treatment results. It matters if treatment outcome is defined as ‘feeling better’ (score improvement from baseline to 2 years) or as ‘feeling good’ (absolute score at 2 years). While 94% and 92% in the two KANON treatment arms were ‘feeling better’ after 2 years, only every second patient in both treatment arms ‘felt good’ at 2 years.
At 2 years, almost 10% of participants in the two treatment arms had KOOS4 scores that corresponded to what ACL-reconstructed patients consider as TF. We found a similar (‘failing’) overall pattern for each of the five KOOS subscale scores; the results for both groups of participants had wide and overlapping CIs indicating that there were no statistically significant differences in outcomes between treatment arms.
Use of change and follow-up scores to evaluate treatment outcome
In the 2-year report from the KANON trial, the two treatment groups improved by 39.2 and 39.4 points, respectively, in the primary outcome KOOS4.11 An improvement of 39 points on a 0–100 scale is substantial but difficult to interpret in a clinically meaningful way. Applying MIC thresholds added the information that over 90% of the participants in the two treatment groups improved enough for them to regard it as important.
However, applying MIC thresholds to evaluate treatment success may overestimate the success rate in a clinical trial. Our findings in the KANON trial confirm previous studies in other patient groups showing that achieving a minimal (and in our case also specified as important) change is not equivalent to experiencing a satisfactory outcome.14 Ninety per cent or more of patients in the KANON trial improved importantly for a variety of patient-reported outcome domains after treatment for an ACL injury, with no difference between treatment groups. This high proportion of importantly improved patients is not in line with clinical experience and the collective knowledge about outcomes following treatment for an acute ACL injury. For example, in the KANON study, and in available meta-analyses of observational studies, only 40%–55% were able to return to sport.11 12 15 We suggest that achieving the PASS, or ‘feeling good’ at 2 years, which every other trial participant did for the primary outcome, may better align with other measures of success, such as return to sport.
Studies, including on patients having had reconstructive ACL surgery,7 have consistently shown that the correlation of anchor question responses (used to define the MIC) is higher with post-treatment scores than with change scores, supporting the notion that post-treatment status is more important to patients than the score improvement from before to after treatment. Suggested explanatory mechanisms are recall bias and response shift. Recall bias implies that patients simply do not recall their pretreatment status correctly16 and response shift implies that patients change their internal standards for how they rate their health state during the course of treatment.17 Patients thus seem to regard their current status as more important than the change and we therefore argue that applying responder analysis with a PASS criterion may represent a complementary patient-centred and clinically valid approach to interpretation of clinical trials.18
Patients with a poor PROM score are rarely highlighted in sports medicine and orthopaedic studies. Traditionally, outcomes in orthopaedic studies were categorised as poor, fair, good and excellent. These categories were usually assigned by the treating surgeon or by using arbitrary cut-offs applied to clinical scores developed without input from patients. We believe that applying the same rigour in identifying patients who think their treatment has failed as we do to identify those who are responders to treatment is helpful and will facilitate a more balanced discussion between clinicians and patients about benefits and harms of the available treatment choices.
We found that almost 1 out of 10 patients, irrespective of treatment strategy, reported KOOS4 scores at 2 years’ follow-up to be so unsatisfactory that they thought the treatment had failed them. This rate of self-reported TF is at least twice the 3%–4% occurrence of graft rupture and subsequent revision surgery reported in the Scandinavian ligament registries.19 This statistic (almost 10% TF) should be included in the shared clinical decision-making process.
Strengths and limitations
Strengths of this study include using data from a large national registry population that underwent ACLR7 9 and applying an MIC threshold that was determined using predictive modelling. This method is more robust than the receiver operating characteristic method or the mean score change method.13 It is a limitation that none of the MIC, PASS or TF responder criteria has been established in a group treated with exercise therapy alone. In theory, patient expectations of non-surgical treatment may be lower than for an invasive, expensive surgical treatment associated with potentially serious side effects. If this were the case, the proportions of participants achieving the responder criteria thresholds in the exercise therapy alone arm in this study would have been underestimated.
Specific strengths of the KANON trial are the randomised design—which allows for longitudinal comparison between treatment arms with minimal risk for confounding—and that no patient was lost to follow-up at 2 years. A limitation is the relatively small sample size. All other aspects being equal, responder analyses where patients are categorised into two groups require a larger sample size than comparing change in mean scores between treatment groups. The KANON trial was powered to detect a 10-point between-group difference in mean change score of the primary outcome but not necessarily to detect significant differences between patients categorised into the type of responder groups we investigated in this study. We therefore caution against overinterpreting the difference of 5% (representing three patients) between the two treatment arms reporting a KOOS QOL score low enough to correspond to TF.
Future directions, implications for clinic and research
We found that applying change score (MIC) and postintervention (PASS, TF) responder criteria improves clinical interpretation of results from the KANON trial, and we suggest presenting such responder analyses would also be beneficial in other studies and settings.
The results from this randomised study, especially in terms of ‘feeling good’ and TF rates at 2 years, can be implemented in digital tools and clinical consultations about shared decision-making with people who have sustained a traumatic knee injury involving the ACL. Importantly, additional information, such as sick leave required, number of exercise therapy sessions required, adverse events, and so on, associated with the different available treatment options, is also required in fully functional shared decision-making tools. Finally, the well-informed patient’s preferences should be taken into account when arriving at a treatment decision.
A general limitation to applying responder criteria is that the thresholds need to be determined in a population similar to the population under study. To enhance interpretation of KOOS and other PROM data in the future, we suggest that the seven KOOS anchor questions listed in table 1 (adapted as necessary for other PROMs and settings) be added to surgical and other databases and registries where large numbers of PROM data are collected. The seven items should be added for a period long enough to collect data from several hundred patients to be able to calculate MIC, PASS and TF values. It should be noted that calculation of these thresholds is complex and requires predictive modelling.7 9
Applying change and follow-up responder criteria makes interpretation of patient-reported outcomes more meaningful compared with reporting change or absolute group mean scores only. We suggest applying MIC, PASS and TF thresholds will allow clinicians and patients to better interpret KOOS and other PROM data. The findings from this study can improve shared decision-making processes for people with an acute ACL injury.
What are the findings?
Applying change and cross-sectional follow-up responder criteria (eg, minimal important change (MIC), patient acceptable symptom state (PASS) and treatment failure (TF)) made Knee injury and Osteoarthritis Outcome Score data more meaningful compared with reporting change or absolute group mean scores only.
At 2 years following an ACL tear and regardless of initial treatment strategy (exercise therapy and early reconstructive ACL surgery, or exercise therapy only with the option of reconstructive ACL surgery later), 9 out of 10 patients were improved (MIC) but only 5 out of 10 were satisfied (PASS) and 1 out of 10 felt the treatment had failed (TF).
How might it impact on clinical practice in the future?
Applying MIC, PASS and TF thresholds enhances interpretation of patient-reported outcome measure data in clinical trials.
Findings from this and similar studies can improve shared decision-making processes for people with ACL injury and a variety of musculoskeletal injuries.
Contributors EMR conceived and designed this exploratory analysis, and wrote the first draft of the manuscript. EB conducted the analyses, and all coauthors contributed to the interpretation thereof. All authors contributed in revising the manuscript and gave their final approval of the submitted version.
Funding The KANON study received funding from the Swedish Research Council (RBF, LSL, EMR), Medical Faculty of Lund University (RBF, LSL, EMR), Region Skåne (LSL, RBF, EMR), Thelma Zoegas Fund (RBF), Stig & Ragna Gorthon Research Foundation (RBF), Swedish National Centre for Research in Sports (LSL, RBF), Crafoord Foundation (RBF), Tore Nilsson Research Fund (RBF) and Pfizer Global Research (LSL). EMR is the developer of Knee injury and Osteoarthritis Outcome Score (KOOS) and several other freely available patient-reported outcome measures.
Competing interests None declared.
Patient consent for publication Obtained.
Provenance and peer review Not commissioned; externally peer reviewed.
Data sharing statement No data are available.