Key Points

The effect of exercise in knee and hip osteoarthritis depends on type of exercise and outcome of interest.

Aerobic and mind–body exercises appear to be the two most effective exercise therapies for pain and function, whereas strengthening and flexibility exercises appear to be good for moderate improvement of multiple outcomes.

Mixed exercise is the least effective exercise. However, it may be used for patients who do not respond to other types of exercise therapy because it is still better than no exercise control for all four patient-centred outcomes.

1 Introduction

Pain from knee and hip osteoarthritis (OA) can have a significant impact on the physical function and quality of life (QoL) of affected individuals worldwide [1]. Exercise is one of the core therapies for OA [2] to improve pain and function [3, 4]. Existing evidence indicates that the magnitude of response varies according to the type of exercise (e.g. strengthening, aerobic) [5]. However, little is known about the relative efficacy of different exercises for different outcomes.

Most randomised controlled trials (RCTs) compare exercise regimens against non-exercise interventions, and direct comparisons between different exercises are uncommon. This is because a head-to-head comparison trial is very costly and it is impractical to undertake RCTs to examine the relative effects between all types of exercises. Alternatively, network meta-analysis (NMA) can indirectly compare multiple interventions through a common comparator when head-to-head RCTs are sparse or absent [6]. It utilises all available evidence in the network, both direct and indirect, to enhance the power of the estimation [7].

Previously, Uthman et al. [8] undertook a sequential analysis and NMA to examine whether there was sufficient evidence to support the use of exercise for people with lower limb OA, and whether one exercise was better than another. They found that up to 2002, sufficient evidence existed to show a significant benefit of exercise over no exercise. Strengthening exercise yielded the largest effect size for pain outcomes, whereas a combined intervention of strengthening, flexibility and aerobic exercise had the largest effect size for function. However, no performance or QoL measures were included.

In this review, we aimed to extend the work of Uthman et al. [8] by updating the evidence, expanding the outcomes to include objective performance measures and QoL, and refining the exercise classification to include mind–body exercise such as tai chi and yoga.

2 Methods

2.1 Search Strategy and Selection Criteria

This NMA is part of a larger review that included RCTs comparing all forms of exercise to non-exercise interventions, or to another exercise type. Detailed inclusion criteria for the larger review are available in our registered and published protocol (PROSPERO CRD42016033865) [9]. The specific inclusion criteria for this NMA were RCTs that (1) recruited participants with knee OA, hip OA, or mixed knee and hip OA diagnosed clinically and/or radiographically; (2) assigned exercise programmes without additional active treatment (e.g. analgesics) as the intervention; (3) assigned usual care/waiting list or a different exercise as the control group; and (4) measured at least one outcome for pain, function, objective performance or QoL.

The systematic search was conducted in December 2015 and updated in December 2017. Nine electronic databases (Allied and Complementary Medicine Database (AMED), Cochrane Central Register of Controlled Trials (CENTRAL), Cumulative Index to Nursing and Allied Health Literature (CINAHL), Excerpta Medica Database (EMBASE), MEDLINE Ovid, Physiotherapy Evidence Database (PEDro), PubMed, SPORTDiscus and Google Scholar) were searched for peer-reviewed publications without language or publication date limitations. As an example, the MEDLINE search strategy is shown in Electronic Supplementary Material (ESM) Appendix 1. The reference lists of systematic review protocols published in the Cochrane Library since 2014 were used to supplement the electronic database search. Published study protocols were flagged pending full publication of the trials.

Selection of relevant studies and subsequent data extraction were undertaken by a single reviewer (SLG), with advice from a second reviewer (MH) when queries arose. A third reviewer (WZ) was involved if agreement could not be reached. Data extraction was compared between SLG and either MSMP, JS or YFH in a random sample (10%) of selected studies. If disagreement exceeded 5% of the total extracted variables, the whole set of studies would be double extracted; otherwise, the single extraction was used. That is, a maximum of 5% disagreement was allowed for data extraction.
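The extraction check described above can be sketched as a simple decision rule. This is an illustrative sketch only: the function names, the fixed random seed and the rounding up of the 10% sample are assumptions, not details reported by the authors.

```python
import math
import random


def pick_audit_sample(study_ids, fraction=0.10, seed=0):
    """Draw a random sample of studies for double extraction.

    The sample size is rounded up (an assumption; the text only says 10%).
    """
    ids = list(study_ids)
    k = max(1, math.ceil(fraction * len(ids)))
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return sorted(rng.sample(ids, k))


def needs_full_double_extraction(n_variables_checked, n_disagreements, threshold=0.05):
    """Return True when disagreement in the audited sample exceeds the threshold."""
    if n_variables_checked <= 0:
        raise ValueError("at least one extracted variable must be checked")
    return n_disagreements / n_variables_checked > threshold
```

With this rule, single extraction is retained whenever the proportion of disagreeing variables in the audited sample stays at or below 5%.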

2.2 Interventions

Exercises were classified into muscle strengthening, aerobic, or flexibility/neuromotor skills training (flexibility/skill) according to the American College of Sports Medicine (ACSM) recommendation [10]. Strengthening exercises are exercises that aim to increase force of muscle contraction (e.g. lifting dumbbells, squats); aerobic exercises aim to improve cardiorespiratory endurance (e.g. swimming, jogging); flexibility exercises aim to improve joint range of motion and muscle pliability (e.g. hamstring stretch, gastrocnemius stretch); and neuromotor skills training aims to improve balance and coordination (e.g. wobble board, walking on foam). In addition, an exercise programme was classified as mind–body exercise if it integrated mindfulness/relaxation into physical movements (e.g. tai chi, yoga), and as mixed exercise when it included more than one core exercise type mentioned above, or when the authors did not specify it as a single component exercise.

‘Usual care’ control was determined based on the report. In ‘usual care’, participants were expected to continue the routine standard of care provided by their general practitioners. Control groups that were not given any specific intervention such as ‘waiting list’ or usual physical activity or where the authors did not specify the nature of the control were also classified as ‘usual care’. ‘Waiting-list’ controls were given active intervention after a period of observation, with no new intervention being delivered during the trial period.

2.3 Outcomes

Our primary outcome of interest was pain, and secondary outcomes were self-reported function, objective performance (e.g. walking speed, strength, range of motion), and QoL. The primary time point was 8 weeks after commencement of the exercise regimen or the time point nearest to this. Eight weeks was chosen because it was the most frequently reported time point. When more than one scale was presented for pain, function or QoL, the more comprehensively reported scale was selected in the ranking order proposed by Fransen and McConnell [4] and Regnaux et al. [11].
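The time-point rule above can be written as a one-line selection. A minimal sketch, assuming follow-up times are expressed in weeks; the tie-break in favour of the earlier time point is an assumption, as the text does not say how ties were resolved.

```python
def pick_primary_timepoint(weeks_reported, target=8.0):
    """Return the reported follow-up (in weeks) nearest to the target.

    Ties are broken in favour of the earlier time point (an assumption).
    """
    if not weeks_reported:
        raise ValueError("no follow-up time points reported")
    return min(weeks_reported, key=lambda w: (abs(w - target), w))
```

For a trial reporting outcomes at 6 and 12 weeks, this rule selects the 6-week assessment.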

For performance, gait and walking parameters (e.g. walking distance, walking time) were prioritised. This was because the measurement and reporting of these parameters were relatively standard across trials compared with other performance outcomes such as strength or power. Limb-specific parameters, such as strength, power, or range of motion, were only used if gait parameters were not available. Strength parameters extracted were, in descending order of preference, knee extensors, knee flexors, hip abductors, and then other muscle groups. When tests performed at varying intensities were reported, the results from the highest intensity tests were chosen.
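The preference order for performance measures can be sketched as a priority lookup. The category labels and the tuple layout below are hypothetical, and power and range of motion are left out because the text gives no order between them and strength.

```python
# Preference order taken from the text: gait first, then strength by muscle group.
PRIORITY = ("gait", "knee_extensors", "knee_flexors", "hip_abductors", "other_muscle")


def pick_performance_measure(reported):
    """Pick one measure from a list of (category, intensity, label) tuples.

    The highest-priority category present wins; within that category the
    test with the highest intensity value is chosen.
    """
    for category in PRIORITY:
        candidates = [m for m in reported if m[0] == category]
        if candidates:
            return max(candidates, key=lambda m: m[1])
    return None  # nothing in the listed categories was reported
```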

2.4 Data Analysis

The standardised mean difference of the change score (end-point minus baseline score) was used to estimate the effect size (ES). Standard deviations (SD) were imputed for trials that did not provide the SD or did not provide sufficient information to calculate the SD. The missing SD was imputed using the largest SD of the same scale reported in other trials if available, otherwise an arithmetic mean of other SDs was used [12].
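As a worked illustration of the effect size and the imputation rule, the sketch below computes a standardised mean difference from change scores and fills in a missing SD. It assumes a pooled-SD standardiser without small-sample correction and reads "other SDs" as SDs from other scales; neither detail is stated in the text, and all function names are hypothetical.

```python
import math
from statistics import fmean


def pooled_sd(sd1, n1, sd2, n2):
    """Pooled standard deviation of two independent groups."""
    return math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2))


def smd_change(change_ex, sd_ex, n_ex, change_ctrl, sd_ctrl, n_ctrl):
    """Standardised mean difference of change scores (end-point minus baseline)."""
    return (change_ex - change_ctrl) / pooled_sd(sd_ex, n_ex, sd_ctrl, n_ctrl)


def impute_sd(same_scale_sds, other_sds):
    """Largest SD reported for the same scale, else the mean of other SDs."""
    if same_scale_sds:
        return max(same_scale_sds)
    if other_sds:
        return fmean(other_sds)
    raise ValueError("no SDs available for imputation")
```

Using the largest available SD is a conservative choice, because it shrinks the standardised effect for the trial whose SD is imputed.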

A Bayesian random effects NMA model for continuous outcome data was used for the primary analysis. The WinBUGS code was adapted from Dias et al. [13] and is provided in ESM Appendix 2. The posterior mean of the ES was reported with its 95% credibility interval (CrI). Bayesian NMA produces simulations that allow interventions to be ranked from first to sixth. The median ranking and corresponding 95% CrI were generated alongside the pooled ES to identify the most effective exercise choice [14]. The significance of the ES hierarchical trend was assessed using meta-regression analysis [15].
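The ranking step can be illustrated outside WinBUGS. The sketch below ranks interventions within each posterior draw and summarises the ranks; it assumes that a larger ES means greater benefit and uses a nearest-rank percentile, both of which are simplifications and not the authors' code.

```python
import math
from statistics import median


def percentile(sorted_values, p):
    """Nearest-rank percentile of pre-sorted values, with p in [0, 100]."""
    k = max(0, math.ceil(p / 100 * len(sorted_values)) - 1)
    return sorted_values[k]


def rank_summary(draws):
    """Median rank and 95% interval per intervention.

    `draws` maps each intervention name to its posterior ES draws; all lists
    must have equal length. Rank 1 is the largest ES in a given draw.
    """
    names = list(draws)
    n_iter = len(draws[names[0]])
    ranks = {name: [] for name in names}
    for i in range(n_iter):
        ordered = sorted(names, key=lambda name: draws[name][i], reverse=True)
        for position, name in enumerate(ordered, start=1):
            ranks[name].append(position)
    summary = {}
    for name, r in ranks.items():
        r_sorted = sorted(r)
        summary[name] = (median(r_sorted), percentile(r_sorted, 2.5), percentile(r_sorted, 97.5))
    return summary
```

Summarising ranks per draw, and not ranking the posterior means once, is what lets the uncertainty in the ES carry over into the ranking interval.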

Non-informative prior distributions were used and three Markov chains were run simultaneously. The initial 40,000 simulations were discarded as the burn-in period and the subsequent 120,000 simulations were used. Inspection of Gelman–Rubin tracing was performed to ensure that convergence or stabilisation of the simulations had been achieved.
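For readers unfamiliar with these diagnostics, the sketch below discards a burn-in period and computes the Gelman–Rubin potential scale reduction factor for one parameter. The chains are simulated and scaled down (400 discarded, 1200 kept) purely for illustration; the values and names are assumptions and do not reproduce the WinBUGS output.

```python
import math
import random
from statistics import fmean, variance


def discard_burn_in(chain, burn_in):
    """Drop the first `burn_in` iterations of a chain."""
    return chain[burn_in:]


def gelman_rubin(chains):
    """Potential scale reduction factor for equal-length chains."""
    n = len(chains[0])
    within = fmean([variance(c) for c in chains])          # mean within-chain variance
    between = n * variance([fmean(c) for c in chains])     # between-chain variance
    pooled = (n - 1) / n * within + between / n
    return math.sqrt(pooled / within)


# Three simulated chains for a single effect size (hypothetical values).
rng = random.Random(1)
raw_chains = [[rng.gauss(0.6, 0.2) for _ in range(1600)] for _ in range(3)]
kept = [discard_burn_in(c, 400) for c in raw_chains]
r_hat = gelman_rubin(kept)
```

Values of the factor close to 1 indicate that the chains have mixed; chains that settle around different values give a markedly larger factor.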

Model fit, a measure of how well predictions from the model were supported by the observed data, was assessed. Consistency in the network was assessed by the node-splitting method [16] and design by treatment forest plot [17] based on frequentist analysis. The node-splitting method examines the agreement between direct and indirect comparisons. Design by treatment forest plot, on the other hand, visually demonstrates agreement between studies of different designs (e.g. whether estimation between A and C, obtained from two-arm design, is consistent with those obtained from multi-arm ABC or ACD designs). Data were processed and analysed using Microsoft Access, Excel, Stata (StataCorp. 2017. Stata Statistical Software: Release 15. College Station, TX, USA: StataCorp LLC), and WinBUGS (Version 1.4.3).
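The idea behind comparing direct and indirect evidence can be shown with a simplified two-step calculation. This is a sketch of an adjusted indirect comparison through a common comparator followed by a z-test, not the node-splitting model used in the analysis; the function names are hypothetical.

```python
import math


def indirect_estimate(d_a_ref, se_a_ref, d_c_ref, se_c_ref):
    """Indirect A-versus-C effect from A-versus-reference and C-versus-reference."""
    return d_a_ref - d_c_ref, math.hypot(se_a_ref, se_c_ref)


def inconsistency_test(direct, se_direct, indirect, se_indirect):
    """Difference between direct and indirect estimates with a two-sided p value."""
    diff = direct - indirect
    se = math.hypot(se_direct, se_indirect)
    z = diff / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return diff, z, p
```

A small p value would signal disagreement between the two sources of evidence for that comparison; a large one is compatible with a consistent network.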

2.5 Sensitivity Analysis and Subgroup Analysis

A modified Cochrane risk of bias assessment tool was used to assess study quality. Sensitivity analyses were performed on two of the items with the highest risk of bias and also on studies for which SD had been imputed. Subgroup analyses were performed to assess the efficacy at different joints (knee OA versus hip OA) and for different patient contexts, such as participants awaiting total joint replacement (TJR) versus participants not awaiting TJR.

3 Results

From the initial 13,596 citations retrieved from the databases and 76 from hand searches, we identified 239 articles (217 trials) as eligible under the broader search strategy, which included all types of non-exercise comparators such as other non-pharmacological therapies or drugs (Fig. 1). Since the present NMA only considered trials comparing the five defined exercises with usual care or each other, only 103 trials (9134 participants) were included [18–130]. Of these, 76 (74%) trials used usual care as the control and 27 were head-to-head comparisons. Disagreement for double extraction of data was within the acceptable limit, so predominantly single extraction was retained. The characteristics of the included trials are listed in Table 1. Pain was assessed in 89 trials (7184 participants), function in 87 trials (7153 participants), performance in 95 trials (6760 participants), and QoL in 40 trials (3190 participants) (Table 2). Preliminary assessment of funnel plots identified one outlying study for pain [112] and another for QoL [48]. Both studies showed strong positive effects (ES > 5), very different from other studies, and were subsequently excluded from the main analysis. Egger’s statistical test was suggestive of publication bias (p < 0.05) for all outcomes except QoL (ESM Appendix 3). Figure 2 shows the network for pain, function, performance and QoL. The most frequent comparisons were strengthening versus usual care and mixed exercise versus usual care.

Fig. 1

Study flow diagram for comparison between exercise and usual care and between different exercises. NMA network meta-analysis, RCT randomised controlled trial

Table 1 Characteristics of included studies
Table 2 Characteristics of studies by outcomes
Fig. 2

Network of direct comparisons formed by included studies. The size of nodes and lines connecting the nodes are proportionate to the number of participants and the number of trials, respectively. Data represent number of trials (number of participants). Flex/Skills flexibility and skills or neuromuscular training

The efficacy of different exercises compared with usual care and each other is represented in Fig. 3. For pain, function and performance, all types of exercise were significantly better than usual care, with ES ranging from 0.4 to 1.1. The largest effect was observed for aerobic and mind–body exercise for pain and function. By contrast, the benefits of exercise on QoL were not as marked, with ES ranging from 0.2 to 0.4. Strengthening and flexibility/skill exercises had a moderate ES, whereas mixed exercise gave the smallest ES for all outcomes and was significantly less effective than aerobic or mind–body exercise for pain. The median ranking largely corresponded to the magnitude of ES shown by each exercise. Aerobic exercise was the best-ranked exercise for pain and performance, whereas mind–body exercise was also best ranked for pain and for self-reported function. Strengthening and flexibility/skill generally received mid-level rankings while mixed exercise was the lowest ranked exercise, superior only to usual care (ESM Appendix 4). Meta-regression demonstrated a significant trend for pain (p = 0.01) but not for the three other outcomes (function, p = 0.07; performance, p = 0.06; QoL, p = 0.65) when the exercises were ordered by descending ES. Evidence of lack of model fit was found for pain (\(\bar{D}_{\text{res}}\): 189.3, 185 data points; deviant studies were mainly small studies), performance (\(\bar{D}_{\text{res}}\): 201.1, 194 arm-level data points; the deviant study recruited younger than average patients, with a mean age of 40 years), and QoL (\(\bar{D}_{\text{res}}\): 86.3, 81 data points; possibly due to non-homogeneous groups). The model fit for function, on the other hand, was good (\(\bar{D}_{\text{res}}\): 183.2, 182 data points). There was significant heterogeneity for all outcomes, with the mean between-studies standard deviation ranging from 0.25 to 0.74. No disagreements were found between direct and indirect evidence (ESM Appendix 5) or between estimates from different study designs.
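The model-fit figures above compare the posterior mean residual deviance with the number of data points. A minimal sketch of that comparison follows, assuming arm-level data with a normal likelihood so that each data point contributes the squared standardised residual; the function names are hypothetical and the calculation is not taken from the authors' code.

```python
from statistics import fmean


def residual_deviance(observed, standard_errors, fitted):
    """Residual deviance for one posterior draw under a normal likelihood."""
    return sum(((y - f) / se) ** 2 for y, se, f in zip(observed, standard_errors, fitted))


def mean_residual_deviance(observed, standard_errors, fitted_draws):
    """Posterior mean of the residual deviance across draws of fitted values."""
    return fmean([residual_deviance(observed, standard_errors, f) for f in fitted_draws])


def fit_ratio(mean_deviance, n_data_points):
    """Ratio of mean residual deviance to data points; about 1 suggests adequate fit."""
    return mean_deviance / n_data_points
```

Applied to the reported values, the ratio for function is 183.2/182 and for pain 189.3/185, so the departures from 1 are small in absolute terms.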

Fig. 3

Effect size of different exercise types versus different comparators presented as standardised mean difference (95% credibility interval). Flex/Skills flexibility and skills exercises, n number analysed

Physician and participant blinding was not achieved in any study (ESM Appendix 6). The risk of bias assessment for individual items per article is detailed in ESM Appendix 7. Sample size, allocation concealment and SD imputation were used to assess the robustness of the NMA estimates. As there were only seven studies with sample size > 100/arm, we undertook a sensitivity analysis based on ≥ 30 participants/arm, a consensus minimum sample size for a trial [131]. The analysis, summarised in ESM Appendix 8, suggested that the results obtained are robust.

Subgroup analysis by joint confirmed the exercise benefits in knee OA for pain, self-reported function and performance, whereas substantial uncertainty for benefits was observed in hip OA. In addition, exercise appeared to be more beneficial among participants who were not awaiting TJR compared with those who were (Table 3).

Table 3 Subgroup analysis by joint and recruitment

4 Discussion

This NMA confirms that exercise is beneficial for people with knee and hip OA for outcomes of pain, function, performance and QoL. In addition, we have found that (1) aerobic and mind–body exercise have the largest ES for improvements in pain and function; (2) strengthening and flexibility/skill exercises improve multiple outcomes to a varying degree; and (3) mixed exercise (more than one core type) is the least effective exercise across all outcomes and is significantly inferior to aerobic and mind–body exercise for pain.

The results of this NMA differ from the previous NMA by Uthman et al. [8] for the following possible methodological reasons. Firstly, this NMA was primarily designed to examine the relative efficacy between exercises in knee and hip OA, whereas Uthman et al. set out to examine the conclusiveness of the available evidence for exercise using trial sequential analysis. Secondly, our study included 103 trials, whereas the previous NMA included only 60. Thirdly, we used a different exercise classification. Our classification was based on the ACSM criteria [10] but included an additional mind–body exercise and a ‘mixed’ exercise category (that grouped all exercise combinations together irrespective of whether they comprised two or more different types of exercise). The previous review, on the other hand, examined only three types of exercise (aerobic, flexibility and strengthening) either individually or in combinations of two, or all three. Their results showed that combinations of any two types of exercise tended to have smaller ESs and lower probability of being the best, whereas when all three were combined the overall ES was considerably larger. Fourthly, the previous review used non-exercise controls, which could include other interventions (e.g. patient education, electrotherapy), whereas we used usual care with no new interventions (e.g. ‘waiting-list’ or no intervention apart from usual care/activities). Estimation performed in this way is more precise as treatment effects vary with the type of controls, even with inert agents [132]. Finally, we examined four outcomes (pain, self-reported function, observed performance and QoL), whereas the previous review examined only two (pain and function). Both reviews agree that the effect of exercise depends on the types of exercise or components of the exercise programme.

Our results align with other conventional systematic reviews and meta-analyses where aerobic [133] and mind–body exercise [134] tend to have larger effect sizes than strengthening exercise, and mixed exercise tends to have the lowest effect size for pain [5]. Also in line with the literature is the smaller effect size and greater uncertainty of exercise benefits in hip compared with knee OA [4, 135], which still requires further investigation.

A novel finding from this NMA is that we were able to demonstrate that mind–body exercise had similar effects to aerobic exercise for pain. Mind–body exercise such as tai chi and yoga can be characterised as low to moderate intensity exercise performed with an intentional awareness (mindfulness) on breathing and slow controlled movement [136]. Although the underlying mechanism remains unclear, the effect of both aerobic and mind–body exercise may be attributable to the potential of these exercises to influence altered central elements such as central pain sensitisation, sleep disturbance, and mood disorders [137, 138]. Pain experience as well as level of function and QoL are the results of interactions between these central impairments and peripheral pain mechanisms [139, 140]. As aerobic and mind–body exercise could influence both central and peripheral pain mechanisms, this additive effect may explain their additional benefits over other exercises that predominantly address only joint level deficits.

There is no satisfactory biological explanation for the poor efficacy of mixed exercise across all outcomes, particularly when considering that there are many domains of physical impairment in people with OA. However, it may be that the lack of response to mixed exercise reflects flawed implementation of the programme, such that intensity of the individual components was insufficient or poorly adhered to due to the complexity of the regimen compared with a single exercise programme.

There are limitations to this NMA. A key limitation is that we were fully reliant on author descriptions for the classification of exercises and control groups. Exercise programmes and ‘usual care’ are not standardised and vary considerably between studies. Even when the focus of exercise is strength improvement, it is typical to also find some elements of flexibility and/or aerobic exercise included in the programme. As far as possible, we adhered to the classification presented by the authors. The decision to group different types of controls, such as waiting list, usual physical activity and usual care, together for the analysis is open to question. Unlike non-pharmacological treatments for mental health, where a difference between non-treatment and waiting-list controls has been observed [141], no such distinction has been reported for exercise interventions in OA. Instead, many published reports in OA extend controls to include other types of non-exercise interventions (e.g. patient education and behavioural therapy) rather than limiting them to ‘usual care’ [4, 142]. Secondly, the estimates for aerobic, mind–body and flexibility/skill exercises were open to considerable uncertainty, with wide credibility intervals, as the number of studies was small. However, examination of exercise rankings using different approaches (i.e. probability of the exercise being the best, highest median ranking, or magnitude of ES) showed that the estimates were generally in agreement, supporting the trend observed. Another caveat is that we did not fully explore the reasons for heterogeneity because efforts to identify covariates for exercise effect in OA have generally been unsuccessful in many meta-analyses [8, 143]. This probably requires more sophisticated analytical approaches and warrants separate reporting. Finally, the focus of the included studies was relatively short term and involved mainly single-joint OA. Therefore, we could not determine whether the observed differences between exercises would persist in the longer term or whether people with knee plus hip OA would attain similar exercise benefits.

5 Conclusions

In conclusion, this NMA confirms that exercise therapy has clear benefits for people with knee and hip OA and also shows that the magnitude of effect varies according to type of exercise and outcome of interest. Aerobic and mind–body exercises were found to be the best for pain and function, whereas strengthening and flexibility/skill exercises are potentially next best for multiple outcomes. Mixed exercise is the least effective exercise for knee and hip OA but is still superior to usual care for all outcomes and therefore remains an acceptable option for patients who do not respond well to single-component exercises. The findings of this review may help clinicians guide their prescription of exercise type with respect to treatment outcomes. Further research is warranted to confirm whether the hierarchy observed is consistent across all patients with OA.