
Comparative effectiveness of treatments for patellofemoral pain: a living systematic review with network meta-analysis
  1. Marinus Winters1,
  2. Sinéad Holden1,2,
  3. Carolina Bryne Lura1,
  4. Nicky J Welton3,
  5. Deborah M Caldwell3,
  6. Bill T Vicenzino4,
  7. Adam Weir5,6,7,
  8. Michael Skovdal Rathleff1,2
  1. 1 Centre for General Practice at Aalborg University, Aalborg, Denmark
  2. 2 SMI, Department of Health Science and Technology, Aalborg University, Aalborg, Denmark
  3. 3 Population Health Sciences, Bristol Medical School, University of Bristol, Bristol, UK
  4. 4 School of Health and Rehabilitation Sciences: Physiotherapy, University of Queensland, Brisbane, Queensland, Australia
  5. 5 Sports Groin Pain Centre, Aspetar Orthopaedic and Sports Medicine Hospital, Doha, Qatar
  6. 6 Sports Medicine and Exercise Clinic Haarlem (SBK), Haarlem, Netherlands
  7. 7 Department of Orthopaedics, Erasmus MC University Medical Center for Groin Injuries, Rotterdam, Netherlands
  1. Correspondence to Dr Marinus Winters, Center for General Practice at Aalborg University, Aalborg, Denmark; marinuswinters{at}


Objective To investigate the comparative effectiveness of all treatments for patellofemoral pain (PFP).

Design Living systematic review with network meta-analysis (NMA).

Data sources Sensitive search in seven databases, three grey literature resources and four trial registers.

Eligibility criteria Randomised controlled trials evaluating any treatment for PFP with outcomes ‘any improvement’, and pain intensity.

Data extraction Two reviewers independently extracted data and assessed risk of bias with Risk of Bias Tool V.2. We used Grading of Recommendations, Assessment, Development and Evaluation to appraise the strength of the evidence.

Primary outcome measure ‘Any improvement’ measured with a Global Rating of Change Scale.

Results Twenty-two trials (with forty-eight treatment arms) were included, of which approximately 10 (45%) were at high risk of bias for the primary outcome. Most comparisons had a low to very low strength of the evidence. All treatments were better than wait and see for any improvement at 3 months (education (OR 9.6, 95% credible interval (CrI): 2.2 to 48.8); exercise (OR 13.0, 95% CrI: 2.4 to 83.5); education+orthosis (OR 16.5, 95% CrI: 4.9 to 65.8); education+exercise+patellar taping/mobilisations (OR 25.2, 95% CrI: 5.7 to 130.3) and education+exercise+patellar taping/mobilisations+orthosis (OR 38.8, 95% CrI: 7.3 to 236.9)). Education+exercise+patellar taping/mobilisations, with (OR 4.0, 95% CrI: 1.5 to 11.8) or without orthosis (OR 2.6, 95% CrI: 1.7 to 4.2), were superior to education alone. At 12 months, education or education+any combination yielded similar improvement rates.

Summary/conclusion Education combined with a physical treatment (exercise, orthoses or patellar taping/mobilisation) is most likely to be effective at 3 months. At 12 months, education appears comparable to education with a physical treatment. There was insufficient evidence to recommend a specific type of physical treatment over another. All treatments in our NMA were superior to wait and see at 3 months, and we recommend avoiding a wait-and-see approach.

PROSPERO registration number CRD42018079502.

  • knee
  • sports and exercise medicine

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See:



Musculoskeletal disorders are one of the largest global contributors to years lived with disability, and are costly to society.1 Patellofemoral pain (PFP) is one of the most common knee complaints in individuals between 10 and 50 years of age.2 PFP impairs function and the ability to participate in leisure activities, work and sport, and reduces quality of life.3 It is a clinical diagnosis made when patients present with pain around or behind the patella during daily activities such as stair walking, squatting or running. Similar to low back pain, PFP is characterised by a high degree of persistence and recurrence of symptoms. Nearly 40% of those with PFP continue to experience symptoms after 2 years, which is associated with frequent use of painkillers, lower physical activity levels and low quality of life.4 5

Many different treatments are used in clinical practice to help patients with PFP.6 While there are several systematic reviews evaluating treatments for PFP,7–9 the comparative effectiveness of all available treatments has never been examined. This makes deciding on the most appropriate treatment challenging and may explain the variation in clinical practice.6 In this study, we use network meta-analysis (NMA), a technique which allows the simultaneous comparison of multiple interventions in a single coherent analysis. Traditional systematic reviews quickly become outdated.10 Living systematic reviews are continuously updated and incorporate new evidence when available.11 A living systematic review with NMA would enable clinicians to consult a contemporary, comprehensive overview of the comparative effectiveness of treatments for a given condition. This living systematic review with NMA evaluates the comparative effectiveness of all available treatments for patients with PFP, providing a comprehensive and up-to-date overview of evidence-based treatments.


Protocol registration

The living systematic review with NMA was prospectively registered on PROSPERO and a full protocol was published.12 The findings are reported in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses checklist extension for NMA.13

Deviations from protocol

Deviations from the protocol relate to including a minimum number per treatment arm (>10) (to avoid resource intensive review work for very little gain), exclusion of adverse effects as an outcome (due to a lack of available data) and removal of threshold analyses (due to substantial overlap in credible intervals (CrIs)). Deviations are outlined in more detail in online supplemental appendix 1.


Administration, dissemination and updating the living systematic review

This review will be administered at the Center for General Practice at Aalborg University, Denmark, and we plan to update the NMA annually for a minimum of 5 years. As described in our protocol, we will screen the literature annually to identify new data that may alter our conclusions and recommendations. When new data have become available, we will update the analysis and present the updated findings on the website of Aalborg University. Here, we will also provide a plain-language summary for patients and clinicians dealing with PFP.

Patient and public involvement

Seven patients with PFP formed a patient reference group to select the hierarchy of outcomes (Global Rating of Change (GROC) Scale and pain scales). One researcher not involved in the study (acknowledgements) explained the various outcomes and participants indicated what they considered the most relevant instrument. Six of the seven indicated a preference for the GROC Scale. Consequently, the outcomes were prioritised as follows:

Primary outcome measure

Any improvement on a GROC Scale, a commonly used outcome with excellent reliability (intraclass correlation coefficients (ICCs) range from 0.90 to 0.99).14 15 Any improvement is defined as any degree of recovery or improvement.15 ‘Unchanged’ to ‘worse than ever’ are considered a treatment failure.

Secondary outcome measures

Pain intensity measured by ‘worst pain in the past week’ on a Visual Analogue Scale (VAS; 0–10/0–100) or Numerical Rating Pain Scale (NRS; 0–10/0–100). The reliability is excellent, ICC=0.76.15 16

Patient-rated pain during specific activities of daily life and during sporting activities, measured on a VAS or NRS as outlined above.16 Reliability for pain during activity is excellent, ICC=0.83.

Patient-reported outcome measures were not included as they have not been used for PFP until very recently.17 These may be included in future updates.

Research question

Based on the main purpose of the study and the input from our patient reference group, we formulated the following research question: Which treatment(s)/treatment category (class(es)) is most likely to be effective for PFP on any improvement and patient-rated pain?

Eligibility criteria

Type of studies

Published or unpublished randomised controlled trials (RCTs) (including randomisation through minimisation, or clustering) were eligible for inclusion providing a full-text report was available.

Type of population

All patients with a clinical diagnosis of PFP were included. Studies were included if they used synonyms for PFP, but as minimum criterion, described patients with retropatellar or peripatellar pain, of at least 6-week duration, and a non-traumatic onset. The diagnostic criteria used in the original studies were followed, given that pain was described as being retropatellar or peripatellar pain. Studies examining other conditions were excluded (eg, patellar dislocations, patellofemoral osteoarthrosis, patellar tendinopathy, Osgood-Schlatter, iliotibial band syndrome, Sinding-Larsen-Johansson syndrome). Trials that included participants diagnosed with PFP, but with concomitant pain around the patella caused by other conditions (eg, patellar tendinopathy), were eligible for inclusion. No age restrictions were imposed.

Type of treatments and control treatments

Any conservative or invasive treatment, control treatment, placebo, wait-and-see or no treatment group studied were eligible for inclusion.

Type of outcomes

Studies assessing the treatment effect after a minimum of 6 weeks were included. Studies assessing the primary and secondary outcomes above were included (GROC Scale, worst pain intensity during the previous week and pain during activities).

Search strategy

We developed a sensitive search strategy which included a mix of indexed and free-text terms (see Winters et al 12). No restrictions (eg, language or full-text availability) were applied. We searched conventional databases, grey literature databases and trial registers. The following sources were searched from their date of inception until 4 June 2019: Embase, PubMed (including MEDLINE), Cochrane Central Register of Controlled Trials, Scopus, Web of Science, CINAHL and SPORTDiscus. Grey literature databases were searched for unpublished studies, and conference proceedings were identified from all Patellofemoral Research Retreats (2009, 2011, 2013, 2015 and 2017). For unpublished or ongoing studies, we searched the WHO International Clinical Trials Registry Platform ( Clinical, The European Union Clinical Trials Register and the ISRCTN registry. Finally, we screened reference lists of all Cochrane reviews (N=6) on PFP and the reports included in this review for possible relevant studies that were not identified by our search.

Study selection

Two researchers (MW and CBL) screened titles and abstracts independently after duplicate removal. Consensus was sought in cases of initial disagreement. If consensus could not be reached, the report was included for full-text evaluation. Both investigators independently applied our inclusion and exclusion criteria to the full-text reports. In case of disagreement, consensus was sought; however, if disagreement persisted a third author (MSR or AW) arbitrated the decision.

We used Covidence (Melbourne, Australia) for independent study selection, data extraction and risk of bias assessment.

Data extraction

Data were extracted by two researchers using standardised extraction forms adapted from the Cochrane Collaboration.18 Any disagreements were resolved by consensus. We extracted the following data:

  • Publication and study details: For example, authors, year of publication, funding source, possible conflicts of interest, study aim, design and unit of allocation.

  • Population: Number of patients included, population characteristics for age, sex, body mass index, activity level, setting where population was recruited, baseline scores for outcome measures (mean, SDs, standard errors extracted for continuous outcomes and number and percentage for categorical outcomes).

  • Eligibility criteria and diagnostic criteria used for PFP.

  • Treatments: For example, number randomised to group, detailed description of for example, application, dose, intensity, frequency, number of sessions, delivery, tailoring (individual/group), duration of treatment, providers, cotreatments, modification (change to treatment), adherence. We used items from the Template for Intervention Description and Replication checklist12 19 to assure comprehensive data extraction in this section of the extraction form.17

  • Outcomes: Timepoints measured, and the timepoints reported on, outcome definition, person measuring, unit of measurement, scales (upper and lower limits), imputation of missing data, primary and secondary outcomes used in the original trials, unintentional outcomes (eg, adverse events, adverse effects, side effects).

  • Data and analysis: Comparisons, outcomes, subgroups, timepoints, results (central estimates and measures of dispersion; eg, mean for both groups, mean difference (MD), SDs/95% CIs/standard errors), number of missing patients, statistical methods used and appropriateness of these.

  • Other information: Key conclusions of study authors.

Risk of bias assessment

The Cochrane Risk of Bias Tool V.2 was used to assess the risk of bias for each outcome per study. We assessed bias following the ‘intention-to-treat’ principle (ie, assignment to intervention).20 This tool has a fixed set of items to use for the risk of bias appraisal, that is, ‘bias arising from the randomisation process’, ‘bias due to deviations from intended interventions’, ‘bias due to missing outcome data’, ‘bias in measurement of the outcome’, ‘bias in selection of the reported result’ and overall risk of bias judgement for each outcome.

Pairs of reviewers (MW and CBL; MW and AW) independently assessed all included RCTs. Each major domain of bias was appraised in light of each included study outcome. The tool’s signalling questions and criteria were followed to inform a domain-based appraisal of the risk of bias. The risk of distortion of the outcome estimate was appraised as at ‘low’, ‘some’ or ‘high’ risk of bias. We made judgements regarding the direction of distortion: ‘favours experimental’, ‘favours comparator’, ‘towards null’, ‘away from null’ or ‘unpredictable’. Each outcome within a study received an overall risk of bias judgement based on the individual domains (‘low’, ‘some’ or ‘high’ risk of bias), following the guidance from the tool. Disagreements between reviewers were resolved by consensus through discussion in all cases.

Data synthesis and statistical methods

We constructed network plots using Stata software (StataCorp. 2017. Stata Statistical Software: Release 15. College Station, Texas: StataCorp LLC) to visualise all head-to-head comparisons for all outcomes.21 Networks of treatment comparisons were constructed for the primary and secondary outcomes separately. Three authors (MW, SH, MSR) appraised the clinical homogeneity before the start of the analysis, by tabulating study and population characteristics and inspecting them for differences in potential effect modifiers. This informed assessment of the assumption of exchangeability required for NMA. Treatments were also assigned to categories (ie, classes) (table 1).

Table 1

Treatments and treatment categories (ie, classes) for the ‘any improvement’

Our primary outcome measure, any improvement, was summarised using an OR for improvement versus non-improvement, with a 95% CrI. This was in line with our protocol, and done because various GROC Scales, with different number of response options and descriptors (eg, improvement vs recovery), did not allow for data to be analysed in a proportional odds regression model. We used a conservative intention-to-treat approach to the analyses where missing data in original studies was handled as a treatment failure.
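The conservative handling of missing data described above can be illustrated with a minimal sketch. It computes a crude two-arm odds ratio in which every randomised participant without a recorded improvement, including those lost to follow-up, counts as a treatment failure. The function names and the counts are hypothetical, and this simple calculation is not the Bayesian NMA model used in the review.

```python
def itt_counts(randomised, improved):
    """Return (improved, not improved) for one arm.

    Assumption for illustration: everyone randomised who is not recorded
    as improved, including participants with missing data, is a failure.
    """
    return improved, randomised - improved


def odds_ratio(arm_a, arm_b):
    """Crude odds ratio for improvement, arm_a versus arm_b.

    Each arm is a (randomised, improved) tuple.
    """
    a_improved, a_failed = itt_counts(*arm_a)
    b_improved, b_failed = itt_counts(*arm_b)
    return (a_improved * b_failed) / (a_failed * b_improved)


# Hypothetical trial: 50 randomised per arm, 30 versus 15 report improvement.
treatment = (50, 30)
control = (50, 15)
print(odds_ratio(treatment, control))
```

Because dropouts are added to the failure count in both arms, differential dropout shifts the odds ratio against the arm that lost more participants.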

For our secondary outcome measures, worst pain and pain descending stairs, intervention effects were expressed as MDs, with 95% CrIs, when outcomes were measured with the same instrument. Pain measured on a 0–100 scale was converted to 0–10 where necessary. Mean and median ranks and 95% credibility limits were used to estimate the likelihood of individual treatments being superior to the other treatments for the individual with PFP.
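The ranking summary can be sketched as follows, assuming posterior draws of each treatment's effect are available as one dictionary per simulation iteration. Treatments are ranked within each draw, and the ranks are then summarised across draws. The treatment labels and the three draws are invented for illustration; a real analysis would use thousands of draws from the fitted model.

```python
from statistics import mean, median


def rank_draws(draws, higher_is_better=True):
    """Rank treatments within each posterior draw (rank 1 = best).

    draws: list of {treatment: effect} dictionaries, one per iteration.
    Returns {treatment: [rank in draw 1, rank in draw 2, ...]}.
    """
    ranks = {treatment: [] for treatment in draws[0]}
    for draw in draws:
        ordered = sorted(draw, key=draw.get, reverse=higher_is_better)
        for position, treatment in enumerate(ordered, start=1):
            ranks[treatment].append(position)
    return ranks


# Hypothetical log odds ratios versus wait and see in three draws.
draws = [
    {"wait_and_see": 0.0, "education": 2.1, "education_exercise": 3.0},
    {"wait_and_see": 0.0, "education": 2.6, "education_exercise": 2.4},
    {"wait_and_see": 0.0, "education": 1.9, "education_exercise": 3.3},
]

ranks = rank_draws(draws)
for treatment, r in ranks.items():
    print(treatment, "mean rank", round(mean(r), 2), "median rank", median(r))
```

For an outcome where lower values are better, such as pain intensity, `higher_is_better=False` reverses the ordering.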

NMA models were fitted for both treatment categories (ie, ‘classes’; overarching treatment categories outlined in table 1) and individual treatments. For all analyses, we fitted fixed and random effects NMA models22 at the treatment level, and compared model fit using the deviance information criterion and posterior mean residual deviance (lower measures of both statistics are preferred, with differences of 3 or more considered meaningful). In accordance with our protocol, we grouped outcome follow-ups based on the available data, that is, 3 and 12 months. If there were multiple timepoints available for an outcome, and these were equally close to the timepoint to be synthesised across studies, the last follow-up in this timeframe was used. For the class-level models, we fitted hierarchical (ie, random) and fixed effect between treatment within-class models and compared model fit as described above. For the secondary outcome measures worst pain and pain descending stairs, we also attempted to fit bivariate models to capture the correlation in treatment effects on these related outcomes.
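The model comparison rule can be written as a small decision function. This is a sketch under two assumptions that are not stated in the text: only the deviance information criterion (DIC) is compared, and the simpler fixed effect model is retained when the difference is below the threshold of 3. The DIC values in the example are hypothetical.

```python
def prefer_model(dic_fixed, dic_random, threshold=3.0):
    """Choose between fixed and random effects models using DIC.

    Lower DIC is better. Assumption for illustration: the random effects
    model is chosen only when its DIC is lower by at least `threshold`;
    otherwise the simpler fixed effect model is retained.
    """
    if dic_fixed - dic_random >= threshold:
        return "random"
    return "fixed"


# Hypothetical DIC values for one network.
print(prefer_model(dic_fixed=100.0, dic_random=96.0))
print(prefer_model(dic_fixed=100.0, dic_random=98.5))
```

In practice the posterior mean residual deviance would be inspected alongside DIC, as the text describes.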

If >10 studies were available per comparison, we assessed statistical heterogeneity by inspecting the between-study SD, and by comparing fit of the fixed and random effect models. We assessed the consistency assumption for each network by comparing model fit between the NMA model and an unrelated mean effects model that relaxes the consistency assumption.23 We assessed small study bias using comparison-adjusted funnel plots if 10 or more trials were available for 1 comparison.24 We ran a sensitivity analysis to test if our findings were robust for our decision to pool the study arms (education vs education+exercise) in van Linschoten et al 25 with the study arms in Collins et al 26 and Rathleff et al 27 (both education vs education+exercise+patellar treatments).

NMA models were fitted in a Bayesian framework using Markov chain Monte Carlo simulations in WinBUGS (V.1.4, Medical Research Council, UK, and Imperial College of Science, Technology and Medicine, University of Cambridge, UK). A Bayesian analysis estimates a posterior distribution which we summarise with the mean or median, and 95% CrIs. 95% CrIs are interpreted as a range of values within which a parameter lies with a 95% chance. CrIs are very similar to CIs in frequentist analyses but may be wider for random effects models due to Bayesian analyses incorporating uncertainty around the between-studies SD.
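To make the posterior summary concrete, the sketch below takes a set of simulated log odds ratios and reports the median with the 2.5th and 97.5th percentiles on the odds ratio scale. The evenly spaced values stand in for real Markov chain Monte Carlo output, and the helper names are hypothetical; this is not the WinBUGS code used for the analysis.

```python
import math
from statistics import median


def percentile(sorted_values, q):
    """Percentile by linear interpolation; q is a fraction between 0 and 1."""
    position = q * (len(sorted_values) - 1)
    lower = math.floor(position)
    upper = math.ceil(position)
    fraction = position - lower
    return sorted_values[lower] * (1 - fraction) + sorted_values[upper] * fraction


def summarise_log_or(log_or_draws):
    """Return (median OR, lower 95% CrI limit, upper 95% CrI limit)."""
    ordered = sorted(log_or_draws)
    limits = (median(ordered), percentile(ordered, 0.025), percentile(ordered, 0.975))
    return tuple(math.exp(value) for value in limits)


# Stand-in for posterior draws: 201 evenly spaced log odds ratios from 0 to 2.
draws = [i / 100 for i in range(201)]
or_median, cri_lower, cri_upper = summarise_log_or(draws)
print(round(or_median, 2), round(cri_lower, 2), round(cri_upper, 2))
```

Summarising on the log scale and exponentiating afterwards gives the same percentiles as summarising the odds ratios directly, because the exponential transformation preserves order.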

Certainty of the evidence (Grading of Recommendations, Assessment, Development and Evaluation (GRADE) approach)

After the analysis, we used the GRADE approach to assess the certainty of the evidence from the NMA.28 29 Two researchers (MW and AW, see acknowledgements) graded the evidence on the basis of the risk of bias assessments, inconsistency, indirectness, imprecision and publication bias. We assigned an overall judgement about the certainty of the evidence for all comparisons, and for the evidence from the NMA as a whole.


Selection process

We included 22 RCTs. Figure 1 details the search and selection process. Online supplemental appendix 2 shows the studies excluded and the reasons for exclusion. Thirty-three trials were found through register searches; sixteen were completed, nine were ongoing and eight trials’ status was unknown (online supplemental appendices 3 and 4).

Figure 1

Flow diagram, search on 4 June 2019. PFP, patellofemoral pain; RCTs, randomised controlled trials.

Characteristics of the studies included

Twenty-one treatments were investigated in twenty-two trials.25–27 30–48 Forty-eight treatment arms were investigated, with sample sizes from 10 to 109 per treatment arm. Thirty-six (75%) of arms included exercise. Other common intervention categories included patient education, orthotics and wait and see (or a combination). For full details see table 1 and online supplemental appendix 5. Eleven RCTs used a GROC Scale to measure patient-reported outcomes, and nineteen trials used a worst pain scale or measured pain during a specific activity. Cumulatively, 1472 patients with PFP were included. Online supplemental appendix 5 details all study and patient characteristics of the studies included.

Risk of bias and certainty of the evidence

The majority of outcomes were at high risk of bias (79%), with some concerns in 21% (online supplemental appendix 6, tables 1 and 2a,b). Outcome measurement (45%), deviations from the intended intervention (45%), missing outcome data (32%) and reporting (23%) were the most common sources of bias.

Table 2a

Treatment effects for the primary outcome: Comparative treatment class effects for any improvement at 3 months (fixed effects model with random between treatment within class effect)

Table 2b

Treatment effects for the primary outcome: Comparative treatment effects for any improvement at 12 months (fixed effects model without class effect)

The certainty of evidence for all comparisons was low to very low, except for hip/knee exercises versus education+orthosis, for which there was moderate evidence (online supplemental appendix 7). Evidence for comparisons was rated down for risk of bias (n=100, 100%; ie, ‘serious’ to ‘very serious’) and imprecision (n=85, 85%).

Network meta-analyses

Figure 2 shows direct treatment comparisons in the field of PFP for any improvement. Nine studies could be included in the NMA at 3 months, and three studies in the NMA at 12 months. Six classes were included in network analyses for the primary outcome (table 1). Model fit statistics are reported in online supplemental appendix 8. On the basis of model fit, we report results for the fixed effect model with random between treatment within class effects for any improvement at 3 months. We report results for a fixed effect treatment model without class for any improvement at 12 months, as it was not possible to conduct class-level analyses for any outcomes at 12 months, or for pain while descending stairs at 3 months, as only single treatments were available. Estimated individual treatment effects for ‘any improvement’, ‘worst pain’ and ‘pain while descending stairs’ are presented in online supplemental appendix 9. No other measures for pain during an activity could be synthesised in an NMA. None of the networks showed evidence of inconsistency between direct and indirect comparisons, where data from direct and indirect evidence were available. Too few trials were available to investigate heterogeneity, or small study bias. Six studies that could not be included in an NMA are presented in a descriptive synthesis in online supplemental appendix 10; among the six there were two studies that could have been connected to the network if data were available.

Figure 2

Network graphs for direct treatment comparisons for any improvement at 3 months (A) and 12 months (B). Blue text represents the number of treatment comparisons, and the text in black represents the number of participants that received the respective treatment. The thickness of the lines and the size of the dots are proportional to the number of trial comparisons and the number of participants in the treatment arms, respectively.

Online supplemental appendix 11 provides the findings for the sensitivity analysis. It shows that our main findings are robust for our decision to pool the education+exercise arm in van Linschoten et al 25 with the education+exercise+patellar treatments arms in Collins et al 26 and Rathleff et al.27

Comparative treatment (class) effectiveness on the primary outcome (‘any improvement’)

Any improvement at 3 months

The fixed effect model with hierarchical (ie, random) effect between treatments within the class showed that all treatment classes were superior to wait and see: education (OR 9.6, 95% CrI: 2.2 to 48.8), exercise (OR 13.0, 95% CrI: 2.4 to 83.5), education+orthosis (OR 16.5, 95% CrI: 4.9 to 65.8), education+exercise+patellar taping/mobilisations (OR 25.2, 95% CrI: 5.7 to 130.3) and education+exercise+patellar taping/mobilisations+orthosis (OR 38.8, 95% CrI: 7.3 to 236.9) (table 2a).

Education+exercise+patellar taping/mobilisations, without or with orthosis, was superior to education alone (OR 2.6, 95% CrI: 1.7 to 4.2, and OR 4.0, 95% CrI: 1.5 to 11.8, respectively). Neither exercise (prescribed on its own) nor orthosis+education was superior to education alone (OR 1.3, 95% CrI: 0.3 to 6.3, and OR 1.7, 95% CrI: 0.8 to 4.0, respectively). No specific type of exercise was superior to another type of exercise (online supplemental appendix 9).
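As a rough arithmetic check on these figures, the OR of a treatment versus education alone should be close to the ratio of the two ORs versus wait and see when the network is consistent. The sketch below divides the reported point estimates; the dictionary keys are shorthand labels introduced here. Ratios of posterior summaries are only approximations of the model estimates, so small discrepancies (as for exercise) are expected, and the check says nothing about the credible intervals.

```python
# ORs versus wait and see at 3 months, as reported in the text.
OR_VS_WAIT_AND_SEE = {
    "education": 9.6,
    "exercise": 13.0,
    "education+orthosis": 16.5,
    "education+exercise+patellar": 25.2,
    "education+exercise+patellar+orthosis": 38.8,
}

# ORs versus education alone at 3 months, as reported in the text.
REPORTED_VS_EDUCATION = {
    "exercise": 1.3,
    "education+orthosis": 1.7,
    "education+exercise+patellar": 2.6,
    "education+exercise+patellar+orthosis": 4.0,
}


def implied_or(treatment, reference="education"):
    """Approximate OR of treatment versus reference from a common comparator."""
    return OR_VS_WAIT_AND_SEE[treatment] / OR_VS_WAIT_AND_SEE[reference]


for name, reported in REPORTED_VS_EDUCATION.items():
    print(f"{name}: implied {implied_or(name):.2f}, reported {reported}")
```

The implied values place the combination without orthosis near 2.6 and the combination with orthosis near 4.0, matching the ordering given in the abstract.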

Any improvement at 12 months

At 12 months, four treatment arms were compared in a fixed effects treatment model. No differences were found for education+exercise+patellar taping/mobilisations (OR 1.5, 95% CrI: 0.9 to 2.4), education+orthosis (2.3, 95% CrI: 1.0 to 6.2) or education+exercise+patellar taping/mobilisations+orthosis (1.9, 95% CrI: 0.8 to 4.9) when compared with education alone (table 2b).

Treatment (class) rankings ‘any improvement’

Education+exercise+patellar taping/mobilisations, either with or without orthosis, was the best combination of treatments for PFP at 3 months (median ranking 1, 95% CrI: 1 to 3; and median ranking 2, 95% CrI: 1 to 4, respectively). Wait and see was least likely to be effective (median ranking 6, 95% CrI: 6 to 6) (table 3).

Table 3

Treatment rankings from the network meta-analyses for any improvement.

At 12 months, education+exercise+patellar taping/mobilisations, either with (median ranking 2, 95% CrI: 1 to 4) or without (median ranking 3, 95% CrI: 1 to 4) orthosis, and education+orthosis (median ranking 1, 95% CrI: 1 to 4) showed similar rankings. Education alone seemed least likely to be effective (median ranking 4, 95% CrI: 3 to 4).

Comparative treatment effectiveness on the secondary outcomes

Comparative treatment effects for worst pain at 3 months, as estimated by random effects model with random between treatments within class effect, are reported in online supplemental appendix. We report on the results from a fixed effects model without class effect for worst pain at 12 months. The pooled findings for ‘worst pain’ at 3 months show that none of the treatments was found to be superior to any other treatment or to wait and see. At 12 months, education+exercise+patellar taping/mobilisations appears superior to education alone. Education+exercise+patellar taping/mobilisations+orthosis appears better than education alone but was not found to be superior to education+exercise+patellar taping/mobilisations.

For pain while descending stairs, three treatments could be compared. All analyses were performed using fixed effects models without class effects. At 3 months, an exercise programme including hip, knee and trunk exercises was superior to hip and knee exercises alone and to a programme including ‘minimal’ hip/knee exercises. No difference was found between minimal hip/knee exercises and usual hip/knee exercises. At 12 months, hip, knee and trunk exercises were superior to a combination of hip/knee exercises and arthroscopy, and also superior to hip/knee exercises alone. No difference was found between hip/knee exercises+arthroscopy or hip/knee exercises alone.


This living systematic review included 22 RCTs with a total of 1472 patients with PFP for more than 6 weeks. All treatments were superior to a wait-and-see approach for the primary outcome, any improvement, at 3 months. Patient education combined with a physical treatment appears most effective in the short term. After 12 months, education or education+additional exercises, orthosis or patellar treatment yielded similar improvement rates. However, no studies included wait and see for any improvement at 12 months precluding comparisons to determine if this was also inferior in the longer term. For the secondary outcome, pain intensity during previous week, no treatment was superior to wait and see.

Guidelines for musculoskeletal pain often recommend a wait-and-see approach in general practice.49 Based on this NMA, wait and see is the least effective treatment available. Clinicians should, therefore, avoid a wait-and-see approach. Clinicians can consider a minimum of patient education at the first consultation and potentially add exercise and patellar taping/mobilisations if the patient and clinician agree on the time requirements and benefit. Costs may be another aspect to consider when offering add-on treatments. If additional treatments such as exercise and patellar taping/mobilisations have limited persisting benefit in the long term, these treatments may not represent value for money. Trials should include cost evaluations in the future.

One of the studies not included in NMA demonstrated that among runners, patient education on load management may be equally as effective as patient education and exercise therapy or patient education and gait retraining. These findings are in contrast to the findings from the NMA that suggest education may be inferior to education+physical treatments at 3 months. Future trials may need to explore which types of education may be most effective, and how it can be improved even further as a low cost and scalable intervention. Education varied in terms of how it was delivered. Education in all trials included information about PFP, information on pain and guidance on how to manage activity in the context of pain, without stopping all exercise. This also has important implications for research, as many studies include some educational material as a control, with few available studies using a true control or placebo control. The ‘wait-and-see’ arms often also included patients continuing with their planned treatments and therefore may not reflect a true no treatment control. Despite all treatments being superior to wait and see for any improvement, there was no superior effect of active treatments compared with wait and see on pain intensity.

If a resistance exercise programme is chosen, the specific type of exercises prescribed may not be important. They mainly comprise strengthening exercises for the hip, the knee or both the hip and knee. This gives clinicians the opportunity to devise an individual prescription together with their patient. The additional benefit of adding patellar taping/mobilisations to an exercise programme is presently unknown. Shared decision making between the patient and their healthcare practitioner should guide administering the addition of a physical treatment to education. It appears that foot orthoses do not have an additive effect to education and exercise so clinicians should consider this when selecting adjuncts to education and exercise. On the other hand, orthosis and education combined had a similar effect as the combination of exercise, education and patellar treatments. This highlights two different treatment approaches with different time requirement, which appear equally effective.

Strengths and weaknesses in relation to previous studies

Previous systematic reviews and RCTs on the management of PFP were restricted to traditional comparisons of one treatment versus another, for example, patient education versus exercise. Similar to a Cochrane review,7 we found exercise to be superior to wait and see on any improvement, but not on pain. However, our NMA further suggests that combining exercise (with or without foot orthoses) with education may be most effective for 3-month improvements. This is in keeping with a recent consensus statement from experts in the field of PFP that recommends exercise, orthosis, manual therapy and combined treatments in the management of PFP.50 Our quantitative NMA supports the view that combined treatment is associated with the best outcomes, but that orthoses may not improve outcomes when added to education, exercises and patellar taping/mobilisations. Our living NMA will be updated when new evidence becomes available, ensuring a contemporary overview of the evidence for the best treatment of PFP for patients and clinicians dealing with the condition.

Strengths and weaknesses of this study

There was an overall lack of high-quality, large studies, with only one study including >100 subjects per trial arm.43 Small trials with varying quality predominate in the field of musculoskeletal health. To ensure that credible research informed the NMA, we only included research where (1) patients had the condition for ≥6 weeks, and (2) studies included relevant patient-rated outcomes and follow-ups (≥6 weeks). As a result, we excluded 128 RCTs. While the comparisons in those studies could add value to the body of knowledge, their outcomes were not evaluated by the patient; patient-rated outcomes are considered the gold standard of outcome measurement by WHO, and ‘essential’ by Cochrane.51 52 Consensus on which outcomes to use in RCTs through the development of a core outcome set has the potential to move the field of PFP forward. In our study, patients were involved in prioritising our outcomes, making our results relevant to patients and clinical practice. We performed a comprehensive search covering published and unpublished studies, and state-of-the-art risk of bias and GRADE assessments. We synthesised evidence in Bayesian hierarchical NMAs to optimally use all evidence to determine which treatment is best for PFP.

Limitations in the conclusions drawn by the NMA are primarily caused by the original data. Overall evidence was graded as very low to low. There was no evidence of heterogeneity between studies with fixed effect models preferred over random effects models based on model fit; however, this may be due to the limited number of studies available per comparison. Consistency between direct and indirect evidence could not be checked for all comparisons in the NMAs. However, where this was possible, by comparing consistency models to models that relaxed the consistency assumption, we did not find evidence of inconsistency. Treatment outcomes may be different on the basis of some characteristics (ie, effect modification). Lankhorst et al 53 suggested that sex and symptom duration may be effect modifiers in the relationship between exercise and function at 3 months, but no significant association was found. As there is no good evidence for any potential effect modifier, we did not include potential effect modifiers when planning the synthesis. This is a limitation that can be overcome in the future when new evidence for effect modifiers becomes available. Studies should break down their results on the basis of potential effect modifiers, such as sex and duration of symptoms, which would allow exploring modification effects in the future.

Treatments provided for PFP, as with any musculoskeletal condition, are diverse. For example, the mode of delivery, frequency and intensity may be different for various exercise regimes that we pooled together. This is a limitation; when more studies are done, the effect of delivery, frequency and intensity of exercise could be formally tested. We did subgroup exercise regimes per exercise region (eg, hip/knee or hip/knee/trunk exercises). The NMA suggests that it may not matter in which region exercises are performed. We ran a sensitivity analysis to test if our findings were robust for our decision to pool the study arms (education vs education+exercise) in van Linschoten et al 25 with the study arms in Collins et al 26 and Rathleff et al 27 (both education vs education+exercise+patellar treatments). The sensitivity analysis showed, first, that this decision did not affect the NMA’s comparative estimates, and second, that patellar treatments in addition to education+exercise appear not to be more effective than education+exercise alone.

Publication bias could not be investigated due to a lack of studies.18 24 We found a number of trials in registries that may have remained unpublished but it is unclear if this was due to publication or small study bias. Collectively, bias and low study quality decrease the certainty of our findings. We chose a conservative statistical approach with the NMA, handling missing data as a treatment failure. This may balance out inflated effects in the original studies, or could even underestimate the comparative effects.
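
To illustrate why treating missing data as treatment failure is a conservative choice, the sketch below compares a simple two-arm odds ratio computed with missing outcomes counted as failures against a complete-case calculation. It is only an illustration: the counts are hypothetical and do not come from any included trial, the function and field names are our own, and the actual analysis was a Bayesian NMA rather than a crude odds ratio.

```python
def odds_of_improvement(arm, missing_as_failure=True):
    """Odds of 'any improvement' in one trial arm.

    arm: dict with keys 'randomised', 'improved', 'missing' (hypothetical names).
    If missing_as_failure is True, participants with missing outcomes are
    counted as not improved; otherwise they are dropped (complete case).
    """
    denominator = arm["randomised"]
    if not missing_as_failure:
        denominator -= arm["missing"]
    not_improved = denominator - arm["improved"]
    return arm["improved"] / not_improved


def odds_ratio(treatment, control, missing_as_failure=True):
    """Crude odds ratio of improvement, treatment versus control."""
    return (odds_of_improvement(treatment, missing_as_failure)
            / odds_of_improvement(control, missing_as_failure))


# Hypothetical counts, for illustration only.
treatment = {"randomised": 50, "improved": 30, "missing": 10}
control = {"randomised": 50, "improved": 15, "missing": 2}

or_conservative = odds_ratio(treatment, control, missing_as_failure=True)
or_complete_case = odds_ratio(treatment, control, missing_as_failure=False)
print(f"Missing as failure: OR = {or_conservative:.2f}")
print(f"Complete case:      OR = {or_complete_case:.2f}")
```

With these hypothetical counts, where dropout is higher in the treatment arm, the missing-as-failure estimate is smaller than the complete-case estimate. The direction of the difference depends on how dropout is distributed between arms, so this pattern should not be assumed to hold in general.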


Education combined with a physical treatment (exercise, orthoses or patellar taping/mobilisation) is most likely to be effective. There was insufficient evidence to recommend a specific type of physical treatment, or a combination of physical treatments, over another. All treatments analysed in our NMA were superior to wait and see at 3 months, and we recommend avoiding a wait-and-see approach.

What is already known

  • Patellofemoral pain is a persisting condition; four in every ten patients continue to have symptoms after 2 years. It is disabling and it impacts on quality of life.

  • Many patients receive multiple treatments, which impacts healthcare consumption.

  • The comparative effectiveness of all available treatments is currently unknown.

What are the new findings

  • Our living network meta-analysis (NMA) of randomised controlled trials suggests that education in combination with a physical treatment (exercise, orthoses or patellar taping/mobilisation) is most likely to be effective at 3 months.

  • At 12 months education alone is comparable to education combined with a physical treatment (exercises, orthoses or patellar taping/mobilisation).

  • All treatments analysed in our NMA were superior to wait and see, a common first-line approach currently administered by general practitioners.


We would like to thank Christian Lund Straszek, MSc, and the Research Unit for General Practice’s patellofemoral pain patient reference group for their help with developing our research questions. We thank Rebecca Mellor, PhD, for her help with extracting data, and Arco van der Vlist, MD, for his help with the Grading of Recommendations, Assessment, Development and Evaluation assessments. We thank Negar Pourbordbari, MD, Asiah Rahi Sherzaman and Hedaiat Saei for translating Persian articles to English, and Xu Wang for translating Chinese articles to English.



  • Twitter @marinuswinters, @Sinead_Holden, @Bill_Vicenzino

  • Contributors MW, AW and MSR came up with the study idea. MW, SH, BTV, AW and MSR designed the study. MW, NJW, DMC, SH and MSR designed the statistical analyses plan. MW constructed the search with help of a research librarian. MW and CBL performed the search and selection process. MW, CBL, MSR, AW, SH, and Rebecca Mellor performed data extraction. MW, CBL and AW performed risk of bias assessments. MW and NJW performed all data analyses. MW and Arco van der Vlist performed Grading of Recommendations, Assessment, Development and Evaluation assessments. MW, SH and MSR drafted the first version of the manuscript. MW and MSR are the study guarantors. All authors provided feedback and gave important intellectual input. All authors read and consented to the content of the article.

  • Funding The Tryg Foundation is acknowledged for providing support for this project (Grant ID: 118547). The foundation had no role in the planning, conduct or reporting of this work.

  • Competing interests NJW leads a research project in collaboration with Pfizer plc. Pfizer part-funded a junior researcher. The project is purely methodological, using historical data on treatments for pain relief. NJW has no other conflicts. All other authors report no conflicts of interest.

  • Patient consent for publication Not required.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.