EULAR recommendations for knee and hip osteoarthritis: a critique of the methodology
W Zhang, M Doherty

Academic Rheumatology, University of Nottingham, Nottingham City Hospital, Nottingham, UK

Correspondence to: Professor Doherty, Academic Rheumatology, University of Nottingham, Clinical Sciences Building, Nottingham City Hospital, Nottingham NG5 1PB, UK; michael.doherty{at}


The quality of the EULAR recommendations for the management of hip and knee osteoarthritis (OA) was evaluated using a validated instrument. The quality and methods were compared with other guidelines and recommendations. EULAR recommendations were found to be among the best for overall quality. They show strengths with respect to scope, rigour of development, and clarity, but weaknesses with respect to stakeholder involvement, applicability, and editorial independence. However, a principal strength is their attempt to fill the gap between guidelines based solely on either research evidence or expert opinion. The methods used to synthesise research evidence (systematic review) and expert opinion (Delphi exercise) are robust. Strength of recommendation, based on combined consideration of research evidence, clinical expertise, and perceived patient preference, is valid and approaches the true essence of “evidence based practice” that considers each of these different forms of evidence.

  • ACR, American College of Rheumatology
  • EULAR, European League Against Rheumatism
  • NSAID, non-steroidal anti-inflammatory drug
  • OA, osteoarthritis
  • RCT, randomised controlled trial
  • SOR, strength of recommendation
  • VAS, visual analogue scale

Keywords: hip; knee; osteoarthritis; guidelines

In recent years the European League Against Rheumatism (EULAR) Osteoarthritis (OA) Task Force has published separate recommendations for the management of knee OA (the first in 2000, with an update in 2003) and hip OA (2005).1–3 The methods used to develop these recommendations include expert consensus and systematic review of research evidence. Strength of recommendation for each specific key statement was also provided.

The recommendations have been widely disseminated in Europe and other parts of the world,4,5 and the methodology used by EULAR has been adapted by other guideline development teams.6,7 The EULAR recommendations for knee OA (2000) and the American College of Rheumatology (ACR) recommendations for both hip and knee OA (2000) were compared in 2003,8 and reasonable agreement was found (table 1). In the same year, EULAR published updated recommendations for knee OA with two major changes: (a) the recommendations were reordered by topic, that is, general, non-pharmacological, pharmacological, and surgical treatments (table 2); (b) new evidence was added to support the use of knee bracing, topical capsaicin, cyclo-oxygenase (COX)-2 selective inhibitors, gastroprotective agents, and opioids. However, unlike ACR, EULAR decided to keep the recommendations for knee and hip OA separate and subsequently developed a parallel and independent set of recommendations for hip OA.3 This decision reflected the following differences between hip and knee OA.

Table 1

 Interventions evaluated by American College of Rheumatology (ACR) and European League Against Rheumatism (EULAR) guidelines

Table 2

 Final set of 10 recommendations based on both evidence and expert opinion

  • Differences in anatomy and physiology: the hip is a very stable ball and socket joint with multiple planes of movement, whereas the knee is a modified hinge with one plane of movement and is prone to developing instability.

  • Different risk factors for development of OA—for example, knee OA is female predominant, and obesity is an important risk factor, whereas hip OA is male predominant at younger ages but female predominant in older age, and obesity is only a modest risk factor.

  • Differences in treatment applicability—for example, topical non-steroidal anti-inflammatory drugs (NSAIDs), mechanical bracing, and intra-articular injections are more suited to knee OA.

  • Possible differences in response to the same treatment: for example, NSAIDs produce greater pain reduction in patients with knee OA than in those with hip OA,9 and weight loss and diacerhein also appear to differ in effect between the two joints.3

In addition to the separation of knee from hip OA, there were other interesting and potentially important differences in recommendations that resulted from the different methodologies used by EULAR and ACR (table 3), despite both groups apparently examining the same research literature base. Thus methodology does appear to profoundly influence the recommendations that are generated. However, the quality of the EULAR recommendations in the context of all available OA guidelines remains unknown. In this article, we have undertaken a critical appraisal of the methodological strengths and weaknesses with respect to scope, stakeholder involvement, rigour, clarity, applicability, and editorial independence of the EULAR recommendations compared with other published guidelines. Such appraisal should prove useful in improving future guideline development and the quality of patient care.

Table 3

 Differences in methodology between American College of Rheumatology (ACR) and European League Against Rheumatism (EULAR) recommendations published in 2000

Methods


The EULAR recommendations for hip OA and the latest version for knee OA formed the basis of this critical appraisal.1,3 Guidelines/recommendations on the same topic were searched systematically using Medline, EMBASE, CINAHL, AMED, WOS, and Google. Guidelines were included if they were (a) specifically for the management of hip and/or knee OA, and (b) systematically developed by a group of experts with either a consensus approach and/or an evidence based approach. If the guidelines had been updated, the most recent version was included. Review articles, commentaries, and appraisals were excluded. Only guidelines in English were included. The quality of the guidelines was assessed by one of the authors (WZ) using the AGREE instrument, which is specifically designed to assess the quality of guidelines.10 Quality score was calculated on a percentage scale for each domain as well as overall. Student’s t test and one way analysis of variance were performed for two group and more than two group comparisons respectively. Mean (SD) or mean (95% confidence interval (95%CI)) were calculated as appropriate.
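As an illustration of the percentage scale, the sketch below assumes the standardised domain score of the original AGREE instrument: items rated on a four point scale, with the obtained total rescaled between the minimum and maximum possible totals. The function name and the example ratings are hypothetical; they are not the authors' item scores.

```python
def agree_domain_score(ratings_by_appraiser, scale_min=1, scale_max=4):
    """Return a standardised domain score as a percentage (0 to 100).

    ratings_by_appraiser: one list of item ratings per appraiser,
    all covering the same items of a single domain.
    Assumes the original AGREE four point item scale by default.
    """
    if not ratings_by_appraiser or not ratings_by_appraiser[0]:
        raise ValueError("at least one appraiser and one item are required")
    n_items = len(ratings_by_appraiser[0])
    if any(len(r) != n_items for r in ratings_by_appraiser):
        raise ValueError("every appraiser must rate the same items")

    n_appraisers = len(ratings_by_appraiser)
    obtained = sum(sum(r) for r in ratings_by_appraiser)
    minimum = scale_min * n_items * n_appraisers
    maximum = scale_max * n_items * n_appraisers
    return 100.0 * (obtained - minimum) / (maximum - minimum)


# Hypothetical example: one appraiser, a two item domain rated 1 and 2.
print(round(agree_domain_score([[1, 2]]), 2))  # 16.67
```

With a single appraiser, as in this appraisal, the score depends entirely on that person's item ratings, which is one reason the subjectivity of individual items matters.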


Results

In addition to the EULAR recommendations, 1447 citations relating to hip and/or knee OA guidelines were retrieved from the literature search. After scrutiny, only 21 guidelines met the inclusion and exclusion criteria including the two EULAR recommendations.1,3,6,11–28

Quality of the EULAR recommendations

Overall quality scores (maximum 100) of the EULAR recommendations were 51% and 57% for hip and knee OA respectively. They were in the highest quintile of the available guidelines (mean score 41%, range 9–65). The domain breakdown scores showed that the EULAR recommendations performed very well with respect to scope and purpose, rigour of development, and clarity. In contrast, both recommendations scored low for stakeholder involvement, applicability, and editorial independence (table 4).

Table 4

 Quality of the European League Against Rheumatism (EULAR) recommendations in the context of all guidelines in the management of hip and knee osteoarthritis (OA)

Qualities of opinion based, evidence based, and hybrid guidelines

Guidelines may be categorised into three different types according to the source of evidence. If the source of evidence is predominantly derived from expert consensus, they are “opinion based guidelines”—for example, the Royal College of Physicians guidelines.26 If the source of evidence is solely research evidence, they are termed “evidence based guidelines”—for example, the Prodigy guidelines.13 If both expert consensus and research evidence were used, the term “hybrid guidelines” is used—for example, the EULAR recommendations.1,3 Overall, with the AGREE instrument, hybrid guidelines had the highest quality scores, followed by the evidence based guidelines and then the opinion based guidelines (p<0.0001) (fig 1).

Figure 1

 Overall quality score and type of guidelines. OBG, Opinion based guidelines; EBG, evidence based guidelines; HG, hybrid guidelines.

Strength of recommendation

Like many guidelines, the EULAR recommendations for the management of knee OA derived the strength of recommendation (SOR) according to the traditional method: the category of research evidence.1 However, the EULAR Task Force altered this system when they developed guidelines for hip OA because they recognised inherent problems in this system. For example, for diacerhein and intra-articular steroid injection, randomised controlled trial (RCT) evidence was available but not supportive of these interventions for hip OA. Use of the traditional system would have resulted in a high SOR (grade A) for these treatments even though they are considered ineffective. In contrast, total hip replacement is recognised to be a clinically excellent treatment for severe hip OA, but for methodological and ethical reasons it has not been subjected to assessment by an RCT. Therefore total hip replacement could only be assigned a low SOR (grade C) even though it carried the full support of the Task Force.

Such caveats to the traditional SOR scale led the group to develop two alternative trade off scales: a visual analogue scale (VAS 0–100 mm) and an ordinal scale (A–E) for the SOR. Instead of assigning a SOR solely against the level of research evidence (in most cases, for efficacy only), the Task Force members were asked to mark their SORs on a 0–100 mm horizontal line with only two descriptive ends: “not recommended at all” and “fully recommended” taking into account the research evidence (for efficacy, safety, and cost effectiveness), clinical experience, logistical issues—for example, availability, cost, ease of delivery—and perceived patient acceptability and preference. They were also asked to select a discrete SOR from A (fully recommended), B (strongly recommended), C (moderately recommended), D (weakly recommended), and E (not recommended) in the same way. To date, two groups have used these trade off scales.3,6 The quality of the guidelines with these novel trade off scales tends to be higher numerically (fig 2).

Figure 2

 Overall quality score and strength of recommendation.

The criterion validity of the trade off VAS has been examined.29 As there is no gold standard in this area, the traditional scale was used as a proxy measure. The examination was undertaken within a single group of treatment modalities (exercise therapy for the management of hip and knee OA), where research evidence of different kinds is available and both scales could therefore be fairly compared. A significant linear relation was observed between the trade off VAS and the traditional SOR (p<0.001) (fig 3).29

Figure 3

 Relation between the traditional scale and the trade off visual analogue scale (VAS) for strength of recommendation. Data are mean and 95% confidence interval. A, level I evidence; B, level II or extrapolated from level I evidence; C, level III or extrapolated from level I or II evidence; D, level IV or extrapolated from level II or III evidence; NR, not recommended. Adapted with permission.29

The reliability of the VAS and the ordinal scale has been examined using a test-retest method. Assessment of the SOR was repeated after two weeks by the same group of experts. The intraclass correlation coefficient was 0.60 (95%CI 0.52 to 0.67) for the VAS, and the weighted κ was 0.41 (95%CI 0.32 to 0.49) for the ordinal scale. The full details of the methods and results will be reported elsewhere.
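For readers unfamiliar with the weighted κ, the following is a minimal sketch of how test-retest agreement on the ordinal A–E scale could be computed. The weighting scheme is not stated here, so linear weights are an assumption, and the function name and any ratings passed to it are hypothetical rather than Task Force data.

```python
from collections import Counter

GRADES = ["A", "B", "C", "D", "E"]  # ordinal SOR categories, best to worst


def weighted_kappa(first, second, categories=GRADES):
    """Linearly weighted kappa for two paired sets of ordinal ratings.

    first, second: ratings from the test and the retest, in the same order.
    Linear weights (an assumption) give partial credit for near misses.
    """
    if len(first) != len(second) or not first:
        raise ValueError("ratings must be paired and non-empty")
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    n = len(first)

    pairs = Counter((index[a], index[b]) for a, b in zip(first, second))
    row = Counter(index[a] for a in first)
    col = Counter(index[b] for b in second)

    observed = 0.0  # weighted observed agreement
    expected = 0.0  # weighted agreement expected by chance
    for i in range(k):
        for j in range(k):
            weight = 1.0 - abs(i - j) / (k - 1)
            observed += weight * pairs[(i, j)] / n
            expected += weight * (row[i] / n) * (col[j] / n)

    if abs(1.0 - expected) < 1e-12:
        raise ValueError("kappa is undefined when all ratings fall in one category")
    return (observed - expected) / (1.0 - expected)
```

A value of 1 indicates perfect agreement between the two rounds and 0 indicates agreement no better than chance; the reported 0.41 lies between these.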

The Delphi exercise

The EULAR recommendations used a Delphi exercise to reach expert consensus. This requires each Task Force member to propose, independently and away from a committee setting, an agreed number of key propositions for management—for example, 10. The propositions from all members are then compiled into one list; at this stage an independent expert can amalgamate propositions that are very similar or overlap and can edit them for English and clarity (important for international committees). This first round list is then returned to all the Task Force members and they are asked to select their top favourites (a pre-defined number such as their top 10). Propositions are accepted if over half of the participants select them, whereas propositions receiving only a very few votes (a pre-defined number) are removed. Propositions receiving less than 50% but more than the minimum number enter the next Delphi round and members again vote for their top favourites. The procedure is repeated until the pre-defined number of propositions have been accepted. There are possible variations on this Delphi technique, but the key principles are:

  • lack of influence from dominant individuals in an open committee setting

  • acceptance of propositions by a majority decision

  • equal weighting of all members with respect to proposing and voting.

When the key propositions have been agreed, the research evidence to support or refute each proposition is then examined using a systematic search strategy. The Task Force then discusses the evidence and determines the SORs.
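The voting rule of a single Delphi round, as described above, can be sketched as follows. The function name, the proposition labels, and the vote counts are hypothetical. The handling of a proposition selected by exactly half of the participants is not specified in the description, so carrying it forward is an assumption.

```python
def delphi_round(votes, n_participants, min_votes):
    """Sort propositions after one Delphi voting round.

    votes: mapping of proposition -> number of participants selecting it.
    min_votes: pre-defined threshold; propositions with this many votes
    or fewer are removed.
    Returns (accepted, carried_forward, removed) as three lists.
    """
    accepted, carried_forward, removed = [], [], []
    for proposition, count in votes.items():
        if 2 * count > n_participants:
            # selected by over half of the participants
            accepted.append(proposition)
        elif count <= min_votes:
            # too few votes to stay in the exercise
            removed.append(proposition)
        else:
            # enters the next round (includes an exact 50% tie, by assumption)
            carried_forward.append(proposition)
    return accepted, carried_forward, removed


# Hypothetical round with 20 participants and a removal threshold of 2 votes.
example_votes = {"exercise": 15, "paracetamol": 10, "bracing": 6, "acupuncture": 2}
accepted, carried_forward, removed = delphi_round(example_votes, 20, 2)
print(accepted, carried_forward, removed)
```

In a full exercise this function would be applied repeatedly to the carried forward propositions, with a fresh vote each round, until the pre-defined number of propositions has been accepted.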

Among 21 guidelines for OA, 15 used a consensus method: four of these used a Delphi exercise,1,3,8,20 two used conference consensus,24,27 and nine used methods that were not clearly defined.12,15,16,19,21,23,25,26,28 Interestingly, the quality score of the guidelines using the Delphi technique appears to be greater than that of guidelines using other consensus approaches (fig 4).

Figure 4

 Overall quality score and Delphi exercise.

Other issues

Quality scoring of individual studies included in the research evidence, using a validated checklist, was undertaken by the EULAR OA Task Force in the development of their recommendations for management of knee OA.1 However, they abandoned this practice when they developed recommendations for management of hip OA,3 for the following reasons.

  • “Quality scores” often judge the quality of reporting rather than the quality of the study design and the robustness of the evidence produced

  • With recent adoption of the CONSORT agreement (involving a checklist of key information for clinical trials), such quality scores have increased with recent publications and older publications tend to score much lower, making the quality score largely a function of date of publication

  • Unless the quality scores are used to weight or influence in some way the category of evidence or SOR, it seems a relatively pointless, time consuming exercise; the panel members can judge the overall quality and robustness of the evidence without such a numerical score before deciding their SOR.

Three other guidelines6,15,19 assessed the quality of studies, but only one reported the quality scores for the evidence applied.6 Nevertheless, the quality of the guidelines was not affected by whether the research evidence had been quality scored, either for the overall score or for the individual domain scores of the AGREE instrument (p>0.05).

The EULAR recommendations were developed by clinical experts—for example, rheumatologists, orthopaedic surgeons—with a special interest in OA. Therefore they are strongly biased towards a secondary, or even tertiary, care perspective, even though most of their recommendations are also applicable to primary care. No allied health professionals, general practitioners, or pharmacists were included in the Task Force even though such professionals are involved in care of patients with OA. Importantly, patient perspectives were also entirely unrepresented. The recommendations produced by EULAR are specific for hip or knee OA, not OA in general. In addition, the EULAR recommendations only highlight 10 key management issues; they are not intended as comprehensive guidelines for the complete management of patients with hip or knee OA (table 2).

Discussion


The methodology involved in the development of treatment guidelines in OA has evolved rapidly during the last decade. From the first OA guidelines published by the Royal College of Physicians in 1993,26 through to 2005 when the EULAR recommendations were developed,3 the paradigm has shifted from opinion based to evidence based guidelines. The latter attempt to provide the best available evidence to support clinical decision making.30 As a result, a number of evidence based guidelines have been developed for the management of hip and/or knee OA. Whereas some are solely evidence based, such as the Prodigy guidance,13 others try to integrate both expert opinion and research evidence; these are hybrid guidelines, such as the EULAR recommendations.1,3 The overall quality of the guidelines has improved considerably (fig 1). This is mainly due to the improved scope and purpose, rigour of development, and editorial independence of the guidelines.

Interestingly, evidence based guidelines tend to have the lowest applicability, although the differences are not statistically significant (mean score: opinion based guidelines 17, evidence based guidelines 13, and hybrid guidelines 22, p  =  0.33). This perhaps in part reflects the current difficulty in implementing evidence based guidelines, but also the gap between RCTs that demonstrate “efficacy” (an intervention works) and clinical “effectiveness” (how often and well the intervention will work in clinical practice). Although often regarded as the gold standard, RCTs have the following common problems.

  • A focus on a highly selected, homogeneous sample of patients with knee or hip OA; often this means that the findings cannot be generalised to the whole population of patients with OA and clinical predictors of outcome cannot be examined

  • Examination of a monotherapy rather than combined treatments or a package of care; interactions between treatments are under-studied

  • Short study duration (a few weeks or months) even though symptomatic OA is usually a chronic condition.

Hybrid guidelines would be expected to improve applicability, as expert opinion can temper the rigidity of research data and close the gap between research and clinical practice.

The EULAR recommendations have claimed three strengths. Firstly, they align with the evidence based, clinical decision making scenario, a group decision based on research evidence, clinical expertise, and patient preference.30 Although no patients were involved, their perceived preferences with respect to tolerability and acceptance were considered in the SOR. Secondly, EULAR recommendations use two established methods—systematic literature review and the Delphi exercise—to gather evidence from different sources—that is, research and clinical practice. Systematic review (or meta-analysis) is a well known and widely used quantitative method to synthesise research evidence, whereas the Delphi exercise is an established qualitative method to synthesise expert opinion.31 Both methods aim to obtain the best available evidence with the minimum individual (single study and single expert) bias. EULAR recommendations successfully apply both methods and thus substantially increase the quality of the guidelines, especially the rigour of the development (table 4). Finally, EULAR recommendations are supported by a trade off scale for SOR. Unlike the traditional SOR, which directly reflects the level of research evidence for efficacy,32,33 the trade off scale allows the developers to consider evidence from different sources (research and clinical) and to weigh, for example, benefits against harms and cost against effectiveness. The scales have been preliminarily validated with good criterion validity (fig 3) and moderate reliability.

However, EULAR recommendations also have shortcomings. Firstly, the guideline development team was biased by its predominance of rheumatologists and orthopaedic surgeons resulting in a lower score for stakeholder involvement (table 4). Although the recommendations have been adapted by many primary care groups, different recommendations may have been emphasised if general practitioners and allied health practitioners had been included in the Task Force. How to include patient representation in an international group representing many diverse countries is problematic, but might best be achieved by a multi-country questionnaire survey rather than by representation of just a few select patients on the Task Force. Secondly, EULAR recommendations are not comprehensive guidelines for the management of knee or hip OA; only 10 key clinical questions have been addressed (table 2). Many other interventions were not considered, which may decrease the applicability of the recommendations. Thirdly, the meeting costs for the development of the EULAR recommendations were met by the pharmaceutical industry, and consequently the recommendations have lower editorial independence scores (16.67) (table 4).

What is already known on this topic

  • EULAR has developed separate recommendations for management of hip and knee OA

  • There are differences in detail from other guidelines on OA which largely result from the different evidence based formats used

What this study adds

  • EULAR recommendations have high methodological rigour

  • However, they are selective, not fully comprehensive, and both stakeholder involvement and editorial independence merit further improvement

This critical appraisal was based on the AGREE instrument. Although it is validated,34 the AGREE instrument was developed at a time when expert opinion predominated in guideline development and research evidence was largely neglected. During our assessment, we experienced difficulty in assigning a score to some of the items and felt that apparently objective descriptions often required subjective judgment. Therefore we feel that some of the items in this instrument do not fairly reflect the true quality of guidelines and that further development of such an instrument is required. In addition, the guidelines included in this critical appraisal were assessed by only one person; because of the subjectivity of the instrument, the results have yet to be confirmed by additional appraisers. Finally, the assessment only included OA guidelines in English. The quality of non-English guidelines remains unknown.

Overall, the EULAR recommendations have pioneered the development of hybrid guidelines, a possible future direction of clinical practice guidelines. Although limitations remain, the continuing evolution of the EULAR methodology provides an opportunity to improve the overall quality of guidelines for OA and other conditions.



  • Competing interests: none declared