Patient-reported outcome measures for hip-related pain: a review of the available evidence and a consensus statement from the International Hip-related Pain Research Network, Zurich 2018
  1. Franco M Impellizzeri1,
  2. Denise M Jones2,
  3. Damian Griffin3,
  4. Marcie Harris-Hayes4,5,
  5. Kristian Thorborg6,
  6. Kay M Crossley2,
  7. Michael P Reiman7,
  8. Mark James Scholes2,
  9. Eva Ageberg8,
  10. Rintje Agricola9,
  11. Mario Bizzini10,
  12. Nancy Bloom11,12,
  13. Nicola C Casartelli10,13,
  14. Laura E Diamond14,
  15. Hendrik Paulus Dijkstra15,16,
  16. Stephanie Di Stasi17,
  17. Michael Drew18,
  18. Daniel Jonah Friedman19,
  19. Matthew Freke20,
  20. Boris Gojanovic21,22,
  21. Joshua J Heerey2,
  22. Per Hölmich6,
  23. Michael A Hunt23,
  24. Lasse Ishøi6,
  25. Ara Kassarjian24,25,
  26. Matthew King2,
  27. Peter R Lawrenson26,
  28. Michael Leunig27,
  29. Cara L Lewis28,
  30. Kristian Marstrand Warholm29,
  31. Sue Mayes2,30,
  32. Håvard Moksnes31,
  33. Andrea Britt Mosler2,
  34. May Arna Risberg32,33,
  35. Adam Semciw2,
  36. Andreas Serner34,
  37. Pim van Klij9,
  38. Tobias Wörner35,
  39. Joanne Kemp2
  1. 1 Human Performance Research Centre, Faculty of Health, University of Technology Sydney (UTS), Sydney, New South Wales, Australia
  2. 2 La Trobe Sports Exercise Medicine Research Centre, School of Allied Health, Human Services and Sport, College of Science, Health and Engineering, La Trobe University, Melbourne, Victoria, Australia, Bundoora, Victoria, Australia
  3. 3 Warwick Orthopaedics, University of Warwick, Coventry, Warwick, UK
  4. 4 Physical Therapy, Washington University School of Medicine in St Louis, St Louis, Missouri, USA
  5. 5 Orthopaedic Surgery, Washington University School of Medicine in St Louis, St Louis, Missouri, USA
  6. 6 Sports Orthopedic Research Center - Copenhagen (SORC-C), Arthroscopic Center, Department of Orthopedic Surgery, Copenhagen University Hospital, Amager-Hvidovre Hospital, Hvidovre, Copenhagen, Denmark
  7. 7 Orthopedic Surgery, Duke University Medical Center, Durham, North Carolina, USA
  8. 8 Department of Health Sciences, Lund University, Lund, Sweden
  9. 9 Department of Orthopaedic Surgery, University Medical Centre Rotterdam, Rotterdam, The Netherlands
  10. 10 Human Performance Lab, Schulthess Clinic, Zurich, Switzerland
  11. 11 Physical Therapy, Washington University, St Louis, Missouri, USA
  12. 12 Orthopaedic Surgery, Washington University School of Medicine in Saint Louis, Saint Louis, Missouri, USA
  13. 13 Laboratory of Exercise and Health, ETH Zurich, Schwerzenbach, Switzerland
  14. 14 Griffith Centre of Biomedical and Rehabilitation Engineering (GCORE), Menzies Health Institute Queensland Griffith University, School of Allied Health Sciences, Gold Coast, Queensland, Australia
  15. 15 Aspetar Sports Medicine Hospital, Doha, Qatar
  16. 16 Department for Continuing Education, University of Oxford, Oxford, UK
  17. 17 Division of Physical Therapy, The Ohio State University, Columbus, Ohio, USA
  18. 18 University of Canberra Research into Sport and Exercise (UCRISE), University of Canberra, Canberra, Australian Capital Territory, Australia
  19. 19 Monash School of Medicine, Monash University, Melbourne, Victoria, Australia
  20. 20 School of Health & Rehabilitation Sciences, The University of Queensland, Brisbane, Queensland, Australia
  21. 21 Swiss Olympic Medical Center, Hopital de la Tour, Meyrin, Geneva, Switzerland
  22. 22 SportAdo consultation, University Hospital of Lausanne (CHUV) Multidisciplinary Unit of Adolescent Health, Lausanne, Switzerland
  23. 23 Department of Physical Therapy, University of British Columbia, Vancouver, British Columbia, Canada
  24. 24 Musculoskeletal Radiology, Corades, LLC, Brookline, Massachusetts, USA
  25. 25 Musculoskeletal Radiology, Elite Sports Imaging, SL, Madrid, Spain
  26. 26 School of Health & Rehabilitation Sciences, University of Queensland, St Lucia, Queensland, Australia
  27. 27 Department of Orthopaedics, Schulthess Klinik, Zürich, Switzerland
  28. 28 Physical Therapy & Athletic Training, Boston University, Boston, Massachusetts, USA
  29. 29 Division of Orthopaedic Surgery, Oslo University Hospital, Oslo, Norway
  30. 30 The Australian Ballet, Southbank, Victoria, Australia
  31. 31 Oslo Sports Trauma Research Center, Oslo, Norway
  32. 32 Department of Sport Medicine, Norwegian School of Sport Sciences, Oslo, Norway
  33. 33 Division of Orthopedic Surgery, Oslo University Hospital, Oslo, Norway
  34. 34 Aspetar Orthopaedic and Sports Medicine Hospital, Doha, Qatar
  35. 35 Department of Health Sciences, Lunds University, Lund, Sweden
  1. Correspondence to Professor Franco M Impellizzeri, Faculty of Health, University of Technology Sydney, Sydney, New South Wales, Australia; franco.impellizzeri{at}


Hip-related pain is a well-recognised complaint among young and middle-aged active adults. People experiencing hip-related disorders commonly report pain and reduced functional capacity, including difficulties in executing activities of daily living. Patient-reported outcome measures (PROMs) are essential to accurately examine and compare the effects of different treatments on disability in those with hip pain. In November 2018, 38 researchers and clinicians working in the field of hip-related pain met in Zurich, Switzerland for the first International Hip-related Pain Research Network meeting. Prior to the meeting, evidence summaries were developed relating to four prioritised themes. This paper discusses the available evidence and consensus process from which recommendations were made regarding the appropriate use of PROMs to assess disability in young and middle-aged active adults with hip-related pain. Our process to gain consensus had five steps: (1) systematic review of systematic reviews; (2) preliminary discussion within the working group; (3) update of the most recent high-quality systematic review and examination of the psychometric properties of PROMs according to established guidelines; (4) formulation of the recommendations considering the limitations of the PROMs derived from the examination of their quality; and (5) voting and consensus. Out of 102 articles retrieved, 6 systematic reviews were selected and assessed for quality according to AMSTAR 2 (A MeaSurement Tool to Assess systematic Reviews). Two showed moderate quality. We then updated the most recent review. The updated literature search resulted in 10 additional studies that were included in the qualitative synthesis. The recommendations based on evidence summary and PROMs limitations were presented at the consensus meeting.
The group makes the following recommendations: (1) the Hip and Groin Outcome Score (HAGOS) and the International Hip Outcome Tool (iHOT) instruments (long and reduced versions) are the most appropriate PROMs to use in young and middle-aged active adults with hip-related pain; (2) more research is needed into the utility of the HAGOS and the iHOT instruments in a non-surgical treatment context; and (3) generic quality of life measures such as the EuroQoL-5 Dimension Questionnaire and the Short Form Health Survey-36 may add value for researchers and clinicians in this field. We conclude that, as none of the instruments shows acceptable quality across various psychometric properties, more methods studies are needed to further evaluate the validity of these PROMs (the HAGOS and iHOT) as well as the other (currently not recommended) PROMs.

  • hip
  • questionnaire
  • consensus
  • quality of life
  • groin


Hip-related pain is an increasingly recognised complaint in both young and middle-aged active adults and athletes.1–3 In these populations, hip disorders are associated with increased disability, as defined by the International Classification of Functioning, Disability and Health (ICF) developed by the WHO.5 According to this biopsychosocial model, disability involves dysfunctions at one or more of the following three levels: impairment, activity limitations and participation restriction.6 Indeed, people suffering from hip-related disorders commonly experience pain, impairments of body function and structure, and difficulties when executing activities of daily living and sports.7

In order to examine and compare the effects of different treatments on disability, it is necessary to use patient-reported outcome measures (PROMs). Currently, PROMs are considered a necessary aspect of medical treatment evaluation,8 9 and are used in national and international registries.10 Furthermore, PROMs are frequently used and recommended to support clinical decision-making, health policies and reimbursement processes.11 This requires the systematic collection of PROMs in the clinical setting. For these purposes, PROMs need to be valid and possess adequate psychometric properties. Lack of validity or suboptimal measurement properties of PROMs might bias (positively or negatively) the effects estimated in randomised controlled trials.12 The respondent and patient burden of the selected PROMs must also be considered for successful implementation in research and clinical practice.9 Given the importance of using appropriate PROMs, internationally recognised guidelines such as the Consensus-based Standards for the selection of health Measurement Instruments (COSMIN) were developed. The COSMIN initiative aims to improve the quality of studies investigating measurement properties. By developing methodology and practical tools for assessing measurement properties, the COSMIN guidelines can be used by clinicians and researchers to select the most appropriate instruments.

The aim of this paper was to present the consensus reached at the first International Hip-related Pain Research Network (IHiPRN) meeting (November 2018, Zurich, Switzerland) on the most appropriate PROMs to assess disability in young and middle-aged active adults with hip-related pain in both research and clinical settings.


Consensus process

The first step of the five-step process for gaining consensus included a systematic review (SR) of SRs to define the best PROMs based on available literature. After examination of the quality of the selected SRs, the working group decided to update the most recent high-quality review. We assessed the quality of the psychometric properties of the PROMs recommended by Thorborg et al 13 and those identified in our update of this SR. Based on the quality and limitations of the PROMs obtained from the update and quality assessment, recommendations were developed for voting and consensus.

Step 1: systematic review of the systematic reviews

Eligibility, inclusion and exclusion criteria

The review included peer-reviewed SRs examining the psychometric properties of PROMs for patients with hip-related pain and which included the following: population: patients with hip pain (including hip osteoarthritis and femoro-acetabular impingement (FAI) syndrome and groin pain); measurement properties: all measurement properties in any clinical context (surgery, non-surgery and so on); and instrument: PROMs.

Search strategy

The literature search was conducted in MEDLINE, Cochrane Database of Systematic Reviews, Database of Abstracts of Reviews of Effects and Web of Science (with no language or date restrictions: all articles before 31 July 2018) according to the following search strategy, adapted for each database: #1 (hip) OR (groin) OR (inguinal AND hernia); #2 (outcome AND assessment*) OR (self AND assessment*) OR (questionnaire*) OR (patient AND reported AND outcome*) OR (self AND report*); #3 (psychometric AND property*) OR (validity) OR (clinimetrics); #4 (systematic AND review); #5 #1 AND #2 AND #3 AND #4.
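The way the four search blocks combine can be sketched in a few lines of Python. This is only an illustration of the boolean structure quoted above: the helper name `or_group` is hypothetical, and the exact field tags and syntax would need adapting to each database, as the text notes.

```python
# Search blocks #1-#4 as quoted in the search strategy above.
blocks = {
    "#1": ["hip", "groin", "inguinal AND hernia"],
    "#2": ["outcome AND assessment*", "self AND assessment*", "questionnaire*",
           "patient AND reported AND outcome*", "self AND report*"],
    "#3": ["psychometric AND property*", "validity", "clinimetrics"],
    "#4": ["systematic AND review"],
}

def or_group(terms):
    """Join the terms of one block with OR, each term in its own parentheses."""
    return " OR ".join(f"({term})" for term in terms)

# Block #5: all four blocks combined with AND.
query = " AND ".join(f"({or_group(terms)})" for terms in blocks.values())
print(query)
```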

Selection, data extraction and assessments

Title, abstract and full text were screened, and aim, population, context/setting, number of instruments, suggested instruments and main authors’ conclusions were extracted (online supplementary appendix 1). The screening, selection, data extraction and assessments of the SRs were carried out by two reviewers (FMI, DMJ), with a third (JK) acting as referee to resolve conflicts. While agreement was substantial for study selection, the kappa coefficient for quality assessment was initially fair to moderate (k<0.40), a consequence of difficulties in interpreting the new COSMIN guidelines. The reviewers therefore discussed and resolved the sources of conflict, and this harmonisation improved the agreement between them (k>0.76).
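For readers unfamiliar with the agreement statistic reported above, the following minimal sketch computes Cohen's kappa for two reviewers. The ratings are invented for illustration and are not the study data, and the function name `cohen_kappa` is hypothetical; the paper does not state which software was used.

```python
from collections import Counter

def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: proportion of items given identical labels.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (p_o - p_e) / (1 - p_e)

# Hypothetical quality ratings from two reviewers (not the study data).
reviewer_1 = ["adequate", "doubtful", "inadequate", "adequate", "doubtful", "adequate"]
reviewer_2 = ["adequate", "inadequate", "inadequate", "doubtful", "doubtful", "adequate"]
kappa = cohen_kappa(reviewer_1, reviewer_2)
print(round(kappa, 2))
```

Kappa corrects raw percentage agreement for the agreement expected by chance, which is why it can be modest even when reviewers agree on most items.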

Supplemental material

Quality assessment

The quality of SRs was assessed using A MeaSurement Tool to Assess systematic Reviews (AMSTAR 2), adapted to SRs of studies investigating the psychometric properties of questionnaires. Specifically, item 14 (‘Did the review authors provide a satisfactory explanation for, and discussion of, any heterogeneity observed in the results of the review?’) was not considered applicable because the heterogeneity of the results in methodological studies is not checked quantitatively, and some heterogeneity in the results is expected since psychometric properties are population-specific and context-specific. Similarly, items 11, 12 and 15 were not considered applicable since no quantitative meta-synthesis was performed in the reviews.

Results of the systematic review of systematic reviews

After duplicate removals, 102 articles were screened from titles and abstracts. Fourteen full texts were selected.13–26 Eight literature reviews were excluded and six were included (see articles in table 1).13 17 20 24–26 The flow diagram (figure 1) was presented according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses. The quality of the SRs (table 1 and online supplementary appendix 1) according to the AMSTAR 2 was often deemed ‘critically low’ mainly because very few reviews assessed and took into consideration the risk of bias and/or methodological quality of the studies included in the SRs. Only two reviews were rated as moderate quality: Tijssen et al 25 and Thorborg et al.13 These two SRs examined the quality of the methodological studies included and the PROMs according to an older version of the COSMIN. Based on their quality assessment, Tijssen et al 25 recommended the use of the Nonarthritic Hip Score (NAHS) and the Hip Outcome Score (HOS). The review by Thorborg et al 13 was an update of their previous SR published in 2011, where they also recommended the NAHS. However, in their update, Thorborg et al 13 excluded the NAHS and they recommended the HOS, the Hip and Groin Outcome Score (HAGOS), and the International Hip Outcome Tool (iHOT-12 and iHOT-33), since these were the PROMs with the smallest proportion of specific psychometric properties with poor methodology score.
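To make the overall AMSTAR 2 ratings mentioned above more concrete, the sketch below applies the rating scheme suggested by the AMSTAR 2 developers, in which the overall confidence depends on the number of critical and non-critical weaknesses. The set of critical items is the one suggested by the AMSTAR 2 authors, the not-applicable items are those listed in this paper, and treating an item as simply "flawed" or not is a simplifying assumption; the working group's exact scoring procedure may have differed.

```python
# Critical domains as suggested by the AMSTAR 2 developers (assumed here).
CRITICAL_ITEMS = {2, 4, 7, 9, 11, 13, 15}
# Items treated as not applicable in this adaptation (from the text).
NOT_APPLICABLE = {11, 12, 14, 15}

def amstar2_confidence(flawed_items):
    """Overall confidence rating from the set of item numbers judged as flawed."""
    flawed = set(flawed_items) - NOT_APPLICABLE
    critical = len(flawed & CRITICAL_ITEMS)
    non_critical = len(flawed - CRITICAL_ITEMS)
    if critical > 1:
        return "critically low"
    if critical == 1:
        return "low"
    if non_critical > 1:
        return "moderate"
    return "high"

# Hypothetical review that neither assessed risk of bias (item 9)
# nor accounted for it when interpreting results (item 13).
print(amstar2_confidence({9, 13}))
```

Under this scheme, a review that ignores risk of bias both when assessing and when interpreting the included studies already has two critical flaws, which is consistent with the frequent ‘critically low’ ratings described above.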

Figure 1

PRISMA flow chart for the systematic review of systematic reviews. PRISMA, Preferred Reporting Items for Systematic Reviews and Meta-Analyses.

Table 1

Rating of the quality of the systematic reviews according to AMSTAR 2

Step 2: first round discussion among the working group participants

The results of the SR of SRs were circulated among the members of the working group. We decided to update the review by Thorborg et al,13 which was deemed the SR with the higher quality, the most recent and specifically focused on the target population of this consensus.

Step 3: update of the systematic review by Thorborg et al13

The eligibility, exclusion and inclusion criteria, and search strategy were the same as used in the review by Thorborg et al,13 but with dates modified to include only studies from 2015 to 31 July 2018.

Results of the update

Out of 803 articles found, 20 full texts were assessed for eligibility27–46 and 10 studies were included in the qualitative synthesis.27 28 32 36 37 39–41 45 46 The flow chart of the literature search for the updated SR is presented in figure 2. We excluded the studies by Brans et al 29 and Stevens et al 43 since the mean age of their samples (51–52 years) exceeded the upper limit of 50 years set by our inclusion criteria. This limit replicated the inclusion criteria of Thorborg et al 13 to ensure consistency. We conducted a sensitivity analysis to examine whether these two papers could have influenced the overall rating. The assessment suggested that these two papers would not substantially change the final recommendations and were consistent with those included in the summary assessment (online supplementary appendix 2).

Figure 2

PRISMA flow chart for the literature update. PRISMA, Preferred Reporting Items for Systematic Reviews and Meta-Analyses.13

Quality of the studies

For all papers included in the updated SR, the quality of the studies and of the psychometric properties was evaluated using the most recent COSMIN manual (V.1.0, updated February 2018). As per the SR of SRs, the same three reviewers were involved in the study selection, quality assessment and data extraction of all included articles.

The 10 selected studies examined five PROMs. Two PROMs (Core Outcome Measures Index and Oxford Hip Score) that were developed for other conditions (back pain and patients undergoing total hip arthroplasty, respectively) were evaluated for their performance in an FAI population.36 37 These were assessed for their quality, but given they address very few psychometric properties and their content validity in hip-pain patients was not evaluated, they were excluded from further analysis (ie, quality assessment of the measurement properties). The remaining three PROMs were among the four recommended by Thorborg et al.13 To provide a summary of the quality of evidence, studies in the updated SR were combined with studies examining the same PROMs (iHOT-33, iHOT-12 and HOS) reported in the review by Thorborg et al 13 (tables 2 and 3). For consistency, since the updated SR used a different version of the COSMIN manual, the assessments undertaken by Thorborg et al were redone using the latest version of the COSMIN manual.

Table 2

Rescoring of the articles from the review of Thorborg et al 13

Table 3

Scoring of the articles from the review update

Critical issues in rating the quality of the studies

There were critical issues relating to the rating of the quality of the studies for structural validity, internal consistency and cross-cultural validity. These are presented in online supplementary appendix 3.

Quality of the measurement properties

The quality of the measurement properties was rated for the instruments recommended by Thorborg et al (HAGOS, HOS, iHOT-12 and iHOT-33). The psychometric properties in the studies reported in the previous SR by Thorborg et al 13 (table 4) were also reassessed.

Table 4

Quality of the psychometric properties (overall rating and quality of evidence) according to the COSMIN guidelines

The overall rating for structural validity reflected the lack of consistency in structure evaluation as mentioned in the previous section. In addition, no studies reported any fit indices, which is a requirement for assigning a positive rating using the COSMIN criteria. The measurement error was consistently higher than the minimal important change, thus resulting in a negative rating. Finally, the updated COSMIN now allows the reviewers to develop the hypotheses (for construct validity and responsiveness), even if these are not explicitly stated by the authors. This resulted in more positive ratings, but this approach makes the assessment quite reviewer-dependent and somewhat arbitrary.

Content validity

We evaluated the content validity using the new purposely developed COSMIN manual. The COSMIN manual suggests three steps in assessing the content validity of the PROMs and the quality of the corresponding studies: (1) evaluate the quality of the PROM development, (2) evaluate the quality of content validity studies and (3) evaluate the content validity of the PROM. The analysis of content validity was performed for the iHOT-33 and HAGOS only. The HOS did not involve any patients in the development phase, and therefore as the content validity was considered inadequate it was excluded from further examination. The COSMIN suggests that a modified PROM should, in principle, be treated as a new instrument. However, COSMIN also states that if the PROM is a modified shortened version, the information can be taken from the original PROM. This is the reason why the iHOT-12 (shorter version of the iHOT-33) was included among the recommended PROMs despite the content validity of this shorter version not being addressed specifically.

Evaluate the quality of the PROM development

Based on the worst score approach, the overall rating for the quality of PROM design to ensure relevance was inadequate because the first item, which addresses the description of the construct to be measured, was rated inadequate. Indeed, neither the HAGOS nor the iHOT-33 described or provided any operational definitions of the constructs. This increases the difficulty in interpreting whether the items of the PROMs are relevant for the construct of interest. In addition, while the HAGOS referred to the ICF framework and the inclusion of body structure, function and participation, the iHOT-33 did not report any theoretical grounding. The HAGOS used the Hip dysfunction and Osteoarthritis Outcome Score (HOOS) as template and reported the constructs of symptoms, pain, physical activity, sport and quality of life. However, detail on the aspects of these constructs in HAGOS was not provided. For example, both HOOS and HAGOS purport to assess pain; however, the dimensions of pain (pain intensity or interference) are not described. Quality of life is another broad and multifaceted concept included in the HAGOS and HOOS and a clear description would be necessary, but is not reported. Examination of the items suggests that other dimensions of quality of life have been considered compared with those addressed by traditional generic quality of life questionnaires such as the EuroQoL-5 Dimension Questionnaire (EQ-5D), Short Form Health Survey (SF-36 and SF-12) or WHO Quality of Life Instruments. Most items relative to the methodological approach were rated as doubtful as clear descriptions of the methods were lacking.
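The worst score approach referred to above can be illustrated with a short sketch: the overall rating of a study equals the lowest rating given to any of its applicable items. The item names and ratings below are hypothetical and are not taken from the assessments in this paper; the rating labels follow the four-point COSMIN scale.

```python
# COSMIN-style ratings ordered from worst to best.
RATING_ORDER = ["inadequate", "doubtful", "adequate", "very good"]

def worst_score(item_ratings):
    """Return the lowest rating among the applicable items ("worst score counts")."""
    applicable = [r for r in item_ratings.values() if r != "not applicable"]
    if not applicable:
        raise ValueError("no applicable items to rate")
    # A lower position in RATING_ORDER means a worse rating.
    return min(applicable, key=RATING_ORDER.index)

# Hypothetical item ratings for one PROM development study (not the study data).
ratings = {
    "construct clearly described": "inadequate",
    "target population described": "very good",
    "concept elicitation method": "doubtful",
}
overall = worst_score(ratings)
print(overall)
```

A single inadequate item therefore determines the overall rating, regardless of how well the remaining items score.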

Evaluate the quality of content validity studies

The COSMIN manual suggests that studies with a translation of a PROM should include a pilot study following translation to evaluate the comprehensibility of the translated PROM. None of the cross-cultural validation studies formally reported pilot studies to examine comprehensibility. At best they mentioned that comprehensibility was addressed in groups of patients, but without reporting any methods or results. For this reason, these studies were not considered as content validity studies and hence were excluded.

Evaluate the content validity of the PROM

Content validity was assessed using only the PROM development study. The reviewers’ ratings were quite positive mainly based on the assumption that, even if not reported, some issues were probably addressed. This evaluation was subjective and based on an arbitrary interpretation of the items and response options included in the PROMs. The main problem of PROM development studies was that too few details about the content validity process were reported, such as how interviews were conducted, recorded and coded (eg, use of NVivo), and the reference framework for data extraction and coding. Details of the content validity assessment according to the COSMIN manual are presented in online supplementary appendix 4. The evidence synthesis of the content validity is reported in table 5.

Table 5

Summary results for content validity

Step 4: formulation of the recommendation, including background and process of the consensus meeting

Selection of expert group members

The IHiPRN leadership group (JK, KMC, MB, ABM, CLL and Karim Khan) met in January 2017 to set the criteria to identify potential expert group members. Experts were selected based on their previous publications and on being currently active researchers in the field of hip-related pain in young and middle-aged adults. Researchers who were also clinicians in the field were viewed favourably. Potential expert group members were contacted via email and asked for an expression of interest in taking part in the first IHiPRN consensus meeting in Zurich in November 2018. Potential expert group members were also asked to suggest other experts for invitation whom the leadership group may not have identified.

Following this expression of interest, four key areas were identified as priorities for consensus. These four key areas were the following:

  1. Classification of hip pain (including use of clinical tests and imaging).

  2. PROMs for hip pain (including hip-related measures, and maybe others including pain/coping/fear/utility measures).

  3. Standardised measurement of physical capacity in hip-related pain (including clinical measures, biomechanics, electromyography, physical activity, functional performance and return to sport).

  4. Physiotherapist-led treatment of hip-related pain.

The leadership group then identified experts to lead each of the four working groups. These were MAR and RA (group 1), FMI and JK (group 2), ABM and CLL (group 3), and JK and MB (group 4). This paper relates to working group 2. The members of the working groups were then determined following discussion between the leadership group and the working group leaders. This working group drafted the recommendations considering the limitations of the PROMs derived from the examination of the quality of their measurement properties and the corresponding reference studies.

Expert group demographics

All consensus meeting participants were considered to be experts and at the time of meeting were actively researching in the field of hip-related pain in young and middle-aged active adults. Areas of expertise among the participants included physiotherapy, orthopaedic surgery, sport and exercise medicine, biomechanics, diagnostics, imaging and radiology, PROMs, and exercise science. In addition, many of the participants were also expert clinicians who regularly treat young and middle-aged active adults with hip-related pain.

Step 5: consensus process

The evidence summaries and draft recommendations were emailed to the delegates, at least 2 weeks prior to the meeting in Zurich. At the meeting, each working group met to discuss recommendations, and revisions were made based on the discussion. The evidence summary and revised recommendations were presented to the expert group, with opportunity for discussion. The recommendation was then revised and finalised. At the conclusion of the discussion, each delegate was asked to vote on the recommendation. The voting was conducted anonymously, using a scoring system used at previous consensus meetings.47 48 A 10-point Likert scale was used to score each recommendation, where 0 was considered to be ‘inappropriate’ and 9 ‘appropriate’. As described previously,47 48 scores were pooled and the median (IQR) for each recommendation was determined. Scores of 0–3 were considered inappropriate, scores of 4–6 were considered uncertain, and scores of 7–9 were considered appropriate. Consensus statements were then developed based on the level of evidence available combined with the pooled voting score for that statement.
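The pooling and banding of votes described above can be sketched as follows. The votes are invented for illustration, the function names are hypothetical, and the quartile method (`inclusive`) is an assumption, since the text does not specify how the IQR was calculated.

```python
import statistics

def classify(score):
    """Map a 0-9 score to the bands described in the text."""
    if not 0 <= score <= 9:
        raise ValueError("score must lie between 0 and 9")
    if score <= 3:
        return "inappropriate"
    # Assumption: a non-integer median between bands (eg, 6.5) is assigned
    # to the higher band; the text does not cover this case.
    if score <= 6:
        return "uncertain"
    return "appropriate"

def summarise_votes(votes):
    """Pool the votes for one statement into median, quartiles and band."""
    median = statistics.median(votes)
    q1, _, q3 = statistics.quantiles(votes, n=4, method="inclusive")
    return {"median": median, "iqr": (q1, q3), "category": classify(median)}

# Hypothetical votes from nine delegates on one statement (not the study data).
votes = [9, 8, 9, 7, 9, 8, 9, 6, 9]
summary = summarise_votes(votes)
print(summary)
```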


The consensus meeting in Zurich, Switzerland on November 17 and 18, 2018 was attended by 37 participants. In addition, of the six participants who were unable to attend in person, one attended the meeting via videoconferencing. Thus 38 participants were involved in the consensus voting process. All delegates were considered to be experts and were actively researching in the field of hip-related pain in active adults. Areas of expertise among the delegates included physiotherapy, orthopaedic surgery, sport and exercise medicine, biomechanics, diagnostics, imaging and radiology, PROMs, and exercise science. In addition, many of the delegates were also clinicians treating adults with hip-related pain.

The median score (IQR) for the four statements was 9 (8) points. The scores for each statement are presented in figure 3, and the final four recommendations of the consensus group are presented in table 6.

Figure 3

Consensus voting on statements. S, statement.

Table 6

The final recommendations voted on at the consensus meeting and the results of the consensus voting


Recommendation 1: The HAGOS and iHOT instruments (long and reduced versions) are the most appropriate PROMs to use in young and middle-aged active adults with hip-related pain.

Based on the updated literature review and the quality assessment of the psychometric properties, we partially confirmed the suggestions by Thorborg et al,13 who recommended the HAGOS, HOS, iHOT-12 and iHOT-33. We excluded the HOS because this instrument was developed without the involvement of patients, which is necessary in ensuring content validity. Unfortunately, no subsequent studies examined the content validity of the HOS.

The HAGOS and the iHOT instruments (with iHOT-12 considered as a short version of the iHOT-33) had sufficient quality (mostly with high evidence) for cross-cultural validity, reliability and construct validity. The structural validity rating of all recommended PROMs was indeterminate because the structure of the subscales, and not of the whole instruments, was examined. The internal consistency of the subscales was sufficient, with high evidence for HAGOS and low to moderate evidence for iHOT. High-quality studies, however, showed large measurement error for both HAGOS and iHOT, where the smallest detectable change was higher than the minimal clinically important change (when available). Therefore, the usefulness of HAGOS and iHOT in evaluating the response to treatment of individual patients over time seems to be limited.

Although we excluded the HOS from the recommended instruments, we acknowledge that the other psychometric properties of the HOS were comparable with the other instruments. Therefore, despite its exclusion, the HOS may be potentially appropriate for this population if the content validity is confirmed in the future.

Recommendation 2: HAGOS and iHOT were developed mainly in surgical context. More research is needed into their utility in a non-surgical treatment context.

The HAGOS and iHOT have only been investigated in a surgical context (patients assessed before and after surgical interventions) or in mixed populations (undergoing both surgical and non-surgical treatments) (see details on population and context in online supplementary appendix 1). The magnitude of the effects following surgical interventions is not necessarily comparable with non-surgical treatment, which can impact the acceptability of measurement error and instrument responsiveness. Since the acceptability of the reproducibility level (instrument noise) depends on the context and the magnitude of changes determined by the interventions (signal), we recommended the HAGOS and iHOT-33 primarily as outcome measures in a surgical setting (which is the main context in which they were investigated), while in non-surgical treatment the aforementioned limitations should be taken into consideration.

Recommendation 3: EQ-5D and SF-36 are generic quality of life measures that can supplement the hip-related measures, HAGOS and iHOT.

The use of generic questionnaires, in addition to condition-specific PROMs, is commonly suggested to give a more complete picture of patient health status.49 These instruments were developed to be used with a generic population. There are several generic instruments available, and the selection of a generic questionnaire for use in a particular clinical population should be based on theoretical considerations (eg, what aspects of quality of life are of interest or whether a utility questionnaire is needed). For these reasons it is difficult to recommend a specific generic instrument. However, in the absence of a gold standard instrument, it is common to use generic questionnaires to examine the construct validity (convergent evidence and hypotheses generation). For example, EQ-5D50 51 and SF-3652 53 are the generic instruments most commonly used as reference for the HAGOS and iHOT. These two instruments can be suitable generic questionnaires to use in addition to HAGOS and iHOT considering that they also provide health utility measures54 and comparative values for hip-related pain population are available.

Recommendation 4: Future research should include further analysis of content and structural validity, and the relationship between individual measurement error and the minimal clinically important change for the recommended PROMs.

The examination of study quality and measurement properties highlighted inadequate evidence of structural validity: the structural validity of the PROMs could not be determined, even though we recommend their use. The structure of the HAGOS55 was developed using the HOOS as a template,56 rather than being derived from a confirmatory analysis. The HOOS structure, in turn, was not examined but was based on the structure of the Knee Injury and Osteoarthritis Outcome Score (KOOS).57 Since the KOOS structure was not examined, an SR on the KOOS psychometric properties scored the structural validity as ‘poor’ (according to the COSMIN).58 Similarly, the structure of the iHOT was not properly examined or confirmed. Lack of structural validity examination is an important weakness, especially for instruments providing a single score such as the iHOT, as this limits interpretation of the total score. The operational definitions and theoretical framework of the construct reflected by the subscales were also not specified for the HAGOS and iHOT. These limitations are reflected in the content validity score. Despite being rated as sufficient by the reviewers, the content validity was mostly deemed to be inconsistent or indeterminate due to the lack of methodological information. Therefore, future studies should examine the structural validity, clarify the constructs measured and analyse the content validity of the HAGOS and iHOT. Finally, the measurement error was higher than the minimal clinically important change, which calls into question the use of these PROMs at the individual level (eg, in clinical practice), particularly for the iHOT. While the measurement error may be small enough to detect change over time at a group level (eg, research studies), further studies are needed to examine the minimal clinically important change and its relationship with measurement error at the individual level, especially for the iHOT.
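The relationship between measurement error and the minimal clinically important change (MIC) can be illustrated with a short numerical sketch. It assumes the commonly used formulas SEM = SD × √(1 − ICC) and smallest detectable change SDC = 1.96 × √2 × SEM, with the group-level SDC obtained by dividing by √n. The function names and all input values (SD, ICC, MIC, sample size) are hypothetical and are not taken from the studies reviewed here.

```python
import math


def standard_error_of_measurement(sd: float, icc: float) -> float:
    """SEM from the between-subject SD and a test-retest ICC."""
    return sd * math.sqrt(1.0 - icc)


def sdc_individual(sem: float) -> float:
    """Smallest detectable change for one patient (95% confidence)."""
    return 1.96 * math.sqrt(2.0) * sem


def sdc_group(sdc_ind: float, n: int) -> float:
    """Smallest detectable change for the mean of a group of n patients."""
    return sdc_ind / math.sqrt(n)


# Hypothetical inputs on a 0-100 scale (illustration only).
sd, icc, mic, n = 20.0, 0.91, 10.0, 36

sem = standard_error_of_measurement(sd, icc)
sdc_ind = sdc_individual(sem)
sdc_grp = sdc_group(sdc_ind, n)

print(f"SEM = {sem:.1f}")
print(f"SDC (individual) = {sdc_ind:.1f}; exceeds MIC: {sdc_ind > mic}")
print(f"SDC (group, n={n}) = {sdc_grp:.1f}; exceeds MIC: {sdc_grp > mic}")
```

With these assumed values, the individual-level SDC is larger than the MIC, so a clinically important change in a single patient cannot be distinguished from measurement error, whereas the group-level SDC falls below the MIC.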


The members of the expert group were from Europe, North America and Australia/New Zealand, which limited the cultural diversity of the group. Also, there were more men than women in the expert group and no patients were involved. Future meetings should try to improve all types of diversity and involve all stakeholders. While the COSMIN manuals provided reference guidelines to assess the quality of the studies and the measurement properties of the PROMs, COSMIN acts as a guideline (as also stated in the manual) and allows for a certain degree of interpretation, so the interpretation of the items, and hence the scoring, is reviewer-dependent. This might have influenced our quality assessment results and the corresponding recommendations. Nevertheless, we used systematic methods implemented by multiple expert reviewers to assess study quality. Furthermore, difficulties in interpretation or low ratings arose when information and methodological details were lacking in the studies. This highlights the need to improve the quality and the standard of reporting. As such, the COSMIN can be used both as a post-hoc assessment tool and as a guideline to ensure that the essential information is reported for a proper evaluation of the psychometric properties and methodological quality of studies.

Based on the literature reviews and the selected instruments,13 the constructs/domains assessed by the PROMs were symptoms, pain, sport and recreational function, participation in physical activity, activities of daily living, physical function, and quality of life. A previous study reported that pain and fear of the condition worsening are the two main reasons to undergo surgery in patients with FAI,59 together with improvement in everyday life and the ability to do sport. Most PROMs proposed for patients with hip-related pain include these domains. However, other constructs and transition questions, such as satisfaction and patient acceptable symptom state, which were not addressed in this consensus, may be important.


Although not all the psychometric properties can be considered adequate, the participants of the first IHiPRN consensus meeting recommend the HAGOS and iHOT for use in young and middle-aged active adults with hip-related pain. The participants agreed that generic quality of life measures such as EQ-5D and SF-36 may be a useful addition. Nevertheless, more methodological studies are needed to further evaluate the validity of these instruments and of the other instruments not included among the recommended PROMs.




  • Twitter @francoimpell, @DamianGriffin, @MHarrisHayes, @KThorborg, @MikeReiman, @MarkScholes85, @EvaAgeberg, @RintjeAgricola, @NicCasartelli, @lauradiamond05, @DrPaulDijkstra, @S_DiStasi, @_mickdrew, @ddfriedman, @drsportsante, @JHeerey, @LasseIshoei, @akassarjian, @mattgmking1, @PeteLawrenson, @ProfCaraLewis, @HMoksnes, @AndreaBMosler, @ASemciw, @aserner, @Wuninho, @JoanneLKemp

  • Contributors All authors were fully involved in the preparation and completion of the manuscript. Each author has read and concurs with the content of this manuscript.

  • Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.

  • Competing interests Two members of the panel (DG, KT) belong to the research groups developing the instruments recommended in the current consensus. However, they did not take part in the literature update and the quality scoring of the studies and the psychometric properties.

  • Patient consent for publication Not required.

  • Provenance and peer review Not commissioned; externally peer reviewed.