Stratified care in hip arthroscopy: can we predict successful and unsuccessful outcomes? Development and external temporal validation of multivariable prediction models
  1. Lasse Ishøi1,
  2. Kristian Thorborg1,
  3. Thomas Kallemose2,
  4. Joanne L Kemp3,
  5. Michael P Reiman4,
  6. Mathias Fabricius Nielsen1,
  7. Per Hölmich1
  1. 1 Sports Orthopaedic Research Center–Copenhagen (SORC-C), Arthroscopic Center, Department of Orthopedic Surgery, Hvidovre Hospital, Copenhagen University Hospital, Hvidovre, Denmark
  2. 2 Department of Clinical Research, Hvidovre Hospital, Copenhagen University Hospital, Hvidovre, Denmark
  3. 3 Latrobe Sports Exercise Medicine Research Centre, School of Allied Health, Human Services and Sport, La Trobe University, Bundoora, Victoria, Australia
  4. 4 Department of Orthopedic Surgery, Duke University, Duke University Medical Center, Durham, North Carolina, USA
  1. Correspondence to Mr Lasse Ishøi, Sports Orthopaedic Research Center–Copenhagen (SORC-C), Arthroscopic Center, Department of Orthopedic Surgery, Copenhagen University Hospital, Hvidovre Hospital, Hvidovre, Denmark; lasse.ishoei{at}


Objective Although hip arthroscopy is a widely adopted treatment option for hip-related pain, it is unknown whether preoperative clinical information can be used to assist surgical decision-making to avoid offering surgery to patients with limited potential for a successful outcome. We aimed to develop and validate clinical prediction models to identify patients more likely to have an unsuccessful or successful outcome 1 year post hip arthroscopy based on the patient acceptable symptom state.

Methods Patient records were extracted from the Danish Hip Arthroscopy Registry (DHAR). A priori, 26 common clinical variables from DHAR were selected as prognostic factors, including demographics, radiographic parameters of hip morphology and self-reported measures. We used 1082 hip arthroscopy patients (surgery performed 25 April 2012 to 4 October 2017) to develop the clinical prediction models based on logistic regression analyses. The development models were internally validated using bootstrapping and shrinkage before temporal external validation was performed using 464 hip arthroscopy patients (surgery performed 5 October 2017 to 13 May 2019).

Results The prediction model for unsuccessful outcomes showed the best predictive performance on the external validation dataset across all multiple imputations, with acceptable explained variance (Nagelkerke R2 range: 0.25–0.26), calibration (intercept range: −0.10 to −0.11; slope range: 1.06–1.09) and discrimination (area under the curve range: 0.76–0.77). The prediction model for successful outcomes calibrated poorly and also showed poor discrimination.

Conclusion Common clinical variables including demographics, radiographic parameters of hip morphology and self-reported measures were able to predict the probability of having an unsuccessful outcome 1 year after hip arthroscopy, while the model for successful outcome showed unacceptable accuracy. The externally validated prediction model can be used to support clinical evaluation and shared decision making by informing the orthopaedic surgeon and patient about the risk of an unsuccessful outcome, and thus when surgery may not be appropriate.

  • Hip
  • Arthroscopy
  • Groin
  • Sports medicine
  • Surveys and Questionnaires

Data availability statement

Data may be obtained from a third party and are not publicly available.


  • Level 1 evidence exists for the effectiveness of hip arthroscopy for femoroacetabular impingement syndrome.

  • Many patients have residual symptoms after hip arthroscopy.

  • It is unknown whether presurgical clinical variables can be used to predict the outcome after hip arthroscopy.


  • Common clinical variables obtained prior to hip arthroscopy can predict the risk of having an unsuccessful outcome (not achieving the patient acceptable symptom state) 1 year after hip arthroscopy.

  • Patients with a successful outcome (achieving the patient acceptable symptom state) were less accurately predicted.


  • The prediction model can be used to support the shared decision-making process by informing the orthopaedic surgeon and patient of when surgery may not be appropriate.

  • This may improve the overall outcomes after hip arthroscopy and improve the cost-effectiveness of the procedure.

Introduction

Hip-related pain causes disability and low quality of life in young to middle-aged individuals.1–3 Since the conceptualisation of femoroacetabular impingement in the early 2000s by Ganz et al,4 large advances have been made in relation to definitions, diagnosis, classifications and treatment of hip-related pain.1 2 5 This has led to an exponential rise in the number of hip arthroscopies performed globally.3 6 Several studies have investigated outcomes after hip arthroscopy, showing favourable short-term to long-term results.7–9 However, residual symptoms and activity limitations are common,10 and up to 50% of patients seem to have unacceptable symptoms11–13 or are unable to return to preinjury sports activities after hip arthroscopy.14–16 These results suggest that, although considered effective at a group level, not all patients are suited for hip arthroscopy. Consequently, there has been recent focus on identifying prognostic factors (such as age and sex) associated with good and poor outcomes after hip arthroscopy to aid surgical candidate selection and improve surgical outcomes.17–19 While identification of prognostic factors can be used to guide preliminary decision making at a group level, development and external validation of clinical prediction models are needed for individual outcome prediction.20–22 Several prediction models have been published recently for hip arthroscopy patients,23–31 yet only one model, predicting conversion to hip arthroplasty, has been externally validated.24 However, this study included intra-articular findings identified during hip arthroscopy as predictor variables, limiting the utility of the model prior to surgery.24 In addition, most existing prediction models attempted to predict achievement of the minimal clinically important difference (MCID),23 25 26 28–30 although achieving an acceptable symptom state matters more to patients than achieving an improvement.32

We aimed to advance the field of individual prognosis in patients undergoing hip arthroscopy by developing and validating prediction models intended for preoperative use. The models estimate the probability, at 1 year post hip arthroscopy, of an unsuccessful or successful outcome defined by the patient acceptable symptom state (primary aim) and of improvement or no improvement based on the MCID (secondary aim). In additional exploratory analyses, all models were reconstructed using intra-articular findings from the arthroscopic procedure to investigate the potential added benefit of such information.

Methods

For the current study, we followed the initial three steps of The PROGnosis RESearch Strategy (PROGRESS) framework (figure 1).33 The PROGRESS33 is a four-step framework for prognostic research: (step 1) description of outcomes of current care (fundamental prognosis research), (step 2) identification of factors associated with outcomes (prognostic factor research), (step 3) development and validation of prediction models (prognostic model research) and (step 4) utilisation of the information to tailor treatment (stratified care research). The final step in the PROGRESS framework (step 4: stratified care research) is beyond the scope of the present study.

Figure 1

Study process from initial idea to prediction model development inspired by The PROGnosis RESearch Strategy (PROGRESS) Framework.5 * and ** refer to references 50 and 55, respectively.

The reporting of the present study adheres to the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) guidelines,34 35 supplemented with recommendations from the Prediction model Risk Of Bias Assessment Tool.36 37 We developed and validated four multivariable prediction models to determine 1-year outcomes of patients with hip-related pain undergoing hip arthroscopy in Denmark, using only predictor variables that are available at the initial consultation before hip arthroscopy (demographic data, radiological data, patient-reported outcome measures) to reflect the intended use of the models.34–36 As supplementary and exploratory analyses, all models were also constructed including perioperative predictor variables (information on hip-joint cartilage and labral injury identified during surgery). These models were considered supplementary because the additional predictor variables are not available at the time the models are intended to be used clinically, that is, before surgery,36 and thus merely serve as exploratory analyses to understand the potential role of perioperative findings.

Source of data

Data were collected retrospectively from the Danish Hip Arthroscopy Registry (DHAR) for both development and temporal validation (steps 1C, 2B, 2C; figure 1).37 DHAR is a national database initiated in 2012 with ongoing web-based prospective registration of hip arthroscopies performed at 11 specialised public and private hospitals/clinics, including 21 orthopaedic surgeons, in Denmark (detailed information on DHAR is provided in references37–39). Hip arthroscopies included in the present study were performed between 25 April 2012 and 4 October 2017 (development sample) and 5 October 2017 and 13 May 2019 (validation sample).

Participants

All participants for the development and validation models were included from the DHAR database.37 39 Inclusion criteria were as follows: male or female patients who had a hip arthroscopy at the age of 15–50 years. Exclusion criteria were as follows: a previous periacetabular osteotomy; revision hip arthroscopy within 1 year (mean time to revision in DHAR: 17 months)38; previous hip pathology such as Perthes’ disease, slipped capital femoral epiphysis and/or avascular necrosis of the femoral head; any rheumatoid disease in the hip joint such as synovial chondromatosis; and incompleteness of data regarding preoperative and postoperative self-reported hip and groin function and pain (see table 1 for key characteristics related to the development and validation sample).

All included patients were treated arthroscopically for various causes of hip-related pain.1 The DHAR contains data from several surgeons, and the specific surgical techniques and indications for surgery may vary and are not captured in DHAR.37 Commonly, surgeries were performed under general anaesthesia in supine position using a standard two-portal technique (anterolateral and inferior mid-anterior),10 40 with surgical procedures (eg, rim trimming, labral repair, chondral debridement and capsular closure) performed as indicated by the surgeon. Information on postoperative management is not contained in DHAR;37 however, all patients were offered physiotherapist-led rehabilitation, either at the surgical facility or at a local community/private physical therapy centre, for a period of 3–5 months.10 40–42

Table 1

Summary of key study characteristics for the development and temporal validation samples (n=1546)

Patient and public involvement

We did not include patients or public stakeholders in the design and reporting of this study. We aim to include patients in the further validation of the prediction models to gain insight into the patient perspective on use and potential implementation.

Outcomes predicted by the models

Successful and unsuccessful outcome (PASS or not)

The primary outcomes of interest to be predicted were defined a priori (steps 1A and 1B; figure 1) and included patients who, at 1 year after surgery, had: (1) a successful or (2) an unsuccessful outcome. To determine a successful and unsuccessful outcome, we used previously established cut-off scores of the Patient Acceptable Symptom State (PASS) (online supplemental file 1)13 based on the Copenhagen Hip and Groin Outcome Score (HAGOS).43 Patients were categorised as having a successful outcome if all HAGOS subscale scores at 1 year, extracted from DHAR, surpassed the individual subscale PASS cut-off scores. In contrast, patients were categorised as having an unsuccessful outcome if none of the HAGOS subscale scores surpassed the PASS cut-off scores. This means that patients who had achieved PASS cut-off scores for only some HAGOS subscales were included in the comparator group in both prediction models.
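
The categorisation rule described above can be sketched in a few lines of Python. This is an illustration only: the cut-off values, the subscale keys and the function names are placeholders chosen for the example (the established PASS cut-offs are those given in the supplemental file and reference 13), and "surpassed" is assumed to mean strictly greater than the cut-off.

```python
# Hypothetical PASS cut-offs for the six HAGOS subscales (0-100 scale).
# These numbers are placeholders for illustration, not the published values.
PASS_CUTOFFS = {
    "symptoms": 60.0,
    "pain": 70.0,
    "adl": 80.0,
    "sport_rec": 60.0,
    "pa": 40.0,
    "qol": 40.0,
}


def classify_pass_outcome(scores, cutoffs=PASS_CUTOFFS):
    """Return 'successful', 'unsuccessful' or 'partial' for one patient."""
    # Assumption: a subscale achieves PASS when the score is strictly above the cut-off.
    achieved = [scores[name] > cutoff for name, cutoff in cutoffs.items()]
    if all(achieved):
        return "successful"      # PASS in all subscales
    if not any(achieved):
        return "unsuccessful"    # PASS in no subscale
    return "partial"             # PASS in some subscales only


def model_labels(scores, cutoffs=PASS_CUTOFFS):
    """Binary targets for the two primary models; 'partial' is 0 in both."""
    outcome = classify_pass_outcome(scores, cutoffs)
    return {
        "successful_model": int(outcome == "successful"),
        "unsuccessful_model": int(outcome == "unsuccessful"),
    }
```

Note that a patient with PASS in only some subscales receives a label of 0 in both models, which mirrors the comparator-group definition in the text.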

The primary endpoint of 1 year post hip arthroscopy was chosen based on previous literature indicating that patient-reported outcomes seem to plateau from 1 to 5 years postoperatively,44 and that 1-year outcomes are associated with both revision surgery45 46 and 5-year outcomes.47 Definitions of outcomes were agreed on a priori among all authors of the present study.

Clinical improvement (MCID or not)

The secondary outcomes to be predicted were patients who had (1) an improvement or (2) no improvement in self-reported hip and groin function and pain from before to 1 year after hip arthroscopy. To determine improvement or no improvement, we used the MCID48 of the HAGOS questionnaire. We calculated the MCID for each HAGOS subscale as 0.5 SD of the preoperative HAGOS subscale values (online supplemental file 1).10 Patients were categorised as having an improvement if the change from preoperation to 1-year postoperation on all HAGOS subscales surpassed the MCID scores, whereas patients were categorised as not having an improvement if no change above the MCID score was observed in any HAGOS subscale from preoperation to 1-year postoperation.
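
As a minimal sketch of this definition, the MCID per subscale can be computed as half the SD of the preoperative values and then applied to each patient's change score. The function names and the toy data are hypothetical, and the use of the sample SD (n−1 denominator) is an assumption, since the text does not state which SD estimator was used.

```python
from statistics import stdev


def mcid_thresholds(preop_by_subscale):
    """MCID per subscale as 0.5 x SD of the preoperative scores.

    Assumption: the sample SD (n-1 denominator) is used.
    """
    return {name: 0.5 * stdev(values) for name, values in preop_by_subscale.items()}


def classify_improvement(preop, postop, mcid):
    """Return 'improvement', 'no improvement' or 'partial' for one patient."""
    exceeded = [(postop[name] - preop[name]) > mcid[name] for name in mcid]
    if all(exceeded):
        return "improvement"       # change above MCID in all subscales
    if not any(exceeded):
        return "no improvement"    # change above MCID in no subscale
    return "partial"               # comparator group in both secondary models


# Toy preoperative data for two subscales (illustrative values only).
preop_cohort = {"pain": [10.0, 20.0, 30.0], "qol": [20.0, 40.0, 60.0]}
mcid = mcid_thresholds(preop_cohort)
```

With the toy data above, the SD of the pain scores is 10 and of the QOL scores is 20, so the thresholds are 5 and 10 points, respectively.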

Predictor variables

All predictor variables were extracted from DHAR a priori.37 Based on availability of predictor variables, we decided on 26 predictor variables for the primary prediction models (and 5 additional variables for the supplementary models). Selection of variables was based on previous studies regarding prognostic factors for outcomes after hip arthroscopy17 combined with consensus among the authors (hip arthroscopy surgeon (n=1), physiotherapist (n=5); steps 2A–C, figure 1). This was done by listing all the potential predictor variables contained in DHAR, including items from HAGOS, and subsequently relating them to existing literature on risk factors for a poor or good outcome (table 2) combined with the clinical experience of the authors (LI, 5 years; KT, 23 years; JLK, 28 years; MPR, 30 years; MFN, 3 years; PH, 40 years). A full list of predictor variables and reasons for selection is presented in table 2. Preoperative radiographs were assessed by the operating surgeon and included Lateral Centre Edge Angle, Ischial Spine Sign, Alpha Angle, Joint Space Width and Acetabular Index Angle, as these represent common radiological measures to determine femoral head-neck and acetabular morphology.2 49 For a description of each measure, we refer to online supplemental file 2. Preoperative self-reported variables related to hip function, pain severity and psychosocial state were obtained using patient-reported outcome measures (table 2). We prioritised including specific items as predictors rather than composite scores, as single items can represent specific constructs and be easily implemented in the history-taking process (table 2). Finally, perioperative findings of cartilage and labral injury were assessed during hip arthroscopy (online supplemental file 2), but these variables were only included in the supplementary prediction models.

Table 2

Overview of a priori defined predictor variables included in the prediction models

Sample size considerations

An a priori sample size calculation was not performed as the sample size was determined by eligible patients in DHAR (n=1546). However, to minimise the risk of overfitting and ensure precise estimations, we performed the four-step sample size calculation approach suggested by Riley et al50 using the ‘pmsampsize’ (V.1.1.0) package in R. This helped us to identify whether the number of a priori defined predictor variables was reasonable to include in the development of the models before overfitting becomes a concern.50 With an outcome proportion of 0.3 for the primary outcome measures, 26 predictor variables, an expected shrinkage factor of ≤10% and a C-statistic of 0.78 (estimated R2: 0.20) based on previous models,25 26 29 30 1043 patients were deemed adequate for model development, corresponding to 313 events and 12.03 events per predictor; we included 1082 patients in the development sample.50 The remaining 464 patients were used in the temporal external validation sample, which secured at least 100 events as recommended for the primary outcome measures.51 However, larger sample sizes may be needed for precise estimates of calibration.52
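
The events and events-per-predictor figures quoted above follow from simple arithmetic, which the short Python sketch below reproduces. It does not reimplement the four-step ‘pmsampsize’ calculation; the function name is hypothetical, and the observed event counts are those reported for the development sample in the Results.

```python
def events_per_predictor(n_patients, outcome_proportion, n_predictors):
    """Expected number of events and events per predictor for a planned sample."""
    expected_events = n_patients * outcome_proportion
    return expected_events, expected_events / n_predictors


# Planned figures: 1043 patients, outcome proportion 0.3, 26 predictors.
expected_events, epp = events_per_predictor(1043, 0.3, 26)
print(round(expected_events), round(epp, 2))  # 313 12.03

# Observed events in the development sample (n=1082) for the primary outcomes.
observed_events = {"successful": 339, "unsuccessful": 294}
observed_epp = {name: events / 26 for name, events in observed_events.items()}
print({name: round(value, 1) for name, value in observed_epp.items()})
# {'successful': 13.0, 'unsuccessful': 11.3}
```

The observed values of roughly 11 to 13 events per predictor match the range cited later in the Discussion.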

Missing data

Missing data for predictor variables were imputed by multiple imputation with chained equations on both the development and validation samples, with 20 imputations. Two radiological variables (alpha angle and acetabular index angle) had ~10% missing data (online supplemental file 1). Imputation models were based on all available data from the 26 predictor variables and outcome variables. Continuous variables were imputed by predictive mean matching and categorical variables by polytomous logistic regression. Prediction models were fitted on both the imputed data and the complete cases to evaluate the impact of the missing values.34

Statistical methods

Development and temporal validation of prediction models were analysed using logistic regression models including all 26 predictor variables as single terms with no interactions to minimise the risk of overfitting. Linearity of continuous variables was evaluated by comparing models with single terms against restricted cubic spline terms, using 1–10 knots, for each variable. Models were compared using the Akaike information criterion (AIC) and likelihood-ratio test. For most variables, spline models were not significantly different from single-term models, and in these cases the reduction in AIC was less than 0.5%. Because of this, all variables were included as single terms to reduce complexity and the risk of overfitting, as the impact of a possible violation of linearity is likely very small.34 The supplementary prediction models included five additional predictors related to perioperative findings (table 2). We chose a logistic regression approach over machine learning: although machine learning is popular in hip arthroscopy research,23 25 28–30 a recent systematic review found similar predictive performance between the two approaches for clinical prediction models.53 In addition, logistic regression requires far fewer events per variable than machine learning strategies.54 All continuous variables were kept continuous34 and ordinal scales were treated as continuous, except JSW and HSA (table 2). Uniform shrinkage by bootstrapping, with 1000 replications, was applied to the regression coefficients.34 All analyses were performed in R (R Foundation for Statistical Computing, Vienna, Austria, V.3.6.3).
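
To illustrate what uniform shrinkage does to a fitted model, the Python sketch below multiplies every regression coefficient by a single shrinkage factor and then re-estimates the intercept so that the mean predicted probability matches the observed event rate. This is one common way of applying uniform shrinkage and is an assumption here, not a description of the authors' R code; the shrinkage factor, coefficient names and data are hypothetical, and the bootstrap step that estimates the factor is not shown.

```python
import math


def sigmoid(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))


def shrink_coefficients(coefficients, shrinkage):
    """Multiply every regression coefficient by the same shrinkage factor."""
    return {name: shrinkage * value for name, value in coefficients.items()}


def refit_intercept(coefficients, rows, observed_rate, lo=-20.0, hi=20.0, tol=1e-10):
    """Find the intercept that makes the mean predicted risk equal the observed rate.

    Uses bisection, which works because the mean predicted risk increases
    monotonically with the intercept.
    """
    def mean_prediction(intercept):
        total = 0.0
        for row in rows:
            lp = intercept + sum(coefficients[k] * row[k] for k in coefficients)
            total += sigmoid(lp)
        return total / len(rows)

    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if mean_prediction(mid) < observed_rate:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Hypothetical example: one predictor, shrinkage factor 0.9.
shrunk = shrink_coefficients({"x": 1.0}, 0.9)
rows = [{"x": -1.0}, {"x": 0.0}, {"x": 1.0}]
intercept = refit_intercept(shrunk, rows, observed_rate=0.5)
```

Shrinking the coefficients pulls extreme predictions towards the average risk, which is the intended protection against overfitting.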

External temporal validation

To evaluate the performance of the prediction models on the temporal external validation set, we obtained the predicted probability for each patient in the validation data set using the intercept and regression coefficients derived from the development data. Model performance was investigated in line with the TRIPOD recommendations34 using the framework presented by Steyerberg et al.55 We report the explained variance (Nagelkerke R2), calibration plots (and associated statistics)56 and discrimination statistics (area under the receiver operating characteristics curve, AUC).57 In addition, we report histograms to visualise the distribution of predicted probability between patients with and without the outcome55 and sensitivity and specificity for a range of probability thresholds.57
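
Obtaining a predicted probability from a logistic regression model amounts to summing the intercept and the coefficient-weighted predictor values and applying the inverse logit. The Python sketch below shows this step under stated assumptions: the intercept, coefficients and predictor names are made up for illustration and are not the estimates of the published model (those are in the supplemental material).

```python
import math


def predicted_probability(intercept, coefficients, predictors):
    """Predicted probability from a logistic regression model.

    coefficients and predictors are dicts keyed by predictor name;
    coefficients are on the log-odds scale (log of the OR).
    """
    linear_predictor = intercept + sum(
        coefficients[name] * predictors[name] for name in coefficients
    )
    return 1.0 / (1.0 + math.exp(-linear_predictor))


# Hypothetical two-predictor model, for illustration only.
example_coefficients = {"age_years": 0.02, "female": 0.30}
example_patient = {"age_years": 30, "female": 1}
risk = predicted_probability(-1.0, example_coefficients, example_patient)
print(f"{risk:.1%}")
```

For external validation, the same fixed intercept and coefficients from the development data are applied to every patient in the validation sample without refitting.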

Calibration refers to the agreement between observed outcomes and outcome predictions, and thus is a measure of the model’s ability to provide unbiased estimates.34 We assessed calibration as defined by Van Calster et al56 as: (1) mean calibration (calibration-in-the-large), reflecting whether the observed outcome rate equals the average predicted risk; (2) weak calibration, reflecting whether the model, on average, overestimates or underestimates the risk, assessed by the calibration intercept and slope, with target values of 0 and 1, respectively; and (3) moderate calibration, reflecting whether the estimated risks correspond to the observed proportions, assessed graphically using a calibration plot, with the target being a smoothed calibration curve lying closely around the 45° line.56 Calibration plots and associated parameters were produced using ‘’ package in R.58 Discrimination was assessed using AUC (c-statistic), which quantifies the model’s discriminative ability, that is, the probability that the model estimates higher risks for patients with the outcome than for patients without the outcome.57 AUC ranges from 0.5 to 1, representing no and perfect discriminative ability, respectively.57 All validation plots and values were generated for each imputed dataset and presented as ranges of values.
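
Two of the measures defined above are simple enough to sketch directly in Python: mean calibration (observed event rate versus mean predicted risk) and the AUC computed from its pairwise definition. The function names and toy data are hypothetical; the calibration intercept, slope and smoothed curve require a model fit and are not shown.

```python
def calibration_in_the_large(probabilities, outcomes):
    """Return (observed event rate, mean predicted risk)."""
    observed = sum(outcomes) / len(outcomes)
    predicted = sum(probabilities) / len(probabilities)
    return observed, predicted


def auc(probabilities, outcomes):
    """AUC as the probability that a patient with the outcome receives a
    higher predicted risk than a patient without it (ties count as 0.5)."""
    with_outcome = [p for p, y in zip(probabilities, outcomes) if y == 1]
    without_outcome = [p for p, y in zip(probabilities, outcomes) if y == 0]
    concordant = 0.0
    for p_event in with_outcome:
        for p_nonevent in without_outcome:
            if p_event > p_nonevent:
                concordant += 1.0
            elif p_event == p_nonevent:
                concordant += 0.5
    return concordant / (len(with_outcome) * len(without_outcome))


# Toy data: predicted risks and observed outcomes (1 = outcome occurred).
toy_probabilities = [0.1, 0.4, 0.35, 0.8]
toy_outcomes = [0, 0, 1, 1]
print(auc(toy_probabilities, toy_outcomes))  # 0.75
print(calibration_in_the_large(toy_probabilities, toy_outcomes))
```

The pairwise loop is quadratic in the sample size, which is acceptable for a few hundred patients; rank-based formulas give the same value more efficiently.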

Results

Of 2550 eligible patients, we included 1546 patients with complete outcome data at 1-year follow-up (figure 2). In general, very small differences were observed between included patients and patients with missing outcome data for demographics, radiology, operative findings and preoperative symptoms (online supplemental file 1).


In total, 1082 patients were used for developing the models, whereas 464 patients were used for validation, with samples being comparable in terms of demographics, radiology, operative findings, preoperative symptoms and outcomes (table 1; see online supplemental file 1 for a summary of the distribution of predictor variables in the development and validation sample). Since missing data in predictor variables were imputed, all patients with complete HAGOS at baseline and 1-year follow-up were included. The proportions of events were similar between the development and validation samples: successful outcome (development: 339 events (31.3%), validation: 137 events (29.5%)), unsuccessful outcome (development: 294 events (27.2%), validation: 117 events (25.2%)), improvement (development: 333 events (30.8%), validation: 161 events (34.7%)) and no improvement (development: 140 events (13.0%), validation: 51 events (11.0%)). Clear differences were found between groups in postoperative HAGOS scores and change in HAGOS score from presurgery to postsurgery (figures 3 and 4; similar findings were observed in the development sample; online supplemental file 4).

Figure 3

Self-reported hip and groin pain and function measured using the Copenhagen Hip and Groin Outcome Score (HAGOS) in patients with a successful outcome defined as having a Patient Acceptable Symptom State (PASS) in all HAGOS subscales versus in some/no subscales (left figure), and patients with an unsuccessful outcome defined as having PASS in no HAGOS subscales versus in some/all subscales (right figure). Error bars show IQR. ADL, activities of daily living.

Figure 4

Changes in self-reported hip and groin pain and function measured using the Copenhagen Hip and Groin Outcome Score (HAGOS) in patients who have achieved an improvement defined as exceeding the Minimal Clinically Important Difference (MCID) in all HAGOS subscales versus in some/no subscales (left figure), and patients who have not achieved an improvement defined as not exceeding MCID in any HAGOS subscale versus in some/all subscales (right figure). Error bars show IQR. ADL, activities of daily living.

Model specification and performance

For the development models, calibration plots and associated statistics are presented in online supplemental file 3. For the validation models, the best model performance was found for the primary outcome measure, an unsuccessful outcome (Nagelkerke R2 range: 0.25–0.26), which also showed adequate calibration (predicted mean probability vs actual mean probability: 27.0% vs 25.2%; intercept range: −0.10 to −0.11; slope range: 1.06 to 1.09) and discrimination (AUC range: 0.76 to 0.77) (see figure 5 for a representative calibration plot and online supplemental file 8 for all calibration plots derived from the multiple imputations).

Figure 5

Calibration plot for predicting patients who have achieved an unsuccessful outcome (PASS in no HAGOS subscale). Shaded area in calibration plots depicts 95% CIs. HAGOS, Copenhagen Hip and Groin Outcome Score.

The model for successful outcomes showed poor calibration and discrimination. A complete summary of model performance for all four models is available in online supplemental file 1, while sensitivity and specificity for probability thresholds (from 0.1 to 0.9) are presented in online supplemental file 5.

For usage of the prediction models, the full models with estimates are presented in online supplemental file 6, while an Excel calculator is provided online. The complete-case analyses showed similar model performance for all outcomes. For the supplementary models, the addition of perioperative findings (information on cartilage and labrum injuries) did not improve model performance.


Discussion

The present study is the first to develop and temporally externally validate clinical prediction models to identify hip arthroscopy patients who, at 1 year after surgery, can be considered to have a successful (having achieved PASS) or unsuccessful (not having achieved PASS) outcome. Our findings indicate that by using 26 common clinical variables, including demographics, radiographic parameters of hip morphology and self-reported measures, the probability of an unsuccessful outcome (1-year mean HAGOS subscale scores ranging 13–43 points; figure 2) can be predicted with acceptable discrimination and adequate calibration. The calibration, however, becomes imprecise towards higher predicted probabilities due to few events (figure 3). The prediction model for successful outcomes was less accurate, and thus is unlikely to impact clinical practice.

The present study extends existing knowledge regarding prediction modelling for hip arthroscopy. Although several models have been published, these are associated with important methodological shortcomings, which may result in overly optimistic and/or unstable predictive performance.24–26 28–31 First, external validation has been attempted for only one of eight existing prediction models;24 however, this was based on only 13 patients with the outcome of interest (a minimum of 100 events is recommended for external validation).51 52 Since prediction models show their best performance on the development sample, external validation is needed to adjust for initial optimism and improve application to future patients.34 In the present study, this is illustrated by C-statistics for all models being lower in the validation sample than in the development sample. In contrast to the present study, no previous study has made sample size considerations, resulting in events per predictor ranging between 3 and 8.25 26 28–30 While this may not seem very different from the present study (events per predictor for the primary outcome: 11–13), the majority of published prediction models have been developed using machine learning strategies,25 28–30 which require >200 events per predictor before low optimism and stable performance measures are reached.54 Thus, the existing prediction models for hip arthroscopy patients are associated with a high risk of overfitting, and thus potentially unreliable predictions when applied to future patients.50

Clinical usefulness of the prediction models

The present study suggests that the probability of having an unsuccessful outcome, defined as not having PASS in any of the HAGOS subscales, can be predicted. While hip arthroscopy is considered an effective procedure for treatment of hip joint-related pain,8 up to 50% of patients do not achieve an acceptable symptom state at 1–2 years follow-up,13 highlighting the clinical relevance of identifying patients for whom surgery may not be helpful. If such patients can be identified before surgery, the proportion of patients with residual symptoms may decrease and the overall outcome of hip arthroscopy may improve. Thus, the prediction model is an initial step towards stratified care for patients with hip-related pain.59 However, before clinical adoption and stratified care are implemented (step 4 of the PROGRESS Framework),59 the model should be externally validated in a true external data set, and its effectiveness should be tested in a randomised controlled trial with stakeholder involvement.

How should the prediction models be used

The prediction model can support clinical evaluation and shared decision making by informing the orthopaedic surgeon and the individual patient about the risk of an unsuccessful outcome. In practice, the probability is derived using the prediction formula (presented in online supplemental file 6), which is also available as an Excel calculator for illustrative purposes and combines the ORs for all 26 predictors into a single probability from 0% to 100% (an example can also be found in table 3). It is important to state that single predictors, although statistically significant, should not be used in isolation, as the performance of the prediction model relies on all predictors regardless of p values for individual predictors. Since the prediction model was developed and validated on patients who underwent surgery, it is best used once the orthopaedic surgeon has decided for surgery. In such instances, the model could be used as a data-driven ‘second opinion’ to estimate the risk of an unsuccessful outcome and to understand when surgery may not provide enough benefit to proceed. In clinical practice, this means that the prediction model is suited to the final stages of a stepped-care approach60 starting with targeted exercise-based treatment followed by potential surgery if symptoms have not resolved.2 61 If used for dichotomous decisions in clinical practice (surgery vs no surgery), we advise that the predicted probability is combined with the sensitivity and specificity measures presented in online supplemental file 5, to understand the false positive and false negative rates of the specific probability threshold, that is, misclassification of patients.
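
The trade-off behind a dichotomous threshold can be made concrete with a short Python sketch that tabulates sensitivity and specificity across probability thresholds from 0.1 to 0.9. The data and function names are hypothetical, and the rule that a patient is flagged as high risk when the predicted probability is at or above the threshold is an assumption made for the example.

```python
def sensitivity_specificity(probabilities, outcomes, threshold):
    """Sensitivity and specificity when patients with a predicted probability
    at or above the threshold are flagged as high risk.

    Assumes the data contain at least one patient with and one without the outcome.
    """
    tp = sum(1 for p, y in zip(probabilities, outcomes) if y == 1 and p >= threshold)
    fn = sum(1 for p, y in zip(probabilities, outcomes) if y == 1 and p < threshold)
    tn = sum(1 for p, y in zip(probabilities, outcomes) if y == 0 and p < threshold)
    fp = sum(1 for p, y in zip(probabilities, outcomes) if y == 0 and p >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Toy predicted risks and observed outcomes (1 = unsuccessful outcome).
toy_probabilities = [0.2, 0.6, 0.7, 0.9]
toy_outcomes = [0, 0, 1, 1]

thresholds = [round(0.1 * i, 1) for i in range(1, 10)]  # 0.1, 0.2, ..., 0.9
for threshold in thresholds:
    sens, spec = sensitivity_specificity(toy_probabilities, toy_outcomes, threshold)
    print(f"threshold {threshold:.1f}: sensitivity {sens:.2f}, specificity {spec:.2f}")
```

Raising the threshold flags fewer patients, which lowers the false positive rate (higher specificity) at the cost of missing more patients who go on to have an unsuccessful outcome (lower sensitivity).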

Table 3

Overview of predictive variables for two patients with either a low or high probability of having an unsuccessful outcome according to the prediction model

Limitations

The present study is associated with some limitations. First, we appreciate that dichotomisation of a continuous outcome is generally not recommended due to loss of information57; but since HAGOS contains six subscales that cannot be aggregated to a single score,36 we chose the current approach to avoid dealing with six different predictions models (one for each subscale), that would complicate the clinical utility. We believe that patients who have exceeded the cut-off scores of all HAGOS subscales at 1-year follow-up are likely to represent a subgroup of patients that feel very well after surgery (a successful outcome) and vice-versa for patients who do not surpass a single subscale score (an unsuccessful outcome).32 Second, since the prediction models were developed based on data from the DHAR, predictor variables were limited to those contained in the registry.37 However, these were included based on their potential association with hip arthroscopy outcomes17 and represent common, currently used, and easily collectable clinical variables, although we cannot exclude the potential added value of additional variables. Third, although we included at least 100 events in the external validation models for the primary outcome based on rule-of-thumb,51 we appreciate that this rule may be imprecise.52 Based on the reviewer response, we were made aware that simulations of CI for C-statistic and slope estimate for calibration curves can be used to estimate the number of events in the validation sample, with R-code for simulations available at This was done post hoc with 500 repetitions, an SD of the linear predictor from the model of 0.8 (based on the coefficients from the prediction model fitted by the development data) and probability of event set to 30% (successful outcome, unsuccessful outcome and improvement) and 12% (no improvement). Curves for number of events ranging from 50 to 1000 can be found in online supplemental file 7. 
Additionally, estimations for the sample size calculation were based on the number of variables used in the model, not the number of parameters (number of levels minus 1). Because the Hip Sports Activity Scale and Joint Space Width were used as 4-level and 3-level variables, respectively, they contribute 3 and 2 parameters rather than 1 parameter each. All other variables were either 2-level or continuous variables, each contributing 1 parameter. Sample size should, therefore, have been based on 29 parameters instead of 26 variables. Fourth, we appreciate that model development and validation were performed using all hospitals combined and that site-specific differences may exist that could affect the predictive performance when the model is applied in a specific setting. Therefore, further external validation is needed to confirm the present findings at each site. Fifth, like many other registries, DHAR contains missing data on postoperative outcome, and these patients were excluded from the present study; thus, we cannot exclude that the study sample represents a selected cohort of patients. Finally, while we have no specific information on the postoperative rehabilitation received, we acknowledge that this is considered an integral part of the hip arthroscopy procedure61 with potential to affect postoperative outcomes,62–64 and thus the predictive performance.
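The parameter count stated under the third limitation can be checked with a short worked sum. It assumes, as described there, that 24 of the 26 variables are binary or continuous (1 parameter each), the Hip Sports Activity Scale (HSAS) has 4 levels and Joint Space Width (JSW) has 3 levels:

```latex
\underbrace{24 \times 1}_{\text{binary or continuous}}
+ \underbrace{(4 - 1)}_{\text{HSAS}}
+ \underbrace{(3 - 1)}_{\text{JSW}}
= 24 + 3 + 2 = 29
```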



Common clinical variables, including demographics, radiographic parameters of hip morphology and self-reported measures, were able to predict the probability of having an unsuccessful outcome 1 year after hip arthroscopy. This temporally externally validated prediction model can be used to support clinical evaluation and shared decision-making by informing the orthopaedic surgeon and patient about the risk of an unsuccessful outcome, and thus when surgery may not be appropriate. This may reduce unsuccessful outcomes and could therefore potentially improve the overall outcome of hip arthroscopy in the future. A successful outcome (achieving the patient acceptable symptom state) was less accurately predicted.

Data availability statement

Data may be obtained from a third party and are not publicly available.

Ethics statements

Patient consent for publication

Ethics approval

Data handling approval was granted by the Data Protection Agency of the Capital Region, Denmark (Review number: 2012-58-0004). The study was deemed exempt from review of the Danish Ethics Committee of the Capital Region as all data were extracted from a registry approved by the Danish Health Authorities.




  • Twitter @LasseIshoei, @KThorborg, @JoanneLKemp, @MikeReiman, @Physiomathias, @PerHolmich

  • Contributors Authors contributed to the concept and design (LI, KT, MPR, JLK, MN and PH), acquisition of the data (LI, KT and PH), analysis (LI and TK) and interpretation (all authors), drafting and revision (all authors), final approval (all authors) and agreement to be accountable (all authors). The guarantor (PH) accepts full responsibility for the work and/or the conduct of the study, had access to the data, and controlled the decision to publish. The corresponding author attests that all listed authors meet authorship criteria and that no others meeting the criteria have been omitted.

  • Competing interests None declared.

  • Patient and public involvement Patients and/or the public were not involved in the design, or conduct, or reporting, or dissemination plans of this research.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.