Reliability and validity of three pain provocation tests used for the diagnosis of chronic proximal hamstring tendinopathy
  1. Angelo Cacchio1,
  2. Fabrizio Borra2,
  3. Gabriele Severini3,
  4. Andrea Foglia4,
  5. Frank Musarra5,
  6. Nicola Taddio6,
  7. Fosco De Paulis7
  1. 1Department of Health Sciences, University of L’Aquila, School of Medicine, L’Aquila, Italy
  2. 2Department of Physiotherapy, Fisiology Center, Forlì, Italy
  3. 3Department of Physiotherapy, Catholic University “Sacro Cuore”, Milano, Italy
  4. 4Department of Physiotherapy, Centro “Riabilita”, Civitanova Marche (MC), Italy
  5. 5Department of Physiotherapy Centro “Lovanium”, Pesaro, Italy
  6. 6Department of Physiotherapy Centro “FKT Filanda”, Cittadella (PD), Italy
  7. 7Division of Diagnostic Imaging, “Valle Giulia Clinic”, Roma, Italy
  1. Correspondence to Angelo Cacchio, University of L’Aquila, P.le Salvatore Tommasi 1, L’Aquila, 67100 Italy; angelo.cacchio{at}


Background The clinical assessment of chronic proximal hamstring tendinopathy (PHT) in athletes is a challenge to sports medicine. To be able to compare the results of research and treatments, the methods used to diagnose and evaluate PHT must be clearly defined and reproducible.

Objective To assess the reliability and validity of three pain provocation tests used for the diagnosis of PHT.

Methods Ninety-two athletes with (N=46) and without (N=46) PHT were examined by one physician and two physiotherapists, who were trained in the examination techniques before the study. The examiners were blinded to the symptoms and identity of the athletes. The three pain provocation tests examined were the Puranen–Orava, bent-knee stretch and modified bent-knee stretch tests. Intraclass correlation coefficients (ICCs) based on the repeated measures analysis of variance were used to analyse the intraexaminer and interexaminer reliability, while sensitivity, specificity, predictive values and likelihood ratios were used to determine the validity of the three tests.

Results The ICC values in all three tests revealed a high correlation (range 0.82 to 0.88) for the interexaminer reliability and a high-to-very high correlation (range 0.87 to 0.93) for the intraexaminer reliability. All three tests displayed a moderate-to-high validity, with the highest degree of validity being yielded by the modified bent-knee stretch test.

Conclusion All three pain provocation tests proved to be of potential value in assessing chronic PHT in athletes. However, we recommend that they be used in conjunction with other objective measures, such as MRI.


The term proximal hamstring tendinopathy (PHT) is used to describe an overuse injury that involves pain at the attachment of the hamstring tendons to the ischial tuberosity.1–3 In the literature, other terms have also been used, such as ‘hamstring syndrome’, ‘ischiatic intersection syndrome’, ‘hamstring enthesopathy’, ‘high hamstring tendinopathy’ and ‘hamstring origin tendinopathy’.4–8

Previous studies have clearly described the symptoms,1–4 the MRI5 ,7 ,9 and histopathological findings,2 and the conservative3 ,8 and surgical treatment1 ,2 ,4 ,10–12 of this condition. The immediate and correct diagnosis of PHT is a challenge because it resembles other disorders that cause similar symptoms. MRI has a crucial role in the correct diagnosis and guidance of treatment.2 ,3 ,5 Typically, the diagnosis of PHT is based on the typical MRI findings combined with the exclusion of other conditions that can cause similar symptoms, such as piriformis syndrome, lumbar sciatic pain, ischial stress fractures, apophysitis, ischiogluteal bursitis or proximal hamstring strain injury.5 ,7 ,9 MRI provides detailed anatomical information on the proximal hamstring tendons13 and their pathological changes.5 ,7 ,9 Typical MRI findings of PHT include increased tendon girth and intrasubstance signal heterogeneity.3 ,5 ,7 ,9 MRI also has an invaluable role in the differential diagnosis with malignancies of this area. However, information in the literature on the clinical identification of patients with PHT is limited, and the difficulties encountered in the clinical assessment of patients with PHT may in part be attributed to the lack of specific clinical tests.

Because the main symptom of PHT is ill-defined pain in the area of the ischial tuberosity during sports activities, which sometimes radiates distally to the popliteal fossa,1–4 ,6 ,8 ,10 ,11 pain provocation tests for the assessment of PHT can be used for diagnostic purposes.1–4 ,8 We hypothesise that these tests, combined with the typical MRI findings of PHT, allow the physician and/or physiotherapist to make a correct diagnosis of PHT. However, no studies have yet assessed the association between pain provocation tests and the final clinical or imaging diagnosis, or the reliability and validity of these tests.

The aims of this study were to describe three pain provocation tests (one active and two passive) that are regularly used, combined with typical MRI findings, for the diagnosis of PHT; to assess their reliability and validity; and to evaluate their sensitivity and specificity using MRI as the criterion measure.

Materials and methods

Between March 2004 and December 2007, 92 professional athletes with (N=46) and without (N=46) PHT were recruited for this study. The athletes' demographic and baseline characteristics are shown in table 1.

Table 1

Baseline characteristics of the two groups of athletes

The inclusion criteria for all the athletes were an age of at least 18 years and regular participation in a professional sports activity for at least 3 years. The additional inclusion criteria for athletes with a diagnosis of PHT were a visual analogue scale (VAS) score of ≥4 cm during sports activities, preventing the athletes from completing the training sessions or competing at their usual levels, and the presence of symptoms for at least 6 months.

The exclusion criteria for all the athletes were a history of a fracture, congenital defect, neoplasm, or previous surgery of the lumbar spine, pelvic girdle, hip joint or femur; signs indicating radiculopathy (asymmetric Achilles tendon reflex or passive straight-leg-raising restricted by pain in the lower leg); a systemic disease of the locomotor system; contraindications to MRI (eg, severe claustrophobia and foreign metal objects).

After giving their written informed consent, all the athletes underwent a thorough clinical examination of the pelvic region, including the hip and back, and then provided an account of their clinical history. The clinical examination preceded the history taking. The presence of PHT was thus assessed on the basis of the patient's clinical examination and history by an expert physician (AC), and confirmed by means of MRI performed by an experienced radiologist (FDP).

A diagnosis of PHT was made when the athlete had pain in the lower gluteal region (VAS score of ≥4 cm) and tenderness (grades 1 to 3) in the ischial tuberosity area, and when this was confirmed by typical MRI findings (grades 2 to 3). Self-rated pain intensity during sports activities was scored on a 10 cm horizontal VAS with scores ranging from 0 (no pain) to 10 (very severe pain). Tenderness in the ischial tuberosity area was subjectively graded (absent=0, mild=1, moderate=2 and considerable=3) by the examiner.

A positive MRI scan for chronic PHT was recorded when increased tendon signal intensity was considered present (grades 2 to 3).3 Signal intensity was graded according to the classification of Khan and colleagues14 (grade 1=normal tendon; grade 2=thickened tendon with homogeneous signal intensity; grade 3=intratendinous high signal intensity, which was diagnosed when there was a signal intensity change that was predominantly visible on T1-weighted images, because in these sequences the intensity change was not affected by the ‘magic angle’ effect).15 The proximal hamstring tendons were carefully compared with those on the contralateral side. However, it must be borne in mind that bilateral hamstring tendon involvement is not unusual in athletes.

Thirteen athletes (seven in the symptomatic group and six in the asymptomatic group) reported having a history of at least one hamstring tear. On average, these patients had 3.2 (range 1–6) episodes of suspected hamstring tears. Another three athletes (one in the symptomatic group and two in the asymptomatic group) reported having a history of anterior cruciate ligament rupture and reconstruction in the past.


Three physiotherapists (FB, GS and AF) with over 10 years of clinical experience in sport-related conditions were used as examiners for the purposes of this study. A physician (AC) trained the examiners in the techniques. The training involved three steps: (1) discussion of procedures and examination criteria; (2) a 2-day seminar providing training on the examination technique during which several assessments were performed on patients and revised and (3) a final discussion on patient cases aimed at refining the examination technique.


We examined three pain provocation tests. The active test was the Puranen–Orava (PO) test, while the two passive tests were the bent-knee stretch (BK) test and the modified bent-knee stretch (MBK) test. The result of each test was considered to be positive if the athlete's symptoms were exacerbated by the test.

The pain provoked by the passive tests was scored using a four-point scale: 0 (no pain), 1 (mild: report of pain without grimace, flinch or withdrawal), 2 (moderate: pain plus grimace or flinch), 3 (unbearable: the examiner/athlete is not able to complete the test because of withdrawal).

PO test

This test entails actively stretching the hamstring muscles in the standing position with the hip flexed at about 90°, the knee fully extended and the foot on a support (figure 1).4

Figure 1

Puranen–Orava test.

BK stretch test

The BK stretch test for the proximal hamstring tightness is performed with the patient supine. The hip and knee of the symptomatic leg are maximally flexed, and the examiner slowly straightens the knee.8

MBK stretch test

The patient lies in the supine position with the legs fully extended; the examiner grasps the symptomatic leg behind the heel with one hand and at the knee with the other hand, maximally flexes the hip and knee, and then rapidly straightens the knee (figure 2).3

Figure 2

Modified bent-knee stretch test.


Each patient was assessed by three examiners who were blinded to the athletes' clinical history and MRI findings. The three examiners administered the tests bilaterally in all the athletes independently and were blinded to each other's test results. The tests were conducted in a randomised order. Each athlete underwent all three examinations for each leg on the same day, usually within a 30-min period, to minimise the day-to-day variation in the athlete's symptoms and signs.

Thirty-five athletes were re-examined by two examiners (FB and GS) 3 days after they had first been examined to evaluate the intraexaminer reliability. The time interval between the two examinations was set at 3 days as this is considered16 to be sufficient to prevent the carry-over effects and to give athletes the time to recover from the first examination. Athletes were tested as close to the same time of the day as possible on each occasion. The athletes did not receive any therapy between the two measurements.

Statistical analysis

The statistical analysis was performed by FM and NT using SPSS version 9 for Windows (SPSS, Chicago, Illinois, USA) and GraphPad InStat version 3.05 for Windows (GraphPad Software, San Diego, California, USA). The results of the three examiners were compared by means of two-way mixed-effects intraclass correlation coefficients (ICCs). This coefficient is derived from an analysis of variance model that incorporates athlete and examiner effects. The particular model we used assumes that the examiners were a random sample of all possible examiners (random effects model). Although convenience determined the choice of examiners, and the fixed effects model should consequently apply, estimates of ICCs and other parameters yielded by this model are slightly more conservative than those obtained in the fixed effects model. ICC values range from 0 to 1. High levels of agreement (ICC about 1) arise when the degree of variation resulting from the examiners is small relative to that resulting from the sum of all the sources of variation (athlete, examiner and residual error). An important assumption in this model is that the measurements have consistent within-athlete and within-examiner variances.
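To illustrate how an ICC is derived from a two-way analysis of variance, the following is a minimal Python sketch, not the SPSS procedure used in the study. It computes the single-measure ICC under both the random-examiner and the fixed-examiner assumption from the athlete, examiner and residual mean squares. The function name `two_way_icc` and the example ratings are illustrative assumptions and are not study data; the exact ICC form used by the authors is not specified beyond the description above.

```python
from typing import List, Tuple


def two_way_icc(ratings: List[List[float]]) -> Tuple[float, float]:
    """Single-measure ICCs from a subjects x examiners table.

    Returns (icc_random, icc_fixed): examiners treated as a random
    sample of all examiners, or as the only examiners of interest.
    """
    n = len(ratings)       # number of subjects (rows)
    k = len(ratings[0])    # number of examiners (columns)
    grand_mean = sum(sum(row) for row in ratings) / (n * k)

    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Partition the total sum of squares into subject, examiner and residual parts.
    ss_rows = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand_mean) ** 2 for m in col_means)
    ss_total = sum((x - grand_mean) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    # Random examiners: examiner variance counts against agreement.
    icc_random = (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )
    # Fixed examiners: examiner variance is left out of the denominator.
    icc_fixed = (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)
    return icc_random, icc_fixed


# Illustrative ratings only (6 subjects rated by 4 raters), not study data.
example_ratings = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
icc_random, icc_fixed = two_way_icc(example_ratings)
print(f"random-examiner ICC: {icc_random:.2f}")
print(f"fixed-examiner ICC:  {icc_fixed:.2f}")
```

In this example the random-examiner estimate is the lower of the two because the examiner mean square exceeds the residual mean square, which matches the statement that the random effects model gives the more conservative value.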

ICCs were interpreted as follows: 0.00 to 0.25=little, if any, correlation; 0.26 to 0.49=low correlation; 0.50 to 0.69=moderate correlation; 0.70 to 0.89=high correlation; and 0.90 to 1=very high correlation.17
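The interpretation bands above can be expressed as a small lookup. This is a sketch only: the function name `interpret_icc` is hypothetical, and rounding to two decimals before comparison is an assumption made because the bands are stated to two decimal places.

```python
def interpret_icc(icc: float) -> str:
    """Map an ICC to the verbal categories listed in the text."""
    if not 0.0 <= icc <= 1.0:
        raise ValueError("ICC is expected to lie between 0 and 1")
    value = round(icc, 2)  # assumption: bands are defined to two decimals
    if value >= 0.90:
        return "very high correlation"
    if value >= 0.70:
        return "high correlation"
    if value >= 0.50:
        return "moderate correlation"
    if value >= 0.26:
        return "low correlation"
    return "little, if any, correlation"


# Extremes of the ICC ranges reported in the abstract.
for icc in (0.82, 0.88, 0.87, 0.93):
    print(icc, "->", interpret_icc(icc))
```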

The goal of a diagnostic test is to distinguish between subjects with and those without a particular condition. The applicability of a test rests on its comparison with the ‘gold standard’, which discriminates between subjects who certainly have and those who do not have the condition being tested. However, we did not find a ‘gold standard’ that could easily be used as a reference point for the PHT condition. Therefore, discriminative validity was tested to determine whether the three provocative pain tests could discriminate between athletes with and those without PHT. We applied a criterion-referenced test of validity18 to assess the ability of the three tests to correctly classify athletes according to their a priori diagnosis, on the basis of clinical symptoms and MRI findings, of the presence or absence of PHT. We may assume that MRI findings, coupled with clinical symptoms, can serve as a ‘criterion measure’ for the presence of PHT in our series of athletes.

Sensitivity, specificity, positive and negative predictive values and positive and negative likelihood ratios for each of the three pain provocation tests with the corresponding 95% CI were calculated using standard formulas from a 2×2 table.19 Acceptable values were set at x≥0.80.
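The standard 2×2 formulas can be sketched in Python as follows. The counts below are hypothetical: they were chosen so that the rounded results are consistent with the percentages reported for the MBK test, and they are not taken from table 4. The names `TwoByTwo` and `diagnostic_accuracy` are illustrative, and the 95% CIs are omitted for brevity.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class TwoByTwo:
    tp: int  # test positive, criterion positive
    fp: int  # test positive, criterion negative
    fn: int  # test negative, criterion positive
    tn: int  # test negative, criterion negative


def diagnostic_accuracy(t: TwoByTwo) -> Dict[str, float]:
    """Point estimates from a 2x2 table (no confidence intervals).

    Assumes specificity is neither 0 nor 1, so both ratios are defined.
    """
    sensitivity = t.tp / (t.tp + t.fn)
    specificity = t.tn / (t.tn + t.fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": t.tp / (t.tp + t.fp),
        "npv": t.tn / (t.tn + t.fn),
        "lr_positive": sensitivity / (1 - specificity),
        "lr_negative": (1 - sensitivity) / specificity,
    }


# Hypothetical counts for 46 affected and 46 unaffected athletes.
hypothetical_counts = TwoByTwo(tp=41, fp=4, fn=5, tn=42)
metrics = diagnostic_accuracy(hypothetical_counts)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Note that the predictive values depend on the proportion of affected athletes in the sample (here one half), whereas sensitivity, specificity and the likelihood ratios do not.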


None of the athletes was excluded from the study.

Interexaminer reliability

The analysis of variance did not detect any examiner bias (F=0.26, p=0.85) or order effect (F=1.43, p=0.38) in the assessment.

The interexaminer reliability assessment revealed a high correlation between the three examiners. Interexaminer ICC values ranged from 0.82 to 0.88 for symptomatic athletes (table 2) and from 0.80 to 0.87 for asymptomatic athletes.

Table 2

Interexaminer reliability of the three pain provocation tests in symptomatic athletes at baseline

Intraexaminer reliability

There was no time effect for the three tests (F=0.16, p=2.02). The intraexaminer reliability obtained in this study was high or very high for all the examiners.

Intraexaminer ICC values ranged from 0.87 to 0.93 for FB and from 0.88 to 0.90 for GS for symptomatic athletes (table 3), and from 0.86 to 0.92 for FB and from 0.87 to 0.91 for GS for asymptomatic athletes.

Table 3

Intraexaminer reliability of the three pain provocation tests in symptomatic athletes


Results regarding the validity are summarised in table 4. The sensitivity for the PO test was 76%, while its specificity was 82%. A sensitivity of 76% indicates that 76% of athletes who had true chronic PHT had a positive PO test, and that about 24% of the athletes who had true chronic PHT were missed by the PO test. Although there are no agreed-upon standards for judging sensitivity and specificity,20 we believe the 76% sensitivity yielded by our study should be considered moderate because about a quarter of the athletes who had true chronic PHT were misclassified. A specificity of 82% indicates that 82% of the athletes who were healthy had a negative PO test, and that about 18% of the healthy athletes were incorrectly classified as positive by the PO test. The fact that specificity was higher than sensitivity indicates that the PO test more effectively identifies athletes who are healthy than those who have true chronic PHT.

Table 4

Sensitivity, specificity, predictive values and likelihood ratios of the three pain provocation tests

Better results were obtained with the BK and MBK tests, which yielded a sensitivity of, respectively, 84% and 89%, and a specificity of, respectively, 87% and 91%. Thus, these two tests on their own failed to detect about 10% to 20% of athletes who had true chronic PHT (sensitivity) or who were healthy (specificity).

The positive and negative predictive values were, respectively, 81% and 77% for the PO test, 86% and 85% for the BK test and 91% and 89% for the MBK test. These data show that our tests more accurately identify athletes who have true chronic PHT than those who are healthy.

The positive and negative likelihood ratios were, respectively, 4.2 and 0.29 for the PO test, 6.5 and 0.18 for the BK test and 10.2 and 0.12 for the MBK test. According to the guide to interpreting likelihood ratios drawn up by Jaeschke et al,21 the MBK test is subject to large shifts from the pretest to post-test probability, the BK test is subject to moderate shifts, while the PO test is subject to small (though potentially important) shifts from the pretest to post-test probability.
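How a likelihood ratio shifts the pretest probability can be shown with a short worked sketch. The likelihood ratios are those reported above; the pretest probability of 0.5 is a hypothetical value chosen only for illustration. The band boundaries in `shift_size` (above 10, 5 to 10, 2 to 5, and their reciprocals for ratios below 1) follow the commonly cited summary of the Jaeschke guide, and the handling of values that fall exactly on a boundary is an assumption.

```python
def post_test_probability(pretest_probability: float, likelihood_ratio: float) -> float:
    """Convert a pretest probability to a post-test probability via odds."""
    if not 0.0 < pretest_probability < 1.0:
        raise ValueError("pretest probability must lie strictly between 0 and 1")
    pretest_odds = pretest_probability / (1.0 - pretest_probability)
    post_test_odds = pretest_odds * likelihood_ratio
    return post_test_odds / (1.0 + post_test_odds)


def shift_size(likelihood_ratio: float) -> str:
    """Classify the size of the probability shift for a likelihood ratio."""
    if likelihood_ratio <= 0.0:
        raise ValueError("likelihood ratio must be positive")
    # Ratios below 1 are compared through their reciprocal (0.1 <-> 10, etc.).
    strength = likelihood_ratio if likelihood_ratio >= 1.0 else 1.0 / likelihood_ratio
    if strength > 10.0:
        return "large"
    if strength > 5.0:
        return "moderate"
    if strength > 2.0:
        return "small"
    return "rarely important"


# (positive LR, negative LR) as reported in the text.
reported_ratios = {"PO": (4.2, 0.29), "BK": (6.5, 0.18), "MBK": (10.2, 0.12)}
ASSUMED_PRETEST = 0.5  # hypothetical pretest probability, not from the study

for test_name, (lr_pos, lr_neg) in reported_ratios.items():
    after_positive = post_test_probability(ASSUMED_PRETEST, lr_pos)
    after_negative = post_test_probability(ASSUMED_PRETEST, lr_neg)
    print(
        f"{test_name}: positive result -> {after_positive:.2f} ({shift_size(lr_pos)} shift), "
        f"negative result -> {after_negative:.2f} ({shift_size(lr_neg)} shift)"
    )
```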


This study described three pain provocation tests used for the diagnosis of PHT and investigated their reliability and validity. The reliability and validity of clinical tests are important when documenting and assessing the outcome of a condition in clinical practice and research. To our knowledge, this is the first study assessing the diagnostic value of these tests, particularly in reference to sports-related chronic PHT.

The principal findings from this study indicate that all three pain provocation tests are relatively easy to perform and display a high (>0.80) to very high (>0.90) interexaminer and intraexaminer reliability.

Additionally, the results presented here suggest that although the results varied depending on the specific pain provocation test, the overall sensitivity, specificity, positive and negative predictive values, and positive and negative likelihood ratios for the three pain provocation tests were high, and the MBK test generally yielded the highest values, which ranged from 89% to 91%.

We also observed that athletes with a positive MBK test were more often also positive on the other tests. Although the mechanism by which the pain provocation tests cause pain in the context of PHT is not clear, our findings suggest that the mechanism of pain causation may be similar for all tests, although the MBK test execution is faster. Further research needs to be undertaken to assess the exact mechanism of pain causation. However, it could be hypothesised that all these tests may cause increased tension across the proximal hamstring tendons and this may give rise to pain in these structures. To explain the lower pain provoked by the PO test, we hypothesised that the active administration of this test can lead to a pain-derived inhibition of test execution.

Overall, our findings indicate that the use of these three clinical tests can help physicians and physiotherapists to formulate a clinical diagnosis of PHT in an athlete. Moreover, when a physician or a physiotherapist suspects a PHT in an athlete and is pressed for time, the MBK should be chosen as the screening test. Nevertheless, the clinical diagnosis of PHT should always be confirmed by an MRI examination. Because MRI provides detailed anatomical information of tendons13 and their pathological changes,3 ,5 ,7 ,9 and is sensitive in depicting the causes for symptoms related to the hamstring tendons,3 ,5 ,7 ,9 we recommend the routine use of MRI to confirm the diagnosis of PHT.

There are some limitations in this study. The specific nature of the clinical conditions in our subjects means that our results cannot be generalised. Although the three pain provocation tests demonstrated acceptable validity index values in this study, it should be borne in mind that other diagnoses that may be responsible for pain at the ischial tuberosity level were not analysed. In particular, these three tests need to be evaluated in athletes with ischiatic bone pathology, ischiogluteal bursitis and quadratus femoris muscle injury. Therefore, the role of MRI is essential in the diagnosis of PHT.

What is already known on this topic

The clinical assessment of chronic proximal hamstring tendinopathy in athletes is a challenge to sports medicine. To be able to compare the results of research and treatments, the methods used to diagnose and evaluate PHT must be clearly defined and reproducible.

What this study adds

This study has evaluated the validity and reliability of three pain provocation tests that can be used in a clinical setting to evaluate chronic PHT in athletes. This study has shown that these three tests are of potential value in assessing chronic PHT in athletes, and if used in conjunction with MRI they allow the physician to make a diagnosis of chronic proximal hamstring tendinopathy in athletes.

The assessment of clinical signs (tenderness) and symptoms (pain) that were used to make a diagnosis of chronic PHT is subjective. However, the athletes in this study were examined before their history was taken, which reduced the inherent bias that arises if the athletes are examined for tenderness and pain when their clinical history is already known.

We decided to use clinical and MRI findings as the gold standard for the diagnosis of chronic PHT in our study, because this closely reflects what clinicians do in daily practice when diagnosing chronic PHT.


Despite the limitations of this study, and bearing in mind that the tests we describe need to be assessed further before they can be routinely adopted in clinical practice, we believe that these three tests represent valid, reliable means of diagnosing PHT, and consequently of helping physiatrists, orthopaedists, physiotherapists and other physicians to identify athletes with this disorder. However, we recommend that they be used in conjunction with other objective measures, such as MRI.


The authors thank Dr Lewis Baker and Dr Emma Marcello for their assistance in the final draft of this manuscript.



  • Competing interests None.

  • Provenance and peer review Not commissioned; externally peer reviewed.
