Abstract
Background Narrowing of the subacromial space has been noted as a common feature of rotator cuff (RC) tendinopathy. It has been implicated in the development of symptoms and forms the basis for some surgical and rehabilitation approaches. Various radiological methods have been used to measure the subacromial space, which is represented by a two-dimensional measurement of acromiohumeral distance (AHD). A reliable method of measurement could be used to assess the impact of rehabilitation or surgical interventions for RC tendinopathy; however, there are no published reviews assessing the reliability of AHD measurement.
Objectives The aim of this review was to systematically assess the evidence for the intrarater and inter-rater reliability of radiological methods of measuring AHD, in order to identify the most reliable method for use in RC tendinopathy.
Study appraisal and synthesis An electronic literature search was carried out and studies describing the reliability of any radiological method of measuring AHD in either healthy or RC tendinopathy groups were included. Eighteen studies met the inclusion criteria and were appraised by two reviewers using the Quality Appraisal for reliability Studies checklist.
Results Eight studies were deemed to be of high methodological quality. Study weaknesses included lack of tester blinding, inadequate description of tester experience, lack of inclusion of symptomatic populations, poor reporting of statistical methods and unclear diagnosis. There was strong evidence for the reliability of ultrasound for measuring AHD, with moderate evidence for MRI and CT measures and conflicting evidence for radiographic methods. Overall, there was lack of research in RC tendinopathy populations, with only six studies including participants with shoulder pain.
Conclusions The results support the reliability of ultrasound and CT or MRI for the measurement of AHD; however, more studies in symptomatic populations are required. The reliability of AHD measurement using radiographs has not been supported by the studies reviewed.
- Shoulder injuries
- Orthopaedics
- Evidence based reviews
Background
Shoulder pain is a common musculoskeletal condition with point prevalence rates ranging between 7% and 26% in adults.1 The most common source of adult shoulder pain is rotator cuff (RC) tendinopathy, which is a multifactorial condition.2 The supraspinatus tendon, which runs in the subacromial space, is most commonly affected by pathological change. Narrowing of the subacromial space has been variously ascribed to: loss of RC function leading to superior migration of the humeral head,3 altered acromial morphology4 or postural alterations.5 This phenomenon of a reduced subacromial space, as well as the proposed resulting impingement of the RC tendons and subacromial bursa, has been widely implicated in the development of degenerative RC pathology and pain in both athletic and non-athletic populations.6
The size of the subacromial space is commonly quantified by the measurement of distance between the acromion and the humeral head, termed the ‘acromiohumeral distance’ (AHD), using a variety of different radiological methods, including radiographs, CT scans, MRI and ultrasound. Studies of AHD in asymptomatic shoulders have reported ranges of AHD from 6 to 12 mm in the neutral position.7–9 This variation may be due to interindividual variability or to the variety of measurement protocols used. Generally, AHD is found to be reduced as the arm moves into abduction up to 90°10 and has been shown to be influenced by muscle contraction11 and by muscle fatigue.12 In symptomatic populations, radiographic studies have suggested an AHD cut-off point of 6–7 mm to indicate the presence of a significant RC tear, with recent work by Goutallier et al13 suggesting that a 6 mm cut-off is indicative of a large tear, not amenable to surgical repair. Other radiological studies have demonstrated that AHD is smaller in patients with RC tendinopathy,14 is positively associated with the size of the RC tear and degree of fatty degeneration of the RC muscles,15 and is a predictor of both short-term disability16 and functional status.17
Surgical interventions, such as acromioplasty, as well as many rehabilitation interventions for RC tendinopathy, are based around attempting to correct or ameliorate a reduced AHD, with the expectation that this will improve shoulder symptoms and function.2 It is therefore important that a reliable method of AHD measurement is identified, in order to confirm the veracity of this hypothesis. Reliability of a measurement relates to the degree to which it is consistent and free from error. There are numerous variables that may influence the reliability of AHD measurement including: the type of imaging used, measurement protocol, patient position, presence and degree of tendinopathy and interexaminer variables. Although a recent review evaluated AHD measurement by ultrasound in RC tendinopathy18 concluding that ultrasound-measured AHD is smaller in individuals with RC tears, no assessment of measurement reliability was carried out. Until now, no reviews have examined the reliability of any other radiological method. There is a need for a systematic review assessing the reliability of AHD measurement, so that a more robust basis for the assessment of AHD in individuals with RC tendinopathy can be recommended. In turn, the contribution of reduced AHD to shoulder pain and loss of function, and the impact of AHD alteration with physical or surgical interventions could be determined. Therefore, the aim of this review was to systematically assess the evidence for the intrarater and inter-rater reliability of radiological methods of measuring AHD in relation to RC tendinopathy.
Methods
Inclusion/exclusion criteria
Studies describing the reliability of a method of measuring AHD by any radiological method (specifically radiographs, MRI/MRA, CT, ultrasound) were the focus of this review. We included studies that reported the collection and analysis of any reliability data, whether or not this was a primary aim. Studies involving human adult populations, either healthy participants or participants with diagnosed RC tendinopathy of any degree, as well as studies including those with RC tendinopathy as a subset of other shoulder pathologies were included. We excluded studies of patients with non-RC shoulder disorders, for example, instability and neurological conditions, as the degree and direction of change in AHD is likely to be different in these populations. We did, however, conduct a sensitivity analysis to assess how many papers of non-RC disorders were excluded and whether this had any influence on the conclusions of this review. We only included studies published in the English language.
Search strategy
The search strategy was developed with the help of a medical librarian and involved searches of the following databases, from inception until June 2012: PubMed, CINAHL, MEDLINE, AMED, Sport Discus (using a combined search on the EBSCO database); Google Scholar; ProQuest digital dissertations; Cochrane Central Register of Controlled Trials (CENTRAL) and the Physiotherapy Evidence Database (PEDro). Searches were conducted using the search terms and combinations illustrated in figure 1.
The names of the radiological methods (eg, CT and MRI) were not used in the final search, as this may have restricted the number of papers identified. We also did not include terms related to reliability, as our aim was to also include papers where reliability analysis was conducted as a pilot or secondary aspect of the study. The search syntax was modified to match that in use in each of the databases.
The reference list of each relevant full-text article was reviewed to identify any potential additional references, as well as that of the single relevant systematic review identified. Initial screening of articles by title and abstract to remove clearly unrelated titles was conducted by a single examiner. All identified references were examined independently by two reviewers (KM and JL) by title and abstract in relation to the inclusion and exclusion criteria. Potentially relevant articles were then obtained in full-text format. Two reviewers applied the selection criteria to the full text articles to determine the final ones to be included. A third examiner (JC) was available to resolve disagreement but was not required. A citation search for all included studies was carried out, but no further relevant studies were identified.
Quality assessment and data extraction
Quality assessment of the included studies was completed using the Quality Appraisal for Reliability Studies (QAREL) checklist.19 As recommended by Lucas et al,19 piloting of the checklist was carried out as follows: a single study (one of those included in the review) was jointly assessed by the examiners, with discussions and agreement as to how each item was to be defined in relation to this review. Then two studies excluded from this review (as they examined AHD measurement in neurological populations) were independently assessed with the checklist by two examiners (KM and JL). Agreement on the first study was 72%. Further discussions followed, and subsequently agreement on the second study was 100%.
QAREL checklist items are described in box 1. It was deemed that item five of the checklist was not applicable to this review, as there is currently no accepted, definitive reference standard for the measurement of AHD. In relation to the final item, regarding statistical measures, we required that studies using an intraclass correlation coefficient (ICC) report the model of ICC being used, and also that estimates of precision be presented, in order to achieve a ‘Yes’ score for this item. Two reviewers independently assessed all included studies. Studies were deemed to be of high quality if at least 50% of applicable items were rated as ‘Yes’ on the checklist. Data were extracted from the studies using the QAREL extraction form. The appropriateness of the radiological protocols used in the included studies was separately assessed by an experienced musculoskeletal radiologist (JC). For the purposes of this review, the reliability estimates from ICCs were categorised as suggested by Fleiss,20 that is, >0.75=excellent reliability, 0.40–0.75=fair to good reliability, and <0.40=poor reliability. Owing to the heterogeneous nature of the methods and populations studied, pooling of data was not deemed appropriate in this review. Instead, a ‘levels of evidence’ approach was taken, using a modified version of the Cochrane Back Pain Group criteria,21 that is:
- Strong evidence: consistent findings in multiple high quality studies;
- Moderate evidence: consistent findings in one high quality and one or more lower quality studies;
- Limited evidence: consistent findings in one or more lower quality studies;
- No evidence: if there were no studies or conflicting results.
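The appraisal rules described above can be sketched in a few lines of Python. This is only an illustrative reading of the stated criteria, not the procedure the reviewers actually ran: the function names, the use of `None` for a non-applicable checklist item, and the `"unclassified"` fallback (for a single high quality study with no lower quality studies, a case the listed criteria do not cover) are all assumptions made for the example.

```python
from typing import List, Optional


def is_high_quality(item_ratings: List[Optional[str]]) -> bool:
    """Apply the review's threshold: at least 50% of applicable items rated 'Yes'.

    Ratings are 'Yes', 'No' or 'Unclear'; None marks a non-applicable item
    (an assumption of this sketch).
    """
    applicable = [r for r in item_ratings if r is not None]
    if not applicable:
        return False
    yes_count = sum(1 for r in applicable if r == "Yes")
    return yes_count / len(applicable) >= 0.5


def fleiss_category(icc: float) -> str:
    """Categorise an ICC using the Fleiss cut-offs quoted in the text."""
    if icc > 0.75:
        return "excellent"
    if icc >= 0.40:
        return "fair to good"
    return "poor"


def level_of_evidence(n_high: int, n_low: int, consistent: bool) -> str:
    """Map study counts and consistency of findings to a level of evidence."""
    if n_high + n_low == 0 or not consistent:
        return "no evidence"
    if n_high >= 2:
        return "strong"
    if n_high == 1 and n_low >= 1:
        return "moderate"
    if n_low >= 1:
        return "limited"
    # One high quality study alone is not covered by the criteria as listed.
    return "unclassified"
```

For instance, under these assumptions `fleiss_category(0.70)` falls in the fair to good band, and one high quality plus two lower quality studies with consistent findings map to moderate evidence.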
Results
The literature search retrieved a total of 2115 citations, from which 2073 non-relevant titles or duplicates were removed in the initial screening by a single examiner (see figure 2). Two reviewers then assessed the remaining 42 by title and abstract, narrowing these to 21 based on the specified inclusion and exclusion criteria. Following examination of the full texts, a further two papers were obtained from the reference lists, giving 23. Five of these were excluded: three contained reliability coefficients only, without details of the method of reliability assessment, and two did not present the AHD measures separately, but only as part of a ratio measure. The remaining 18 were included in the review. In the sensitivity analysis, two papers evaluating AHD measurement reliability in non-RC shoulder disorders were identified. Both were carried out in populations with hemiplegic shoulders, and both concluded that ultrasound was a highly reliable method of measurement, aligning closely with the conclusions of the papers included in this review.
Quality assessment
Agreement for QAREL items between the two assessors after independent assessment of the 18 studies was 88%, with an average Cohen's κ value of 0.95, demonstrating an excellent level of inter-rater agreement.22 Joint discussions resolved remaining areas of disagreement, and the final ratings are displayed in table 1. Of the 18 studies assessed with the QAREL checklist, 8 were deemed to be of high quality. There was limited detail describing the blinding of examiners, and randomisation of testing procedures, resulting in the majority of studies being rated ‘Unclear’ on items 6, 7 and 8. A high proportion of incorrect statistical analysis, or inadequate information, also led to item 11 being rated as ‘No’ or ‘Unclear’ for most studies. Six studies gave no information regarding training or experience of the testers, which left the rating for item 2 ‘Unclear’ for these studies. The decision to include studies which reported reliability data as a secondary aspect only presented a possible risk of increasing the proportion of lower quality studies. However, there was in fact a similar proportion of high-quality studies among both the primary (4/11) and secondary (3/7) reliability studies examined. There was also no difference in the proportion of high-quality studies according to radiological modality.
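As an illustration of the two agreement statistics mentioned above, the following sketch computes percentage agreement and Cohen's κ for one set of paired categorical ratings. The function name and the example ratings are hypothetical, and the review does not describe how κ values were averaged, so this shows only the standard single-table calculation.

```python
from collections import Counter
from typing import Sequence, Tuple


def agreement_and_kappa(rater_a: Sequence[str],
                        rater_b: Sequence[str]) -> Tuple[float, float]:
    """Return (observed agreement, Cohen's kappa) for two raters' ratings."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("Both raters must rate the same non-empty set of items")
    n = len(rater_a)
    # Proportion of items on which the two raters gave the same rating.
    observed = sum(1 for a, b in zip(rater_a, rater_b) if a == b) / n
    # Agreement expected by chance, from each rater's marginal frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        # Both raters used a single identical category; kappa is undefined,
        # and this sketch reports perfect agreement by convention.
        return observed, 1.0
    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa


# Hypothetical ratings of six checklist items by two assessors.
ratings_a = ["Yes", "Yes", "No", "Unclear", "Yes", "No"]
ratings_b = ["Yes", "Yes", "No", "Yes", "Yes", "No"]
observed_agreement, kappa_value = agreement_and_kappa(ratings_a, ratings_b)
```

With these made-up ratings the observed agreement is 5/6 and κ is approximately 0.70, showing how κ discounts the agreement that would be expected by chance.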
Box 1: QAREL checklist items
1. Was the test evaluated in a sample of subjects who were representative of those to whom the authors intended the results to be applied?
2. Was the test performed by raters who were representative of those to whom the authors intended the results to be applied?
3. Were raters blinded to the findings of other raters during the study? (Inter-rater studies only)
4. Were raters blinded to their own prior findings of the test under evaluation? (Intra-rater studies only)
5. Were raters blinded to the subjects' disease status or the results of the accepted reference standard for the target disorder (or variable) being evaluated? (Excluded in this review)
6. Were raters blinded to clinical information that was not intended to form part of the study design or testing procedure?
7. Were raters blinded to additional cues that are not part of the test?
8. Was the order of examination varied?
9. Was the stability (or theoretical stability) of the variable being measured taken into account when determining the suitability of the time interval among repeated measures?
10. Was the test applied correctly and interpreted appropriately?
11. Were appropriate statistical measures of agreement used?
Types of study
The details of the studies included in this review in terms of study type, population, testers, methods, reliability data and mean AHD values reported, divided up according to the radiological modality used, are summarised in online supplementary tables S3–S5. Over half of the studies (10/18) employed ultrasound to assess AHD, with four studies using radiographs only, two employing MRI only and two using combined methods (MRI and radiographs, or MRI, radiographs and CT). Eight studies assessed intra-rater reliability only; these were predominantly ultrasound studies (7/8). Inter-rater reliability, or both types of reliability, was assessed in five studies each. The majority of studies (12/18) assessed reliability of AHD measurement in healthy or athletic populations, with six investigating people with shoulder pain.
Ultrasound studies
Of the 10 studies using ultrasound to measure AHD,23–32 7 assessed the reliability of a single examiner (intra-rater), while one investigated the reliability of 2 or more examiners (inter-rater) and 2 studied both types of reliability. Two studies included reliability data on participants with shoulder pain. Kalra et al25 studied intra-rater reliability only in 31 participants with MRI-diagnosed RC disease (mean age: 53.5 years), and Pijls et al30 studied both inter-rater and intra-rater reliability in 43 people described as having subacromial impingement syndrome (mean age: 51 years) as diagnosed by an orthopaedic surgeon, without giving any details of how diagnosis was determined. The remaining eight studies assessed reliability in pain-free participants. Mean participant age in the pain-free study groups ranged between 21 and 34 years, with a single study by Kumar et al26 involving older participants (mean age 64.2 years). Five studies were deemed to be of high methodological quality (table 1), and each of these reported a good or excellent level of either inter-rater and/or intra-rater reliability.
The testers undertaking the ultrasound scanning were physiotherapists in seven of the studies,7–9 ,24 ,26 ,31 ,32 variously described as having training ranging from 1 hour to 3 months, and radiologists in two of the studies,23 ,30 with one study not providing details of the raters.25 One study reported similar degrees of reliability between an experienced radiologist (ICC=0.94) and a novice in ultrasound (ICC=0.92),30 while in two studies by the same authors, one using a physiotherapist trained in shoulder ultrasound, and the other using student physiotherapists with limited training or experience, slightly better reliability was reported with the experienced examiner (ICC=0.96–0.99)26 compared with the novices (ICC=0.88–0.91).27
All studies reported the reliability of measuring separate images of the same participant, while two also studied the intra-rater reliability of repeated measurements of the same image.30 ,32 There were varied time intervals used for the repeated scans, ranging from within-session scans to others taken up to 2 weeks26 or 6 weeks later.29 While various factors may influence the normal variation in AHD over time (such as posture, fatigue, activity), no studies described controlling for these factors.
All studies used a high frequency linear transducer (between 5 and 12.5 MHz) to acquire the ultrasound scans. There was variation in transducer placement, with two studies placing it on the anterior part of the acromion,23 ,31 while others used the posterior or mid-acromion25 or did not give adequate details of the testing protocol. There were also differences in how AHD was measured between studies. Six studies described that the measurement of the shortest distance between the acromion and humeral head was assessed, usually along a line parallel to the acoustic shadow cast by the acromion.23 ,25 ,29–32 In contrast, three studies measured the distance between the edge of the acromion and the tip of the greater tuberosity,7 ,8 ,26 which anatomically is a longer distance. Duerr24 reported equal reliability measuring both these distances. As the greater tuberosity cannot be visualised when the arm is moved into abduction, the author recommended the alternative measurement as the standard. The neutral shoulder position was used in all studies, while additional scans in various positions of either active or passive abduction (30°, 45°, 60° and 90°) were carried out by seven studies (see online supplementary table S3).
Overall, there was a strong level of evidence for the reliability of ultrasound in the measurement of AHD. Intra-rater reliability was found to be good to excellent with almost all ICC values being above 0.75. However inter-rater reliability was poorer, with the single high quality inter-rater study30 reporting an inter-rater ICC of just 0.70. Since study methods were similar across the ultrasound studies, forest plots were constructed to illustrate the range of ICC values reported (figures 3 and 4). The SE of measurement values for AHD were more variable; they were below 1 mm in the high quality studies by Duerr24 and Seitz et al31 and in a number of lower quality studies;26 ,28 ,29 however, Kalra et al25 reported higher SEM values of 0.9–1.6 mm in their high quality study. Three studies reported the minimal detectable change (MDC) for AHD measurement, which is an important concept representing the amount of change required to exceed measurement variability. The reported MDC values in neutral shoulder position for AHD were 0.9 mm,24 1.3 mm25 and 2.1 mm for acromion to greater tuberosity distance.28
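To make the relationship between ICC, SEM and MDC concrete, the sketch below uses the commonly cited definitions SEM = SD × √(1 − ICC) and MDC95 = 1.96 × √2 × SEM. The included studies may have used other variants (for example, a different confidence level), so these formulas and the function names are assumptions of the example rather than a description of any one study's method.

```python
from math import sqrt


def sem_from_icc(sd: float, icc: float) -> float:
    """Standard error of measurement from the sample SD and a reliability coefficient."""
    if not 0.0 <= icc <= 1.0:
        raise ValueError("ICC is expected to lie between 0 and 1 for this formula")
    return sd * sqrt(1.0 - icc)


def minimal_detectable_change(sem: float, z: float = 1.96) -> float:
    """Minimal detectable change; z = 1.96 gives the 95% confidence version.

    The sqrt(2) term accounts for measurement error in both the first and
    the repeated measurement.
    """
    return z * sqrt(2.0) * sem


# Hypothetical values: between-subject SD of 2.0 mm and ICC of 0.75.
example_sem = sem_from_icc(sd=2.0, icc=0.75)          # 1.0 mm
example_mdc = minimal_detectable_change(example_sem)  # about 2.8 mm
```

Under these definitions an SEM just below 1 mm still implies an MDC95 of well over 2 mm, so a reported SEM and MDC should be read together with the formula and confidence level the authors state.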
Radiograph studies
Of the six studies assessing the reliability of using radiographs to measure AHD, four included people with shoulder pain15 ,33–35 (two of which included people with confirmed RC disease15 ,35) and two used only pain-free participants.36 ,37 The mean age of the shoulder pain participants ranged from 55 to 59 years, while the pain-free groups were in the 20–35 year age range (see online supplementary table S4). Two studies assessed intra-rater reliability only36 ,37 and two assessed inter-rater reliability only,15 ,35 while the remaining two studied both types of reliability.33 ,34 Three of the radiograph studies were deemed to be of high methodological quality.15 ,34 ,36
Three studies examined the reliability of radiographs alone in measuring AHD, one examined digital fluoroscopy37 and two studied radiographs along with other modalities.15 ,35 Three of the radiograph studies used standardised views,15 ,34 ,35 while Bernhardt et al33 examined non-standardised films from various clinics, and Fehringer et al36 studied the effect of differing beam angles. The majority of the studies examined the reliability of reading a single set of radiographs, whereas the Thompson et al37 and Fehringer et al36 studies measured the reliability of reading repeated films of the same participant, with each of the radiographs taken at a different angle or arm position in the Fehringer et al36 study. Poor reliability was reported for the measurement of these different views. In the studies by Bernhardt et al33 and Gruber et al,34 it was clear that reliability was better when standardised radiographs, prospectively collected for the study, were used (intra-rater: maximum difference=3 mm) than when non-standardised radiographs were examined retrospectively (intra-rater: maximum difference=7 mm). In one of the high quality studies, Saupe et al15 examined the inter-rater reliability of measuring AHD on standard anteroposterior radiographs between an experienced and a non-experienced radiologist, and reported excellent reliability (ICC=0.77), similar to the findings of the poorer quality Werner et al35 study, where four observers also achieved excellent intertester reliability examining AHD on 40 radiographs. Thompson et al37 reported good intra-rater reliability of measuring digital fluoroscopy images taken immediately in succession (ICC=0.75–0.99), with poorer reliability for those taken 9 months apart (ICC=0.3–0.99). It was not possible to undertake any direct comparisons of the reliability data between these studies, as a wide variety of statistical methods of reporting were used.
However, overall, owing to the use of non-standardised imaging, or poor reporting of statistical analysis, the evidence was conflicting for the reliability of AHD measurement using radiographs, which, according to the Cochrane criteria, equates to no evidence.
CT and MRI studies
Two studies examined the measurement of AHD using open MRI systems,38 ,39 while one used conventional MRI15 and another used both MRI and CT scans.35 A single study by Saupe et al15 was rated as being of high quality.
The method of measurement was similar across all studies using CT or MRI, using the shortest distance between the inferior surface of the acromion and the upper subchondral surface of the humeral head (see online supplementary table S5). Two studies examined inter-rater reliability.15 ,35 Two also examined intra-rater reliability, with Hinterwimmer et al38 achieving this by repeating the scans within a single session, while Moffet et al39 did so by rereading the same scans 1 month later.
For the open MRI studies, only pain-free participants were used. Hinterwimmer et al38 used a single healthy volunteer for inter-rater reliability assessment and reported low coefficients of variation, suggesting reasonable accuracy of measurement, while Moffet et al39 assessed both inter-rater and intra-rater reliability in 13 pain-free participants, and reported excellent reliability, with ICCs all >0.75. People with shoulder pain participated in the studies by Werner et al35 assessing the reliability of AHD measures with MRI, CT and radiographs, and Saupe et al15 using MRI and radiographs. MRI and CT scans were shown to be similarly reliable to radiographs in the Werner et al35 study, while MRI had better reliability than radiographs in the Saupe et al15 study. Since Werner et al35 reported their reliability statistics using regression analysis (r=0.8 for CT and MRI), and Saupe et al15 reported an ICC value (0.91), it is difficult to make direct comparisons of the degree of reliability; however, it appears that good levels of inter-rater reliability were achieved in both studies. Overall, there was a moderate level of evidence for the reliability of AHD measurement using CT and MRI (based on results of one high and two lower quality studies).
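One reason a correlation coefficient and an ICC are hard to compare directly is that Pearson's r ignores systematic differences between raters, whereas an absolute-agreement ICC penalises them. The sketch below illustrates this with made-up AHD readings in which the second rater is always 2 mm higher. The data are hypothetical, and the choice of the two-way random, single-measure, absolute-agreement form (often written ICC(2,1)) is an assumption for illustration; the text above does not state which ICC model Saupe et al used.

```python
from math import sqrt
from typing import List, Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation between two sets of readings."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def icc_2_1(scores: List[List[float]]) -> float:
    """Two-way random, single-measure, absolute-agreement ICC.

    `scores` has one row per participant and one column per rater.
    """
    n, k = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in scores for v in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical AHD readings (mm): rater B reads exactly 2 mm above rater A.
rater_a = [8.0, 9.0, 10.0, 11.0, 12.0]
rater_b = [10.0, 11.0, 12.0, 13.0, 14.0]
r_value = pearson_r(rater_a, rater_b)
icc_value = icc_2_1([[a, b] for a, b in zip(rater_a, rater_b)])
```

In this constructed case the correlation is perfect while the absolute-agreement ICC is only about 0.56, so an r of 0.8 and an ICC of 0.91 from different studies cannot be ranked against each other without knowing how each was calculated.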
Discussion
This review evaluated the reliability of AHD measurement using radiological means. The majority of reliability studies assessed ultrasound methods. Study quality, as assessed by the QAREL checklist, was generally poor with less than half of the studies deemed to be of high quality. Major weaknesses of the studies reviewed were in the areas of testers blinding to their own and others’ measures, as well as to additional cues, such as side of symptoms, while undertaking imaging and/or measurement. While the overall levels of reliability were good to excellent across the studies, there was more high-quality evidence for the reliability of ultrasound as a method of AHD measurement than for other modalities.
When assessing the reliability of imaging-based assessments, two distinct aspects of reliability exist. One is the reliability of measuring the image itself, which incorporates any variability associated with localising anatomical landmarks, how measurements are made, and measurement error, and which is assessed by carrying out repeated measurements of the same image. The other is the reliability of taking repeated images of the same participant, which encompasses many variables, such as positioning of the participant, setting of imaging parameters, operator-related variability and machine calibration. The first type is likely to yield better reliability coefficients, with less potential for variation. The second type is more challenging, potentially yielding poorer reliability; however, it is important in the context of test–retest studies. All the ultrasound studies assessing intra-rater reliability assessed the measurement of repeated images, while some also undertook remeasurement of the same images. However, the majority of the other imaging studies remeasured the same set of images, which may have led to overinflation of reliability levels. Thompson et al37 and Hinterwimmer et al38 carried out repeated imaging (in digital fluoroscopy and open MRI, respectively); however, this was in a very small number (N=1 and 5) of pain-free participants. It is accepted, however, that repeated radiation exposure may make this type of study ethically unacceptable for radiograph and CT studies.
Item 9 in the QAREL checklist (box 1) emphasises the importance of taking into account the stability of the measure when determining the time scale for repeated measures.19 The allowance of a significant time lapse between testing sessions, without indicating how possible confounding variables such as fatigue and posture have been controlled for, threatens the internal validity of the observations being made. In contrast, the studies in this review using time intervals of a few days up to 6 weeks in pain-free participants reported excellent reproducibility of the AHD measures, despite little description of controlling for confounding variables, suggesting that the measurement is reasonably stable over this time period. However, only within-session reliability of AHD measurement was available in this review for shoulder pain populations, and therefore the stability of the measure in RC pathology over time is unknown.
The widespread use of pain-free populations for reliability testing reduces the external validity of the findings, as there may be significant differences in the degree of reliability achieved in pain-free versus shoulder pain populations due to greater challenges of positioning the painful arm and the potential influence of pathology on image quality. One-third of the included studies involved people with shoulder pain, and one study29 included an athletic population. Within the pain-free populations, the age range of participants tended to be much lower than the typical population age range for RC disorders. The studies by Kumar et al26 ,27 demonstrated the importance of including the relevant age groups as controls, due to the lower AHD values in the older age group. In the studies of shoulder pain groups, Kalra et al25 and Saupe et al15 confirmed the RC pathology using imaging, with Pijls et al30 using a clinical diagnosis and Werner et al35 using unspecified diagnostic criteria, and a non-specific shoulder pain group being used in the remaining studies.33 ,34 Unfortunately, the single study to include both shoulder pain participants and pain-free controls did not separately report the reliability for the two groups.25 Further information is also required concerning AHD reliability in athletic versus non-athletic populations.
The issue of tester experience and qualifications is important in any imaging-based study. Ultrasound, in particular, is said to be highly operator-dependent.40 Two of the ultrasound studies stated that radiologists conducted the scans, while the remaining majority stated that the operators were physiotherapists with varying, but generally poorly described, levels of training and experience in shoulder ultrasound imaging. An experienced and novice ultrasound operator achieved similarly excellent reliability in the Pijls et al30 study, while the two studies by Kumar26 ,27 illustrate better reliability for an operator with moderate levels of training versus physiotherapy students (although both studies achieved ICC>0.75). However, these studies suggest that reliable AHD measures can be achieved with a limited level of ultrasound training, in contrast to the higher level of training and experience needed to accurately undertake full diagnostic assessment of the shoulder. Based on the MDC values provided, a single tester can achieve an accuracy level of between 0.9 and 1.3 mm for ultrasound measurement of AHD in neutral, so that any change beyond this can be accepted as true change. No MDC was reported in the intertester reliability studies.
The standardisation of imaging protocols for radiographs, CT and MRI is an important element in optimising reliability, as clearly evidenced in the Bernhardt et al33 and Fehringer et al36 studies, where the use of non-standardised radiographs was shown to negatively influence reliability. As previously discussed in the review by Seitz and Michener,18 different landmarks were used for the measurement of AHD in some of the ultrasound reliability studies. Most authors measured the shortest distance between the inferolateral acromion and the closest part of the humeral head; however, others chose to measure AHD from the acromion to the tip of the greater tuberosity,26–28 while Duerr24 carried out both measures. The shortest distance measurement is most closely aligned with the measurement protocols used in radiograph, MRI and CT studies, and therefore it is suggested that this is the most useful measurement to report. Overall, it is recommended that clearer descriptions of tester experience and standardisation of imaging procedures be provided in imaging reliability studies to allow for better extrapolation of findings across studies.
While the reliability of a measurement is critical to inform its use, since it refers to how consistent a measuring device is, it is equally important to evaluate the validity of the measurement, that is, whether it measures what it claims to measure. As the subacromial space is a three-dimensional space, there is an inherent problem in that conventional radiographic imaging merely measures in two dimensions. There is no evidence or general agreement as to which modality provides a ‘gold standard’ for AHD measurement, which led to the exclusion of item 5 of the QAREL checklist for this review. With ultrasound, it is not possible to view the undersurface of the acromion due to the acoustic shadow produced by the bone; therefore, the area of the smallest AHD may not be viewed or measured accurately. While radiographs provide a clearer view of the bony structures, projection issues and bony overlap may lead to measurement inaccuracies. Both ultrasound and radiographs are conducted in the functional upright position, adding face validity to these measures; however, standard MRI and CT are carried out with the patient in the supine position, where the absence of arm weight, gravitational loading and muscle activity may lead to lower AHD values being measured. This was observed in the studies by Werner et al35 and Saupe et al,15 where intermethod comparisons were carried out between radiographs and CT or MRI. Saupe et al15 reported a poor correlation between AHD measured on radiographs and MRI, with AHD values being 2.8 mm less on average for MRI. Similar differences between radiographs and MRI were noted by Werner et al,35 who used linear regression to provide a conversion formula. While a full discussion of the issues relating to validity of AHD measurement is beyond the scope of this review, it is important that validity is given further consideration before AHD measures are more widely used in diagnosis or treatment.
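The kind of intermethod comparison described above can be sketched as follows: a mean difference between paired measurements, plus an ordinary least-squares line that converts one modality's value into the other's. This is a minimal sketch only. The paired values are hypothetical, the function names are illustrative, and the fitted coefficients are not the conversion formula published by Werner et al.

```python
def mean_difference(a, b):
    """Mean of the paired differences a[i] - b[i]."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(ai - bi for ai, bi in zip(a, b)) / len(a)


def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical paired AHD values in mm, for illustration only.
mri_mm = [6.0, 7.0, 8.0, 9.0, 10.0]
radiograph_mm = [9.1, 9.6, 11.2, 11.5, 13.0]

bias = mean_difference(radiograph_mm, mri_mm)
slope, intercept = fit_line(mri_mm, radiograph_mm)

print(f"Mean radiograph - MRI difference: {bias:.2f} mm")
print(f"radiograph_AHD ~ {slope:.2f} * MRI_AHD + {intercept:.2f}")
```

A conversion line of this kind only describes the average relationship in the sample it was fitted on; where the correlation between modalities is poor, as Saupe et al reported, individual converted values would carry considerable uncertainty.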
In summary, ultrasound measurement of AHD demonstrated sufficient intra-rater reliability in healthy populations and in two cohorts of patients with RC tendinopathy to be the recommended method of AHD measurement. However, because low ICC values were reported in the single high quality intertester study,30 an additional study of people with imaging-confirmed RC pathology is required to ascertain the intertester reliability in this population. The evidence for reliability of AHD measured by CT and MRI was moderate, with a number of studies demonstrating good to excellent reliability, generally derived from a re-reading of a single set of images. The evidence for radiographs was conflicting, with the use of non-standardised images making comparisons difficult. As the cheapest and most accessible method, with no radiation exposure concerns, and where excellent reliability can be achieved with limited training, ultrasound is the recommended method of AHD measurement.
Limitations
This review was comprehensive, including a variety of published sources, for example, peer-reviewed publications and theses; however, we did not search extensively for grey literature, which may have limited the number of studies. We only included English-language papers; however, no relevant papers were excluded due to the language restriction. While a lack of information in the published papers caused some studies to be rated as ‘Unclear’ on a number of QAREL items, we did not contact authors for further information, as it was deemed that such a process might be subject to excessive recall bias. The use of the QAREL checklist in this review provided a standardised method of quality assessment.19 The piloting process was important in the resulting high level of agreement reached between reviewers in this review. However, the QAREL checklist is a relatively new quality assessment tool, and as yet no published studies have reported on its reliability or validity for use in systematic reviews of diagnostic tests. Further testing of its psychometric properties is required before it can be broadly recommended for use. The extent of conclusions that could be reached was limited by the generally low quality of the included studies.
Conclusion
This review found that intra-rater reliability of AHD measurement by ultrasound is well supported in healthy populations, while also highlighting the scarcity of high-quality studies in people with RC pathology and of intertester reliability studies. There was moderate evidence for the reliability of AHD measurement with CT and MRI, while the evidence for the reliability of radiographic methods was conflicting. Based on the evidence reviewed, ultrasound is the authors’ recommended method of AHD measurement; however, further data on inter-rater reliability in symptomatic populations are required. With regard to MRI and CT, improved standardisation of methods and assessing the reliability of both repeated imaging and image remeasurement should be considered to provide a solid basis for using these methods to measure AHD. At present, radiographs are not recommended for AHD measurement, as the studies reviewed do not support their reliability.
What this study adds
- This is the first systematic review of the reliability of radiological methods of acromiohumeral distance (AHD) measurement.
- The measurement of AHD using ultrasound is highly reliable for a single tester, while inter-rater reliability requires further investigation.
- While there is moderate evidence to support the reliability of AHD measurement using CT and MRI, the reliability of radiographic methods has not been substantiated.
Supplementary materials
Supplementary Data
This web only file has been produced by the BMJ Publishing Group from an electronic file supplied by the author(s) and has not been edited for content.
Files in this Data Supplement:
- Data supplement 1 - Online tables
Footnotes
- Contributors KMMC and JSL were involved in conception and design, independently reviewed the literature and extracted the study data. KMMC and JSL were involved in rating the literature, with JMC acting to mediate disagreements in ratings. All authors were involved in data analysis and interpretation, as well as in preparing the manuscript for publication.
- Funding KMMC is funded under a Research Fellowship from the Health Research Board of Ireland.
- Competing interests None.
- Provenance and peer review Not commissioned; externally peer reviewed.