

Body composition in sport: interobserver reliability of a novel ultrasound measure of subcutaneous fat tissue
  1. Wolfram Müller1,
  2. Martin Horn1,
  3. Alfred Fürhapter-Rieger1,
  4. Philipp Kainz1,
  5. Julia M Kröpfl1,
  6. Timothy R Ackland2,
  7. Timothy G Lohman3,
  8. Ronald J Maughan4,
  9. Nanna L Meyer5,
  10. Jorunn Sundgot-Borgen6,
  11. Arthur D Stewart7,
  12. Helmut Ahammer1
  1. 1Medical University of Graz, Institute of Biophysics, Graz, Austria
  2. 2University of Western Australia, Perth, Australia
  3. 3University of Arizona, Tucson, USA
  4. 4Loughborough University, School of Sport and Exercise Sciences, Loughborough, UK
  5. 5University of Colorado and United States Olympic Committee, Colorado Springs, USA
  6. 6NIH, The Norwegian School of Sport Sciences, Oslo, Norway
  7. 7Robert Gordon University, Aberdeen, UK
  1. Correspondence to Professor Wolfram Müller, Medical University of Graz, Institute of Biophysics, Harrachgasse 21/4, Graz 8010, Austria; wolfram.mueller{at}


Background Very low body mass, extreme mass changes, and extremely low per cent body fat are becoming increasingly common in many sports, but sufficiently reliable and accurate field methods for body composition assessment in athletes are missing.

Methods Nineteen female athletes were investigated (mean (SD) age: 19.5 (±3.3) years; body mass: 59.6 (±7.6) kg; height: 1.674 (±0.056) m; BMI: 21.3 (±2.3) kg/m2). Three observers applied diagnostic B-mode ultrasound (US) combined with evaluation software for subcutaneous adipose tissue measurements at eight ISAK sites (International Society for the Advancement of Kinanthropometry). Regression and reliability analyses are presented.

Results US measurements and evaluation of subcutaneous adipose tissue (SAT) thicknesses (including fibrous structures: Dincluded; n=378) resulted in an SE of estimate SEE=0.60 mm, R2=0.98 (p<0.001), limit of agreement LOA=1.18 mm, ICC=0.968 (0.957–0.977). Similar values were found for Dexcluded: SEE=0.68 mm, R2=0.97 (p<0.001). Dincluded at individual ISAK sites: at biceps, R2=0.87 and the intraclass correlation coefficient ICC=0.811 were lowest and SEE=0.79 mm was highest. Values at all other sites ranged from R2: 0.94–0.99, SEE: 0.42–0.65 mm, and ICC: 0.917–0.985. Interobserver coefficients ranged from 0.92 to 0.99, except for biceps (0.74, 0.83 and 0.87). Evaluations of 20 randomly selected US images by three observers (Dincluded) resulted in: SEE=0.15 mm, R2=0.998 (p<0.001), ICC=0.997 (0.993–0.999).

Conclusions Subject to optimal choice of sites and certain standardisations, US can offer a highly reliable field method for measurement of uncompressed thickness of the SAT. High accuracy and high reliability of measurement, as obtained with this US approach, are essential for protection of the athlete’s health and also for optimising performance.

  • Body composition methodology
  • Weight loss
  • Assessing validity and reliability of test of physiological parameters
  • Elite performance
  • Ultrasound

Introduction


In most sports, optimum performance depends on many variables and parameters. In weight-sensitive sports, low weight is one of them, but loss of tissue can cause disastrous performance setbacks and severe illness.1–6 Both better protection of the athlete’s health7 and improved support of performance depend on the availability of accurate and valid methods for the assessment of body composition.8,9 Ultrasound (US) is widely used in medical imaging but only rarely for the determination of fat,10 although it is one of the more promising techniques for assessing subcutaneous body fat when combined with appropriate image evaluation. Data from a previous study comparing US measurement of subcutaneous adipose tissue (SAT) with skinfolds11 are used here for the analysis of interobserver reliability.

Methods



Participants were the same group of athletes as investigated in part I of these interconnected publications11: 11 female division two football players (F) and eight international and national level rhythmic gymnasts (G). Age and anthropometric data of the test persons are presented in table 1. Permission to undertake the study was provided by the Ethics Commission of the Medical University of Graz (20-295ex08/09). All athletes received an information letter and completed a written consent form; the athletes and their coaches also had the opportunity to discuss the methods and aims of the study with the investigators. For participants younger than 16 years, written parental consent was required.

Table 1

Test persons (athletes)


Three observers took US images of the 19 athletes at each of the eight ISAK sites (protocol of the International Society for the Advancement of Kinanthropometry),12–14 which had been marked on the skin by the first observer. Observers 1 and 2 (both ISAK level 1 certified)14 measured both skinfolds and SAT thicknesses by means of US. All three observers were briefly instructed on how to handle the US equipment for taking US images at the ISAK points. None of the observers had prior experience in US imaging. Observers evaluated their US images several weeks later, after a 2 h training session on handling the US measurement software; another 3 weeks later, observers evaluated 20 randomly selected US images again (out of the pool of 378 images taken by the three observers) to determine interobserver reliability of image evaluation using the region growing algorithm.

Measurement sites: criteria and decisions for acceptability of valid US measures

The ISAK protocol defines eight sites: triceps, subscapular, biceps, iliac crest, supraspinale, abdominal, front thigh and medial calf.12–14 From a total of 456 US images taken by the three observers at 152 sites (19 athletes, 8 ISAK sites), 405 images (89%) were evaluated in which observers were of the opinion that SAT boundaries (skin/SAT and SAT/fascia of the muscle) were imaged with sufficient quality. At five sites (of 152), none of the three images taken by the (inexperienced) US investigators was clear enough, at seven sites only one was clear, at 22 sites two and at 118 sites all three. Only those sites at which at least two US measurements were made are included in figure 1A–D: this amounts to n=398 in figure 1A; n=378 in figure 1B and 1C (which results from eliminating 19 wrong values due to 9 unclear images and 10 misinterpretations of Camper’s fascia; in addition, a 20th value had to be cancelled due to just a single measurement remaining at the measurement site after elimination of the 19 wrong images). In figure 1D, n was 213. Here, only results from the ISAK sites triceps, biceps, front thigh and medial calf are used. Wrong measurement was not detected at these four sites and only 13 images (of 456), that is, 2.9%, were not suitable for evaluation (triceps: 3, biceps: 4, front thigh: 6 and medial calf: 0). The other four sites accounted for 38 wrong measurements, that is, 8.3% of the images were unclear and were, therefore, eliminated from further evaluation (iliac crest: 13, supraspinale: 5, abdomen: 15, subscapular: 5).
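The image counts in this paragraph can be cross-checked with a few lines of arithmetic. The following is a minimal sketch that uses only the numbers stated above; the variable names are illustrative and not part of the study.

```python
# Bookkeeping for the 152 measurement sites (19 athletes x 8 ISAK sites).
# Key: number of evaluable images at a site; value: number of such sites (from the text).
sites_by_clear_images = {0: 5, 1: 7, 2: 22, 3: 118}

total_sites = sum(sites_by_clear_images.values())
total_images = 3 * total_sites  # one image per observer and site
evaluated = sum(k * n for k, n in sites_by_clear_images.items())

# Only sites with at least two evaluable images enter figure 1A.
n_fig_1a = sum(k * n for k, n in sites_by_clear_images.items() if k >= 2)

# Figure 1B and 1C: 19 erroneous values removed, plus one value left alone at its site.
n_fig_1b = n_fig_1a - 19 - 1

# Images not suitable for evaluation, per site (from the text).
limb_unclear = {"triceps": 3, "biceps": 4, "front thigh": 6, "medial calf": 0}
trunk_unclear = {"iliac crest": 13, "supraspinale": 5, "abdominal": 15, "subscapular": 5}
not_evaluated = sum(limb_unclear.values()) + sum(trunk_unclear.values())

print(total_sites, total_images, evaluated, n_fig_1a, n_fig_1b, not_evaluated)
print(f"{100 * evaluated / total_images:.0f}% of images evaluated")
```

The totals agree with the text: 456 images, 405 evaluated (89%), n=398 for figure 1A and n=378 for figure 1B and 1C.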

Figure 1

(A) Individual US measurements of three observers (all sites, all values). Individual subcutaneous adipose tissue (SAT) thickness values (Dincluded) of the three examiners are plotted against respective means (measured at the 8 ISAK sites in 19 female athletes). Nineteen erroneous values, due to insufficient image quality in 9 cases and due to erroneous interpretation of an intermediate fascia (Camper's fascia) in 10 cases, are also included here. SEE is 1.24 mm due to the (avoidable) extreme values in this plot. All erroneous measurements occurred at trunk sites, indicating that these sites are not adequate choices (or not sufficiently defined by clear landmarks) for US measurements when performed by inexperienced practitioners. (B) Individual US measurements of three observers (erroneous evaluations excluded). Results show that SEE is 0.6 mm when unclear US images are not included (and the 95% limit of agreement is (−1.18 to 1.18) mm; compare to (C)). (C) Individual observer deviations from the mean (all sites, erroneous evaluations excluded). Deviations of the individual measurements (distances including fibrous structures) of the three observers from the mean value (Ddev from mean) are shown. The 95% limits of agreement (LOA=±1.96 SD) were (−1.18 to 1.18) mm. Values for measurement deviations of distances not including fibrous structures (not shown in the figure) are similar: SD=0.68 mm, and the 95% limit of agreement LOA is (−1.33 to 1.32) mm in this case. (D) Individual US measurements (Dincluded) of three observers (limb sites, all values). Although all measurements are shown in this plot (there were no extreme values found at the four limb sites), SEE is 0.65 mm and the 95% limit of agreement is (−1.26 to 1.26) mm. For the measurement deviations of distances not including fibrous structures (not shown) the 95% limit of agreement was (−1.44 to 1.44) mm.
(E) US measurements of three observers at individual sites. Data used in (B) are separated according to individual ISAK (International Society for the Advancement of Kinanthropometry) sites here. For statistical evaluation see table 2.

US B-mode imaging and thickness evaluation in SAT

US measurement technique and evaluation procedures were as described earlier.11

Measurements were made in standing position. The US probe was placed on a given site without any pressure by using 3–5 mm of US gel between the probe and the skin. The probe was held parallel to the direction of the skinfold. Conventional B-mode US systems were used (GE Logitec, Siemens AcusonX300PE; linear probes with 12 and 11.4 MHz, respectively; the corresponding axial resolution was about 0.1–0.2 mm). Observers used the US systems in ‘default’ setting, except for frequency and time-gain-compensation setting. Image evaluation was carried out several weeks later and was preceded by a 2 h training session on handling the US evaluation software. The region-growing software for SAT analysis enables the operator to distinguish between distance values in which other embedded tissues (fibrous tissues, vessels, etc) are included (Dincluded) or excluded (Dexcluded). After the region of interest (ROI) has been set properly, a series of thickness values (typically 100 in each US image) is automatically determined by the evaluation algorithm (FAT software); for tissue layers of constant thickness, this results in a very low SE of the mean.
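To illustrate why many thickness values from one image give a small SE of the mean, the following sketch applies a generic seeded region-growing step to a synthetic image and then measures the layer thickness in each image column. It is not the FAT software: the function names, the 4-connectivity, the grey-value tolerance, the pixel size of 0.1 mm and the synthetic image are all assumptions made for illustration, and the distinction between Dincluded and Dexcluded is not modelled.

```python
from collections import deque
from math import sqrt
from statistics import mean, stdev


def grow_region(image, seed, tolerance):
    """Return the set of (row, col) pixels 4-connected to `seed` whose grey
    value differs from the seed's grey value by at most `tolerance`."""
    rows, cols = len(image), len(image[0])
    seed_value = image[seed[0]][seed[1]]
    region = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if (0 <= nr < rows and 0 <= nc < cols
                    and (nr, nc) not in region
                    and abs(image[nr][nc] - seed_value) <= tolerance):
                region.add((nr, nc))
                queue.append((nr, nc))
    return region


def column_thicknesses(region, n_cols, mm_per_pixel):
    """Thickness per column: number of region pixels in that column times pixel size."""
    counts = [0] * n_cols
    for _, c in region:
        counts[c] += 1
    return [n * mm_per_pixel for n in counts if n > 0]


# Synthetic image (hypothetical): bright skin on top, a dark fat layer that is
# 50 pixels thick in even columns and 52 pixels in odd columns, bright fascia below.
N_COLS = 100
image = [
    [20 if 2 <= r < 2 + (50 if c % 2 == 0 else 52) else 200 for c in range(N_COLS)]
    for r in range(60)
]

region = grow_region(image, seed=(10, 0), tolerance=30)
thickness_mm = column_thicknesses(region, N_COLS, mm_per_pixel=0.1)
sem = stdev(thickness_mm) / sqrt(len(thickness_mm))
print(f"n = {len(thickness_mm)}, mean = {mean(thickness_mm):.2f} mm, "
      f"SD = {stdev(thickness_mm):.3f} mm, SE of mean = {sem:.3f} mm")
```

With 100 column values, the SE of the mean is one tenth of the SD of the individual thickness values, which is why a layer of nearly constant thickness yields a very small SE.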

Statistics and data inclusion

Measurement data of the study of Müller et al11 were used here for the analysis of interobserver and intraobserver reliability. Statistical analysis was performed with SPSS (IBM SPSS Statistics V.19) and GraphPad Prism 5. For linear regression analysis, results are presented with R2 and SE of estimate (SEE). A p value <0.05 was considered as significant.

Regression analysis: slope, intercept and corresponding SDs, significances (p values) and 95% CIs (conflow, confhigh) as well as the squared Pearson correlation coefficient (R2) and SE of estimate (SEE) are given.
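As an illustration of these quantities, the sketch below fits an ordinary least-squares line and reports R2 and SEE, with SEE taken as the square root of the residual sum of squares divided by n−2 (the usual definition; the exact SPSS or Prism settings used in the study are not stated). The thickness values are made up for the example, and the function name is hypothetical.

```python
from math import sqrt


def linear_regression(x, y):
    """Ordinary least-squares fit y = intercept + slope * x.
    Returns (slope, intercept, r2, see)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    syy = sum((yi - mean_y) ** 2 for yi in y)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    r2 = 1 - ss_res / syy
    see = sqrt(ss_res / (n - 2))  # SE of estimate with n - 2 degrees of freedom
    return slope, intercept, r2, see


# Made-up SAT thicknesses (mm): one row per site, one value per observer.
readings = [
    [10.1, 10.6, 9.9],
    [4.8, 5.1, 4.9],
    [9.2, 9.9, 9.4],
    [6.5, 7.0, 6.8],
]

# As in figure 1, individual values (y) are plotted against the observer mean (x).
x = [sum(row) / len(row) for row in readings for _ in row]
y = [value for row in readings for value in row]

slope, intercept, r2, see = linear_regression(x, y)
print(f"slope = {slope:.3f}, intercept = {intercept:.3f}, R2 = {r2:.3f}, SEE = {see:.2f} mm")
```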

For statistical methods see Heyward and Wagner.15

Bland-Altman test: limit of agreement (LOA)16: Means of deviations from the observer average and the corresponding 95% limits of agreement (LOA=±1.96 SD) were calculated. Reliability: intraclass correlation coefficient (ICC; 2-way random model, single scores) according to Shrout and Fleiss,17 and Weir.18 The ICC characterises reliability (relative consistency) by the ratio: (between subjects variability)/(between subjects variability+error). The closer this ratio is to 1.0, the higher is the reliability.
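The two reliability measures can be sketched as follows. The ICC function implements the two-way random, single-score form from the mean squares of a two-way ANOVA, which is the model named above; the LOA function uses deviations from the observer average with the sample SD (n−1), which is an assumption because the text does not state the denominator. Function names and the example ratings are hypothetical, and complete data (every observer at every site) are assumed.

```python
from statistics import mean, stdev


def limits_of_agreement(ratings):
    """95% limits of agreement of individual values around the observer average.
    `ratings` holds one row per measurement site with one value per observer."""
    deviations = [value - mean(row) for row in ratings for value in row]
    centre, sd = mean(deviations), stdev(deviations)
    return centre - 1.96 * sd, centre + 1.96 * sd


def icc_2_1(ratings):
    """ICC, two-way random model, single scores (Shrout and Fleiss ICC(2,1))."""
    n, k = len(ratings), len(ratings[0])
    grand = mean(value for row in ratings for value in row)
    row_means = [mean(row) for row in ratings]
    col_means = [mean(row[j] for row in ratings) for j in range(k)]
    ss_total = sum((value - grand) ** 2 for row in ratings for value in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_error = ss_total - ss_rows - ss_cols
    bms = ss_rows / (n - 1)                 # between-subjects mean square
    jms = ss_cols / (k - 1)                 # between-observers mean square
    ems = ss_error / ((n - 1) * (k - 1))    # residual mean square
    return (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)


# Made-up SAT thicknesses (mm): rows are sites, columns are the three observers.
ratings = [
    [10.1, 10.6, 9.9],
    [4.8, 5.1, 4.9],
    [9.2, 9.9, 9.4],
    [6.5, 7.0, 6.8],
    [5.5, 5.4, 5.9],
]

lower, upper = limits_of_agreement(ratings)
print(f"LOA = ({lower:.2f} to {upper:.2f}) mm, ICC(2,1) = {icc_2_1(ratings):.3f}")
```

Because the deviations are taken from each site's own observer average, their mean is zero and the limits are symmetric about zero.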

Interobserver study of given US images (comparison of image evaluation only)

Twenty images were randomly selected from the pool of 378 images and all three observers evaluated the same 20 images by means of the SAT tissue evaluation software developed for this purpose (region growing algorithm for contour detection and measurement).
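The selection step can be sketched as a draw without replacement from the pool. The image indices and the fixed seed below are hypothetical and serve only to make the example reproducible; the study does not describe how the random draw was implemented.

```python
import random

POOL_SIZE = 378    # evaluable images in the pool (from the text)
N_SELECTED = 20    # images evaluated by every observer
N_OBSERVERS = 3

rng = random.Random(2013)  # hypothetical seed, used only for reproducibility
selected = sorted(rng.sample(range(1, POOL_SIZE + 1), N_SELECTED))

# Every observer evaluates the same images: one result per observer and image.
evaluations = [(observer, image_id)
               for observer in range(1, N_OBSERVERS + 1)
               for image_id in selected]

print(selected)
print(len(evaluations), "evaluation results")
```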


Results

Interobserver comparison of US imaging and image evaluation

In figure 1A, 398 individual US measurements taken by three observers on 19 athletes, at eight ISAK sites12–14 are plotted against the mean values of the observers; the linear regression line is also shown. In this plot, 19 wrong US measurements are included (these should have been eliminated from evaluation by the observers from the very beginning). These 19 large deviations from the regression line resulted from misinterpretations of unclear structures (boundaries not clearly visible in 9 and mistakes due to Camper’s fascia in 10 images) and violated the conditions for acceptability of the data. With these extreme deviations included, SEE (with respect to distance measurements including fibrous structures embedded in the SAT: Dincluded) was 1.244 mm, and R2 was 0.925 (p<0.001). Similar values were found for Dexcluded (data not shown): SEE=1.239 mm, R2=0.915 (p<0.001).

These 19 cases of wrong settings of the region of interest (ROI), and one more value (because only one evaluable image remained at that site), are eliminated in figure 1B (n=378); this resulted in an SEE of 0.60 mm and R2=0.98 (p<0.001) for Dincluded. LOA was 1.18 mm, and ICC was 0.968 with a 95% CI of (0.957 to 0.977). Similar values (data not shown in figure 1B) were found for Dexcluded: SEE=0.68 mm, R2=0.97 (p<0.001). Figure 1C shows the deviations of the individual measurement results of the three observers (distances including fibrous structures) from the mean value (Ddev from mean). The SD of differences of individual observer values from the mean was 0.60 mm, and the 95% limit of agreement (LOA=±1.96·SD; n=378) was (−1.18 to 1.18) mm.

Values for measurement deviations of distances not including fibrous structures are: SD=0.68 mm, and 95% limit of agreement (−1.33 to 1.32) mm.

Figure 1D is similar to figure 1A; it includes all measurement values (n=213) of the three observers from the four sites on arm and leg: triceps, biceps, front thigh and medial calf. SEE is 0.65 mm. In contrast to the data of figure 1A, there is not a single case of an extreme value due to erroneous evaluation. The SD of differences of individual observer values from the mean is 0.64 mm, and the corresponding 95% limit of agreement was (−1.26 to 1.26) mm.

Values for measurement deviations of distances not including fibrous structures are: SD=0.74 mm, and 95% limit of agreement (−1.44 to 1.44) mm.
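The reported limits of agreement can be checked against the definition LOA=±1.96·SD. The short sketch below uses only the spreads and limits quoted above; small differences are expected because the spreads are given to two decimals.

```python
# (label, reported spread of deviations in mm, reported limit of agreement in mm)
pairs = [
    ("all sites, Dincluded", 0.60, 1.18),
    ("all sites, Dexcluded", 0.68, 1.33),
    ("limb sites, Dincluded", 0.64, 1.26),
    ("limb sites, Dexcluded", 0.74, 1.44),
]

for label, sd, reported in pairs:
    computed = 1.96 * sd
    print(f"{label}: 1.96 x {sd:.2f} = {computed:.2f} mm (reported {reported:.2f} mm)")

# Largest gap between computed and reported limits.
max_gap = max(abs(1.96 * sd - reported) for _, sd, reported in pairs)
print(f"largest difference: {max_gap:.3f} mm")
```

All four pairs agree to within about 0.01 mm, which is consistent with rounding of the reported values.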

In figure 1E, the data of figure 1B are separated according to individual ISAK sites. At biceps, R2 and ICC were lowest (0.87 and 0.811, respectively) and SEE (0.79 mm) and LOA (1.54 mm) were highest. Values at all other sites ranged as follows: R2: 0.94–0.99, SEE: 0.42–0.65 mm, LOA: 0.82–1.26 mm and ICC: 0.917–0.985.

Detailed statistics are presented in table 2. Table 3 shows individual means (M), SDs and numbers of evaluated images (N) of all three observers (OB1, OB2 and OB3) for each ISAK site. Means of all three observers at triceps, subscapular, biceps, iliac crest, supraspinale, abdominal, front thigh and medial calf were 10.34, 4.88, 5.02, 9.59, 5.63, 8.82, 9.38 and 6.81, respectively (mean over all was 7.56 mm). Table 4 shows the interobserver coefficients of observers OB1, OB2 and OB3, separated by ISAK site. Coefficients at biceps were lowest (0.74, 0.83 and 0.87), followed by subscapular (0.92, 0.92 and 0.93); all others ranged from 0.93 to 0.99.
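The overall mean quoted above follows from the eight site means. The sketch below takes an unweighted average of the values given in the text; whether the reported figure was weighted by the number of evaluated images is not stated, so this is an assumption.

```python
from statistics import mean

# Mean SAT thickness (mm) of all three observers per ISAK site, from the text.
site_means_mm = {
    "triceps": 10.34,
    "subscapular": 4.88,
    "biceps": 5.02,
    "iliac crest": 9.59,
    "supraspinale": 5.63,
    "abdominal": 8.82,
    "front thigh": 9.38,
    "medial calf": 6.81,
}

overall_mm = mean(site_means_mm.values())
thinnest = sorted(site_means_mm, key=site_means_mm.get)[:2]
print(f"mean over all sites: {overall_mm:.2f} mm")
print("thinnest layers:", ", ".join(thinnest))
```

The unweighted average reproduces the reported 7.56 mm, and subscapular and biceps show the thinnest layers in this group.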

Table 2

Reliability statistics according to figure 1B (all) and E (individual ISAK sites)

Table 3

Observer statistics

Table 4

Interobserver correlation matrices

Interobserver evaluation comparison of given US images of subcutaneous adipose tissue

Figure 2A shows the individual values of the three observers (n=60) when they had to evaluate a given set of 20 randomly selected images from the pool of 378. SEE was 0.15 mm for measurements including fibrous structures embedded in the SAT (Dincluded), and R2 was 0.998 (p<0.001). The SD of differences of individual observer values from the mean was 0.15 mm, and the 95% limit of agreement was (−0.29 to 0.28) mm.

Figure 2

(A) Individual image evaluation results plotted against means (fibrous structures included). Evaluation results obtained by the three observers from 20 randomly selected images out of the pool of 378 images. When 20 given images are to be evaluated in terms of subcutaneous adipose tissue thickness in the vicinity of the central US ray, an SD of 0.15 mm resulted (n=60) and the 95% limit of agreement was (−0.29 to 0.28) mm. (B) Individual image evaluation results plotted against means (fibrous structures excluded). For measurements which did not include fibrous structures, an SD of 0.22 mm resulted (n=60) and the 95% limit of agreement was (−0.43 to 0.43) mm.

Values for measurement deviations of distances not including fibrous structures (Dexcluded) are: SD=0.22 mm, and 95% limit of agreement (−0.43 to 0.43) mm (figure 2B).

Discussion


The study showed that there were no noticeable problems with identifying SAT borders at limb sites (figure 1D shows the excellent correlation obtained with these sites), but several evaluation mistakes occurred at the inferior border (SAT-muscle fascia) at sites on the trunk, particularly at the abdomen, iliac crest and subscapular (outliers in figure 1A). These outliers occurred due to erroneous image interpretation and are eliminated in figure 1B. Improved training should reduce both the number of images that cannot be evaluated due to poor quality and the number of erroneous evaluations. After having participated in this study, all three observers stated that extended evaluation experience before performing the US imaging would have had a substantial impact on the clearness of images. Systematic screening for optimal US sites is needed to avoid unnecessary problems with sites at which identification of SAT borders is complicated, for example, by intermediate fasciae like Camper’s fascia in the abdomen region. With inexperienced examiners, interobserver reliability depends primarily on choice of sites and thereby on clearness of images. The subscapular site is also problematic for US measurements, as artefacts due to bones underneath the SAT cause imaging problems and thereby evaluation uncertainties. However, at the biceps site, which did not cause noticeable imaging problems, reliability was decreased relative to the other investigated sites (table 2); one reason for this may be the thin SAT layers observed there in this group of athletes. At given sites, skin thickness was found to vary only slightly among different healthy persons in previous studies (until the seventh decade of life)19,20: therefore, mean skin thickness at a given site can be used as a criterion in cases where fibrous tissues embedded in SAT may cause identification problems.

US imaging, combined with cadaver studies or performed during surgery, in cooperation with anatomists, histologists and surgeons, would be a valuable approach for determining appropriate sites. In a second step, from the pool of sites that are advantageous in terms of anatomical clearness, those that provide the highest predictive value for total body fat when validated against multicompartment models should be selected for standardisation of the US approach.9,21–24 It can be assumed that the steadily decreasing price of US systems with good resolution will contribute to the spread of this emerging measurement technique in sports medicine, where it is only rarely used10 although the advantages of using US for thickness measurement of adipose tissue have been described repeatedly.9,25–34

When the three observers did both imaging and image evaluation (Dincluded; at sites with clear SAT boundaries in the US image), interobserver reliability was high (R2=0.98, SEE=0.60 mm, LOA=1.18 mm, ICC=0.968; compare to figure 1B and C and to table 2) and the deviations of observer values from each other (tables 3 and 4) depend primarily on the exact positioning of the US probe (position, orientation and angle with respect to the skin) at a given site on the body surface. Consistently high interobserver reliability also resulted when individual measurement sites were compared (table 2).

Similar interobserver results were found for Dexcluded (thickness measurements without fibrous structures), but the deviations between observers were slightly larger because uncertainty increases when several furrowed borders are to be detected by the evaluation algorithm.

Reliability statistics of US image evaluation of given images (20 randomly selected images evaluated by 3 observers) demonstrate that observers differed only slightly (R2=0.998, SEE=0.15 mm, LOA=0.28 mm, ICC=0.997; compare to table 5), mirroring the principal limitations of image evaluation, which are caused by biological factors: the furrowed borders of SAT layers determine the obtainable accuracy and reliability. Accuracy demands beyond the capability of US resolution are of little relevance because of the fractal-like tissue borders and the tissue’s plasticity.

Table 5

Reliability statistics according to figure 2A (ultrasound image evaluation solely)

Summary and conclusions

US can be a reliable tool for accurate measurement of thickness and patterning of subcutaneous adipose tissue, provided that borders of SAT are clearly imaged. In addition to SAT thickness, the amount of fibrous and other structures embedded in SAT (which echo US) can also be determined.

Systematic research for determining the best sites for US SAT measurement in terms of simplicity and distinctness of US image evaluation and in combination with validation studies is needed for optimising and standardising this US measurement approach.

Fibrous tissue embedded in SAT varies from one person to another and also depends on the measurement site.

US can detect and quantify even thin structures because of their differing acoustic impedance. This is an advantage which is not available with any other imaging or body composition technique, except for histological (biopsy) or cadaver examinations.

For future applications, it is important to choose sites at which the inferior SAT border is clearly visible and can be identified without doubt, even by inexperienced investigators and at which small deviations from these sites do not alter SAT thickness substantially.

The main advantages of the US technique are as follows: no ionising radiation is applied; tissue thickness can range from 1 to 300 mm; embedded fibrous tissues can be quantified; many thickness measurements from one image result in small SEs of the mean; rapid data acquisition and evaluation are possible; subject involvement is minimal; it is applicable in the field; and costs are low when compared with MRI or CT.9

Further studies with extended numbers of athletes and observers from several research centres are being prepared to bear out the findings presented here.

What are the new findings?

  • This ultrasound (US) measurement technique can be applied for highly accurate and reliable measurements of uncompressed subcutaneous adipose tissue (SAT) thickness; this holds true for all sites where borders of SAT can be clearly imaged and easily identified.

  • The interobserver reliability studies showed that the observers’ evaluations differed only negligibly (SEE: 0.15 mm; ICC: 0.997 (0.993, 0.999)) when three observers evaluated the same images with the SAT measurement software used.

  • Interobserver correlation matrices for imaging and evaluation of images showed very high correlation coefficients (at biceps: 0.74–0.87, all others ranging from 0.92 to 0.99), SE of estimate was 0.60 mm, and ICC was 0.968 (0.957, 0.977).

  • This novel US technique for SAT measurement also enables quantification of fibrous or other structures embedded in SAT.

  • Measurement sites according to the ISAK protocol (International Society for the Advancement of Kinanthropometry) were used for the interobserver reliability tests here. However, these anatomical sites have been selected for skinfold measurement and not for US; in particular, ISAK sites on the trunk can cause identification problems of SAT borders. Defining US sites not only by landmarks on the surface, but also by means of information contained in the US image (‘landmarks’) may further increase reliability.

How might it influence clinical practice in the near future?

  • The method introduced here supports the application of ultrasound (US) as a laboratory and field method for assessing subcutaneous adipose tissue (SAT) in athletes with high accuracy and reliability.

  • There are good reasons to assume that standardisation of this US technique will replace the widely used skinfold, bioimpedance, and other field methods with their well-known inherent shortcomings.

  • Owing to the high accuracy and reproducibility of image evaluation, this measurement method can also be applied to calibrate other SAT imaging techniques like MRI or CT.


The authors would like to thank W Gröschl and P Rohrer for their measurements and evaluations (observers) and K Pfeiffer for reviewing the statistics used.




  • Contributors WM conducted this study in co-operation with AF-R. MH developed the edge detection algorithm, and HA and PK the region growing algorithm used for SAT analysis; JK and HA made the statistical analyses. The study was designed and developed in co-operation with the members of the IOC Medical Commission Ad Hoc Working Group on Body Composition, Health and Performance and by TA, TL, RJM, NM, ADS, and JS-B.

  • Funding Meetings of the Ad Hoc Working Group on Body Composition, Health and Performance were financed by the International Olympic Committee.

  • Competing interests HA, MH and WM, who developed the US method applied here, are considering making the evaluation software commercially available.

  • Patient consent Obtained.

  • Provenance and peer review Not commissioned; externally peer reviewed.
