Criterion-related validity of field-based fitness tests in youth: a systematic review
J Castro-Piñero (1), E G Artero (2), V España-Romero (2, 3), F B Ortega (2, 3), M Sjöström (3), J Suni (4), J R Ruiz (3)

1 Department of Physical Education, School of Education, University of Cadiz, Puerto Real, Spain
2 Department of Physiology, School of Medicine, University of Granada, Granada, Spain
3 Department of Biosciences and Nutrition at NOVUM, Unit for Preventive Nutrition, Karolinska Institutet, Huddinge, Sweden
4 UKK Institute for Health Promotion Research, Tampere, Finland

Correspondence to J R Ruiz, Department of Biosciences and Nutrition, Unit for Preventive Nutrition, NOVUM, 14157, Huddinge, Sweden; ruizj@ugr.es

Abstract

The objective of this systematic review was to comprehensively study the criterion-related validity of the existing field-based fitness tests used in children and adolescents. The studies were scored according to the number of subjects, description of the study population and statistical analysis. Each study was classified as high, low or very low quality. Three levels of evidence were constructed: strong evidence, when consistent findings were observed in three or more high quality studies; moderate evidence, when consistent findings were observed in two high quality studies; and limited evidence, when the consistency of findings and/or the number of studies did not meet the criteria for moderate evidence. The results of 73 studies (50 of high quality) addressing the criterion-related validity of field-based fitness tests in children and adolescents indicate the following: there is strong evidence that the 20 m shuttle run test is a valid test to estimate cardiorespiratory fitness, that the hand-grip strength test is a valid measure of musculoskeletal fitness, that skin fold thickness and body mass index are good estimates of body composition, and that waist circumference is a valid measure to estimate central body fat. Moderate evidence was found that the 1-mile run/walk test is a valid test to estimate cardiorespiratory fitness. A large number of other field-based fitness tests presented limited evidence, mainly due to the small number of studies (one for each test). The results of the present systematic review should be interpreted with caution due to the substantial lack of consistency in the reporting and design of the existing validity studies.

Physical fitness refers to the full range of physical qualities (ie, cardiorespiratory fitness, muscular strength, agility, coordination and flexibility).1 It can be understood as an integrated measurement of all functions (skeletomuscular, cardiorespiratory, haematocirculatory, psychoneurological and endocrine/metabolic) and structures involved in the performance of physical activity and/or physical exercise.2 Physical fitness, especially cardiorespiratory fitness and muscular strength, is considered an important marker of health in adults,3 4 as well as in young people.5 6

Physical fitness can be objectively and accurately measured through laboratory tests. However, due to their high cost, necessity for sophisticated instruments and qualified technicians, and time constraints, their use is limited in school settings and in population-based studies. Field-based tests provide a reasonable alternative since they are time-efficient, low in cost and equipment requirements and can be easily administered to a large number of people simultaneously.

Field-based fitness assessment depends on prediction techniques and is therefore prone to error. For a test or a fitness test battery to be considered “good”, it should measure what it is supposed to measure (ie, validity).7 Criterion-related validity refers to the extent to which a field test of a fitness component correlates with the criterion measure (ie, the gold standard).8 Before deciding to use a test, the user should be satisfied that the test has established validity. In the 1990s, Safrit9 summarised the criterion-related validity of several fitness tests; yet, despite the growing interest in this area, no other attempt has been made to summarise the criterion-related validity of the existing field-based fitness tests in youth.

During the last two decades a great deal of attention has been devoted to the fitness of children and adolescents. As a result, numerous field-based fitness test batteries have been developed to assess fitness in this population (table 1).

Table 1

Existing field-based physical fitness test batteries for children and adolescents

The objective of the present systematic review was to comprehensively study the validity of the existing field-based fitness tests used in children and adolescents. Knowing whether or not a field-based test has established validity will help physical educators, exercise scientists, health agencies and private organisations dealing with sport, fitness and health to decide which field test should be used to assess physical fitness.

Methods

The present systematic review was produced as part of the ALPHA (“instruments for Assessing Levels of PHysical Activity and fitness”) study.25 The ALPHA study aims to provide a set of instruments for assessing levels of physical activity as well as health-related physical fitness in a comparable way within the European Union.

Procedures

The electronic databases MEDLINE, SCOPUS and SPORTDiscus were searched for criterion-related validity studies in children and adolescents in which one or more field-based fitness tests were carried out. All the fitness tests from the most commonly used fitness test batteries in youth were included (table 1).

The keywords used (in various combinations) were: criterion validity, validity, validation, cross-validation, estimation, prediction, physical fitness, fitness, aerobic capacity, cardiorespiratory fitness, maximum oxygen consumption, strength, flexibility, motor, endurance, speed, agility, balance, body composition, anthropometry, Body Mass Index (BMI), skin folds and waist circumference. The specific names of the tests were also included. Tables 2 to 5 summarise the field-based fitness tests used to assess cardiorespiratory fitness, musculoskeletal fitness, motor fitness and body composition, respectively.

Table 2

Field-based fitness tests used to assess cardiorespiratory fitness

Table 3

Field-based fitness tests used to assess musculoskeletal fitness

Table 4

Field-based fitness tests used to assess motor fitness

Table 5

Field-based fitness tests used to assess body composition

Table 6

Quality assessment criteria for criterion-related validity studies

The computer-based searches were limited to papers published from January 1990 to December 2008, full reports published in English or Spanish, in humans, and all children (0–18 years). An additional search using adolescents (13–18 years) was also performed. Additional studies were identified from reference lists.

The results of the most recent reviews were summarised first, and then the studies potentially relevant for the selected topics were screened for retrieval. Finally, a snowball search was performed, in which reference lists of the selected articles were checked for titles including validity of physical fitness.

Quality assessment of the study

The quality of the selected studies was scored using a quality assessment list. The list included three items based on the number of study subjects, the description of the study population and the statistical methods (table 6). Each item was rated from 0 to 2, 2 being the best score. For each study, a total quality score was calculated by summing the item scores (a total score between 0 and 6). Studies were defined as high quality if they had a total score of 5 or higher. A total score of 3 or 4 was defined as low quality and a score lower than 3 was defined as very low quality. Two reviewers (JCP and JRR) evaluated the quality of the studies independently. A consensus meeting was arranged to resolve disagreements between the two reviewers. The articles were not blinded to authorship because the reviewers who performed the quality assessment were familiar with the literature.
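
The scoring rule just described can be written as a short function. This is a minimal sketch for illustration only: the function and parameter names are hypothetical labels for the three table 6 items, and it assumes each item has already been rated 0, 1 or 2 by a reviewer.

```python
def quality_score(subjects: int, population: int, methods: int) -> tuple[int, str]:
    """Sum three item ratings (0-2 each) and classify study quality.

    Parameter names are hypothetical labels for the table 6 items:
    number of subjects, description of the study population, and
    statistical methods.
    """
    items = (subjects, population, methods)
    if any(item not in (0, 1, 2) for item in items):
        raise ValueError("each item must be rated 0, 1 or 2")
    total = sum(items)  # total score between 0 and 6
    if total >= 5:
        label = "high"
    elif total >= 3:  # a total of 3 or 4
        label = "low"
    else:  # a total below 3
        label = "very low"
    return total, label


print(quality_score(2, 2, 1))  # (5, 'high')
print(quality_score(1, 2, 1))  # (4, 'low')
```

A study rated 2, 2 and 1 therefore totals 5 and counts as high quality, whereas one rated 1, 2 and 1 falls into the low quality band.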

Levels of evidence

Three levels of evidence were constructed: (1) strong evidence: consistent findings in three or more high quality studies; (2) moderate evidence: consistent findings in two high quality studies; (3) limited evidence: consistent findings in multiple low quality studies, inconsistent results found in multiple high quality studies, or results based on one single study.
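
A sketch of how these rules could be applied to the high-quality studies of a single test. It assumes, as a simplification not stated in the review, that findings are "consistent" only when every high-quality study reached the same conclusion; the function name is hypothetical.

```python
def evidence_level(high_quality_findings: list[bool]) -> str:
    """Return 'strong', 'moderate' or 'limited' for one fitness test.

    Each element is the conclusion of one high-quality study (True if it
    found the test valid). Assumption: findings count as consistent only
    when every study reached the same conclusion.
    """
    n = len(high_quality_findings)
    consistent = n > 0 and len(set(high_quality_findings)) == 1
    if consistent and n >= 3:
        return "strong"
    if consistent and n == 2:
        return "moderate"
    # One study, inconsistent results, or only low-quality studies
    return "limited"


print(evidence_level([True, True, True]))   # strong
print(evidence_level([True, True]))         # moderate
print(evidence_level([True, False, True]))  # limited
```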

The degree of criterion-related validity will be discussed for those field-based fitness tests for which we found strong or moderate evidence that the test is (or is not) valid.

Data extraction

We extracted information on fitness quality, population characteristics, fitness test, gold standard, statistical methods, main outcome and conclusions from the studies defined as high quality. Results with p≤0.05 were regarded as statistically significant.

Results

Quality assessment

The literature search identified 73 studies addressing the criterion-related validity of field-based fitness tests in children and adolescents (see supplementary material, table 1). Of these, 23 studies were of low quality and were not included in this manuscript. No study scored ≤2, that is, none was of very low quality. A total of 31 high quality studies had the highest score (score=6). The overall agreement between the two reviewers was 90% (κ=0.813). Disagreements were resolved in a consensus meeting.
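
The two agreement statistics reported above (percentage agreement and Cohen's κ) can be computed from the reviewers' ratings as follows. This is a generic sketch with invented example ratings; it does not reproduce the reviewers' actual data.

```python
from collections import Counter


def cohens_kappa(rater_a: list[str], rater_b: list[str]) -> tuple[float, float]:
    """Return (observed agreement, Cohen's kappa) for two raters."""
    if not rater_a or len(rater_a) != len(rater_b):
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Proportion of studies given the same rating by both reviewers
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each rater's marginal frequencies
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / n ** 2
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Invented ratings for ten studies
a = ["high"] * 6 + ["low"] * 4
b = ["high"] * 5 + ["low"] * 5
obs, k = cohens_kappa(a, b)
print(round(obs, 3), round(k, 3))  # 0.9 0.8
```

κ is lower than raw agreement because it discounts the agreement the two reviewers would reach by chance given how often each used each category.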

Levels of evidence

Cardiorespiratory fitness

The 20 m shuttle run test (20mSRT) was investigated in eight studies,26–33 and the 1-mile run/walk test was investigated in three studies34–36 (see supplementary material, table 2).

The Douglas bag method is considered the gold standard to assess maximal oxygen consumption (VO2max),37 yet there is agreement that respiratory gas analysers are a valid method to assess oxygen uptake.37 All the studies measured VO2max or peak oxygen consumption (VO2peak) during a maximal treadmill test, except Ruiz et al,26 27 who measured VO2max during the 20mSRT.

20mSRT

Several studies26 28 30–32 attempted to develop an equation to estimate VO2max. McVeigh et al28 showed that the estimation of VO2peak from the 20mSRT might be improved by including skin fold thickness measurements in the regression model, particularly for girls (R2=0.85, standard error of estimate (SEE): 2.4 ml/kg/min for girls and R2=0.68, SEE: 3.23 ml/kg/min for boys), which concurs with other studies.30 31 In contrast, Mahar et al32 showed that a model including sex, number of laps completed and body weight or BMI was not accurate to estimate VO2peak (R2=0.65, SEE: 6.35 ml/kg/min) in boys and girls aged 12–14 years. More recently, we developed a new equation to estimate VO2max from 20mSRT performance (stage), sex, age, weight and height in adolescents aged 13–19 years using a more advanced mathematical model, namely artificial neural network modelling (R2=0.92, percentage error: 7.30%, SEE: 2.84 ml/kg/min).26
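
The R2 and SEE values quoted for these prediction equations come from least-squares regression. The sketch below uses a single predictor only to show how the two statistics are defined; the published equations use several predictors (and Ruiz et al used a neural network), so this is not any of those models. SEE is the square root of the residual sum of squares divided by n − k − 1 degrees of freedom, here with k = 1.

```python
import math


def fit_and_evaluate(x: list[float], y: list[float]) -> dict[str, float]:
    """Fit y = a + b*x by least squares and report R^2 and SEE.

    Single-predictor illustration only; the published 20mSRT equations
    use several predictors (eg, stage, sex, age, weight, height).
    """
    n = len(x)
    if n < 3 or len(y) != n:
        raise ValueError("need at least three paired observations")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    ss_res = sum(r ** 2 for r in residuals)
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1 - ss_res / ss_tot
    # SEE uses n - k - 1 degrees of freedom; k = 1 predictor here
    see = math.sqrt(ss_res / (n - 2))
    return {"intercept": intercept, "slope": slope, "r2": r2, "see": see}
```

SEE is expressed in the units of the outcome (here ml/kg/min), which is why it is more directly interpretable than R2 when judging whether an equation is precise enough for individual use.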

Several studies have cross-validated these equations.26 27 29 33 Pitetti et al29 cross-validated the Léger and Fernhall equations, and found significant but modest relationships between both regression equations and VO2peak (r=0.57, p<0.01 and r=0.66, p<0.01, respectively). More recently, Ruiz et al27 assessed the validity of five different equations (ie, the Ruiz et al,26 Léger et al,38 Barnett et al (a),30 Barnett et al (b)30 and Matsuzaka et al31 equations) for estimating VO2max from the 20mSRT in a relatively small sample of 48 Portuguese adolescents aged 13–19 years. They reported that equations to estimate VO2max from the 20mSRT should not be used at an individual level, and suggested that the Barnett (b)30 and Ruiz26 equations seem to be the most accurate for estimating VO2max in adolescents.
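
Cross-validation applies an existing equation to a new sample and compares its estimates with measured VO2peak, as in the correlations reported above. A minimal sketch of the Pearson correlation used for that comparison; the data in the usage example are invented.

```python
import math


def pearson_r(estimated: list[float], measured: list[float]) -> float:
    """Pearson correlation between estimated and measured VO2 values."""
    n = len(estimated)
    if n < 2 or len(measured) != n:
        raise ValueError("need at least two paired values")
    mean_e, mean_m = sum(estimated) / n, sum(measured) / n
    sxy = sum((e - mean_e) * (m - mean_m) for e, m in zip(estimated, measured))
    sxx = sum((e - mean_e) ** 2 for e in estimated)
    syy = sum((m - mean_m) ** 2 for m in measured)
    return sxy / math.sqrt(sxx * syy)


measured = [38.0, 42.0, 45.0, 50.0]
offset_estimates = [m - 5.0 for m in measured]  # every estimate 5 ml/kg/min too low
print(pearson_r(offset_estimates, measured))  # 1.0
```

The usage example shows why r alone is not enough: an equation that underestimates every participant by the same amount still correlates perfectly with the criterion, which is one reason agreement at the individual level is examined separately.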

Distance run/walk tests

The most commonly used equation to estimate VO2peak from the 1-mile run/walk test is the Cureton equation,34 which was selected for the FITNESSGRAM battery to estimate VO2peak.11 We examined the criterion-related validity of the Cureton equation in 66 endurance-trained children and adolescents aged 8–17 years.35 We observed a significant mean difference between measured and estimated VO2peak (10.01 ml/kg/min, 95% confidence interval (CI) 9.2 to 11.8, p<0.001). The findings did not materially change when the analyses were performed by sex, age group and weight status, which suggests that this equation is not accurate for estimating VO2peak in endurance-trained children.35 Buono et al36 also developed an equation to estimate VO2peak from the 1-mile run/walk time, and reported an SEE of 4.3 ml/kg/min (R2=0.84).
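
A mean difference with a 95% CI, like the one reported for the Cureton equation, can be computed from paired measured and estimated values. This sketch uses a normal approximation (z ≈ 1.96) rather than the t distribution, an assumption that is reasonable for a sample of this size but not necessarily what the original study used. The sign convention is measured minus estimated.

```python
import math
from statistics import NormalDist, mean, stdev


def mean_difference_ci(measured: list[float], estimated: list[float],
                       confidence: float = 0.95) -> tuple[float, float, float]:
    """Mean of (measured - estimated) with a normal-approximation CI."""
    if len(measured) != len(estimated) or len(measured) < 2:
        raise ValueError("need at least two paired values")
    diffs = [m - e for m, e in zip(measured, estimated)]
    d_bar = mean(diffs)
    se = stdev(diffs) / math.sqrt(len(diffs))  # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return d_bar, d_bar - z * se, d_bar + z * se
```

A CI that lies entirely above zero, as in the trained sample described above, indicates that the equation systematically underestimates measured VO2peak.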

The nature of this test deserves several comments. The 1-mile run/walk test is not a user-friendly test, especially in young people. One of its major problems is the participants' ability to set and maintain an appropriate pace. Participants may either start too fast, so that they are not able to keep up the speed throughout the test, or start too slowly, so that by the time they want to increase speed the test is already finished. To ameliorate this problem, several versions were developed, such as the 1-mile walk,39 the submaximal 1-mile track jog test (pacing test),40 and the 1/2-mile run/walk test (table 2).41 We assessed the criterion-related validity of the 1/2-mile run/walk test in children aged 6–17 years, and also examined the criterion-related validity of the Fernhall equation in a subgroup of children aged 10–17 years.41 We computed a regression equation that was assessed through several error measures and the Bland–Altman method. We found that the 1/2-mile run/walk time, sex and BMI were significantly associated with VO2peak. There was no systematic bias in either the validation group or the cross-validation group (p>0.1), and the root mean squared error (RMSE) and the percentage error were 6.5 ml/kg/min and 13.9%, respectively. The newly developed equation had a lower RMSE and percentage error than the Fernhall equation in the subgroup of children aged 10–17 years (7.2 vs 17.7 ml/kg/min and 16.0% vs 50.4%, respectively, p<0.001).
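
A minimal sketch of a Bland–Altman summary (mean bias and 95% limits of agreement, bias ± 1.96 SD of the differences) together with the RMSE. Differences are taken as estimated minus measured, so a negative bias indicates underestimation. The function name is hypothetical, and the percentage error reported above is not computed because its exact definition is not given in this section.

```python
import math
from statistics import mean, stdev


def bland_altman(measured: list[float], estimated: list[float]) -> dict[str, float]:
    """Bias, 95% limits of agreement and RMSE for estimated vs measured values."""
    if len(measured) != len(estimated) or len(measured) < 2:
        raise ValueError("need at least two paired values")
    diffs = [e - m for m, e in zip(measured, estimated)]  # estimated - measured
    bias = mean(diffs)  # systematic bias; negative means underestimation
    sd = stdev(diffs)
    rmse = math.sqrt(mean([d ** 2 for d in diffs]))
    return {
        "bias": bias,
        "lower_loa": bias - 1.96 * sd,
        "upper_loa": bias + 1.96 * sd,
        "rmse": rmse,
    }
```

Bias and RMSE answer different questions: an equation can show no systematic bias (errors cancel on average) while still having a large RMSE, because RMSE captures the typical size of individual errors.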

In conclusion, there is strong evidence that the 20mSRT is a suitable test to estimate cardiorespiratory fitness. Of the equations developed, the Barnett (b) and Ruiz equations seem to yield the most promising results for estimating VO2max. There is moderate evidence in the case of the 1-mile run/walk test, and limited evidence that the 1-mile walk test, the submaximal 1-mile track jog test (pacing test) and the 1/2-mile run/walk test are valid tests to estimate VO2max (or VO2peak). The Cureton equation seems to be the best equation to predict VO2peak from the 1-mile run/walk test, but the fitness level of the individuals may affect its validity.

Musculoskeletal fitness

Two studies examined the criterion-related validity of the hand-grip strength test (maximal isometric strength),42 43 one study examined the criterion-related validity of the bent arm hang, push-up, pull-up and modified pull-up tests (upper body endurance strength),44 and one examined the criterion-related validity of the standing broad jump and vertical jump tests.43 Two studies assessed the criterion-related validity of flexibility tests: one analysed the back saver sit and reach test,45 and the other analysed the trunk lift test46 (see supplementary material, table 2).

There is no established gold standard for most of the musculoskeletal fitness tests, which makes it difficult to determine their criterion-related validity. The specificity of the type of muscular work performed and the use of different energy systems are both major challenges for establishing a gold standard method for maximal muscular strength and endurance strength tests.47 One repetition maximum (1RM) and repetitions to a certain percentage of 1RM (eg, 50% or 70% of 1RM) have been used as gold standards.43 44 Concerning flexibility, radiography seems to be the best criterion measurement, but goniometry has also been used as a criterion measure.45 48–50

Maximal isometric strength

We studied the criterion-related validity of the hand-grip strength test using Jamar, DynEx and TKK dynamometers in adolescents aged 12–16 years.42 We used known weights (ranging from 20 to 70 kg) as the criterion measure. We observed a negative systematic bias (underestimation) for the Jamar and DynEx dynamometers (−1.92 and −1.43 kg, respectively, p<0.05), whereas a marginal positive bias (overestimation) was observed for the TKK dynamometer (0.49 kg, p<0.05). These results concur with those reported in studies performed in adults.51–59 We also examined whether the elbow position (extended or flexed at 90 degrees) affects hand-grip strength in adolescents, and observed that performing the hand-grip strength test with the elbow extended seems to be the most appropriate protocol to evaluate maximal hand-grip strength in adolescents when using the TKK dynamometer. We have also conducted a series of studies in children60 and adolescents61 to determine whether there is an optimal grip span for determining maximum hand-grip strength, and whether the optimal grip span is related to hand size. We found that there was an optimal grip span to which the dynamometer should be adjusted when measuring hand-grip strength in children60 and adolescents.61 We provided sex- and age-specific equations to adjust the grip span of the dynamometer to the hand size of the individual in order to obtain the actual maximal hand-grip strength. Milliken et al43 analysed the association between hand-grip strength (using the TKK dynamometer) and 1RM chest press in children aged 7–12 years, and found that the hand-grip strength test is valid to assess upper body maximal strength.
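
Checking a dynamometer against known weights amounts to computing the mean of (reading − known load), negative for underestimation, and optionally fitting a calibration line. A sketch with hypothetical readings: the 20–70 kg loads mirror the range described above, but the readings are invented and the function name is illustrative.

```python
from statistics import mean


def calibration_check(known_kg: list[float], readings_kg: list[float]) -> dict[str, float]:
    """Compare dynamometer readings with known loads.

    Returns the mean systematic bias (reading - known; negative means the
    device underestimates) and a least-squares calibration line
    reading = intercept + slope * known.
    """
    if len(known_kg) != len(readings_kg) or len(known_kg) < 2:
        raise ValueError("need at least two paired values")
    bias = mean([r - k for k, r in zip(known_kg, readings_kg)])
    mean_k, mean_r = mean(known_kg), mean(readings_kg)
    sxy = sum((k - mean_k) * (r - mean_r) for k, r in zip(known_kg, readings_kg))
    sxx = sum((k - mean_k) ** 2 for k in known_kg)
    slope = sxy / sxx
    return {"bias": bias, "slope": slope, "intercept": mean_r - slope * mean_k}


loads = [20, 30, 40, 50, 60, 70]  # known weights, kg
readings = [18.5, 28.0, 38.2, 48.1, 58.3, 68.0]  # invented device readings
print(calibration_check(loads, readings))
```

The slope adds information that the mean bias alone does not: a slope different from 1 would indicate that the error changes with the load rather than being a constant offset.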

Upper body endurance strength

Woods et al44 studied the criterion-related validity of the bent arm hang, push-up, pull-up and two modified pull-up tests using 1RM and repetitions at 50% of 1RM as criterion measures in children aged 9–11 years. They concluded that these tests are not valid to assess muscular endurance and that body fat percentage was the main determinant of performance. We observed that muscular strength is highly influenced by body weight in children aged 6–17 years,62 especially in weight-bearing tests. Of the 2778 participants, 1037 girls (85%) and 889 boys (60%) were not able to perform a single repetition in the pull-up test. Likewise, 478 girls (39%) and 409 boys (28%) were not able to hold the position for more than 0 s in the bent arm hang test. Collectively, these findings suggest that these tests are not appropriate to measure upper body endurance strength in children and adolescents.
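
The proportions above describe a floor effect: a large share of participants obtain the lowest possible score, so the test cannot discriminate among them. A minimal sketch for computing the percentage of zero scores per group; the function name is hypothetical and the data are invented.

```python
def zero_score_percentage(scores_by_group: dict[str, list[float]]) -> dict[str, float]:
    """Percentage of participants in each group who scored zero."""
    result = {}
    for group, scores in scores_by_group.items():
        if not scores:
            raise ValueError(f"no scores for group {group!r}")
        zeros = sum(1 for s in scores if s == 0)
        result[group] = 100 * zeros / len(scores)
    return result


# Invented pull-up counts
print(zero_score_percentage({"girls": [0, 0, 1, 0], "boys": [0, 2, 3, 1]}))
# {'girls': 75.0, 'boys': 25.0}
```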

Lower body explosive strength

Milliken et al43 studied the criterion-related validity of the standing broad jump test and the vertical jump test using 1RM leg press as the criterion measure in children aged 7–12 years. They reported that the standing broad jump and the vertical jump test, together with BMI, accounted for 44.4% and 40.8% of the variation in 1RM leg press, respectively.

Flexibility

Patterson et al45 studied the criterion-related validity of the back saver sit and reach test using goniometry (hamstring flexibility) and the MacRae and Wright method (low back flexibility) as criterion measures in children and adolescents aged 11–15 years. The results suggested that this test has moderate validity to assess hamstring flexibility (r=0.51 to 0.72) and low validity to assess low back flexibility (r=0.10 to 0.25). Paterson et al46 reported that the correlations between trunk lift scores and goniometry scores were moderate (r=0.70 for boys and r=0.68 for girls) in children of a similar age range.

In conclusion, there is strong evidence that the hand-grip strength test, performed with the elbow extended and with the grip span adapted to the individual's hand size (using the TKK dynamometer), is a valid test to assess isometric muscular strength. Because of the small number of studies, we found only limited evidence that (1) the bent arm hang, push-up, pull-up and two modified pull-up tests are not valid to assess muscular endurance and (2) the back saver sit and reach test and the trunk lift test have moderate validity to measure hamstring flexibility and lumbar flexibility, respectively.

Body composition

A total of 22 studies investigated the criterion-related validity of BMI,63–84 18 investigated the validity of skin fold thickness,64–66 72–76 80 82 85–92 and 7 studied the validity of circumferences and/or ratios (ie, waist-to-hip ratio)63 65 66 82 83 92 93 (see supplementary material, table 2).

Imaging methods (ie, axial CT and MRI),94 dual-energy x-ray absorptiometry (DXA),95 ultrasonography96 and air displacement plethysmography97 98 are considered gold standards for assessing body composition in youth. Bioelectrical impedance analysis (BIA) has also been used as a reference method for body fat determination,99 100 whereas hydrodensitometry is considered the gold standard in adults but not in children, because children have difficulty with the breathing manoeuvre involved in determining underwater weight.85

Skin fold thickness

The validity of different skin fold equations to estimate percentage body fat, mainly the Slaughter equation, has been extensively analysed using the Bland–Altman method72 82 85 87–89 and/or repeated-measures ANOVA.82 86 Rodríguez et al88 reported that the Slaughter equations using either triceps and subscapular or triceps and calf skin folds had the best agreement in male and female adolescents. Likewise, Buison et al89 found that the Brook equation is a valid alternative to measure percentage body fat in children aged 7–10 years (mean difference=−1.4% for percentage body fat, with limits of agreement of ±12.2%). Treuth et al87 reported that the Slaughter equation appears to be valid to estimate percentage body fat in prepubertal multiethnic girls (R2=0.69). Gutin et al85 reported a strong correlation between the Slaughter equation and the criterion measures (BIA and DXA), yet the limits of agreement were wide (DXA vs skin fold thickness −3.65 to 9.50 and BIA vs skin fold thickness −10.81 to 9.89). Likewise, Campanozzi et al91 reported similar results in obese children and adolescents when they compared the Brook equation, BIA and DXA. It is noteworthy that the discrepancies between methods seem to increase with the degree of obesity, which indicates the presence of heteroscedasticity.
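
One simple way to check for the heteroscedasticity described above is to correlate the absolute difference between two methods with their mean; a clearly positive correlation suggests that disagreement grows with the magnitude of body fat. This is a generic sketch with a hypothetical function name, not the procedure used in the cited studies.

```python
import math


def heteroscedasticity_r(method_a: list[float], method_b: list[float]) -> float:
    """Correlation between |a - b| and the mean of a and b for paired values.

    A clearly positive value suggests that disagreement between the two
    methods grows with the magnitude of the measurement.
    """
    n = len(method_a)
    if n < 3 or len(method_b) != n:
        raise ValueError("need at least three paired values")
    means = [(a + b) / 2 for a, b in zip(method_a, method_b)]
    abs_diffs = [abs(a - b) for a, b in zip(method_a, method_b)]
    mx, my = sum(means) / n, sum(abs_diffs) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(means, abs_diffs))
    sxx = sum((x - mx) ** 2 for x in means)
    syy = sum((y - my) ** 2 for y in abs_diffs)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined when either variable is constant")
    return sxy / math.sqrt(sxx * syy)
```

When heteroscedasticity is present, a single pair of limits of agreement overstates agreement in leaner children and understates disagreement in fatter ones.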

Ihmels et al90 showed good overall agreement between estimates from BIA and the sum of triceps and subscapular skin folds (classification agreement values: 82.8–92.6%). Guida et al64 reported, using BIA vector distribution, that triceps skin fold had moderate validity to assess percentage body fat (r=0.79, p<0.001), and that it was not affected by body size. Goran et al65 developed several equations to estimate intra-abdominal adipose tissue and subcutaneous abdominal adipose tissue. They concluded that intra-abdominal adipose tissue was best predicted by abdominal skin fold, ethnicity and subscapular skin fold (R2=0.82, SEE: 9.8 cm2), whereas subcutaneous abdominal adipose tissue was best predicted by waist circumference, subscapular skin fold, height and abdominal skin fold (R2=0.92, SEE: 28.8 cm2). However, in obese children and adolescents, using ultrasound as the gold standard method, triceps and subscapular skin folds showed poor validity (r=0.13 to 0.34 and r=0.02 to 0.37, respectively).66 Differences in criterion methods, statistical methods and skin fold measurement techniques make comparisons among studies difficult.
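
For reference, the Slaughter triceps-plus-calf equations referred to in the skin fold studies above are commonly reported as %BF = 0.735 × (triceps + calf) + 1.0 for boys and %BF = 0.610 × (triceps + calf) + 5.1 for girls, with skin folds in mm. The sketch below assumes those coefficients, which should be checked against the original Slaughter et al publication before use. The triceps-plus-subscapular version, which uses maturation-specific intercepts, is not shown.

```python
def slaughter_triceps_calf(triceps_mm: float, calf_mm: float, sex: str) -> float:
    """Percentage body fat from the sum of triceps and calf skin folds (mm).

    Coefficients as commonly reported for the Slaughter triceps + calf
    equations; verify against the original publication before use.
    """
    total = triceps_mm + calf_mm
    if sex == "male":
        return 0.735 * total + 1.0
    if sex == "female":
        return 0.610 * total + 5.1
    raise ValueError("sex must be 'male' or 'female'")


print(round(slaughter_triceps_calf(10.0, 12.0, "female"), 1))  # 18.5
```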

BMI

Correlations between BMI and body fat measured by more advanced methods generally exceed 0.50 and are frequently much higher.67 68 This finding supports, a priori, the validity of BMI, yet several studies using DXA as the criterion measure highlight that BMI should be used with caution when comparing groups with different demographic characteristics.68–70 77 78 81 Ellis et al69 reported in children aged 3–10 years that the associations between percentage body fat and BMI were significant for girls (R2=0.70, p<0.001) and boys (R2=0.34, p<0.001); however, when a linear model was used, the ability of BMI to accurately estimate percentage body fat was poor (SEE: 4.7% for girls and 7.3% for boys). Daniels et al70 studied whether BMI was a representative equivalent measure of body fat independent of age, race, gender, sexual maturation and fat distribution in children and adolescents aged 7–17 years. They showed that BMI, gender, race, sexual maturation and fat distribution were all significant independent correlates of percentage body fat (R2=0.77).
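
BMI is weight in kilograms divided by the square of height in metres. A trivial sketch, included only to make the unit convention explicit (height in metres, not centimetres); the function name is illustrative.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index in kg/m^2 (height in metres)."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** 2


print(round(bmi(45.0, 1.50), 1))  # 20.0
```

Because BMI uses only weight and height, two children with the same BMI can differ substantially in fat mass and fat free mass, which is the limitation the studies above describe.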

To examine whether BMI can detect changes in body composition, Guida et al64 analysed the association between BIA vector distribution and BMI in children aged 8 years. They found a relationship between BMI and fat mass measured by BIA (r=0.92, p<0.001) and between BMI and fat free mass measured by BIA (r=0.58, p<0.001). However, BMI was not able to differentiate whether a weight change was due to a variation in body fat or in fat free mass, which concurs with other studies.66 70 Moreno et al71 tried to improve the International Obesity Task Force BMI cut-off values, in terms of prediction of percentage body fat, in adolescents aged 13–17 years. They concluded that BMI cut-off points seem to be useful as an approximate classification of obesity status, but cannot accurately predict a specific individual's percentage body fat. Therefore, they suggested that BMI could be used as a screening test, but that in clinical settings percentage body fat should be measured using a more accurate method such as DXA.

Several studies have compared the accuracy of BMI and skin fold thickness to estimate body fat.72–76 Steinberger et al72 reported that the Slaughter equation and BMI were highly correlated with DXA in children and adolescents aged 11–17 years. Similarly, Sarria et al73 developed several equations to estimate body density from underwater weighing in boys aged 7–16 years. They found that the correlations between body density and logΣ4 skin folds (r=−0.781 to 0.820) were higher than those with BMI (r=−0.586 to −0.798) at all ages. The best estimators of body density were logΣ4 skin folds or a combination of BMI and triceps skin fold. Freedman et al74 examined the additional information on body fatness provided by skin fold thickness beyond that conveyed by BMI-for-age among children and adolescents aged 5–18 years. The use of the sum of skin folds reduced the overall prediction errors (absolute value of the residuals) for percentage body fat by 20% to 30%, whereas among overweight children the sum of skin folds reduced the prediction errors for percentage body fat by only 7% to 9%. Likewise, BMI was shown to be a less accurate predictor of the degree of fatness than the sum of skin folds in lean female adolescents (r=0.67, p<0.001 between BMI and DXA-measured percentage body fat; r=0.80, p<0.001 between skin fold sum and DXA-measured percentage body fat).75
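
The percentage reduction in prediction error described for Freedman et al can be computed by comparing the mean absolute residual of a BMI-only model with that of a model that also includes skin folds. A sketch with invented residuals; the function name is hypothetical and fitting the models themselves is not shown.

```python
from statistics import mean


def error_reduction_pct(residuals_base: list[float], residuals_extended: list[float]) -> float:
    """Percentage reduction in mean absolute residual.

    residuals_base: residuals of a model using BMI-for-age only.
    residuals_extended: residuals of a model that also includes skin folds.
    """
    mae_base = mean([abs(r) for r in residuals_base])
    mae_ext = mean([abs(r) for r in residuals_extended])
    if mae_base == 0:
        raise ValueError("baseline model has no error to reduce")
    return 100 * (mae_base - mae_ext) / mae_base


# Invented residuals (percentage body fat units)
print(error_reduction_pct([4.0, -4.0, 2.0, -2.0], [3.0, -3.0, 1.0, -2.0]))  # 25.0
```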

These findings support the idea that skin fold thicknesses are, in general, better predictors of childhood obesity than BMI,72–74 although the accuracy of BMI as an indicator of adiposity varies by the degree of body fatness,71 72 74 80 84 markedly improving at higher levels of BMI.66 71 72 80 84

Waist circumference (WC) and waist-to-hip ratio (WHR)

Goran et al65 developed several equations to estimate intra-abdominal adipose tissue and subcutaneous abdominal adipose tissue by skin folds, circumferences and DXA in prepubertal children. They found that WHR had a low correlation with intra-abdominal adipose tissue (r=0.32) and subcutaneous abdominal adipose tissue (r=0.40), whereas WC was strongly correlated with intra-abdominal adipose tissue (r=0.84) and subcutaneous abdominal adipose tissue (r=0.93). This finding is consistent with the study by Taylor et al,93 who showed that WC was a valid measure of central adiposity in children and adolescents (r=0.92, p<0.001 in girls and boys), whereas WHR was not (r=−0.40 and r=−0.04 in girls and boys, respectively). Likewise, Brambilla et al,83 using MRI in a larger sample, also found that WC is a good predictor of body fat distribution: WC was the best single predictor of visceral adipose tissue (64.8% of variance), and BMI was the best predictor of subcutaneous adipose tissue (88.9% of variance), while WC explained 80.4%. Finally, one study examined the best anthropometric measure to estimate fat distribution in obese children, using ultrasound as the criterion measure.66 The results showed that in obese children BMI provided the best estimate of visceral adipose tissue (R2=0.53), while skin fold thickness, WC and WHR showed weaker associations (r=0.02 to 0.37, r=0.08 to 0.42 and r=0.08 to 0.42, respectively). In the control group, visceral adiposity was significantly correlated with BMI, skin fold thickness and WC (p<0.05), but not with WHR (p>0.05).

In conclusion, there is strong evidence that skin fold thickness and BMI are good predictors of body fat. The Slaughter equations are valid to estimate body fat in youth, but not in obese children, where BMI seems to be the best indicator. The accuracy of BMI varies by the degree of body fatness, significantly improving at higher levels of body fat. There is strong evidence that WC is a good measure to estimate central body fat, and there is limited evidence that WHR is a good measure to estimate central body fat.

What is already known on this topic

  • In the last two decades, much attention has been devoted to the physical fitness status of youth, and many surveillance systems have been used across the world.

  • Despite the growing interest in this area, no attempt has been made to summarise the criterion-related validity of the existing field-based fitness tests in youth.

What this study adds

  • We have formulated an evidence-based proposal of the most valid field-based fitness tests in youth.

  • Our findings are: (i) the 20 m shuttle run test is a valid test to assess cardiorespiratory fitness; (ii) the hand-grip strength test is a valid test to assess upper body muscular strength and the standing broad jump is a valid test to assess lower body muscular strength; (iii) skin fold thickness is a valid measure to estimate body fat in non-obese youth, whereas in obese youth body mass index seems to be the best indicator; and (iv) waist circumference is a valid measure to estimate central fatness.

Discussion

In summary, the present systematic review highlights several key points about the criterion-related validity of field-based fitness tests in children and adolescents (fig 1):

  • Cardiorespiratory fitness: the 20mSRT seems to be the most appropriate test to assess cardiorespiratory fitness. The Barnett (b) and Ruiz equations yielded the most promising results to estimate VO2max. An alternative to the 20mSRT might be the 1-mile run/walk test, using the Cureton equation, but not in trained children.

  • Musculoskeletal fitness: there is strong evidence that the hand-grip strength test, performed with the elbow extended and with the grip span adapted to the individual's hand size (using the TKK dynamometer), is a valid test to assess isometric muscular strength. Because of the small number of studies, there is limited evidence that the standing broad jump and vertical jump tests are valid to assess explosive strength.

  • Motor fitness: no study ranked as high quality assessed the criterion-related validity of motor fitness tests.

  • Body composition: the Slaughter equations, based on skin fold thickness, are valid to estimate body fat in youth, but not in obese children, in whom BMI seems to be the best indicator. The accuracy of BMI varies by the degree of body fatness, significantly improving at higher levels of body fat. There is strong evidence that WC is a valid measure to estimate central fatness.

Figure 1

Evidence-based proposal of the most valid field-based fitness tests in youth. (1) 20mSRT with the equations reported by Barnett (b) and Ruiz. (2) 1-mile run/walk with the equation reported by Cureton, but not in trained children. (3) TKK dynamometer with age-specific and sex-specific grip span equations. 20mSRT, 20 m shuttle run test; BMI, Body Mass Index.

Acknowledgments

We would like to thank Professor Willem van Mechelen, Professor Pekka Oja, Professor Han CG Kemper, Professor Kari Bø and Professor Jorge Mota for their valuable contributions to the conception and strategy of this review.

References

Supplementary materials

  • Web Only Data bjsm.2009.058321


Footnotes

  • Funding This work was supported by the European Union within the framework of the Public Health Programme (ALPHA project, Ref: 2006120), the Swedish Council for Working Life and Social Research, the Spanish Ministry of Education (EX-2007-1124; AP-2004-2745; and AP2005-4358) and the Spanish Ministry of Education and Science-FEDER funds (Acciones Complementarias DEP2007-29933-E).

  • Competing interests none.

  • Provenance and peer review Not commissioned; externally peer reviewed.