Abstract
Objective: Maximal oxygen uptake (Vo_{2max}) of 44 ml kg^{−1} min^{−1} is an accepted criterion (Vo_{2CR}) below which health and fitness for young male adults may be compromised. New algorithms validated for Vo_{2CR} screening using the 20 m multistage shuttle run test (20mMST) were developed.
Methods: Vo_{2max} was assessed in 110 males using a stationary gas analyser in a treadmill test (TT) and in 40 of these subjects using a portable gas analyser in the 20mMST. Vo_{2max} predicted from the 20mMST in 70 subjects was used for cross validation. Two equations predicting Vo_{2max} during 20mMST (EQ_{MST}) and TT (EQ_{TT}) were developed.
Results: Significant energy cost variance (EC_{V}) was detected between TT and 20mMST (p<0.001), correlated significantly with subject height, and was a significant predictor of Vo_{2max} differences between TT and 20mMST. The r^{2} of EQ_{MST} was 0.92 (p<0.001). Predicted Vo_{2max} values from EQ_{MST} correlated with directly measured 20mMST Vo_{2max} at r = 0.96 (p<0.001). ANOVA detected no mean difference (p>0.05) between predicted and measured values. Prevalence of low fitness based on Vo_{2CR} was 0.37. McNemar χ^{2} indicated significant differences in sensitivity (p<0.001) and specificity (p<0.05) between the original 20mMST equation (EQ_{LÉG}) and EQ_{TT}, regarding Vo_{2CR} screening. Cohen’s κ demonstrated higher agreement with TT Vo_{2max} for EQ_{TT} (p<0.001) than EQ_{LÉG} (p<0.05). TT Vo_{2max} correlated with the end result of both EQ_{LÉG} and EQ_{TT} at r = 0.75 (p<0.001). Unlike EQ_{TT} (p>0.05), mean predicted Vo_{2max} from EQ_{LÉG} was significantly higher compared to TT Vo_{2max} (p<0.001).
Conclusion: These algorithms increase the efficacy of 20mMST to accurately evaluate aspects of health and fitness.
Abbreviations
 ANOVA, analysis of variance
 CF, cardiorespiratory fitness
 CI, confidence interval
 CV, coefficient of variation
 EC, energy cost
 GEE, generalised estimating equations
 GLM, general linear model
 LIM_{AG}, limits of agreement analysis
 MAS, maximal attained speed
 ROC, receiver operating characteristics
 SD, standard deviation
 TT, treadmill test
 20mMST, 20 m multistage shuttle run test
Keywords
 energy cost
 field testing
 receiver operating characteristics
 screening
 Vo_{2max}
Despite the vast amounts of research focusing on various cardiorespiratory fitness (CF) assessments and the acceptance of specific CF cut offs in national health guidelines,^{1,2} statistical screening methodology such as calculating receiver operating characteristics (ROC) curves has not been employed hitherto. The ROC curve analysis is extensively used in epidemiology to provide a graphic means for assessing the accuracy of a diagnostic instrument.^{3} The difficulty in adopting ROC curves in sports medicine is mainly attributed to the fact that most outcome measures are in continuous format. However, these biomarkers can be dichotomised using dummy variables according to clinically accepted critical values Q and classified as positive or negative according to whether the test outcome measure is greater or less than Q. For instance, a maximal oxygen uptake (Vo_{2max}) of 44 ml kg^{−1} min^{−1} for young male adults (18–29 years of age) has been generally accepted as a criterion (Vo_{2CR}) below which both health and fitness may be compromised.^{1,4,5}
The 20 m multistage shuttle run test (20mMST)^{6} represents an acceptable field assessment tool for CF, and has been repeatedly employed in different health^{7,8} and fitness^{9} settings. However, the popularity of the 20mMST is mainly attributed to its practical use for simultaneous measurement of large groups of individuals. Studies evaluating its accuracy in predicting laboratory Vo_{2max} have reported contradictory results.^{9–11} More importantly, the efficacy (that is, the extent to which a specific procedure produces a valid classification of data in relation to established criteria) of the original 20mMST model in screening for CF remains unknown.
From a statistical standpoint, the limited accuracy of the 20mMST may be attributed to the repeated measures design used in the original study.^{6} It is well known that the inherent dependency of within-subject observations can reduce the power of prediction models.^{12} Concurrently, it seems tenable that the theoretical basis of the original 20mMST model may be further compromised by the use of generally large and heterogeneous samples in the validation procedures.^{6} It has been established that severely biased linear relationships can occur owing to sample heterogeneity.^{13}
From a physiological viewpoint, it could be argued that the curtailed ability of the original 20mMST model to predict treadmill Vo_{2max} values might be attributed to differences in the exercise modes utilised in the validation procedures (that is, shuttle running v forward running). Findings from recent investigations suggested that Vo_{2max} during the 20mMST is significantly higher compared to a treadmill test.^{14,15} Ergo, a prediction model controlling for differences in energy cost (EC) between the reference standard laboratory assessment and the proxy 20mMST may result in more accurate prediction of Vo_{2max} and increased efficacy in screening for Vo_{2CR}. The objective of the present investigation was to develop a new Vo_{2max} prediction algorithm for the 20mMST using data collected via portable indirect calorimetry and statistical procedures which accounted for within-subject observation dependency. Thereafter, the efficacy of both the original and the novel models was assessed in predicting standard treadmill Vo_{2max} and screening for Vo_{2CR}.
METHODS
Subjects and procedures
A total of 110 healthy males (age: 21.6 (SD 2.5); BMI: 23.6 (2.2)) volunteered. Exclusion criteria included smoking and any muscular or skeletal injuries. Written informed consent was obtained from all participants after full explanation of the procedures involved. The cohort was arbitrarily divided into model (n = 40) and validation (n = 70) groups. Analysis of variance (ANOVA) revealed no significant difference between the two groups in terms of anthropometrical characteristics.
Within a 14 day period, all participants underwent a treadmill Vo_{2max} assessment and performed the 20mMST in an indoor rubber floored gymnasium. Unlike the validation group, participants in the model group were subjected to Vo_{2max} assessment whilst performing the 20mMST using a portable gas analyser. Special care was taken to maintain similar environmental conditions in both measurement sites during assessment. Prior to data collection visits, subjects were familiarised with all assessment protocols. They were also advised to avoid stressful activities 36–48 h prior to the data collection visits. Tests were conducted in a random order, by the same investigators, and at the same time for each subject either between 9:00 and 12:00 h or between 14:00 and 17:00 h. The study was approved by the Research Ethics Board of the University of Wolverhampton.
Data collection
Laboratory assessment of Vo_{2max} (TT)
A modified Bruce treadmill test (TT) to exhaustion was used.^{16} The treadmill running speed was manipulated accordingly in order to bring the subject to exhaustion in 7–10 min. The treadmill inclination was increased by 2.5° every 3 min from an initial 3.5°. Oxygen uptake (Vo_{2} (ml kg^{−1} min^{−1})) was measured via open circuit spirometry using an automated gas analyser (Vmax 29, SensorMedics, Yorba Linda, CA) previously calibrated with standard gases. Respiratory parameters were recorded every 20 s during testing, while subjects inspired room air through a low resistance two-way Rudolph valve. To ensure that subjects achieved Vo_{2max}, measurements were considered for further analysis when at least two of the following criteria were met: (i) maximal heart rate greater than 185 bpm, (ii) respiratory exchange ratio greater than 1.1, and/or (iii) detection of plateau in Vo_{2} curve. EC in kcal was calculated for each individual minute/stage as the product of mean Vo_{2} (l min^{−1}) and the corresponding caloric equivalent.^{17}
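The EC calculation described above can be sketched as follows. The caloric equivalent in the study was taken from reference 17; the linear approximation used here (3.815 + 1.232 × respiratory exchange ratio, in kcal per litre of O_{2}) is a commonly quoted non-protein approximation and is an assumption made for illustration, as are the function names. It is not necessarily the authors' procedure.

```python
def caloric_equivalent(rer):
    """Approximate kcal per litre of O2 for a given respiratory exchange ratio.

    Assumption: linear non-protein approximation, with RER clamped
    to the physiological range 0.70-1.00. Not taken from the source.
    """
    rer = min(max(rer, 0.70), 1.00)
    return 3.815 + 1.232 * rer


def stage_energy_cost(mean_vo2_l_min, rer, minutes=1.0):
    """Energy cost (kcal) of one minute/stage.

    Product of mean Vo2 (l/min), the caloric equivalent (kcal/l)
    and the stage duration (min).
    """
    return mean_vo2_l_min * caloric_equivalent(rer) * minutes
```

With hypothetical values of 3.0 l min^{−1} and a respiratory exchange ratio of 1.0, the sketch gives roughly 15.1 kcal for a one minute stage.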
Field assessment of Vo_{2max} (20mMST)
This test was conducted according to established procedures.^{6} In the model group a portable gas analyser (K4b^{2}, Cosmed, Rome, Italy) was used to record respiratory parameters every 20 s during testing, while subjects inspired room air through a facemask. Maximal oxygen uptake was the main parameter determined using the open circuit method. Prior to measurement, the gas analyser was calibrated with standard gases. Exhaustion was confirmed when at least two of the following criteria were met: (i) maximal heart rate greater than 185 bpm, (ii) respiratory exchange ratio greater than 1.1, and/or (iii) detection of plateau in Vo_{2} curve. The EC in kcal was calculated for each individual minute/stage as the product of mean Vo_{2} (l min^{−1}) and the corresponding caloric equivalent.^{17} In the validation group, Vo_{2max} was predicted from the 20mMST performance according to established procedures.^{6}
The K4b^{2} gas analyser weighed 475 g and was not expected to significantly alter the subjects’ energy demands. A pilot study using five subjects (age: 21.6 (SD 1.3); BMI: 24.3 (1.5)) was conducted in order to investigate additional energy demands and ensure that significant agreement existed between the two gas analysers employed. The subjects, who did not partake in the main part of the investigation, performed the previously described TT twice using both gas analysers. Results showed no significant difference (p>0.05) between the mean Vo_{2max} value recorded by the stationary (Vmax 29, SensorMedics) and the portable (K4b^{2}, Cosmed) gas analyser (48.7 (SD 3.1) v 49.1 (3.5) ml kg^{−1} min^{−1}, respectively), with an average absolute error of 0.51 (SD 0.18) ml kg^{−1} min^{−1}.
Statistical analyses
ANOVA was used to compare mean EC between TT and 20mMST. The effect of energy cost variance between TT and 20mMST (EC_{V}) on the original 20mMST prediction model (EQ_{LÉG}^{6}) was assessed via a simultaneous general linear model (GLM). This model aimed to predict Vo_{2max} differences/errors between TT and EQ_{LÉG} using mean EC_{V} as an independent variable. In addition, Pearson’s correlation coefficients were used to assess linear relationships between EC_{V} and various anthropometrical characteristics.
For the calculation of the novel prediction model, the generalised estimating equations (GEE)^{18} approach was employed to account for subject specific dependency between the repeated observations. The GEE is a powerful approach in fitting generalised linear models to non-normally but dependently distributed response variables.^{18} A GLM framework with GEE estimation was introduced to generate an equation (EQ_{MST}) predicting Vo_{2max} measured during the 20mMST using the model group data (n = 40). For the latter model, the maximal attained speed (MAS) during the 20mMST was set as the independent variable. Thereafter, a second GLM with GEE estimation was performed generating the EQ_{TT} model which aimed to predict the reference standard TT Vo_{2max} (dependent variable) using the end result of EQ_{MST} as an independent variable. This procedure was employed to produce a 20mMST Vo_{2max} model that accounts for EC_{V}. In order to ensure that the procedures followed in the calculation of the EQ_{TT} model were indeed superior to the traditional approach, a further GLM (EQ_{MAS}) was calculated using TT Vo_{2max} (dependent variable) and MAS (independent variable). ANOVA and Pearson’s correlation coefficients were used to detect possible bias between the mean actual and predicted Vo_{2max} values for the three models.
Data from the remaining 70 subjects (referred to as the validation group) were used to cross validate EQ_{TT} and the original EQ_{LÉG} model. Correlation coefficients, ANOVA, 95% limits of agreement analyses (LIM_{AG}) and percent coefficients of variation (CV_{%}) were adopted to validate the two models according to established procedures.^{19} Ninety-five percent confidence intervals (CI_{95%}) and ROC curve analysis were calculated using statistical software incorporated in SAS/Macro/IML. The latter software is designed specifically to fit ROC curves using dummy variables for data obtained from repeated measures designs. The area under the ROC curve was estimated using the Wilcoxon nonparametric method.^{20} The demarcation point for Vo_{2CR} was set at 44 ml kg^{−1} min^{−1} according to available guidelines.^{1,4,5} Calculated sensitivity and specificity with corresponding CI_{95%} were used to determine the efficacy of the two equations in screening for Vo_{2CR}. Sensitivity (S_{E}) was defined as the proportion of subjects below the Vo_{2CR} who demonstrated a 20mMST predicted value below 44 ml kg^{−1} min^{−1}. Specificity (S_{P}) was defined as the proportion of subjects above the Vo_{2CR} who revealed a 20mMST predicted value above or equal to 44 ml kg^{−1} min^{−1}. McNemar χ^{2} analysis examined the differences between calculated sensitivity and specificity at the cut off point for both equations. Cohen’s κ statistic was used to evaluate the agreement between the prediction models and the reference standard test. Finally, ANOVA and Pearson’s correlation coefficients were used to detect possible bias between the mean actual and predicted values. All statistical analyses were carried out with SPSS (version 11.5; SPSS, Chicago, IL) and SAS (version 8.2; SAS Institute, Cary, NC, USA) statistical software packages. The level of significance was set at p<0.05.
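The definitions of S_{E} and S_{P} given above can be illustrated with a minimal sketch that dichotomises measured (TT) and predicted (20mMST) Vo_{2max} values at the 44 ml kg^{−1} min^{−1} cut off. The function name and the returned dictionary layout are hypothetical, and the sketch does not reproduce the SAS/SPSS routines used in the study.

```python
VO2_CR = 44.0  # ml/kg/min, the criterion value stated in the text


def screen_for_vo2cr(measured, predicted, cutoff=VO2_CR):
    """Cross-tabulate reference (TT) and predicted (20mMST) values at the cut off.

    A subject is 'positive' (low fitness) when the value is below the cut off,
    and 'negative' when it is above or equal to it, as defined in the text.
    """
    tp = fn = fp = tn = 0
    for m, p in zip(measured, predicted):
        low_ref = m < cutoff    # below Vo2CR on the reference standard
        low_test = p < cutoff   # below Vo2CR on the prediction
        if low_ref and low_test:
            tp += 1
        elif low_ref:
            fn += 1
        elif low_test:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return {"tp": tp, "fn": fn, "fp": fp, "tn": tn,
            "sensitivity": sensitivity, "specificity": specificity}
```

Calling `screen_for_vo2cr(tt_values, predicted_values)` with two equal-length lists of values in ml kg^{−1} min^{−1} returns the four cell counts together with S_{E} and S_{P}.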
RESULTS
Effect of energy cost variance on EQ_{LÉG}
ANOVA detected significant differences in EC and Vo_{2max} between TT and EQ_{LÉG} (p<0.001; fig 1). Further, GLM results indicated that mean EC_{V} was a significant predictor of Vo_{2max} differences between TT and EQ_{LÉG} (r^{2} = 0.25, F_{1, 38} = 28.89, p<0.001). A significant linearity was also detected between EC_{V} and subject height (r = 0.94, p<0.001).
Prediction of Vo_{2max} achieved via 20mMST and TT
Table 1 shows relevant statistics for the calculated models (that is, EQ_{MAS}, EQ_{MST}, and EQ_{TT}). Routine pre-analysis screening procedures were used to assess whether the data conformed to the assumptions of GLM. Although normally distributed, the variables used in these analyses were not independent of one another. Examination of residuals scatterplots detected no violation of normality, linearity, and homoscedasticity between predicted Vo_{2max} scores and errors of prediction. Mahalanobis distance of each case to the centroid of all cases detected no multivariate outliers at the χ^{2} criterion of p<0.001. As expected, the values in the variables utilised were multicollinear, being similar measures of the same parameter (that is, Vo_{2max}). As significant linearity was detected between EC_{V} and subject height (see previous section), initial calculations for EQ_{MST} and EQ_{TT} included height as a covariate. Nevertheless, the latter variable was not a significant predictor (p>0.05) for either model.
[EQ_{MAS}] Vo_{2max} = MAS×6.87−39.54
[EQ_{MST}] Vo_{2max} = MAS×6.65−35.8
[EQ_{TT}] Vo_{2max} = EQ_{MST}×0.95+0.182
Thus,
[EQ_{TT}] Vo_{2max} = (MAS×6.65−35.8)×0.95+0.182
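The three equations above translate directly into code. The sketch below uses the published coefficients as given; the function names are hypothetical, and the unit of MAS is not stated in this section (km h^{−1} is assumed here, following common usage for the 20mMST).

```python
def eq_mas(mas):
    """EQ_MAS: TT Vo2max (ml/kg/min) predicted directly from MAS."""
    return mas * 6.87 - 39.54


def eq_mst(mas):
    """EQ_MST: Vo2max (ml/kg/min) attained during the 20mMST, predicted from MAS."""
    return mas * 6.65 - 35.8


def eq_tt(mas):
    """EQ_TT: TT Vo2max (ml/kg/min) predicted from the end result of EQ_MST."""
    return eq_mst(mas) * 0.95 + 0.182
```

For a hypothetical MAS of 12, the sketch gives approximately 44.0 ml kg^{−1} min^{−1} from EQ_{MST}, 41.98 from EQ_{TT}, and 42.90 from EQ_{MAS}. Expanding the nested form gives the equivalent single-step expression Vo_{2max} = MAS×6.3175−33.828.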
Model cross validation
Means (SD) and comparisons of various performance indices from the TT and the 20mMST, as well as results for LIM_{AG} and CV_{%}, appear in table 2. Preliminary analyses for LIM_{AG} revealed no positive relationship between the differences/errors (EQ_{LÉG}–TT or EQ_{TT}–TT) and the size of the measurements (the mean of EQ_{LÉG} and TT or the mean of EQ_{TT} and TT, respectively). Thus, the LIM_{AG} can be reported as absolute measurements.^{21} Finally, the mean difference (error) between estimates from EQ_{LÉG} and TT was biased (t = −8.86, p<0.001), whereas that between EQ_{TT} and TT was not (t = 1.46, p>0.05).
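As a rough illustration of the agreement statistics reported here, the sketch below computes the mean difference (bias), absolute 95% limits of agreement (bias ± 1.96 SD of the differences), and a paired t statistic for predicted versus reference values. The function name is hypothetical, and the exact LIM_{AG} and CV_{%} procedures of reference 19 may differ in detail; CV_{%} is deliberately not reproduced.

```python
import math
import statistics


def agreement_summary(predicted, reference):
    """Bias, absolute 95% limits of agreement and paired t for two methods.

    Differences are taken as predicted minus reference, so a positive
    bias means the prediction overestimates the reference value.
    """
    diffs = [p - r for p, r in zip(predicted, reference)]
    n = len(diffs)
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)          # sample SD of the differences
    lower = bias - 1.96 * sd
    upper = bias + 1.96 * sd
    t_stat = bias / (sd / math.sqrt(n))   # paired t, n - 1 degrees of freedom
    return {"bias": bias, "lower": lower, "upper": upper, "t": t_stat}
```

Reporting the limits in absolute units, as done here, is appropriate only when the differences do not grow with the size of the measurement, which is the condition checked in the preliminary analyses above.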
Relevant univariate statistics and ROC curve analyses for the designated cut off point (that is, 44 ml kg^{−1} min^{−1}) appear in table 3 and fig 2. Twenty-six subjects (37.1%; CI_{95%}: 0.9%) were classified as below the Vo_{2CR} using the reference standard TT. In contrast, EQ_{LÉG} and EQ_{TT} identified six and 29 subjects below the Vo_{2CR}, respectively. Cohen’s κ statistic demonstrated significant agreement with the TT measurement for both the EQ_{LÉG} (p<0.05) and the EQ_{TT} (p<0.001).
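Cohen’s κ, used above to quantify agreement between each prediction model and the TT classification, can be computed from the four cells of a 2×2 screening table. The sketch below is a generic implementation with a hypothetical function name; the counts in the usage note are invented for illustration and are not the study data.

```python
def cohens_kappa(tp, fn, fp, tn):
    """Cohen's kappa for a 2x2 table of test versus reference classifications.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e is the agreement expected by chance from the marginal totals.
    """
    n = tp + fn + fp + tn
    if n == 0:
        raise ValueError("empty table")
    p_o = (tp + tn) / n
    p_e = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    if p_e == 1.0:
        return float("nan")  # all cases in one category: kappa is undefined
    return (p_o - p_e) / (1.0 - p_e)
```

With hypothetical counts of 20 true positives, 5 false negatives, 10 false positives, and 15 true negatives, observed agreement is 0.70, chance agreement is 0.50, and κ is 0.40.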
DISCUSSION
Sedentary lifestyle is a common phenomenon in modern societies, representing a major risk factor for numerous pathologies.^{22} Consequently, screening for, and evaluation of, CF has become important for both health and fitness. The aim of the present investigation was to utilise the most salient physiological and epidemiological procedures in order to enhance the efficacy of the 20mMST for CF screening. Results suggested that the developed prediction models significantly increased the efficacy of the 20mMST to discern subjects according to Vo_{2CR}. To our knowledge, the present study represents the first direct clinical appraisal of the 20mMST as a screening tool for specific CF cut off points such as Vo_{2CR}.
To account for the increased energy requirements of shuttle running compared to forward treadmill running,^{14,15} we developed a prediction equation which incorporates indirect calorimetry data collected while the subjects performed the 20mMST. Results from the newly developed model demonstrated increased accuracy in predicting Vo_{2max} and a minimised standard error of the estimate (1.9 ml kg^{−1} min^{−1}) compared to the original EQ_{LÉG} and EQ_{MAS} (4.4 and 2.7 ml kg^{−1} min^{−1}, respectively).^{6} Although the limits of agreement in EQ_{TT} are still relatively wide, this range is more likely to be acceptable compared to EQ_{LÉG} and EQ_{MAS}. Further, as illustrated by the present CV_{%} indices, the traditional Vo_{2max} prediction can be up to 1.2 times as unreliable as the prediction of EQ_{TT}. ROC curve analysis indicated that both EQ_{TT} and EQ_{LÉG} were highly specific in discriminating individuals according to Vo_{2CR}. However, sensitivity in the former was significantly increased compared to the latter model (81% v 23%).
What is already known on this topic
The 20 m multistage shuttle run test (20mMST) is an acceptable field assessment tool for cardiorespiratory fitness but its original prediction model is subject to significant bias.
The theoretical basis of the EQ_{TT} model is advantageous in that it seeks to parallel the energy utilisation of the human body during the 20mMST and the TT, rather than relying on statistical inference from a generally large and heterogeneous sample. The cohort consisted entirely of males to avoid the well known phenomenon of severely biased (that is, nonsense or spurious) linear relationships attributed to sample heterogeneity.^{13} This phenomenon has been demonstrated explicitly by Anderson^{23} who examined various factors associated with prediction power in the original 20mMST model. Anderson concluded that research utilising large heterogeneous samples in the validation process of predictive tests of aerobic capacity must be suspect. It seems reasonable to suggest that the prediction models developed using these procedures are rather generalised, representing merely vague indicators of the true values. These hypotheses are verified in the present study by the reduced accuracy of the EQ_{MAS} prediction model, as compared to EQ_{TT}.
On another note, the present results are in line with previous studies suggesting increased energy demands during shuttle running compared to treadmill running.^{14,15} This may well be attributed to differences in factors such as intensity, exercise mode, technique, and musculature employed between the two conditions. These factors should be considered when designing physical training programmes that incorporate shuttle running elements, including training for sports that involve shuttle running (for example, football, basketball, rugby). In addition, the present results suggest that EC_{V} is exacerbated with increased body stature. It is tenable that various biomechanical complexities of shuttle running may account for this. The EQ_{MST} model developed herein to predict Vo_{2max} during the 20mMST can be used to calculate the oxygen transport demands of shuttle running, when such information is required.
It is important to acknowledge, however, that the 20mMST is a test requiring maximal effort. Therefore, it may not be suitable for populations with specific diseases. In addition, the novel EQ_{TT} model represents a strict means of assessing CF. Three subjects with CF above the Vo_{2CR} in our cross validation sample were mis-screened as performing below the Vo_{2CR}. Practising such strict screening techniques may be beneficial in circumstances where adequate levels of CF are crucial (for example, military training). The applicability of the present investigation would be extended by calculating additional prediction models for both males and females of various age groups. In addition, it is worth mentioning that the present results are subject to some variability among different models of metabolic carts.^{24} Within the limits of the present investigation, it is concluded that the developed models can be valuable tools that explicitly increase the efficacy of the 20mMST to discern subjects according to Vo_{2CR}.
What this study adds
The prediction models introduced in the present study increase the efficacy of the 20mMST, thus providing increased accuracy in evaluating aspects of health and fitness.
REFERENCES
Footnotes

Competing interests: none declared