Background The Nine Plus screening battery test (9+) is a functional movement test intended to identify limitations in fundamental movement patterns predisposing athletes to injury. However, the interseason variability is unknown.
Aim To examine the variability of the 9+ test between 2 consecutive seasons in professional male football players.
Methods Asymptomatic Qatar Stars League players (n=220) completed the 9+ at the beginning of the 2013 and 2014 seasons. Time-loss injuries in training and matches were obtained from the Aspetar Injury and Illness Surveillance Program. No intervention was initiated between test occasions.
Results A significant increase in the mean total score of 1.6 points (95% CI 1.0 to 2.2, p<0.001) was found from season 1 (22.2±4.1 (SD)) to season 2 (23.8±3.3). The variability was large, as shown by an intraclass correlation coefficient (ICC) of 0.24 (95% CI 0.11 to 0.36) and a minimal detectable change (MDC) of 8.7 points. Of the 220 players, 136 (61.8%) suffered a time-loss injury between the 2 tests. There was an improvement in mean total scores in the injured (+2.0±0.4 (SE), p<0.001) group but not in the uninjured group (+0.9±0.5, p=0.089). The variability from season 1 to season 2 was large both in the injured (ICC 0.25, 0.09 to 0.40, MDC 8.3) and uninjured (ICC 0.24, 0.02 to 0.43, MDC 9.1) groups.
Conclusions The 9+ demonstrated substantial intraindividual variability in the total score between 2 consecutive seasons, irrespective of injury. A change above 8 points is necessary to represent a real change in the 9+ test between seasons.
- Functional movement screen
- Injury prevention
Injuries in football are common, causing substantial morbidity, and may have long-term health consequences on the player.1–3 One strategy to prevent injuries is the use of a periodic health evaluation (PHE) or screening examination to identify the athlete at risk for injury, with a view to implementing targeted prevention measures.4 For a PHE to be effective in detecting injury risk or be clinically useful, it is essential that the screening tools or tests used are reliable, valid and reproducible, and have acceptable measurement error.5–9
Functional movement tests have become popular components of musculoskeletal screening examinations, and are also used for clinical assessments to determine treatment response and assist in return to play decision-making.10 The Nine Plus screening battery test (9+) is a functional movement test attempting to identify limitations in fundamental movement patterns predisposing athletes to injury.11 This relatively recently developed tool comprises six tests with modified criteria from the functional movement screen (FMS; deep squat, in-line lunge, shoulder mobility, trunk stability push-up, active hip flexion and diagonal lift); in addition, Frohm et al11 included five additional tests (one-legged squat, deep one-legged squat, drop jump test, seated rotation and straight leg raise) to fill the gap for tests challenging dynamic trunk flexors, rotation of the spine, and knee control and strength.11,12
There is limited evidence for the measurement properties of the 9+. An initial study by Frohm et al11 found good inter-rater (intraclass correlation coefficient (ICC) 0.80) and intra-rater (ICC 0.75) reliability of the 9+ in a sample of elite football players. The validity of the 9+ in predicting injury is still unknown. However, athletes with scores below 67% of the total score on the FMS have shown a significantly higher injury risk compared with athletes who score above 67%.13 For the 9+ to be clinically useful as a potential predictor, it is important to document the normal variation, in the absence of any intervention or injury, to be able to meaningfully interpret differences in a test result.14
Therefore, we aimed to examine the season-to-season variability of the 9+ in a group of professional male football players. We hypothesised that in the absence of any prevention or performance intervention or injury, the 9+ score would be stable (ie, low variability) between seasons.
Study design and participants
We analysed prospectively collected data from a PHE of professional male football players in Qatar.15 All players eligible to compete in the Qatar Stars League (QSL), the professional first division of football in Qatar, were invited to participate as they presented for their annual PHE at Aspetar Orthopaedic and Sports Medicine Hospital in Doha (Qatar) at the beginning of the 2013 and 2014 seasons, which the majority (66.6%) completed during the preseason period (July through September). A smaller group (23.8%) completed the tests during the early/mid competition phase (October through December 2013 or 2014) and a few (9.7%) did the testing during the 2014 postseason (May through June).
As part of the musculoskeletal component of the PHE, all players underwent the 9+ test in the rehabilitation department of the hospital each year. Players presenting with 9+ data from both season examinations (2013 and 2014) were included for analyses. Players reporting a current injury or physical symptom limiting training or match play at the time of testing were excluded from analyses. Ethical approval was obtained from the Institutional Review Board, Anti-Doping Laboratory Qatar. All players signed a written informed consent form at inclusion, allowing their data to be used for research.
The 9+ was performed by experienced sports physiotherapists working at the study institution. In total, 14 physiotherapists were involved in performing the 9+ testing during the study period (7 performed in both seasons, 7 in one of the two seasons only). Prior to testing, all physiotherapists underwent a 2-day course with the inventors of the 9+,11 in addition to performing the 9+ in their clinical practice.
We measured the intertester reliability of the 14 physiotherapists in a subgroup of 63 randomly chosen players during the screening setting in the 2014 season. The intertester reliability for the total score and each of the tests was examined with two testers from a randomly selected pool of 8 of the 14 physiotherapists (4 of these were involved in testing both seasons, 4 in the 2014 tests only). The testers were blinded to each other's 9+ scores.
The 9+ screening battery was performed as described by Frohm et al11,12 on both test occasions (2013 and 2014). The 9+ consists of 11 functional and complex movement exercises to assess stability, mobility and neuromuscular control in the kinetic chain. Each player performed the 11 tests and they completed each test in the same order on both test occasions. Seven of the 11 tests are assessed bilaterally, looking for asymmetries. For these tests, the left extremity was tested first and the lower of the two scores for the left and right sides was used for data analysis. Each movement test was scored on a four-point scale (3–0), with 3 representing correct completion of the task with no compensatory movements, 2 correct but with the presence of compensatory movements, 1 not correct despite compensatory movements and 0 if pain was present. Thus, the player could reach a maximum score of 33 points. A more detailed description of the 9+ movements is provided by Frohm et al.11,12
All players performed the tests barefoot, in shorts and a t-shirt, except for the drop jump test, for which they wore their own training shoes, as described by Frohm et al.12 Owing to equipment availability, the participants performed the drop jump test from a 30 cm box, in contrast to the 40 cm box height described by Frohm et al.12 The physiotherapists gave a standardised verbal instruction, and showed the player a photo of the starting and finishing positions of an optimally performed exercise. Each player performed each test three times, and the maximum score achieved was recorded and used for evaluation of test performance. Verbal corrections were given during the three trials in order to achieve optimal performance. At test occasion 2, all testers and participants were blinded to the player's score from test occasion 1. The 9+ took 20–30 min to complete.
Between-season data collection
After the completion of the initial 9+ in 2013, a report form with the total 9+ score along with the results from the other PHE tests was given to the respective team doctor.15 Other than that, no specific intervention was advised based on the 9+ score from test 1. Data on injuries in training and matches during the intervening football season were obtained from the Aspetar Injury and Illness Surveillance Program (AIISP).2
The AIISP is based on prospective injury recording from all 14 QSL teams. An injury was recorded if the player was unable to fully participate in future football training or match play (time-loss injury).2,16 The player was considered injured until declared fit for full participation in training and available for match selection by medical staff.
The team physician (or head physiotherapist when no physician was available) for each team recorded all injuries daily throughout the intervening season. For each injury recorded, the team physician/physiotherapist completed a standardised injury card containing information on the body part injured, injury type and specific diagnosis. In addition, the injury card included questions related to reinjury, overuse or trauma, injury mechanism (contact or collision), as well as information on whether the injury occurred during training or match play. Injury severity was determined by the number of days absent from matches or training sessions due to injury and was classified as mild (1–3 days), minor (4–7 days), moderate (8–28 days) or severe (>28 days). Injury data were requested from the clubs every month. We maintained regular communication with the clubs to encourage timely and accurate reporting.
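The severity classification described above can be expressed as a small lookup. This is a sketch only: the function name is hypothetical, the category labels and day ranges are those given in the text, and absences of less than 1 day are rejected on the assumption that they do not constitute a time-loss injury.

```python
def classify_severity(days_absent: int) -> str:
    """Map days absent from training or matches to a severity category."""
    if days_absent < 1:
        # Assumption: a time-loss injury implies at least 1 day of absence.
        raise ValueError("a time-loss injury requires at least 1 day of absence")
    if days_absent <= 3:
        return "mild"       # 1-3 days
    if days_absent <= 7:
        return "minor"      # 4-7 days
    if days_absent <= 28:
        return "moderate"   # 8-28 days
    return "severe"         # >28 days
```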
Data were analysed with IBM SPSS statistics, V.21 (IBM Corp, Armonk, New York, USA). We used a paired t-test to assess for systematic differences in the 9+ total score between test occasions. Significance level was set at p<0.05. The variability (random error) of the 9+ total score between tests was assessed using the ICC1.1 with 95% CIs, and SE of measurement (SEM).17,18 The SEM was calculated as the square root of the mean square of the residual term derived from the analysis of variance (ANOVA). The minimal detectable change (MDC) with 95% certainty was calculated17,19 as SEM×1.96×√2.
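The calculations described in this paragraph can be sketched as follows. This is not the SPSS procedure used in the study; it is a minimal illustration that assumes a one-way random-effects ANOVA with two measurements per player, takes the within-player mean square as the residual term, and uses a hypothetical function name.

```python
import math
from typing import Sequence, Tuple


def icc_sem_mdc(pairs: Sequence[Tuple[float, float]]) -> Tuple[float, float, float]:
    """Return (ICC(1,1), SEM, MDC95) for paired season 1 / season 2 scores."""
    n = len(pairs)
    k = 2  # two test occasions per player
    if n < 2:
        raise ValueError("at least two players are required")

    grand_mean = sum(a + b for a, b in pairs) / (n * k)
    player_means = [(a + b) / k for a, b in pairs]

    # Between-player mean square (df = n - 1).
    bms = k * sum((m - grand_mean) ** 2 for m in player_means) / (n - 1)
    # Within-player (residual) mean square (df = n * (k - 1)).
    wms = sum(
        (a - m) ** 2 + (b - m) ** 2 for (a, b), m in zip(pairs, player_means)
    ) / (n * (k - 1))

    denominator = bms + (k - 1) * wms
    if denominator == 0:
        raise ValueError("ICC is undefined when all scores are identical")

    icc = (bms - wms) / denominator
    sem = math.sqrt(wms)                   # SE of measurement
    mdc = sem * 1.96 * math.sqrt(2)        # minimal detectable change, 95%
    return icc, sem, mdc
```

With an SEM of 3.0 points, the MDC formula gives 3.0×1.96×√2 ≈ 8.3 points.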
Systematic differences and the variability of each movement test between test occasions were also examined. Since each movement test is measured on an ordinal scale, a non-parametric test (Wilcoxon signed-rank test) and weighted κ (κw) were used. The weighted κ was calculated using STATA (V.11.0, StataCorp, College Station, Texas, USA).
The intertester reliability for the total score was analysed using ICC1.1 with scores between 0.75 and 1.00 interpreted as good, 0.50–0.74 as moderate, and those below 0.50 as poor.20 The κw was used to analyse the intertester reliability for each movement test with scores interpreted as follows: <0.20 as poor, 0.21–0.40 as fair, 0.41–0.60 as moderate, 0.61–0.80 as substantial and 0.81–1.00 as excellent.21
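The interpretation bands above can be written as two small helper functions. The names are hypothetical, and the handling of values that fall between the stated bands (for example, an ICC between 0.74 and 0.75, or a κw of exactly 0.20) is an assumption, since the text does not specify it.

```python
def interpret_icc(icc: float) -> str:
    """Label an ICC using the bands stated in the text."""
    if icc >= 0.75:
        return "good"
    if icc >= 0.50:
        return "moderate"
    return "poor"


def interpret_weighted_kappa(kappa: float) -> str:
    """Label a weighted kappa using the bands stated in the text.

    Assumption: each upper bound is treated as inclusive, so 0.20 is
    labelled poor and 0.40 is labelled fair.
    """
    if kappa <= 0.20:
        return "poor"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "excellent"
```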
Data are presented as means with SDs or 95% CI unless otherwise stated.
A total of 247 male footballers completed the 9+ during both the 2013 and 2014 seasons. Of these, 27 players were excluded from analyses because of current injury, no consent or missing injury registration (figure 1). Thus, the final sample included 220 players (age 25.3±4.6 years; height 176±7 cm; body mass 71±9 kg; body mass index 22.8±2.0 kg/m2). The players represented 35 nationalities, the majority from the Middle East (71.8%). By ethnicity, 57.7% were Arabic, 29.5% black, 6.8% Persian, 3.6% Caucasian, 0.9% East Asian and 1.4% other. There were no missing items of the 9+ among any of the players included in the interseason variability and intertester reliability analyses.
Examiner intertester reliability
The intertester reliability for the total score was moderate (ICC=0.68), while the intertester reliability for each test ranged from fair to excellent (κw=0.31 to 0.81; table 1). For 8 of the 11 exercises (73%), reliability was fair or moderate.
Interseason variability of the 9+
The mean time between the two 9+ test scores was 359.7±65.4 days. We observed a statistically significant increase in the mean total score of the 9+ test of 1.6 (95% CI 1.0 to 2.2, p<0.001) from season 1 (22.2±4.1) to season 2 (23.8±3.3). However, the variability was large (ICC 0.24, 95% CI 0.11 to 0.36; figure 2 and table 2).
Among the 220 players, 136 (61.8%) had ≥1 time-loss injury between the two 9+ tests, predominantly to the lower extremity (n=124, 91.2% of all injured players). We observed a consistent improvement in the 9+ score across all subgroups, which tended to be greater for the injured than the uninjured group, as seen in table 2. The variability between season 1 and season 2 was large across all injured (ICC=0.13–0.25) and uninjured groups (ICC=0.23–0.27). Players with a severe injury (>28 days absence) displayed the greatest increase in mean total score between season 1 and season 2 (2.9±0.7, p<0.001). Again, the variability was large in this group (ICC=0.13–0.16), as illustrated in figure 3.
The SEM for the total score was large (3.0–3.4 points) across all groups, irrespective of injury and severity. The clinical applicability of the 9+ total score is limited, as indicated by the magnitude of the MDC (8.3–9.5 points), again irrespective of injury and severity (table 2).
We performed a subanalysis of players whose time between the two 9+ tests was more than 1 SD shorter (<294.2 days, n=32) or more than 1 SD longer (>425.1 days, n=27) than the mean, and observed findings similar to those described above. There was a significant increase in the mean total score for the >1SD group of 2.1 (95% CI 0.40 to 3.82, p=0.017) from season 1 (21.3±3.8) to season 2 (23.4±4.2), whereas there was no significant increase in the mean total score for the <1SD group (0.21±0.8, p=0.79) from season 1 (22.8±4.2) to season 2 (23.0±2.8). However, the variability was again large for both the >1SD group (ICC=0.35, 95% CI −0.03 to 0.64, SEM=3.1) and the <1SD group (ICC=0.19, 95% CI −0.16 to 0.50, SEM=3.2).
Interseason variability of each movement test
There was a significant increase in score for each movement test between season 1 and season 2, apart from the one-legged squat, deep one-legged squat, seated rotation and shoulder mobility (table 3). However, the variability was large for all movement tests (κw=−0.003 to 0.63), irrespective of injury and severity.
The main finding of this study was that there was a substantial intraindividual variability in the 9+ mean total score between the two consecutive seasons, irrespective of injury and severity status, and the MDC was high across all groups. The intertester reliability was moderate. Additionally, there was a small but systematic improvement from one season to the next across all injured and uninjured groups.
The variability of the 9+ test
Only one study has previously investigated the measurement properties of the 9+ test. Frohm et al11 examined the inter-rater and intra-rater reliability among eight trained observers of the 9+ in a group of male elite football players (n=26). They reported good intra-rater reliability (ICC 0.75, based on data from 18 players) with no systematic change when players were retested after 7 days, indicating that player and tester performance was stable across test sessions. Similarly, good ICC scores have also been reported in several studies investigating the intra-rater and test–retest reliability of the FMS in physically active populations and college athletes retested after 2–7 days.22–24
We therefore assumed that in the absence of any intervention or injury, the 9+ total score would be stable (ie, low variability) between seasons. The remarkably low ICC observed in our study, across injured and uninjured groups, suggests that the ability of the 9+ test to detect changes in functional movement patterns is very limited, largely because of the sizeable measurement error. A similar tendency was also observed for each movement test, displaying consistently poor κw for all tests across all injured and uninjured groups.
Measurement error includes rater variation, variation by chance and between-session variability in player performance.25 The intertester reliability of our testers (overall ICC 0.68) was lower than that reported by Frohm et al11 on all 9+ movement tests except seated rotation and shoulder mobility. Frohm et al11 examined the intertester reliability in a small group of male football players (n=26) in a controlled research setting, using eight physiotherapists who were all experienced on the 9+. Our results may differ from those of Frohm et al,11 given that our testing was undertaken in a busy clinical screening setting using multiple testers (n=14) with less 9+ experience (than in Frohm et al's study). However, studies on the FMS have reported good intertester reliability for testers with varying experience.22–24,26–28 It is possible that in our screening setting some of the detailed movement criteria may have been missed, although all of our testers received the same initial 9+ training, and had similar clinical and 9+ experience. On the other hand, this increases the external validity of our findings.
The SEM in this study was large, ranging from 3.0 to 3.4 points across all groups independent of injury status, indicating that the 9+ total score has a normal variation (measurement error) of 3–4 points from season to season. Furthermore, the MDC ranged from 8.3 to 9.5 points, indicating that a minimum improvement of 8–10 points is required to represent a real change in the 9+ test, again irrespective of injury and severity. Our large SEM and MDC suggest that the interseason variation in the 9+ total score is too large for the 9+ to detect change attributable to injury or clinical interventions.19,29 Moreover, the large variability in the 9+ appears to be attributable mainly to variability in player performance and chance rather than variability between testers.17,19 This view is substantiated by the difference in ICC values for the total score, which was 0.68 between testers but only 0.24 between seasons.
There are several potential sources of random error that may help explain the observed variability in the 9+, including the motivation of the player, interpretation of the test instructions by the player or a learning effect.30 Another possible explanation may be the ambiguity of the scoring criteria. The difficulty in assessing and performing the more complex tests involving multiple joints and complex physical qualities such as balance, coordination and core stability (ie, the diagonal lift, in-line lunge, one-legged squat test) makes scoring and performance uncertain, and subsequently will cause variability in athlete performance and in the scoring (tester variability).11,27
Functional movement tests, including the 9+, are growing in popularity as an injury screening tool. Our results show that there is a large variability in the 9+ total score and a change of above 8 points is necessary to represent a real change in a player's 9+ test between seasons. Practitioners should consider this when interpreting the 9+ or similar FMS scores. Our intertester reliability, using multiple testers, was moderate and practitioners are advised to perform their own reliability tests on their target population before considering the 9+ for clinical use.
The ability of the 9+ to predict injury is still unknown. However, the validity of the FMS as an injury prediction tool has been scrutinised recently, and with conflicting results.8,13,31,32 Based on the initial study by Kiesel et al33 on the FMS in professional American football players, a total score below 67% was believed to represent an increased risk of injury.33 However, a recent meta-analysis revealed that a cut-off of 67% only provided a sensitivity of 24.7% and a specificity of 85.7%, with an area under the curve of 0.58, indicating that the overall predictive validity of the FMS is only slightly better than a 50/50 chance.32
Nevertheless, based on the study by Kiesel et al, a 9+ score below 67% (22 points) has been suggested as a possible cut-off point for identifying players at increased risk of injury.11 Given our SEM of 3–4 points, a player may be considered at risk in one season and not the next season without any injury or intervention occurring. We therefore anticipate that the 9+ test will have limited value in predicting injury. Practitioners should therefore exercise caution using a 67% cut-off value when interpreting the 9+ total score as an injury screening tool. Further studies are needed to confirm (or refute) the predictive validity of the 9+ test.
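The concern raised here can be illustrated with a short sketch. The names are hypothetical; the cut-off of 22 points and the whole-sample MDC of 8.7 points are taken from the text, and treating "below 22" as the at-risk rule is an assumption about how the cut-off would be applied.

```python
CUTOFF_POINTS = 22   # 67% of the 33-point maximum, as stated in the text
MDC_POINTS = 8.7     # whole-sample minimal detectable change from the results


def flagged_at_risk(total: int) -> bool:
    """Assumed rule: a total score below the cut-off flags the player."""
    return total < CUTOFF_POINTS


def is_real_change(season1: int, season2: int, mdc: float = MDC_POINTS) -> bool:
    """A change counts as real only if it exceeds the MDC."""
    return abs(season2 - season1) > mdc


# Hypothetical player: 21 points in season 1, 24 points in season 2.
season1, season2 = 21, 24
status_changed = flagged_at_risk(season1) != flagged_at_risk(season2)
change_is_real = is_real_change(season1, season2)
```

In this hypothetical case the player moves from flagged to not flagged, yet the 3-point change is well within the measurement error, so the change in risk classification cannot be distinguished from normal variation.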
A major strength of this study was that it was undertaken in a real clinical athlete screening setting with a large group of professional male football players in one sports medicine hospital. A further strength of our study was the use of multiple testers. This provides good generalisability, but also might have influenced the intertester reliability adversely.
Limitations include that we did not record any prevention interventions occurring between the two test occasions. Also, this study was performed in a multinational and multilanguage setting. Although most of our testers spoke the same language as the players and we used pictures of the tests as described by Frohm et al,11 it is possible that players did not understand the instructions given. This may have influenced the variability in the player performance of the 9+ score.30 Finally, our study participants consisted of a homogeneous group of professional male football players in a specific setting which limits the generalisability of the findings to other sports, settings, age groups or women.
There was a substantial intraindividual variability of the 9+ total score between two consecutive seasons, irrespective of injury and severity status. A change above 8 points between seasons is necessary to represent a real change in the 9+ test. Additionally, there was a small but systematic improvement from one season to the next across all injured and uninjured groups.
What are the findings?
There was a substantial intraindividual variability of the Nine Plus screening battery test (9+) total score between two consecutive seasons, irrespective of injury and severity status.
A change above 8 points is necessary to represent a real change in the 9+ test between seasons, irrespective of injury.
There was a small but systematic improvement in the 9+ total score among injured and uninjured players.
How might it impact on clinical practice in the future?
Practitioners should consider the large intraindividual variability of the 9+, and the high minimal detectable change necessary to represent a real change in the 9+ test (irrespective of injury), when interpreting the 9+ total score or similar functional movement screen scores between seasons.
The authors would like to sincerely thank all the Aspetar staff involved in this study, especially the physiotherapists at the Rehabilitation Department and the Qatar National Sports Medicine Program (NSMP) who participated in the numerous 9+ functional movement assessments.
Contributors AB designed the study, contributed in data collection, analysed and interpreted the data, and drafted the article. RB designed the study, interpreted the data, revised the article and approved the final revision of the article. AF and RW contributed in data analysis, interpreted the data, revised the article and approved the final revision of the article. KMK, ST, TB, CE, JLT and EW interpreted the data, revised the article and approved the final revision of the article.
Competing interests KMK is Editor in Chief of BJSM and was at arm's length (and blinded) from the review process in BJSM.
Ethics approval The study has been reviewed and approved by the Institutional Review Board, Anti-Doping Laboratory Qatar (ADLQ), Doha, Qatar.
Provenance and peer review Not commissioned; externally peer reviewed.