Aim This paper aims to systematically review studies investigating the strength of association between FMS composite scores and subsequent risk of injury, taking into account both methodological quality and clinical and methodological diversity.
Design Systematic review with meta-analysis.
Data sources A systematic search of electronic databases was conducted for the period between their inception and 3 March 2016 using PubMed, Medline, Google Scholar, Scopus, Academic Search Complete, AMED (Allied and Complementary Medicine Database), CINAHL (Cumulative Index to Nursing and Allied Health Literature), Health Source and SPORTDiscus.
Eligibility criteria for selecting studies Inclusion criteria: (1) English language, (2) observational prospective cohort design, (3) original and peer-reviewed data, (4) composite FMS score, used to define exposure and non-exposure groups and (5) musculoskeletal injury, reported as the outcome. Exclusion criteria: (1) data reported in conference abstracts or non-peer-reviewed literature, including theses, and (2) studies employing cross-sectional or retrospective study designs.
Results 24 studies were appraised using the Quality of Cohort Studies assessment tool. In male military personnel, there was ‘strong’ evidence that the strength of association between FMS composite score (cut-point ≤14/21) and subsequent injury was ‘small’ (pooled risk ratio=1.47, 95% CI 1.22 to 1.77, p<0.0001, I²=57%). There was ‘moderate’ evidence to recommend against the use of FMS composite score as an injury prediction test in football (soccer). For other populations (including American football, college athletes, basketball, ice hockey, running, police and firefighters), the evidence was ‘limited’ or ‘conflicting’.
Conclusion The strength of association between FMS composite scores and subsequent injury does not support its use as an injury prediction tool.
Trial registration number PROSPERO registration number CRD42015025575.
- Sporting injuries
- Functional movement screen
- Evidence-based review
- Injury prevention
Loss of participation due to injury threatens the health benefits of physical activity,1 impedes competitive success for individuals and teams2 3 and is associated with socioeconomic costs and health burden.4 5 Screening tests that might identify modifiable intrinsic risk factors for musculoskeletal injury are appealing to applied practitioners working in sport and exercise medicine.
Recently, several performance-based6 and movement-competency-based tests7–12 for the purpose of identifying deficits in neuromuscular ability associated with elevated injury risk have been described. Of these, the Functional Movement Screen (FMS) is a movement-competency-based test in widespread clinical use13 14 and has also attracted considerable research attention.15 16 The FMS is a battery of seven movement tasks and three additional clearing tests, assessed by visual observation using standardised criteria.11 12 Recent systematic reviews report acceptable intra-rater and inter-rater reliability for composite FMS scores;15 17 however, its other measurement properties are less well established, and the use of the FMS as an injury prevention screening tool is a particular area of current debate.14
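The text does not spell out how a composite score is derived. The sketch below makes an assumption: it follows the scoring convention commonly described for the FMS. Each of the seven subtests is scored 0–3. For bilateral subtests, the lower side counts. Pain on a subtest or its linked clearing test scores 0. The sketch then dichotomises at the ≤14/21 cut-point used in the military meta-analysis, although individual studies set their own cut-points. All identifiers are hypothetical and are meant for illustration only.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

# Hypothetical identifiers for the seven FMS movement subtests.
SUBTESTS = (
    "deep_squat",
    "hurdle_step",
    "inline_lunge",
    "shoulder_mobility",
    "active_straight_leg_raise",
    "trunk_stability_push_up",
    "rotary_stability",
)


@dataclass
class SubtestResult:
    """One subtest result under the assumed 0-3 scoring convention."""

    left: int                    # left-side score, or the single score
    right: Optional[int] = None  # None for subtests scored once
    pain: bool = False           # pain on the subtest or its linked clearing test

    def __post_init__(self) -> None:
        for side in (self.left, self.right):
            if side is not None and not 0 <= side <= 3:
                raise ValueError(f"subtest scores must be 0-3, got {side}")

    def score(self) -> int:
        if self.pain:
            return 0  # assumed rule: pain overrides the observed score
        if self.right is None:
            return self.left
        return min(self.left, self.right)  # assumed rule: lower side counts


def composite_score(results: dict[str, SubtestResult]) -> int:
    """Sum of the seven subtest scores (range 0 to 21)."""
    missing = set(SUBTESTS) - set(results)
    if missing:
        raise ValueError(f"missing subtests: {sorted(missing)}")
    return sum(results[name].score() for name in SUBTESTS)


def is_exposed(composite: int, cut_point: int = 14) -> bool:
    """Exposure group: composite score at or below the study's cut-point."""
    return composite <= cut_point


# Illustrative participant with invented values.
example = {
    "deep_squat": SubtestResult(2),
    "hurdle_step": SubtestResult(3, 2),
    "inline_lunge": SubtestResult(3, 3),
    "shoulder_mobility": SubtestResult(3, 2),
    "active_straight_leg_raise": SubtestResult(2, 3),
    "trunk_stability_push_up": SubtestResult(3, pain=True),
    "rotary_stability": SubtestResult(2, 2),
}
print(composite_score(example), is_exposed(composite_score(example)))  # 13 True
```

Collapsing seven ordinal subtests into one binary exposure discards most of the information in the battery. Because the cut-point is a parameter, studies using other thresholds can be represented without changing the scoring logic.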
In a recent review, Bahr18 described three research steps in the development and validation of injury prevention screening programmes. Step 1 involves conducting prospective cohort studies to establish the strength of association between a putative risk factor and subsequent injury. Step 2 involves validation of screening test properties, and Step 3 prescribes the use of controlled studies to investigate effectiveness. Since Kiesel et al’s seminal ‘injury prediction’ study of American football players in 2007,19 many studies have investigated the relationship between dichotomised FMS composite score and injury across a variety of sports and occupational settings.
Two systematic reviews have attempted to synthesise this literature.16 20 Dorrel et al20 included seven prospective cohort studies in their 2015 review, while Bonazza et al’s16 2016 review included nine prospective studies but did not assess individual studies for risk of bias, instead pooling all studies regardless of quality. Moreover, both previous reviews aggregated data from studies with diverse participant ages, sex, occupation and sports settings and injury definitions, which may bias the conclusions or limit their interpretation.21 The conclusions of Bonazza et al16 support the injury predictive value of FMS; however, this conflicts with the earlier review of Dorrel et al,20 who concluded that the diagnostic accuracy of the FMS to predict injury was low.
Because of the emergence of several new prospective cohort studies and the specific weaknesses in the methodological approach of previous reviews,16 20 we systematically and comprehensively reviewed studies investigating the strength of association between FMS composite scores and subsequent risk of injury. We considered both methodological quality and clinical and methodological diversity.
A systematic review with meta-analysis was undertaken and reported based on the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) statement22 and MOOSE (Meta-Analysis of Observational Studies in Epidemiology) proposal for reporting.23 The study was prospectively registered with PROSPERO (CRD42015025575).
The search strategy was developed in consultation with a specialist librarian. Databases were searched from inception, and the final search was undertaken on 3 March 2016. Two reviewers (RM and JM) independently undertook the initial database search and screened search results for relevance using the article title and abstract (table 1). A composite list of all articles identified by each reviewer that included the term ‘functional movement screen*’ in the title or abstract was saved using reference management software, and duplicate database results were removed. Subsequently, two reviewers (RM and AGS) independently screened the titles and abstracts of all articles identified in the search results. On the basis of the title and abstract information, full-text articles were retrieved for any article judged by at least one reviewer to be investigating the association between FMS score and injury (figure 1). The reference lists of retrieved articles were hand-searched for additional records, and a search of the citation history of selected articles was undertaken using Scopus (Elsevier, B.V.).
Eligibility for inclusion in the review was independently assessed by two reviewers (RM and JM) after considering full-text articles and applying the following selection criteria. Inclusion criteria were the following: (1) the language used was English; (2) the study was an observational prospective cohort design; (3) the study reported original and peer-reviewed data; (4) composite FMS score was used to define exposure and non-exposure groups and (5) musculoskeletal injury was reported as the outcome. Exclusion criteria were as follows: (1) data reported in conference abstracts24 or non-peer-reviewed literature including theses and (2) studies employing cross-sectional or retrospective study designs. Differences between reviewers regarding selection eligibility were resolved by majority decision after a third reviewer (AGS) considered the full-text records and applied the selection criteria. Study characteristics were independently extracted from each article by two reviewers (RWM and JM), who subsequently met to cross-check extracted information against the original articles.
Risk of bias
An assessment of methodological quality for the selected studies was undertaken using the ‘Quality of Cohort Studies’ (Q-Coh), a tool with acceptable validity and reliability specifically developed to assess risk of bias in prospective cohort studies.25 Risk of bias was assessed across six domains: sample representativeness, comparability of groups, exposure measure, maintenance of comparability, outcome measures and attrition. Before commencing assessment, operational definitions for interpreting Q-Coh items in the context of the topic were developed and agreed by the reviewers. Two reviewers (RWM and JM) independently appraised each study before meeting to compare findings. Disagreements in the assessment of Q-Coh items between reviewers were resolved by consensus, and a third reviewer (AGS) was available to make a final decision, if necessary. Descriptors for the overall quality of each article were based on the study by Jarde et al25 and defined as ‘good’ when ≤1 domain was not satisfied, ‘acceptable’ if 2 domains were not satisfied and ‘low’ when >2 domains were not satisfied.
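The overall quality descriptor comes from counting unsatisfied Q-Coh domains. A minimal sketch of that rule follows, using hypothetical domain identifiers. It does not model the item-level Q-Coh judgements that decide whether each domain is satisfied.

```python
from __future__ import annotations

# Hypothetical identifiers for the six Q-Coh domains named in the text.
Q_COH_DOMAINS = (
    "sample_representativeness",
    "comparability_of_groups",
    "exposure_measure",
    "maintenance_of_comparability",
    "outcome_measures",
    "attrition",
)


def overall_quality(domain_satisfied: dict[str, bool]) -> str:
    """Map per-domain judgements to the overall descriptor used in the review."""
    if set(domain_satisfied) != set(Q_COH_DOMAINS):
        raise ValueError("expected a judgement for each of the six Q-Coh domains")
    unmet = sum(1 for ok in domain_satisfied.values() if not ok)
    if unmet <= 1:
        return "good"        # at most one domain not satisfied
    if unmet == 2:
        return "acceptable"  # exactly two domains not satisfied
    return "low"             # more than two domains not satisfied


judgements = dict.fromkeys(Q_COH_DOMAINS, True)
judgements["attrition"] = False
judgements["maintenance_of_comparability"] = False
print(overall_quality(judgements))  # acceptable
```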
Data analysis and synthesis
When reported, we used dichotomised FMS composite scores based on the cut-points, as defined in each study. Meta-analysis was attempted when at least two studies of ‘good’ or ‘acceptable’ methodological quality shared low methodological and clinical diversity, with sufficiently similar design, cohort characteristics (age, sex and occupation/sport) and injury definitions (see online supplementary table S1). A random-effects model, accounting for both within-study and between-study variance, was used because it was assumed that the true effect would vary between studies.26 Statistical heterogeneity was explored using Cochran’s χ² test (Cochran’s Q), with statistical significance set at p<0.1. Heterogeneity was quantified using the I² statistic and interpreted using the guidelines suggested in the Cochrane Handbook, with 0%–40% indicating that heterogeneity ‘might not be important’, 30%–60% as ‘moderate’, 50%–90% as ‘substantial’ and 75%–100% as ‘considerable’ heterogeneity.27 Review Manager (RevMan) v5.3 (The Nordic Cochrane Centre, The Cochrane Collaboration, Copenhagen, 2014) was used to undertake meta-analysis calculations.
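For readers unfamiliar with the random-effects calculation, here is a minimal sketch. It applies inverse-variance pooling to log risk ratios, uses the DerSimonian-Laird estimate of between-study variance and also reports Cochran's Q and I². The DerSimonian-Laird estimator is an assumption here. RevMan also offers Mantel-Haenszel weighting for dichotomous data, so its figures may differ slightly. The counts are invented and are not data from the included studies.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class Study:
    """2x2 counts for one cohort (exposed = composite score at or below cut-point)."""

    injured_exposed: int
    n_exposed: int
    injured_unexposed: int
    n_unexposed: int

    def log_rr(self) -> tuple[float, float]:
        """Log risk ratio and its large-sample variance."""
        a, n1 = self.injured_exposed, self.n_exposed
        c, n2 = self.injured_unexposed, self.n_unexposed
        if min(a, c) == 0:
            raise ValueError("zero cell: a continuity correction would be needed")
        return math.log((a / n1) / (c / n2)), 1 / a - 1 / n1 + 1 / c - 1 / n2


def dersimonian_laird(studies: list[Study], z: float = 1.96) -> dict[str, float]:
    """Random-effects pooled RR with Cochran's Q, tau^2 and I^2."""
    effects = [s.log_rr() for s in studies]
    y = [e for e, _ in effects]
    v = [var for _, var in effects]
    w = [1 / vi for vi in v]  # fixed-effect (inverse-variance) weights
    sum_w = sum(w)
    y_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum_w
    # Cochran's Q: weighted squared deviations from the fixed-effect estimate.
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, y))
    df = len(studies) - 1
    # DerSimonian-Laird between-study variance, truncated at zero.
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    # Random-effects weights add tau^2 to each within-study variance.
    w_re = [1 / (vi + tau2) for vi in v]
    y_re = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se_re = math.sqrt(1 / sum(w_re))
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return {
        "rr": math.exp(y_re),
        "ci_low": math.exp(y_re - z * se_re),
        "ci_high": math.exp(y_re + z * se_re),
        "q": q,
        "df": df,
        "tau2": tau2,
        "i2": i2,
    }


# Invented counts for illustration only (not the review's data).
demo = [Study(30, 100, 20, 100), Study(45, 120, 25, 110), Study(12, 60, 15, 90)]
print(dersimonian_laird(demo))
```

To get the heterogeneity p value, compare `q` with a χ² distribution on `df` degrees of freedom, for example via `scipy.stats.chi2.sf(q, df)`. That p value is then judged against the p<0.1 threshold. As a deliberate simplification, zero cells raise an error rather than triggering a continuity correction.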
When meta-analysis was not appropriate, a qualitative best evidence synthesis was undertaken.28 Consistent with other recent systematic reviews,15 29 we drew conclusions about the overall quality of evidence using criteria adapted from the study by van Tulder et al30 (table 2). For the best evidence synthesis, we operationally defined the ‘smallest worthwhile effect’ based on the lower limit of the CI: RR ≥1.1,31 or OR ≥1.5. These thresholds equate to ‘small’ magnitudes of effect.32 Some studies did not report measures of association (RR and OR) derived from dichotomised composite scores. They instead reported a significance test for differences in the composite FMS score, analysed as a continuous variable, between injured and non-injured participants. In these cases, we interpreted the absence of a statistically significant difference (at p<0.05) as evidence for the absence of an effect. Similarly, we operationally defined the smallest worthwhile effect for the area under a receiver operating characteristic curve as 0.7,33 and for a likelihood ratio as ≥2, which equates to an increase in post-test probability of approximately 15%.34
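These thresholds can be written as a small decision helper, together with the odds form of Bayes' theorem, which links a likelihood ratio to a change in probability. The function names are hypothetical. For the area under the curve and for likelihood ratios, the text does not say whether the point estimate or a CI limit is compared, so the sketch leaves that choice to the caller.

```python
# Thresholds quoted in the text; the dictionary and function names are hypothetical.
SMALLEST_WORTHWHILE = {"RR": 1.1, "OR": 1.5, "AUC": 0.7, "LR": 2.0}


def exceeds_smallest_worthwhile(metric: str, value: float) -> bool:
    """For RR and OR, pass the lower 95% CI limit, as the text specifies.

    For AUC and LR the text does not state whether the point estimate or a
    CI limit is compared, so that choice is left to the caller (an assumption).
    """
    return value >= SMALLEST_WORTHWHILE[metric]


def post_test_probability(pre_test: float, likelihood_ratio: float) -> float:
    """Bayes' theorem in odds form: post-test odds = pre-test odds x LR."""
    if not 0 < pre_test < 1:
        raise ValueError("pre-test probability must lie strictly between 0 and 1")
    post_odds = pre_test / (1 - pre_test) * likelihood_ratio
    return post_odds / (1 + post_odds)


print(exceeds_smallest_worthwhile("RR", 1.22))  # True
for p in (0.2, 0.4, 0.6):
    # Absolute change in probability produced by LR = 2.
    print(p, round(post_test_probability(p, 2.0) - p, 3))
```

With LR=2 the post-test odds double. The post-test probability, however, rises by roughly 13 to 17 percentage points for pre-test probabilities between 20% and 60%. This is the sense in which an LR of 2 corresponds to an approximately 15% change.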
Systematic database search identified 122 potential studies, which, based on the title and abstract information, appeared likely to be investigating the strength of association between FMS score and injury (figure 1). Following the removal of duplicate records and assessment of full-text articles for eligibility, 24 articles were accepted for risk of bias assessment. Two studies35 36 reported results from the same data set; therefore, findings from these studies were considered concurrently in decisions about the overall quality of evidence. The characteristics of appraised studies are shown in table 3.
Risk of bias assessment
Reviewers achieved initial agreement on 117 of 144 (81.3%) possible Q-Coh domains (κ=0.62, 95% CI 0.49 to 0.75) and achieved consensus on the remaining domains after discussion and consideration of the operational definitions. Of the 24 studies reviewed, the quality of 16 was assessed as ‘low’, 2 studies as ‘acceptable’ and 6 as ‘good’ (table 4). Figure 2 displays the proportion of studies satisfying each Q-Coh domain.
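Percent agreement and Cohen's kappa can be computed as follows; kappa corrects observed agreement for the agreement expected by chance. The text supplies only the count of 117 agreements out of 144 domain judgements, that is, 24 studies × 6 domains. It does not report the marginal totals needed to reproduce κ=0.62. The 2×2 table below is therefore invented, with rows for reviewer 1 and columns for reviewer 2, each split into domain satisfied and not satisfied. The function names are hypothetical.

```python
from __future__ import annotations


def percent_agreement(agreed: int, total: int) -> float:
    """Raw proportion of matching judgements, as a percentage."""
    return 100 * agreed / total


def cohens_kappa(table: list[list[int]]) -> float:
    """Cohen's kappa for a square table: rows = reviewer 1, columns = reviewer 2."""
    k = len(table)
    n = sum(sum(row) for row in table)
    p_observed = sum(table[i][i] for i in range(k)) / n
    row_marg = [sum(table[i]) / n for i in range(k)]
    col_marg = [sum(table[i][j] for i in range(k)) / n for j in range(k)]
    # Chance agreement: product of the two reviewers' marginal proportions.
    p_expected = sum(r * c for r, c in zip(row_marg, col_marg))
    if p_expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_observed - p_expected) / (1 - p_expected)


print(percent_agreement(117, 144))  # 81.25, reported as 81.3%
# Invented table (satisfied / not satisfied) for illustration only.
print(cohens_kappa([[20, 5], [10, 15]]))
```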
Of the eight studies appraised as being of ‘good’ or ‘acceptable’ quality, four studies involved military/police personnel and four studies were of participants in sport. Military personnel are required to complete physical tasks very different from those typically involved in sport,37 and both military and police personnel are also exposed to higher biomechanical loads associated with body-borne tactical equipment.37–39 Thus, given the differences in task requirements and operating environment between military personnel and athletes, two subgroups of studies (‘Sport’ and ‘Military/Police’) were identified for the purpose of meta-analysis. The ‘Sport’ subgroup consisted of three studies reporting on single competitive sporting codes, including football (soccer),40 running41 and American football,42 and one study of mixed codes.43 The ‘Military/Police’ subgroup comprised four studies and included elite task force police44 and military cohorts, including infantry,45 Marine Corps46 and Coast Guard.47 Clinical diversity (age, sex and sport) and methodological diversity (injury definition) were too great to conduct meta-analysis of studies in the ‘Sport’ subgroup; however, three studies of military cohorts were sufficiently similar to conduct meta-analysis in the ‘Military/Police’ subgroup (see online supplementary table S1). Data from the female cohort of Coast Guard cadets47 were not pooled with data from the male cohort in the meta-analysis on the basis that injury risk, rate and characteristics may differ between men and women.48 Meta-analysis using a random-effects model for the strength of association (RR) between dichotomised FMS composite score (cut-point 14 out of 21) and subsequent musculoskeletal injury resulted in a pooled RR=1.47 (95% CI 1.22 to 1.77, p<0.0001) with ‘moderate’ statistical heterogeneity (I²=57%); see figure 3.
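As a consistency check on the pooled estimate, the standard error on the log scale can be back-calculated from the reported 95% CI. This assumes the interval was constructed as ln RR ± 1.96 SE, the usual normal approximation:

$$
\begin{aligned}
\widehat{\mathrm{RR}} &\approx \sqrt{1.22 \times 1.77} \approx 1.47,\\
\mathrm{SE}(\ln \widehat{\mathrm{RR}}) &\approx \frac{\ln 1.77 - \ln 1.22}{2 \times 1.96} = \frac{0.5710 - 0.1989}{3.92} \approx 0.0949,\\
z &= \frac{\ln 1.47}{0.0949} \approx \frac{0.3853}{0.0949} \approx 4.06.
\end{aligned}
$$

A z value of about 4.06 corresponds to a two-sided p of roughly 5 × 10⁻⁵, in line with the reported p<0.0001. The point estimate sits at the geometric midpoint of the interval, as expected for a ratio measure.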
Best evidence synthesis
Results of the best evidence synthesis are displayed in table 5. Because of the low number of studies, the level of evidence was ‘limited’ for police, firefighters, female military, middle-distance and long-distance running, ice hockey, basketball and multiple high-school sports. There was ‘conflicting’ evidence for American football based on one good-quality study that is not in favour of an association that exceeds the smallest worthwhile effect and two low-quality studies in favour of at least a ‘small’ effect. Considering collegiate-level athletes in a variety of sports, there was ‘conflicting’ evidence based on one good-quality study and two low-quality studies not in favour of an association and two low-quality studies in favour of an association that exceeds the smallest worthwhile effect. For football (soccer), there was ‘moderate’ evidence not in favour of an association based on consistent findings in one good-quality study and three low-quality studies. For male military personnel, there was ‘strong’ evidence in favour of an association that was ‘small’ in magnitude31 32 based on three good-quality studies using the pooled effect from meta-analysis (figure 3).
Our findings indicate that the strength of association between FMS composite scores and injury is not sufficient to support use as an injury prediction tool. With the exception of male military personnel, where there was ‘strong’ evidence of a small association, the overall level of evidence was ‘limited’ or ‘conflicting’ for a wide range of athletic populations, including running, ice hockey, collegiate and high school sport and professional or collegiate American football. In football (soccer), the magnitude of effect was ‘unclear’, and there was ‘moderate’ evidence to recommend against the use of FMS composite scores for the purpose of injury prediction. Regardless of the level of evidence or the sport studied, the true magnitude of association for any population studied was not greater than ‘small’.
Approach to the problem: diagnostic accuracy or strength of association?
The utility of a diagnostic screening tool is predicated on the strength of association between the risk factor (ie, movement competency) and the outcome of interest (injury). If the strength of association is weak or unclear, then clinical utility will inevitably be poor; therefore, establishing the strength of association between risk factor and outcome in exploratory studies using prospective cohort designs is a fundamental first step.18 If well-controlled prospective cohort studies demonstrate sufficiently strong estimates of the strength of association between risk factor and outcome, then further studies designed to investigate diagnostic test properties (ie, likelihood ratios) can be undertaken.18
In reviewing existing studies investigating the relationship between FMS and subsequent injury, it is apparent that the literature does not align neatly into either exploratory studies or diagnostic utility studies. This presents a dilemma for the design of systematic reviews because primary studies were designed, analysed and reported using the conventions of observational cohort studies, diagnostic accuracy studies or a combination of both. Fundamentally, the quality of studies reporting diagnostic accuracy metrics for predicting sports injury from baseline predictors depends on the principles of robust prospective cohort design because, in this context, the ‘reference test’ is an injury event that has not occurred at the time the index test (FMS) is administered. This differs from the conventional application of diagnostic accuracy, where the reference and index tests are administered in close temporal proximity and there is no need to control for the potential confounding effects that arise when the index test (FMS) and reference ‘test’ (injury event) are separated by one or more sporting seasons. Therefore, rather than applying a diagnostic accuracy framework such as QUADAS (Quality Assessment of Diagnostic Accuracy Studies),49 we appraised all studies on the basis of the strength of association between FMS and subsequent injury using Q-Coh,25 an appraisal tool specifically designed to assess risk of bias in prospective observational cohort studies.
Comparison with other studies
Two recent systematic reviews that investigated the relationship between FMS composite scores and injury risk drew contradictory conclusions.16 20 Our findings align with those of Dorrel et al,20 who, based on critical appraisal of seven studies using a diagnostic accuracy framework (QUADAS), concluded that the diagnostic accuracy of the FMS to predict injury was low. Bonazza et al16 reported the findings of a systematic review and meta-analysis of nine studies for injury predictive value and concluded that composite scores ≤14/21 were associated with elevated odds of sustaining an injury (pooled OR=2.74, 95% CI 1.70 to 4.43).
In reconciling our findings with those of Bonazza et al,16 two important differences in methodological approach need to be considered. First, unlike Bonazza et al,16 who pooled results from all studies without consideration of clinical or methodological diversity, we systematically considered the appropriateness of pooling data in an attempt to avoid combining data from studies with obvious clinical diversity in terms of population characteristics (age, sex and sport/occupation) and injury definitions. The use of differing injury definitions between studies is a well-known confounder in sports injury prevention research;50 thus, for meta-analysis, we pooled only studies that used similar injury definitions. Similarly, we avoided pooling studies with marked differences in cohort characteristics, including sex, age and sport, on the basis that intrinsic injury risks are likely to differ by age, sex and exposure to different physical demands in different sports. Second, unlike Bonazza et al,16 who did not undertake appraisal of methodological quality and included all studies in their meta-analysis, we systematically assessed risk of bias for all eligible studies and incorporated methodological quality into decisions about the overall level of evidence.
Methodological issues in the studies reviewed
Consistent with a previous systematic review of rater reliability for FMS composite scores that noted poor quality of study reporting,15 we also observed deficits in reporting quality, with essential study characteristics such as participant age and loss to follow-up not reported in some studies. Several studies also lacked precision in reporting the duration of injury surveillance, which was often limited to descriptions such as ‘one season’. Despite the wide availability of consensus statements for injury definitions in many sports,51–55 several studies failed to adequately define injury.36 44 56 57 Such a fundamental omission is surprising, given that definition of injury is a critical and well-documented methodological issue in sports injury research and can impact on the interpretation of both individual studies and the synthesis of literature.50 58
When considering injury causation related to modifiable risk factors, the temporal relationship between a putative risk factor such as movement competency and injury occurrence needs to be considered. As the interval between baseline measurement and the time of injury extends, there may be greater exposure to confounding effects that are not controlled in the study design. This issue is less pertinent for shorter surveillance periods, such as a single preseason training period, but over the course of a full competitive season, the relationship between injury events and baseline risk factors is more vulnerable to confounding.
An inherent assumption in the design of many of the studies reviewed here is that the strength of the intrinsic risk factor (represented here by the FMS composite score) remains stable over time. However, this design does not account for changes in risk that may occur over time (both within and between participants) in response to factors such as training, competition and match exposure, subclinical adaptations to tissue loading and neuromuscular function. Although some studies addressed this issue (see table 4 ‘Maintenance of comparability’),41 43 45 46 59 failing to account for these potential confounding factors by either design or statistical analysis leaves unaddressed the recursive, dynamic elements of injury aetiology described in classical60 and emerging aetiological models.61 Simply put, movement competency, as measured by FMS, may change over the course of a season such that the level of movement competency at injury onset differs from that recorded at baseline, thus confounding the association. To address this issue, repeated administration of measures in injury prediction studies has been proposed,62 although to date, very few prospective injury prediction studies have undertaken repeated administration of measures for key predictor variables, and all of the studies reviewed here employed a single assessment of movement competency by FMS at baseline.
Previous work has demonstrated that FMS scores may change following the prescription of corrective exercise over periods of 4 weeks63 to 8 weeks.64 For studies undertaking injury surveillance over shorter periods (eg, 6–10 weeks),46 65 the threat of bias arising from temporal instability of FMS scores is probably low. Given the potential for intrinsic risk factors to change in response to training and competition exposures, it seems prudent for investigators to carefully evaluate the potential for repeated administration, particularly where monitoring is planned over a prolonged period. Clearly, investigators need to make pragmatic decisions related to logistic and resource constraints, and repeated administration of measures for intrinsic risk factors may not be feasible, particularly when research is embedded within pre-existing clinical practice, as was the case in many of the studies reviewed here. Notwithstanding these practical constraints, investigators not able to account for confounding through design should at least acknowledge these limitations in discussion and consider the likely impact on study conclusions.66
Although employed in all studies reviewed here, the use of a single composite score is problematic from several perspectives. First, several studies indicate that the factor structure of the FMS battery is unlikely to be unidimensional; thus, interpretation of a single composite score may not be valid.67–71 Second, the apparent research interest in FMS composite scores for injury risk is not commensurate with the minimal attention afforded to composite scores by the FMS developers. Cook et al11 12 72 have largely focused on clinical interpretation based on (1) identification of pain associated with each subtest, (2) the presence of left–right asymmetrical scoring and (3) identification of poor movement competency on each subtest (as defined by a score of ‘1’ using the FMS scoring criteria). The FMS appears to have been conceived in an attempt to develop a standardised and systematic approach to assessing basic movement patterns, with the goal of informing clinical decision making based on the interpretation of each movement subtest in the context of other clinically relevant information.11 12 72 Notwithstanding the use of the word ‘screen’ in the test name, this use of the FMS battery contrasts markedly with ‘screening’ in the conventional sense of preparticipation health screening.73
The now-substantial number of studies that have attempted to quantify the risk of future injury, based exclusively on the outcome of a single preparticipation administration of FMS, share two notable limitations. First, an unfortunately high number of studies reviewed here failed to accommodate existing multicausal models of injury aetiology in developing research hypotheses. The premise that a single preseason administration of a field-based test of one intrinsic risk factor (movement competency) is likely to have good utility as a predictor of future injury may constitute causal oversimplification. This is especially apparent when considered in light of emerging injury aetiology models employing complex systems approaches.74 75 Second, in so far as the FMS battery might provide possible injury predictor variables for inclusion in multivariate or complex prediction models, there are several possible categorical indices that may be derived from the FMS, in addition to composite score, that have attracted only sparse research attention to date.76 For example, indices of pain provocation (eg, proportion of movement subtests on which pain was reported) on active movement subtests,76 77 scoring discrepancies between left and right or indices representing patterns (ie, specific subtests) of poor movement competency could be explored further as possible predictor variables. This work could commence at an exploratory level through secondary analysis of existing data sets from studies of good methodological quality.
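The paragraph above mentions alternative indices. A minimal sketch follows that derives three of them from a single administration: a pain-provocation proportion, the set of left–right asymmetrical subtests and the set of subtests with a final score of ‘1’. The input layout and names are assumptions made for illustration. So is the 0–3 scoring convention, in which pain scores 0 and the lower side counts. This is not a validated scoring scheme.

```python
from __future__ import annotations


def final_score(left: int, right: int | None, pain: bool) -> int:
    """Assumed convention: pain scores 0, otherwise the lower side counts."""
    if pain:
        return 0
    return left if right is None else min(left, right)


def fms_indices(results: dict[str, tuple[int, int | None, bool]]) -> dict[str, object]:
    """Candidate categorical predictors beyond the composite score.

    `results` maps a (hypothetical) subtest name to (left, right or None, pain).
    """
    if not results:
        raise ValueError("no subtest results supplied")
    pain_proportion = sum(1 for _, _, pain in results.values() if pain) / len(results)
    asymmetric = sorted(
        name
        for name, (left, right, _) in results.items()
        if right is not None and left != right
    )
    poor_competency = sorted(
        name for name, r in results.items() if final_score(*r) == 1
    )
    return {
        "pain_proportion": pain_proportion,
        "asymmetric_subtests": asymmetric,
        "poor_competency_subtests": poor_competency,
    }


# Invented values for one participant.
example = {
    "deep_squat": (2, None, False),
    "hurdle_step": (2, 3, False),
    "inline_lunge": (3, 3, False),
    "shoulder_mobility": (3, 1, False),
    "active_straight_leg_raise": (2, 2, False),
    "trunk_stability_push_up": (3, None, True),
    "rotary_stability": (1, 1, False),
}
print(fms_indices(example))
```

Unlike the single composite, these indices keep the pain, asymmetry and subtest-pattern information that Cook et al emphasise. Each could enter a multivariate model as its own predictor.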
There was ‘moderate’ evidence to recommend against the use of FMS composite scores as an injury prediction test in football (soccer). For other sports studied (table 5), the evidence was ‘limited’ or ‘conflicting’. In male military personnel, there was ‘strong’ evidence that the strength of association between composite score and subsequent injury is ‘small’. The findings of this study should be interpreted in accord with the scope of the review, which relates only to the strength of association between FMS composite score and subsequent injury. Beyond injury prediction, the use of FMS as a standardised movement test battery that can be reliably administered in the field by practitioners with limited previous experience15 17 may usefully inform applied practice if test limitations are acknowledged and findings are interpreted judiciously alongside other relevant clinical information.78 79
Given the complexity of injury aetiology, investigators who seek to model the risk of future injury should apply multivariate analysis, and predictor variables such as ‘movement competency’ (or similarly named constructs) need to be justified on a stronger theoretical basis. The theoretical construct addressed by the FMS, labelled as either ‘movement competency’11 or ‘movement quality’,80 81 has undergone limited scholarly development, and its relationship with similar conceptual constructs, such as physical literacy, requires explication.82
It is possible that other studies satisfying the eligibility criteria exist but were not identified. We consider this unlikely; moreover, to substantially alter the conclusions regarding the level of evidence for the sports reported here, multiple unidentified high-quality studies with consistent findings would need to exist. The exclusion of grey literature from systematic reviews can raise the risk of publication bias, although the studies reviewed here included both positive and negative findings, suggesting that this risk was probably minimal. The methodological appraisal of studies in this review was conducted using the Q-Coh, a new tool not yet in widespread use but developed specifically for application to prospective observational cohort studies in response to limitations identified in other tools.25 66 The selection of critical appraisal tools in systematic reviews may impact on review conclusions;83 84 however, given the weak magnitude of association reported in the eligible studies, we consider it unlikely that differences in quality appraisal attributable to the use of a different appraisal tool would substantially alter the overall conclusions.
In summary, the level of evidence for the strength of association between FMS composite scores and subsequent injury is not sufficient to support the use of FMS composite score as an injury prediction tool.
What is already known?
The Functional Movement Screen (FMS) is widely used by clinicians as part of pre-participation evaluation.
Systematic reviews report acceptable intra-rater and inter-rater reliability for composite FMS scores, but what are its other clinimetric properties?
What are the new findings?
The strength of association between FMS composite scores and subsequent injury was not sufficient to recommend use as an injury prediction tool in the sports reviewed.
In male military personnel, there was ‘strong’ evidence that the strength of association between composite score (cut-point ≤14/21) and subsequent injury was ‘small’.
There was ‘moderate’ evidence to recommend against the use of FMS composite scores as an injury prediction test in football (soccer).
The authors thank Cathy O’Brien for her assistance with developing the database search strategy.
Contributors RM conceived the idea for the study. RM and JM undertook the literature search, and RM and AGS screened search results. RM and JM determined eligibility for inclusion and appraised the articles. AGS made the final appraisal decision when RM and JM did not agree. RM drafted the manuscript, and AGS and JS reviewed it critically for intellectual content. All authors approved the final version. RM submitted the article.
Competing interests None declared.
Ethics approval Exempt
Provenance and peer review Not commissioned; externally peer reviewed.
Data sharing statement All data supporting this study are provided as supplementary information accompanying this paper.