
Measurement properties for muscle strength tests following anterior cruciate ligament and/or meniscus injury: What tests to use and where do we need to go? A systematic review with meta-analyses for the OPTIKNEE consensus
  1. Anouk P Urhausen1,
  2. Bjørnar Berg2,3,
  3. Britt Elin Øiestad3,4,
  4. Jackie L Whittaker5,6,
  5. Adam G Culvenor7,
  6. Kay M Crossley7,
  7. Carsten B Juhl8,9,
  8. May Arna Risberg1,2
  1. 1 Department of Sports Medicine, Norwegian School of Sport Sciences, Oslo, Norway
  2. 2 Division of Orthopaedic Surgery, Oslo University Hospital, Oslo, Norway
  3. 3 Centre for Intelligent Musculoskeletal Health, Faculty of Health Sciences, Oslo Metropolitan University, Oslo, Norway
  4. 4 Department of Rehabilitation Science and Health Technology, Oslo Metropolitan University, Oslo, Norway
  5. 5 Department of Physical Therapy, Faculty of Medicine, University of British Columbia, Vancouver, British Columbia, Canada
  6. 6 Arthritis Research Canada, Vancouver, British Columbia, Canada
  7. 7 La Trobe Sport and Exercise Medicine Research Centre, School of Allied Health, Human Services and Sport, La Trobe University, Bundoora, Victoria, Australia
  8. 8 Department of Physiotherapy and Occupational Therapy, Copenhagen University Hospital, Herlev and Gentofte, Copenhagen, Denmark
  9. 9 Department of Sports Science and Clinical Biomechanics, University of Southern Denmark, Odense, Denmark
  1. Correspondence to Anouk P Urhausen, Department of Sports Medicine, Norwegian School of Sport Sciences, Oslo 0806, Norway; anouku{at}nih.no

Abstract

Objectives Critically appraise and summarise the measurement properties of knee muscle strength tests after anterior cruciate ligament (ACL) and/or meniscus injury using the COnsensus-based Standards for the selection of health Measurement INstruments Risk of Bias checklist.

Design Systematic review with meta-analyses. The modified Grading of Recommendations Assessment, Development and Evaluation-guided assessment of evidence quality.

Data sources Medline, Embase, CINAHL and SPORTDiscus searched from inception to 5 May 2022.

Eligibility criteria for selecting studies Studies evaluating knee extensor or flexor strength test reliability, measurement error, validity, responsiveness or interpretability in individuals with ACL and/or meniscus injuries with a mean injury age of ≤30 years.

Results Thirty-six studies were included involving 31 different muscle strength tests (mode and equipment) in individuals following an ACL injury and/or an isolated meniscus injury. Strength tests were assessed for reliability (n=8), measurement error (n=7), construct validity (n=27) and criterion validity (n=7). Isokinetic concentric extensor and flexor strength tests were the best rated with sufficient intrarater reliability (very low evidence quality) and construct validity (moderate evidence quality). Isotonic extensor and flexor strength tests showed sufficient criterion validity, while isometric extensor strength tests had insufficient construct and criterion validity (high evidence quality).

Conclusion Knee extensor and flexor strength tests of individuals with ACL and/or meniscus injury lack evidence supporting their measurement properties. There is an urgent need for high-quality studies on these measurement properties. Until then, isokinetic concentric strength tests are most recommended, with isotonic strength tests a good alternative.

  • anterior cruciate ligament
  • meniscus
  • knee
  • reliability
  • validity


What is already known?

  • As knee extensor and flexor strength deficits are common following anterior cruciate ligament (ACL) and/or meniscus injuries, muscle strength testing is an important component of a clinical examination.

  • Isokinetic computerised dynamometry is considered the gold standard to assess strength, yet handheld dynamometry (HHD) and conventional weight machines are more often available in clinical settings.

  • Evidence synthesis for knee extensor and flexor strength tests after ACL and/or meniscus injuries is lacking, making it difficult to identify which tests should be used in clinical settings and which knowledge gaps about their measurement properties should be addressed in future high-quality studies.

  • There is a lack of consensus about which strength tests (modes, equipment and variables reported) have the best measurement properties and are most clinically applicable.

What are the new findings?

  • Studies evaluating measurement properties for different muscle strength tests following ACL and/or meniscus injuries include a large variety of modes, equipment, and variables reported, and high-quality studies on measurement properties are scarce.

  • Isokinetic concentric strength tests are currently the most recommended to assess strength deficits in individuals with an ACL injury, displaying sufficient intrarater reliability, construct validity and criterion validity.

  • Conventional isotonic weight machines testing one-repetition maximum (1RM) might be a good alternative to computerised isokinetic dynamometry when assessing knee extensor or flexor strength in a clinical setting.

  • Isometric strength tests using HHD offer sufficient intrarater reliability when consecutive contractions within one session are performed in a standardised seated position. The outcomes from isometric strength tests should not be used interchangeably with outcomes from isokinetic concentric strength tests.

Introduction

Muscle weakness and dysfunction are a common concern following anterior cruciate ligament (ACL) and/or meniscus injury and can persist for many years.1–4 Muscle strength deficits are accompanied by neural and morphological alterations and changes in muscle control including timing of muscle activation.3 Knee extensor and flexor muscle strength are important parts of an individual’s functional capacity, contributing significantly to lower limb biomechanics, performance and daily life activities.5 Reduced knee extensor strength is also an important modifiable risk factor for development of knee osteoarthritis.6 7 Assessing and monitoring knee extensor and flexor muscle strength is, therefore, important in individuals following ACL and/or meniscus injuries.8

Knee extensor and flexor muscle strength tests are widely used to evaluate treatment effects and to quantify and monitor changes after surgery and rehabilitation.9 The psychometric measurement properties of strength tests are essential to interpret test results for clinical practice and research purposes. Applying strength tests with insufficient measurement properties can increase risk of biased outcomes. Currently, we lack information about the reliability, validity and responsiveness of these tests to guide clinical decisions and future research.10

Isokinetic muscle strength tests have been used for decades to assess knee extensor and flexor muscle strength in athletic and non-athletic populations.11 12 Hence, isokinetic dynamometry is considered the gold standard to assess strength impairments after ACL and/or meniscus injuries.13 14 Handheld dynamometry (HHD) provides a clinically applicable and less costly alternative for isometric testing. Conventional weight machines assessing isotonic strength are another cost-effective and user-friendly alternative to high-cost isokinetic machines particularly popular with clinicians.15 16 Many different types of muscle strength tests are used without any consensus on type, including modes, equipment or reported variables.

The COnsensus-Based Standards for the Selection of Health Measurement Instruments (COSMIN) initiative developed a consensus statement on definitions of measurement properties17 and established guidelines18 19 to conduct systematic reviews of various types of measurement instruments. To our knowledge, there is a lack of evidence synthesis on quality of evidence of measurement properties for knee muscle strength tests in individuals who have had an ACL and/or meniscus injury to inform researchers and clinicians. The aim of this systematic review was to critically appraise and summarise the measurement properties of knee muscle strength tests after ACL and/or meniscus injury using the COSMIN Risk of Bias checklist. This systematic review is one of several systematic reviews aimed at developing evidence-based consensus recommendations for rehabilitation to optimise musculoskeletal health and prevent post-traumatic osteoarthritis following knee trauma (OPTIKNEE; https://bit.ly/OPTIKNEE).

Materials and methods

The current systematic review was conducted using the COSMIN guidelines18 19 and reported according to the Preferred Reporting Items for Systematic reviews and Meta-Analyses guidelines.20 The study protocol was registered on the Open Science Framework (https://osf.io/bkhr5/).

Eligibility criteria

Studies were eligible for inclusion when meeting the following four criteria, in line with the COSMIN guidelines21 22:

  1. Population: individuals diagnosed with an isolated ACL tear, isolated meniscus injury or an ACL tear with concomitant meniscus injury with a mean injury age ≤30 years, regardless of study settings.

  2. Construct: the test was a measure of knee extensor or flexor strength, defined as ‘function related to the force generated by the contraction of a muscle or muscle groups’ based on the International Classification of Functioning, Disability and Health (ICF)23 function domain.

  3. Instrument: the instrument was a standardised muscle strength test. Studies were excluded if the strength test was used just as an outcome measure, to validate another instrument, or to predict or discriminate outcomes (ie, predictive and known-group validity). Studies investigating a multidomain test battery not assessing strength separately were excluded.

  4. Measurement properties: the study reported at least one measurement property (ie, reliability, internal consistency, measurement error, content validity, construct validity, criterion validity, responsiveness) or interpretability17 of a knee extensor or flexor strength test.

Studies were excluded if they were not available in full-text, published in languages other than English, or published prior to year 2000. The rationale for restrictions on publication year was to exclude studies describing outdated strength tests or evaluating ACL-injured individuals using interventions no longer used in clinical practices. Studies testing a hypothesis for construct validity were included only if construct validity was a primary objective, and if the comparator instrument measured neural activity (notably spinal or corticomotor excitability as per body structure related to movement in the ICF domain)23 or dynamic and/or knee-specific functional performance (most often activities in the ICF domain).23 24

Study selection and data extraction

The search strategy was developed in collaboration with a senior librarian (University of Oslo, library of medicine and science), who performed the search on 16 June 2020 and repeated it on 7 July 2021 and on 5 May 2022. The electronic databases searched were MEDLINE (Ovid), EMBASE (Ovid), CINAHL (EBSCO) and SPORTDiscus (EBSCO). We adapted a highly sensitive validated search filter for identifying studies on measurement properties (online supplemental appendix 1).21 Exclusion filters were used to exclude randomised controlled trials or systematic reviews; no filters were set on language or publication date (studies not published in English or published prior to 2000 were excluded through screening). The reference lists of the included full-text articles were hand searched by two authors (APU and BB), independently.


Identified publications were imported into EndNote software (X V.9.3.3, Clarivate Analytics) and duplicates removed. Two authors (APU and BB) independently screened titles and abstracts using the Rayyan application,25 then reviewed the full texts for eligibility. Two senior authors (BEØ and MAR) were consulted as needed to resolve discrepancies by consensus.

Data were extracted by two authors independently (APU and BB) using a data extraction form. The following data items were included: (1) patient sample characteristics, (2) equipment, (3) test setting and quantification measure and (4) measurement properties.

Risk of bias assessment

Two authors (APU and BB) independently assessed included studies for methodological quality using the COSMIN checklist26 and the extended version19 for studies on reliability and measurement error. Each study evaluating a measurement property was rated on a 4-point scale (very good, adequate, doubtful and inadequate) based on standards specific to each measurement property. The overall rating was determined using the ‘worst score counts’ principle.19 26 Studies on construct validity assessing multiple comparator instruments were rated separately for each instrument; if the methodological quality was identical for all instruments, the study received one conjunct risk of bias rating.22
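The ‘worst score counts’ principle described above can be illustrated with a minimal Python sketch. It is not the COSMIN checklist itself: the function name and the example ratings are hypothetical, and only the 4-point scale and the rule that the lowest rating determines the overall rating come from the text.

```python
# Ratings ordered from best to worst, following the 4-point scale in the text.
RATING_ORDER = ["very good", "adequate", "doubtful", "inadequate"]


def overall_rating(standard_ratings):
    """Return the worst rating among the ratings given to each standard.

    `standard_ratings` is a list of strings taken from RATING_ORDER.
    """
    if not standard_ratings:
        raise ValueError("at least one rating is required")
    # The rating with the highest index in RATING_ORDER is the worst one.
    return max(standard_ratings, key=RATING_ORDER.index)


# Hypothetical ratings for the standards of one reliability study.
print(overall_rating(["very good", "adequate", "doubtful", "very good"]))
```

A single doubtful standard therefore caps the study at doubtful, regardless of how many standards were rated very good.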

Data synthesis and analysis

To assess the quality of evidence for the measurement properties, the result of each single study was rated as either sufficient (+), indeterminate (?) or insufficient (−) according to the criteria for good measurement properties (table 1).18 To obtain an overall result for each strength test by measurement property, the results of single studies were quantitatively pooled, if possible, or qualitatively summarised.22 Pooled results were rated as sufficient (+), insufficient (−), indeterminate (?) or inconsistent (±) according to the same criteria for good measurement properties.18 Summarised results were rated as sufficient (or insufficient) if at least 75% of the results were sufficient (or insufficient).18 If the results for a study were all indeterminate, the overall rating was indeterminate. If less than 75% of the results were sufficient or insufficient, the results were inconsistent. Explanations for inconsistency were explored by categorising subgroups based on comparable characteristics (eg, timepoint after injury or surgery).
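As an illustration of the 75% rule for qualitatively summarised results, the following Python sketch combines single-study ratings into an overall rating. It rests on assumptions: the function name is hypothetical, ASCII "-" stands in for the insufficient sign, and indeterminate results are kept in the denominator, which the text does not specify.

```python
def summarise_ratings(results, threshold=0.75):
    """Summarise single-study ratings into one overall rating.

    Symbols: '+' sufficient, '-' insufficient, '?' indeterminate.
    Returns '+', '-', '?' or '±' (inconsistent).
    Assumption: indeterminate results count towards the denominator.
    """
    if not results:
        raise ValueError("at least one result is required")
    # All results indeterminate: the overall rating is indeterminate.
    if all(r == "?" for r in results):
        return "?"
    n = len(results)
    if results.count("+") / n >= threshold:
        return "+"
    if results.count("-") / n >= threshold:
        return "-"
    # Neither direction reaches 75%: the results are inconsistent.
    return "±"


# Hypothetical example: three of four results are sufficient (exactly 75%).
print(summarise_ratings(["+", "+", "+", "-"]))
```

An inconsistent rating would then prompt the subgroup exploration described above, for example by timepoint after injury or surgery.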

Table 1

Criteria for rating good measurement properties

For construct validity, we considered neural activity, functional performance tests and four knee-specific patient-reported outcomes27–30 as comparator instruments. Only studies documenting Pearson or Spearman correlation coefficients were included in construct validity data synthesis.22 A priori hypotheses about the expected relationship between the strength tests and comparator instruments were formulated based on the generic hypotheses suggested by the COSMIN initiative, experiences of the review team and previous literature (online supplemental appendix 2).22 31 We expected correlations ≥0.50 (knee extensors) and ≥0.40 (knee flexors) with neural activity and jump or hop tests, and correlations of 0.30–0.50 with running tests, dynamic balance tests and patient-reported outcomes. The rationale was that (1) neural activity is an essential attribute of muscle strength,3 (2) jump or hop tests are highly demanding on strength, thus likely to measure similar constructs to strength tests32 33 and, finally, (3) running tests, dynamic balance tests and patient-reported outcomes are multidimensional and therefore measure related but dissimilar constructs.1 34 To assess criterion validity, we designated isokinetic concentric strength tests at 60°/s using computerised dynamometry as the gold standard.13 15 Results regarding isokinetic speed were presented as slow speed (ie, 60°/s, 90°/s and 120°/s) and high speed (ie, 180°/s and 300°/s).35
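The a priori hypotheses above can be read as simple threshold checks on a reported correlation coefficient. The Python sketch below is one possible reading, not the review team's procedure: the category labels and function name are hypothetical, the bounds are treated as inclusive, and the coefficient is compared as reported (the text does not state how negative correlations were handled).

```python
# Comparators expected to correlate strongly with strength tests.
STRONG_COMPARATORS = {"neural activity", "hop test"}
# Comparators expected to correlate moderately (0.30 to 0.50).
MODERATE_COMPARATORS = {
    "running test",
    "dynamic balance test",
    "patient-reported outcome",
}


def hypothesis_confirmed(r, muscle_group, comparator):
    """Check a correlation coefficient against the expected range.

    muscle_group: 'extensor' or 'flexor'.
    Assumption: bounds are inclusive and r is used with its reported sign.
    """
    if muscle_group not in ("extensor", "flexor"):
        raise ValueError("muscle_group must be 'extensor' or 'flexor'")
    if comparator in STRONG_COMPARATORS:
        threshold = 0.50 if muscle_group == "extensor" else 0.40
        return r >= threshold
    if comparator in MODERATE_COMPARATORS:
        return 0.30 <= r <= 0.50
    raise ValueError(f"unknown comparator category: {comparator}")


# Hypothetical coefficients, for illustration only.
print(hypothesis_confirmed(0.45, "flexor", "hop test"))
print(hypothesis_confirmed(0.45, "extensor", "hop test"))
```

The same coefficient can confirm the hypothesis for the knee flexors and refute it for the knee extensors, because the expected thresholds differ by muscle group.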

Quantitative synthesis

Meta-analyses and forest plots were performed in R V.4.1.0 (R Core Team, Austria) using the meta and forestplot packages. We grouped the comparator instruments used for construct validity into five categories: neural activity, hop tests, running tests, dynamic balance tests and patient-reported outcomes. To perform meta-analyses for construct validity, the correlation coefficients were quantitatively pooled if two or more results were available for one comparator instrument category. Dependent aggregate effect sizes were calculated for studies providing multiple outcomes, that is, testing strength at various speeds or different timepoints or performing several tests for one comparator instrument.36 37 We calculated weighted mean Pearson correlation coefficients and 95% CIs. The analysis was performed using Fisher’s Z-scores, which were back-transformed to the pooled correlation coefficients.38 We used a random-effects model to account for between-study heterogeneity. The I² statistic was calculated to quantify the dispersion in the pooled estimates.39 I² benchmarks were set as not important (0–40%), moderate (30–60%), substantial (50–90%) and considerable (75–100%) heterogeneity.40
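The pooling steps can be sketched in a few lines of Python. This is a stand-in for illustration, not the R code used in the review: the function name and input values are hypothetical, the between-study variance uses the DerSimonian-Laird estimator (an assumption, as the text does not name the estimator), and the aggregation of dependent effect sizes is not covered.

```python
import math


def pool_correlations(rs, ns, z_crit=1.96):
    """Random-effects pooling of correlations via Fisher's z.

    rs: correlation coefficients, ns: matching sample sizes (each > 3).
    Assumption: DerSimonian-Laird estimate of between-study variance.
    """
    if len(rs) != len(ns) or len(rs) < 2:
        raise ValueError("need two or more correlations with sample sizes")
    zs = [math.atanh(r) for r in rs]      # Fisher's z transformation
    vs = [1.0 / (n - 3) for n in ns]      # sampling variance of each z
    w = [1.0 / v for v in vs]             # inverse-variance weights
    z_fixed = sum(wi * zi for wi, zi in zip(w, zs)) / sum(w)
    # Cochran's Q and the DerSimonian-Laird between-study variance.
    q = sum(wi * (zi - z_fixed) ** 2 for wi, zi in zip(w, zs))
    df = len(rs) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)
    # Random-effects weights, pooled z and its standard error.
    w_re = [1.0 / (v + tau2) for v in vs]
    z_re = sum(wi * zi for wi, zi in zip(w_re, zs)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return {
        "r": math.tanh(z_re),             # back-transformed pooled r
        "ci_low": math.tanh(z_re - z_crit * se),
        "ci_high": math.tanh(z_re + z_crit * se),
        "i2": i2,
        "tau2": tau2,
    }


# Hypothetical correlations and sample sizes from three studies.
result = pool_correlations([0.55, 0.70, 0.40], [40, 60, 35])
print({key: round(value, 3) for key, value in result.items()})
```

Working on the z scale keeps the confidence interval inside the range −1 to 1 after back-transformation, which is the usual reason for pooling Fisher's z rather than raw coefficients.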

Qualitative synthesis

If quantitative pooling was not possible, the results were qualitatively summarised. For reliability, the point estimate of the intraclass correlation coefficient was used to rate reliability.41 Potential sources of variation were extracted from the studies on reliability and measurement error; comprehensive research questions were formulated to inform on the quality of the results.41 The number of confirmed hypotheses was counted across studies to evaluate construct validity. The correlation coefficient was used to rate criterion validity.41

Grading the quality of evidence

The evidence quality was graded for each measurement property by strength test. As recommended by the COSMIN guidelines,18 we applied the modified Grading of Recommendations Assessment, Development and Evaluation (GRADE) to grade the quality of evidence as high, moderate, low or very low. Four factors were considered to grade the quality of evidence: (1) risk of bias (ie, methodological quality of studies using the COSMIN checklist), (2) inconsistency (ie, unexplained inconsistency of results), (3) imprecision (ie, total sample size of available studies) and (4) indirectness (ie, evidence from different populations).18 Publication bias was not assessed.18 The quality of evidence was downgraded when there were concerns regarding one of the four factors. No grading was given in cases where the overall result was indeterminate or inconsistent without explanation for inconsistency.18

Protocol deviation

We initially intended to search the CENTRAL database. When finalising the search strategy, we decided to exclude systematic reviews and randomised trials to reduce the number of irrelevant search results. Accordingly, the CENTRAL database was not searched.

Results

Study selection and characteristics

The searches resulted in 3533 studies after removal of duplicates (figure 1). After screening 110 full texts, 74 studies were excluded (online supplemental appendix 3), of which six were published prior to 2000. Five42–46 of those six studies investigated construct validity and one47 reliability and measurement error. Finally, we included 36 studies.

Figure 1

PRISMA flow diagram of study selection. ACL, anterior cruciate ligament; PRISMA, Preferred Reporting Items for Systematic reviews and Meta-Analyses.

Twenty-eight studies included ACL-reconstructed individuals,48–75 three assessed ACL-injured individuals,76–78 four79–82 involved individuals before and after ACL reconstruction and one study83 assessed individuals with an isolated partial meniscectomy. Further details about the included studies and strength tests are provided in table 2.

Table 2

Characteristics of included studies

Thirty-one different modes and equipment of strength tests were evaluated; all of them tested extensor strength while 20 also tested flexor strength (table 2). Isokinetic concentric and eccentric strength were consistently assessed by computerised dynamometry at five different speeds. Isometric strength was assessed at five different angles, either tested manually with HHD48 74 78 or by computerised dynamometry.51 52 55 60 62 66 79 Isotonic strength was either tested during one-repetition maximum (1RM) on leg extension or prone leg curl machines70 or in a leg press machine testing 70% of 1RM.69 All strength tests but two74 78 were performed in a seated position. An overview of the modes, equipment and variables reported of the strength tests is provided in online supplemental appendix 4.

Data synthesis

The included studies investigated four measurement properties: reliability,48 62 71 74 measurement error,48 62 71 construct validity (hypothesis testing)49–61 63 64 66–69 71–73 75–84 and criterion (concurrent) validity.48 69 70 74 None of the included studies investigated responsiveness or interpretability.

Reliability

Four studies investigated intrarater reliability in individuals with an ACL reconstruction (table 3).48 62 71 74 Isokinetic concentric extensor strength tests recorded using computerised dynamometry at 60°/s showed sufficient intrarater reliability.62 71 The variables reported were mean peak torque62 and limb symmetry index (LSI)71 of five maximal contractions, collected throughout two test sessions. Rater experience was not specified. Based on two studies, isometric extensor strength tests with HHD showed sufficient intrarater reliability.48 74 Both studies specified that raters were experienced in using HHD. Maximal isometric contractions were measured at 90° of knee flexion for 5 s. The variables reported were mean normalised peak torque48 and mean LSI,74 collected from two48 or three74 consecutive contractions within one test session (online supplemental appendix 5). Isometric extensor strength tests in seated and prone position using HHD had insufficient interrater reliability.74 Two experienced raters tested three consecutive contractions preserving the same test setting. The variable reported was LSI.

Table 3

COSMIN methodological quality ratings, result ratings and quality of evidence for reliability and measurement error per strength test

Measurement error

Three studies,48 62 71 including ACL-reconstructed individuals only, investigated measurement error (table 3). Isokinetic concentric extensor and flexor,62 71 isometric extensor48 and alternating consecutive isometric extensor and flexor62 strength tests showed indeterminate rating, due to the fact that the minimal (clinical) important change (MIC) value was not described.41

Construct validity

Thirty-two studies assessed hypothesis testing, including ACL-reconstructed or ACL-injured individuals or individuals after isolated meniscectomy (one study). All but one78 study exploring construct validity assessed strength using computerised dynamometry. Correlation coefficients for isokinetic concentric and isometric strength tests were quantitatively pooled. Isokinetic concentric extensor and flexor (high-speed) strength tests and isometric extensor strength tests were rated sufficient, based on strong to moderate correlations with hop tests, running tests and patient-reported outcomes (table 4). Isokinetic concentric slow-speed and isometric flexor strength tests were rated insufficient. Forest plots are displayed in online supplemental appendix 6. Methodological quality and individual study results are presented in online supplemental appendix 7.

Table 4

Meta-analyses of strength tests for construct validity

Correlations between isokinetic eccentric and isotonic strength tests were summarised in a qualitative synthesis. Four of the seven strength tests were rated sufficient. These tests were qualitatively summarised showing moderate correlations with patient-reported outcomes: isokinetic eccentric slow-speed and high-speed extensor and flexor strength tests. Isometric extensor and flexor strength tests using HHD78 and isotonic extensor strength tested on a leg press displayed insufficient rating.69 Methodological quality and individual study results are presented in online supplemental appendix 8.

Criterion validity

Four studies explored criterion validity including ACL-reconstructed individuals only.48 69 70 74 Isokinetic concentric high-speed extensor strength tests had sufficient rating (r=0.83 at 180°/s and r=0.82 at 300°/s for mean normalised peak torque).69 Similarly, isotonic extensor and flexor strength tests were rated sufficient using 1RM in a seated leg extension machine and in a prone leg curl machine (r=0.91 and r=0.80 for absolute peak torque, respectively).70 Seated isometric extensor strength tests using HHD had insufficient rating (r=0.62 for normalised peak torque and r=0.36–0.52 for LSI).48 74 Similarly, prone isometric extensor strength tests (r=0.17–0.36 for LSI)74 were rated insufficient. Isotonic extensor strength tested in a leg press machine also had an insufficient rating (r=0.57 for normalised peak torque).69 Methodological quality and individual study results are presented in online supplemental appendix 9.

Summary of the quality of evidence

Isokinetic concentric extensor strength tests had very low (high speed) to low (slow speed) quality of evidence for sufficient intrarater reliability, low quality of evidence for sufficient criterion validity and moderate quality of evidence for sufficient construct validity. Isotonic strength tests had high quality of evidence for sufficient criterion validity (testing 1RM in a seated leg extension machine and in a prone leg curl machine). Isometric strength tests had high quality of evidence for insufficient criterion validity (using HHD). A summary of the quality of evidence for each measurement property by strength test is described in table 5. A GRADE Table and Summary of Findings Table are shown in online supplemental appendices 10 and 11.

Table 5

Quality of evidence per measurement property per strength test category

Discussion

This is the first study to synthesise quality of evidence on measurement properties of strength tests following ACL and/or meniscus injuries. Four measurement properties for the 31 strength tests (different mode and equipment) were evaluated according to the updated COSMIN guidelines. Our findings showed an overall paucity of evidence for the measurement properties of most strength tests for ACL-injured individuals. We acknowledge the importance of monitoring knee muscle strength and inspiring clinicians to use strength tests. Hence, our clinical recommendations are as follows: (1) isokinetic concentric strength tests can be used due to sufficient intrarater reliability and validity, (2) isotonic strength tests can be used due to sufficient criterion validity, and (3) isometric extensor strength tests using HHD can be used based on sufficient intrarater reliability but with insufficient construct and criterion validity. All these strength tests should be used with caution recognising that further high-quality studies on reliability and validity are needed. Standardised procedures and engaging one rater to monitor muscle strength are recommended. We cannot recommend any strength test for individuals with isolated meniscus injury due to lack of known psychometric measurement properties for this population.

Reliability of strength tests was investigated only for ACL-reconstructed individuals without pain during testing.85 For isokinetic concentric strength tests, we have very limited evidence for sufficient intrarater reliability. Hence, recommendations to researchers should call for more high-quality reliability studies with larger sample sizes and appropriate description of the statistical methods to increase the evidence for these tests.18 19 We have moderate evidence about sufficient intrarater reliability for isometric extensor strength using HHD. However, this finding applies strictly to experienced raters testing consecutive contractions. Experienced raters with extensive training and practice with test procedures are important for gaining high reliability. To improve outcomes of consecutive contractions, we recommend applying standardised test protocols as described in the studies48 74 and involving experienced raters.31 For instance, the individuals should be seated with hip and knee flexion at 90°, using a stabilisation belt over the thigh; the resistance should be on the anterior aspect of the shin, two to three centimetres proximal to the midpoint of the lateral malleolus. Regarding interrater reliability, results were insufficient for isometric extensor strength tested using HHD. Although based on very limited evidence, we recommend that these strength tests are performed by one single rater to obtain the most precise outcomes. In addition, averaging repeated measures affords more reliable outcomes than single measures.31 HHD has been evaluated in different populations.86 87 In young individuals with high muscle strength, the rater’s ability to withstand high forces influences the test result.13 88 Belt-stabilised testing is recommended to increase reliability.89

Three studies evaluated measurement error for isokinetic concentric and isometric strength tests. All tests were rated indeterminate as we lacked information on the MIC and were unable to apply the criteria for good measurement properties.17 31 Nevertheless, the studies provided results on the instrument’s ability to detect true changes.48 62 71 This needs to be considered when interpreting muscle strength outcomes or change over time. Coefficients of variation for absolute peak torque were reported for isokinetic concentric extensor (8.3% and 2.9%) and flexor (3.4% and 3.3%) strength tests at slow-speed and high-speed, respectively.62 The smallest detectable change has been calculated for LSI for isokinetic concentric extensor strength tests (10.5%, based on the formula 1.96*√2*SEM).71 Limits of agreement were reported as normalised peak torque for the isometric extensor strength using HHD (−18.7, 17.9).64 Computerised dynamometry involves smaller measurement error than HHD for extensor strength tests, which is in accordance with a meta-analysis on healthy individuals.14 While recommendations on an MIC value are lacking, previous studies chose to apply 10% to 15% as relevant change.14 90 91 If we considered an MIC of 15%, the isokinetic concentric strength tests would be rated adequate while the isometric strength tests using HHD would be inadequate to detect clinically important changes. With an MIC of 10%, the rating of the isokinetic concentric extensor strength test at 60°/s would shift to inadequate.71 This conflicting finding demonstrates the need for establishing MIC values.
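The arithmetic behind this comparison can be made explicit with a short Python sketch. Only the formula 1.96*√2*SEM, the reported values (10.5%; −18.7, 17.9) and the candidate MIC values of 10% and 15% come from the text; the function names are hypothetical, and the rule that measurement error must be smaller than the MIC is stated here as an assumption about how the ratings were derived.

```python
import math


def sdc_from_sem(sem, z=1.96):
    """Smallest detectable change from the standard error of measurement."""
    return z * math.sqrt(2) * sem


def sem_from_sdc(sdc, z=1.96):
    """Invert the formula to recover the SEM implied by a reported SDC."""
    return sdc / (z * math.sqrt(2))


def sdc_below_mic(sdc, mic):
    """Assumed criterion: the SDC must be smaller than the MIC."""
    return sdc < mic


def loa_below_mic(lower, upper, mic):
    """Assumed criterion: both limits of agreement lie within +/- MIC."""
    return max(abs(lower), abs(upper)) < mic


sdc_lsi = 10.5           # reported SDC for LSI, isokinetic extensors (%)
loa_hhd = (-18.7, 17.9)  # reported limits of agreement, isometric HHD

print(round(sem_from_sdc(sdc_lsi), 2))  # implied SEM, about 3.79
for mic in (10, 15):
    print(mic, sdc_below_mic(sdc_lsi, mic), loa_below_mic(*loa_hhd, mic))
```

Under these assumptions the isokinetic test passes at an MIC of 15% but not at 10%, and the HHD limits of agreement exceed both candidate values, which mirrors the ratings described above.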

In line with our hypotheses, isokinetic concentric high-speed strength tests were strongly correlated with hop tests and moderately correlated with patient-reported outcomes. Strength tests seem to better relate to functional hop performance than to one’s perception of knee function. This finding supports that these comparator instruments measure different constructs and should be differentiated. While hop and strength tests may assess similar constructs, hop tests should not be used interchangeably with strength tests but rather as a complement to inform on knee function.92 93 Concerning neural activity, further investigation is required to confirm the strong correlation between isokinetic concentric extensor strength and corticospinal excitability.50

Criterion validity is contingent on the choice of a gold standard strength test. We used the most reported test in the literature with sufficient reliability as the gold standard: the isokinetic concentric strength test at 60°/s on computerised dynamometry.13 14 One study provides high quality of evidence that isotonic strength tests on a seated leg extension and a prone leg curl machine are reflective of isokinetic outcomes using computerised dynamometry.70 This applies only to variables reported as peak torque of 1RM. Furthermore, the range of motion to test extensor strength was limited from 90° to 40° of knee flexion as patients were tested at 3 months post-ACL reconstruction.70 Assessing maximal strength on conventional weight machines might, therefore, represent an alternative to computerised isokinetic dynamometry. Further research is needed on later stages of rehabilitation assessing range of motion to full extension. In contrast, isometric extensor strength outcomes obtained using HHD should not be used interchangeably with isokinetic strength outcomes, as these tests’ task specificity requires distinct aspects of muscle action.94 There is also concern of inaccuracy when testing large muscle groups such as the knee extensors using HHD.1 88 This finding is consistent with systematic reviews on healthy and injured individuals13 and also applies to knee flexor strength tests.95

Strength and limitations

A strength of this study is the use of the COSMIN guidelines to appraise measurement properties of strength tests. We used the original COSMIN guidelines for validity18 and applied the recent extended version for studies on reliability and measurement error.19 The criteria to rate construct validity were based on a number of a priori hypotheses and a cut-off for good measurement properties. The COSMIN risk of bias checklist is in agreement with existing guidelines calling for domain-based assessment.96 The COSMIN guidelines also allow studies lacking methodological quality to be downgraded for quality of evidence rather than ignored. Different terminology exists regarding measurement theory. The term ‘psychometric properties’ has been criticised as ignoring the clinical utility of measurement instruments.97 While disagreement on this distinction exists,98 we acknowledge the importance of including clinical aspects when evaluating measurement instruments. We did not aim to assess predictive validity, as the COSMIN methodology was developed for instruments used for evaluative purposes and would need adaptations for predictive purposes. Nor did we investigate known-group validity. However, a recent systematic review and meta-analysis concluded that ACL-reconstructed individuals have lower quadriceps strength compared with age-matched, sex-matched and activity-matched controls.2

Another strength of this study is the use of the modified GRADE approach, a structured, transparent way to integrate evidence into decisions. However, we must acknowledge that judgements are subjective and may introduce variability, bias or low reproducibility of the assessments. To minimise flawed judgements and facilitate scrutiny, we involved two raters and provided transparent justifications by visualising a GRADE table.99

A limitation of this systematic review may be that the evidence quality relies on consensus of the COSMIN initiative, not on empirical evidence. An example of this limitation is the downgrading for the factor imprecision, which is stipulated only if the total sample size is below 100. This entails that high quality of evidence can rely on a single study.
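
The imprecision rule described above can be sketched in a few lines. This is a minimal illustration that assumes only the single cut-off quoted in the text (total sample size below 100); the function name, the constant and the sample sizes are hypothetical and are not taken from the COSMIN manual or from the included studies.

```python
from typing import Sequence

# Cut-off quoted in the text; treated here as the only imprecision criterion (assumption).
IMPRECISION_THRESHOLD = 100


def imprecision_downgrade(sample_sizes: Sequence[int],
                          threshold: int = IMPRECISION_THRESHOLD) -> bool:
    """Return True if the pooled sample size falls below the threshold."""
    if any(n < 0 for n in sample_sizes):
        raise ValueError("sample sizes must be non-negative")
    return sum(sample_sizes) < threshold


# Hypothetical sample sizes, not data from this review.
single_large_study = [120]
three_small_studies = [30, 25, 20]

print(imprecision_downgrade(single_large_study))   # pooled n = 120
print(imprecision_downgrade(three_small_studies))  # pooled n = 75
```

Under this simplified rule, one study of 120 participants escapes downgrading while three studies totalling 75 participants do not, which illustrates how a high rating can rest on a single study.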

Another possible limitation is the restriction to studies with a mean age at injury ≤30 years. The rationale for this age cut-off was to limit the inclusion of presumably weaker individuals with degenerative meniscus lesions and osteoarthritis. The ≤30 years age criterion excluded five studies that only assessed construct validity and likely did not influence our findings. We further excluded studies published before 2000. Including data from those studies would increase evidence quality on intrarater reliability of isokinetic concentric extensor strength tests by one level, and would not affect our conclusions.47

Clinical implications and future research

Despite the overall paucity of evidence, we recommend using isokinetic strength tests to assess strength in individuals following ACL injuries. Isokinetic concentric strength tests using computerised dynamometry are reliable on repeated test sessions, but larger sample sizes are required in future studies to improve the evidence level for intrarater reliability. Testing isotonic strength for 1RM on conventional weight machines may represent a good alternative to isokinetic strength tests, which is particularly interesting for clinicians given that weight machines are more affordable and common in clinical practice. Isometric extensor strength tests using HHD have sufficient reliability when one experienced rater assesses consecutive contractions. If using isometric muscle strength tests, we strongly recommend a standardised seated position using fixed stabilisation, in particular for young individuals with high muscle strength. We further advise against using isometric strength outcomes interchangeably with isokinetic concentric strength outcomes.

This review outlines domains for which further research on measurement properties of strength tests is needed in individuals following ACL or isolated meniscus injuries. Future studies should evaluate strength tests in individuals with isolated meniscus injuries to investigate whether measurement properties differ depending on the intra-articular knee pathology. Moreover, studies evaluating reliability mainly lack large sample sizes and an appropriate description of the statistical method. We urge future studies evaluating reliability to enhance interpretation of their findings, for example, by applying the Guidelines for Reporting Reliability and Agreement Studies.100 In addition, evidence on measurement error, responsiveness and interpretability is lacking. MIC values for strength tests should be established as the notion of important change is implicit in the method of assessing measurement error. One approach to determine MIC may be using an external anchor31; another alternative could be a consensus-based approach (eg, Delphi, RAND-UCLA, Nominal Group approaches to consensus).41 Because there is only one high-quality study on criterion validity of isotonic or isometric strength tests, we call for further high-quality studies within this domain. Regarding construct validity, we have sufficient high-quality studies for isokinetic extensor and flexor strength tests; this should not be a prioritised area of research.
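
To make the link between measurement error and MIC concrete, the sketch below uses the widely applied formulas SEM = SD × √(1 − ICC) and SDC = 1.96 × √2 × SEM and then compares the smallest detectable change (SDC) with a MIC. It is an illustration under stated assumptions only: the function names are hypothetical, and the SD, ICC and MIC values are invented for demonstration, not results from this review.

```python
import math


def standard_error_of_measurement(sd: float, icc: float) -> float:
    """SEM from the between-subject SD and a reliability coefficient (ICC)."""
    if sd < 0 or not 0.0 <= icc <= 1.0:
        raise ValueError("sd must be >= 0 and icc must lie in [0, 1]")
    return sd * math.sqrt(1.0 - icc)


def smallest_detectable_change(sem: float, z: float = 1.96) -> float:
    """SDC at the individual level for a 95% confidence level (z = 1.96)."""
    return z * math.sqrt(2.0) * sem


def change_exceeds_measurement_error(sdc: float, mic: float) -> bool:
    """True when the SDC is smaller than the MIC, so an important change is detectable."""
    return sdc < mic


# Hypothetical values in Nm, chosen only to illustrate the arithmetic.
sem = standard_error_of_measurement(sd=20.0, icc=0.91)
sdc = smallest_detectable_change(sem)
print(f"SEM = {sem:.2f} Nm, SDC = {sdc:.2f} Nm")
print(change_exceeds_measurement_error(sdc, mic=20.0))
```

Without an established MIC for strength tests, the final comparison cannot be made, which is why the SDC alone is difficult to interpret.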

Conclusion

The present systematic review underlines the knowledge gap for measurement properties of knee extensor and flexor strength tests to assess young individuals following ACL and/or meniscus injuries. Clinically, we can recommend the isokinetic concentric strength tests due to sufficient intrarater reliability and validity. Isotonic tests may be a good alternative to computerised dynamometry as they have sufficient criterion validity.

Ethics statements

Patient consent for publication

Ethics approval

Not applicable.

Acknowledgments

The authors of the systematic review would like to acknowledge the senior librarian Marte Ødegaard at the University of Oslo for the valuable help in developing the search strategy and performing the search. We would further like to acknowledge Stephanie Filbay, Pætur Holm, Erin Macri, Ewa M Roos, and Marienke van Middelkoop for contributing valuable methodological input. JLW is supported by a Michael Smith Foundation for Health Research Scholar Award (SCH-2020-0403) and an Arthritis Society STAR Career Development Award (STAR-19-0493).

References

Footnotes

  • Twitter @AnoukUrhausen, @jwhittak_physio, @agculvenor

  • Contributors JLW, AGC, KMC, CBJ and MAR contributed to the conception of the study. APU, BB, BEØ and MAR designed the study. APU and BB screened studies for inclusion and performed the data extraction and risk of bias assessment. All authors assisted with the interpretation of data. APU was the principal writer of the manuscript. All authors critically revised the manuscript and approved the final version.

  • Funding This systematic review is part of the OPTIKNEE consensus (https://bit.ly/OPTIKNEE), which has received funding from the Canadian Institutes of Health Research (OPTIKNEE principal investigator JLW #161821). APU and MAR are recipients of the National Institutes of Health grant R37HD37985. AGC is a recipient of a National Health and Medical Research Council (NHMRC) of Australia Investigator Grant (GNT2008523). The funders had no role in any part of the study or in any decision about publication.

  • Competing interests JLW and AGC are Associate Editors of the British Journal of Sports Medicine (BJSM). JLW is an Editor with the Journal of Orthopaedic and Sports Physical Therapy. KMC is a senior advisor of BJSM, project leader of the Good Life with Osteoarthritis from Denmark (GLA:D) Australia, a not-for-profit initiative to implement clinical guidelines in primary care, and holds a research grant from Levin Health outside the submitted work. All other authors declare no competing interests.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.