
Improving physical performance tests: time to include a psychologist
  1. Josh B Kazman,
  2. Sarah de la Motte,
  3. Timothy C Gribbin,
  4. Jeffrey Galecki,
  5. Patricia A Deuster,
  6. Francis O'Connor
  1. Department of Military and Emergency Medicine, Uniformed Services University, Bethesda, Maryland, USA
  1. Correspondence to Josh B Kazman, Department of Military and Emergency Medicine, Uniformed Services University, Bethesda 20814, Maryland, USA; josh.kazman.ctr{at}

Hegedus and Cook recently highlighted the lack of evidence supporting physical performance tests (PPTs) in sports medicine1 and asked whether these tests would ever be validated like ‘their self-report cousins’ and ‘predict a complex and multidimensional concept such as injury’. We believe that psychometrics, as used for clinical assessment in psychology and education, may also have a role in sports medicine.

The science of psychometrics is well laid-out, encompassing a framework for clinical assessment.2–4 Developing tests requires subject-matter experts, thought, and creativity to: (1) clearly define a construct based on theory; (2) explore and catalogue how the construct is manifest; (3) develop potential items; and (4) pilot-test items and iteratively refine item-selection (figure 1).

Figure 1

A framework for clinical assessment.

If physical performance tests were developed using psychometric principles, both their reliability and their ability to predict outcomes (eg, rehabilitation success, injury risk) would improve. Most physical performance tests were created to identify one physical task that correlates with some criterion (eg, biomechanics, pathology). For example, in the Landing Error Scoring System, examinees perform a jump-landing task scored on 17 possible errors (eg, knee flexion), with more errors indicating poorer movement quality, a risk factor for anterior cruciate ligament injury.5,6

A ‘new’ way of developing physical performance tests in sports medicine

Psychometricians take a different approach: they use independent observations from separate tasks. Redundancy is essential, because no single measurement is adequate. One example is academic achievement tests, some of which are intended to predict college success; they are among the most psychometrically sophisticated tests available. Just as athletic success requires physiological and non-physiological factors, college success requires cognitive and non-cognitive factors. Yet college success can be predicted. In a meta-analysis, American College Testing scores correlated with college grades two years out (r=0.45) and with retention three years out (r=0.14).7 That is impressive, considering the myriad factors necessary for college success.

The American College Test lasts 3.5 hours and has 215 questions. Imagine recreating the American College Test with the methods used to create physical performance tests. You might start with some maths problems that isolate distinct maths skills, then use laboratory observations to find the best maths problem: measure brain activity while people solve them, compare results across clinical populations, and watch people work on them, noting where progress was impeded. On this basis, one best maths problem could be identified. But no matter how good, one problem would not predict much. And yet, in sports medicine, one physical movement/task is expected to suffice.

In sports medicine, a 3.5-hour test would be absurd. The key is the principle: multiple different tasks are needed to test the same construct, because no one task measures it perfectly. Building redundancy into tests allows the idiosyncrasies of any individual item (‘measurement error’) to wash out. Given a set of good-enough ‘scored’ items that adequately span a trait's continuum, the average provides a substantially better measure than any item alone.
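The error-averaging principle can be illustrated with a small simulation (entirely hypothetical data, not drawn from any study): each ‘item’ is modelled as the examinee's true trait plus independent measurement error, and the composite (mean) score tracks the trait far better than any single item does.

```python
import random
import statistics

random.seed(42)

def pearson(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

n_people, n_items = 500, 10
# latent trait (eg, 'true' abdominal strength), standardised
trait = [random.gauss(0, 1) for _ in range(n_people)]
# each item = true trait + independent measurement error
items = [[t + random.gauss(0, 1.5) for t in trait] for _ in range(n_items)]
# composite score: the mean across all items for each examinee
composite = [statistics.mean(person) for person in zip(*items)]

r_single = pearson(items[0], trait)        # one item vs the trait
r_composite = pearson(composite, trait)    # the average vs the trait
```

With these (made-up) noise levels, the composite correlates with the trait at roughly r=0.9, while any single item manages only about r=0.55: the item-level idiosyncrasies cancel in the average.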

A practical example

How might this apply to a physical performance test, say, one measuring abdominal strength? Start with distinct exercises that all primarily target the abdominals. Each exercise should be equally difficult. Develop a uniform rating system, perhaps based on repetitions or movement quality, so that the same examinee is likely to achieve similar scores across all exercises. Next, pilot-test the items and collect performance data. Using psychometric analyses (eg, principal components, internal consistency3,4), remove the less desirable items based on common thresholds in the literature to improve the measure's precision. Lastly, collect longitudinal outcomes.
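As a sketch of the internal-consistency analysis mentioned above, the following computes Cronbach's alpha from invented pilot ratings (the six examinees and three exercises are purely illustrative). Items whose removal would raise alpha are candidates for dropping.

```python
import statistics

def cronbach_alpha(items):
    """Cronbach's alpha: items is a list of per-item score lists,
    one entry per examinee, all the same length."""
    k = len(items)
    item_vars = sum(statistics.pvariance(it) for it in items)
    totals = [sum(scores) for scores in zip(*items)]
    return (k / (k - 1)) * (1 - item_vars / statistics.pvariance(totals))

# hypothetical pilot data: 6 examinees rated 1-5 on 3 abdominal exercises
scores = [
    [4, 3, 5, 2, 4, 3],   # exercise A
    [5, 3, 4, 2, 5, 3],   # exercise B
    [4, 2, 5, 1, 4, 2],   # exercise C
]
alpha = cronbach_alpha(scores)

# 'alpha if item deleted': an item whose removal raises alpha is weak
alpha_if_deleted = {
    i: cronbach_alpha(scores[:i] + scores[i + 1:])
    for i in range(len(scores))
}
```

In practice one would apply a conventional threshold from the literature (and inspect item-total correlations and component loadings) rather than rely on alpha alone.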

Are 215 exercises needed? Probably not. Psychometric assessments often have 10–30 items, though this varies by field. The number of items required is substantially reduced by creating good items and combining modern statistical models with larger pilot-test samples.8 Overall, determining the final item-set is both a science and an art.3 It depends on judgments (eg, construct complexity/scope) informed by metrics (eg, item inter-relatedness, desired accuracy).
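The trade-off between item count and precision is captured by the classical Spearman–Brown prophecy formula. This sketch assumes a single-item reliability of 0.30 (a made-up figure for illustration) and shows the diminishing returns of adding items, one reason 10–30 often suffice.

```python
def spearman_brown(rho_single, k):
    """Predicted reliability of a test built from k parallel items,
    given the reliability rho_single of a single item
    (classical Spearman-Brown prophecy formula)."""
    return k * rho_single / (1 + (k - 1) * rho_single)

# with single-item reliability 0.30:
# 10 items -> ~0.81; 30 items -> ~0.93 (diminishing returns)
ten = spearman_brown(0.30, 10)
thirty = spearman_brown(0.30, 30)
```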

What is the purpose of your performance test?

Physical performance tests are often used for three distinct reasons: predicting athletic success, predicting injury, and guiding return to sport.1 It is important to delineate which constructs best suit each purpose, and to break them down into singular dimensions.

The abdominal strength test might be expanded to measure core strength. Core strength is multidimensional and likely hierarchical (ie, nested factors). A comprehensive core strength test might integrate subtests to assess abdominal, hip, and trunk strength, each with multiple item-sets, like an IQ test with subtests targeting complementary dimensions. Based on iterative pilot-testing, the observed pattern of responses should align with theory.
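One simple way to score such a nested structure, sketched here with entirely hypothetical ratings, is to average items within each subtest and then average the subscores into an overall core strength score; a real instrument would derive weights empirically (eg, from factor loadings) rather than weight subtests equally.

```python
# hypothetical item ratings (0-5 scale) for one examinee,
# grouped by subtest of a nested 'core strength' construct
subtests = {
    "abdominal": [4, 3, 4],
    "hip":       [2, 3, 2],
    "trunk":     [5, 4, 4],
}

# first-level scores: mean within each subtest
subscores = {name: sum(v) / len(v) for name, v in subtests.items()}

# second-level score: equal-weighted mean of the subscores
core_strength = sum(subscores.values()) / len(subscores)
```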

In summary, physical performance test development should begin incorporating psychometric principles derived from a century-old discipline.4 It would be a victory for all if these techniques were to make their way into sports medicine/sports physiotherapy/athletic training, and into physical performance tests in particular.



  • Disclaimer The views expressed are those of the authors and do not reflect the official policy or position of the Uniformed Services University of the Health Sciences, the Department of the Defense, or the US Government.

  • Contributors JBK wrote an initial draft of the article, which was then discussed and refined with the other authors.

  • Competing interests None declared.
