Article Text

Gait, physical activity and tibiofemoral cartilage damage: a longitudinal machine learning analysis in the Multicenter Osteoarthritis Study
  1. Kerry E Costello1,2,3,
  2. David T Felson3,
  3. S Reza Jafarzadeh3,
  4. Ali Guermazi4,
  5. Frank W Roemer5,6,
  6. Neil A Segal7,8,
  7. Cora E Lewis9,
  8. Michael C Nevitt10,
  9. Cara L Lewis2,3,
  10. Vijaya B Kolachalama11,12,
  11. Deepak Kumar2,3
  1. 1 Mechanical and Aerospace Engineering, University of Florida, Gainesville, Florida, USA
  2. 2 Physical Therapy, Boston University, Boston, Massachusetts, USA
  3. 3 Rheumatology, Boston University School of Medicine, Boston, Massachusetts, USA
  4. 4 Radiology, VA Boston Healthcare System, West Roxbury, Massachusetts, USA
  5. 5 Radiology, Universitatsklinikum Erlangen, Erlangen, Germany
  6. 6 Radiology, Boston University School of Medicine, Boston, Massachusetts, USA
  7. 7 Rehabilitation Medicine, The University of Kansas Medical Center, Kansas City, Kansas, USA
  8. 8 Epidemiology, The University of Iowa, Iowa City, Iowa, USA
  9. 9 Epidemiology, The University of Alabama, Birmingham, Alabama, USA
  10. 10 Epidemiology & Biostatistics, University of California, San Francisco, California, USA
  11. 11 Computational Biomedicine, Boston University School of Medicine, Boston, Massachusetts, USA
  12. 12 Computer Science, Boston University, Boston, Massachusetts, USA
  1. Correspondence to Deepak Kumar, Boston University, Boston 02215, Massachusetts, USA; kumard{at}


Objective To (1) develop and evaluate a machine learning model incorporating gait and physical activity to predict medial tibiofemoral cartilage worsening over 2 years in individuals without advanced knee osteoarthritis and (2) identify influential predictors in the model and quantify their effect on cartilage worsening.

Design An ensemble machine learning model was developed to predict worsened cartilage MRI Osteoarthritis Knee Score at follow-up from gait, physical activity, clinical and demographic data from the Multicenter Osteoarthritis Study. Model performance was evaluated in repeated cross-validations. The top 10 predictors of the outcome across 100 held-out test sets were identified by a variable importance measure. Their effect on the outcome was quantified by g-computation.

Results Of 947 legs in the analysis, 14% experienced medial cartilage worsening at follow-up. The median (2.5–97.5th percentile) area under the receiver operating characteristic curve across the 100 held-out test sets was 0.73 (0.65–0.79). Baseline cartilage damage, higher Kellgren-Lawrence grade, greater pain during walking, higher lateral ground reaction force impulse, greater time spent lying and lower vertical ground reaction force unloading rate were associated with greater risk of cartilage worsening. Similar results were found for the subset of knees with baseline cartilage damage.

Conclusions A machine learning approach incorporating gait, physical activity and clinical/demographic features showed good performance for predicting cartilage worsening over 2 years. While identifying potential intervention targets from the model is challenging, lateral ground reaction force impulse, time spent lying and vertical ground reaction force unloading rate should be investigated further as potential early intervention targets to reduce medial tibiofemoral cartilage worsening.

  • Knee
  • Accelerometer

Data availability statement

Data are available in a public, open access repository. Data from the Multicenter Osteoarthritis Study are available through the National Institute on Aging Research Biobank: Public use datasets from the 144- and 168-month visits of the Multicenter Osteoarthritis Study will be available at this website in 2023.

This is an open access article distributed in accordance with the Creative Commons Attribution 4.0 Unported (CC BY 4.0) license, which permits others to copy, redistribute, remix, transform and build upon this work for any purpose, provided the original work is properly cited, a link to the licence is given, and indication of whether changes were made. See:

Statistics from

Request Permissions

If you wish to reuse any or all of this article please use the link below which will take you to the Copyright Clearance Center’s RightsLink service. You will be able to get a quick price and instant permission to reuse the content in many different ways.


  • Gait and physical activity features are potential intervention targets to slow knee osteoarthritis progression, but little is known about their role in early knee osteoarthritis.

  • Underlying interactions among gait, physical activity, demographic and clinical features make traditional statistical approaches challenging.


  • Machine learning models predicted cartilage worsening in persons without advanced knee osteoarthritis from gait, physical activity and clinical and demographic characteristics with a median area under the receiver operating characteristic curve of 0.73 across 100 held-out test sets.

  • High lateral ground reaction force impulse, more time spent lying and low vertical ground reaction force unloading rate were associated with increased risk of cartilage worsening over 2 years.


  • Gait and physical activity are some of the only modifiable risk factors for knee osteoarthritis; this study identified three potential intervention targets to slow early knee osteoarthritis progression.


Knee osteoarthritis (OA) is a progressive, painful joint disease and leading cause of disability, affecting over 350 million adults.1 While some individuals with advanced disease undergo knee replacement, there is no cure and many experience pain and poor quality of life for decades. Existing structural damage and other risk factors (eg, obesity, malalignment) can drive further degeneration.2 3 Addressing this burden will require early identification of at-risk individuals and discovery of intervention targets that can be addressed before the onset of extensive damage or other risk factors.

Joint loading is one of few modifiable risk factors for knee OA4 and can be manipulated through gait and physical activity. While prior research has identified gait features associated with medial tibiofemoral knee OA progression,5 these were typically examined in isolation, in small samples and/or without accounting for other risk factors. Importantly, little is known about gait and physical activity predictors of progression early in the disease process. Machine learning can identify features in complex datasets that are important to prediction without requiring assumptions about underlying relationships among features, making it useful for exploring gait and physical activity.6–8

The Multicenter Osteoarthritis Study (MOST)9 is a large observational cohort of individuals with and without knee OA where data on gait, physical activity, clinical and demographic measures are available for machine learning applications. Further, MOST includes MRI exams at multiple time points, providing sensitive measures of early joint structural change, including worsening cartilage damage.10 Using MOST data, our objectives were to (1) build and evaluate a machine learning model to predict medial tibiofemoral cartilage worsening over 2 years from gait, physical activity, clinical and demographic features in individuals without advanced knee OA, and (2) identify features that contribute most to model prediction and quantify their effect on the outcome.


Study sample

At 144 months, surviving participants from the original MOST cohort (age 50–79, with or at increased risk for developing knee OA at enrolment) were invited for a return visit. Concurrently, a new cohort (age 45–69, Kellgren-Lawrence grades (KLG) ≤2, with or without knee pain) was enrolled. Participants with inflammatory arthritis or stroke were not included in either cohort.

We used data from both cohorts for our baseline (original: 144 months, new: enrolment) and 2-year follow-up (original: 168 months, new: 24 months). MRIs were read for one knee per participant (herein referred to as the ‘study knee’) at baseline and 2 years. If both knees had readable baseline and follow-up images, the knee with better quality images was read. We excluded participants with KLG >2 in the study knee to focus on early disease (figure 1). We excluded participants with history of knee or hip replacement (either leg), steroid or hyaluronic acid injection during the past 6 months (either knee) or regular use of walking aids. Finally, we excluded participants who did not undergo MRI assessment or with gait or physical activity data quality issues (described later).

Figure 1

Study sample from the Multicenter Osteoarthritis Study (MOST).

Patient and public involvement

Currently, patients and the public are not involved in the design, conduct, reporting or dissemination plans for research projects using MOST data.

Equity, diversity and inclusion statement

The authors include women and men with training in engineering and various clinical specialties from Asia, Europe and North America. This study included participants (58.2% women) from two North American clinical sites with various self-reported racial identities (table 1). Sex, site and race were accounted for in our analyses (described later); however, we did not examine socioeconomic status.

Table 1

Baseline demographics and clinical characteristics


Clinical and demographic features

Model inputs included clinical and demographic factors11–18 that are both independent risk factors for OA and affect gait/physical activity (ie, confounders based on hypothesised directed acyclic graphs19). Sex, age, body mass index (BMI), race, clinic site and prior history of knee injury or surgery were recorded at baseline. Given small samples in multiple categories of race (table 1), particularly at UIowa, race and site were combined into a single feature with three levels: UAB non-white (n=117), UAB white (n=221), UIowa (n=609). Participants completed the Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC)20 and Center for Epidemiologic Studies Depression Scale, and had posterior–anterior and lateral weight-bearing radiographs taken, which were read for KLG.21 Hip-knee-ankle alignment was read from baseline long-limb radiographs for the new cohort and long-limb radiographs taken at the 60-month visit for the original cohort. Pain during walking was extracted from the first question of WOMAC (categorised as ‘no,’ ‘mild’ or ‘moderate or higher’).

Gait features

Three-dimensional (3D) ground reaction force (GRF) data were recorded (1000 Hz) while participants walked at a self-selected speed across a portable force platform embedded in a 5.3 m walkway (AccuGait, AMTI, Watertown, Massachusetts, USA). At least five trials were acquired per leg, with the first excluded as an acclimatisation trial. Legs with ≥3 remaining trials where the foot landed completely on the force plate were retained for analysis. For each trial, we extracted commonly used 3D GRF metrics (figure 2), ‘toe-out’ angle defined by Chang et al,22 stance time and walking speed. We normalised all timing features to stance phase (ie, % stance). GRFs were not amplitude-normalised given the inclusion of BMI in the model and to avoid issues with interpreting ratios.23 We averaged each feature across trials for each leg.

Figure 2

Features extracted from ground reaction force (GRF) data.

Physical activity features

Participants wore an activity monitor (AX3, Axivity, Newcastle upon Tyne, UK) consisting of a triaxial accelerometer and temperature sensor on the lower back (centred over the midpoint of L5–S1) for 7 days at baseline, with 3D acceleration sampled at 100 Hz with a range of ±8 g. Non-wear was defined as periods ≥10 min with no movement and verified using the temperature sensor.24 Data for each axis were bandpass filtered (0.2–20 Hz, 4th order Butterworth filter). Summary metrics were calculated for each day: step count, time spent walking, time spent lying and mean 3D signal vector magnitude (overall magnitude of acceleration across all dimensions, Equation 1). Time spent walking and lying were expressed as % wear time to account for differences in wear time among individuals.25 Metrics were averaged across all valid days (defined as ≥10 hours of wear time/day26). We excluded participants with <3 valid days.27

Embedded Image (1)


Two musculoskeletal radiologists (AG, FWR) scored the severity of cartilage damage in five medial tibiofemoral subregions of the study knee at each time point using the MRI Osteoarthritis Knee Score.28 We defined medial cartilage worsening as any increase in area and/or depth in at least one of the five subregions over the 2-year period, as done previously.10 29

Machine learning model

Model development was performed in R (V.4.2.2). We examined Spearman correlations between all continuous features and for near perfect correlations (ρ>0.85), selected one feature to retain for analysis (table 2 shows retained gait and physical activity features). We used the predictive mean matching algorithm within the multiple imputation by chained equations framework (V.3.13.0) to impute missing exposure data (<0.1% dataset).30 We randomly split the data into 70% train and 30% test, maintaining the same proportion of outcome in both datasets.31 Continuous features were scaled and centred to have zero mean and unit variance.

Table 2

Baseline gait and physical activity features

Our goal was to predict the binary cartilage worsening outcome from baseline GRF, accelerometer and clinical/demographic data. We used ‘super learning’ (V.1.4.2),32 an ensemble machine learning approach that combines several candidate algorithms to enhance prediction accuracy above and beyond individual algorithms (figure 3). We selected candidate learners to include diverse learning strategies while being computationally feasible, as recommended by Phillips et al.33 Using the training dataset, candidate learners were trained through fivefold cross-validation. Corresponding predictions on out-of-fold samples were used to develop a meta learner that optimised the weight (ie, contribution) of each individual learner. We then applied this model to the held-out test set to assess its performance by area under the receiver operating characteristic curve (AUC) and mean squared error (MSE).

Figure 3

Machine learning model development and evaluation.

To test robustness and reproducibility of the model training and testing, we used repeated cross-validation, that is, repeated the process of randomly splitting the data into train and test, training the super learner and evaluating its performance on the held-out test set. Here, we report median (ie, 50th percentile), 2.5th and 97.5th percentile AUC and MSE across 100 iterations.

Identification of influential predictors

To assess the contribution of each feature to model prediction, for each of the 100 iterations, 35 additional models were trained on the training set (each excluding one of the features included in the full model). These models were applied to the test set and a variable importance measure (VIM) statistic was calculated for each feature for each iteration based on the size of the risk difference between the full model and the model fit without the feature. Thus, 35 VIMs were produced per iteration. The top contributors to prediction for each iteration were identified as the 10 features with the highest VIMs. We defined ‘influential predictors’ as the 10 features that most frequently appeared as top contributors across the 100 iterations.

Marginal causal risk differences

To quantify the effect of influential predictors identified from the super learner model on cartilage worsening, we used parametric g-computation.34 Continuous variables were quantised into tertiles. For each predictor, we calculated the marginal causal risk difference of each category of the predictor on cartilage worsening, compared with the corresponding reference category, using risk Communicator (V.1.0.0); 95% CIs were calculated using 1000 bootstrap samples.35 Different risk factors may be associated with OA initiation vs progression; thus, we explored sensitivity analyses stratified by baseline cartilage damage (ie, lesion in ≥1 subregion). Only 6% of knees without baseline damage had cartilage worsening at follow-up, thus, we focused our sensitivity analysis on those with baseline damage (online supplemental file).

Supplemental material


Model performance

Of 947 participants, 133 (14%) experienced cartilage worsening in the study knee over 2 years. Across 100 iterations, the median (2.5th and 97.5th percentiles) AUC and MSE on the held-out test sets were 0.73 (0.65–0.79) and 0.11 (0.09–0.13), respectively.

Influential predictors

The features most frequently appearing as top contributors to prediction across 100 iterations (and frequency of appearance) were baseline medial tibiofemoral cartilage damage (100), KLG (98), lateral GRF impulse (46), pain during walking (45), time spent lying (35), timing of the vertical GRF first peak (31), vertical GRF impulse (30), early medial GRF peak (29), timing of the vertical GRF second peak (28) and the maximum instantaneous vertical GRF unloading rate (28).

Marginal risk differences

Marginal risk differences from the g-computation analyses (figure 4) can be interpreted as the difference in risk of cartilage worsening per 100 individuals in the given category compared with the referent category. Presence of cartilage damage, higher KLG, greater lateral GRF impulse, greater pain during walking, greater time spent lying and lower vertical GRF unloading rate at baseline were associated with increased risk of cartilage worsening (figure 4). Point estimates were similar in the sensitivity analysis (online supplemental file).

Figure 4

Causal risk differences for influential predictors identified from the machine learning model. GRF, ground reaction force; KLG, Kellgren-Lawrence grades; WOMAC, Western Ontario and McMaster Universities Osteoarthritis Index.


An ensemble machine learning approach incorporating baseline gait, physical activity and clinical/demographic features showed good performance predicting medial tibiofemoral cartilage worsening over 2 years in knees with KLG ≤2. While determining the relationships among predictors and outcomes in machine learning models is challenging, our analysis suggests that high lateral GRF impulse, high time spent lying, and low vertical GRF unloading should be investigated further as potential targets to reduce cartilage worsening.

Model performance

Our model performance is comparable to other machine learning models predicting OA progression from clinical/demographic data. Du et al reported AUCs of 0.70–0.79 for predicting radiographic worsening (increase in KLG, medial or lateral joint space narrowing) over 2 years from baseline cartilage damage MRI features in those with KLG 0 to 4.36 Tiulpin et al reported AUCs of 0.73–0.75 for predicting worsening (increase in KLG or knee joint replacement) over 7 years from baseline age, sex, BMI, injury, surgery, WOMAC and KLG in individuals with KLG <2.37 The current model achieved similar AUC for predicting cartilage worsening over 2 years in individuals with KLG ≤2, with the added benefit of providing information about potentially modifiable gait and physical activity predictors.

Prior longitudinal gait studies typically examined knee-specific loading (eg, knee adduction moment) rather than GRFs, often in samples of 15–300 knees.5 Correspondingly, few addressed clinical/demographic confounders, incorporated physical activity or examined performance in held-out test sets. Further, many were conducted in samples with established OA (KLG ≥2), limiting their potential to identify at-risk individuals early in the disease process or identify early intervention targets. Our sample included 947 individuals with KLG ≤2, who predominantly had no or mild pain during walking, and thus were younger with lower BMI than previously reported samples (mean age 59.2 vs 62.0 years, BMI 27.8 vs 29.2 kg/m2).18

Predictors of OA progression

The super learner identified multiple influential gait and physical activity predictors of cartilage worsening. Of these, the g-computation analyses found baseline lateral GRF impulse, time spent lying and vertical GRF unloading rate were associated with cartilage worsening. The 7.2% estimate of risk difference for lateral GRF impulse suggests that for every 100 individuals in the highest tertile, there are 7.2 individuals who experience cartilage worsening who would not experience worsening in they were in the lowest tertile. Accordingly, approximately 14 (ie, 1/0.072) individuals with lateral GRF impulse at the highest tertile would be needed to observe an increase in the number of individuals with cartilage worsening by 1 person. In a cross-sectional study in the same cohort, we previously reported that limbs with radiographic OA, with or without knee pain, have higher lateral GRFs in early stance compared with limbs without radiographic OA or pain.38 The current results suggest lateral GRF may also play a role in progression. The 5.4% increased risk of cartilage worsening for the middle versus lowest tertile of time spent lying, along with prior research showing greater sedentary time is associated with future functional decline39 and lower quality of life,40 suggests reducing sedentary time should be investigated as a potential intervention target. The physiological reason for the 6.6% decreased risk of cartilage worsening in the highest vs lowest tertile of vertical GRF unloading rate is not clear and warrants further exploration.

The appearance of structure and symptom features as influential predictors is not surprising, given that these are established risk factors. Of note, despite only 10.1% of the sample having what is traditionally considered established radiographic OA (KLG=2), 39.2% had baseline cartilage damage and both appeared as influential predictors in the model. The g-computation analysis identified a 15.3% increased risk of cartilage worsening for every 100 individuals with baseline damage compared with no damage, and a 14.4% increased risk for KLG 2 vs 0. The lack of difference for KLG 1 vs 0 may highlight limitations of the KLG scoring system, which does not reflect tissue-level damage well, particularly in early disease.41 42 Along with these structural measures, knees with mild pain had an increased risk of cartilage worsening (6.8%) compared with those with no pain. The large CI for moderate and higher pain could stem from the small proportion of knees (3% of sample) and/or heterogeneity in this category.

Clinical implications

The utility of this model for risk screening is debatable, as it requires GRF data. While faster to collect than joint moments, collecting GRFs requires specialised equipment (force platform). Future advances in wearable technologies may facilitate gait data capture during everyday life, including estimates of GRFs,43 44 improving the potential of this type of model as a risk screening tool.

This model identified potential gait and physical activity intervention targets for further study. Interestingly, two influential predictors (baseline damage, KLG) appeared as top contributors in ≥98% of the iterations while others appeared less consistently (<50%). While we removed highly correlated features, this may result in part from predictors that capture similar constructs (eg, four features collectively describing an important construct could each appear 25% of the time). Similarly, our g-computation approach provides insight into causal pathways but does not account for concurrent changes in several risk factors. An important motivation for using machine learning was to address potential interactions among predictors. While it is challenging to identify these relationships from the model, the lack of consistency in top contributors could indicate a need for simultaneous intervention on several features rather than a single feature, opening interesting avenues for future study.

Strengths and limitations

Strengths include the large sample, investigation of gait and physical activity in early disease, use of machine learning to address inter-related predictors and use of g-computation to quantify their effects. These strengths expand existing literature by accounting for demographics and clinical characteristics, examining multiple gait and physical activity features, and testing the model on held-out data. Our sample was largely white with little to no pain during walking; these results may not generalise to diverse populations or those with severe symptoms. Lateral or patellofemoral worsening could have been present in both outcome groups, causing noise in the model. While we adjusted for a diverse set of confounders, as in any observational study, there may be residual unmeasured confounding. Knee loading (eg, knee adduction moment) may provide additional predictive power but kinematics are not available in MOST, limiting comparison to prior gait studies and insight into mechanisms by which features such as lateral GRF impulse affect structure. We are unaware of other large datasets with gait, physical activity and MR outcomes that could be used for external validation, however, we assessed reproducibility with repeated cross-validation. Better characterisation of dynamic physical activity patterns may also improve model performance and identification of relevant intervention targets.


Using an ensemble machine learning approach, we predicted medial tibiofemoral cartilage worsening over 2 years in persons without and with early radiographic OA with good performance on held-out samples. Additionally, we identified baseline gait and physical activity measures associated with cartilage worsening that may be potential early intervention targets, including lateral GRF impulse, time spent lying and vertical GRF unloading rate.

Data availability statement

Data are available in a public, open access repository. Data from the Multicenter Osteoarthritis Study are available through the National Institute on Aging Research Biobank: Public use datasets from the 144- and 168-month visits of the Multicenter Osteoarthritis Study will be available at this website in 2023.

Ethics statements

Patient consent for publication

Ethics approval

Ethical approval for the Multicenter Osteoarthritis Study was obtained from the Institutional Review Board at each of the four core sites: Boston University (H-32956), University of Alabama at Birmingham (IRB-000329007), University of California San Francisco (301480) and University of Iowa (201511711). All participants provided informed consent before participating, in accordance with the Helsinki Declaration. Approval for secondary data analysis using MOST data was obtained from the Institutional Review Board at the University of Florida (IRB202201899).


The authors would like to acknowledge the contributions of the MOST participants and clinic staff.


Supplementary materials

  • Supplementary Data

    This web only file has been produced by the BMJ Publishing Group from an electronic file supplied by the author(s) and has not been edited for content.


  • Twitter @kecostello, @david_felson, @srjafarz, @ali_guermazi, @FrankRoemer1, @ProfCaraLewis, @vkola_lab, @ProfDeepakKumar

  • Contributors Guarantors for the integrity of the work as a whole: KEC and DK. Conception and design: KEC, DK and DTF. Analysis and interpretation of the data: all authors. Drafting of the article: KEC. Critical revision of the article for important intellectual content: all authors. Final approval of the article: all authors. Provision of study materials or patients: DTF, CEL, MCN and NAS.

  • Funding MOST is composed of four cooperative grants (DTF (BU)—AG18820, J.C. Torner (UI)—AG18832, CEL (UAB)—AG18947, and MCN (UCSF)—AG19069] funded by the National Institutes of Health (NIH), a branch of the Department of Health and Human Services, and conducted by MOST study investigators. Research reported in this publication was also supported under award numbers F32AR076907 (KEC), P30AR072571 (DTF), 1UL1TR001430 (DTF), R21AR074578 (SRJ), R03AG060272 (S.R. Jafarzadeh), K23AR063235 (CLL), R01HL159620 (VBK), R21CA253498 (VBK), and K01AR069720 (DK) from the National Institutes of Health and 17SDG33670323 (VBK) and 20SFRN35460031 (VBK) from the American Heart Association. KEC was also supported by an Investigator Award from the Rheumatology Research Foundation. The MOST publications committee reviewed this manuscript before submission. The National Institutes of Health, the American Heart Association, and the Rheumatology Research Foundation were not involved in study design, collection, analysis or interpretation of data, or the decision to submit this manuscript for publication.

  • Disclaimer This manuscript was prepared using MOST data and does not necessarily represent the official views of MOST investigators, the National Institutes of Health, the American Heart Association, or the Rheumatology Research Foundation.

  • Competing interests NAS reports personal fees from Tenex Health and grants from Pacira Bioscience, outside of the submitted work. AG is shareholder of BICL, LLC and consultant to Pfizer, AstraZeneca, Novartis, TissueGene, Regeneron and MerckSerono. FWR is shareholder of BICL, LLC and consultant to Grünenthal. All other authors have no competing interests to report.

  • Patient and public involvement Patients and/or the public were not involved in the design, or conduct, or reporting, or dissemination plans of this research.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.