Gait, physical activity and tibiofemoral cartilage damage: a longitudinal machine learning analysis in the Multicenter Osteoarthritis Study

Objective To (1) develop and evaluate a machine learning model incorporating gait and physical activity to predict medial tibiofemoral cartilage worsening over 2 years in individuals without advanced knee osteoarthritis and (2) identify influential predictors in the model and quantify their effect on cartilage worsening.
Design An ensemble machine learning model was developed to predict worsened cartilage MRI Osteoarthritis Knee Score at follow-up from gait, physical activity, clinical and demographic data from the Multicenter Osteoarthritis Study. Model performance was evaluated in repeated cross-validations. The top 10 predictors of the outcome across 100 held-out test sets were identified by a variable importance measure. Their effect on the outcome was quantified by g-computation.
Results Of 947 legs in the analysis, 14% experienced medial cartilage worsening at follow-up. The median (2.5–97.5th percentile) area under the receiver operating characteristic curve across the 100 held-out test sets was 0.73 (0.65–0.79). Baseline cartilage damage, higher Kellgren-Lawrence grade, greater pain during walking, higher lateral ground reaction force impulse, greater time spent lying and lower vertical ground reaction force unloading rate were associated with greater risk of cartilage worsening. Similar results were found for the subset of knees with baseline cartilage damage.
Conclusions A machine learning approach incorporating gait, physical activity and clinical/demographic features showed good performance for predicting cartilage worsening over 2 years. While identifying potential intervention targets from the model is challenging, lateral ground reaction force impulse, time spent lying and vertical ground reaction force unloading rate should be investigated further as potential early intervention targets to reduce medial tibiofemoral cartilage worsening.


INTRODUCTION
Knee osteoarthritis (OA) is a progressive, painful joint disease and leading cause of disability, affecting over 350 million adults. 1 While some individuals with advanced disease undergo knee replacement, there is no cure and many experience pain and poor quality of life for decades. Existing structural damage and other risk factors (eg, obesity, malalignment) can drive further degeneration. 2 3 Addressing this burden will require early identification of at-risk individuals and discovery of intervention targets that can be addressed before the onset of extensive damage or other risk factors.
Joint loading is one of few modifiable risk factors for knee OA 4 and can be manipulated through gait and physical activity. While prior research has identified gait features associated with medial tibiofemoral knee OA progression, 5 these were typically examined in isolation, in small samples and/or without accounting for other risk factors. Importantly, little is known about gait and physical activity predictors of progression early in the disease process. Machine learning can identify features in complex datasets that are important to prediction without requiring assumptions about underlying relationships among features, making it useful for exploring gait and physical activity. [6][7][8]

WHAT IS ALREADY KNOWN ON THIS TOPIC
⇒ Gait and physical activity features are potential intervention targets to slow knee osteoarthritis progression, but little is known about their role in early knee osteoarthritis.
⇒ Underlying interactions among gait, physical activity, demographic and clinical features make traditional statistical approaches challenging.

WHAT THIS STUDY ADDS
⇒ Machine learning models predicted cartilage worsening in persons without advanced knee osteoarthritis from gait, physical activity and clinical and demographic characteristics with a median area under the receiver operating characteristic curve of 0.73 across 100 held-out test sets.
⇒ High lateral ground reaction force impulse, more time spent lying and low vertical ground reaction force unloading rate were associated with increased risk of cartilage worsening over 2 years.

HOW THIS STUDY MIGHT AFFECT RESEARCH, PRACTICE OR POLICY
⇒ Gait and physical activity are some of the only modifiable risk factors for knee osteoarthritis; this study identified three potential intervention targets to slow early knee osteoarthritis progression.

Original research
The Multicenter Osteoarthritis Study (MOST) 9 is a large observational cohort of individuals with and without knee OA where data on gait, physical activity, clinical and demographic measures are available for machine learning applications. Further, MOST includes MRI exams at multiple time points, providing sensitive measures of early joint structural change, including worsening cartilage damage. 10 Using MOST data, our objectives were to (1) build and evaluate a machine learning model to predict medial tibiofemoral cartilage worsening over 2 years from gait, physical activity, clinical and demographic features in individuals without advanced knee OA, and (2) identify features that contribute most to model prediction and quantify their effect on the outcome.

Study sample
At 144 months, surviving participants from the original MOST cohort (age 50-79, with or at increased risk for developing knee OA at enrolment) were invited for a return visit. Concurrently, a new cohort (age 45-69, Kellgren-Lawrence grades (KLG) ≤2, with or without knee pain) was enrolled. Participants with inflammatory arthritis or stroke were not included in either cohort.
We used data from both cohorts for our baseline (original: 144 months, new: enrolment) and 2-year follow-up (original: 168 months, new: 24 months). MRIs were read for one knee per participant (herein referred to as the 'study knee') at baseline and 2 years. If both knees had readable baseline and follow-up images, the knee with better quality images was read. We excluded participants with KLG >2 in the study knee to focus on early disease (figure 1). We excluded participants with history of knee or hip replacement (either leg), steroid or hyaluronic acid injection during the past 6 months (either knee) or regular use of walking aids. Finally, we excluded participants who did not undergo MRI assessment or with gait or physical activity data quality issues (described later).

Patient and public involvement
Currently, patients and the public are not involved in the design, conduct, reporting or dissemination plans for research projects using MOST data.

Equity, diversity and inclusion statement
The authors include women and men with training in engineering and various clinical specialties from Asia, Europe and North America. This study included participants (58.2% women) from two North American clinical sites with various self-reported racial identities (table 1). Sex, site and race were accounted for in our analyses (described later); however, we did not examine socioeconomic status.

Clinical and demographic features
Model inputs included clinical and demographic factors [11][12][13][14][15][16][17][18] that are both independent risk factors for OA and affect gait/physical activity (ie, confounders based on hypothesised directed acyclic graphs 19). Sex, age, body mass index (BMI), race, clinic site and prior history of knee injury or surgery were recorded at baseline. Given small samples in multiple categories of race (table 1), particularly at UIowa, race and site were combined into a single feature with three levels: UAB non-white (n=117), UAB white (n=221), UIowa (n=609). Participants completed the Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC) 20 and Center for Epidemiologic Studies Depression Scale, and had posterior-anterior and lateral weight-bearing radiographs taken, which were read for KLG. 21 Hip-knee-ankle alignment was read from baseline long-limb radiographs for the new cohort and from long-limb radiographs taken at the 60-month visit for the original cohort. Pain during walking was extracted from the first question of WOMAC (categorised as 'no,' 'mild' or 'moderate or higher').

Gait features
Three-dimensional (3D) ground reaction force (GRF) data were recorded (1000 Hz) while participants walked at a self-selected speed across a portable force platform embedded in a 5.3 m walkway (AccuGait, AMTI, Watertown, Massachusetts, USA). At least five trials were acquired per leg, with the first excluded as an acclimatisation trial. Legs with ≥3 remaining trials where the foot landed completely on the force plate were retained for analysis. For each trial, we extracted commonly used 3D GRF metrics (figure 2), 'toe-out' angle defined by Chang et al, 22 stance time and walking speed. We normalised all timing features to stance phase (ie, % stance). GRFs were not amplitude-normalised given the inclusion of BMI in the model and to avoid issues with interpreting ratios. 23 We averaged each feature across trials for each leg.
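As a rough illustration of how a per-leg gait feature might be derived from a force-platform signal, the sketch below integrates one GRF component over stance with the trapezoidal rule and then averages across trials. It is not the study's processing pipeline: the function names, the toy force values and the assumption that each trace has already been trimmed to stance phase are all hypothetical, and the actual feature definitions are those shown in figure 2.

```python
SAMPLE_RATE_HZ = 1000  # sampling frequency reported in the text


def grf_impulse(force_n, sample_rate_hz=SAMPLE_RATE_HZ):
    """Trapezoidal integral of a force trace (N) over time, in N*s."""
    dt = 1.0 / sample_rate_hz
    return sum((a + b) / 2.0 * dt for a, b in zip(force_n, force_n[1:]))


def leg_average(trial_values, min_trials=3):
    """Average a feature across trials; require at least `min_trials`."""
    if len(trial_values) < min_trials:
        raise ValueError("not enough valid trials for this leg")
    return sum(trial_values) / len(trial_values)


# Toy stance-phase traces (hypothetical values, one list per trial).
trials = [
    [0.0, 10.0, 20.0, 10.0, 0.0],
    [0.0, 12.0, 18.0, 12.0, 0.0],
    [0.0, 8.0, 22.0, 8.0, 0.0],
]
impulses = [grf_impulse(t) for t in trials]
mean_impulse = leg_average(impulses)
```

The `min_trials` check mirrors the requirement of at least three usable trials per leg; sign conventions for medial versus lateral force are left out because they depend on the laboratory coordinate system.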

Physical activity features
Participants wore an activity monitor (AX3, Axivity, Newcastle upon Tyne, UK) consisting of a triaxial accelerometer and temperature sensor on the lower back (centred over the midpoint of L5-S1) for 7 days at baseline, with 3D acceleration sampled at 100 Hz with a range of ±8 g. Non-wear was defined as periods ≥10 min with no movement and verified using the temperature sensor. 24 Data for each axis were bandpass filtered (0.2-20 Hz, 4th order Butterworth filter). Summary metrics were calculated for each day: step count, time spent walking, time spent lying and mean 3D signal vector magnitude (overall magnitude of acceleration across all dimensions, Equation 1). Time spent walking and lying were expressed as % wear time to account for differences in wear time among individuals. 25 Metrics were averaged across all valid days (defined as ≥10 hours of wear time/day 26). We excluded participants with <3 valid days. 27
$$\text{Signal vector magnitude} = \sqrt{a_x^2 + a_y^2 + a_z^2} \tag{1}$$
where a_x, a_y and a_z are the accelerations along the three sensor axes.
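The daily-summary logic described above can be sketched as follows. This is a minimal illustration, not the study's code: the dictionary keys and toy values are hypothetical, and the signal vector magnitude is taken to be the usual Euclidean norm of the three acceleration axes.

```python
import math


def signal_vector_magnitude(ax, ay, az):
    """Euclidean norm of one 3D acceleration sample."""
    return math.sqrt(ax * ax + ay * ay + az * az)


def summarise_participant(days, min_wear_hours=10.0, min_valid_days=3):
    """Average daily metrics over valid days, or return None if too few.

    `days` is a list of dicts with hypothetical keys:
    wear_hours, lying_hours, walking_hours, steps.
    """
    valid = [d for d in days if d["wear_hours"] >= min_wear_hours]
    if len(valid) < min_valid_days:
        return None  # participant would be excluded
    n = len(valid)
    return {
        # Times are expressed as % of wear time before averaging.
        "lying_pct": sum(100.0 * d["lying_hours"] / d["wear_hours"] for d in valid) / n,
        "walking_pct": sum(100.0 * d["walking_hours"] / d["wear_hours"] for d in valid) / n,
        "steps": sum(d["steps"] for d in valid) / n,
        "valid_days": n,
    }


days = [
    {"wear_hours": 20.0, "lying_hours": 8.0, "walking_hours": 1.0, "steps": 6000},
    {"wear_hours": 16.0, "lying_hours": 8.0, "walking_hours": 2.0, "steps": 8000},
    {"wear_hours": 8.0, "lying_hours": 6.0, "walking_hours": 0.5, "steps": 2000},  # <10 h: not valid
    {"wear_hours": 24.0, "lying_hours": 6.0, "walking_hours": 3.0, "steps": 10000},
]
summary = summarise_participant(days)
```

Expressing lying and walking time as a percentage of wear time before averaging keeps a short-wear day from being under-weighted relative to a full day.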

Outcome
Two musculoskeletal radiologists (AG, FWR) scored the severity of cartilage damage in five medial tibiofemoral subregions of the study knee at each time point using the MRI Osteoarthritis Knee Score. 28 We defined medial cartilage worsening as any increase in area and/or depth in at least one of the five subregions over the 2-year period, as done previously. 10 29

Machine learning model
We examined Spearman correlations between all continuous features and, for near-perfect correlations (ρ>0.85), selected one feature to retain for analysis (table 2 shows retained gait and physical activity features). We used the predictive mean matching algorithm within the multiple imputation by chained equations framework (V.3.13.0) to impute missing exposure data (<0.1% of the dataset). 30 We randomly split the data into 70% train and 30% test, maintaining the same proportion of outcome in both datasets. 31 Continuous features were scaled and centred to have zero mean and unit variance.
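A stratified 70/30 split and feature standardisation of the kind described can be sketched in a few lines. The helper names and toy labels below are hypothetical, and fitting the scaling parameters on the training portion only is an assumption of this sketch; the text does not say which data the scaling was computed from.

```python
import random
import statistics


def stratified_split(labels, train_frac=0.7, seed=0):
    """Return (train_idx, test_idx), splitting each outcome class separately."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        cut = round(train_frac * len(idx))
        train_idx += idx[:cut]
        test_idx += idx[cut:]
    return sorted(train_idx), sorted(test_idx)


def standardise(train_values, values):
    """Centre and scale `values` using the training mean and SD (assumption)."""
    mean = statistics.fmean(train_values)
    sd = statistics.pstdev(train_values)
    return [(v - mean) / sd for v in values]


# Toy data: 100 legs, 14 with the outcome (mirrors the 14% reported).
labels = [1] * 14 + [0] * 86
train_idx, test_idx = stratified_split(labels)
```

Splitting within each outcome class is what keeps the proportion of worsening roughly equal in the train and test sets.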
Our goal was to predict the binary cartilage worsening outcome from baseline GRF, accelerometer and clinical/demographic data. We used 'super learning' (V.1.4.2), 32 an ensemble machine learning approach that combines several candidate algorithms to enhance prediction accuracy above and beyond individual algorithms (figure 3). We selected candidate learners to include diverse learning strategies while being computationally feasible, as recommended by Phillips et al. 33 Using the training dataset, candidate learners were trained through fivefold cross-validation. Corresponding predictions on out-of-fold samples were used to develop a meta learner that optimised the weight (ie, contribution) of each individual learner. We then applied this model to the held-out test set to assess its performance by area under the receiver operating characteristic curve (AUC) and mean squared error (MSE).
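The two performance metrics can be computed directly from held-out predictions. The sketch below does not implement the super learner itself; it only shows, on hypothetical toy predictions, how AUC (as the probability that a randomly chosen worsening knee is scored above a randomly chosen non-worsening knee) and MSE of the predicted probabilities could be calculated.

```python
def auc(y_true, y_score):
    """AUC via pairwise comparison of positives and negatives (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def mean_squared_error(y_true, y_prob):
    """Mean squared difference between outcomes (0/1) and predicted probabilities."""
    return sum((y - p) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


# Hypothetical held-out outcomes and predicted probabilities.
y_true = [0, 0, 1, 1]
y_prob = [0.1, 0.4, 0.35, 0.8]
test_auc = auc(y_true, y_prob)
test_mse = mean_squared_error(y_true, y_prob)
```

The pairwise form is quadratic in sample size, which is acceptable for a test set of a few hundred knees; rank-based formulas give the same value more efficiently.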
To test robustness and reproducibility of the model training and testing, we used repeated cross-validation, that is, repeated the process of randomly splitting the data into train and test, training the super learner and evaluating its performance on the held-out test set. Here, we report median (ie, 50th percentile), 2.5th and 97.5th percentile AUC and MSE across 100 iterations.
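Summarising a metric across repeated splits amounts to taking percentiles of the per-iteration values. The sketch below assumes linear interpolation between order statistics, which is one common percentile definition and not necessarily the one used in the study; the AUC values are hypothetical and evenly spaced only to make the result easy to check.

```python
def percentile(values, q):
    """Percentile (q in 0-100) with linear interpolation between order statistics."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def summarise(metric_per_iteration):
    """Median with 2.5th and 97.5th percentiles across iterations."""
    return {
        "median": percentile(metric_per_iteration, 50),
        "p2.5": percentile(metric_per_iteration, 2.5),
        "p97.5": percentile(metric_per_iteration, 97.5),
    }


# Hypothetical AUCs from 100 train/test iterations.
aucs = [0.60 + 0.002 * i for i in range(100)]
summary = summarise(aucs)
```

Reporting the 2.5th and 97.5th percentiles describes the spread of performance over random splits; it is not a confidence interval for a single model.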

Identification of influential predictors
To assess the contribution of each feature to model prediction, for each of the 100 iterations, 35 additional models were trained on the training set (each excluding one of the features included in the full model). These models were applied to the test set and a variable importance measure (VIM) statistic was calculated for each feature for each iteration based on the size of the risk difference between the full model and the model fit without the feature. Thus, 35 VIMs were produced per iteration. The top contributors to prediction for each iteration were identified as the 10 features with the highest VIMs. We defined 'influential predictors' as the 10 features that most frequently appeared as top contributors across the 100 iterations.
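The selection rule, rank features by VIM within each iteration and then count how often each lands in the top set, can be sketched as below. The VIM values and feature names are hypothetical, and the toy example uses 3 iterations, 4 features and k=2 rather than the 100 iterations, 35 features and k=10 described in the text.

```python
from collections import Counter


def top_k(vims, k):
    """Names of the k features with the highest VIM in one iteration."""
    return sorted(vims, key=vims.get, reverse=True)[:k]


def influential_predictors(vims_per_iteration, k):
    """The k features that most often rank in the per-iteration top k."""
    counts = Counter()
    for vims in vims_per_iteration:
        counts.update(top_k(vims, k))
    return [name for name, _ in counts.most_common(k)], counts


# Toy example with hypothetical VIM values.
iterations = [
    {"baseline_damage": 0.030, "klg": 0.020, "lying_pct": 0.010, "speed": 0.001},
    {"baseline_damage": 0.025, "klg": 0.005, "lying_pct": 0.015, "speed": 0.002},
    {"baseline_damage": 0.040, "klg": 0.022, "lying_pct": 0.012, "speed": 0.000},
]
selected, counts = influential_predictors(iterations, k=2)
```

Keeping the full `counts` alongside the selected names makes it possible to report how consistently each feature appeared, not just whether it made the final list.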

Marginal causal risk differences
To quantify the effect of influential predictors identified from the super learner model on cartilage worsening, we used parametric g-computation. 34 Continuous variables were quantised into tertiles. For each predictor, we calculated the marginal causal risk difference of each category of the predictor on cartilage worsening, compared with the corresponding reference category, using riskCommunicator (V.1.0.0); 95% CIs were calculated using 1000 bootstrap samples. 35 Different risk factors may be associated with OA initiation vs progression; thus, we explored sensitivity analyses stratified by baseline cartilage damage (ie, lesion in ≥1 subregion). Only 6% of knees without baseline damage had cartilage worsening at follow-up; thus, we focused our sensitivity analysis on those with baseline damage (online supplemental file).
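The idea behind a marginal risk difference can be illustrated with a simplified, non-parametric form of the g-formula (standardisation over a single categorical confounder). This is a stand-in for intuition only: the study used parametric g-computation with an outcome model and bootstrap CIs, whereas the sketch below uses stratum-specific observed risks, and all names and data are hypothetical.

```python
from collections import defaultdict


def standardised_risk(rows, exposure_level):
    """Average, over everyone's confounder stratum, the observed risk at `exposure_level`.

    `rows` holds (exposure, confounder, outcome) tuples with outcome in {0, 1}.
    """
    events = defaultdict(int)
    totals = defaultdict(int)
    for exposure, confounder, outcome in rows:
        totals[(exposure, confounder)] += 1
        events[(exposure, confounder)] += outcome
    risks = []
    for _, confounder, _ in rows:
        key = (exposure_level, confounder)
        if totals[key] == 0:
            raise ValueError("no data in stratum %r" % (key,))
        risks.append(events[key] / totals[key])
    return sum(risks) / len(risks)


def marginal_risk_difference(rows, level, reference):
    """Risk if everyone were at `level` minus risk if everyone were at `reference`."""
    return standardised_risk(rows, level) - standardised_risk(rows, reference)


def make_rows(exposure, confounder, n, n_events):
    """Build n toy rows, the first n_events of which have the outcome."""
    return [(exposure, confounder, 1 if i < n_events else 0) for i in range(n)]


rows = (
    make_rows("high", "damage", 10, 4)
    + make_rows("low", "damage", 10, 2)
    + make_rows("high", "none", 10, 1)
    + make_rows("low", "none", 30, 0)
)
rd = marginal_risk_difference(rows, "high", "low")
crude = 5 / 20 - 2 / 40  # unadjusted difference, for comparison
```

In this toy dataset the crude difference (0.20) is larger than the standardised one (about 0.13) because the high-exposure group contains proportionally more knees with baseline damage, which is the kind of confounding the adjustment is meant to remove.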

Marginal risk differences
Marginal risk differences from the g-computation analyses (figure 4) can be interpreted as the difference in risk of cartilage worsening per 100 individuals in the given category compared with the referent category. Presence of cartilage damage, higher KLG, greater lateral GRF impulse, greater pain during walking, greater time spent lying and lower vertical GRF unloading rate at baseline were associated with increased risk of cartilage worsening (figure 4). Point estimates were similar in the sensitivity analysis (online supplemental file).

DISCUSSION
An ensemble machine learning approach incorporating baseline gait, physical activity and clinical/demographic features showed good performance predicting medial tibiofemoral cartilage worsening over 2 years in knees with KLG ≤2. While determining the relationships among predictors and outcomes in machine learning models is challenging, our analysis suggests that high lateral GRF impulse, more time spent lying and low vertical GRF unloading rate should be investigated further as potential targets to reduce cartilage worsening.

Model performance
Our model performance is comparable to other machine learning models predicting OA progression from clinical/demographic data. Du et al reported AUCs of 0.70-0.79 for predicting radiographic worsening (increase in KLG, medial or lateral joint space narrowing) over 2 years from baseline cartilage damage MRI features in those with KLG 0 to 4. 36 Prior longitudinal gait studies typically examined knee-specific loading (eg, knee adduction moment) rather than GRFs, often in samples of 15-300 knees. 5 Correspondingly, few addressed clinical/demographic confounders, incorporated physical activity or examined performance in held-out test sets. Further, many were conducted in samples with established OA (KLG ≥2), limiting their potential to identify at-risk individuals early in the disease process or identify early intervention targets. Our sample included 947 individuals with KLG ≤2, who predominantly had no or mild pain during walking, and thus were younger with lower BMI than previously reported samples (mean age 59.2 vs 62.0 years, BMI 27.8 vs 29.2 kg/m²). 18

Figure 2 Features extracted from ground reaction force (GRF) data.

Predictors of OA progression
The super learner identified multiple influential gait and physical activity predictors of cartilage worsening. Of these, the g-computation analyses found baseline lateral GRF impulse, time spent lying and vertical GRF unloading rate were associated with cartilage worsening. The 7.2% estimate of risk difference for lateral GRF impulse suggests that for every 100 individuals in the highest tertile, there are 7.2 individuals who experience cartilage worsening who would not experience worsening if they were in the lowest tertile. Accordingly, approximately 14 (ie, 1/0.072) individuals with lateral GRF impulse in the highest tertile would be needed to observe one additional individual with cartilage worsening. In a cross-sectional study in the same cohort, we previously reported that limbs with radiographic OA, with or without knee pain, have higher lateral GRFs in early stance compared with limbs without radiographic OA or pain. 38 The current results suggest lateral GRF may also play a role in progression. The 5.4% increased risk of cartilage worsening for the middle versus lowest tertile of time spent lying, along with prior research showing greater sedentary time is associated with future functional decline 39 and lower quality of life, 40 suggests reducing sedentary time should be investigated as a potential intervention target. The physiological reason for the 6.6% decreased risk of cartilage worsening in the highest vs lowest tertile of vertical GRF unloading rate is not clear and warrants further exploration.
The appearance of structure and symptom features as influential predictors is not surprising, given that these are established risk factors. Of note, despite only 10.1% of the sample having what is traditionally considered established radiographic OA (KLG=2), 39.2% had baseline cartilage damage and both appeared as influential predictors in the model. The g-computation analysis identified a 15.3% increased risk of cartilage worsening for every 100 individuals with baseline damage compared with no damage, and a 14.4% increased risk for KLG 2 vs 0. The lack of difference for KLG 1 vs 0 may highlight limitations of the KLG scoring system, which does not reflect tissue-level damage well, particularly in early disease. 41 42 Along with these structural measures, knees with mild pain had an increased risk of cartilage worsening (6.8%) compared with those with no pain. The large CI for moderate and higher pain could stem from the small proportion of knees (3% of sample) and/or heterogeneity in this category.

Clinical implications
The utility of this model for risk screening is debatable, as it requires GRF data. While faster to collect than joint moments, collecting GRFs requires specialised equipment (force platform). Future advances in wearable technologies may facilitate gait data capture during everyday life, including estimates of GRFs, 43 44 improving the potential of this type of model as a risk screening tool.
This model identified potential gait and physical activity intervention targets for further study. Interestingly, two influential predictors (baseline damage, KLG) appeared as top contributors in ≥98% of the iterations while others appeared less consistently (<50%). While we removed highly correlated features, this may result in part from predictors that capture similar constructs (eg, four features collectively describing an important construct could each appear 25% of the time). Similarly, our g-computation approach provides insight into causal pathways but does not account for concurrent changes in several risk factors. An important motivation for using machine learning was to address potential interactions among predictors. While it is challenging to identify these relationships from the model, the lack of consistency in top contributors could indicate a need for simultaneous intervention on several features rather than a single feature, opening interesting avenues for future study.

Strengths and limitations
Strengths include the large sample, investigation of gait and physical activity in early disease, use of machine learning to address inter-related predictors and use of g-computation to quantify their effects. These strengths expand existing literature by accounting for demographics and clinical characteristics, examining multiple gait and physical activity features, and testing the model on held-out data. Our sample was largely white with little to no pain during walking; these results may not generalise to diverse populations or those with severe symptoms. Lateral or patellofemoral worsening could have been present in both outcome groups, causing noise in the model. While we adjusted for a diverse set of confounders, as in any observational study, there may be residual unmeasured confounding. Knee loading (eg, knee adduction moment) may provide additional predictive power but kinematics are not available in MOST, limiting comparison to prior gait studies and insight into mechanisms by which features such as lateral GRF impulse affect structure. We are unaware of other large datasets with gait, physical activity and MR outcomes that could be used for external validation; however, we assessed reproducibility with repeated cross-validation. Better characterisation of dynamic physical activity patterns may also improve model performance and identification of relevant intervention targets.

CONCLUSION
Using an ensemble machine learning approach, we predicted medial tibiofemoral cartilage worsening over 2 years in persons without and with early radiographic OA with good performance on held-out samples. Additionally, we identified baseline gait and physical activity measures associated with cartilage worsening that may be potential early intervention targets, including lateral GRF impulse, time spent lying and vertical GRF unloading rate.