Background In short track speed skating, the Canadian national team monitors its athletes throughout the season to adjust training and maximize the time each athlete spends at full capacity.
Objective This study attempts to build a machine learning model that predicts an athlete's injury risk from training monitoring data.
Design Retrospective observational study.
Setting 2018–2019 season.
Patients (or Participants) National women’s speed skating team.
Interventions (or Assessment of Risk Factors) We defined injuries as overuse, subjective, non-traumatic or soft-tissue injuries. Multiple variables were measured throughout the season and pooled into 5 categories: external load, internal load, mental state, heart rate variability and neuromuscular function. We also engineered multiple features from the training load (moving means and SD) over different time scales, providing information on its evolution over time. The machine learning algorithms try to spot patterns in the variables that lead to overuse injury. We tested 5 different algorithms, 4 resampling techniques and 3 different approaches to deal with missing (non-available) data.
Main Outcome Measurements We started with a broad perspective, hence the large number of algorithms, resampling techniques and variables used. The models were evaluated on 3 performance metrics: sensitivity, specificity and F-score.
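The moving-mean and moving-SD features described above can be illustrated with a minimal sketch. It assumes a single daily training-load series and trailing windows of 3 and 7 days. The load values, the window lengths and the names `rolling_features` and `daily_load` are hypothetical and chosen for illustration only; they are not taken from the study.

```python
from statistics import mean, stdev


def rolling_features(load, window):
    """Return (moving_mean, moving_sd) lists over a trailing window.

    Entries are None until a full window of observations is available.
    """
    means, sds = [], []
    for i in range(len(load)):
        if i + 1 < window:
            # Not enough history yet for this window length.
            means.append(None)
            sds.append(None)
            continue
        chunk = load[i + 1 - window : i + 1]
        means.append(mean(chunk))
        sds.append(stdev(chunk))  # sample SD, requires window >= 2
    return means, sds


# Hypothetical daily training loads in arbitrary units (not study data).
daily_load = [300, 450, 0, 500, 520, 610, 0, 480, 530, 700]

features = {}
for w in (3, 7):  # illustrative time scales, an assumption of this sketch
    m, s = rolling_features(daily_load, w)
    features[f"load_mean_{w}d"] = m
    features[f"load_sd_{w}d"] = s
```

Each window length yields one mean column and one SD column, so the model sees both the recent level of the load and how much it has been fluctuating.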
Results The Naïve Bayes model with the over/under resampling technique and the fill approach had the best results out of the 75 different configurations: F-score 0.77 (harmonic mean of precision and recall), sensitivity 0.81 (true positive rate) and specificity 0.72 (true negative rate).
Conclusions The common imbalance between the injured and non-injured classes in our data set and the amount of non-available data forced us to address these issues in ways that could have led to overfitting. However, this project provides valuable insight into which variables should be considered when trying to predict injury risk. The framework created throughout this project also represents a solid starting point for future work.
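As a reminder of how the three reported metrics relate to a confusion matrix, the sketch below computes them from binary labels, with 1 standing for "injured". The labels are hypothetical and the function name `classification_metrics` is an assumption of this example; the sketch does not reproduce the study's data or pipeline.

```python
def classification_metrics(y_true, y_pred):
    """Compute sensitivity, specificity and F-score for binary labels (1 = injured)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    # Guard against empty denominators by reporting 0.0 in that case.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # true positive rate (recall)
    specificity = tn / (tn + fp) if (tn + fp) else 0.0  # true negative rate
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + sensitivity
    f_score = 2 * precision * sensitivity / denom if denom else 0.0  # harmonic mean

    return {"sensitivity": sensitivity, "specificity": specificity, "f_score": f_score}


# Hypothetical labels for ten observations (not study data).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

metrics = classification_metrics(y_true, y_pred)
```

Reporting sensitivity and specificity alongside the F-score matters when classes are imbalanced, because a model can score well on overall accuracy while missing most injured cases.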