143 Modelling the risk of soft tissue non-contact injuries from multiple training monitoring data sources in a short track speed skating elite team
  1. François Bieuzen1,2,3,
  2. Jérémy Briand1,
  3. Breault Pierre-Olivier1,
  4. Sylvain Gaudet1
  1. 1Institut National du Sport du Québec, Montréal, Canada
  2. 2ReFORM : Réseau francophone olympique de la recherche de médecine du sport, IOC Medical Research Network, Montréal, Canada
  3. 3Speed Skating Canada, National Training Center, Montréal, Canada


Background In short track speed skating, the Canadian national team monitors its athletes throughout the season to adjust training and maximize the time each athlete spends at full capacity.

Objective This study attempts to create a statistical model to predict the injury risk of an athlete based on training monitoring data with a machine learning approach.

Design Retrospective observational study.

Setting 2018–2019 season.

Patients (or Participants) National women’s speed skating team.

Interventions (or Assessment of Risk Factors) We defined injuries as overuse, subjective, non-traumatic or soft tissue injuries. Multiple variables were measured throughout the season and pooled into 5 categories: external and internal load, mental state, heart rate variability and neuromuscular function. We also engineered multiple features from the training load (moving means and SD) over different time scales, providing information on how the load evolved over time. The machine learning algorithms try to detect patterns in the variables that lead to overuse injury. We tested 5 different algorithms and 4 resampling techniques, and used 3 different approaches to deal with non-available data.
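As an illustration of the moving mean and SD features described above, here is a minimal sketch in Python. The function name `rolling_features`, the window lengths (3 and 7 days) and the load values are assumptions made for the example; the abstract does not state which time scales or tools were used.

```python
from statistics import mean, stdev


def rolling_features(loads, window):
    """Trailing-window mean and sample SD of a daily load series.

    Returns one entry per day: a (mean, sd) tuple, or None while fewer
    than `window` days of history are available. `window` must be >= 2
    because the sample SD needs at least two values.
    """
    features = []
    for i in range(len(loads)):
        if i + 1 < window:
            features.append(None)
            continue
        chunk = loads[i + 1 - window : i + 1]
        features.append((mean(chunk), stdev(chunk)))
    return features


# Hypothetical daily training loads (arbitrary units), not study data.
daily_load = [300, 450, 0, 500, 520, 0, 610, 480, 0, 550]

# Two assumed time scales: a short-term and a longer-term window.
short_term = rolling_features(daily_load, window=3)
long_term = rolling_features(daily_load, window=7)
```

Computing the same statistic over several window lengths gives the model a view of both recent load and the longer trend it is measured against.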

Main Outcome Measurements We started with a broad perspective, hence the large number of algorithms, resampling techniques and variables used. The models were evaluated on 3 performance metrics: sensitivity, specificity and F-score.
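The three metrics can be computed directly from the counts of true and false positives and negatives. The sketch below assumes binary labels (1 = injured, 0 = not injured) and the F1 form of the F-score; the function name and the example labels are hypothetical and are not study data.

```python
def classification_metrics(y_true, y_pred):
    """Sensitivity, specificity and F1-score for binary labels (1 = injured)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    # Guard each ratio against an empty denominator.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # true positive rate
    specificity = tn / (tn + fp) if (tn + fp) else 0.0  # true negative rate
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + sensitivity
    f_score = 2 * precision * sensitivity / denom if denom else 0.0

    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f_score": f_score,
    }


# Hypothetical labels for eight athlete-days, for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 1, 0]
metrics = classification_metrics(y_true, y_pred)
```

Reporting sensitivity and specificity side by side matters when injuries are rare, because a model that never predicts an injury can still score high on overall accuracy.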

Results The Naive Bayes model with the over/under resampling technique and the fill approach had the best results out of the 75 different possibilities: F-score: 0.77 (harmonic mean of precision and recall), sensitivity: 0.81 (true positive rate) and specificity: 0.72 (true negative rate).
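The abstract does not specify how the over/under resampling was implemented. One simple interpretation, sketched here as an assumption, is to randomly oversample the minority (injured) class with replacement and randomly undersample the majority class until both reach the same size. The function name, the `target_per_class` parameter and the data are hypothetical.

```python
import random


def over_under_resample(rows, labels, target_per_class, seed=0):
    """Randomly resample every class to exactly `target_per_class` rows.

    Classes larger than the target are undersampled without replacement;
    smaller classes keep all their rows and gain random duplicates.
    """
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)

    new_rows, new_labels = [], []
    for label, members in by_class.items():
        if len(members) >= target_per_class:
            sampled = rng.sample(members, target_per_class)
        else:
            missing = target_per_class - len(members)
            sampled = members + [rng.choice(members) for _ in range(missing)]
        new_rows.extend(sampled)
        new_labels.extend([label] * target_per_class)
    return new_rows, new_labels


# Hypothetical imbalanced data: 2 injured rows versus 18 non-injured rows.
rows = [[i] for i in range(20)]
labels = [1] * 2 + [0] * 18
balanced_rows, balanced_labels = over_under_resample(rows, labels, target_per_class=6)
```

Resampling of this kind should be applied to the training split only. Duplicated minority rows that leak into the evaluation data inflate the metrics, which is one route to the overfitting risk raised in the conclusions.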

Conclusions The common imbalance between the injured and non-injured classes in our data set and the amount of non-available data forced us to address these issues in a way that could have led to overfitting. However, this project provides valuable insight into which variables should be considered when trying to predict injury risk. The framework created throughout this project also represents a good starting point for future work.
