Elsevier

Journal of Biomechanics

Volume 47, Issue 10, 18 July 2014, Pages 2385-2390

Clustering vertical ground reaction force curves produced during countermovement jumps

https://doi.org/10.1016/j.jbiomech.2014.04.032

Abstract

The aim of this study is to assess and compare the performance of commonly used hierarchical, partitional (k-means) and Gaussian model-based (Expectation–Maximization algorithm) clustering techniques to appropriately identify subgroup patterns within vertical ground reaction force data, using a continuous waveform analysis. In addition, we compared the performance of each technique using normalized and non-normalized input scores. Both generated and real data (one hundred and twenty-two vertical jumps) were analyzed. The performance of each clustering technique was measured by assessing its ability to explain variance in jump height using a stepwise regression analysis. Only k-means (normalized scores; 82%) and hierarchical clustering (normalized scores; 85%) were able to extend the ability to describe variance in jump height beyond that achieved using the group analysis (i.e. one cluster; 78%). Further, our findings strongly indicate the need to normalize the input data (similarity measure) when clustering. In contrast to the group analysis, the subgroup analysis was able to identify cluster-specific phases of variance, which improved the ability to explain variance in jump height through the identification of cluster-specific predictor variables. Our findings therefore highlight the benefit of performing a subgroup analysis and may explain, at least in part, the contrasting findings between previous studies that used a single group level of analysis.

Introduction

The countermovement jump (CMJ) is an important task in a number of sports (e.g. volleyball and basketball) and its biomechanics have been frequently studied (Klavora, 2000). However, identified features that relate to the performance outcome (jump height) are often inconsistent (Richter et al., 2014). For example, maximum vertical ground reaction force (vGRF) is reported in some studies as a performance related factor (Cormie et al., 2009, Dowling and Vamos, 1993, Sheppard et al., 2009), while it is not in others (Morrissey et al., 1998, Newton et al., 1999, Petushek et al., 2010). This makes it difficult to conclude which neuromuscular capacities or movement techniques should be altered to enhance jump height, the criterion performance outcome in CMJs. Recently, we have shown that some of the contrasting findings across studies may be due to the use of discrete point analysis (Richter et al., 2014). An alternative to discrete point analysis is a continuous waveform analysis (e.g. functional principal component analysis or analysis of characterizing phases) which has grown in popularity within many disciplines, including biomechanics, and has been reported to provide a better insight than discrete point analysis (Dona et al., 2009, Donoghue et al., 2008, Godwin et al., 2010, Harrison et al., 2007, Newell et al., 2006, Ramsay and Silverman, 2002, Richter et al., 2014, Ryan et al., 2006).

An additional reason for the inconsistencies across studies, however, may be inter-subject variability. Vertical ground reaction force curves generated during a CMJ can differ significantly in shape across subjects (e.g. non-modal, uni-modal or bi-modal), which could imply that different movement strategies are being employed, each of which may have different performance related factors. This might explain some of the contrasting findings, since previous studies generally employed a single group analysis, which can mask performance related factors if different shapes have different performance related factors (Bates, 1996, Stergiou, 2004, Stergiou and Scott, 2005). An alternative to a single group analysis is a subgroup analysis, which classifies similar patterns (curve shapes or movement strategies) into subgroups, so-called clusters. An optimal clustering maximizes the ability to predict the dependent variable (e.g. jump height) of a data set (Han et al., 2006). To the authors' knowledge, none of the previous CMJ studies has used a subgroup analysis, whereas subgroup analyses have been frequently performed in studies that examine human gait (Carriero et al., 2009, Kienast et al., 1999, O'Byrne et al., 1998, O'Malley et al., 1997, Stout et al., 1995, Toro et al., 2007, von Tscharner et al., 2013).
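The masking effect described above can be illustrated with a deliberately exaggerated toy example. The sketch below is not taken from the study: the feature values, jump heights and the two subgroups are invented for illustration, and the R² of a single-predictor regression stands in for a full stepwise regression. It shows how a feature that predicts the outcome well within each subgroup can appear unrelated to it once the subgroups are pooled.

```python
def r_squared(x, y):
    """R^2 of a simple linear regression of y on x (the squared Pearson r)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)


# Hypothetical data: one curve feature and jump height (cm) for two
# invented subgroups that use opposite strategies.
feature_a = [1.0, 2.0, 3.0, 4.0, 5.0]
height_a = [30.0, 32.0, 34.0, 36.0, 38.0]  # height rises with the feature
feature_b = [1.0, 2.0, 3.0, 4.0, 5.0]
height_b = [38.0, 36.0, 34.0, 32.0, 30.0]  # height falls with the feature

# Single group analysis: pool every subject into one regression.
pooled_r2 = r_squared(feature_a + feature_b, height_a + height_b)

# Subgroup analysis: fit each cluster separately.
r2_a = r_squared(feature_a, height_a)
r2_b = r_squared(feature_b, height_b)

print(f"pooled R^2     = {pooled_r2:.2f}")
print(f"subgroup A R^2 = {r2_a:.2f}")
print(f"subgroup B R^2 = {r2_b:.2f}")
```

Real data will rarely be this clean, but the same mechanism can weaken a pooled relationship whenever subgroups relate to the outcome in different ways.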

A challenge in subgroup analysis is that a variety of clustering techniques exist that may result in different clusters (Hastie et al., 2001, Jain et al., 1999, Martinez et al., 2004, Witten and Frank, 2005). Additionally, while the number of studies that have used continuous waveform analysis in the area of biomechanics is increasing, little is known about the performance of different clustering techniques with continuous waveform analysis in biomechanics. The computed continuous features aim to represent the pattern of a curve over multiple phases of the movement cycle and can be highly collinear, which may influence the results of some clustering techniques. Clustering approaches differ in their underlying assumptions and can be divided broadly into hierarchical, partitional and probabilistic clustering (Hastie et al., 2001, Martinez et al., 2004, Witten and Frank, 2005). The advantage of hierarchical clustering techniques is that they provide a highly interpretable description of the hierarchy within the data (i.e. dendrogram) and do not require the number of clusters to be chosen prior to the analysis. However, the assignment of samples into clusters requires the generation of inter-point distances of the input data (where different approaches can give very different results) and imposes a hierarchical structure on the examined data (Hastie et al., 2001, Martinez et al., 2004, Witten and Frank, 2005). In contrast, partitional clustering (e.g. k-means) can be performed without calculating inter-point distances, is commonly used and is usually more suitable for large data sets (Martinez et al., 2004). However, k-means clustering also requires the user to choose the number of clusters (prior to analysis) and the construction of a dendrogram is computationally prohibitive (Hastie et al., 2001, Jain et al., 1999, Martinez et al., 2004, Witten and Frank, 2005).
In addition, both hierarchical and partitional clustering techniques follow a deterministic process where the generated clusters and their members are somewhat dependent on the ordering of samples (Witten and Frank, 2005). Consequently, a third method, model-based clustering, might be more appropriate for classifying biomechanical data. Model-based clustering techniques assign individuals into clusters based on their fit to a given mathematical model. A frequently used model is the Gaussian mixture model (Han et al., 2006), which assigns subjects into clusters based on statistical inference and might therefore be more appropriate for classifying movement strategies. Due to the variation in clustering approaches, and the relative novelty of classifying continuous biomechanical data/features, it is important to identify which clustering technique has the greatest ability to recognize and appropriately separate patterns within multiple curves.
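To make the three families concrete, the following is a minimal sketch of each, written with the Python standard library only. It is not the implementation used in the study, whose settings are not given in this excerpt. The assumptions are: a single hypothetical feature per subject (for example, the timing of peak force as a percentage of the movement cycle), absolute difference as the distance, average linkage for the hierarchical method, and a fixed number of clusters for all three so that the results can be compared.

```python
import math
import random


def kmeans_1d(xs, k, iters=50, seed=0):
    """Partitional clustering (Lloyd's algorithm); k is chosen up front."""
    centers = random.Random(seed).sample(xs, k)
    labels = [0] * len(xs)
    for _ in range(iters):
        # Assign each sample to its nearest center, then update the centers.
        labels = [min(range(k), key=lambda j: abs(x - centers[j])) for x in xs]
        for j in range(k):
            members = [x for x, lab in zip(xs, labels) if lab == j]
            if members:
                centers[j] = sum(members) / len(members)
    return labels


def average_linkage(xs, k):
    """Agglomerative hierarchical clustering from inter-point distances.

    Stopping the merges at k clusters stands in for cutting a dendrogram.
    """
    clusters = [[i] for i in range(len(xs))]

    def dist(a, b):
        return sum(abs(xs[i] - xs[j]) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > k:
        # Find and merge the closest pair of clusters.
        _, a, b = min((dist(clusters[a], clusters[b]), a, b)
                      for a in range(len(clusters))
                      for b in range(a + 1, len(clusters)))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    labels = [0] * len(xs)
    for lab, members in enumerate(clusters):
        for i in members:
            labels[i] = lab
    return labels


def gmm_em_1d(xs, k, iters=100):
    """Model-based clustering: EM for a 1D Gaussian mixture, hard labels."""
    lo, hi = min(xs), max(xs)
    means = [lo + (hi - lo) * (j + 0.5) / k for j in range(k)]
    variances = [max(((hi - lo) / k) ** 2, 1e-6)] * k
    weights = [1.0 / k] * k
    resp = []
    for _ in range(iters):
        # E-step: responsibility of each component for each sample.
        resp = []
        for x in xs:
            dens = [w * math.exp(-((x - m) ** 2) / (2 * v)) / math.sqrt(2 * math.pi * v)
                    for w, m, v in zip(weights, means, variances)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var, 1e-6)  # floor avoids a collapsing component
            weights[j] = nj / len(xs)
    return [max(range(k), key=lambda j: r[j]) for r in resp]


if __name__ == "__main__":
    # Hypothetical feature values forming two well-separated groups.
    toy = [25, 26, 27, 28, 29, 30, 70, 71, 72, 73, 74, 75]
    for name, method in [("k-means", kmeans_1d),
                         ("hierarchical", average_linkage),
                         ("Gaussian mixture", gmm_em_1d)]:
        print(name, method(toy, 2))
```

On data this clearly separated, all three approaches should recover the same two groups (possibly with different label numbers). Differences between the techniques are expected to emerge only with overlapping or collinear features such as those discussed above.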

The primary aim of this study is to assess and compare the performance of commonly used hierarchical, partitional and probabilistic clustering techniques to appropriately identify patterns within a sample of self-created curves (manipulated data set) and a sample of vGRF curves captured during countermovement jumps (real data set), using a continuous waveform analysis. A secondary aim is to examine if there are benefits to performing a subgroup analysis compared to the commonly used single group analysis when identifying vGRF factors related to jump height.

Section snippets

Data set

Manipulated data set: A random vGRF curve from the real data set (see below) was selected and used to create a sample of 100 manipulated curves, which contained three clusters to reflect some of the general shapes of the vGRF curve. Curves in the first cluster (n=41) were manipulated to have a unimodal shape, where the peak value occurred from 25 to 30% of the cycle. Curves in the second cluster (n=9) were manipulated to have a unimodal shape, where the peak value occurred from 70 to 75% of the

Manipulated data set

For the manipulated data set, the accuracy of the clustering techniques was (from high to low): hierarchical clustering utilizing normalized scores (98% accuracy), k-means clustering utilizing normalized scores (97% accuracy), Expectation–Maximization algorithm (95% accuracy), hierarchical clustering utilizing similarity scores (67% accuracy) and k-means clustering utilizing similarity scores (61% accuracy).
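This excerpt does not state how the accuracy values were computed. One common approach, sketched below as an assumption rather than as the authors' procedure, is to compare the cluster assignments with the known group of each manipulated curve after finding the best one-to-one relabelling, because the label numbers returned by a clustering technique are arbitrary. The function name and the example labels are hypothetical.

```python
from itertools import permutations


def clustering_accuracy(true_labels, predicted_labels):
    """Fraction of samples assigned correctly under the best relabelling.

    Tries every one-to-one mapping from predicted cluster ids to true group
    ids, which is only practical for a small number of clusters.
    """
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    true_ids = sorted(set(true_labels))
    pred_ids = sorted(set(predicted_labels))
    # Pad with None so that surplus predicted clusters map to "no group".
    size = max(len(true_ids), len(pred_ids))
    targets = true_ids + [None] * (size - len(true_ids))
    best_hits = 0
    for perm in permutations(targets, len(pred_ids)):
        mapping = dict(zip(pred_ids, perm))
        hits = sum(mapping[p] == t for t, p in zip(true_labels, predicted_labels))
        best_hits = max(best_hits, hits)
    return best_hits / len(true_labels)


# Hypothetical example: ten curves in three known groups. The clustering
# used different label numbers and misplaced one curve.
known = [0, 0, 0, 0, 1, 1, 1, 2, 2, 2]
found = [2, 2, 2, 1, 1, 1, 1, 0, 0, 0]
accuracy = clustering_accuracy(known, found)
print(f"accuracy = {accuracy:.0%}")
```

For larger numbers of clusters, an assignment algorithm such as the Hungarian method would be a more efficient way to find the best relabelling.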

Key phases differ between the single group and subgroup analysis. Key phases for the

Clustering technique comparison

The examined clustering techniques differed in their performance in both the manipulated and real data sets. Using the manipulated data, the hierarchical clustering utilizing normalized scores, k-means clustering utilizing normalized scores, and Expectation–Maximization algorithm performed best. Using the real data set, only k-means (normalized scores) and hierarchical clustering (normalized scores) extended the ability to describe variances in jump height beyond that achieved using the group

Conclusion

K-means clustering utilizing normalized subject scores appears to be the most suitable technique for clustering vGRF curves, while hierarchical clustering also showed a high level of suitability. Further, when clustering curve shapes, it is extremely important to normalize subject scores, by transforming them into their correlation matrix, before using a clustering technique. The subgroup analysis should be used in preference to a single group analysis because it explained greater variances in
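The normalization step mentioned in this section can be sketched as follows. The details are not given in this excerpt, so the code reflects one plausible reading only: each subject's vector of scores is replaced by its row in the subject-by-subject Pearson correlation matrix, so that clustering compares curve shape rather than magnitude. The score values are invented for illustration.

```python
import math


def pearson(a, b):
    """Pearson correlation between two equally long score vectors."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    norm_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
    norm_b = math.sqrt(sum((y - mean_b) ** 2 for y in b))
    return cov / (norm_a * norm_b)


def correlation_matrix(scores):
    """Subject-by-subject correlation matrix; row i describes subject i."""
    return [[pearson(a, b) for b in scores] for a in scores]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical subject scores: s1 and s2 share a shape but differ in
# magnitude, while s3 has the opposite shape at a magnitude close to s1.
s1 = [1.0, 2.0, 3.0, 2.0, 1.0]
s2 = [2.0, 4.0, 6.0, 4.0, 2.0]
s3 = [3.0, 2.0, 1.0, 2.0, 3.0]
raw = [s1, s2, s3]
normalized = correlation_matrix(raw)

# Raw scores: s1 looks closer to s3 (similar magnitude) than to s2.
print("raw        d(s1,s2) =", round(euclidean(raw[0], raw[1]), 2),
      " d(s1,s3) =", round(euclidean(raw[0], raw[2]), 2))
# Correlation rows: s1 and s2 coincide, s3 is far from both.
print("normalized d(s1,s2) =", round(euclidean(normalized[0], normalized[1]), 2),
      " d(s1,s3) =", round(euclidean(normalized[0], normalized[2]), 2))
```

Any distance-based technique applied to the normalized rows would then group s1 with s2, which is the shape-based grouping that clustering on raw magnitudes can miss.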

Conflict of interest statement

The authors declare that no conflict of interest is associated with the present study.

Acknowledgments

This work is supported by Science Foundation Ireland under Grant 07/CE/I114.

References (39)

  • B.T. Bates. Single-subject methodology: an alternative approach. Med. Sci. Sports Exerc. (1996)
  • A. Carriero et al. Determination of gait patterns in children with spastic diplegic cerebral palsy using principal components. Gait Posture (2009)
  • Cohen, J., 1988. Statistical Power Analysis for the Behavioral Sciences....
  • P. Cormie et al. Power-time, force-time, and velocity-time curve analysis of the countermovement jump: impact of training. J. Strength Cond. Res. (2009)
  • C. De Boor (1978)
  • G. Dona et al. Application of functional principal component analysis in race walking: an emerging methodology. Sports Biomech. (2009)
  • O.A. Donoghue et al. Functional data analysis of running kinematics in chronic Achilles tendon injury. Med. Sci. Sports Exerc. (2008)
  • J.J. Dowling et al. Identification of kinetic and temporal factors related to vertical jump performance. J. Appl. Biomech. (1993)
  • A. Godwin et al. Functional data analysis as a means of evaluating kinematic and kinetic waveforms. Theor. Issues Ergon. Sci. (2010)
  • J. Han et al. Data Mining: Concepts and Techniques (2006)
  • A.J. Harrison et al. Functional data analysis of joint coordination in the development of vertical jump performance. Sports Biomech./Int. Soc. Biomech. Sports (2007)
  • T. Hastie et al. The Elements of Statistical Learning (2001)
  • A. Jain et al. Data clustering: a review. ACM Comput. Surv. (CSUR) (1999)
  • Jaques, J., Preda, C., 2013. Functional data clustering: a survey. Technical Report, Research Centre Lille Nord Europe,...
  • G. Kienast et al. Determination of gait patterns in children with cerebral palsy using cluster analysis. Gait Posture (1999)
  • P. Klavora. Vertical-jump tests: a critical review. Strength Cond. J. (2000)
  • Marshall, B., 2010. Can a Pre-training Biomechanical Pathway Identify The Most Effective Exercise to Enhance A Given...
  • W. Martinez et al. Exploratory Data Analysis with MATLAB (2004)
  • M.C. Morrissey et al. Early phase differential effects of slow and fast barbell squat training. Am. J. Sports Med. (1998)