Objective Self-reports of physical activity (PA) have been the mainstay of measurement in most non-communicable disease (NCD) surveillance systems. Other measures are added to these to form a comprehensive PA surveillance system. Recently, some national NCD surveillance systems have started using accelerometers as a measure of PA. The purpose of this paper was specifically to appraise the suitability and role of accelerometers for population-level PA surveillance.
Methods A thorough literature search was conducted to examine aspects of the generalisability, reliability, validity, comprehensiveness and between-study comparability of accelerometer estimates, and to gauge the simplicity, cost-effectiveness, adaptability and sustainability of their use in NCD surveillance.
Conclusions Accelerometer data collected in PA surveillance systems may not provide estimates that are generalisable to the target population. Accelerometer-based estimates have adequate reliability for PA surveillance, but there are still several issues associated with their validity. Accelerometer-based prevalence estimates are largely dependent on the investigators’ choice of intensity cut-off points. Maintaining standardised accelerometer data collections in long-term PA surveillance systems is difficult, which may cause discontinuity in time-trend data. The use of accelerometers does not necessarily produce useful between-study and international comparisons due to lack of standardisation of data collection and processing methods. To conclude, it appears that accelerometers still have limitations regarding generalisability, validity, comprehensiveness, simplicity, affordability, adaptability, between-study comparability and sustainability. Therefore, given the current evidence, it seems that the widespread adoption of accelerometers specifically for large-scale PA surveillance systems may be premature.
- Physical activity
Non-communicable disease (NCD) surveillance includes the assessment and monitoring of chronic disease risk through regular population-representative surveys of key behavioural and physiological antecedent factors and their determinants. Given that reducing physical inactivity is as important to global health as tobacco or obesity control,1 an international focus on NCD surveillance is provided through the WHO Global Monitoring Framework. This framework now incorporates a target to decrease physical inactivity by 10% in all participating countries by the year 2025.2
Increased efforts are being made to track the prevalence of physical activity (PA) internationally. For example, 87 countries have implemented the WHO STEPwise Approach to Surveillance measures and protocols using the Global Physical Activity Questionnaire.3 Furthermore, a recent review has shown that 3 international and more than 30 national surveillance systems collect PA data in European Union member states, mostly utilising a variety of short questionnaires.4 In the UK, PA is assessed using at least 10 national and country-level surveillance systems.5–10
While information on some other health indicators, such as body mass index and tobacco use, is ascertained using reasonably standardised and comparable measures, the situation with PA is variable.4 ,11 Although they have recognised limitations in small-scale studies, self-report measures have been the method of choice in most population-based studies12 and, in many countries, time trends for PA and sedentary behaviour are reliant on decades of questionnaire-based assessment.4 Recently, technological improvements have led to enthusiasm for accelerometer use in several national NCD surveillance systems.4 ,5 ,13 ,14 However, there is a lack of studies specifically examining how well accelerometers conform to the principles of PA surveillance. Therefore, the purpose of this paper was to appraise the suitability of accelerometers for use in population-level PA surveillance. We did not aim to review issues associated with self-report measures, as this has been done elsewhere,12 ,15–18 but we focus here on the specific challenges of using accelerometers in PA surveillance systems. An appraisal of new technologies and methods in population surveillance systems is particularly important, given the difficulties in developing comprehensive and sustained monitoring systems in many countries. A gap in the accelerometer field has been a strong focus on technical issues, without reflection on the usability of accelerometers as part of an integrated PA surveillance system. Specifically, we consider the generalisability, reliability, validity, comprehensiveness and between-study comparability of accelerometer estimates, and also their simplicity, cost-effectiveness, adaptability and potential sustainability in population PA surveillance.
Suitability of accelerometry for PA surveillance
Population PA measures need to be acceptable to most participants to ensure high adherence and provide generalisable data. In order to capture the between-day variability in PA and sedentary behaviour, participants are usually asked to wear accelerometers during waking hours for seven consecutive days. There is still an ongoing debate on how many days of monitoring are needed for reliable estimates of habitual PA,19–24 but valid data are usually defined as 10 or more hours of wear time on at least 4 days.14 ,25–28 In large-scale population-based studies, between 6% and 32% of participants (median 17.6%) do not meet this criterion, and are excluded from subsequent analyses.14 ,26–29 The National Health and Nutrition Examination Survey (NHANES) 2003–2006 and the Health Survey for England 2008 have shown that participants with valid and invalid accelerometer data differ significantly in a range of sociodemographic, lifestyle and health characteristics.29–31 This means that findings based on accelerometer data may not be generalisable to the target population, but only to those who adhere to the rigorous measurement requirements. Reweighting of the sample, such as was performed by Troiano et al,28 may help to reduce selection bias. However, such reweighting may not necessarily eliminate the selection bias, because it is restricted to standard auxiliary variables (ie, those whose population distributions are available, such as gender, age and race/ethnicity).
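As an illustration of the reweighting step, the following minimal Python sketch computes post-stratification weights so that the weighted distribution of a single auxiliary variable in the analytic sample matches a known population distribution. It is not the procedure used by Troiano et al; the function names, the age strata, the population shares and the MVPA values are all hypothetical.

```python
from collections import Counter


def poststratification_weights(sample_strata, population_shares):
    """Return one weight per participant: population share / sample share of the stratum."""
    counts = Counter(sample_strata)
    n = len(sample_strata)
    return [population_shares[s] / (counts[s] / n) for s in sample_strata]


def weighted_mean(values, weights):
    """Weighted mean of the values."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical analytic sample of 10 participants with valid accelerometer data;
# older adults are over-represented relative to the (hypothetical) population.
strata = ["18-39"] * 2 + ["40-59"] * 3 + ["60+"] * 5
population = {"18-39": 0.40, "40-59": 0.35, "60+": 0.25}
mvpa_minutes = [40, 50, 30, 30, 30, 10, 10, 10, 10, 10]  # hypothetical daily MVPA

weights = poststratification_weights(strata, population)
unweighted = sum(mvpa_minutes) / len(mvpa_minutes)
reweighted = weighted_mean(mvpa_minutes, weights)
print(unweighted, reweighted)
```

Because each weight depends only on the participant's stratum, this kind of adjustment cannot correct for differences between compliers and non-compliers within the same stratum, which is one reason why selection bias may persist after reweighting.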
Nonetheless, we recognise that missing data remains a substantial problem for self-report measures as well. For comparison, in a large-scale study among adults from 20 countries using the International Physical Activity Questionnaire—short form, between 0% and 7.4% (median 1.6%) of participants had missing PA data.32 Furthermore, in the National Health Interview Survey 2005, NHANES 2005–2006, and Behavioral Risk Factor Surveillance System 2005, self-reported PA data were missing in around 2.9%, 0.1% and 7.1% of participants, respectively.33 Hence, the percentages of participants with incomplete data may be lower in questionnaire-based than in accelerometer-based studies on nationally representative samples.
Measures used for PA surveillance need to be sufficiently reliable to provide credible estimates at the population level, but not necessarily for individuals. Accelerometer reliability is subject to technical (eg, inconsistencies in capturing signals and processing data) and human-related (eg, accidental altering of the position of device) sources of random error. Accelerometers demonstrated high intrainstrument and interinstrument reliability in mechanical laboratory settings, with coefficients of variation (CV) mostly below 5% and 10%, respectively,34–43 and when assessing structured activities in controlled laboratory conditions, with most intraclass correlation coefficients (ICCs) ranging from 0.60 to 0.90.36 ,42 ,44–48 Interinstrument reliability of accelerometers in free-living conditions ranged from mediocre for RT3 accelerometers (CV 9.8–39.8%)49 to relatively high for Actigraph 7164 and GT1M models (CV 0.9–15.5%).49 ,50 A test–retest reliability study has shown high agreement between accelerometer data collected during two 7-day periods (1–4 weeks apart) in free-living conditions (ICCs 0.77–0.90).51 This evidence suggests that the reliability of accelerometer-based estimates of PA and sedentary behaviour may be sufficient for public health surveillance. In comparison, a recent comprehensive review of 89 PA and sedentary behaviour questionnaires showed that most of their test–retest reliability ICCs were slightly lower than for accelerometers, ranging from 0.59 to 0.84 (median 0.73).52
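For readers less familiar with the coefficient of variation used in the reliability studies above, a minimal Python sketch of its calculation follows. It assumes the common definition (sample SD divided by the mean, expressed as a percentage); the cited studies may have used other variants, and the activity counts shown are hypothetical.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """CV in per cent: sample standard deviation divided by the mean, times 100."""
    return stdev(values) / mean(values) * 100


# Hypothetical counts recorded by three devices exposed to the same mechanical movement.
counts = [980, 1000, 1020]
cv = coefficient_of_variation(counts)
print(round(cv, 1))
```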
PA surveillance measures need to provide valid estimates at the population level. Two recent reviews have shown moderate criterion validity of accelerometers.53 ,54 The pooled correlations with doubly labelled water were 0.39 (activity energy expenditure, AEE) and 0.52 (total energy expenditure, TEE) for uniaxial devices, and 0.59 (AEE) and 0.61 (TEE) for triaxial devices.54 On average, uniaxial accelerometers underestimated AEE by 24% and TEE by 12%, while triaxial devices underestimated AEE by 21% and TEE by 7%.54 Validation studies using activPAL inclinometers as the criterion measure have shown that hip-mounted accelerometers provide reasonably accurate group estimates, but not individual estimates of sedentary behaviour.55–60 By comparison, in previous validation studies identified by six systematic reviews,52 ,61–65 most correlations between PA questionnaires and doubly labelled water estimates of AEE and TEE ranged from 0.21 to 0.45 (median 0.35)66–72 and 0.23 to 0.58 (median 0.37),66 ,70 ,71 ,73–80 respectively. AEE was underestimated by most questionnaires, while no clear pattern could be observed for TEE. Average absolute (non-negative) values of the differences between questionnaires and doubly labelled water estimates of AEE and TEE were 32%66–71 ,74 ,77 ,81 and 23%,66 ,70 ,74–80 ,82–86 respectively.
The validity of accelerometer-based estimates of PA and sedentary behaviour is potentially compromised by four categories of concerns: (1) technical shortcomings, (2) significant amounts of non-wearing time, (3) possible interference by participants with the results and (4) use of intensity cut-off points.
Wrist-worn or hip-worn accelerometers do not adequately capture common activities such as cycling, resistance and static exercise, and carrying loads. For example, accelerometer activity counts during cycling are underestimated by approximately 73%.87 This may limit assessment of ‘active travel’ in countries where cycling is prevalent, such as China, Denmark and the Netherlands.88 ,89 Additionally, non-waterproof accelerometers cannot be used to assess aquatic activities. Furthermore, different accelerometer models underestimate energy expenditure of a range of other daily living and leisure-time activities, with the highest bias determined for ascending stairs and playing tennis.90
Across different studies and age groups, between 27% and 74% of participants who satisfied the inclusion criteria (eg, 4 days×10 h) did not have valid accelerometer data for all days of measurement14 ,25 ,26 ,28 ,91 ,92 and their average wearing time per valid day was between 13 and 15 h.14 ,25–28 ,93 This suggests that some waking time is not monitored. Unfortunately, suggested methods of missing data imputation94 ,95 are not without shortcomings and were often not used in large-scale accelerometer-based studies.14 ,25–28 ,93 ,96 Nonetheless, there are encouraging indications that compliance is higher with wrist-worn than with hip-mounted accelerometers.97
Accelerometers are considered objective devices as their assessment of acceleration is independent of human-related factors. However, participants can influence accelerometer data collection in free-living conditions by intentional non-wearing, altering their habitual behaviour, and changing the position or shaking the device. Children as well as adults occasionally report aesthetic issues and physical discomfort as reasons for the occasional non-wearing of accelerometers.98 ,99 Besides, more than 40% of adolescents found it disturbing to wear the accelerometer during physical activities and were worried about losing or breaking the device.100 Furthermore, awareness that PA is being monitored might influence habitual behaviour. Although the Hawthorne effect has been recognised as a potential limitation of accelerometry,15 ,101–103 empirical evidence on its magnitude is equivocal and remains scarce.104 ,105 Furthermore, Kowalski et al,106 Kowalski et al107 and Pate et al108 detected tampering with accelerometers in 8%, 33% and 1%, respectively, of schoolchildren/adolescents. The latter study used CSA 7164 (no display), while the former two used older Caltrac models (with display, but taped in a holster to prevent tampering). Tampering with devices is less likely among adults, but evidence in this area is still limited.
Furthermore, one of the main arguments for utilisation of objective measures in PA surveillance is to avoid the social desirability response bias associated with self-reports.109–112 Unlike some objective measures of PA, it seems that accelerometry may occasionally be related to measures of social desirability.69 ,113 Adams et al69 reported that a social desirability score was not related to PA estimates from doubly labelled water (r=−0.02, p=0.860), while it was significantly related to accelerometer counts (r=−0.29, p=0.009). Interestingly, in the same study, none of the three tested self-report measures exhibited significant bivariate correlation with the social desirability scale.69 Another study has shown a negative relationship between social desirability in adolescence and accelerometer-assessed sedentary time in adulthood.113 Such social desirability bias is not a widespread problem with accelerometers, but from an ‘independent scientific perspective’, the questioning and challenging of all dimensions of and potential biases in measurement is worthy of consideration.
Intensity cut-off points
The use of ‘cut-off points’ has been the most common method for defining the intensity of PA in large-scale accelerometer-based studies.14 ,25–28 ,93 ,96 ,114 Experts in the field recommend not using cut-off points,115 ,116 and suggest that other solutions such as pattern recognition may provide better estimates of moderate to vigorous physical activity (MVPA). In most calibration studies, cut-off points were developed by analysing the relationship between accelerometer counts and objectively measured energy expenditure during a set of activities using regression analyses or receiver operating characteristic curves.117 ,118 Possible issues that can affect the validity of PA and sedentary behaviour estimates when such cut-off points are applied in public health surveillance systems are: (1) non-representativeness of the calibration study sample, (2) non-representativeness of the set of activities used in the calibration study, (3) difference between the accelerometer models used for surveillance and in the calibration study, (4) non-representativeness of the sample of accelerometers used in the calibration study and (5) variability of true individual intensity thresholds around the universal group-based cut-off points (ie, SE of regression or false-positive and false-negative rates in the calibration study). Furthermore, for some accelerometer models, different sets of cut-off points have been developed and recommended.119–122 Across 15 calibration studies of Actigraph accelerometers, the lower threshold for MVPA for adults ranged from 191 to 3285 counts per minute (cpm).121 ,122 Application of different cut-off points can result in significantly different prevalence estimates and intensity-specific PA levels.111 ,122–127 The accelerometry-defined prevalence of ‘insufficient PA’ among US adults, US youth and European youth ranged from 2.4% to 95.3%,121 40.7% to 93.8%121 and 0% to 97%,114 respectively, depending on which cut-off points were applied. 
It has been shown that the choice of cut-off points can also influence the association between estimated PA and various health outcomes.121 ,128 Despite this, the use of cut-off points remains the method of choice for the estimation of intensity-specific PA levels. By contrast, it seems that large-scale studies were more consistent in the choice of cut-off points for sedentary behaviour, with <100 cpm being the most often used.14 ,25–27 ,93 ,96 ,129
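The dependence of estimates on the chosen cut-off point can be illustrated with a minimal Python sketch. The thresholds of 191, 2020 and 3285 cpm and the sedentary cut-off of <100 cpm are taken from the studies discussed in this paper; the function names and the minute-by-minute counts for a single monitored day are hypothetical.

```python
def classify_minute(cpm, sedentary_below=100, mvpa_from=2020):
    """Assign one minute to an intensity category using fixed cut-off points."""
    if cpm < sedentary_below:
        return "sedentary"
    if cpm >= mvpa_from:
        return "MVPA"
    return "low"


def mvpa_minutes(counts_per_minute, threshold):
    """Number of minutes at or above the MVPA threshold."""
    return sum(1 for cpm in counts_per_minute if cpm >= threshold)


# Hypothetical 900 min of wear time for one participant (counts per minute).
day = [0] * 600 + [150] * 200 + [1000] * 60 + [2500] * 25 + [4000] * 15

# Same data, three different lower thresholds for MVPA.
mvpa_by_threshold = {t: mvpa_minutes(day, t) for t in (191, 2020, 3285)}
sedentary = sum(1 for cpm in day if classify_minute(cpm) == "sedentary")
print(mvpa_by_threshold, sedentary)
```

In this constructed example, the same day yields 100, 40 or 15 min of MVPA depending on the threshold, so whether the participant would be classified as sufficiently active depends on the investigators' choice rather than on the recorded movement.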
To conclude regarding validity, it seems that accelerometers provide somewhat more valid estimates of PA than questionnaires. Despite that, significant underestimation of PA levels, problems with non-wearing time, potential susceptibility to some subjective factors, and the high dependence of findings on the selection of cut-off points are current issues that need to be resolved.
Accelerometers provide data on the time spent in sedentary, low, moderate and vigorous intensity activities, and valid step count estimates.130–133 To allow for the calculation of prevalence of insufficient PA according to the current WHO PA recommendations,134 accelerometers can estimate time spent in MVPA,14 ,25–28 ,93 ,96 but cannot provide data on muscle-strengthening exercises (participation in MVPA and muscle-strengthening exercises are separate components of the WHO PA recommendations).134 In order to assess the frequency of muscle-strengthening exercises, PA surveillance systems still have to rely on self-reports.
Nonetheless, in considering comprehensiveness, there are some advantages of accelerometers; they can provide data on the distribution of PA across the day and week, which is not the case with most self-reports. Additionally, some accelerometer models can assess sleep duration, which may be useful for assessing the total daily activity spectrum.135
Furthermore, the concept of a comprehensive surveillance system requires domain-specific PA levels (separate PA levels for work, transport, domestic and leisure-time domain), type-specific data for common activities (eg, cycling), and measures of the antecedents and determinants of PA, including intrapersonal, interpersonal, societal, environmental and policy factors. Indicators are needed across the socioecological framework, reflecting a ‘system-based approach’ to PA surveillance.136 Despite advances in movement pattern recognition, it seems that accelerometers are not yet capable of estimating domain and most type-specific PA levels, and cannot describe the social and environmental context of activities.115 Thus, for more comprehensive assessment of PA for population surveillance systems, self-report measures are still required.
The use of accelerometers adds to the administrative burden for researchers and participants. Researchers need to distribute/collect accelerometers, instruct participants, recharge/replace batteries, replace broken devices, initialise/calibrate devices, and store and process the data. Furthermore, the burden on respondents usually involves wearing the accelerometer for a whole week. This may reduce participation rates by around 10–20%.29 ,96 It is encouraging that there are no significant differences in sociodemographic, lifestyle and health characteristics between participants who provide valid accelerometer data and those who decline to participate.29 The increasing issues of non-response in self-report surveys are also problematic, but are beyond the scope of this paper; our concern is that this problem is not solved by accelerometry, but exacerbated, as subsamples of questionnaire responders are asked to wear accelerometers, and low participation among these subsamples may further compromise statistical power.
Accelerometer data are expensive to collect. Although accelerometer prices have decreased since the 1930s,137 the cost of a standard accelerometer suitable for PA research purposes is still high (around $200). Government expenditure on accelerometry may result in reduced funding to other components of a comprehensive PA surveillance system, such as measuring policies, physical environments, facilities and infrastructure. This might generate slightly more reliable and valid data on activity behaviour, but at the cost of losing policy or system-wide monitoring of the context for, and antecedents to, PA behaviours. Thus, accelerometry should be funded only where governments can substantially increase funding for comprehensive PA surveillance.
Sustainability and continuity
Measures used in PA surveillance need to be repeated in a standardised format over the many years required to monitor time trends in population behaviour. Studies have indicated that a significant number of accelerometers may be lost during a single round of data collection in adults (from 2% to 8% of devices)26 ,137 ,138 and adolescents (up to 21% of devices).100 In the Women's Health Study the total loss was estimated to be ∼420 devices.137 Results of studies about the comparability of PA estimates across different generations of accelerometers from the same manufacturer are equivocal.139–145 This calls into question the feasibility and sustainability of standardised monitoring using the same model of accelerometer over the years, as the lost accelerometers would need to be replaced, and manufacturers might not retain older models. Furthermore, different brands of accelerometers do not always provide directly comparable data.81 ,146 ,147 There is no guarantee that the initially selected accelerometer manufacturer will operate over the course of a long-term surveillance system, and product discontinuation could influence the estimation of time trends in PA.
Furthermore, it seems challenging to retain initial accelerometer models and protocols, and not to be tempted to substitute them with their improved alternatives in subsequent surveys. For example, in the NHANES 2003–2006, hip-mounted Actigraph 7164 accelerometers were used,28 ,96 while in 2011–2014, in order to improve participants’ compliance and allow for assessment of sleep time,97 Actigraph GT3X+ accelerometers were worn on the wrist.148 Although a study has indicated that these two accelerometer models may produce comparable results when both are worn on the hip,139 the alteration in accelerometer placement might influence the assessment of changes in the ‘sufficiently active’ fraction of the population. In spite of advances in technical capacity, identical accelerometry protocols should be sustained throughout PA surveillance systems.
Measures used in PA surveillance need to be adaptable, to allow slight adjustments that are sometimes needed to keep up with technological, scientific and social changes,149 but without compromising data comparability over time. Technological improvements to accelerometers, such as inclinometers and ambient light sensors, may become useful for future public health surveillance. Any instrument change or upgrade requires the development and extensive testing of new accelerometer models before implementation in surveillance systems. It seems that new accelerometer models are marketed and promoted before the older models are thoroughly investigated.
Between-study and international comparability
The potential and importance of between-study and across-country comparability has been widely recognised in PA and sedentary behaviour research.4 ,56 ,103 ,150–152 However, outcomes of accelerometer-based studies may be dependent on the choice of accelerometer brand81 ,146 ,147 and generation,139 ,140 ,142 ,144 ,145 wearing position,153–157 epoch length,111 ,125 ,158–164 definition of non-wearing time,165–168 definition of a valid day,24 required number of days of monitoring,20–24 cut-off points,111 ,121–128 use and definition of activity bouts,26–28 ,96 ,115 ,153 ,165 data imputation method,94 ,95 and other variable factors, such as accelerometer firmware version,169 band-pass filter version,137 ,139 dealing with spurious data166 and reintegration of smaller epochs into larger epochs.170 Different protocols for use, constant technological development, emerging methodological questions and a lack of academic consensus hinder uniform use of accelerometers in population studies. To illustrate the need for standardisation, we reviewed the major components of accelerometer protocols used in a sample of six recent population-based studies among adults.14 ,26–28 ,93 ,96 ,129 Most of the studies: used Actigraph 7164 monitors; asked participants to wear the monitors on the right hip; recorded data in 60 s epochs; defined non-wearing time as 60 consecutive minutes of 0 counts with the allowance for up to 2 min with 1–100 counts; required data for ≥10 h/day on at least 4 days; and used 10 min bouts of MVPA with allowance for 1–2 min with counts below the intensity threshold. All studies used the same cut-off point for sedentary time (<100 cpm), but only three studies used the same MVPA threshold (≥2020 cpm). Most importantly, all the reviewed elements of the protocol were identical in only two studies.
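To show how much is left to the analyst even within the most common protocol, the following minimal Python sketch implements one possible reading of the non-wear and valid-data rules described above (60 consecutive minutes of 0 counts with allowance for up to 2 min with 1–100 counts; ≥10 h/day on at least 4 days). Several details are assumptions made for illustration only: data are taken to be non-negative counts in 60 s epochs, a non-wear interval is taken to start at a zero-count minute, the allowance is applied per interval, trailing tolerated minutes are counted as wear time, and all function names and example data are hypothetical.

```python
def nonwear_flags(counts, min_length=60, max_interrupt=2, interrupt_ceiling=100):
    """Flag minutes belonging to non-wear intervals (one interpretation of the rule)."""
    n = len(counts)
    flags = [False] * n
    i = 0
    while i < n:
        if counts[i] != 0:          # an interval may only start at a zero-count minute
            i += 1
            continue
        j, last_zero, interrupts = i, i, 0
        while j < n:
            if counts[j] == 0:
                last_zero = j
            elif counts[j] <= interrupt_ceiling and interrupts < max_interrupt:
                interrupts += 1     # tolerated minute with 1-100 counts
            else:
                break               # too many interruptions or counts above the ceiling
            j += 1
        end = last_zero + 1         # drop tolerated minutes at the end of the interval
        if end - i >= min_length:
            flags[i:end] = [True] * (end - i)
            i = end
        else:
            i += 1                  # too short: retry from the next minute
    return flags


def wear_minutes(counts):
    """Minutes of the day not flagged as non-wear."""
    return sum(1 for flagged in nonwear_flags(counts) if not flagged)


def has_valid_data(days, min_wear_minutes=600, min_valid_days=4):
    """Apply the >=10 h/day on at least 4 days inclusion criterion."""
    valid_days = sum(1 for day in days if wear_minutes(day) >= min_wear_minutes)
    return valid_days >= min_valid_days


# Hypothetical day (1440 min): device off overnight, worn for 700 min including a
# 30 min motionless period, then off again with one stray minute of 50 counts.
night = [0] * 480
worn = [500] * 300 + [0] * 30 + [500] * 370
evening = [0] * 58 + [50] + [0] * 201
day = night + worn + evening

print(wear_minutes(day), has_valid_data([day] * 4), has_valid_data([day] * 3))
```

Each of the assumptions listed above could reasonably be made differently, and each choice may change the wear time and therefore the set of participants retained for analysis, which is why such details need to be standardised and reported.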
Under the assumption that countries used the same model of accelerometer, better comparisons may be made by subsequently reanalysing raw data from different countries using the same procedures, such as in Hagströmer et al.171 Such examples of good practice are to be encouraged in cross-study comparisons.
Several questionnaires have been developed and standardised internationally for cross-study comparisons. Unfortunately, such standardised and validated questionnaires are often modified or translated without providing data on measurement properties and comparability of the altered version. By comparison, given their invariance to language and cultural differences, if standardised procedures existed, accelerometers would have greater potential for valid international comparisons.
The present paper identified some of the methodological issues associated with accelerometry, specifically when used in PA surveillance. These are: (1) accelerometer-based estimates may not be generalisable to the target population; (2) accelerometers may underestimate total PA levels; (3) accelerometers do not provide valid data on some common activities; (4) accelerometer-based estimates of PA and sedentary behaviour can potentially be influenced by participants; (5) accelerometer-based prevalence rates are largely dependent on the investigators’ choice of intensity cut-off points; (6) despite recommendations to avoid their use, cut-off points remain the most often used method to estimate intensity-specific PA levels in large-scale studies; (7) accelerometers do not provide data on the frequency of muscle-strengthening exercises, one of the WHO recommendations for PA; (8) use of accelerometers is expensive and adds to administrative burden for researchers and participants; (9) long-term sustainable use of standardised accelerometry protocols in population data collection is challenging; and (10) use of accelerometers does not necessarily produce meaningful between-study and international comparisons, because data collection and processing methods are not standardised.
The present paper shows that accelerometers provide slightly more reliable and valid estimates when compared with self-report measures, such as questionnaires. The use of accelerometers in small-scale controlled intervention studies and in studies of associations with biomarkers is warranted by these measurement properties, relative to self-report tools. However, from the PA surveillance system perspective, it seems that accelerometers still have limitations regarding generalisability, validity, comprehensiveness, simplicity, affordability, adaptability and sustainability. Correlations between accelerometer and questionnaire-based PA and sedentary behaviour data are low to moderate (most often ranging from 0.10 to 0.45),52 and their mean estimates are often not comparable.65 Therefore, to assure continuity in time-trend data, accelerometers should not substitute, but only supplement self-report information systems, the latter remaining as the mainstay of established PA surveillance systems.
In order to improve the use of accelerometry in PA surveillance, future studies should: (1) further investigate the generalisability of estimates; (2) develop methods to increase response rates and participants’ compliance; (3) develop user-friendly data processing methods to replace intensity cut-off points and circumvent the incorrect capture of some activities; (4) investigate the potential subject-related sources of bias in accelerometer-based measures; (5) not omit the assessment of domain and type-specific PA levels, and also of muscle-strengthening exercises in PA surveillance systems; (6) propose and standardise minimum testing requirements before a new accelerometer model is implemented in population-level research; and (7) standardise accelerometer-based measurement protocols.
Accelerometer manufacturers might further contribute to the development of accelerometers for public health surveillance. This might include testing less expensive technologies and developing even less obtrusive devices. The inclusion of Global Positioning System receivers may allow for better ascertainment of movement velocity and type, as well as possibly assessing the environmental (and perhaps even social) contexts of PA and sedentary behaviours.
To conclude, although self-report measures traditionally have a defined place in comprehensive PA surveillance systems, we acknowledge their limitations. Further, the role of accelerometers in a myriad of clinical interventions and biomarker-based studies is widely accepted. Nonetheless, given the difficulties in maintaining stable PA surveillance systems, it may be unwise to add new complexity and non-comparability. Without appropriate standardisation protocols, the widespread implementation of accelerometers in PA surveillance systems may be premature.
What are the new findings?
Comprehensive physical activity (PA) surveillance systems require measures that monitor PA and sedentary behaviours, and also include measures of the antecedents, domains and policy context of PA in populations.
Accelerometer-based estimates have adequate reliability for PA surveillance purposes, but there are still some issues associated with their validity.
Without adequate sample reweighting, accelerometer data collected in PA surveillance systems may not provide generalisable estimates of PA prevalence for the underlying target population.
Accelerometer-based prevalence estimates remain largely dependent on the investigators’ choice of intensity cut-off points.
Although demonstrating utility in data collection in small-scale intervention and biomedical studies, accelerometer data on their own do not provide sufficient information for a PA surveillance system.
Funding NHMRC program grant 569940.
Competing interests None.
Provenance and peer review Not commissioned; externally peer reviewed.