Methods matter: population attributable fraction (PAF) in sport and exercise medicine
  1. Ahmad Khosravi1,2,
  2. Rasmus Oestergaard Nielsen3,4,
  3. Mohammad Ali Mansournia5,6
  1. 1 Department of Epidemiology, Shahroud University of Medical Sciences, Shahroud, Iran
  2. 2 Ophthalmic Epidemiology Research Center, Shahroud University of Medical Sciences, Shahroud, Iran
  3. 3 Department of Public Health, Section for Sports Science, Aarhus University, Aarhus, Denmark
  4. 4 Research Unit for General Practice, Aarhus, Denmark
  5. 5 Department of Epidemiology and Biostatistics, School of Public Health, Tehran University of Medical Sciences, Tehran, Iran
  6. 6 Sports Medicine Research Center, Neuroscience Institute, Tehran University of Medical Sciences, Tehran, Iran
  1. Correspondence to Professor Mohammad Ali Mansournia, Department of Epidemiology and Biostatistics, School of Public Health, Tehran University of Medical Sciences, Tehran, Iran; mansournia_ma{at}


‘Physical inactivity kills as many people as smoking’, read a newspaper headline1 following a Lancet publication2 suggesting that more than 5 million deaths would be avoided annually if all inactive people exercised. The statistic behind this figure (5 million deaths) is called the population attributable fraction (PAF).3 PAF is an estimate of the health impact of an exposure (eg, physical inactivity, high-carbohydrate diet) on a health outcome (eg, death, heart attack, onset of type 2 diabetes mellitus).4 Although PAF is widely used in public health research, it is not yet well known in the sport and exercise medicine and physiotherapy settings. A search of the PubMed database in December 2019 using the strings ‘PAF AND sports medicine’ and ‘PAF AND sports injury’ identified 12 hits and one hit, respectively. We believe that PAF will be of interest to those members of the BJSM community who work in physical activity/prevention research.

In the sports injury context, PAF may be used to identify the proportion of a certain injury attributable to a certain exposure. For instance, scientists may be interested in the fraction of bicycle-related head injuries in the population that is attributable to non-helmet use.5 Similarly, a sports medicine researcher may be interested in estimating the PAF of knee injuries as a risk factor for knee osteoarthritis or knee replacement in footballers.6 In this case, they can calculate the proportion of knee osteoarthritis or knee replacement cases arising as a long-term consequence of knee injury that would be prevented if footballers received ‘the 11+’ prevention programme.7 8 In another example, coaches may be interested in knowing how many injuries would be prevented in volleyball players with additional training sessions.9 Finally, policy makers may be interested in estimating the PAF of artificial turf as a risk factor for soccer injuries.10 If artificial turf were replaced with natural grass, researchers could estimate how many soccer-related injuries would be prevented.

Sport and exercise medicine clinicians may be interested in the PAF of physical inactivity for type 2 diabetes mellitus,11–13 cardiovascular disease14 or breast cancer.15 If physical inactivity were eliminated, how many cases of diabetes (or heart disease or breast cancer) would be prevented over a certain time span? Another question in this field is how many excess cases of diabetes are attributable to consumption of sugar-sweetened beverages. A meta-analysis estimated that 1.8 million diabetes cases over 10 years in the USA would be attributable to consumption of sugar-sweetened beverages (PAF=8.7%).16 Researchers may also be interested in comparing the relative importance of physical inactivity and a high-carbohydrate diet for diabetes events using PAF. In a study of patients with diabetes, the proportional reduction in mortality within 5.5 years was estimated for lifestyle and dietary modifications. If everyone were physically active every day, the PAF for death was 12.3%, while consuming at least three servings of vegetables per day would reduce the estimated fraction of deaths by 12.6%. Reducing high-carbohydrate food intake to fewer than seven servings per week corresponded to a PAF of 2.4%.17 To show how widely this statistic has been applied, other examples include the PAF of low energy availability and poor mental health for sports incapacity (time loss) due to illness in athletes18 19 and the PAF of peer victimisation during adolescence for depression in early adulthood.20 Researchers may also seek to estimate the PAF of fractures and fracture-related mortality attributable to low bone mineral density, to inform prevention strategies in the population.21

Even though the concept of PAF, as well as the potential errors when estimating PAF,22–25 is well known in the field of medicine, it has received little attention in the sport and exercise medicine literature. In the present educational review, we therefore use examples from the sports medicine and sports injury context to (i) describe the terminology and definitions related to PAF (Part 1), (ii) present different ways to calculate PAF (Part 2, with details in the Appendix) and (iii) critically discuss the assumptions underpinning PAF (Part 3). We also include a case study for PAF (Part 4).

Part 1: terminology and interpretation related to PAF

The concept of PAF goes by many names.26 27 In the literature, terms such as ‘population attributable risk’, ‘percent population attributable risk’,28 ‘population etiologic fraction’, ‘etiologic fraction’,29 30 ‘population attributable risk percent’ and ‘population attributable fraction’26 are used. Besides the concept of PAF, the attributable fraction among the exposed is a similar metric that is interpreted as the proportion of exposed cases that would have been prevented if a specific risk factor had been abolished.29 The attributable fraction among exposed is also known as ‘excess fraction’26 and ‘percent attributable risk’.28 In this article, we retain the use of PAF, although other expressions could have been used.

How should clinician readers interpret PAF?

A research letter published in Br J Sports Med reported the PAF of physical inactivity for deaths from all causes to be 9.4%. We can interpret this as the percentage of all deaths that would not have occurred if physical inactivity had been eliminated in the population.31 For other outcomes (eg, sports injury, knee osteoarthritis, diabetes, cardiovascular disease), PAF can be interpreted in a similar way. Essentially, PAF refers to the proportion of all cases with a particular outcome in a population that could be prevented by eliminating a specific exposure (eg, physical inactivity, non-helmet use, knee injury, artificial turf) that is a cause of the outcome.27 It can also be interpreted as the proportional reduction in outcome risk over a specified time interval following elimination of the exposure(s), while the distribution of other risk factors remains stable.32 33 In a sports injury context, the concept of PAF is ironically and unintentionally used by sedentary persons warning of the dangers of practising sport, as they could argue: Just don’t practise sport and you avoid sustaining a sports-related injury. This is strictly true, as participating in sport is (to a greater or lesser degree) necessary to produce a sports injury.34 35 By eliminating participation in sport, we eliminate sports injury. In this case, PAF=100%. Still, we do want people to practise sport (in a safe manner) to reap the health benefits it offers.

For preventive exposures, the PAF can be defined as the fraction of all cases that would be prevented if the whole population were exposed, which is often known as the prevented fraction (PF).27 As an example of PF, the authors of a paper published in Br J Sports Med calculated the PF of habitual leisure-time physical activity (LTPA) for incident type 2 diabetes in a prospective cohort of Chinese adults with impaired fasting glucose (IFG).11 They reported a 9.1% (95% CI: 2.2% to 15.5%) reduction in the incidence rate of diabetes if the inactive IFG population had engaged in a low volume of LTPA.

Part 2: how to calculate PAF

Depending on the type of data available and the study design, there are different methods to estimate PAF (Levin’s formula, Miettinen’s formula and model-based standardisation). Please see the Appendix for a guide to these methods. All formulas for PAF generate greater fractions with a greater effect size (risk ratio (RR), OR or rate ratio) and a greater prevalence of exposure in the population.32 We alert the equation-enthusiastic reader that Levin’s formula is the only formula included in many epidemiology textbooks,28 36 although it is biased in practice.

Part 3: critical discussion of the assumptions underpinning PAF (advanced content)

The reader who uses, or is considering using, PAF in her or his research will appreciate that every statistical method is only valid in certain contexts and with specific patterns of data. As with any statistical method: garbage in, garbage out. Here (table 1), we discuss four assumptions that must be met for the PAF calculation to be valid:

Table 1

The four main assumptions underpinning population attributable fraction (PAF), which all need to be met to make appropriate and valid interpretations of the PAF estimate

Assumptions—realistic intervention, causality, other risk factors unaffected and limited bias

Assumption 1 (realistic intervention)

The intervention is well defined and realistic, and it can eliminate the exposure (eg, physical inactivity) from the population. However, in some situations the effect of a partial reduction in exposure is of interest, or no intervention exists that can remove the exposure entirely. This point is particularly important, as a recent educational review in Br J Sports Med identified lack of compliance with the intervention in trials examining training load and sports injury as a major problem.37 Consequently, before using PAF, researchers need to consider whether the intervention is realistic and whether the athletes have been compliant throughout follow-up.

Assumption 2 (causality)

We consider whether the estimated RR has a causal interpretation. To reach causal interpretations, researchers need to take into account all confounding variables, known and unknown, by using restriction in the design phase or adjustment in the analysis phase of a study. Adjusting for confounders needs to be done with caution, as some variables serve as mediators. Therefore, researchers should visualise the causal assumptions in a directed acyclic graph (DAG) or in a framework.38 The question remains: have sports medicine/injury researchers adopted this train of thought regarding the alignment between DAGs/frameworks and causal interpretations, which has been well known in epidemiology for decades?39 40 If not, the first step for sports medicine and sports injury researchers is to use DAGs or frameworks prior to using PAF, as the PAF becomes meaningless if causal assumptions are violated.

Assumption 3 (other risk factors unaffected)

Eliminating the risk factor in question has no effect on the distribution of other risk factors.27 33 However, this assumption may be violated in practice. For example, some runners may consider changing from a rearfoot strike to a forefoot strike to redistribute the load applied to the body.41 Runners making such a change are very likely to alter their running routines to avoid too abrupt a transition to the new running style. Consequently, the change in foot strike pattern is likely to change runners’ training load, which violates the assumption underpinning PAF that training load remains constant. Another example from the sports medicine context involves physical activity, diet and diabetes. By increasing LTPA, athletes may change their diet, making interpretation of the PAF of LTPA for diabetes difficult.

Assumption 4 (limited bias)

There are no other sources of bias besides confounding (see Assumption 2 (causality)). These include selection bias and information bias because of measurement error.27 These are discussed in relation to the different study designs below.

Bias and observational studies

In the cohort study design, the concept and calculation of PAF are realistic,27 which is not surprising as all the statistical methods for estimating PAF mentioned above were developed in the context of cohort studies. Even in cohort studies, however, bias can still arise from loss to follow-up, unmeasured confounding and measurement error.42

Case–control studies are generally easy to perform and suitable for rare diseases, and thus are frequently used in place of cohort studies. In a case–control study, one can use either Miettinen’s formula or model-based standardisation, but an adjustment is needed to correct for the different sampling probabilities of cases and controls. In the absence of any information on the sampling fractions of cases and controls, PAF can be estimated using Miettinen’s formula by replacing the RR with the OR, assuming the outcome is rare (say, less than 10%).43 PAF calculation in a case–control study is subject to several sources of bias, including an inappropriate control group, unmeasured confounding and measurement error (especially recall bias), which is why PAF should always be interpreted with caution.
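The rare-outcome condition can be illustrated with a small numerical sketch. The 2×2 counts below are hypothetical and describe a full population (so that both the RR and the OR can be computed); the function names are ours, and the case-based formula pc × (RR−1)/RR is the one described in the Appendix. The sketch compares the PAF obtained with the OR against the PAF obtained with the RR for a rare and for a common outcome.

```python
def risk_ratio(a, b, c, d):
    """RR from a 2x2 table: a, b = exposed cases, non-cases; c, d = unexposed cases, non-cases."""
    return (a / (a + b)) / (c / (c + d))


def odds_ratio(a, b, c, d):
    """OR from the same 2x2 table."""
    return (a * d) / (b * c)


def miettinen_paf(pc, ratio):
    """Case-based PAF: prevalence of exposure among cases times (ratio - 1) / ratio."""
    return pc * (ratio - 1.0) / ratio


def paf_gap(a, b, c, d):
    """Absolute difference between the PAF computed with the OR and with the RR."""
    pc = a / (a + c)  # proportion of cases that are exposed
    paf_rr = miettinen_paf(pc, risk_ratio(a, b, c, d))
    paf_or = miettinen_paf(pc, odds_ratio(a, b, c, d))
    return abs(paf_or - paf_rr)


# Hypothetical rare outcome: risk of 2% in the exposed and 1% in the unexposed.
rare_gap = paf_gap(20, 980, 10, 990)
# Hypothetical common outcome: risk of 40% in the exposed and 20% in the unexposed.
common_gap = paf_gap(400, 600, 200, 800)

print(f"PAF gap, rare outcome:   {rare_gap:.4f}")
print(f"PAF gap, common outcome: {common_gap:.4f}")
```

With these made-up counts the RR is 2 in both tables, but the OR drifts away from it only when the outcome is common, and the PAF computed from the OR drifts with it.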

In the cross-sectional design, the outcome and exposure status are measured simultaneously; thus, it is difficult to infer the temporal association between a risk factor and an outcome, which can lead to reverse causality bias.29 One can estimate PAF in a cross-sectional study by replacing the RR with the OR, but this replacement requires strong assumptions beyond the rarity of the outcome.28 Violations of these assumptions may lead to incidence-prevalence bias. For these reasons, PAF calculation is not recommended in a cross-sectional design.28

Ecological studies are used for generating hypotheses rather than deriving an adjusted association between risk factors and diseases. Moreover, ecological studies suffer from the aforementioned problems of cross-sectional studies. The ecological fallacy is a further misinterpretation of ecological study results, arising from cross-level bias when an association between exposure and outcome estimated at the group level is applied to individuals.44 The group-level association does not necessarily represent the association that exists at the individual level.28

Bias and other study designs

When researchers do not have access to the raw data needed to estimate the prevalence of exposure and the RR, they may use information from ancillary data or meta-analyses. Miettinen’s formula can then be used to estimate PAF, but there are several problems with this approach. Meta-analyses usually combine effect estimates such as ORs and RRs obtained from several types of study design: case–control, cross-sectional and cohort studies. Depending on the study design, the pooled effect estimates derived from these meta-analyses are subject to several biases, including reverse causality, incidence-prevalence bias, recall bias and residual confounding. The last of these is of particular concern because many meta-analyses pool unadjusted and partially adjusted effect size estimates.45 Another problem in PAF estimation is heterogeneity in the definitions of exposure and outcome across the individual studies included in a meta-analysis.46 For example, in a meta-analysis published in Br J Sports Med, researchers summarised the economic impact of physical inactivity in the population using estimates of the healthcare cost attributable to physical inactivity. The review included 23 articles with substantial heterogeneity in the definition of physical activity, and the authors noted that only nine studies used an adjusted RR.47

Part 4: case study for PAF

We illustrate the different calculation methods of PAF using a fictional cohort of 32 919 subjects, constructed to imitate the results of a published paper on physical activity and risk of diabetes over an 18-year follow-up period.11 In our example, the outcome variable is a new diagnosis of diabetes during follow-up (coded as 0, was not diagnosed with diabetes; 1, was diagnosed with diabetes), and the exposures are smoking status (coded as 0, non-smoker; 1, smoker) and LTPA (coded as 0, inactive; 1, low LTPA) of participants at the beginning of the study.

Physical activity was assessed by asking about duration (hours/week) usually spent on physical activity every week. Participants who did not do any leisure-time activity or did less than 1 hour per week were classified as inactive. In this example, smoking is a risk factor and physical activity is a preventive factor for diabetes incidence (table 2). In table 3, we calculated the stratum-specific and adjusted RRs of diabetes (using Poisson regression) for physical activity and smoking variables in addition to prevalence of physical activity and smoking status in diabetic cases as input data. Using the described input data, we calculated PAF for smoking, prevented fraction (PF) for physical activity (or PAF for inactivity) and combined PAF for smoking and inactivity.

Table 2

Schematic example of diabetes incidence associated with physical activity and smoking

Table 3

Calculation of population attributable fraction (PAF) and prevented fraction (PF) for smoking and leisure-time physical activity

We computed the PAF for smoking and physical inactivity separately using Miettinen’s formula (table 3). Adjusted RRs were estimated using Poisson regression. The PAF estimate for smoking was 5.0% and can be interpreted as the proportion of diabetes cases that could have been prevented if the participants had not smoked. We also calculated the PF for physical activity as 12.8%. It can be interpreted as the proportion of diabetes cases that could have been prevented in the population if physical inactivity had been eliminated.

We also computed the combined PAF for both exposures using the three aforementioned methods: a generalisation of Miettinen’s formula, the product formula and model-based standardisation (table 3). The combined PAF for smoking and physical inactivity estimated using model-based standardisation and the generalisation of Miettinen’s formula was 16.0%. However, the product formula yielded a combined PAF of 17.2%, which is biased due to the slight interaction. This combined PAF for smoking and physical inactivity is interpreted as the proportion of diabetes cases that could be prevented in the population if smoking and inactivity were eliminated.
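As a quick check of the product-formula figure, the arithmetic can be written out. We assume here the usual product form, one minus the product of (1 − PAF) over the risk factors, and plug in the rounded individual estimates reported above (5.0% for smoking and 12.8% for physical inactivity), so small rounding differences from table 3 are possible:

```latex
\[
\mathrm{PAF}_{\text{combined}}
  = 1 - (1 - 0.050)(1 - 0.128)
  = 1 - 0.950 \times 0.872
  = 1 - 0.8284
  \approx 0.172
\]
```

This reproduces the 17.2% quoted for the product formula, which sits above the 16.0% obtained with the two methods that do not rely on independence of the exposures.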


PAF can provide valuable insight into how much behavioural determinants (eg, physical inactivity, smoking) influence a health state (eg, diabetes, death, myocardial infarction). We alert the interested reader that four important assumptions need to be fulfilled when calculating PAF, and that it is critical that the variable of interest (eg, physical activity) is causally related to the primary outcome (eg, greater longevity).

For the reader familiar with this area of biostatistics, we caution that observational studies are liable to confounding and that Levin’s formula (see Appendix) should not be used in that setting to estimate PAF. Miettinen’s PAF formula is appropriate for general use, but it should be used with caution when the inputs come from external data, as in the case of meta-analysis. We recommend using model-based standardisation to calculate PAF when considering several risk factors jointly. Biostatisticians can also readily calculate the generalised impact fraction (GIF) using model-based standardisation.

Appendix: calculation methods of PAF

There are different methods to estimate PAF. This section serves as a brief tutorial on PAF calculation using Levin’s method, Miettinen’s method and the model-based standardisation method. We also describe PAF calculation in generalised settings, including multi-categorical exposures, multiple risk factors and situations in which the counterfactual value of exposure is not zero.

Levin’s method

In this method (F1 in table 4), the crude RR and the prevalence of exposure in the total population are combined to estimate the proportion of cases that could be prevented by removing the risk factor.3 32 48 49 This is an unadjusted PAF, which can be biased in the presence of confounding.32 50
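A minimal sketch of this calculation follows. Because table 4 is not reproduced in this text, we assume that F1 is the commonly cited form of Levin’s formula, p(RR−1)/(1+p(RR−1)), where p is the prevalence of exposure in the total population; the function name and the input values are hypothetical.

```python
def levin_paf(p_exposed, crude_rr):
    """Levin's formula (assumed form): p(RR - 1) / (1 + p(RR - 1)).

    p_exposed: prevalence of exposure in the total population (0 to 1).
    crude_rr:  crude (unadjusted) risk ratio, exposed versus unexposed.
    """
    if not 0.0 <= p_exposed <= 1.0:
        raise ValueError("p_exposed must lie between 0 and 1")
    if crude_rr <= 0.0:
        raise ValueError("crude_rr must be positive")
    excess = p_exposed * (crude_rr - 1.0)
    return excess / (1.0 + excess)


# Hypothetical inputs: the fraction grows with both the RR and the prevalence.
baseline = levin_paf(0.3, 2.0)
higher_rr = levin_paf(0.3, 3.0)
higher_prevalence = levin_paf(0.5, 2.0)

print(f"p=0.3, RR=2: {baseline:.3f}")
print(f"p=0.3, RR=3: {higher_rr:.3f}")
print(f"p=0.5, RR=2: {higher_prevalence:.3f}")
```

The sketch only reproduces the arithmetic; it does nothing to address confounding, which is the limitation of the unadjusted formula noted above.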

Table 4

Different formulas for the calculation of PAF


Levin’s formula (F1 in table 4) is unbiased in the absence of confounding and effect modification.27 48 As confounding is inevitable in observational studies, it must be stressed that F1 cannot be relied on in practice to give an unbiased estimate of PAF.27 Levin’s formula is also biased if an adjusted RR (or an OR, rate ratio or HR in the case of an uncommon outcome) is plugged into it.4 49 Thus, to obtain a valid estimate, Levin’s formula must be generalised to adjust for potential confounding. A valid approach in this situation is the weighted-sum approach (F2 in table 4), which calculates a weighted average of the stratum-specific PAF estimates across the strata of the confounding variables. The weight for each stratum is the proportion of all cases that fall in that stratum.49 Implementation of this method is limited in practice, especially when there are multiple confounders, some of which may be quantitative,48 51 so that stratification produces sparse data in some strata.52
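The weighted-sum approach described above can be sketched as follows. This is an illustration under assumptions: the stratum-specific PAF is computed with the commonly cited form of Levin’s formula, the two strata and all their values are hypothetical, and the function and field names are ours rather than taken from table 4.

```python
def levin_paf(p_exposed, rr):
    """Levin's formula (assumed form) applied within a single stratum."""
    excess = p_exposed * (rr - 1.0)
    return excess / (1.0 + excess)


def weighted_sum_paf(strata):
    """Weighted average of stratum-specific PAFs.

    Each stratum is a dict with:
      cases     - number of cases observed in the stratum
      p_exposed - prevalence of exposure within the stratum
      rr        - stratum-specific risk ratio
    The weight of a stratum is its share of all cases.
    """
    total_cases = sum(s["cases"] for s in strata)
    return sum(
        (s["cases"] / total_cases) * levin_paf(s["p_exposed"], s["rr"])
        for s in strata
    )


# Hypothetical strata of a single binary confounder.
strata = [
    {"cases": 60, "p_exposed": 0.2, "rr": 2.0},
    {"cases": 40, "p_exposed": 0.5, "rr": 2.0},
]
adjusted_paf = weighted_sum_paf(strata)
print(f"Weighted-sum PAF: {adjusted_paf:.4f}")
```

With several confounders the number of strata multiplies quickly, which is the sparse-data problem mentioned above.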

As an example, the authors of a paper published in Br J Sports Med calculated the PAF of four risk factors (high body mass index (BMI), high blood pressure (BP), smoking and physical inactivity) for risk of heart disease in women using Levin’s formula. They plugged adjusted RRs from ancillary databases into Levin’s formula, which may introduce bias. The magnitude of this bias depends on the degree of confounding.12

Miettinen’s method

In 1974, Miettinen introduced a case-based PAF formula (F3 in table 4).53 In contrast to F1, it requires information about the prevalence of exposure among the cases (pc).

\[
\mathrm{PAF} = p_c \times \frac{\mathrm{RR} - 1}{\mathrm{RR}}
\]

Unlike Levin’s formula, Miettinen’s formula produces a valid PAF estimate even in the presence of confounding, provided that an adjusted RR is used.4 27 Miettinen’s formula also has an intuitive interpretation as an adjustment to the attributable fraction among the exposed, which is (RR−1)/RR. The adjustment factor is the prevalence of exposure among the cases (pc), which reflects the fact that PAF measures the impact of exposure removal in the total population, which includes both exposed and unexposed people.

To calculate PF according to Miettinen’s formula, one can use the adjusted RR for no exposure and the prevalence of no exposure among cases.27
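Both uses of Miettinen’s formula can be sketched in a few lines. The formula pc × (RR−1)/RR follows the description above; the input values are hypothetical and the function names are ours. For the PF, we assume a binary protective exposure, so that the RR for no exposure is simply the reciprocal of the adjusted RR for exposure.

```python
def miettinen_paf(pc, adjusted_rr):
    """Case-based PAF: pc * (RR - 1) / RR.

    pc:          prevalence of exposure among the cases (0 to 1).
    adjusted_rr: confounder-adjusted risk ratio, exposed versus unexposed.
    """
    return pc * (adjusted_rr - 1.0) / adjusted_rr


def miettinen_pf(pc_unexposed, adjusted_rr_protective):
    """Prevented fraction for a protective binary exposure.

    The absence of the protective exposure is treated as the harmful exposure:
    its RR is 1 / adjusted_rr_protective, and pc_unexposed is the prevalence of
    non-exposure among the cases.
    """
    rr_no_exposure = 1.0 / adjusted_rr_protective
    return miettinen_paf(pc_unexposed, rr_no_exposure)


# Hypothetical harmful exposure: 25% of cases exposed, adjusted RR of 1.5.
paf_harmful = miettinen_paf(0.25, 1.5)

# Hypothetical protective exposure: adjusted RR of 0.8, 60% of cases unexposed.
pf_protective = miettinen_pf(0.60, 0.8)

print(f"PAF: {paf_harmful:.4f}")
print(f"PF:  {pf_protective:.4f}")
```

Note that pc is computed among the cases only, not in the total population, which is what distinguishes this formula from Levin’s.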

Model-based standardisation method (parametric g-formula)

Based on the original formula ‘(O−E)/O’ (where O and E refer to the observed number of cases in the population and the expected number of cases under no exposure, respectively), one can estimate the PAF using model-based standardisation.27 The principle behind this approach is to predict the number of cases under no exposure (E). First, the outcome is modelled on the exposure and all confounding variables using a multivariable logistic regression model. Second, to calculate the expected number of cases (E), one sets the exposure equal to zero (or some reference level for a continuous exposure) for every individual, predicts each individual’s probability of being a case and sums the predicted probabilities over all individuals.54
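The prediction step can be sketched as follows. To keep the example self-contained, the model-fitting step is skipped: the logistic regression coefficients below are hypothetical stand-ins for fitted values, and the small population of (exposure, confounder) pairs is made up. We also use the sum of predicted probabilities under the observed exposure in place of the observed case count O, on the assumption that the model was fitted by maximum likelihood with an intercept, in which case the two coincide.

```python
import math

# Hypothetical coefficients standing in for a fitted logistic regression.
INTERCEPT = -2.0
BETA_EXPOSURE = 0.7
BETA_CONFOUNDER = 0.5


def predicted_risk(exposure, confounder):
    """Predicted probability of being a case under the assumed logistic model."""
    linear = INTERCEPT + BETA_EXPOSURE * exposure + BETA_CONFOUNDER * confounder
    return 1.0 / (1.0 + math.exp(-linear))


def standardised_paf(population):
    """(O - E) / O, with E predicted after setting everyone's exposure to zero."""
    observed = sum(predicted_risk(e, c) for e, c in population)
    expected = sum(predicted_risk(0, c) for _, c in population)
    return (observed - expected) / observed


# Hypothetical population of (exposure, confounder) pairs.
population = [(1, 1)] * 30 + [(1, 0)] * 20 + [(0, 1)] * 10 + [(0, 0)] * 40
paf = standardised_paf(population)
print(f"Model-based PAF: {paf:.4f}")
```

Each individual keeps his or her own confounder value when the exposure is set to zero, which is how the approach standardises over the confounder distribution of the population.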

In a recent paper published in Br J Sports Med, the PAF of victimisation by peers at age 13 years for depression at 18 years was estimated as 29.2% (95% CI: 10.9% to 43.7%) using model-based standardisation.20

PAF for multi-categorical exposure

Originally, PAF was formulated for a single binary exposure3 and was later extended to multi-categorical exposures by Miettinen (F5)30 and Walter (F4).55 The extensions of Miettinen’s and Levin’s formulas for multi-categorical exposures are presented in table 4.

PAF for multiple risk factors

PAF can also be estimated for several exposures simultaneously. For multiple risk factors, the sum of the individual PAFs can be more than one. Overlap between risk factors is the main explanation: people exposed to two risk factors should not be counted twice.

To estimate the overall PAF of several risk factors, the following formula has been proposed:

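\[
\mathrm{PAF}_{\text{combined}} = 1 - \prod_{r} \left(1 - \mathrm{PAF}_{r}\right)
\]
where r is the exposure index and PAF_r is the PAF for the r-th risk factor.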

This formula assumes independence30 of, and no interaction between, the risk factors, which is often unrealistic in practice. In fact, in most studies almost all pairs of risk factors are associated. As an example, in the study of heart disease in Australian women mentioned above, the risk factors of high BMI, smoking, high BP and physical inactivity are strongly associated.12
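The contrast between summing individual PAFs and combining them with the product formula can be sketched as follows. We assume the usual product form, one minus the product of (1 − PAF_r); the three individual PAFs are hypothetical and chosen so that their sum exceeds one, and the function name is ours.

```python
from math import prod


def combined_paf_product(individual_pafs):
    """Product formula (assumed form): 1 - product of (1 - PAF_r).

    Valid only if the risk factors are independent and do not interact.
    """
    return 1.0 - prod(1.0 - paf for paf in individual_pafs)


# Hypothetical individual PAFs for three risk factors.
individual_pafs = [0.5, 0.4, 0.3]

naive_sum = sum(individual_pafs)
combined = combined_paf_product(individual_pafs)

print(f"Sum of individual PAFs: {naive_sum:.2f}")
print(f"Product-formula PAF:    {combined:.2f}")
```

The product form can never exceed one, but that does not make it correct when the risk factors are associated, as they usually are.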

Using a generalisation of Miettinen’s formula (F5 in table 4) or model-based standardisation, we can calculate PAF for the joint effects of two or more risk factors. By cross-classification, one can transform several categorical risk factors into one exposure variable with more than two levels.27 32 As an example, suppose we are interested in the combined PAF of smoking (0=non-smoker, 1=smoker) and physical inactivity (0=active, 1=inactive) for incident type 2 diabetes. We can then cross-classify these two factors into one joint variable with four categories (0=non-smoker and active (baseline), 1=non-smoker and inactive, 2=smoker and active, 3=smoker and inactive).
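The cross-classification and the joint PAF can be sketched as follows. Because table 4 is not reproduced in this text, we assume that F5 is the commonly cited multi-category extension, the sum over exposure categories of pc_i × (RR_i − 1)/RR_i; the cases, the adjusted RRs and the function names are all hypothetical.

```python
from collections import Counter


def joint_category(smoker, inactive):
    """Cross-classify two binary factors into one four-level variable.

    0 = non-smoker and active (baseline), 1 = non-smoker and inactive,
    2 = smoker and active, 3 = smoker and inactive.
    """
    return 2 * smoker + inactive


def generalised_miettinen_paf(case_categories, adjusted_rr):
    """Sum over categories of pc_i * (RR_i - 1) / RR_i (assumed form of F5).

    case_categories: joint category of each case.
    adjusted_rr:     adjusted RR of each category versus baseline (baseline = 1).
    """
    counts = Counter(case_categories)
    n_cases = len(case_categories)
    return sum(
        (counts[category] / n_cases) * (rr - 1.0) / rr
        for category, rr in adjusted_rr.items()
    )


# Hypothetical cases as (smoker, inactive) pairs.
cases = [(0, 0)] * 40 + [(0, 1)] * 30 + [(1, 0)] * 10 + [(1, 1)] * 20
case_categories = [joint_category(s, i) for s, i in cases]

# Hypothetical adjusted RRs for each joint category versus baseline.
adjusted_rr = {0: 1.0, 1: 1.2, 2: 1.5, 3: 2.0}

joint_paf = generalised_miettinen_paf(case_categories, adjusted_rr)
print(f"Joint PAF: {joint_paf:.4f}")
```

Because the RR for category 3 is estimated directly rather than built from the two single-factor RRs, this approach does not require the factors to be independent or free of interaction.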

PAF when the counterfactual value of exposure is not zero

PAF is an impact measure of the reduction of the current exposure status to a counterfactual zero exposure. However, there are situations in which the effect of a partial reduction in exposure is of interest, or in which no intervention exists that can remove the exposure entirely.27 56 In such cases, the generalisation of PAF known as GIF is used. GIF is defined as the fractional reduction in cases that would result from reducing the current level of exposure in the population to some level of interest.27 PAF is thus a special case of GIF in which the counterfactual exposure is zero. For example, we may be interested in the impact of halving the prevalence of physical inactivity in the population on the risk of diabetes. GIF also covers other generalisations of PAF, such as time-varying treatments and dynamic interventions.57 GIF can be easily calculated using model-based standardisation.58

WHO advises using the potential impact fraction (PIF) when the counterfactual is not zero.59 The PIF is based on Levin’s formula60 and is identical in definition and interpretation to GIF.59 The PIF formula for a multi-categorical risk factor (F6 in table 4) has all the limitations of Levin’s formula; therefore, it is not recommended for calculating PAF.
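For readers who wish to see the arithmetic, the PIF can be sketched as follows. Because table 4 is not reproduced in this text, we assume that F6 is the commonly cited form, (Σ p_i RR_i − Σ p′_i RR_i)/Σ p_i RR_i, where p_i and p′_i are the current and counterfactual prevalences of each exposure category. The prevalences, the RR and the function name are hypothetical, and the example uses a crude RR, so it inherits the limitations noted above.

```python
def potential_impact_fraction(current, counterfactual, rrs):
    """PIF (assumed form of F6) for a categorical exposure.

    current:        current prevalence of each exposure category (sums to 1).
    counterfactual: prevalence of each category after the intervention (sums to 1).
    rrs:            RR of each category versus the reference (reference = 1).
    """
    baseline = sum(p * rr for p, rr in zip(current, rrs))
    shifted = sum(p * rr for p, rr in zip(counterfactual, rrs))
    return (baseline - shifted) / baseline


# Hypothetical example: categories are (active, inactive), with RR 1.5 for inactivity.
rrs = [1.0, 1.5]
current = [0.6, 0.4]

# Halving the prevalence of physical inactivity (0.4 -> 0.2).
pif_halved = potential_impact_fraction(current, [0.8, 0.2], rrs)

# Eliminating inactivity altogether recovers Levin's PAF as a special case.
pif_eliminated = potential_impact_fraction(current, [1.0, 0.0], rrs)

print(f"PIF, inactivity halved:     {pif_halved:.4f}")
print(f"PIF, inactivity eliminated: {pif_eliminated:.4f}")
```

Setting the counterfactual prevalence of inactivity to zero gives the same value as Levin’s formula with p=0.4 and RR=1.5, which illustrates why PAF is a special case of the impact fraction.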



  • Twitter @RUNSAFE_Rasmus

  • Contributors AK wrote the paper, and MAM and RON revised the paper. All authors approved the final version of the paper.

  • Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.

  • Competing interests None declared.

  • Patient consent for publication Not required.

  • Provenance and peer review Not commissioned; externally peer reviewed.