The dominance of East African distance runners and sprinters of West African origin invites discussion around the contribution of genetic and lifestyle factors to performance. In this review, we focus on the genetic basis for performance. Previous research associating candidate genes such as ACE and ACTN3 with endurance and sprint performance in Caucasian populations has not been replicated in African populations. This may be influenced by numerous factors, including small sample sizes, comparisons across different ethnic populations and problems identifying appropriate control groups. Conceptually, these failures reveal the complex polygenic nature of physiology and performance, and the erroneous application of a candidate gene approach to more genetically diverse African populations. We argue that research has in fact established a role for genes in performance, and that the frequency, rather than the mere presence, of favourable genetic variants within certain populations may account for the performance dominance in these populations.
A geographical concentration of unprecedented success
An enduring and fascinating question in exercise science is the dominance of specific population groups at the extreme ends of the competitive running spectrum. In sprint events, more specifically the 100 m, athletes born in the USA or Caribbean Islands have won 10 of the last 13 Olympic titles, every World title since 1983 and have recorded 74 of the top 100 times in history.1
At the opposite end of the spectrum, distance running events are dominated by runners of East African origin, particularly those from Kenya and Ethiopia. Statistics of their success are well known and reviewed elsewhere,2 but include the presence of 87 athletes of East African descent in the top 100 marathon performances of all time3 and every top ranked marathoner since 2003.4
The intrigue around the dominance of the East African nations in distance running is increased by the observation that a large majority of Kenya's most successful runners originate from a single tribe, the Kalenjin. This tribe, with a population of approximately 3.5 million, has won 75% of all Kenya's gold medals and a similar percentage of silver medals at major international running competitions.5 Further, almost half of Kenya's international runners (44%) come from a subtribe of the Kalenjin known as the Nandi, who comprise only about 3% of the total Kenyan population.6 They are followed by other Kalenjin subtribes, the Keiyo (16%) and Kipsigis (10%), and by the non-Kalenjin Kisii and Kikuyu tribes. The dominance of these athletes has been described as ‘the greatest geographical concentration of achievement in the annals of sport’.7
The underlying factors enabling this concentrated achievement have been the subject of considerable research. At one extreme are theories that West and East Africans are genetically predisposed to sprint and distance running, respectively; at the other are theories that downplay the role of genetic factors and instead propose a myriad of lifestyle-related factors in the pathway to world-class performance. These include altitude, habitual diet and cultural, training-related and socioeconomic factors, which have recently been reviewed extensively.8
We have recently described and contrasted two polarised models to explain sporting success. We termed these models the practice sufficiency and innate ability models, and they propose that sports success is solely the result of accumulated training or genetic factors, respectively.9 Arguably, such exclusive and polarised theories are too simplistic to account for the complexity of performance, and the available research clearly indicates that both training and innate factors are necessary for the attainment of elite performance. The ultimate conclusion is that the exposure of optimal genetic factors to the optimal training environment produces champion athletes.9
In the present review, we apply this integrated conclusion to discuss the contribution made by genes to sporting success, using East African runners as an illustrative example. We argue that our current understanding is limited by the complexity of the genetic determinants of a performance phenotype, as well as practical limitations facing the research. These include the focus on a limited number of identified candidate genes across different populations, small sample sizes and challenges identifying the appropriate control population. The result, based on the published literature, is a theory that is incomplete and may well be incorrect. Here, we propose alternative conclusions and theories based on this published research, and also suggest future research to better understand how genetic factors affect performance.
We chose East African runners because they have been widely studied, and physiological characteristics and candidate genes for endurance in these populations have been described extensively in the literature.10–12 Where possible, we illustrate both the support and the shortcomings of a genetic explanation for sporting performance by comparing and contrasting East African distance runners with sprinters from the Caribbean and West Africa, since these genetic comparisons have been made in the published literature.
Multifactorial explanation for performance
As it is clear that success is multifactorial and complex,8 the pursuit of a simple explanation for East African distance running success must be acknowledged as futile. Over four decades of running success have led to the development of a unique combination of environmental, lifestyle and cultural factors, which makes distinguishing between innate and trainable factors impossible.
Also, it is theoretically possible to downplay the relative contribution made by each lifestyle-related factor individually, because none is unique to the identified running regions of East Africa: poverty, altitude and/or favourable socioeconomic motivation for success exist in numerous countries and population groups that have never produced world-class athletes. This would suggest that it is not a single factor, but the inter-relationships between them that create an optimal environment for distance running success, including producing the first successful athletes who serve as a catalyst for future successes.
Wilber and Pitsiladis8 therefore correctly concluded that the East African running phenomenon is due to a combination of favourable somatotypical characteristics leading to exceptional biomechanical and metabolic economy/efficiency, chronic exposure to altitude in combination with moderate-volume, high-intensity training (live high+train high) and a strong psychological motivation to succeed athletically for the purpose of economic and social advancement.
Having reviewed the evidence for genetic traits (mitochondrial DNA, Y chromosome haplogroups and specific candidate genes such as ACE and α-actinin-3 (ACTN3)) in the East African population, Wilber and Pitsiladis8 described the unlikelihood that a single-gene polymorphism would result in the success of East African athletes. They concluded instead that ‘it is likely that elite athletes rely on the presence of a combination of advantageous genotypes’. While this is arguably the case, Wilber and Pitsiladis8 further emphasised their contention that ‘the East African running phenomenon is not a genetically mediated phenomenon’. It is this aspect that we aim to address here.
To begin with, we suggest that few would dispute that many of the physiological differences accounted for by the above conclusion, including the somatotype, biomechanical, metabolic and cardiovascular characteristics, have at least some genetic basis.9 For example, there is evidence that both the initial value and trainability of VO2max are approximately 50% heritable,13 and that the heritability of athlete status is approximately 66%.14
Using a genome-wide association study (GWAS), Bouchard et al15 identified 21 single nucleotide polymorphisms (SNPs), out of a panel of 325 000, that accounted for 49% of the response of VO2max to aerobic training. Individuals possessing 19 or more of a group of 21 SNPs could be classified as ‘high responders’ to training, achieving an increase in VO2max threefold greater than the ‘low responders’, individuals with nine or fewer of these SNPs. This suggests a very powerful role for genetic polymorphisms in physiology and, by extension, performance.9 ,15
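As a rough illustration of the classification described above, the following Python sketch sorts an individual by the number of favourable SNPs carried, using only the thresholds quoted in the text (19 or more, or nine or fewer, out of a panel of 21). The function name, the 'intermediate' label for counts of 10 to 18 and the input format are assumptions made for illustration; this is not the scoring procedure used by Bouchard et al.

```python
PANEL_SIZE = 21          # number of SNPs in the panel quoted in the text
HIGH_THRESHOLD = 19      # "19 or more" -> high responder
LOW_THRESHOLD = 9        # "nine or fewer" -> low responder


def classify_responder(favourable_snp_count: int) -> str:
    """Classify training response from a count of favourable SNPs.

    Hypothetical helper: the label for counts between the two quoted
    thresholds ('intermediate') is an assumption, not from the source.
    """
    if not 0 <= favourable_snp_count <= PANEL_SIZE:
        raise ValueError(f"count must be between 0 and {PANEL_SIZE}")
    if favourable_snp_count >= HIGH_THRESHOLD:
        return "high responder"
    if favourable_snp_count <= LOW_THRESHOLD:
        return "low responder"
    return "intermediate"


if __name__ == "__main__":
    for count in (5, 9, 14, 19, 21):
        print(count, classify_responder(count))
```

The point of the sketch is that the classification depends on how many favourable variants are carried together, not on any single one of them.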
We acknowledge that VO2max is by no means a key differentiator between Kenyan and European runners, having been shown in numerous studies to be similar between groups.16 ,17 However, this GWAS, the only one in the field of which we are aware, highlights that (1) physiological characteristics and hence performance are indeed strongly influenced by genes and (2) importantly, that numerous polymorphisms must be studied for a comprehensive picture of genotype–phenotype relationships to emerge, rather than single candidate genes, as has been acknowledged in the literature.8 While physiological factors such as running economy, neuromuscular, metabolic and biochemical functions have yet to be examined and quantified in this way, it seems reasonable to suggest that each of the factors identified as crucial has some genetic basis.
An alternative question, one that invokes more polarising viewpoints, is whether East African athletes possess an advantage over the rest of the world because they have unique genetic polymorphisms that are absent or extremely rare in other populations. As described, there is no direct evidence, as yet, that this is the case, leading to the contention that East African success is not genetically mediated.8 However, as we propose subsequently, this contention may be incomplete and premature, based on limited studies in small African populations, and with concerns for research design and the complexity of training and gene interactions. We begin by describing previous genetic research on East African runners.
Genetic studies on East African runners
To date, the major focus of genetic research on East Africans has been to identify specific genes previously associated with performance, and to determine whether different polymorphisms of these genes are found in athletically successful individuals compared with unsuccessful individuals from the same population. Specifically, two genes have been extensively studied for their association with athletic ability, namely the ACE and ACTN3 genes.
The first evidence of genetic polymorphisms influencing human physical performance was reported for the ACE gene.18 ,19 An insertion/deletion (ACE I/D, rs1799752) polymorphism, which has been estimated to explain up to 47% of the variance in circulating ACE levels,20 was associated with extremes in exercise performance in Caucasians, with the I allele (an insertion of 287 bp) occurring more frequently in endurance athletes, while the D allele was associated with short duration sprint performance.21
However, these associations have not been replicated in other ethnic groups. For example, a study of Kenyans compared international-level endurance runners to national level runners and control individuals who showed no running prowess, but were representative of the Kenyan population in their geographical distribution throughout Kenya.10 No differences in ACE I/D genotype were found between the groups.
This may be due to the relatively larger influence of other variants of the ACE gene on circulating ACE levels.22 For example, an A to G transition at nucleotide 22 982 (rs4363), in the sequence AF118569,23 or 31958,24 elicits the largest intergenotype differences in ACE levels in both Afro-Caribbean and European subjects.25 Although absolute linkage disequilibrium between I/D and A22982G has been shown for Caucasian populations, this is not the case in individuals of East African descent, where linkage disequilibrium between the two loci is 0.58.10 Therefore, this may explain why there is no association between I/D genotype and elite Kenyan athlete status.
A similar phenomenon is observed in world-class Jamaican sprinters. Scott et al26 found no differences between Jamaican sprint athletes and controls for either ACE I/D or A22982G. Given previous associations of the ACE D allele and, by inference, the A22982G allele with sprint performance, studies performed with homogeneous ethnic populations appear not to support a role for ACE genotype in elite sprint athlete performance.
The same appears true for the other major candidate gene, ACTN3. This gene encodes the protein ACTN3, which is almost exclusively expressed in the sarcomeres of fast glycolytic type II fibres that are responsible for the generation of rapid forceful contractions during activities such as sprinting and weightlifting.27 ,28 A genetic variation in the ACTN3 gene that results in the replacement of an arginine (R) with a stop codon (X) at amino acid 577 (R577X, rs1815739) can create two different versions of the ACTN3 gene, R and X. A strong association has been found between the ACTN3 R allele and sprinting performance in Caucasian populations.29
As is the case for ACE, this association is not present in African athletes, where no genotype frequency differences were found between elite Nigerian sprinters and controls.11 The same was recently observed between Jamaican or US sprint athletes and controls for ACTN3 genotype frequency.26 This variant within the ACTN3 gene is therefore not informative within these population groups.
Of interest, however, is the very low frequency of ACTN3 XX genotypes in the control populations: only 2% of control participants had the unfavourable XX variant, similar to the elite Jamaican athletes (3%). Furthermore, 75% of both the control population and elite athletes were found to possess the theoretically favourable RR genotype.26 This finding has two implications. First, given the small sample size of elite sprinters, and the similar prevalence of the RR genotype in the control population, the discovery of a statistically significant difference between sprinters and controls is unlikely.
Second, it invites an alternative explanation for the failure to distinguish between elite athletes and controls on the basis of a single candidate gene polymorphism. That is, if one accepts the previously published data showing the association between the RR genotype and sprint performance, then the exceptionally low prevalence of the XX genotype may suggest, at least with respect to this gene variant, that the control population of Jamaicans also has the favourable configuration to become sprinters, should they be exposed to the correct training and environmental factors.
This theory could be expanded further, to recognise that other, as yet unidentified polymorphisms may be present in the cases and absent in the controls (or vice-versa), but these have not yet been studied. As such, the selection of control participants becomes an important consideration when adopting a single candidate gene approach,26 a factor we explain in more detail below.
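The first implication above, that a small sample combined with a similar RR prevalence in controls makes a significant result unlikely, can be illustrated with a standard textbook sample-size approximation for comparing two proportions. This is a minimal sketch, not the analysis used in the cited studies: the 75% control RR frequency is taken from the text, whereas the 80% frequency in athletes, the 5% two-sided significance level and the 80% power are hypothetical values chosen purely for illustration.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_per_group(p_controls: float, p_athletes: float,
                alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate participants needed per group to detect a difference
    between two genotype proportions (normal approximation, two-sided)."""
    if p_controls == p_athletes:
        raise ValueError("the two proportions must differ")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # critical value for the test
    z_beta = NormalDist().inv_cdf(power)            # quantile for the desired power
    p_bar = (p_controls + p_athletes) / 2
    pooled_term = z_alpha * sqrt(2 * p_bar * (1 - p_bar))
    separate_term = z_beta * sqrt(p_controls * (1 - p_controls)
                                  + p_athletes * (1 - p_athletes))
    return ceil(((pooled_term + separate_term) / (p_controls - p_athletes)) ** 2)


if __name__ == "__main__":
    # 0.75 is the control RR frequency quoted in the text;
    # 0.80 for athletes is a hypothetical value, not a reported result.
    print(n_per_group(p_controls=0.75, p_athletes=0.80))
```

Under these assumed inputs, the required group size is on the order of a thousand participants per group, which is far beyond the number of world-class sprinters that any single population can supply.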
Conceptual and research limitations in East African genetic studies
This concept introduces some of the significant challenges facing research into genetic predictors of sporting success. A given phenotype, the sum of which is exercise performance, is the result of the complex interaction between multiple genetic variants, and their impact on phenotype likely differs between populations. The recognition of ethnic/population differences means that genetic comparisons between world-class athletes and controls have been limited to individuals from within the same ethnic group.10–12 ,30 This, however, introduces a second challenge, because if the advantage of a particular population group is related to the frequency of potentially favourable genotypes, rather than the presence of specific gene variants, then comparisons between individuals from the same population will not necessarily reveal these differences, particularly using the candidate gene approach of previous research studies.
For example, candidate gene studies in East and West African/Caribbean athletes have failed to find associations with performance levels, but could also be viewed as having provided evidence of an increased prevalence of the favourable candidate gene polymorphisms across the entire population. Notwithstanding the previously mentioned concerns around comparisons between different populations, consider that in Australian Caucasian populations, Spanish populations of European origin, and Japanese and Javanese populations in Asia, the frequency of the X allele of ACTN3 is between 44% and 54%, resulting in an XX genotype in between 18% and 25% of the population.31 This can be compared with the aforementioned prevalence of only 2% XX in the Jamaican control population, with three in four Jamaicans, regardless of current performance level, having the sprint-favourable RR configuration.26 This could be interpreted to point to the prevalence of a polymorphism that (1) predisposes the majority of individuals in the population to becoming a sprinter and (2) eliminates the possibility of finding an association between the polymorphism and elite athlete status in these populations.
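The relationship between allele frequency and genotype frequency used in this comparison can be sketched with the Hardy-Weinberg expectation, in which an X allele frequency q gives an expected XX genotype frequency of q squared. Hardy-Weinberg equilibrium is an assumption introduced here for illustration and is not claimed by the cited studies, so the expected values need not match the observed figures exactly; the function names are hypothetical.

```python
from math import sqrt


def hwe_genotypes(x_allele_freq: float) -> dict:
    """Expected ACTN3 genotype frequencies for a given X allele frequency,
    assuming Hardy-Weinberg equilibrium (an idealisation)."""
    if not 0.0 <= x_allele_freq <= 1.0:
        raise ValueError("allele frequency must lie between 0 and 1")
    q = x_allele_freq        # X allele
    p = 1.0 - q              # R allele
    return {"RR": p * p, "RX": 2 * p * q, "XX": q * q}


def x_allele_from_xx(xx_genotype_freq: float) -> float:
    """Back-calculate the X allele frequency from the XX genotype frequency,
    again assuming Hardy-Weinberg equilibrium."""
    if not 0.0 <= xx_genotype_freq <= 1.0:
        raise ValueError("genotype frequency must lie between 0 and 1")
    return sqrt(xx_genotype_freq)


if __name__ == "__main__":
    # X allele frequencies quoted in the text for non-African populations
    for q in (0.44, 0.54):
        print(f"q = {q:.2f}: expected XX = {hwe_genotypes(q)['XX']:.3f}")

    # XX frequency of about 2% reported for Jamaican controls earlier in the text
    q_jamaica = x_allele_from_xx(0.02)
    print(f"implied q = {q_jamaica:.3f}, "
          f"expected RR = {hwe_genotypes(q_jamaica)['RR']:.3f}")
```

Under this assumption, X allele frequencies of 0.44 and 0.54 imply XX frequencies of about 19% and 29%, close to but not identical with the observed range quoted above. An XX frequency of 2% implies an X allele frequency of about 0.14 and an expected RR frequency of about 74%, in line with the roughly three in four RR carriers reported for the Jamaican population.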
At the very least, it must be acknowledged that in these case–control association studies, the control individuals, who are genetically similar to the elite athletes for studied candidate genes, may be potential cases should they be exposed to the same training as the elite athletes. Stated differently, the absence of differences between elite sprinters and non-performing controls does not disprove or minimise the role of genes, but may instead indicate that a very large portion of the population has a genetic predisposition for sprinting success compared with other populations. The dual challenge of ethnic differences in genotype–phenotype interactions and valid comparisons to a control group creates a ‘catch-22’ for conclusions on the role of genes in performance.
The selection of appropriate control cases is thus a significant factor when only a single gene variant is considered. A critical requirement of comparing groups to identify the prevalence of certain candidate gene polymorphisms likely to play a role in performance is that the experimental group (in this case, elite distance runners or sprinters) must be indistinguishable from controls (non-elite runners) in all respects, with the exception of the identified condition, namely performance level. That is, the control population must perform the same training, in the same environmental conditions, as the experimental population, but fail to achieve the outcome, world-class running ability. This is arguably impossible to do, given the myriad of factors that interact to produce a champion athlete.
The importance of the selection of case controls is illustrated by the example of Paul Tergat, an elite Kenyan runner from the Nandi subtribe who held the marathon world record from 2003 to 2007. Tergat began running aged 19, but was an elite athlete by 21 years of age (David Epstein, personal communication). A genetic comparison between elite and non-elite runners would have placed Tergat in its experimental group only 2 years after it would have placed him in its control group, simply as a result of differences in exposure to training. There are numerous other examples of Kenyan athletes who have achieved world-class performances shortly after starting training. They too would be classified as controls only a year before becoming cases.
This introduces the hypothesis that it may be the frequency of favourable genotype combinations for performance within these groups that explains their domain-specific dominance. Similarly, diet may be a significant confounder, affecting the performance levels of athletes who are training similarly. Any failures to find genetic differences within a given ethnic group must thus be interpreted in the context that training- and lifestyle-related factors cannot possibly be controlled for.
For this reason, gene comparisons between different populations may be more informative, notwithstanding the complexity introduced when comparing different ethnic groups, as described previously. Yang et al11 have compared East African (Kenya and Ethiopia) to West African (Nigeria) athletes and found that the frequency of the X allele in the ACTN3 gene was similar between Kenyan and Nigerian populations, but substantially lower than in any non-African population sampled to date, and similar to that seen previously in a South African Bantu-speaking cohort.27 On the other hand, they reported that the X allele frequency seen in Ethiopian runners is substantially higher, probably reflecting population admixture with non-African groups.11
Yang et al11 explained the similar genotype between the East African and West African individuals as the result of the effect of other genetic influences or environmental factors on muscle performance, rather than a simple relationship between the ACTN3 genotype and running ability. For example, many East Africans are subject to environmental influences, such as living and training at altitude and high levels of incidental running during childhood, which play major roles in shaping athletes from this region.5 ,32 These factors differ substantially from the childhood experiences of most potential non-African athletes, and, in this study, West African athletes. Such factors may reduce the impact of ACTN3 phenotypes on the muscle performance of East African individuals. This is an illustration of how a candidate gene approach can be significantly complicated by the interaction between environmental factors and genes, and is not new. It has recently been proposed that early life factors, together with the appropriate genotypes, may induce biological changes that allow, for example, for a more robust biological response to training in later life.33
Another significant problem is the combination of small sample sizes and the narrow focus on single gene variants to explain performance. Given that over 200 genetic variants have been shown to contribute to variations in physical fitness and performance,34 it is not surprising that the candidate gene approach applied to only two genes thus far is limited. The likelihood of an as yet unidentified polymorphism in the East African runners also cannot be ruled out, given that the African population is known to be more genetically diverse than the previously studied Caucasian populations.35 Indeed, Wilber and Pitsiladis8 recognised that ‘there will be many interacting genes involved in elite running performance, and hence it is timely that genetic research has moved to the genomics era’, which will involve the simultaneous testing of multiple genes.
Conclusion and future research directions
Given the aforementioned research, with its recognised limitations, we conclude the following regarding the role of genes in the performance of East African distance runners.
First, previous failures to conclusively identify genetic differences between elite and non-elite individuals from within the same ethnic group may be the result of case–control comparisons that have not controlled for environmental factors. These failures confirm the essential roles of environment and training,9 but do not disprove the theory that Jamaican or East African runners are predisposed to sprint or endurance running success, respectively.
Second, candidate gene studies have found no difference in the prevalence of favourable polymorphisms in ACE and ACTN3 between cases and controls, but this may indicate a wider distribution of the favourable polymorphisms rather than the lack of their effect. Alternatively, it must be acknowledged that other as yet unidentified polymorphisms, possibly unique to the genetically diverse African population, may mediate any performance benefits, and the candidate genes previously associated with performance in Caucasian populations have little relevance to the African population. Given that only two candidate genes have been studied, in small populations with concerns around case–control matching, neither of these possibilities can be discounted.
Third, we acknowledge the complexity of genetic research on complex physiological and performance phenomena. We would, however, suggest that it is premature and incorrect to conclude that East African running success is not genetically mediated, for even based on the existing research studies with their recognised limitations, an alternative conclusion is possible. That is, these populations may contain a higher frequency of favourable genotype combinations for performance. Future studies, using a genome-wide approach, need to expand the focus to examine not only more polymorphisms, but also the proposed higher frequency of potentially favourable gene variants in specific groups.
Fourth, we do not disagree with a previous suggestion that future studies will identify ‘performance genes essential for world-class performances such as those not only typical of the East African runners, but also possessed by world-class distance runners worldwide’. Based on the previous literature, it appears likely that every population group will possess individuals who have the genetic make-up to become elite athletes. However, even this statement is premature, since (1) the sample sizes studied have been very small by comparison with typical genome-wide studies and (2) GWAS have not been applied to these populations, which are more genetically diverse, and there may well be unidentified polymorphisms, unique to certain populations that have beneficial effects for performance.
Fifth, we would direct future research into identifying the frequency of these markers, with the hypothesis that the favourable performance genotype will occur more frequently in certain populations, predisposing a greater number of individuals from these populations to elite sports performance. Upon exposure to the optimal environment, which arguably exists more in East Africa than elsewhere, this population will achieve disproportionately more performance success.
The corollary is that individuals from outside a given ethnic group will be less likely to achieve world-class performance, and only genetic comparisons between different groups would elucidate this finding. However, these comparisons have not been done, given the numerous complex interactions between gene variants and their functional significance. Ultimately, the approach of comparing individuals distinct only in performance levels when environment cannot be controlled for is unable to fully answer the question of whether a genetic basis for performance exists within a narrowly defined ethnic group.
Summary of article
To date, no studies have conclusively shown a genetic basis for the dominance of East African distance runners or sprinters of West African origin.
This is largely due to limitations in the research method and capability to date, given small sample sizes and a focus on candidate gene polymorphisms.
Ethnic differences between African and European populations confound this approach significantly.
It is premature to argue that performance is not genetically mediated.
Future studies, using genome-wide association study methods, identifying a panel of single nucleotide polymorphisms known to affect physiology and performance, may further reveal the basis for advantage.
Support for this research is provided by the Medical Research Council of South Africa, National Research Foundation and Discovery Health. The authors would like to acknowledge David Epstein for conceptual contributions to the paper.
Contributors All three authors were equally involved in the conceptualisation, writing and editing of the manuscript. All provided substantial contributions to conception and design, as well as drafting the article and revising it critically for important intellectual content. All provided final approval of the version to be published.
Competing interests None.
Provenance and peer review Commissioned; internally peer reviewed.
▸ References to this paper are available online at http://bjsm.bmj.com