Reproducibility of computer based neuropsychological testing among Norwegian elite football players
  1. T M Straume-Naesheim1,
  2. T E Andersen1,
  3. R Bahr2
  1. 1Oslo Sports Trauma and Research Center, Oslo, Norway
  2. 2Norwegian University of Sport and Physical Education, Oslo, Norway
  1. Correspondence to:
 T M Straume-Naesheim
 Oslo Sports Trauma and Research Center, Oslo, Norway;


Background: Head injuries account for 4–22% of all football injuries. The rate of brain injuries is difficult to assess, due to the problem of defining and grading concussion. Thus computerised testing programs for cognitive function have been developed.

Objective: To assess the reliability of a computerised neuropsychological test battery (CogSport) among Norwegian professional football players.

Methods: Norwegian professional football league players (90.3% participation) performed two consecutive baseline CogSport tests before the 2004 season. CogSport consists of seven different subtasks: simple reaction time (SRT), choice reaction time (ChRT), congruent reaction time (CgRT), monitoring (MON), one-back (OBK), matching (Match) and learning (Learn).

Results: There was a small but significant improvement from repeated testing for the reaction time measurements of all seven subtasks (SRT: 0.7%, ChRT: 0.4%, CgRT: 1.2%, MON: 1.3%, OBK: 2.7%, Match: 2.0%, Learn: 1.1%). The coefficient of variation (CV) ranged from 1.0% to 2.7%; corresponding intraclass correlation coefficients ranged from 0.45 (0.34 to 0.55) to 0.79 (0.74 to 0.84). The standard deviation data showed higher CVs, ranging from 3.7% (Learn) to 14.2% (SRT). Thus, the variance decreased with increasing complexity of the task. The accuracy data displayed uniformly high CV (10.4–12.2) and corresponding low intraclass correlation coefficient (0.14 (0.01 to 0.26) to 0.31 (0.19 to 0.42)).

Conclusion: The reproducibility for the mean reaction time measures was excellent, but less good for measures of accuracy and consistency. Consecutive testing revealed a slight learning effect from test 1 to test 2, and double baseline testing is recommended to minimise this effect.

  • football
  • soccer
  • neuropsychology
  • reproducibility
  • CogSport

Football is the only contact sport that exposes a large number of participants to purposeful use of the head for controlling and advancing the ball.1 Based on a series of cross-sectional studies using neurological examinations, neuropsychological tests, computed tomography scans and electroencephalographic examinations on active and older retired Norwegian football players, Tysvaer2 postulated that heading the ball could lead to chronic brain injuries as seen in boxing. Since then, several cross-sectional studies have indicated that football can cause sustained measurable brain impairment,3–6 although not all studies have reported such a relation.7,8

Head injuries account for 4–22% of all football injuries2 with a reported incidence during matches of 1.7 injuries per 1000 player hours.9 However, this figure incorporates all types of head injury, including facial fractures, concussions, lacerations, and eye injuries. The incidence of concussion has been estimated to be 0.5 injuries per 1000 match hours9 but is difficult to assess, due to the problem of defining and grading concussions.1,10 When using the traditional diagnostic criteria for concussions, which require loss of consciousness or amnesia, only a fraction of these are recognised as concussions. Trauma to the neck and/or head that is sufficient to cause facial fractures or lacerations, will potentially also cause damage to the brain, although this is easily overlooked because of the more visible injuries. Although most athletes with head injuries recover uneventfully following a single episode of concussion, repetitive mild head trauma may be implicated in the development of cumulative cognitive deterioration.1 Accurate monitoring of symptom resolution and cognitive recovery is therefore important to ensure the athlete’s safety and indicate whether the player should return to play or not.

The change of paradigm in the diagnosis and management of concussion has evoked the need for new diagnostic instruments within sports related head injuries. One item in such tests is deterioration in cognitive test performance.11 In the sports arena, changes in cognition following a concussion injury are conventionally determined by administering a battery of neuropsychological tests during the pre-season to establish a baseline for comparison after an injury. In studies using such a design, any changes from baseline are considered to be a consequence of the concussion injury.

In the past decade, computerised cognitive function testing programs have been developed—for example, CogSport (CogState Ltd, Melbourne, Australia), ImPACT (ImPACT Inc., Pittsburgh, PA), ANAM (Automated Neuropsychological Assessment Metrics; developed by the US Department of Defense), CRI by HEADMINDER (concussion resolution index; Headminder Inc., New York). The conventional paper and pencil tests were designed primarily for assessment of cognitive dysfunction caused by neuronal or psychiatric disorders and not for the assessment of mild changes in cognitive function over time.12 Therefore, these tests often have poor psychometric properties for serial studies including a limited range of possible scores, floor and ceiling effect(s), learning effects, and poor test–retest reliability.13,14 Computerised testing using infinitely variable test paradigms may overcome these concerns.15

Makdissi et al16 compared the sensitivity of the CogSport test and conventional paper and pencil tests to detect cognitive changes following mild concussion in a cohort of elite players from the Australian Football League by comparing baseline tests with post-injury tests. Their data suggested that computerised tests may be particularly sensitive to the cognitive consequences of sports related concussions, and also that conventional neuropsychological tests do not show this sensitivity in athletes with mild concussion. Similar findings have recently been reported in studies on high school athletes with head injuries using ImPACT.17–19 Computer based cognitive tests have many advantages over paper and pencil tests that may allow them to detect subtle impairments such as those expected to occur in mildly concussed athletes.20 In general, repeated tests of healthy adults in different age groups have shown that computer based tests are reliable20,21; although there is a learning effect between test 1 and 2, this effect seems to decrease after the first two tests.22

The test properties of the CogSport test, a computer based neuropsychological evaluation tool widely used in football concussion management, have not been assessed by independent researchers, nor has it been examined among elite athletes. Therefore, the objective of this study was to evaluate the CogSport test by investigating the reproducibility of two consecutive baseline tests in a cohort of elite football players.


We invited all 14 clubs of the Norwegian professional male football league (Tippeligaen) with their A-squad contract players (about 300) to participate in the study; 289 players (96.3%) agreed to take part. Written informed consent was obtained from all participants and the project was approved by the Regional Ethics Committee for Southern Norway.

Neuropsychological testing

The teams meet for the La Manga Cup and pre-season training camp in February/March every year at the Norwegian Football Association training centre in La Manga, Spain. We conducted testing among 13 of the 14 teams of Tippeligaen at La Manga prior to the 2004 season in a test lab set up in the residential complex, Los Lomas II. Trained personnel administered and supervised the neuropsychological testing, and the tests were completed by the players in groups of three in the same quiet room to allow efficient data collection. The last team was tested at its home field in Norway two weeks later under similar standardised conditions. There is no time difference between Spain and Norway and the testing was performed at the same time of day with the same person instructing and supervising the test for each team.

We used the computer based neuropsychological test CogSport (versions 2.2.0 and 2.2.1). Norwegian speaking players were tested with the Norwegian language version of the test, where instructions for each subtask were in Norwegian, and the rest of the players used the English language version. The test has been described in detail elsewhere.13,23,24 The stimulus for all tasks consists of playing cards and responses are given using the keyboard. The d key indicates “no” and the k key “yes” (vice versa for left handed players). These are the only keys used throughout the whole test.

The CogSport test battery includes seven subtasks testing different cognitive brain functions (table 1). All subtasks include between 15 and 40 trials, and the data are reported by the CogSport program as mean reaction times with corresponding standard deviations for all subtasks, accompanied by accuracy data for all tasks except simple reaction time and monitoring. Anticipatory responses (reaction times <100 ms) and abnormally slow responses (reaction times >3500 ms) are recorded as errors and excluded from the analyses. Accuracy data are calculated as the number of true positive responses divided by the number of trials.
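The scoring rules described above can be sketched in a few lines of Python. This is an illustration only, not the CogSport implementation: the function name, the input layout, and the decision to average all in-range responses (rather than, say, only correct ones) are assumptions made for the example.

```python
from statistics import mean, stdev

MIN_RT_MS = 100    # responses faster than this count as anticipatory
MAX_RT_MS = 3500   # responses slower than this count as abnormally slow


def summarise_subtask(trials):
    """Summarise one subtask (hypothetical helper).

    trials: list of (reaction_time_ms, is_true_positive) pairs, one per trial.
    Returns (mean_rt_ms, sd_rt_ms, accuracy).
    """
    # Out-of-range responses are errors and are left out of the timing data.
    in_range = [(rt, ok) for rt, ok in trials if MIN_RT_MS <= rt <= MAX_RT_MS]
    rts = [rt for rt, _ in in_range]
    # Accuracy: true positive responses divided by the number of trials.
    true_positives = sum(1 for _, ok in in_range if ok)
    accuracy = true_positives / len(trials)
    return mean(rts), stdev(rts), accuracy


trials = [(250, True), (90, True), (300, False), (4000, True), (350, True)]
mean_rt, sd_rt, accuracy = summarise_subtask(trials)
```

In this made-up example the 90 ms and 4000 ms responses are dropped from the timing data but still count in the denominator of the accuracy score.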

Table 1

 Description of the seven CogSport subtasks and their assumed corresponding cognitive function

The computer program sends a test report by e-mail to the test supervisor with basic analyses of the result. In addition, the test report includes an estimate of whether the player’s performance meets the minimum requirements of the test with regard to alertness throughout the test and the plausibility of whether they understood the instructions or not. This built-in decision is based on the variability of performance on simple reaction time and a threshold value for accuracy on the three final tasks. If a player has more than 40 incorrect responses on one task, the test is stopped.

Data analysis

Reliability and correlation studies of CogSport on young adults recommend the mean reaction time for all seven subtasks and accuracy data from the three final tasks (one-back, matching, learning) as the main outcome measures.20 Our data analysis therefore focused on these 10 measures. Before all calculations, the mean reaction times and standard deviation data were log10 transformed and the accuracy data were arcsine transformed to obtain a more normal distribution.20
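A minimal sketch of the two transformations is given here for orientation. The text states only that accuracy was "arcsine transformed"; the example assumes the common arcsine square root form applied to a proportion between 0 and 1, which may differ from what was actually used. Function names are hypothetical.

```python
import math


def log10_transform(reaction_time_ms):
    """Log10 transform of a reaction time (or SD) given in milliseconds."""
    return math.log10(reaction_time_ms)


def arcsine_transform(proportion_correct):
    """Arcsine square root transform of a proportion in [0, 1].

    Assumption: the arcsine square root variant; the source does not
    specify which arcsine transform was applied.
    """
    return math.asin(math.sqrt(proportion_correct))


log_rt = log10_transform(1000)        # a 1000 ms reaction time
acc = arcsine_transform(0.25)         # 25% correct responses
```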

Reproducibility analyses were performed using the method error (ME), calculated as the standard deviation (SD) of the differences between test 1 and test 2 divided by the square root of the number of tests performed (ME = SDdiff/√2).25 From the ME we calculated the coefficient of variation (CV), which quantifies the variation between each measurement as a percentage of the joint mean: CV = 100 × ME/[(X1mean + X2mean)/2]. These calculations were done for all outcome measures supplied by the test. We also calculated the intraclass correlation coefficient for the same measures. The intraclass correlation coefficient is defined as the ratio of the “true” variance, or the variance between subjects (S²b), relative to the total variance, given by the sum of the variance between subjects and the variance within subjects (S²w).26 The intraclass correlation coefficient ranges from 0 to 1, and from the equation in its simplest form (S²b/(S²b + S²w)), we see that when the variation within the subjects (that is, a player’s test score on two consecutive tests) moves towards 0, the intraclass correlation coefficient approaches 1, indicating good reproducibility.
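The method error and coefficient of variation defined above can be computed as follows. This is a sketch with made-up numbers, not the study data; the function names are hypothetical, and the CV is expressed as a percentage of the mean of the two test means.

```python
import math
from statistics import mean, stdev


def method_error(test1, test2):
    """ME = SD of the paired differences divided by sqrt(2)."""
    diffs = [b - a for a, b in zip(test1, test2)]
    return stdev(diffs) / math.sqrt(2)


def coefficient_of_variation(test1, test2):
    """CV (%) = 100 * ME / mean of the two test means."""
    joint_mean = (mean(test1) + mean(test2)) / 2
    return 100 * method_error(test1, test2) / joint_mean


# Two players, two tests each (illustrative values only).
test1 = [10.0, 10.0]
test2 = [10.0, 12.0]
me = method_error(test1, test2)
cv = coefficient_of_variation(test1, test2)
```

A constant shift between the tests (every player improving by the same amount) leaves the SD of the differences at zero, so ME and CV capture random test-retest variation rather than a systematic learning effect.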

We used SPSS version 11 for the statistical analyses and its two way random single measure model for calculating the intraclass correlation coefficients. Paired Student’s t tests were used to investigate significant differences and any directional trends between the groups.
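For readers without SPSS, a two way random, single measure intraclass correlation coefficient can be sketched from the ANOVA mean squares. The example below assumes the absolute agreement form (often written ICC(2,1)); the text does not state whether absolute agreement or consistency was selected, so this is an illustration rather than a reproduction of the authors' calculation.

```python
def icc_two_way_random_single(scores):
    """ICC(2,1), absolute agreement (assumed variant).

    scores: one row per subject, one column per test session.
    """
    n = len(scores)          # subjects
    k = len(scores[0])       # sessions (2 in a test-retest design)
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)                # between subjects
    ms_cols = ss_cols / (k - 1)                # between sessions
    ms_error = ss_error / ((n - 1) * (k - 1))  # residual

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


identical = icc_two_way_random_single([[1, 1], [2, 2], [3, 3]])
shifted = icc_two_way_random_single([[1, 2], [2, 3], [3, 4]])
```

In the second example every subject changes by exactly one unit between sessions; the absolute agreement form penalises this systematic shift, whereas a consistency form would not.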



Of the 289 players (96.3%) who agreed to participate in the study, 18 did not report for testing, leaving us with 271 (90.3%) players who underwent two consecutive neuropsychological tests. However, due to technical problems with some tests (unrelated to test performance), the number of players with dual tests decreased to 247. In addition, 15 tests did not fulfil the minimum requirements set by the computer program and therefore could not be included in the analyses. Thus, a total of 232 players (83% Norwegians, 8% Scandinavians (with no problems in understanding Norwegian), and 9% from other countries (mainly European)) were included in the study. The mean (SD) age of the investigated group was 25.7 (4.6) years (range 17–35); 87.5% were right and 12.5% left handed; 62.9% had completed secondary education (that is, high school), and 36.6% had a tertiary level of education (that is, college or beyond). The demographic characteristics of the excluded group did not differ significantly in any way from the included group.


There was a significant improvement in the CogSport subtasks from test 1 to test 2, ranging between 0.4% and 2.7% for the log10 transformed reaction time measures (table 2, fig 1). The improvement in reaction time was slightly higher for the more complex tasks compared with the simpler ones (table 2). The accuracy data for the more complex subtasks also indicated a better performance (higher percentage of correct responses) in test 2 for one-back and learning, but not for matching (table 2, fig 2).

Table 2

 Comparison between the results from test 1 and test 2 for the main CogSport outcome measures

Figure 1

 Reproducibility of mean reaction time (log10, ms) for five CogSport subtasks; test 1 plotted against test 2 (n = 232). The hatched line is the identity line (x = y). Regression lines (dotted) have been added to illustrate whether there were systematic differences between test 1 and test 2. The subtasks are arranged vertically and from left to right according to their complexity from top left (easiest) to bottom right (most difficult).

Figure 2

 Reproducibility of accuracy (arcsine of % correct responses) for three CogSport subtasks; test 1 plotted against test 2 (n = 232). See fig 1 for further details.

The reproducibility tests resulted in a CV ranging from 1.0% to 2.7% for the reaction time measures (table 3). A closer look at fig 1 reveals higher variability for subjects with slower reaction times, and Bland–Altman plots (not shown) were used to examine this phenomenon more closely. They uniformly indicated a somewhat increasing difference in favour of test 2 with increasing reaction time. Thus, a poor performance on test 1 indicated a larger improvement on test 2. The intraclass correlation coefficients were also generally high for the reaction time measurements. All but one task, monitoring (0.45 (0.34 to 0.55)), resulted in intraclass correlation coefficients above 0.65 (up to 0.79 for the most complex task, learning), thus indicating good reproducibility (table 3).

Table 3

 Reproducibility reported as the coefficient of variation and the intraclass correlation coefficient between test 1 and test 2 for the seven CogSport subtasks

The accuracy data for the three more complex tasks, one-back, matching, and learning, showed poorer reproducibility. The CV ranged from 10.4% to 12.4% and the intraclass correlation coefficient from 0.31 to 0.14 (table 3). Additionally, as indicated in fig 2, one-back and matching tasks suffered from a ceiling effect with many participants managing 100% correct responses.

Measures of consistency, as given by the standard deviations of the mean reaction times for each subtask, were subject to greater variability than the mean result and inversely related to the complexity of the task. The CV for the standard deviation ranged from 14.2% for simple reaction time to 3.7% for learning, the most complex task (table 3). In the same way, the corresponding intraclass correlation coefficients increased with increasing task complexity, ranging from 0.12 for simple reaction time to 0.61 for learning (table 3).


This is the first study to examine the test properties of a computer based neuropsychological test battery performed by an independent research group. The main finding was that the day to day reproducibility for the mean reaction time measures was excellent in a large cohort of professional football players, but that the accuracy and consistency measures were less reliable. We also observed a slight learning effect from the first to the second test. Thus our results are in accordance with those of recent studies examining the reliability of computerised neuropsychological tests among healthy young adults and elderly people.20 Collie et al assessed the reliability of CogSport by serial testing, at one hour and one week intervals, of 60 young volunteers recruited through advertisements around university campuses in Melbourne, Australia.20 Elite athletes are select individuals, who may differ from this group in many different ways, including background characteristics such as education level and socioeconomic status. However, even more important is that superior neurocognitive skills may be one of the selection criteria to become an elite footballer. In fact, a closer look at the reaction time data of Collie et al’s 60 volunteers reveals that they were considerably slower than the footballers on all subtasks. The reproducibility of the CogSport test on elite athletes has not been thoroughly investigated before. The apparent difference between regular controls and elite athletes illustrates the need to develop appropriate reference data in populations of elite athletes, and supports the practice of individual baseline testing in the elite as a basis for the management of concussion.

What is already known on this topic

  • Computerised neuropsychological testing programs have been proved to be sensitive and reliable in the evaluation of cognitive function after concussions in sport

  • Dual baseline testing is recommended to minimise learning effects

The CV ranged from 1.0% to 2.7% for the mean reaction time measures and all values under 5% must be considered as good. Collie et al,20 in their study on 60 healthy non-athletic young volunteers, reported intraclass correlation coefficients for the reaction time measurement higher than 0.69 for all of the four tested subtasks. Except for simple reaction time, the results were similar when comparing the test–retest results with both the one hour and the one week interval between the tests. The intraclass correlation coefficients for the mean reaction time measures from our material were within the same range. Reaction time measures have been shown to provide the most sensitive index of cognitive changes following a head injury,27 which in part is due to the fact that they are highly reproducible, as indicated in both our study and previous studies on other study populations.16,20,22,28 In contrast, of the consistency measures only the standard deviations for the most complex tasks (matching and learning) were within this limit. Although there was a uniform trend of less variation on test 2, the reproducibility data imply that these measures are unlikely to be helpful for follow up evaluations. The simpler tasks were the least consistent and one may speculate that the lack of complexity in these tasks causes the player to lose focus during the task. In the CogSport testing program, simple reaction time testing is repeated three times during the session, which may exaggerate this effect.

In our study, the accuracy data showed inadequate reproducibility and the highest improvement from test 1 to test 2. The ceiling effect found on both one-back and matching may also make these less suitable as outcome measures, even with a dual baseline setting. Previous analyses using this computerised battery have shown ceiling effects for all accuracy data except matching and learning,29 but our results indicate that this is also the case for matching.

It should be noted that intraclass correlation coefficients must be interpreted with caution. From the simplified equation for the intraclass correlation coefficient (S²b/(S²b + S²w)) it is evident that data of a homogeneous group (that is, where the between-subjects variability (S²b) is small compared with the within-subjects variability (S²w)) will produce a poorer intraclass correlation coefficient than data of a heterogeneous group (that is, with high between-subject variability with respect to the within-subject variability), even if within-subject variability is exactly the same for the two groups. It is therefore recommended not to compare directly the intraclass correlation coefficients from different study populations without knowing the variance within the tested groups.30 We have therefore, as recommended,25 also presented the test–retest coefficients of variation, which are independent of test result range and therefore can be compared directly between studies. It should be noted that, compared with the performance data reported by Collie et al,20 our footballers displayed both faster mean reaction times and a more homogeneous performance. When this is taken into consideration, a comparison of the test–retest intraclass correlation coefficients with Collie et al indicates that the reproducibility of the mean reaction time measures may be even better among elite footballers than non-athletic controls. In a one year follow up of 84 elite Australian Rules footballers, the test–retest coefficients of variation were not reported.31
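The dependence of the intraclass correlation coefficient on group heterogeneity follows directly from the simplified equation, as the short numerical sketch below shows. The variance values are invented for illustration and are not taken from either study.

```python
def simplified_icc(between_subject_var, within_subject_var):
    """Simplified ICC: between-subject variance over total variance."""
    return between_subject_var / (between_subject_var + within_subject_var)


WITHIN = 1.0  # identical within-subject (test-retest) variance in both groups

homogeneous = simplified_icc(between_subject_var=1.0, within_subject_var=WITHIN)
heterogeneous = simplified_icc(between_subject_var=9.0, within_subject_var=WITHIN)
```

With the same within-subject variance, the homogeneous group yields a coefficient of 0.5 and the heterogeneous group 0.9, even though the measurement is equally repeatable in both.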

What this study adds

  • The computerised test battery (CogSport) showed excellent reproducibility in a large cohort of professional Norwegian football players using a translated version of the test

  • The reaction time measures proved to be the most reliable for all subtasks tested, and these are therefore recommended as primary outcome measures

In our group of professional football players, there was a significant improvement from test 1 to test 2 for the mean reaction time measures on all subtasks of CogSport. Collie et al found a similar practice effect when a group of elderly volunteers (mean age 64 (8) years) performed four consecutive CogSport™ tests in three hours.22 Whereas our professional football players tended to display a more pronounced practice effect when the tasks became more complicated, Collie et al’s elderly volunteers showed an opposite trend. More relevant is a comparison with elite Australian Rules footballers, and, as mentioned above, 84 of these were tested after an injury-free season (the exact timeframe was not stated) without displaying any significant differences in performance since baseline for any of the subtasks of CogSport (for the final two tasks, matching and learning, accuracy data were presented instead of mean reaction times). A practice test was conducted before the baseline test, but it is not clear if this was done for the follow-up as well (either in full or shortened).

Since we performed only two tests, we are not in a position to say whether the practice effect will decrease with further testing. However, Falleti et al followed 26 young volunteers who performed four different baseline CogSport tests on three different days, where the first two tests were performed on the same day with a two hour break in between, and the first was discarded. In the three remaining baseline tests (time intervals not stated) there were no differences in performance on reaction time or accuracy measures.23

Another aspect of the mean reaction time measurements, which became evident on Bland–Altman plots, was that the improvement from test 1 to 2 was not evenly distributed. The players with the slowest mean reaction times improved the most, and on some subtasks those with the fastest mean reaction times were actually slower on test 2. Such regression towards the mean has also been described by Erlanger on simple and choice reaction time measures from a similar computerised neuropsychological test package from HEADMINDER.32
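The quantities behind a Bland–Altman plot can be sketched as follows: the bias (mean difference), the 95% limits of agreement, and the slope of the differences against the pair means as a rough check for the uneven improvement described above. The data are invented, the function name is hypothetical, and the slope check is one possible way of quantifying the trend, not necessarily what was done in the study.

```python
from statistics import mean, stdev


def bland_altman(test1, test2):
    """Return (bias, (lower_limit, upper_limit), slope) for paired tests.

    Differences are test 2 minus test 1, so a negative value means a
    faster reaction time on the second test.
    """
    diffs = [b - a for a, b in zip(test1, test2)]
    means = [(a + b) / 2 for a, b in zip(test1, test2)]
    bias = mean(diffs)
    sd = stdev(diffs)
    limits = (bias - 1.96 * sd, bias + 1.96 * sd)
    # Least-squares slope of difference on pair mean; a negative slope
    # means slower players improved more on test 2.
    centre = mean(means)
    sxy = sum((m - centre) * (d - bias) for m, d in zip(means, diffs))
    sxx = sum((m - centre) ** 2 for m in means)
    return bias, limits, sxy / sxx


# Illustrative reaction times in ms for three players.
bias, limits, slope = bland_altman([100, 200, 300], [100, 190, 280])
```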

Due to the practice effect, we agree with previous studies conducted on other populations that the test requires a dual baseline, where the first test is discarded.22,23 Whether this procedure should be used in follow up testing if more than a couple of weeks have passed since the baseline testing needs further investigation. One problem with dual baseline tests is that the test becomes more time consuming and there is a risk that the player will lose focus. A large study of patients with head injuries found that effort explained 53% of the variance in neuropsychological test performance (in comparison, educational level accounted for only 11% and age only 4%).33 It has to be noted that the group used the old paper and pencil test, which implies a different testing setting. The results can therefore not be transferred directly to computerised testing. Nevertheless, the issue of including some kind of effort measure when conducting a neuropsychological test was recently stressed at the Second International Symposium on Concussion in Sport in Prague.34

In conclusion, the reproducibility for the mean reaction time measures was excellent in the cohort of professional footballers included in the present study. However, the accuracy and consistency measures were less reliable, and may therefore be less sensitive as outcome measures in post-concussion management. Consecutive testing revealed a slight learning effect from test 1 to test 2, and dual baseline testing with rejection of the first test is recommended to minimise this effect.


This study was paid for by a grant from FIFA. In addition, financial support came from the Oslo Sports Trauma Research Center, which has been established at the Norwegian University of Sport and Physical Education through generous grants from the Eastern Norway Regional Health Authority, the Royal Norwegian Ministry of Culture and Church Affairs, the Norwegian Olympic Committee and Confederation of Sports, Norsk Tipping AS, and Pfizer AS. CogState Ltd. provided the necessary software and technical support free of charge. A special thanks to Jiri Dvorak and Astrid Junge from the FIFA-Medical Assessment and Research Centre (F-MARC) for their collaboration on developing the study protocol and Alex Collie for technical support. The authors thank Jostein and Grete Steene-Johannessen for test supervision, Ingar Holme and Lars Bo Anderson for statistical assistance, and the players, team physicians, physiotherapists, and coaches for their cooperation.



  • Competing interests: none declared