Abstract
Consensus statements have the potential to be very influential. Recently, such statements in sport and exercise medicine appear more prescriptive, strongly recommending particular approaches to research or treatment. In 2020, a statement on methods for reporting sport injury surveillance studies included an extension to the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) reporting guidelines; STROBE guidelines are now official requirements for many journals. This suggests that investigators who use methods outside of these guidelines may have difficulty publishing their results. By definition, consensus is not unanimity, and consensus recommendations are sometimes considered flawed at a later date. This is expected as a discipline benefits from new knowledge. However, the consensus methods themselves may also inadvertently suppress contrary—but valid—opinions. I point to a different model for consensus meetings and statements that embraces dissenting opinions and is more transparent than common current methods in sport and exercise medicine. The method, based on how Supreme Courts function in many countries, allows for both majority and one or more minority opinions. I illustrate how a consensus statement might be written using examples from four previous sport and exercise medicine consensus statements. By adopting the ‘Supreme Court’ approach, important disagreements about the strength and interpretation of evidence will be far more visible than is currently the case in most consensus meetings. The benefit of the Supreme Court model is that it will ensure that clinicians, researchers and journals are not inappropriately influenced by recommendations from consensus statements where uncertainty remains.
- consensus statement
- methodology
- surveillance
- epidemiology
- injury
Introduction
In February 2020, the IOC published a consensus statement on methods for reporting sport injury surveillance studies.1 The objective was prescriptive: ‘to provide hands-on guidance to researchers on how to plan and conduct data collection and how to report data.’1 The paper included an extension to the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) reporting guidelines2 called STROBE-Sport Injury and Illness Surveillance (SIIS); STROBE guidelines are now official requirements for many journals. It seems reasonable to assume journal reviewers and editors will want authors to report using STROBE-SIIS in relevant studies.
Researchers and journals (for reporting guidelines) and clinicians (for clinical guidelines) need to be cautious before applying the recommendations of consensus statements in general. Consensus statements have sometimes included recommendations that were later considered inappropriate. This is expected as new knowledge accumulates. However, recommendations are sometimes based on analyses known to be flawed at the time, and contrary to knowledge available when the statement was adopted.3 4
The Appraisal of Guidelines for Research & Evaluation statement (AGREE II) provides standards for developing and reporting consensus guidelines.5 However, these are not always followed. In brief, consensus statements imply consensus, which represents decisions based on a majority of a committee. However, there are often differing opinions within the committee. A transparent framework would report dissenting opinions because dissent and discussion are the foundation of how we interpret and then improve science; dissent needs to be embraced if we are to move forward appropriately.
As a solution, I propose that sport and exercise medicine adopt a model that is similar to how a Supreme Court functions in many nations.6 In this process, there is not one consensus. Rather, participants choose to align themselves with either a majority opinion, or with one or more minority opinions.
Some of my sport and exercise medicine colleagues suggested the Supreme Court approach will confuse some readers. My perspective is that it is better for clinicians and researchers to be appropriately confused rather than inappropriately certain when there are disagreements within the research community.
To illustrate the approach, I briefly review some general methods used to develop consensus statements and guidelines and highlight some challenges. In subsequent sections, I consider some past consensus recommendations in the sport and exercise medicine field as the ‘majority opinion’ and I speculate as to what a ‘minority opinion’ might have looked like. Specific examples of inappropriate conclusions include how to categorise concussions,7 restricting activity in women with female athlete triad,8 managing load and injury in sport9 and methods for surveillance studies.1 Although I present a format where each minority opinion is separate for clarity, the key underlying principle is transparency. More concise formats that appropriately summarise the strength of consensus and different opinions (eg, sudden death in cricket10) will be more appropriate in some contexts.
Table 1 presents a summary of the challenges and potential solutions for consensus meetings. These are my preliminary thoughts and represent a starting point for discussion. I encourage the reader to critique these publicly and I fully expect some will be modified or considered inappropriate/infeasible in the near future.
General methods for consensus statements
The AGREE II statement recommends how to develop appropriate consensus guidelines.5 One recent IOC statement1 described an eight-stage process in which small working groups were formed. These working groups consisted of researchers who had published papers on sports injuries, some of whom had collaborated with each other to various extents. The working groups wrote up texts among themselves, and circulated these to other committee members or the entire group for approval at different draft stages. Despite our community’s general acceptance of these processes, I suggest they are subject to four distinct categories of bias.
First, recommendations exist on who should be invited to participate.5 In sport and exercise medicine, this generally includes clinicians and applied researchers—methodologists and statisticians are less commonly invited. In addition, invited young investigators or students of the organisers often do not have the same breadth of knowledge or experience, but would have equal votes to others in any ‘consensus of meeting experts’. Finally, an appropriate meeting would invite those who organisers know have opposing views even though it will make the meeting more challenging, and writing will take much longer. Failure to invite people with appropriate expertise, including only those who are likely to avoid contradicting others, or not appropriately encouraging frank and open discussions will increase the chance of what is sometimes known as groupthink.11 This may be why some recommendations have been based on what I consider were seriously flawed analyses at the time of the meeting (see section on specific examples).
A second potential source of bias relates to the material considered by the experts. Formal systematic review of the literature was already being recommended as part of the consensus process in 2003 (see AGREE and AGREE II5). Where statements rely on research methods, the epidemiological and statistical literature related to the issues being discussed must also be included. For example, a 2016 IOC statement on load and injury9 used methods known to be flawed at the time of publication (summarised in Refs. 3 4 12–15).
Third, most medical consensus statements do not define ‘consensus’ criteria a priori.16 In the recent IOC statement on harmonising recording and reporting for injury, ‘items were voted on to achieve a majority’,1 which is >50%. I would suggest that 50.1% is not a strong enough endorsement to make prescriptive recommendations because ‘expert opinion’ among the participants suggests the recommendation is just as likely to be incorrect as correct.
Fourth, discussions during consensus meetings are only later summarised and written up. There are rarely official ‘votes’ establishing how many participants agree when coauthors provide suggestions. Authors are sometimes given explicit instructions that ‘approving’ a text is not synonymous with ‘agreeing’ to the text. Would you trust a recommendation more if it was approved by an 8:1 vote versus a 5:4 vote? When votes are taken, the reporting needs to reflect what happened. In one consensus statement,17 participants rated each ‘statement’ on a scale of 0 (not appropriate) to 9 (appropriate). Because the authors only reported the median of the responses, readers do not know whether everyone rated the statement as 6, or whether there was a range of ratings from 3 to 9.
Another consensus statement asked participants to rate each statement from 0 (complete disagreement) to 10 (complete agreement), and discussions continued until the mean score was ≥7.50.18 The authors then reported the mean score with a 95% CI. Although a CI provides some additional information, it is not the best metric because it does not directly describe the variation among ratings. Rather, authors could report SD if the sample were large enough, or ranges/quartiles or other metrics depending on the data. The underlying principle is that consensus statements need to be transparent about what proportion of participants disagreed with particular parts of the text, and about how this was determined.
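The point about medians hiding disagreement can be illustrated with a minimal sketch. The two sets of ratings below are invented for illustration and do not come from any of the cited statements; the function name `summarise` is likewise hypothetical. Both panels have the same median, but only the additional summaries reveal that one panel was unanimous and the other divided.

```python
import statistics

# Hypothetical ratings from nine panellists on a 0-9 scale (invented for illustration).
unanimous = [6, 6, 6, 6, 6, 6, 6, 6, 6]
divided = [3, 3, 4, 5, 6, 7, 8, 9, 9]


def summarise(ratings):
    """Return the median together with several measures of spread."""
    q1, _, q3 = statistics.quantiles(ratings, n=4)  # quartile cut points
    return {
        "median": statistics.median(ratings),
        "range": (min(ratings), max(ratings)),
        "quartiles": (q1, q3),
        "sd": statistics.stdev(ratings),
    }


for label, ratings in (("unanimous", unanimous), ("divided", divided)):
    print(label, summarise(ratings))
```

Which summary is most useful will depend on the panel size and the rating scale; the sketch only shows that reporting a single central value cannot distinguish these two situations.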
Concrete examples for a Supreme Court model
In the following examples, the majority opinion is a direct quote from a published sport and exercise consensus paper. The theoretical minority opinion represents some level of disagreement or elaboration on an important nuance or limitation within the majority opinion.
Concussions (2005)
The 2005 concussion in sport consensus statement suggested the sport medicine community categorises concussions as simple (resolves within 7–10 days) or complex (persistent symptoms).7 This categorisation was dropped unanimously at the 2008 meeting.19 However, even in 2005, general epidemiological principles required all information for a categorisation schema to be available at the time the categorisation is to be applied. When a concussion occurs, we do not know how long it will last. Therefore, we cannot diagnose a simple or complex concussion at the time of injury. Italics in the majority opinion represent text added for clarity.
Majority: “One of the key developments by the Prague Group is the understanding that concussion may be categorized for management purposes as either simple (resolves without complication over 7–10 days) or complex (loss of consciousness >1 min or prolonged recovery >10 days).”
Theoretical minority opinion: Simple vs complex concussion categorization might be appealing to researchers who want to determine if risk factors (or treatment) for concussions leading to prolonged recovery are different from risk factors (or treatment) for concussions that resolve quickly. However, this analytical approach restricts data based on events that occur after the diagnosis, and this can lead to bias if the purpose is to determine causal risk factors.20 Further, it is of limited use for clinical management because one cannot generally apply the categorization at the time of injury or for the next 10 days.
Return to play for participants with female athlete triad (2014)
Based on an earlier version21 of the Strategic Assessment of Risk and Risk Tolerance (StARRT) model for return to play,22 this statement8 recommends using a cumulative risk assessment score (based on six individual factors) to determine whether a female athlete should be cleared for full activity, but does not account for the type of activity. Both the original21 and StARRT22 models propose that the magnitude of risk depends on the stresses applied during activity. Therefore, the consensus statement inappropriately suggests a table tennis athlete and a marathoner who have the same characteristics would have the same risk.
Majority: “This cumulative risk stratification protocol is then translated into clearance and return-to-play guidelines for the Triad based on the athlete’s cumulative risk score (figure 5). Future research is needed to assess if implementation of a risk stratification model results in improved outcomes for female athletes with Triad disorders.”
Theoretical minority opinion: The magnitude of risk depends on the activity being performed. Therefore, the “cumulative risk score” is a measure of bone health, not risk. In addition, the six individual factors are weighted equally in the score, whereas most clinicians would consider that an athlete with a bone mineral density Z-score between −1 and −2 is at much higher risk than an athlete with one previous stress fracture due to training errors. We suggest clinicians consider the individual elements within this cumulative score as part of their overall decision-making process until there is empirical evidence supporting use of the score.
Managing load and injury in sport (2016)
The IOC consensus statement on managing load and injury in sport9 recommended stratification of injury risk based on the acute:chronic workload ratio (ACWR). The ACWR was developed as a measure of change in activity: a recent (acute) change in activity compared with usual (chronic) activity. The statement reproduces a graph with a U-shaped curve suggesting that there is a ‘sweet spot’ between ACWR of 0.8 and 1.3 that minimises the risk of injury. This sweet spot is also explicitly mentioned in the associated infographic.23 The implied interpretation is that athletes who decrease activity by more than 20% (ie, recent activity is 0.8 of usual activity) are more at risk of injury compared with maintaining the same level of activity. There has never been a biological theory to support this statement; the results are expected due to analytical methods that were known to be flawed at the time (summarised in Refs. 4 13 14 24).
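For readers unfamiliar with the arithmetic behind the ACWR, the following is a minimal sketch of one simple variant. It assumes, purely for illustration, that acute load is the most recent week and chronic load is the mean of the four preceding weeks; published variants differ in their windows and weighting, and the function names and load values here are hypothetical. The sketch shows only how the ratio and the 0.8 to 1.3 band are computed, not that the band is a valid indicator of risk.

```python
def acwr(weekly_loads, chronic_weeks=4):
    """Most recent week's load divided by the mean load of the preceding weeks.

    Assumption for this sketch: the acute week is excluded from the
    chronic window. Other variants exist.
    """
    if len(weekly_loads) < chronic_weeks + 1:
        raise ValueError("need one acute week plus the chronic window")
    acute = weekly_loads[-1]
    chronic_window = weekly_loads[-(chronic_weeks + 1):-1]
    chronic = sum(chronic_window) / chronic_weeks
    if chronic == 0:
        raise ValueError("chronic load is zero; ratio is undefined")
    return acute / chronic


def in_sweet_spot(ratio, low=0.8, high=1.3):
    """True if the ratio falls inside the band quoted in the statement."""
    return low <= ratio <= high


# Hypothetical weekly loads in arbitrary units.
steady = [100, 100, 100, 100, 100]
reduced = [100, 100, 100, 100, 70]  # a 30% reduction in the most recent week

print(acwr(steady), in_sweet_spot(acwr(steady)))
print(acwr(reduced), in_sweet_spot(acwr(reduced)))
```

Under this rule the athlete who reduces activity by 30% falls outside the band, which is the implication questioned in the text above.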
Even if the authors were unaware that the analytical methods were flawed (because they did not include an associated methodological literature review or biostatistician expert on the panel), the text for a majority and minority opinion in a Supreme Court model might have been (majority quote from Infographic article23):
Majority: “Limit weekly increases of their training load to less than 10%, or maintain an acute:chronic load ratio within a range of 0.8 to 1.3, to stay in positive adaptation and thus reduce the risk of injuries”
Theoretical minority opinion: We agree with the majority opinion that injury risk increases as ACWR rises above 1.3. However, we cannot think of any biological reason why injury risk would acutely increase when activity is decreased (ie, ACWR <1), without a subsequent increase at a later time (ie, ACWR >1). The results may have occurred by chance given the limited data available, or due to some unanticipated bias in the collection or analysis of data.
Methods for recording and reporting injury studies (2020)
The most recent IOC statement on reporting methods1 includes several challenges. Since it is an extension to the STROBE statement, it is likely to be required by some journals and the implications of unrecognised limitations are therefore much greater compared with the other consensus statements mentioned in this document. Therefore, this section discusses several challenges with the recommendations proposed so that future authors can publish their results using other methods if their study question and data require them. Although the limitations due to general definitions are not easily described in a majority and minority opinion format, table 2 provides examples of majority and minority opinions for issues related to time to recurrence, multiple injuries and recovery.
Definitions
The definition of injury includes ‘transfer of kinetic energy’ which requires motion. Although there is motion at the cellular and tissue level during an isometric contraction leading to damaged tissue, there is no motion at the joint level.
The definition of injury includes the word ‘damage’. Is damage defined by the presence of bleeding or swelling, any rise in creatine kinase or only above a threshold, or something else? There is a body of medical literature discussing diseases versus illness, diagnoses versus incapacities and so on that extend our traditional medical concepts of disease to include additional outcomes that are sometimes more meaningful to patients. Timpka et al began to adapt these extended concepts to a sports framework in 201425 and distinguished the following concepts: injury, trauma and incapacity along with more nuances for disease, illness and sickness. This work is consistent with the challenges that occur when we distinguish between recurrent injury and exacerbation.26 It may also help with classifications when asymptomatic patients with known osteoarthritis or meniscal tears become symptomatic27; there are no new ‘injuries’ from a medical perspective but we would normally want to keep track of and analyse these events in sport and exercise medicine. Further, we traditionally define injury in sport and exercise medicine research as seeking all physical complaints, medical attention and time loss injuries. However, Bolling et al found that some athletes consider an injury only if performance is affected, and others consider that pain alone is not enough of a criterion to define an injury.28 If all sport injury and illness researchers were required to use the injury definition stated in the methods and reporting consensus statement1 (or the updated Oslo Sport Trauma Research Center questionnaire consensus statement29), then our results would clearly provide incorrect answers to the research question of interest to these ‘clients’. When the objective of a consensus meeting is to be prescriptive, as in this statement, it is important for the relevant literature to be circulated, discussed and mentioned in the report.
There are additional definition challenges. Non-contact is defined as ‘no contact from an external source’, and ‘no evidence of disruption or perturbation of the player’s movement pattern’. Indirect contact is defined as an injury that results ‘from contact with other athletes or an object… The force is not applied directly to the injured area, but contributes to the causal chain leading to the health problem.’ However, this approach requires that one specify the most distal link in a chain of events (furthest away from the event) one is interested in. The example for indirect contact provided in the consensus statement is a skier who suffers a concussion ‘… after being knocked off balance hitting the gate with his knee’. Now consider two skiers who both lose their balance because of ice, hit their head hard on the snow and suffer a concussion. The first skier’s ski slips and the skier falls without hitting the gate. The second skier has the same slip and, although they maintain some control, they are still off balance, which results in their knee hitting the gate. According to these definitions, the first is a non-contact injury and the second is an indirect contact injury even though the initiating and final events in the chain (slip, head hitting snow) are the same.
Creating precise definitions is difficult and often requires many iterations and debate. Optimal definitions depend on whether the research is studying the most proximal cause of the pathology (eg, head hitting the snow), or one of the more distal causes (eg, hitting gate, losing balance, poor sleep).
Time to recurrence
The IOC statement recommends that time to recurrence be recorded in days. The denominator for any rate calculation should reflect the population-time at risk (known as risk set). Consider two athletes who are cleared to return to sport after an initial injury on Sunday, where one plays a sport on Sundays, and the other plays a sport Monday, Wednesday and Friday. Both athletes get reinjured their first time playing. The athlete who plays once per week has a recurrence at 7 days, and the athlete who plays three times per week has a recurrence at 1 day. Is it fair to conclude the two sports have different recurrent injury rates when each athlete was reinjured on their first day back playing?
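The two-athlete example can be worked through in a minimal sketch. The clearance date and function names are hypothetical choices made for illustration; the playing schedules are those described above. The sketch reports time to recurrence both in calendar days and in playing sessions (one possible exposure-based denominator).

```python
from datetime import date, timedelta

SUNDAY = 6  # date.weekday(): Monday is 0, Sunday is 6


def sessions_after(cleared, play_weekdays, n_sessions):
    """Dates of the first n playing sessions strictly after clearance."""
    if not play_weekdays:
        raise ValueError("athlete must play on at least one weekday")
    sessions, day = [], cleared
    while len(sessions) < n_sessions:
        day += timedelta(days=1)
        if day.weekday() in play_weekdays:
            sessions.append(day)
    return sessions


def time_to_recurrence(cleared, play_weekdays, reinjured_at_session):
    """Return (calendar days, playing sessions) from clearance to reinjury."""
    sessions = sessions_after(cleared, play_weekdays, reinjured_at_session)
    return (sessions[-1] - cleared).days, reinjured_at_session


cleared = date(2020, 2, 2)  # hypothetical clearance date; a Sunday
assert cleared.weekday() == SUNDAY

# Both athletes are reinjured the first time they play.
weekly_player = time_to_recurrence(cleared, {SUNDAY}, reinjured_at_session=1)
mwf_player = time_to_recurrence(cleared, {0, 2, 4}, reinjured_at_session=1)

print(weekly_player)  # (7, 1)
print(mwf_player)     # (1, 1)
```

Measured in days the two athletes differ sevenfold, whereas measured in sessions at risk they are identical. Which denominator is appropriate depends on the research question.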
Multiple injuries at the same time
The statement recommends injury prevalence and incidence should count multiple injuries occurring at the same time as one injury, and the severity should be considered the severity of the most severe injury. Therefore, if there were 10 events leading to simultaneous ankle and knee injuries over a season, the injury incidence would be 10 ‘injuries’/season overall, 10 ankle injuries/season and 10 knee injuries/season. Further, an ankle injury requiring only 1 week without activity will be considered severe if there is an associated fracture of the wrist.
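The counting in this example can be made explicit with a minimal sketch. It encodes my reading of the rule as described above (one count per event overall, one count per affected body region, severity taken from the most severe concurrent injury); the time-loss values in days are invented for illustration.

```python
from collections import Counter

# Hypothetical season: 10 events, each causing an ankle and a knee injury.
# Values are days of time loss, invented for illustration.
events = [{"ankle": 7, "knee": 21} for _ in range(10)]

# Simultaneous injuries are counted as one injury overall.
overall_count = len(events)

# Each body region affected by an event still receives one count.
by_region = Counter(region for event in events for region in event)

# The severity assigned to an event is that of its most severe injury.
event_severity = [max(event.values()) for event in events]

print(overall_count)          # 10
print(dict(by_region))        # {'ankle': 10, 'knee': 10}
print(set(event_severity))    # {21}

# A 1-week ankle injury occurring together with a wrist fracture
# (42 days is a hypothetical value) inherits the fracture's severity.
ankle_with_wrist = {"ankle": 7, "wrist": 42}
print(max(ankle_with_wrist.values()))  # 42
```

The region-specific counts sum to 20 while the overall count is 10, and the ankle injury is recorded with a severity it would not have had on its own.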
Fully recovered
The key distinction between exacerbations and subsequent injury is whether the initial injury had ‘healed/fully recovered’. The statement defines this as ‘fully available for training and competition’, similar to a previous consensus statement.30 However, the previous statement30 noted that athletes who continue to receive treatment after returning to full activity are still generally considered to be injured clinically. They proposed return-to-play criteria only as a pragmatic solution. I suggest the solution depends on the research question and we should not recommend a one-solution-fits-all approach. For example, the first question on the updated consensus statement regarding the Oslo Sport Trauma Research questionnaire29 includes an answer choice ‘Full participation, but with (location) problems’. Therefore, the two consensus statements are inconsistent in their recommendations of what should be considered an ongoing injury vs healed injury. This is understandable because the optimal definition for a study would depend on the research question and data gathered (eg, are symptoms being recorded).
Summary
The AGREE II recommendations for consensus statements include conducting a relevant systematic review with extensive readings and discussions. Sport and exercise medicine consensus statements that discuss methods need to include relevant epidemiological and statistical best practices. Failure to report dissenting opinions leads to non-transparency and suboptimal products, and likely hinders the advancement of injury and illness prevention/treatment programmes.
The Supreme Court model still needs to be evaluated in the medical setting. It may lead to better results, but it still requires the correct mix of investigators’ experience and knowledge. At the very least, consensus statements need to be transparent about the strength of the consensus. Most importantly, as the famous basketball coach John Wooden once said, “If you don’t have the time to do it right, when will you have the time to do it over?”
What is already known
Consensus guidelines often influence clinical practice because they are supposed to be based on best evidence synthesis. If the methods and discussion are not transparent, then errors and omissions will likely be more difficult to identify.
More transparent research methods generally lead to improved understanding by clinicians and researchers.
What are the new findings
Consensus meetings may suppress dissent by only inviting like-minded participants, not allotting enough time for discussion, or failing to sufficiently encourage participants to speak frankly and openly.
Consensus statements typically do not report dissent, which may lead to inappropriately strong recommendations.
A Supreme Court model, where both majority and minority opinions are encouraged and reported, would likely lead to more transparency and fewer errors.
How one might operationalise a Supreme Court model is illustrated using examples from four different published sport and exercise medicine consensus statements.
References
Footnotes
Contributors The sole author was responsible for the conception and writing of this review article.
Funding The author has not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.
Competing interests None declared.
Provenance and peer review Not commissioned; externally peer reviewed.