Screening for red flags in individuals with low back pain (LBP) has been a historical hallmark of musculoskeletal management. Red flag screening is endorsed by most LBP clinical practice guidelines, despite a lack of support for its diagnostic capacity. We share four major reasons why red flag screening is not consistent with best practice in LBP management: (1) clinicians do not actually screen for red flags, they manage the findings; (2) red flag symptomology negates the utility of clinical findings; (3) the tests lack the negative likelihood ratio to serve as a screen; and (4) clinical practice guidelines do not include specific processes that aid decision-making. Based on these findings, we propose that clinicians consider: (1) the importance of watchful waiting; (2) the fact that value-based care does not support clinical examination driven by red flag symptoms; and (3) the recognition that red flag symptoms may have a stronger relationship with prognosis than diagnosis.
- low back pain
- red flags
Despite intense focus and increased research funding, the self-reported levels of disability in individuals with low back pain (LBP) have not improved in the last decade.1 Worsening disability has propagated, despite the presence of numerous classification schemes designed to lead to patient-specific treatments. The care provided to patients frequently does not meet professionally recommended standards. We can do much better. We have to do better.
Diagnosis is one of many necessary components during the clinical decision-making process. Characteristically, diagnosis is performed early in the management process of the patient and involves both soft skills (clinical reasoning) and highly valid quantitative clinical testing methods. All medical professionals who manage LBP, regardless of training, background or philosophy, use differential diagnostic methods to improve the likelihood of providing the right care to the appropriate patient, and to reduce the risks associated with delayed management. By its nature, differential diagnosis is a systematic process used to identify the proper diagnosis from a competing set of possible diagnoses.2 3 For example, a practitioner would perform more extensive diagnostic testing for an individual with LBP who also has clinical indications of cancer or infection than for one without those indicators.
In one of the earliest works on differential diagnosis for LBP, published in 1924, John M Berry4 wrote prophetically: ‘Confronted with such a multiplicity of symptoms and causes, diagnosis usually is difficult or uncertain, and consequently treatment is unsatisfactory.’ Berry identified a number of conditions that led to LBP that were not affiliated with muscle or bone, and suggested the importance of classifying potentially serious pathologies for proper care. Berry4 espoused a ‘screening’ process for red flags, a modern term for signs or symptoms that are related to a serious underlying pathology and may indicate that more diagnostic testing is necessary before the appropriate care can be delivered. Astonishingly, it has taken more than 90 years to conclude that screening for red flags associated with LBP does not work. We discuss four primary reasons why red flag screening does not work and provide alternatives to consider in future LBP management models.
Reason 1: red flag symptoms neither rule out nor identify serious pathology
In a meta-analysis by Downie and colleagues,5 nearly all patient history, physical examination findings and diagnostic clinical tests associated with ‘screening’ for LBP exhibited negative likelihood ratios (−LR) near 1.0, suggesting that none were truly sufficient in ruling out a condition with a negative finding. A −LR is calculated from the sensitivity and specificity values of a contingency table: it is the probability that a person who has the disease tests negative (1 − sensitivity), divided by the probability that a person who does not have the disease tests negative (specificity). In other words, a lower numerator over a higher denominator yields a value closer to zero, and therefore a lower and ‘better’ −LR. Low −LRs allow a clinician to have confidence that a negative finding indicates that the underlying condition assessed (eg, cancer, fracture) is indeed absent.
Downie and associates’5 meta-analysis showed that the tests evaluated in their review had neither the low −LR (close to zero) needed to ‘rule out’ a condition nor the high positive likelihood ratio (+LR) needed to ‘rule in’ a condition. A high +LR is generally used to confirm the presence of a condition once a comprehensive clinical examination has been used to guide hypothesis testing. Also calculated from sensitivity and specificity, a +LR is the probability that a person who has the disease tests positive (sensitivity) divided by the probability that a person who does not have the disease tests positive (1 − specificity). In other words, a higher numerator divided by a lower denominator provides you with a higher and ‘better’ +LR. Because both the −LR and +LR of the clinical tests were of limited clinical utility, the authors5 warned of unwarranted referrals for imaging, as some of the endorsed red flag tests were very commonly positive.
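The two likelihood ratios described in words above can be written compactly. This is the standard textbook formulation rather than anything specific to the meta-analysis; Se and Sp are shorthand introduced here for sensitivity and specificity, T for the test result and D for disease status, and LR⁻ and LR⁺ correspond to the −LR and +LR of the text.

```latex
\[
\mathrm{LR}^{-} \;=\; \frac{P(T^{-}\mid D^{+})}{P(T^{-}\mid D^{-})} \;=\; \frac{1-\mathrm{Se}}{\mathrm{Sp}},
\qquad
\mathrm{LR}^{+} \;=\; \frac{P(T^{+}\mid D^{+})}{P(T^{+}\mid D^{-})} \;=\; \frac{\mathrm{Se}}{1-\mathrm{Sp}}
\]
```

A test for which Se + Sp = 1 has both ratios equal to 1.0 and leaves the pretest probability unchanged, which is why likelihood ratios near 1.0 carry little diagnostic information.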
In an effort to provide a clinical example, we propose the following scenario using data from the meta-analysis of Williams and colleagues,6 who examined red flag screening for vertebral fracture in patients with LBP. A vertebral fracture is more common in older individuals (0.5%–4% of all patients with LBP) and clinicians are taught to screen by examining age, history of trauma and history of corticosteroid use. Supposing a pretest prevalence of 0.5%–4%7 8 and using the values reported by Williams et al,6 the post-test probabilities for ruling out improve by less than 1% when a negative finding occurs (table 1), a change that has marginal clinical utility. In addition, the 95% CIs cross 1.0 in sensibility and motor testing, further questioning their discriminative capacity.
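The arithmetic behind a post-test probability can be sketched as follows. This is a minimal illustration, assuming the usual odds-based conversion; the function names are ours, and the negative likelihood ratio of 0.9 is a hypothetical placeholder, not a value reported by Williams et al. Only the 0.5% and 4% pretest prevalences come from the text.

```python
def post_test_probability(pretest_probability: float, likelihood_ratio: float) -> float:
    """Convert a pretest probability into a post-test probability via odds."""
    if not 0.0 <= pretest_probability < 1.0:
        raise ValueError("pretest_probability must lie in [0, 1)")
    if likelihood_ratio < 0.0:
        raise ValueError("likelihood_ratio must be non-negative")
    pretest_odds = pretest_probability / (1.0 - pretest_probability)
    post_test_odds = pretest_odds * likelihood_ratio
    return post_test_odds / (1.0 + post_test_odds)


def absolute_drop(pretest_probability: float, likelihood_ratio: float) -> float:
    """Absolute reduction in disease probability after a negative finding."""
    return pretest_probability - post_test_probability(pretest_probability, likelihood_ratio)


# Hypothetical -LR close to 1.0, chosen for illustration only.
ILLUSTRATIVE_NEGATIVE_LR = 0.9

if __name__ == "__main__":
    # Pretest prevalences taken from the range quoted in the text.
    for pretest in (0.005, 0.04):
        post = post_test_probability(pretest, ILLUSTRATIVE_NEGATIVE_LR)
        print(f"pretest {pretest:.3%} -> post-test {post:.3%} "
              f"(drop {absolute_drop(pretest, ILLUSTRATIVE_NEGATIVE_LR):.3%})")
```

Under these assumptions, a negative finding lowers the probability of fracture by well under one percentage point at either end of the prevalence range, because a likelihood ratio near 1.0 barely changes the odds.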
Reason 2: variability in definitions for red flag symptoms greatly limits research and clinical progress in this area
In research, there is no reliable way to compare red flag symptom rates across different cohorts, so we know very little about overall prevalence. Such studies are extremely difficult to conduct, as very large sample sizes are required given the low overall prevalence estimates. In clinical practice, this variability continues to perpetuate inconsistency in the assessment of red flag symptoms.9
The extent of variability was highlighted during the development of a standardised self-report tool for assessment of red flag symptomology.10 A systematic literature search was performed to identify the types of red flag symptomology items used in prior studies as part of the creation of the OSPRO-ROS (Optimal Screening for Prediction of Referral and Outcome-Review of Systems) tool.10 The OSPRO-ROS tool was developed with data from a physical therapy cohort receiving care for a variety of musculoskeletal pain conditions (including LBP). The search yielded 97 unique items representing symptoms from eight body systems (eg, cardiovascular, pulmonary or gastrointestinal). Subsequent analyses focused on the number of items needed to identify a ‘red flag symptom responder’ (operationally defined as a positive response to any of the 97 items): 10 items had 94.7% accuracy, while 23 items were needed to reach 100% accuracy. The development of the OSPRO-ROS tool was the first attempt we are aware of to standardise assessment of red flag symptomology, and it suggests that the variation in this process can be greatly reduced, making standardisation a potentially important first step in gaining better consistency on inquiry and reporting for research and clinical purposes.
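The idea of testing how well a shortened item list identifies a ‘red flag symptom responder’ can be sketched as follows. This is an illustrative sketch only: the source does not describe how accuracy was computed, so here it is assumed to be the proportion of participants whose responder status on the item subset matches their status on the full list. The five-item checklist and the four participants are synthetic, not OSPRO-ROS data.

```python
from typing import Sequence


def is_responder(responses: Sequence[bool]) -> bool:
    """A responder gives a positive response to at least one item."""
    return any(responses)


def subset_accuracy(cohort: Sequence[Sequence[bool]], subset: Sequence[int]) -> float:
    """Share of participants classified identically by the subset and the full list."""
    if not cohort:
        raise ValueError("cohort must contain at least one participant")
    agreements = sum(
        is_responder([row[i] for i in subset]) == is_responder(row)
        for row in cohort
    )
    return agreements / len(cohort)


# Synthetic responses: rows are hypothetical participants, columns are items.
TOY_COHORT = [
    [True, False, False, False, False],
    [False, False, True, False, False],
    [False, False, False, False, True],
    [False, False, False, False, False],
]

if __name__ == "__main__":
    print(subset_accuracy(TOY_COHORT, [0, 2]))     # misses the third participant
    print(subset_accuracy(TOY_COHORT, [0, 2, 4]))  # matches the full list
```

In this toy cohort, two items classify three of four participants correctly, and adding a third item recovers full agreement. This mirrors, in miniature, the trade-off between list length and accuracy described above.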
Reason 3: LBP guidelines do not help
Underwood and Buchbinder11 suggest that screening for red flags in patients with LBP is ‘a popular idea that didn’t work and should be removed from guidelines’. The vast majority of current clinical practice guidelines for back pain recommend the use of red flags for determining potential presence of spinal fracture or malignancy.12 Patients who present with these suggested red flags are then recommended to undergo more extensive, costly and potentially harmful diagnostic testing. As mentioned earlier, extensive diagnostic testing tends to lead to higher costs, potential overtreatment, and unnecessary fear and concern on the part of the patient, and it can be cost prohibitive.
An overview of clinical guidelines for primary care non-specific LBP management13 revealed eight different guidelines endorsing 27 red flags for malignancy and 26 for fracture. More distressing was the fact that none of these eight guidelines endorsed the same set of red flags for either condition! The lack of consistency does not provide the clinician with a stable set of ‘rules’ to follow to perform a screening test adequately for their patients, and is therefore detrimental to good practice. Additionally, despite clinicians generally defining red flags for back pain in accordance with guidelines, there is little consensus on consistency of inquiry and reporting red flags.9
Guidelines describing variable and numerous red flags without reported diagnostic accuracy are undermined by guidelines recommending immediate referral for imaging if any red flag is present.14 15 A list of potential red flags, such as ‘age <45 years, night/morning pain, slow beginning, rigidity duration over 3 months, family history of spondyloarthritis, ulcerating colitis, Crohn’s disease, psoriasis’ would lead to significant overutilisation of imaging, and to inappropriate clinical reasoning.14 For example, in the aforementioned study of developing the OSPRO-ROS tool, 393/431 participants (91.2%) reported at least one red flag symptom.5 Red flags of ‘age <22 or >55 years’15 should not, at face value alone, raise the clinical concern that ‘history of malignancy or immune compromise’15 is present. Additionally, when red flags are present, the recommendation that plain radiographs alone can explain the ‘cause for pain found’ is not sound clinical reasoning, given that such decision-making processes are not universally supported by the literature.16
Let us not confuse the use of technology associated with imaging and lab work with an improvement in differential diagnosis, since there is notable overuse of inappropriate imaging and identification of false positives in those with LBP. No clinical guidelines for LBP support the use of early imaging, and as stated previously, the use of imaging when a red flag is present has led to notable overimaging.17 In a study of 1003 original patients with LBP referred from four primary care clinics, 110 had at least one red flag (75% of whom had a single red flag). These 110 underwent advanced imaging (eg, CT, MRI, and so on). Twenty-four of these individuals had a non-benign spinal disorder. Among these 24 individuals, 50% had correlating clinical neurological findings reported, leaving only 12 cases of highest priority for advanced imaging. Additionally, in the 893 patients who did not have advanced imaging done and were followed for approximately 1 year, none had a non-benign spinal disorder. Unfortunately, most had radiographs prior to implementation of the study. The prevalence of many serious low back pathologies is extremely low, including the reported prevalence of <1% for spinal malignancy in patients with LBP presenting to a primary care physician.18
Reason 4: clinicians do not actually screen for red flags; they manage LBP conditions they see
Medical screening is a process in which a disease or a condition is assessed in an asymptomatic population who may or may not have early disease or disease precursors.19 Test results are then used to guide whether or not a diagnostic test should be offered. Medical screening is typically performed in the preclinical phase of a condition, a time frame in which there are no outward symptoms. Beneficial medical screening tests are able to identify findings that are hallmarks of a serious pathology within the preclinical phase.
In contrast, diagnostic testing involves clinical testing that is designed to aid in the diagnosis or detection of a suspected disease or condition when symptoms exist. In an ideal diagnostic scheme, the symptoms help to guide identification of the underlying pathology. In the case of LBP, symptoms offer little to no guidance in detecting underlying pathology.
Although recognised to encompass a set of diagnoses within the WHO disease classification,20 LBP is actually a symptom; and typically, a symptom associated with an unknown underlying condition. LBP not affiliated with a serious pathology will often exhibit symptoms that are similar to competing diagnoses such as fracture, cancer and other red flags. Many of the red flags associated with LBP are more prevalent in older individuals,21 a subset of individuals who will also frequently have concomitant orthopaedic-related LBP.22 In a study designed to identify movement examination features unique to metastatic spinal cancer, 61 of 66 individuals with LBP were diagnosed with metastatic bone cancer and (concomitantly) were diagnosed with a condition such as lumbar stenosis, spondylosis or degenerative disc disease.22 No unique movements discriminated metastatic spinal cancer from other age-related conditions. Rarely does the literature outline a definitive set of signs or symptoms that are unique to serious pathology of the low back—for either the screening OR the diagnostic phase.
To recapitulate, clinicians managing patients with LBP do not actually screen for red flags. Medical screening is typically performed in the preclinical phase of a condition, and red flag symptoms for LBP are captured in the clinical phase. For clinicians to truly understand and change their clinical reasoning processes, educational curricula need to be revised. Similar suggestions have also been made about shifting one’s emphasis from diagnosis to understanding the importance of prognosis.23
Summary and three recommendations
For those who fear that a philosophical change in screening for LBP may result in detrimental patient outcomes (harms) or marginal benefits, we offer the following options: (1) watchful waiting as an alternative to diagnostic testing, (2) enhancement of value-based care when cost is considered, and (3) linking red flag symptomology directly with outcomes.
Early diagnostic testing is often used to provide reassurance to patients. After careful evaluation of studies that have used early diagnostic testing methods, van Ravesteijn and colleagues24 supported the practice of providing a clear explanation and watchful waiting. In medicine, watchful waiting is the act of close surveillance while allowing time to pass before medical intervention or therapy is used. There is evidence that early intervention may actually be detrimental/harmful in patients with LBP.25–27 Moreover, the incidence of several diseases that tend to be detected early is increasing with no corresponding reductions in mortality rate.28 Hence, the surge in attention on early detection and the focus on benefits over harms do not seem justified.29
We propose careful monitoring for changes in symptoms over time. Careful monitoring, performed concurrently with open and timely communication with the patient regarding potential contributing factors to their symptoms, can be as effective as (or more effective than) early testing. Watchful waiting may also improve the patient–provider relationship, improve clinical reasoning/decision-making, improve patient satisfaction and reduce anxiety,30 and be one small step towards reducing exorbitant healthcare costs. Tangentially, a ‘wait and see’ approach has proven effective in having patients avoid major oncological surgery, averting permanent alterations in lifestyle without loss of oncological safety.31
Enhancement of value-based care
An often-overlooked consideration relative to screening in musculoskeletal pain is cost, and whether screening adds to the overall value of a particular care episode.32 In a comparison of diagnostic strategies for diagnosis of cancer in patients with LBP, MRI used as a first-line diagnostic strategy was shown to cost 10 times as much as the conventional strategy (MRI if erythrocyte sedimentation rate was elevated and radiographs were positive). The cost of each extra patient with a spine malignancy identified in the MRI group exceeded US$625 000.33 The disparity in these costs is likely greater today.
Although (to our knowledge) no clinical practice guidelines address value-based care within their recommendations, we propose this should be an adopted consideration before ordering tests for red flags. We feel value-based care is a small but important step towards managing unnecessary and potentially unwarranted care for LBP, one that has strong support within the literature.
Linking red flag symptomology not with diagnostic testing but directly with health status
Early diagnostic imaging is very likely to lead to overinterpretation of potential pathology in patients with LBP: as many as 94% of patients presenting to a general practice office had ‘abnormal’ MRI findings, yet only 3% of these individuals had actual serious pathology.34 The 3% rate was three times higher than that in another study of younger patients with more acute LBP presenting to general practice,35 demonstrating that even in older patients with chronic LBP the prevalence is extremely low. Additionally, in both studies, the primary serious pathology presentation was vertebral fracture (3%) for which there are good clinical screening tests. Unwarranted diagnostic imaging leads to higher healthcare costs, unnecessary, potentially invasive interventions and, as discussed in reason 2, no improvement in diagnostic screening accuracy.
In light of our above proposal for enhanced value-based care and in place of using red flag symptoms to drive more diagnostic testing, perhaps a paradigm change is necessary and the symptoms themselves can be used to refine recommendations for care pathways and outcome prediction. For example, Roach et al36 used red flag symptomology to predict whether LBP would result in surgical or conservative care and George et al37 have used the OSPRO-ROS tool to predict change in 12-month comorbidity status. These alternative models of using red flags deserve further exploration and may be appropriate for a value-based era of musculoskeletal care.32
What are the findings?
Screening requires the preclinical phase of a condition, a time frame in which there are no outward symptoms. As such, red flag screening for low back pain is actually a management strategy that does not involve screening methodology.
Clinical tests for red flags do not exhibit low negative likelihood ratios, suggesting that they fail to ‘rule out’ a red flag condition when the test finding is negative.
Red flag findings are more closely affiliated with prognosis than they are with actual serious pathology such as cancer, fracture, and so on.
Although low back pain clinical practice guidelines endorse assessing red flags, only a few outline which findings are useful for clinical practice.
Contributors CEC, SZG and MPR all worked together for the concept of the study, participated in the full writing and approved the final manuscript.
Competing interests None declared.
Provenance and peer review Not commissioned; externally peer reviewed.