THE TRAFFIC LIGHT SYSTEM
This review analyses current evidence for the diagnostic accuracy of the Traffic Light System (TLS) in primary care, its implications for GPs, and its alternatives. Two linked studies1,2 of the performance of the National Institute for Health and Care Excellence’s TLS concluded that it was unfit for purpose. TLS aims ‘to improve clinical assessment and help healthcare professionals diagnose serious illness among young children [under 5] who present with fever’,3 a notoriously difficult assessment since fever in children is common while serious bacterial infection (SBI) is rare. The TLS groups clinical features into three categories with corresponding recommendations: red (serious illness likely, hospital admission recommended), green (serious illness unlikely, manage at home), and amber (intermediate risk, judgement required). Clark et al2 reported that the TLS was insufficiently accurate to identify infants in primary care with SBI and to exclude those without. Arguably these findings suggest abandoning the TLS. In preparing this review, all references from both studies were retrieved. Backward and forward searches from the references were cascaded until no new references were found.
WHAT CLARK AND BLYTH FOUND
Blyth et al1 and Clark et al2 took data from the earlier prospective DUTY cohort study4 of over 6700 children presenting with constitutional or urinary tract infection symptoms to GPs, walk-in centres, and emergency departments (EDs), and mapped them to TLS items. A third of under-5s had at least one red feature.1 The accuracy of the TLS, using hospital-diagnosed SBI as the reference standard, was calculated for two thresholds: any red feature, and any red or amber feature.2 Any red feature had a sensitivity of 58.8% (95% confidence interval [CI] = 32.9% to 81.6%) and a specificity of 68.5% (CI = 67.4% to 69.6%). Using the lower threshold improved sensitivity to 100% (CI = 80.5% to 100%) but reduced specificity to 5.7% (CI = 5.2% to 6.3%). Such performance for a national guideline is disappointing. The authors could not map all TLS features, which may explain the poor performance, but other explanations exist.
THE EVIDENCE MAY NOT BE RELEVANT TO PRIMARY CARE
Many TLS features were first tested in secondary care, where prevalence is higher.2,5–9 Diagnostic accuracy calculated in secondary care cannot automatically apply to primary care because the populations differ, as demonstrated by the Wells thromboembolism scores.10 Furthermore, while the TLS targets under-5s, the studies on which it is based included some that restricted the age range to infants less than 6 months and others that included older children.
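One part of this argument, the dependence of predictive value on prevalence, can be illustrated with a short calculation. The sketch below is a minimal illustration, not a method used in any of the cited studies: it holds sensitivity and specificity fixed at the 'any red feature' figures reported by Clark et al2 (a simplification, since accuracy itself may differ between settings) and uses two prevalence values that are purely hypothetical.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Probability of disease given a positive test (Bayes' theorem).

    All arguments are fractions between 0 and 1.
    """
    true_positives = sensitivity * prevalence
    false_positives = (1.0 - specificity) * (1.0 - prevalence)
    return true_positives / (true_positives + false_positives)


# Sensitivity and specificity for 'any red feature' as reported in the text.
SENSITIVITY = 0.588
SPECIFICITY = 0.685

# Hypothetical prevalences, chosen only to contrast a low-prevalence
# setting with a high-prevalence one; they are not taken from any study.
LOW_PREVALENCE = 0.003
HIGH_PREVALENCE = 0.10

ppv_low = positive_predictive_value(SENSITIVITY, SPECIFICITY, LOW_PREVALENCE)
ppv_high = positive_predictive_value(SENSITIVITY, SPECIFICITY, HIGH_PREVALENCE)

print(f"PPV at prevalence {LOW_PREVALENCE:.1%}: {ppv_low:.1%}")
print(f"PPV at prevalence {HIGH_PREVALENCE:.1%}: {ppv_high:.1%}")
```

With identical test characteristics, a positive result carries much less weight where the condition is rare, which is one reason why findings from secondary care cannot simply be carried over to primary care.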
THE TLS ITEMS MAY LACK ACCURACY IN ANY SETTING
Likelihood ratios (LRs) are useful in comparing tests with one another. A rule of thumb for a ‘good’ test is a positive LR (LR+) ≥5 and a negative LR (LR–) ≤0.2. The poor performance of the TLS even in the high-prevalence populations of EDs, where LR+ lies between 1.09 and 1.99,6,7 suggests that the test in its present form lacks merit anywhere. Yet some individual TLS items may be strong indicators of SBI even though the aggregate performs poorly. Several studies in EDs have reported high performance of vital or pathognomonic signs: decreased consciousness, poor circulation, petechial rash, meningism, tachypnoea, tachycardia, temperature ≥39.5°C, and O2 saturations <94%.8,11,12 Practitioner-perceived ‘ill appearance’ or ‘instinct’ has been reported to be a good predictor in primary care, with an odds ratio (OR) of 62,13 LR+ 7.3 (5.3–10.1) and LR– 0.2 (0.1–0.8),5 as well as in paediatric emergency care, with ORs of 2.63 (0.99–7.00),11 1.62 (1.12–2.33),14 and 1.90 (1.04–3.68).8
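For readers who want to relate LRs to sensitivity and specificity, the following minimal sketch applies the standard definitions, LR+ = sensitivity / (1 – specificity) and LR– = (1 – sensitivity) / specificity, to the two TLS thresholds reported by Clark et al.2 The function names are illustrative only, and the rule-of-thumb check simply encodes the thresholds quoted above.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) from sensitivity and specificity given as fractions."""
    if not (0.0 <= sensitivity <= 1.0 and 0.0 <= specificity <= 1.0):
        raise ValueError("sensitivity and specificity must lie between 0 and 1")
    # LR+ is undefined (infinite) when specificity is perfect.
    lr_pos = sensitivity / (1.0 - specificity) if specificity < 1.0 else float("inf")
    # LR- is undefined (infinite) when specificity is zero.
    lr_neg = (1.0 - sensitivity) / specificity if specificity > 0.0 else float("inf")
    return lr_pos, lr_neg


def meets_rule_of_thumb(lr_pos, lr_neg):
    """Rule of thumb quoted in the text: LR+ >= 5 and LR- <= 0.2."""
    return lr_pos >= 5.0 and lr_neg <= 0.2


# Point estimates reported in the text for the two TLS thresholds.
any_red = likelihood_ratios(0.588, 0.685)
red_or_amber = likelihood_ratios(1.0, 0.057)

for label, (lr_pos, lr_neg) in [("any red", any_red), ("red or amber", red_or_amber)]:
    print(f"{label}: LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}, "
          f"good test: {meets_rule_of_thumb(lr_pos, lr_neg)}")
```

On these point estimates neither threshold satisfies both criteria: the red-only threshold fails on both, while the red-or-amber threshold achieves an LR– of 0 only because almost every child tests positive, leaving LR+ close to 1.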
ALTERNATIVES TO TLS
Two primary care alternatives to TLS are the Dutch College of General Practitioners guideline Children with Fever (NHG)15 and the decision tree of Van den Bruel et al.13 NHG is a narrative description of symptoms and signs to look for. Both NHG and TLS were tested against a previous prospective cohort study.16 The results for LR+ were TLS 1.87 (1.25–2.78) and NHG 1.04 (0.97–1.12), and for LR– were TLS 0.60 (0.34–1.06) and NHG 0.39 (0.25–6.22). NHG also recommends dip testing the urine in the under-2s, a bedside test that improves accuracy for urinary tract infections (UTIs).7
Van den Bruel et al13 derived a ‘decision tree’ from a prospective study of nearly 4000 children aged 0–15.9 years in primary care. The tree features questions in sequential steps: a positive answer at any step is taken to indicate a high risk of SBI. The 6-step tree had high accuracy: LR+ 8.4 (7.6–9.4) and LR– 0.04 (0.01–0.2). Such impressive results suggest it is reliable. However, a model derived in one sample (derivation group) should be tested in another sample (validation group) because observational studies may inadequately control for unknown confounders.17 Two validation studies of the decision tree, Verbakel et al16 and van Ierland et al,18 used referral as a surrogate outcome, which cannot be assumed to be an accurate estimate of the incidence of SBI. The third, Verbakel et al,5 used the more objective outcome of hospital diagnosis. The results for LR+ and LR– were: 6.13 (5.34–7.03) and 0.11 (0.04–0.33);16 1.7 (1.6–1.8) and 0.7 (0.6–0.7);18 and 7.3 (5.3–8.1) and 0.2 (0.1–0.8).5 However, the study using diagnosis as outcome5 combined two nodes to create a new node, which means it is not an exact replication of the original study.
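The ‘positive at any step’ logic of such a tree can be sketched in a few lines. This is only an illustration of the sequential structure described above: the answers are generic placeholders, and the actual questions asked at each node of the published tree are not reproduced here.

```python
from typing import Optional, Sequence, Tuple


def any_positive_rule(answers: Sequence[bool]) -> Tuple[bool, Optional[int]]:
    """Walk through the steps in order and stop at the first positive answer.

    Returns (high_risk, step), where step is the 1-based position of the
    first positive answer, or None if every answer was negative.
    """
    for step, answer in enumerate(answers, start=1):
        if answer:
            return True, step
    return False, None


# Hypothetical answers to a 6-step tree (placeholders, not real patient data).
all_negative = any_positive_rule([False] * 6)
positive_at_third = any_positive_rule([False, False, True, False, False, False])

print(all_negative)
print(positive_at_third)
```

Because a single positive answer is enough to flag a child, adding steps to a rule of this shape can only keep or raise the proportion flagged, which is relevant to the referral rates discussed later.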
DISCUSSION
This analysis updates previous reviews. Verbakel et al’s16 retrospective evaluation of TLS and NHG used the surrogate outcome, referral, on a small sample of approximately 500.19 Thompson et al’s20 review included only the original Van den Bruel study.13 The current analysis includes the larger data set of Clark et al, with the more accurate outcome of SBI, and further studies of the decision tree.
The red items of the TLS have been found to have poor sensitivity and specificity for SBI in EDs and primary care. Clark et al may have underestimated the TLS’s performance since not all features were mapped. However, the underestimation would have to be huge to raise the diagnostic accuracy to an acceptable level. Sensitivity could be improved to 100% by treating red and amber items equally, but only with undesirable consequences: an unrealistic rate of referral, 94% of children presenting with fever. The Dutch NHG did not have superior accuracy. In practice, GPs do not follow the TLS guidance even on red items. In the DUTY study, only 43 (6.2%) children with a red flag were referred for same-day assessment.1 Studies elsewhere report that practitioners refer fewer children than guidelines recommend. In a Netherlands primary out-of-hours service, only 19% of 3424 children with a positive indication were referred to the ED.18 Ambulatory care paediatricians in the US diverged from guidelines in 36% of cases.11
So primary care practitioners’ decisions to refer patients for same-day assessment depend on more than the presence of clinical features specified in guidelines. Divergence from guidelines is not unreasonable: a tool that rules out SBI when neither red nor amber features are present is of little use when it applies to only 5.7% of cases. Can more research find a solution? ‘Instinct’ appears useful but we do not know what it constitutes nor how big a part it plays in GPs’ decisions. Perhaps qualitative studies of GP instinct may provide understanding? The decision tree has been the best performing tool but further replication studies are desirable. However, it returned a high proportion of positives, 23%, in primary care in the most recent study.5 Overall only 11.8% were admitted, indicating that practitioners use judgement not present in the tool.
Future research might move from listing individual features or their combinations towards the development of a prediction rule that weights features to calculate a predictive value, as do the Wells score and QFracture. To be accurate, it must use objective rather than surrogate outcomes. To address the problem correctly, the sample should consist of children under 5. To be relevant to primary care, such research must be developed in primary care itself.
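To make the idea of a weighted prediction rule concrete, the sketch below uses a logistic-style score, one common form for such rules. It is a hedged illustration only: the feature names are borrowed loosely from items mentioned in this review, and the intercept and weights are invented for the example. They have not been estimated from any data and must not be used clinically.

```python
import math
from typing import Dict

# HYPOTHETICAL values, invented purely for illustration. A real rule would
# estimate these from a primary care cohort with objectively diagnosed SBI.
HYPOTHETICAL_INTERCEPT = -6.0
HYPOTHETICAL_WEIGHTS = {
    "tachypnoea": 1.2,
    "tachycardia": 0.8,
    "temperature_ge_39_5": 0.7,
    "clinician_concern": 2.0,
}


def predicted_risk(features: Dict[str, bool]) -> float:
    """Convert present/absent features into a probability via a logistic score."""
    score = HYPOTHETICAL_INTERCEPT
    for name, weight in HYPOTHETICAL_WEIGHTS.items():
        if features.get(name, False):
            score += weight
    return 1.0 / (1.0 + math.exp(-score))


no_features = predicted_risk({})
concern_only = predicted_risk({"clinician_concern": True})
all_features = predicted_risk({name: True for name in HYPOTHETICAL_WEIGHTS})

print(f"no features: {no_features:.3f}")
print(f"clinician concern only: {concern_only:.3f}")
print(f"all features: {all_features:.3f}")
```

Unlike a list of red or amber items, a rule of this kind lets each feature contribute in proportion to its strength and returns a graded risk rather than a yes/no flag, leaving the referral threshold to be set explicitly.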
Acknowledgments
Thanks to Dr Kathryn Hughes, Division of Population Medicine, Cardiff University, for constructive criticisms that have greatly improved the quality of this review.
Provenance
Freely submitted; externally peer reviewed.
REFERENCES
- 1. Blyth MH, Cannings-John R, Hay AD, et al. Is the NICE traffic light system fit-for-purpose for children presenting with undifferentiated acute illness in primary care. Arch Dis Child. 2022;107(5):444–449. doi: 10.1136/archdischild-2021-322768.
- 2. Clark A, Cannings-John R, Blyth M, et al. Accuracy of the NICE traffic light system in children presenting to general practice: a retrospective cohort study. Br J Gen Pract. 2022. DOI: .
- 3. National Institute for Health and Care Excellence. Fever in under 5s: assessment and initial management NG143. London: NICE; 2019. https://www.nice.org.uk/guidance/NG143 (accessed 25 Jul 2023).
- 4. Hay AD, Sterne JA, Hood K, et al. Improving the diagnosis and treatment of urinary tract infection in young children in primary care: results from the DUTY prospective diagnostic cohort study. Ann Fam Med. 2016;14(4):325–336. doi: 10.1370/afm.1954.
- 5. Verbakel JY, Lemiengre MB, De Burghgraeve T, et al. Validating a decision tree for serious infection: diagnostic accuracy in acutely ill children in ambulatory care. BMJ Open. 2015;5(8):e008657. doi: 10.1136/bmjopen-2015-008657.
- 6. Yao SHW, Ong GY, Maconochie IK, et al. Analysis of emergency department prediction tools in evaluating febrile young infants at risk for serious infections. Emerg Med J. 2019;36(12):729–735. doi: 10.1136/emermed-2018-208210.
- 7. De S, Williams GJ, Hayen A, et al. Accuracy of the ‘traffic light’ clinical decision rule for serious bacterial infections in young children with fever: a retrospective cohort study. BMJ. 2013;346:f866. doi: 10.1136/bmj.f866.
- 8. Urbane UN, Petrosina E, Zavadska D, Pavare J. Integrating clinical signs at presentation and clinician’s non-analytical reasoning in prediction models for serious bacterial infection in febrile children presenting to emergency department. Front Pediatr. 2022;10:786795. doi: 10.3389/fped.2022.786795.
- 9. Bleeker SE, Moons KG, Derksen-Lubsen G, et al. Predicting serious bacterial infection in young children with fever without apparent source. Acta Paediatr. 2001;90(11):1226–1232. doi: 10.1080/080352501317130236.
- 10. Oudega R, Hoes AW, Moons KGM. The Wells rule does not adequately rule out deep venous thrombosis in primary care patients. Ann Intern Med. 2005;143(2):100–107. doi: 10.7326/0003-4819-143-2-200507190-00008.
- 11. Pantell RH, Newman TB, Bernzweig J, et al. Management and outcomes of care of fever in early infancy. JAMA. 2004;291(10):1203–1212. doi: 10.1001/jama.291.10.1203.
- 12. Thompson M, Coad N, Harnden A, et al. How well do vital signs identify children with serious infections in paediatric emergency care. Arch Dis Child. 2009;94(11):888–893. doi: 10.1136/adc.2009.159095.
- 13. Van den Bruel A, Aertgeerts B, Bruyninckx R, et al. Signs and symptoms for diagnosis of serious infections in children: a prospective study in primary care. Br J Gen Pract. 2007;57(540):538–546.
- 14. Nijman RG, Vergouwe Y, Thompson M, et al. Clinical prediction model to aid emergency doctors managing febrile children at risk of serious bacterial infections: diagnostic study. BMJ. 2013;346:f1706. doi: 10.1136/bmj.f1706.
- 15. Eizenga WE, Opstelten W. Revision of the Dutch College of General Practitioners practice guideline ‘Children with fever’. [Article in Dutch] Ned Tijdschr Geneeskd. 2017;161:D1199.
- 16. Verbakel JY, Van den Bruel A, Thompson M, et al. How well do clinical prediction rules perform in identifying serious infections in acutely ill children across an international network of ambulatory care datasets? BMC Med. 2013;11:10. doi: 10.1186/1741-7015-11-10.
- 17. Siontis GC, Tzoulaki I, Castaldi PJ, Ioannidis JPA. External validation of new risk prediction models is infrequent and reveals worse prognostic discrimination. J Clin Epidemiol. 2015;68(1):25–34. doi: 10.1016/j.jclinepi.2014.09.007.
- 18. van Ierland Y, Elshout G, Moll HA, et al. Use of alarm features in referral of febrile children to the emergency department: an observational study. Br J Gen Pract. 2014. DOI: .
- 19. Monteny M, Berger MY, van der Wouden JC, et al. Triage of febrile children at a GP cooperative: determinants of a consultation. Br J Gen Pract. 2008. DOI: .
- 20. Thompson M, Van den Bruel A, Verbakel J, et al. Systematic review and validation of prediction rules for identifying children with serious infections in emergency departments and urgent-access primary care. Health Technol Assess. 2012;16(15):1–100. doi: 10.3310/hta16150.
