Abstract
BACKGROUND:
Viral laryngotracheobronchitis (croup) is the most common cause of acute upper airway obstruction in young children. Clinical assessment of children with croup is often performed using ‘croup scores’; however, these scores have not been validated outside of the research setting.
OBJECTIVE:
To determine the reliability of clinical observation items in croup scores in a paediatric emergency department (ED) setting.
DESIGN:
Literature review identified 12 observation items (level of consciousness or mental status, inspiratory breath sounds, air entry, stridor, cough, cyanosis or colour, anxiety or air hunger, retractions and/or flaring, respiratory rate and heart rate, oxygen saturation and respiratory distress); overlapping items were combined, yielding 10 variables. In a prospective cohort study over 13 months, patients presenting with croup were observed independently, and croup scores were assigned by the triage nurse, ED nurse and the ED physician before treatment. Agreement among observers for clinical observations was analysed using Cohen’s quadratic weighted kappa.
SETTING:
University-affiliated, paediatric hospital ED providing primary care to an urban area (population 330,000).
PATIENTS:
Children aged three months to five years presenting with viral croup (preceding history of at least one day of upper respiratory tract symptoms associated with barking cough and/or hoarseness and/or stridor).
RESULTS:
One hundred fifty-eight children meeting inclusion criteria for croup were assessed by three observers within 1 h of one another and before treatment. Interobserver agreement among the three observers using weighted kappa was greater than chance for all clinical observation items and ranged from fair to moderate (0.2 to 0.4 and 0.4 to 0.6, respectively).
CONCLUSIONS:
In the busy practice setting of a paediatric ED, substantial interobserver variability exists among health care providers in the measurement of respiratory signs associated with croup in young children. Based on the present study in a practice setting and two research studies, the most reliable of the published items included in croup scoring systems were stridor and retractions.
Keywords: Croup, Health measurement scale, Viral laryngotracheobronchitis
Scoring systems or ‘health measurement instruments’ are used in both patient care and research settings to quantify clinical information gathered during the history taking, physical examination and investigation of patients. Scoring systems, such as the Glasgow Coma Scale (1) or the Apgar score (2), have become integrated into paediatric medicine.
Viral laryngotracheobronchitis (croup) is the most common cause of acute upper airway obstruction in young children (3). In the emergency department (ED) or physician’s office, clinical assessment is directed at determining the extent to which inflammatory narrowing of the subglottic larynx interferes with pulmonary gas exchange. Croup scoring systems, which incorporate items such as ‘stridor’, ‘level of consciousness’ or ‘retractions’, have been used as evaluative outcome measures (to measure the magnitude of change over time) (4) in clinical trials of the efficacy of steroids (5) and as discriminative instruments (4) to identify children who may benefit from treatment (6). The IWK Health Centre, Halifax, Nova Scotia, has used a croup score for two decades for nursing and physician assessment of the severity of illness and response to therapy; however, practitioners have periodically questioned the score’s usefulness (7). When the authors informally surveyed other Canadian paediatric hospitals in 1995, about 50% reported using a croup scoring system. Because croup scores have not been evaluated outside of the research setting, the authors sought to determine the reliability or interobserver variability of clinical assessment items in croup scores in the practice setting of a busy paediatric ED.
PATIENTS AND METHODS
Item identification
To identify published croup measurement instruments, MEDLINE was searched from 1966 to 1995 using the Medical Subject Headings (MeSH) term ‘croup’. Twelve items (level of consciousness or mental status, inspiratory breath sounds, air entry, stridor, cough, cyanosis or colour, anxiety or air hunger, retractions and/or flaring, respiratory rate and heart rate, oxygen saturation and respiratory distress) were identified from 10 croup scoring instruments (6,8–16) (Table 1). Five other scoring instruments modified single items in a previously published score (17–19). Responses to items, with the exception of two categorical variables (13), were ordinal in nature and were scored between 0 and 3. In all cases, items were combined into scales by summing the scores on the individual items.
TABLE 1:
Variations in the composition of published croup scoring instruments
| Observation item | Number of instruments including the item (of nine) |
|---|---|
| Inspiratory breath sounds | 1 |
| Stridor | 9 |
| Cough | 4 |
| Retractions | 9 |
| Cyanosis or colour | 7 |
| Anxiety or air hunger | 2 |
| Air entry | 3 |
| Level of consciousness | 3 |
| Respiratory rate | 1 |
| Heart rate | 1 |
| Oxygen saturation | 1 |
| Respiratory distress | 1 |

| Croup scoring instrument | Total number of items | Maximum score |
|---|---|---|
| James (8) | 4 | 10 |
| Gardner et al (9) | 5 | 10 |
| Downes and Raphaely (6) | 5 | 10 |
| Taussig et al (10) | 5 | 14 |
| Lockhart and Battaglia (11) | 6 | 12 |
| Westley et al (12) | 5 | 17 |
| Leipzig et al (13) | 5 | 11 |
| Geelhoed and Macdonald (15) | 2 | 6 |
| Godden et al (16) | 5 | 17 |
The croup scoring instrument used in the present study incorporated all of the items identified in the literature review (Table 2). The item ‘anxiety or air hunger’ (9) was assumed to be incorporated in the restlessness associated with the item ‘level of consciousness’, and the item ‘inspiratory breath sounds’ (6) was assumed to be incorporated under ‘stridor’. Heart rate and respiratory rate were transformed into a numerical score by using a previously published method (13). Scores for individual items within the total score ranged from 0 to 3; the total score was out of 18.
TABLE 2:
Observation items in a croup scoring instrument used to assess children presenting with viral croup to a paediatric emergency department
| Observation items* | 0 points | 1 point | 2 points | 3 points |
|---|---|---|---|---|
| Stridor | None | With stimulation | At rest | Inspiration and expiration at rest |
| Retractions | None | Mild | Moderate | Accessory muscles |
| Air entry | None | Mild decrease | Moderate decrease | Markedly decreased |
| Colour | Normal | Normal | Normal | Cyanosis |
| Level of consciousness | Normal | Restless when disturbed | Anxious or restless (undisturbed) | Lethargic |
| Cough | None | Harsh cry | Bark | Severe paroxysms |
*Respiratory rate (breaths/min) and heart rate (beats/min) were transformed into a numerical score using a previously published method (13). Oxygen saturation was not included.
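As a minimal sketch of the summation described above, the following Python function adds the six items tabulated in Table 2, each scored 0 to 3, which gives the stated maximum of 18. The function name, the dictionary keys and the example patient are hypothetical and introduced only for illustration. Respiratory rate and heart rate are deliberately left out, because the transformation taken from reference 13 is not reproduced in this article.

```python
# Hypothetical item names matching the six rows of Table 2.
TABLE2_ITEMS = (
    "stridor",
    "retractions",
    "air_entry",
    "colour",
    "level_of_consciousness",
    "cough",
)


def total_croup_score(item_scores):
    """Sum the six Table 2 items, each an ordinal score from 0 to 3.

    item_scores: dict mapping every name in TABLE2_ITEMS to 0, 1, 2 or 3.
    Respiratory rate and heart rate are not handled here (assumption:
    their transformation, reference 13, is outside this sketch).
    """
    missing = [name for name in TABLE2_ITEMS if name not in item_scores]
    if missing:
        raise ValueError(f"missing item scores: {missing}")
    for name in TABLE2_ITEMS:
        if item_scores[name] not in (0, 1, 2, 3):
            raise ValueError(f"{name} must be scored 0, 1, 2 or 3")
    return sum(item_scores[name] for name in TABLE2_ITEMS)


# Illustrative (invented) patient: stridor at rest, mild retractions,
# mild decrease in air entry, normal colour, restless when disturbed, bark.
example = {
    "stridor": 2,
    "retractions": 1,
    "air_entry": 1,
    "colour": 0,
    "level_of_consciousness": 1,
    "cough": 2,
}
print(total_croup_score(example))
```

Rejecting missing or out-of-range items, rather than silently treating them as zero, keeps an incomplete scoring sheet from being mistaken for a mild presentation.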
Patient population
All children admitted to the ED at the IWK Health Centre from October 1, 1995 to November 30, 1996 were considered for study entry. The IWK Health Centre is a university-affiliated children’s and maternity hospital providing referral care for the Maritime provinces of Canada (population approximately two million) and primary care for the urban area of Halifax (population 330,000). The ED receives about 27,000 visits annually.
Inclusion criteria were an attending ED physician’s diagnosis of viral croup or viral laryngotracheobronchitis (preceding history of at least one day of upper respiratory tract symptoms associated with barking cough and/or hoarseness and/or stridor) in children aged three months to five years. Patients with spasmodic croup, epiglottitis, bacterial tracheitis, foreign body aspiration and allergic reactions were excluded from the study.
Measurement
Each patient was assessed independently by the triage nurse, ED nurse and ED physician using the study instrument (Table 2). Before the study commenced, the study nurse met with groups of physicians and nurses to train them in the use of the study instrument and study procedures. The study nurse visited the ED daily to encourage staff compliance with record completion and to collect data collection forms. The ED is staffed by 15 physicians, 40 nurses and 20 paediatric residents. All clinical assessments were made before treatment was started, and observations by all three observers had to be completed within 1 h to limit the effects of a child’s ‘state’ variation on respiratory signs. Scoring was performed with the child resting as quietly as possible in the sitting position. Respiratory rate and heart rate were counted over at least 30 s. Scoring sheets were placed face down in a designated box at the ward clerk’s station immediately after completion so that observers would not be influenced by each other’s observations.
This study was approved by the Research Ethics Board of the IWK Health Centre.
Statistical analysis
Observed agreement comprises agreement occurring by chance alone plus agreement beyond chance. The degree of agreement among observers for clinical observations was expressed using Cohen’s quadratic weighted kappa, which measures agreement beyond chance (20,21). With quadratic weighting, the credit given to a pair of ratings falls with the square of the distance between them on the ordinal scale, so that near misses are penalized far less than large discrepancies; the resulting values closely agree with the intraclass correlation coefficient when data are continuous in nature. Quadratic weighted kappa values tend to be higher than unweighted kappa values. The computer software STATA, version 5 (Stata Corporation, USA), was used for the analyses.
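For readers who want to see the statistic concretely, the following Python sketch computes a quadratic weighted kappa for two raters from first principles. It is an illustration under stated assumptions, not the STATA procedure used in the study: the function name and the toy ratings are hypothetical, and scores are assumed to be integers from 0 to `n_levels - 1`.

```python
from collections import Counter


def quadratic_weighted_kappa(rater_a, rater_b, n_levels):
    """Cohen's weighted kappa with quadratic disagreement weights.

    rater_a, rater_b: equal-length sequences of ordinal scores,
    each an integer in 0..n_levels-1 (n_levels must be at least 2).
    """
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    observed = Counter(zip(rater_a, rater_b))  # joint counts of (a, b) pairs
    marg_a = Counter(rater_a)                  # marginal counts, rater A
    marg_b = Counter(rater_b)                  # marginal counts, rater B

    observed_disagreement = 0.0
    expected_disagreement = 0.0
    for i in range(n_levels):
        for j in range(n_levels):
            # Disagreement weight: 0 on the diagonal, 1 at the extremes.
            w = (i - j) ** 2 / (n_levels - 1) ** 2
            observed_disagreement += w * observed[(i, j)] / n
            expected_disagreement += w * (marg_a[i] / n) * (marg_b[j] / n)

    if expected_disagreement == 0:
        raise ValueError("kappa is undefined when both raters use one level")
    return 1.0 - observed_disagreement / expected_disagreement


# Invented ratings on a 0-to-3 item: two one-step disagreements in six pairs.
nurse = [0, 1, 2, 3, 1, 2]
physician = [0, 2, 2, 3, 1, 1]
print(round(quadratic_weighted_kappa(nurse, physician, 4), 3))
```

The sketch makes the last case in the analysis visible: when both raters assign the same level to every patient, the expected disagreement is zero and kappa cannot be computed, which is the situation that arises when an abnormality is almost never scored.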
RESULTS
Of 478 children with croup presenting to the ED during the study period, croup scoring sheets for 158 patients were completed by all three ED observers within the required time period and before therapy. The mean croup scores for the excluded patients were not different from those of the 158 patients. The mean age was two years.
The mean croup score was 3.5 (±2 SD). Only one patient had an abnormal score for the item ‘colour’, and only two patients had an abnormal score for the item ‘level of consciousness’. Interobserver agreement among nurses and physicians for clinical signs was greater than chance for all observation items (Table 3). The highest agreement (0.4 to 0.5) was seen for ‘stridor’ and ‘retractions’. Agreement for colour could not be calculated because an abnormality was only identified in one patient by one observer. Early in the study, ED personnel indicated unwillingness to measure oxygen saturation for every patient because it was not believed to be an accurate estimate in young children due to movement artifact; thus, this item was not included in the analysis.
TABLE 3:
Observer agreement measured by quadratic weighted kappa scores among the triage nurse, emergency department (ED) nurse and ED physician for clinical signs associated with croup
| Observation item | Triage nurse versus ED nurse | ED nurse versus ED physician | Triage nurse versus ED physician |
|---|---|---|---|
| Stridor | 0.39 | 0.39 | 0.27 |
| Retraction | 0.29 | 0.51 | 0.30 |
| Air entry | 0.21 | 0.16 | 0.30 |
| Heart rate | 0.20 | 0.14 | 0.17 |
| Respiratory rate | 0.17 | 0.15 | 0.24 |
| Cough | 0.15 | 0.28 | 0.09 |
| Level of consciousness | 0.27 | 0.24 | 0.12 |
| Colour* | | | |
*Only one patient had an abnormal score.
DISCUSSION
In the present study of croup scoring in the paediatric ED setting, the reliability of generally accepted observation items on croup scoring measures was less than had been anticipated. Reliability, the reproducibility of a measurement item in different conditions, must be acceptable if a health index is to be useful for clinical decision-making. Observed agreement for individual items was greater than that expected by chance alone, and the strength of agreement is classified as fair to moderate (0.21 to 0.4 and 0.41 to 0.6, respectively), according to the guidelines for interpreting strength of agreement suggested by Landis and Koch (22).
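The Landis and Koch bands can be written as a small lookup, sketched below in Python. The function name is hypothetical; the cut-offs are the commonly quoted Landis and Koch categories (below 0 poor, 0 to 0.20 slight, 0.21 to 0.40 fair, 0.41 to 0.60 moderate, 0.61 to 0.80 substantial, 0.81 to 1.00 almost perfect), and the sample values are taken from Table 3.

```python
def landis_koch_label(kappa):
    """Return the Landis and Koch strength-of-agreement label for kappa."""
    if kappa > 1:
        raise ValueError("kappa cannot exceed 1")
    if kappa < 0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


# Selected quadratic weighted kappa values from Table 3.
table3_examples = {
    "Retraction, ED nurse versus ED physician": 0.51,
    "Stridor, triage nurse versus ED nurse": 0.39,
    "Cough, triage nurse versus ED physician": 0.09,
}
for comparison, value in table3_examples.items():
    print(f"{comparison}: {value:.2f} ({landis_koch_label(value)})")
```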
A wide spectrum of reliability has been reported for objective signs of respiratory tract illness. Wang et al (23,24), in two studies of children with lower respiratory tract infection, found agreement for retractions less than 50% beyond chance, 0.31 for wheezing and 0.25 to 0.97 for respiratory rate. Among physician observers of infants with lower respiratory tract infection, agreement ranged from 0.2 to 0.66 for auscultatory findings, respiratory effort and colour (25). In studies of faculty and trainee clinicians assessing over 30 observation items in adults with respiratory disease, only tachypnea and percussion note had agreement more than 50% beyond chance (26–29). In the context of randomized, controlled clinical trials of steroid use for croup with fewer numbers of trained observers, much better agreement for at least two scoring items has been reported. Klassen and Rowe (30) reported inter-rater reliability of 0.47 for air entry, 0.93 for stridor and 0.87 for retractions (weighted kappa) among two physicians and three research assistant observers on a sample of 19 patients. Geelhoed and Macdonald (15) reported agreement of 0.85 for stridor and retractions among triage nurses on a sample of 17 patients. Our study of 158 patients is, thus, much larger than any previously published analysis of croup scoring methods and, to our knowledge, the first study of croup scoring in the practice setting.
Assessment by limited numbers of observers with similar skills most likely accounts for superior agreement among observers in the research context (15,30) compared with our study, which was performed in a busy ED with many different observers. Even though we required that all three clinical observations be completed within 1 h, variation in inter-rater assessments may also be due to variability in the clinical status of patients over short periods of time or to the effects of state variation (eg, wakefulness) on respiratory signs.
Although there is evidence from research settings that croup scoring by limited observers is a useful evaluative measure with good reliability for two of the 12 items reported in the literature, we have found that croup scoring is not highly reliable in a typical ED where multiple clinicians provide patient care. Of the various items that have been proposed to be included in a composite croup score, the most reliable, based on our study in a practice setting and two research studies (15,30), are stridor and retractions. Educational sessions for physical therapists have been shown to improve inter-rater reliability of lung auscultation (31). If croup scoring is to be used to guide medical decisions, especially by observers with varying experience, training must be routinely implemented to improve inter-rater reliability.
Acknowledgments
The authors thank the nurses and physicians at the IWK Health Centre Emergency Department and Mrs Heather Samson for their collaboration in this study.
Footnotes
This paper was prepared from presentations made at the Society for Pediatric Research, Washington, DC, May 2 to 6, 1997 and at the 36th Annual Meeting of the Infectious Disease Society of America, Denver, Colorado, November 12 to 15, 1998. The research was supported in part by a grant from Research Services, IWK Health Centre, Halifax, Nova Scotia.
REFERENCES
- 1. Teasdale G, Jennett B. Assessment of coma and impaired consciousness. A practical scale. Lancet. 1974;ii:81–4. doi: 10.1016/s0140-6736(74)91639-0.
- 2. Apgar V. The newborn (Apgar) scoring system. Reflections and advice. Pediatr Clin North Am. 1966;13:645–50. doi: 10.1016/s0031-3955(16)31874-0.
- 3. Orenstein D. Acute inflammatory upper airway obstruction. In: Behrman RE, Kliegman R, Arvin AM, editors. Nelson Textbook of Pediatrics. Philadelphia: WB Saunders and Co; 1996.
- 4. Kirshner B, Guyatt G. A methodological framework for assessing health indices. J Chronic Dis. 1985;38:27–36. doi: 10.1016/0021-9681(85)90005-0.
- 5. Ausejo M, Saenz A, Pham B, et al. The effectiveness of glucocorticoids in treating croup: Meta-analysis. BMJ. 1999;319:595–600. doi: 10.1136/bmj.319.7210.595.
- 6. Downes J, Raphaely R. Pediatric intensive care. Anesthesia. 1975;43:238–43. doi: 10.1097/00000542-197508000-00009.
- 7. MacFarlane PI, Suri S. Steroid in the management of croup. Croup scores are rarely used in practice. BMJ. 1996;312:510. doi: 10.1136/bmj.312.7029.510b.
- 8. James J. Dexamethasone in croup. Am J Dis Child. 1969;117:511–6. doi: 10.1001/archpedi.1969.02100030513003.
- 9. Gardner HG, Powell KR, Roden VJ, Cherry JD. The evaluation of racemic epinephrine in the treatment of infectious croup. Pediatrics. 1973;52:52–5.
- 10. Taussig LM, Castro O, Beaudry PH, Fox WW, Bureau M. Treatment of laryngotracheobronchitis (croup). Use of intermittent positive-pressure breathing and racemic epinephrine. Am J Dis Child. 1975;129:790–3. doi: 10.1001/archpedi.1975.02120440016004.
- 11. Lockhart CH, Battaglia JD. Croup (laryngotracheal bronchitis) and epiglottitis. Pediatr Ann. 1977;6:262–9.
- 12. Westley CR, Cotton EK, Brooks JG. Nebulized racemic epinephrine by IPPB for the treatment of croup: A double-blind study. Am J Dis Child. 1978;132:484–7. doi: 10.1001/archpedi.1978.02120300044008.
- 13. Leipzig B, Oski F, Cummings C, Stockman J, Swender P. A prospective randomized study to determine the efficacy of steroids in treatment of croup. J Pediatr. 1979;94:194–6. doi: 10.1016/s0022-3476(79)80821-5.
- 14. Kuusela AL, Vesikari T. A randomized double-blind, placebo-controlled trial of dexamethasone and racemic epinephrine in the treatment of croup. Acta Paediatr Scand. 1988;77:99–104. doi: 10.1111/j.1651-2227.1988.tb10606.x.
- 15. Geelhoed GC, Macdonald WB. Oral and inhaled steroids in croup: A randomized, placebo-controlled trial. Pediatr Pulmonol. 1995;20:355–61. doi: 10.1002/ppul.1950200604.
- 16. Godden CW, Campbell MJ, Hussey M, Cogswell JJ. Double blind placebo controlled trial of nebulised budesonide for croup. Arch Dis Child. 1997;76:155–8. doi: 10.1136/adc.76.2.155.
- 17. Super D, Cartelli N, Brooks L, Lembo R, Kumar M. A prospective randomized double-blind study to evaluate the effect of dexamethasone in acute laryngotracheitis. J Pediatr. 1989;115:323–9. doi: 10.1016/s0022-3476(89)80095-2.
- 18. Husby S, Agertoft L, Mortenson S, Pederson S. Treatment of croup with nebulized steroid (budesonide): A double blind, placebo controlled study. Arch Dis Child. 1993;68:352–5. doi: 10.1136/adc.68.3.352.
- 19. Davis HW, Gartner JC, Galvis AG, Michaels RH, Mestad PH. Acute upper airway obstruction: Croup and epiglottitis. Pediatr Clin North Am. 1981;28:859–80. doi: 10.1016/s0031-3955(16)34073-1.
- 20. Cohen J. A coefficient of agreement for nominal scales. Educ Psychol Meas. 1960;20:37–46.
- 21. Cohen J. Weighted kappa: Nominal scale agreement with provision for scaled disagreement or partial credit. Psychol Bull. 1968;70:213–20. doi: 10.1037/h0026256.
- 22. Landis J, Koch G. The measurement of observer agreement for categorical data. Biometrics. 1977;33:159–74.
- 23. Wang EE, Milner RA, Navas L, Maj H. Observer agreement for respiratory signs and oximetry in infants hospitalized with lower respiratory infections. Am Rev Respir Dis. 1992;145:106–9. doi: 10.1164/ajrccm/145.1.106.
- 24. Wang EE, Law BJ, Stephens D, et al. Study of interobserver variability in clinical assessment of RSV lower respiratory tract illness: A Pediatric Investigators Collaborative Network on Infections in Canada (PICNIC) study. Pediatr Pulmonol. 1996;22:23–7. doi: 10.1002/(SICI)1099-0496(199607)22:1<23::AID-PPUL4>3.0.CO;2-L.
- 25. Margolis PA, Ferkol TW, Marsocci S, et al. Accuracy of the clinical examination in detecting hypoxemia in infants with respiratory illness. J Pediatr. 1994;124:552–60. doi: 10.1016/s0022-3476(05)83133-6.
- 26. Godfrey S, Edwards H, Campbell E, Armitage P, Oppenheimer E. Repeatability of physical signs in airways obstruction. Thorax. 1969;24:4–9. doi: 10.1136/thx.24.1.4.
- 27. Fletcher C. The clinical diagnosis of pulmonary emphysema – an experimental study. Proc R Soc Med. 1952;45:577–84.
- 28. Smyllie H, Blendis L, Armitage P. Observer agreement in physical signs of the respiratory system. Lancet. 1965;ii:412–3. doi: 10.1016/s0140-6736(65)90759-2.
- 29. Spiteri MA, Cook DG, Clarke SW. Reliability of eliciting physical signs in examination of the chest. Lancet. 1988;i:873–5. doi: 10.1016/s0140-6736(88)91613-3.
- 30. Klassen T, Rowe P. The croup score as an evaluative instrument in clinical trials. Arch Pediatr Adolesc Med. 1995;149:60. (Abst)
- 31. Brooks D, Thomas J. Interrater reliability of auscultation of breath sounds among physical therapists. Phys Ther. 1995;75:1082–8. doi: 10.1093/ptj/75.12.1082.
