Abstract
Objective
To evaluate the agreement between temperature measured at the axilla and rectum in children and young people.
Design
A systematic review of studies comparing temperature measured at the axilla (test site) with temperature measured at the rectum (reference site) using the same type of measuring device at both sites in each patient. Devices were mercury or electronic thermometers or indwelling thermocouple probes.
Studies reviewed
40 studies including 5528 children and young people from birth to 18 years.
Data extraction
Difference in temperature readings at the axilla and rectum.
Results
20 studies (n=3201 (58%) participants) had sufficient data to be included in a meta-analysis. There was significant residual heterogeneity in both mean differences and sample standard deviations within the groups using different devices and within age groups. The pooled (random effects) mean temperature difference (rectal minus axillary temperature) for mercury thermometers was 0.25°C (95% limits of agreement −0.15°C to 0.65°C) and for electronic thermometers was 0.85°C (−0.19°C to 1.90°C). The pooled (random effects) mean temperature difference (rectal minus axillary temperature) for neonates was 0.17°C (−0.15°C to 0.50°C) and for older children and young people was 0.92°C (−0.15°C to 1.98°C).
Conclusions
The difference between temperature readings at the axilla and rectum using either mercury or electronic thermometers showed wide variation across studies. This has implications for clinical situations where temperature needs to be measured with precision.
Introduction
The presence of fever in children and young people affects the decisions of parents and clinicians. Parents may take vigorous steps to lower their child's temperature and will commonly seek medical advice,1 and clinicians may carry out investigations and interventions, including antipyretics, physical cooling measures, antibiotics, and admission to hospital.2 Measuring temperature in children can be difficult, especially when they are uncooperative or restless. Measurement of rectal temperature is frequently preferred over other ways of taking temperature but may not be acceptable to children and parents.2 The axilla is a safe and accessible site but concerns have been raised about its accuracy.3,4 We therefore systematically reviewed the agreement between temperature measured at the axilla and temperature measured at the rectum.
Methods
Search strategy
Studies were identified by a single reviewer (JVC) through electronic searches (see website) of Medline 1966 to October 1999, CINAHL 1982 to August 1999, the British Nursing Index June 1999, the Cochrane Library (issue 3, 1999), and the journals database of the Royal College of Nursing 1985-99. The National Research Register (issue 2, 1999) was searched for any unpublished studies, and conference abstracts were accessed through the BIDS index to Scientific and Technological Proceedings (1982-99). Authors of studies and suppliers of clinical thermometers were asked to provide details of other studies.
Inclusion criteria
Two reviewers (JVC and Catherine Lees) independently judged the studies for eligibility according to predetermined criteria. We included: method comparison studies where temperature measured at the axilla (test site) was compared with temperature measured at the rectum (reference site) in the same individual; studies of children and adolescents from birth to 18 years; and studies using mercury or electronic thermometers or thermocouple probes.
We excluded children with hypothermia (rectal temperature less than 35.0°C), preterm infants (less than 37 weeks' gestational age), studies using different types of devices at the two sites, and studies where the rectal mercury thermometer was read before three minutes had elapsed (some authors were contacted to clarify placement times).5,6
Data extraction and quality assessment
Two reviewers (JVC and Catherine Lees) independently assessed studies for methodological quality. As there is no validated scoring system for assessing the methodological quality of method comparison studies, we modified a previously published checklist that had been developed for evaluating studies of diagnostic tests (see box).7 There was initial disagreement on occasions. This was resolved by discussion. Two reviewers (JVC and GAL) independently extracted data. When the outcome data were not provided, we asked the authors for the mean difference and standard deviation of the difference between the temperature measured at the axilla and rectum or, where this could not be provided, for the anonymised raw data. Where outcome data were missing, but the mean and standard deviation of the measurements were reported for the two sites separately with a correlation coefficient, we calculated the mean and standard deviation of the differences from these data. Correlation coefficients were not reported in several studies so we estimated these from similar studies.
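The calculation of the mean and standard deviation of the differences from the separately reported site statistics can be sketched as follows. This is a minimal illustration that assumes the standard identity for the variance of a difference between two correlated measurements; the function name and the input values are hypothetical and are not taken from any included study.

```python
import math


def difference_stats(mean_rectal, sd_rectal, mean_axilla, sd_axilla, r):
    """Return (mean, SD) of paired differences, rectal minus axillary.

    Uses Var(X - Y) = Var(X) + Var(Y) - 2 * r * SD(X) * SD(Y),
    where r is the correlation between the two sites.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("correlation coefficient must lie between -1 and 1")
    mean_diff = mean_rectal - mean_axilla
    var_diff = sd_rectal ** 2 + sd_axilla ** 2 - 2 * r * sd_rectal * sd_axilla
    # Guard against tiny negative values caused by floating point rounding
    return mean_diff, math.sqrt(max(var_diff, 0.0))


# Hypothetical summary statistics (degrees Celsius), for illustration only
mean_diff, sd_diff = difference_stats(37.5, 0.5, 37.0, 0.4, r=0.6)
print(f"mean difference {mean_diff:.2f}, SD of differences {sd_diff:.2f}")
```

The identity shows why the correlation coefficient matters: the higher the correlation between sites, the smaller the standard deviation of the differences, so a coefficient borrowed from a similar study directly affects the width of the resulting limits of agreement.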
Criteria and rationale for assessing methodological quality of method comparison studies7*
Were thermometers calibrated?†
Off the shelf thermometers have been shown to be inaccurate by at least 0.1°C4,6
Was the placement time of the thermometer given?†
Mercury thermometers read before stabilisation underestimate body temperature
Were all tests carried out concurrently or immediately sequentially?†
Where there is a delay between the two readings, any difference in the results could potentially be attributed to a change in actual body temperature
Were the test and reference standard measured independently (blind) of each other?
Was the second reading taken before any interventions were given?
Avoids treatment paradox
Were both tests carried out in all children regardless of the first reading?
Avoids verification bias
*Criteria were graded as yes, no, or not stated.
†Additional criteria specific to temperature measurement.
Data analysis
We calculated the upper and lower 95% limits of agreement for each study.8 Where the standard deviation of the differences was estimated with a correlation coefficient from a similar study, we performed a sensitivity analysis including and excluding these studies. In a meta-analysis of randomised controlled trials, a pooled estimate of the relative treatment effect is of interest. For method comparison studies, systematic error (bias) and random error (limits of agreement) are of interest. To obtain a pooled estimate of bias, we used the usual inverse variance weighted approach to combine individual study estimates of the mean difference. To obtain pooled estimates of the limits of agreement, we first obtained a pooled estimate of the standard deviation of individual differences and then combined this with the pooled estimate of the mean difference. We hypothesised a priori that type of thermometer, duration of placement time at the axilla for mercury thermometers, and age may be sources of heterogeneity, and we performed subgroup analyses based on these characteristics. Homogeneity of mean differences and of standard deviations of differences across studies was evaluated with the standard large sample test.9 In the presence of significant residual heterogeneity, we calculated pooled estimates of the mean difference and the standard deviation of the individual differences using a random effects approach.9 From the combination of these estimates it was possible to calculate pooled estimates of the limits of agreement using a random effects approach. The techniques are described elsewhere (P R Williamson, personal communication).
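The pooling of mean differences can be sketched as below. This is an illustration under stated assumptions, not the review's own procedure: it applies inverse variance weighting (the standard weighting for continuous outcomes such as mean differences), the large sample homogeneity statistic Q, and the DerSimonian and Laird random effects estimator from reference 9. The pooling of standard deviations is described only by personal communication and is therefore not attempted. The function name and all input values are hypothetical.

```python
def pool_mean_differences(means, sds, ns):
    """Pool study mean differences with fixed and random effects weights.

    means: mean difference per study (rectal minus axillary)
    sds:   SD of the individual differences per study
    ns:    number of participants per study
    """
    # Variance of each study's mean difference (squared standard error)
    variances = [sd ** 2 / n for sd, n in zip(sds, ns)]
    weights = [1.0 / v for v in variances]

    # Inverse variance weighted (fixed effect) pooled mean difference
    fixed = sum(w * m for w, m in zip(weights, means)) / sum(weights)

    # Homogeneity statistic Q, referred to chi-squared on k - 1 df
    q = sum(w * (m - fixed) ** 2 for w, m in zip(weights, means))
    df = len(means) - 1

    # DerSimonian and Laird moment estimate of between study variance
    c = sum(weights) - sum(w ** 2 for w in weights) / sum(weights)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Random effects weights add tau2 to each within study variance
    re_weights = [1.0 / (v + tau2) for v in variances]
    random = sum(w * m for w, m in zip(re_weights, means)) / sum(re_weights)

    return {"fixed": fixed, "q": q, "df": df, "tau2": tau2, "random": random}


# Hypothetical studies, for illustration only
result = pool_mean_differences(
    means=[0.2, 0.5, 0.9], sds=[0.3, 0.4, 0.5], ns=[100, 50, 80]
)
print(result)
```

When Q is large relative to its degrees of freedom, the between study variance is positive and the random effects weights become more nearly equal, so small studies influence the pooled estimate more than they would under fixed effect weighting.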
Results
Description of studies and methodological quality
Overall, 37 papers (34 in English) containing 40 method comparison studies including 5528 children and young people were suitable for inclusion. Disagreement about study inclusion on six occasions was resolved through discussion. Three studies were reported in two publications.10–15 Three publications were each considered to contain two studies because either two different target populations were included and the results for each reported separately16,17 or two different measuring devices were studied in the same children.18 The table gives a description of the studies and dimensions of methodological quality. Disagreement between reviewers on the details of seven studies was resolved by discussion.
Outcome data were available from the article or author or were calculated for 16 studies (2870 (52%) participants). We estimated the standard deviation of the differences in temperature measurements for four studies (331 (6%)) (table). The analysis and conclusions with and without the data from these studies were similar and are included in the results.
Mean axillary temperature was always lower than mean rectal temperature. Significant heterogeneity was found between mean differences within device groups (mercury thermometer: χ²=1305, df=9, P<0.0001; electronic thermometer: χ²=959, df=9, P<0.0001). Significant heterogeneity was found between standard deviations within device groups (mercury: χ²=943, df=9, P<0.0001; electronic: χ²=519, df=9, P<0.0001). The pooled (random effects) mean temperature difference (rectal minus axillary temperature) for mercury thermometers was 0.25°C (95% limits of agreement −0.15°C to 0.65°C) and for electronic thermometers was 0.85°C (−0.19°C to 1.90°C) (fig 1). Studies with mercury thermometers were ordered according to placement time at the axilla (longest to shortest time), and there was a tendency towards improved accuracy as placement time increased.
We grouped neonates separately from other children (fig 2). Significant heterogeneity was found between mean differences within the groups (neonates: χ²=269, df=9, P<0.0001; older children and young people: χ²=548, df=9, P<0.0001). Significant heterogeneity was found between standard deviations within age groups (neonates: χ²=111, df=9, P<0.0001; older children and young people: χ²=169, df=9, P<0.0001). The pooled (random effects) mean temperature difference (rectal minus axillary temperature) for neonates was 0.17°C (95% limits of agreement −0.15°C to 0.50°C) and for older children and young people was 0.92°C (−0.15°C to 1.98°C).
Of the 20 eligible studies with insufficient data (see table A on website), nine studied neonates (mercury thermometer (four studies), electronic thermometer (four), indwelling thermocouple probe (one)), and 11 studied older children and young people (mercury thermometer (three), electronic thermometer (five), and indwelling thermocouple probes (three)).
Discussion
We found large mean differences and wide limits of agreement between temperatures measured at the axilla and those measured at the rectum. Determining febrile status is an important part of the assessment of children and young people who are unwell. Accurate measurement of temperature is required in certain clinical situations or patient groups. In neutropenic patients the decision to commence antibiotics may be made on the basis of an accurate measurement of temperature.19 In neonates accurate measurement of temperature is important for ensuring a thermoneutral state.20 It is believed that rectal temperature can be estimated by adding 1°C to the temperature measured at the axilla. The wide range in the mean differences we have detected suggests that this is not the case.
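The following arithmetic sketch contrasts the rule of adding 1°C with the pooled results by device reported above. It is illustrative only: the pooled limits summarise heterogeneous studies, so applying them to a single reading is an assumption made here for the sake of the example, and the function names are hypothetical.

```python
# Pooled values from the Results, in degrees Celsius, expressed as
# rectal minus axillary: (mean difference, lower limit, upper limit)
POOLED = {
    "mercury": (0.25, -0.15, 0.65),
    "electronic": (0.85, -0.19, 1.90),
}


def implied_rectal_range(axillary_temp, device):
    """Return (low, centre, high) rectal temperature implied by the limits."""
    mean_diff, lower, upper = POOLED[device]
    return (
        axillary_temp + lower,
        axillary_temp + mean_diff,
        axillary_temp + upper,
    )


def plus_one_rule(axillary_temp):
    """Rectal estimate from the conventional rule of adding 1 degree."""
    return axillary_temp + 1.0


axillary = 37.0  # hypothetical reading
for device in POOLED:
    low, centre, high = implied_rectal_range(axillary, device)
    print(
        f"{device}: {low:.2f} to {high:.2f} (centre {centre:.2f}); "
        f"adding 1 gives {plus_one_rule(axillary):.2f}"
    )
```

For mercury thermometers the fixed adjustment falls outside the implied range altogether, and for electronic thermometers the range spans about two degrees, which is why a single correction factor cannot stand in for a rectal measurement.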
In general, limits of agreement were narrower when mercury thermometers were used, placement time of mercury thermometers was longer, and measurements were made in neonates. Further investigation by age was not possible because many studies reported only the age range. Electronic thermometers were used in only two studies of neonates. One showed narrow limits of agreement.21 The other, with wide limits of agreement, was the only study published before the 1980s, and a different device, the telethermometer, was used.18 Electronic thermometers were used in eight of the 10 studies of older children and young people. This may have confounded the comparison of mercury with electronic thermometers. In neonates, although agreement is better with longer placement times, this may be difficult to achieve. Young children may be less compliant when placement time is prolonged, which may affect accuracy.
Review methodology
Although we used a sensitive search strategy to identify studies, we may not have identified relevant unpublished evidence. We cannot comment on the impact this may have had on our results because of lack of empirical evidence on publication bias for method comparison studies (P R Williamson, personal communication).
The design of most studies was limited to one measurement per site per participant. Lack of agreement may be caused by poor repeatability at either site. We were not able to look at within site variability to see how much it differed from between site variability as data on results for repeated measurements were not reported and no individual patient data were available. Six of the 20 studies gave the number of febrile children by their own definition (table), but no studies presented data separately to enable analysis of febrile children only. We did not find any evidence that systematic and random error varied by level of temperature.
Methods used in primary studies
Our results may have been influenced by methodological shortfalls in the primary studies. Verification bias was difficult to assess as selection of participants was not always clearly described. All studies seemed to take either convenience or random samples of children from a variety of settings. Seven studies gave specific exclusion criteria, based on clinical conditions. The rest gave no exclusion criteria. We defined verification bias to be the selecting out of participants on the basis of a temperature measurement. This was not evident in any study. There was no evidence of any effect of the quality criteria (see box) when results were subgrouped and factors examined univariately, but the number of studies in each subgroup was small.
Independent measurement of the reference standard and test was not attempted in any study.22 Blinding is likely to be an important methodological issue, especially when placement time is determined by the operator. This may occur when mercury thermometers are used or when electronic thermometers are used in monitor mode rather than predictive mode. In some sequential studies and in those where concurrent measurements were carried out, a different device (of the same type) was used at each site. Calibration is therefore important, even when new thermometers are used.23 Ten studies did not provide details of thermometer calibration before data collection.
When a thermometer is read before stabilisation, temperature is underestimated,24 which may be another problem where placement time is at the discretion of the operator. Six out of 10 studies with mercury thermometers gave details about stabilisation. Mode or placement time was reported in two out of 10 studies with electronic thermometers. In a further two studies the thermometer was read when it beeped, and it is likely that predictive mode was used. Seven studies did not report the depth of placement of the rectal thermometer. In sequential studies the time lapse between the two readings was not always reported. The longer the delay between readings, the more likely there is a change in body temperature, which will affect the second reading.
We recommend that in future studies temperatures should be measured independently at each site in a consecutive series of eligible individuals. All thermometers should be calibrated. Details should be provided about placement time and depth (if appropriate), steps should be taken to ensure stabilisation, and the mode used in electronic thermometers should be stated. Temperature readings should be carried out concurrently or immediately sequentially and the time between measurements clearly documented. The minimum analysis that should be carried out is the Bland and Altman method8 giving plots and 95% limits of agreement. Studies involving replicated or repeated measurements should take this into account in the analysis.
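The minimum analysis recommended above can be sketched for a single study with one pair of readings per child. This follows the Bland and Altman approach (reference 8) of taking the mean difference plus or minus 1.96 standard deviations of the differences; the function name and the readings are hypothetical, and plotting and the handling of repeated measurements are left out.

```python
import statistics


def bland_altman(rectal, axillary):
    """Return (bias, lower limit, upper limit) for paired readings.

    Differences are rectal minus axillary, one pair per participant.
    """
    if len(rectal) != len(axillary):
        raise ValueError("each rectal reading needs a matching axillary reading")
    if len(rectal) < 2:
        raise ValueError("at least two pairs are needed to estimate the SD")
    diffs = [r - a for r, a in zip(rectal, axillary)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample SD of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical paired readings in degrees Celsius
rectal = [37.2, 38.1, 36.9, 39.0, 37.5]
axillary = [36.8, 37.4, 36.7, 38.1, 37.0]
bias, lower, upper = bland_altman(rectal, axillary)
print(f"bias {bias:.2f}, 95% limits of agreement {lower:.2f} to {upper:.2f}")
```

A Bland and Altman plot would place each difference against the mean of the pair, with horizontal lines at the bias and at the two limits.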
Conclusions
We have shown that in children and young people the agreement between temperature measured at the axilla and temperature measured at the rectum is relatively low. This may prevent low grade fever from being detected and has important implications when body temperature needs to be measured with precision. Further research is needed to establish whether sufficient accuracy can be achieved by measuring temperature at the axilla in neonates. We identified several methodological weaknesses in the included studies, which may have affected the results.
What is already known on this topic
Numerous studies of methods for measuring temperature in children and young people have been carried out
Although the methods and results of the studies vary, there are concerns about the agreement between temperature measured at the axilla and temperature measured at the rectum
What this study adds
In children and young people temperature measured at the axilla does not agree sufficiently with temperature measured at the rectum to be relied on in clinical situations where accurate measurement is important
Variability in results was related to the age of the child and duration of placement time of the measuring device
Research is needed to identify whether sufficient accuracy can be achieved for measurement of temperature at the axilla in neonates
Future studies of temperature measurement in children should be more methodologically rigorous
Supplementary Material
Table.
Authors | No of patients | Age range (mean) | Population | Calibration | Rectal device, placement time, and depth | Axilla device (placement time) | Readings taken | Intervention between readings |
---|---|---|---|---|---|---|---|---|
Mercury versus mercury thermometer | ||||||||
Akinbami and Sowunmi 1991w1 | 104 | 0-48 hours | Neonates in nursery | No | Mercury read at stabilisation (>7 minutes), 2-3 cm | Mercury read at stabilisation (>7 minutes) | Concurrently | No |
Bliss-Holtz 1989w2 | 120 | 12-48 hours | Infants on radiant warmers | Yes | Mercury read at stabilisation (3-5 minutes), 2.5 cm | Mercury read at stabilisation (1-7 minutes) | Sequentially | No |
Eoff et al 1974w3 | 30 | 1-9 days (3.5 days) | Neonates in nursery | Not stated | Mercury read at 5 minutes, 1.5 cm | Mercury read at 5 minutes | Sequentially | No |
Eoff and Joyce 1981w4 | 50 | 1-6 years | Children in hospital | Not stated | Mercury read at 3 minutes, depth not stated | Mercury read at 5 minutes | Sequentially | No |
Haddock et al 1986w5 | 31 | 24-72 hours | Newborn infants | No | Mercury read at stabilisation (1-6 minutes), 2 cm | Mercury read at stabilisation (3-12 minutes) | Sequentially | No |
Khan et al 1990w6 | 30 | 0-28 days (59 hours) | Neonates in nursery | No | Mercury read at stabilisation (1-5 minutes), 2 cm | Mercury read at stabilisation (1-5 minutes) | Concurrently | No |
Kunnel et al 1988w7* | 99 | 1-4 days | Neonates in nursery | Yes | Mercury read at optimal temperature over 15 minutes, 2 cm | Mercury read at optimal temperature over 15 minutes | Concurrently | No |
Mayfield et al 1984w8* | 99 | 1-10 days (4 days) | Newborn infants in nursery | Yes | Mercury read at stabilisation (1-10 minutes), 2 cm | Mercury read at stabilisation (2-10 minutes) | Concurrently | No |
Morley et al 1992w9 | 937 | 0-6 months | Babies at home and in hospital (11% febrile) | Not stated | Mercury read at ⩾1 minute or at stabilisation, 3 cm | Mercury read at ⩾3 minutes | Not stated | Not stated |
Schiffman 1982w10 | 46 | 1 day (3 hours and 43 minutes) | Neonates in nursery | Yes | Mercury (10 minutes), depth not stated | Mercury read at 10 minutes | Sequentially | No |
Electronic versus electronic thermometer | ||||||||
Barrus 1983w11 | 50 | 2-6 years | Children in hospital paediatric unit | Yes | Electronic, mode and depth not stated | Electronic, mode not stated | Sequentially | No |
Cusson et al 1997w12* | 63 | >1 hour | Newborn infants in nursery (22% in incubators, 32% on radiant warmers) | Yes | Electronic, predictive mode, 2.5 cm | Electronic, predictive mode | Sequentially | No |
Eoff et al 1974w3 | 30 | 1-9 days (3.5 days) | Neonates in nursery | Not stated | Electronic telethermometer, depth not stated (5 minutes) | Electronic telethermometer, read at 5 minutes | Sequentially | No |
Jones et al 1993w13 | 573 (sick) and 203 (healthy) | <5 years in both groups | Sick children in outpatient clinic (31% febrile) and healthy children at home | Not stated in either study | In both groups: electronic, mode not stated, 2.3 cm | In both groups: electronic, mode not stated | Concurrently in both groups | No in both groups |
Martyn et al 1988w14* | 70 | 1-5 years (33.2 months) | Well children in clinic (31% febrile) | Yes | Electronic, mode and depth not stated | Electronic, mode not stated | Sequentially | No |
Muma et al 1991w15 | 224 | <3 years (12.4 months) | Infants and children in casualty department (39% febrile) | Yes | Electronic, mode and depth not stated | Electronic, mode not stated | Sequentially | Not stated |
Ogren 1990w16 | 61 | 0-14 years, most <3 years | Children in casualty department (61% febrile) | No | Electronic read at beep, mode and depth not stated | Electronic read at beep, mode not stated | Not stated | Not stated |
Shann and Mackenzie 1996w17 | 100 | 0-14 years | Children in hospital | Yes | Electronic read at one minute, mode not stated, 2, 3, or 4 cm (according to age) | Electronic read at one minute, mode not stated | Sequentially | No |
Weisse et al 1991w18 | 311 | 0-48 months | Children in inpatient and outpatient settings (21% febrile) | Yes | Electronic read at beep, mode not stated, 2-3 cm | Electronic read at beep, mode not stated | Sequentially | Not stated |
*Studies in which the standard deviation of differences in temperature was estimated.
Acknowledgments
We thank the authors who provided us with data from their studies and the reviewers for their helpful comments.
Footnotes
Funding: JVC is supported by a grant from the Royal Liverpool Children's NHS Trust Endowment Funds.
Conflict of interest: None declared.
Search terms, references, and eligible studies with missing or inappropriate data appear on the BMJ's website.
References
- 1. Schmitt BD. Fever phobia: misconceptions of parents about fever. Am J Dis Child. 1980;134:176–181.
- 2. Thomas V, Andrea J, Gerhart A, Gocka I. National survey of pediatric fever management practices among emergency department nurses. J Emerg Nurs. 1994;20:505–510.
- 3. Keeley D. Taking infants' temperatures. BMJ. 1992;304:931–932. doi: 10.1136/bmj.304.6832.931.
- 4. Erickson RS, Woo TM. Accuracy of infrared ear thermometry and traditional temperature methods in young children. Heart Lung. 1994;23:181–195.
- 5. Haddock B, Vincent P, Merrow D. Axillary and rectal temperatures of full-term neonates: are they different? Neonatal Network. 1986;5:36–40.
- 6. Mayfield SR, Bhatia J, Nakamura KT, Rios GR, Bell EF. Temperature measurement in term and preterm neonates. J Pediatr. 1984;104:271–275. doi: 10.1016/s0022-3476(84)81011-2.
- 7. Cochrane Methods Working Group on systematic review of screening and diagnostic tests: recommended methods. Checklist for studies of diagnostic accuracy. Cochrane Library, issue 3. Oxford: Update Software, 1996.
- 8. Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet. 1986;i:307–310.
- 9. DerSimonian R, Laird N. Meta-analysis in clinical trials. Controlled Clin Trials. 1986;7:177–188. doi: 10.1016/0197-2456(86)90046-2.
- 10. Muma BK, Treloar DJ, Wurmlinger K, Peterson E, Vitae A. Comparison of rectal, axillary, and tympanic membrane temperatures in infants and young children. Ann Emerg Med. 1991;20:41–44. doi: 10.1016/s0196-0644(05)81116-3.
- 11. Treloar D, Muma B. Comparison of axillary, tympanic membrane and rectal temperatures in young children. Ann Emerg Med. 1988;17:435. doi: 10.1016/s0196-0644(05)81116-3.
- 12. Bliss Holtz J. Comparison of rectal, axillary, and inguinal temperatures in full-term newborn infants. Nurs Res. 1989;38:85–87.
- 13. Bliss Holtz J. Determining cold-stress in full-term newborns through temperature site comparisons. Sch Inq Nurs Pract. 1991;5:113–123.
- 14. Morley CJ, Hewson PH, Thornton AJ, Cole TJ. Axillary and rectal temperature measurements in infants. Arch Dis Child. 1992;67:122–125. doi: 10.1136/adc.67.1.122.
- 15. Morley CJ. Measuring infants' temperatures. Midwives Chron Nurs Notes. 1992;105:26–29.
- 16. Jones RJ, O'Dempsey TJ, Greenwood BM. Screening for a raised rectal temperature in Africa. Arch Dis Child. 1993;69:437–439. doi: 10.1136/adc.69.4.437.
- 17. Buntain WL, Pregler M, O'Brien PC, Lynn HB. Axillary versus rectal temperature: a comparative study. J La State Med Soc. 1977;129:5–8.
- 18. Eoff MJ, Meier RS, Miller C. Temperature measurement in infants. Nurs Res. 1974;23:457–460.
- 19. Hughes WT, Armstrong D, Body GP. Guidelines for the use of antimicrobial agents in neutropaenic patients with unexplained fever. J Infect Dis. 1990;161:381–396. doi: 10.1093/infdis/161.3.381.
- 20. Keeling EB. Thermoregulation and axillary temperature measurements in neonates: a review of the literature. Matern Child Nurs J. 1992;20:124–140.
- 21. Cusson RM, Madonia JA, Taekman JB. The effect of environment on body site temperatures in full-term neonates. Nurs Res. 1997;46:202–207. doi: 10.1097/00006199-199707000-00004.
- 22. Irwig L, Tosteson ANA, Gatsonis C, Lau J, Colditz G, Chalmers TC, et al. Guidelines for meta-analyses evaluating diagnostic tests. Ann Intern Med. 1994;120:667–676. doi: 10.7326/0003-4819-120-8-199404150-00008.
- 23. Pontious S, Kennedy AH, Shelley S, Mittrucker C. Accuracy and reliability of temperature measurement by instrument and site. J Pediatr Nurs. 1994;9:114–123.
- 24. Pugh Davies S, Kassab JY, Thrush AJ, Smith PH. A comparison of mercury and digital clinical thermometers. J Adv Nurs. 1986;11:535–543. doi: 10.1111/j.1365-2648.1986.tb01285.x.