BMJ Paediatrics Open. 2025 Mar 12;9(1):e002891. doi: 10.1136/bmjpo-2024-002891

Artificial intelligence for weight estimation in paediatric emergency care

Iraia Isasi 1,2, Elisabete Aramendi 2,3, Erik Alonso 1,2, Sendoa Ballesteros-Peña 2; WEST research group
PMCID: PMC11907021  PMID: 40074244

Abstract

Objective

To develop and validate a paediatric weight estimation model adapted to the characteristics of the Spanish population as an alternative to the methods currently in widespread use.

Methods

Anthropometric data in a cohort of 11 287 children were used to develop machine learning models to predict weight using height and the body mass index (BMI) quartile (as surrogate for body habitus (BH)). The models were later validated in an independent cohort of 780 children admitted to paediatric emergencies in two other hospitals. The proportion of patients with a given absolute percent error (APE) was calculated for various APE thresholds and compared with the available weight estimation methods to date. The concordance between the BMI-based BH and the visual assessment was evaluated, and the effect of the visual estimation of the BH was assessed in the performance of the model.

Results

The machine learning model with the highest accuracy was selected as the final algorithm. The model estimates weight from the child’s height and BH (under-, normal- and overweight) based on a support vector machine with a Gaussian-kernel (SVM-G). The model presented an APE<10% and <20% for 74.7% and 96.7% of the children, outperforming other available predictive formulas by 3.2–37.5% and 1.3–29.1%, respectively. Low concordance was observed between the theoretical and visually assessed BH in 36.7% of the children, showing larger errors in children under 2 years.

Conclusions

The proposed SVM-G is a valid and safe tool to estimate weight in paediatric emergencies, more accurate than other local and global proposals.

Keywords: Child Health, Machine Learning, Resuscitation


WHAT IS ALREADY KNOWN ON THIS TOPIC

  • In critical medical situations, weighing the child is not always possible. Current paediatric weight estimation methods often lack accuracy and vary by geographic population.

WHAT THIS STUDY ADDS

  • This study introduces a more accurate artificial intelligence-based weight estimation model for Spanish children.

HOW THIS STUDY MIGHT AFFECT RESEARCH, PRACTICE OR POLICY

  • Estimating weight would enable the calculation of optimal dosages for weight-based medications during emergency treatment.

Introduction

Paediatric emergency interventions frequently require knowledge of the child’s weight, including for the dosing of medications or resuscitation fluids, the adjustment of cardioversion/defibrillation joules and the choice of disposal/device sizes.1,3 However, in critical medical situations, especially during cardiopulmonary resuscitation,4 5 weighing the child is not possible, either because the child is unconscious and cannot be kept stable on the scale or because urgent treatments need to be administered. Besides, information from parents is not always available, especially in the pre-hospital environment. Accurate and rapid weight predictors are therefore essential in these situations.

The earliest methods developed for paediatric weight estimation include age-based formulae and length-based systems using tape measures. Age-based formulae6,11 are easy to use and provide an absolute percent error (APE) lower than 10% and 20% in around 40% and 70% of the cases, respectively. This performance is well below the requirements for an accurate weight estimation tool, that is, an APE lower than 10% and 20% in at least 70% and 95% of the cases, respectively.12,14 Regarding length-based methods, the most widely adopted system in paediatric emergency medicine has been the Broselow tape, and although it is still widely used, it does not take into consideration the child’s body habitus (BH) and usually underestimates weight, especially in obese children.15,17 Newer and more accurate methods have been developed based on the length and BH, namely Mercy,18 Wozniak,19 DWEM5 (Devised Weight Estimation Method), Yamamoto,20 PAWPER3 (Pediatric Advanced Weight Prediction in the Emergency Room) and a modified version of PAWPER3 called PAWPER XL-MAC.21 DWEM, Yamamoto and PAWPER methods are based on the child’s height and the visual estimation of the BH by healthcare personnel to predict weight, whereas Mercy/Wozniak/PAWPER XL-MAC methods use the mid-arm circumference (MAC) (as surrogate of the BH) and humeral/tibial/total length to predict weight.

Unfortunately, the performance of these tools is subject to geographical variability since populations worldwide differ in their physical traits.22 23 It is reasonable, therefore, to assume that the effectiveness of weight estimation strategies should be evaluated in each sociodemographic context. For instance, in Spain, the most commonly recommended international formulas do not offer satisfactory results, and their use has been questioned.24 A more promising weight estimation tool designed and validated on a Spanish paediatric population was recently published by Ballesteros et al.25 The method provided easy-to-use formulas derived from linear regression models fed by the child’s height and BH,25 and clearly outperformed other weight estimators when evaluated over a Spanish population data set.

This study focuses on the development and validation of a paediatric weight estimation model that uses the visual estimation of the BH and height, with the aim of outperforming state-of-the-art weight estimation formulas over the Spanish population. Techniques framed in the field of artificial intelligence were applied, specifically machine learning models, which combine information on the child’s height and BH.

Materials and methods

Data collection

This section describes the characteristics of two independent data sets used for the development and validation of the weight prediction models proposed in this study.

Development database

Anthropometric data of 11 287 children with a median (IQR) age of 6 (3–10) years, collected through the global electronic clinical database of Osakidetza (Basque Health Service), were used for the development of the machine learning models. Data on patients’ sex, height and weight were obtained in routine health checks or specific consultations during 2019 at the primary care service. In addition, the body mass index (BMI) was calculated from the height and weight of each patient. Patients were grouped according to height in 5 cm intervals, from 55 to 140 cm, and each interval was stratified into quartiles based on BMI. This stratification was used to approximate the theoretical BH of each child: those in Q2–Q3 were assigned to the normal BH group, those in Q1 to the underweight group and those in Q4 to the overweight group.
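The stratification described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the left-closed 5 cm bins and the quartile convention of `statistics.quantiles` are assumptions made for illustration, and the 55–140 cm range is not enforced.

```python
from collections import defaultdict
from statistics import quantiles


def bmi(weight_kg, height_cm):
    """Body mass index in kg/m^2."""
    height_m = height_cm / 100.0
    return weight_kg / (height_m * height_m)


def height_bin(height_cm, width=5):
    """Lower edge of the 5 cm interval containing the height (assumed left-closed)."""
    return int(height_cm // width) * width


def assign_theoretical_bh(children):
    """children: list of (height_cm, weight_kg) pairs. Returns one BH label per child."""
    bmi_by_bin = defaultdict(list)
    for height, weight in children:
        bmi_by_bin[height_bin(height)].append(bmi(weight, height))

    # First and third BMI quartile cut points within each height interval.
    # statistics.quantiles needs at least two values per interval.
    cuts = {}
    for interval, values in bmi_by_bin.items():
        q1, _, q3 = quantiles(values, n=4)
        cuts[interval] = (q1, q3)

    labels = []
    for height, weight in children:
        q1, q3 = cuts[height_bin(height)]
        value = bmi(weight, height)
        if value < q1:
            labels.append("underweight")   # Q1
        elif value > q3:
            labels.append("overweight")    # Q4
        else:
            labels.append("normal")        # Q2-Q3
    return labels
```

Because the cut points are computed per height interval, roughly a quarter of the children in each interval fall in the underweight group and a quarter in the overweight group, whatever the absolute BMI values are.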

Validation database

For the clinical validation process, data from a cohort of 780 patients attending the paediatric emergency departments of two other hospitals in Bilbao and Barakaldo, Spain, were used. Nine nurses with at least 2 years of experience in the service were involved in the data collection. For each patient, sex and age were recorded; height, humeral length, MAC and weight were measured; and the BH was visually estimated by the nurses (under-, normal-, overweight). In total, there were 200 underweight, 391 normal-weight and 189 overweight patients. The theoretical BH group for each child was also subsequently determined from the BMI using the method described above. Data on this cohort were previously used for the development of the Ballesteros et al25 method, so the details on the characteristics and anthropometry can be found there. Of the 780 children, measurements of humeral length and MAC were only taken in the first 516 patients. This initial cohort was used for a state-of-the-art comparative analysis,24 which required these additional measurements.

Patient and public involvement

The need for an accurate weight estimation method in paediatric emergencies was identified through clinical experience and challenges observed in daily practice. While patients did not participate directly in the study design, the research used anthropometric data that reflect the characteristics of the Spanish paediatric population. Patients were actively recruited during visits to the paediatric emergency department. Informed consent was obtained from both the patient and their legal guardian prior to performing the necessary anthropometric measurements. Results will be shared with the medical community through publications and presentations at conferences. Additionally, consideration will be given to creating informative materials for caregivers regarding the importance of accurate weight estimation in paediatric emergencies. We extend our gratitude to the patients and their families who consented to participate in this study, allowing for the collection of anthropometric data that will enhance emergency care for children.

Bidimensional weight estimation models

Three machine learning models that predict a child’s weight from height and BH were trained using the development database. They were selected for their progressively greater ability to identify nonlinear data patterns: simple linear regression, second-degree polynomial regression and a support vector machine with a Gaussian kernel (SVM-G). Two parameters had to be optimised for the SVM-G: γ, which controls the width of the Gaussian kernel, and C, which controls the flexibility of the decision boundary. The optimal C and γ values were selected by minimising the mean APE of weight estimation obtained inside a five-fold cross-validation loop. This was repeated on 100 random partitions in order to estimate the statistical distribution of the optimal γ and C. The mean values of these distributions were subsequently used in the validation process of the SVM-G. The LIBSVM library26 was used to implement the SVM-G, and the simple linear regression and second-degree polynomial regression models were implemented using the Statistics and Machine Learning Toolbox in MATLAB.
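The parameter selection procedure can be sketched in Python as follows. This is only an outline under stated assumptions: the cross-validation is taken to be five-fold, `fit_predict` is a hypothetical callable standing in for training and applying the SVM-G (the study used LIBSVM, which is not reproduced here), and `grid` is a hypothetical list of candidate (C, γ) pairs.

```python
import random
from statistics import mean


def mean_ape(actual, predicted):
    """Mean absolute percent error, in percent."""
    return mean(100.0 * abs(p - a) / a for a, p in zip(actual, predicted))


def make_folds(n, k, rng):
    """Shuffle indices 0..n-1 and deal them into k roughly equal folds."""
    indices = list(range(n))
    rng.shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cv_mean_ape(X, y, fit_predict, params, folds):
    """Average held-out mean APE over the folds for one (C, gamma) candidate."""
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in range(len(y)) if i not in held]
        predicted = fit_predict(
            [X[i] for i in train], [y[i] for i in train],
            [X[i] for i in held_out], **params)
        scores.append(mean_ape([y[i] for i in held_out], predicted))
    return mean(scores)


def select_parameters(X, y, fit_predict, grid, k=5, repeats=100, seed=0):
    """Repeat the k-fold selection on random partitions and average the winners."""
    rng = random.Random(seed)
    winners = []
    for _ in range(repeats):
        folds = make_folds(len(y), k, rng)  # same partition for every candidate
        winners.append(
            min(grid, key=lambda p: cv_mean_ape(X, y, fit_predict, p, folds)))
    # Mean of the winning values across partitions, as described in the text.
    return {name: mean(w[name] for w in winners) for name in grid[0]}
```

Averaging the winning values over many random partitions makes the final C and γ less dependent on any single split of the development data.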

State-of-the-art predicting models

The performance of the best machine learning model was compared with the weight predictive algorithms proposed in the literature, including age-based formulae (APLS 2011,11 ERC,10 Argall,7 Best-Guess8 9 and Luscombe6), a length-based system (Broselow15) and four bidimensional weight estimation systems: two based on the child’s length and the visual estimation of the BH (DWEM5 and Ballesteros et al25), one based on the humeral length and MAC (Mercy18) and one based on the child’s length and the estimation of the BH based on MAC (PAWPER XL-MAC21). Each of these algorithms was ad hoc designed for different populations worldwide: the USA (APLS 2011,11 Broselow,15 Mercy,18 DWEM,5 PAWPER XL-MAC21), Europe (ERC,10 Ballesteros25), the UK (Argall,7 Luscombe6) and Australia (Best-Guess8 9). All these formulas were directly evaluated on the validation database, with no adjustment of the original parameters.

All methods were assessed using the complete data set of 780 patients, except for Mercy and PAWPER XL-MAC, which could only be evaluated using the cohort of 516 patients, as they require humeral length and/or MAC measurements (refer to section ‘Validation database’).

The state-of-the-art methods that showed the best performance without adjustment of the original parameters were adapted to the Spanish population. For that, the original parameters corresponding to the best performing state-of-the-art methods (Broselow, DWEM and PAWPER XL-MAC) were adjusted according to the training database of the present study as follows:

  1. Broselow: The 50th percentile weight was determined for 5 cm height intervals of the training set. In the validation database, each child’s height was rounded to the nearest 5 cm interval, so that the corresponding weight estimate could be assigned.

  2. DWEM: The 5th, 50th and 95th percentile weights were determined for 5 cm height intervals of the training set. These percentiles determined the weights corresponding to three levels of BH (under-, normal- and overweight). In the validation database, each child’s height was rounded to the nearest 5 cm interval, so that the corresponding weight estimate could be assigned based on the visual BH.

  3. PAWPER XL-MAC: Since the training data set of the present study (11 287 patients) does not include MAC measurements, the 516 patients from the test database that have MAC measurements were used to both train and test the PAWPER XL-MAC method. Thus, 75% (387 patients) were used to train the model/create the weight prediction measuring tape (similar to that presented in the PAWPER XL-MAC21 paper), while the remaining 25% (129 patients) were used for testing. The training set was grouped based on height in 5 cm intervals, ranging from 50 cm to 140 cm, with each interval stratified into 7 centiles to define the corresponding MAC ranges for each BH level: BH1 (5th centile), BH2 (15th centile), BH3 (50th centile), BH4 (85th centile), BH5 (95th centile), BH6 (97th centile) and BH7 (99th centile). For each combination of height and BH interval, total body weight was calculated using the 5th, 10th, 15th, 50th, 85th, 95th and 97th centiles of the weight data for patients within those ranges. Once the measuring tape was created, it was evaluated using the test set: each child’s height was rounded to the nearest 5 cm interval, allowing the corresponding weight estimate to be assigned by rounding the child’s MAC to the nearest interval.
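The Broselow and DWEM adaptations in items 1 and 2 amount to a percentile lookup table indexed by height interval. The following Python sketch illustrates the idea; the function names, the linear-interpolation percentile and the grouping of training heights by nearest 5 cm multiple are assumptions made for illustration, not details confirmed by the study.

```python
import math
from collections import defaultdict

# Weight percentile assigned to each BH level in the DWEM-style adaptation.
BH_PERCENTILE = {"underweight": 5, "normal": 50, "overweight": 95}


def nearest_interval(height_cm, width=5):
    """Round a height to the nearest multiple of `width` cm (ties round up)."""
    return math.floor(height_cm / width + 0.5) * width


def percentile(sorted_values, p):
    """Linear-interpolation percentile (p in 0..100) of a sorted list."""
    position = (len(sorted_values) - 1) * p / 100.0
    low = int(position)
    high = min(low + 1, len(sorted_values) - 1)
    fraction = position - low
    return sorted_values[low] * (1 - fraction) + sorted_values[high] * fraction


def build_table(training):
    """training: iterable of (height_cm, weight_kg). Returns {interval: {bh: weight}}."""
    weights_by_interval = defaultdict(list)
    for height, weight in training:
        weights_by_interval[nearest_interval(height)].append(weight)
    table = {}
    for interval, weights in weights_by_interval.items():
        weights.sort()
        table[interval] = {bh: percentile(weights, p)
                           for bh, p in BH_PERCENTILE.items()}
    return table


def estimate_weight(table, height_cm, bh="normal"):
    """Look up the weight for a child's height interval and BH level."""
    return table[nearest_interval(height_cm)][bh]
```

Calling `estimate_weight` with the default `bh="normal"` reproduces the Broselow-style adaptation (50th percentile only), while passing the visually assessed BH gives the three-level DWEM-style estimate.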

Performance metrics

The article reports performance metrics of predictive models as mean (95% CI) for those that passed the Kolmogorov-Smirnov normality test or as median (IQR) or median (2.5th–97.5th percentile range) for those that did not.

The performance of the three proposed machine learning models was assessed on the validation database using both theoretical and visually assessed BH. First, the proportions of estimates that showed an APE lower than 5%, 10% and 20% were calculated, hereinafter referred to as APE5, APE10 and APE20, respectively. These metrics are critical for evaluating the clinical utility of the model as they give a clear indication of how often the estimations are sufficiently close to the actual weights, which is particularly important for clinical decision-making.
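As a small illustration of these threshold metrics, the following Python sketch computes APE5, APE10 and APE20 for a toy set of estimates. The function names and the example weights are made up for illustration.

```python
def ape(estimated, actual):
    """Absolute percent error of one weight estimate, in percent."""
    return 100.0 * abs(estimated - actual) / actual


def proportion_within(estimates, actuals, threshold):
    """Percentage of estimates whose APE is below `threshold` percent."""
    errors = [ape(e, a) for e, a in zip(estimates, actuals)]
    return 100.0 * sum(err < threshold for err in errors) / len(errors)


# Toy example: four children with APEs of about 4%, 7.5%, 15% and 30%.
actual_kg = [10.0, 20.0, 40.0, 50.0]
estimated_kg = [10.4, 21.5, 46.0, 65.0]
ape5 = proportion_within(estimated_kg, actual_kg, 5)
ape10 = proportion_within(estimated_kg, actual_kg, 10)
ape20 = proportion_within(estimated_kg, actual_kg, 20)
```

In this toy example, one, two and three of the four estimates fall below the 5%, 10% and 20% limits, respectively.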

Second, APEs [APE=100|estimated weight − real weight|/real weight] and percent errors (PE) [PE=100(estimated weight − real weight)/real weight] were reported for the best of the three machine learning solutions. The mean PE considers the sign of the error when averaging, so it shows whether the predictions are generally biased in one direction, that is, whether the model tends to overestimate or underestimate the weight. However, this metric can be misleading because positive and negative errors can cancel each other out, resulting in a low or zero mean error even when significant individual errors exist. That is why the median APE was also calculated. This metric removes the issue of error cancellation by taking the absolute value of each error before aggregating, ensuring that all errors contribute positively to the final score. By using both metrics, we can gain a better understanding of the model’s bias (through mean PE) and its overall accuracy or absolute bias (through median APE). In the case of PE, which follows a normal distribution, the 95% CI, calculated as the mean ±1.96 × SD, denotes the precision of the mean estimate, reflecting the uncertainty associated with the bias. It would indicate that the mean PE of a new sample would likely fall within this interval with 95% probability, thereby supporting the reproducibility of the results. In the case of APE, which does not follow a normal distribution, the 2.5th–97.5th percentile range indicates where 95% of the data fall, reflecting the overall variability and spread of the APE in our cohort.
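The contrast between mean PE and median APE can be made concrete with a short Python sketch. The function and key names are illustrative assumptions; the interval is computed as the mean ±1.96 × SD, as stated in the text, and the percentile range uses the 'inclusive' convention of `statistics.quantiles`, which may differ from the one used in the study.

```python
from statistics import mean, median, quantiles, stdev


def summarise_errors(estimates, actuals):
    """Bias (mean PE) and accuracy (median APE) of a set of weight estimates."""
    pe = [100.0 * (e - a) / a for e, a in zip(estimates, actuals)]
    ape = [abs(x) for x in pe]
    mean_pe = mean(pe)
    sd_pe = stdev(pe)
    # 39 cut points at 2.5%, 5%, ..., 97.5%; the first and last bound 95% of the data.
    cuts = quantiles(ape, n=40, method="inclusive")
    return {
        "mean_pe": mean_pe,
        "pe_interval": (mean_pe - 1.96 * sd_pe, mean_pe + 1.96 * sd_pe),
        "median_ape": median(ape),
        "ape_range": (cuts[0], cuts[-1]),
    }


# One 10% overestimate and one 10% underestimate cancel out in the mean PE.
summary = summarise_errors([11.0, 9.0], [10.0, 10.0])
```

Here the mean PE is zero although both estimates are off by 10%, which the median APE of 10% makes visible.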

Third, the Bland-Altman plot was used to show the distribution of the PE for the best of the three machine learning solutions as a function of the actual weight values, indicating the mean PE and its 95% CI. Visualising the distribution of the PE with this supplementary information provides an efficient means to evaluate the overall performance of the model at a glance. It facilitates the identification of outliers, weight ranges where overestimation or underestimation tendencies occur, and weight ranges where errors are more pronounced.

Fourth, 100 bootstrap replicates, that is, resamples of the data drawn with replacement, were used to estimate the mean APE (95% CI), APE5 (95% CI), APE10 (95% CI) and APE20 (95% CI) of our best-performing method and available weight estimation methods. The mean and 95% CI were computed with the results obtained from each of the 100 bootstraps. According to the Central Limit Theorem, although the bootstrap samples are drawn from a population that does not follow a normal distribution, as in the case of the APE, the distribution of the 100 means from the resamples will tend to follow a normal distribution, which allows the estimation of the population mean and the 95% CI. A two-sample paired t-test was performed on the distributions of the mean APE, APE5, APE10 and APE20 across 100 bootstrap replicas, comparing the SVM-G with the available weight estimation methods. A p-value<0.05 was considered statistically significant. Finally, APE values for each state-of-the-art method and the SVM-G stratified over the three levels of BH were compared.
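A minimal Python sketch of the bootstrap step is given below. It assumes that the 95% CI is taken as the mean ±1.96 × SD of the replicate values, which is one reading of the description above, and that replicates are paired across methods by reusing the same seed. The function names are illustrative, and only the paired t statistic is computed, since a p-value would need a t distribution that the standard library does not provide.

```python
import random
from statistics import mean, stdev


def ape10(apes):
    """Percentage of APE values below 10%."""
    return 100.0 * sum(v < 10 for v in apes) / len(apes)


def bootstrap_metric(values, metric, n_boot=100, seed=0):
    """Resample `values` with replacement n_boot times and summarise `metric`."""
    rng = random.Random(seed)
    n = len(values)
    replicates = []
    for _ in range(n_boot):
        sample = [values[rng.randrange(n)] for _ in range(n)]
        replicates.append(metric(sample))
    centre = mean(replicates)
    spread = stdev(replicates)
    # Assumed normal-approximation interval over the replicate values.
    return centre, (centre - 1.96 * spread, centre + 1.96 * spread), replicates


def paired_t_statistic(a, b):
    """t statistic of the paired differences a[i] - b[i] (needs non-zero spread)."""
    diffs = [x - y for x, y in zip(a, b)]
    return mean(diffs) / (stdev(diffs) / len(diffs) ** 0.5)
```

Calling `bootstrap_metric` with the same seed on the per-child APE lists of two methods (same children, same order) draws the same children in each replicate, so the two replicate lists can be passed directly to `paired_t_statistic`.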

Fifth, errors in the visual assessment of the BH were characterised. Machine learning models were trained using the theoretical BH group corresponding to the BMI quartiles. However, in a real medical emergency scenario, the BMI is not available and the visual assessment of the BH by healthcare personnel is required. In order to assess the bias in the weight prediction, the visual estimation of the BH was analysed in terms of sex, age and BMI quartile using the McNemar test to evaluate proportions.

Results

Figure 1 compares the performance of the three weight estimation machine learning models proposed in this work in terms of the percentage of cases with a given APE for both visual and theoretical BH. As the ability of the models to capture complex data patterns increases, their performance improves markedly. The most complex model, SVM-G, showed the best results with APE10/APE20 of 74.6%/96.7% for visual BH and 89.6%/98.7% for theoretical BH. The second-degree polynomial regression and the simple linear regression showed APE10/APE20 of 56.7%/90.8% and 23.1%/47.2% for visual BH and 60.7%/91.0% and 18.2%/44.1% for theoretical BH, respectively. PEs for both visual and theoretical BH are given in the Bland-Altman plots (figure 2) for the SVM-G. As shown in both plots, the PE distribution is quite uniform and largely independent of the actual weight value. However, the model tends to slightly overestimate weight in the 10–30 kg range and underestimate it in the 35–55 kg range. The mean PE (95% CI) and the median APE (2.5th–97.5th percentile range) were 2.2% (–10.3% to 14.8%) and 4.3% (0.2–16.2%) with theoretical BH and 1.6% (–16.1% to 19.3%) and 5.8% (0.2–21.0%) with visual BH. Thus, based on the mean PE, the model tends to overestimate the weight by 2.2% when theoretical BH is used and by 1.6% when visual BH is used. This is consistent with the Bland-Altman plots, as the overestimation range (10–30 kg) contains considerably more samples than the underestimation range (35–55 kg). However, the median APE indicates that the model is more accurate with theoretical BH, as its median absolute error is 1.5 percentage points lower than with visual BH.

Figure 1. Performance of the three machine learning models proposed in the study as a function of the proportion of estimates with an absolute percent error (APE) lower than the limit indicated in the horizontal axis. BH, body habitus; LR, simple linear regression; PR, second-order polynomial regression; SVM-G, Gaussian-kernel support vector machine.

Figure 2. Bland-Altman plot showing the percent error (PE) as a function of the actual weights for the Gaussian-kernel support vector machine (SVM-G) model when visual (a) and theoretical (b) body habitus (BH) groups are used for model validation. The central line indicates the mean PE, whereas the exterior lines represent the 95% CI of the mean estimate.

Table 1 compares the performance of SVM-G with the available prediction models (with no adjustment of the original parameters) when visually assessed BH values are used for validation, using 100 bootstrap replicas to estimate the mean APE and 95% CI. Bidimensional weight estimation models based on length and BH (Mercy, DWEM, PAWPER XL-MAC, Ballesteros and SVM-G) and the Broselow method (length-based) outperformed age-based formulae by 10.3–25.5% and 11.7–18.8% in APE10 and APE20, respectively. These findings suggest that emergency medicine clinicians should move away from using age-based methods, as they are consistently outperformed by alternative approaches based on height, humeral measurements or MAC. Our method (SVM-G) was the best-performing one in terms of both mean APE and APE10/APE20 (p-value<0.05 in the two-sample paired t-test), obtaining an APE10/APE20 3.2/1.3 percentage points higher than the best-performing state-of-the-art method (PAWPER XL-MAC).21 When adapted to the Spanish population, the best-performing state-of-the-art methods according to table 1 (Broselow, DWEM and PAWPER XL-MAC) achieved performances similar to those in table 1. The Ballesteros et al method was already trained using the training database of the present study in the original paper.25

Table 1. Comparison of the proposed model, the Gaussian-kernel support vector machine (SVM-G), and the available weight estimation methods in terms of the mean absolute percent error (APE) (95% CI) and the proportion of estimates (95% CI) that showed an APE lower than 5%, 10% and 20% (APE5, APE10 and APE20, respectively) in 100 bootstrap replicas of the data drawn with replacement.

| Method | Mean APE, % (95% CI) | APE5, % (95% CI) | APE10, % (95% CI) | APE20, % (95% CI) |
|---|---|---|---|---|
| Luscombe | 16.5 (15.8–17.6) | 18.1 (15.2–21.1) | 37.3 (33.9–40.4) | 67.8 (64.4–70.9) |
| Best Guess | 16.2 (15.4–17.1) | 16.2 (15.4–17.1) | 39.4 (34.9–42.6) | 69.3 (66.5–72.6) |
| Argall | 14.7 (13.8–15.5) | 22.7 (19.7–26.1) | 41.0 (38.2–44.3) | 72.9 (70.3–75.6) |
| ERC | 13.3 (12.4–14.1) | 23.0 (20.0–26.2) | 46.1 (42.6–50.2) | 76.8 (74.0–80.3) |
| APLS | 14.8 (13.8–15.6) | 22.3 (19.1–26.1) | 41.9 (38.8–45.7) | 73.1 (70.5–76.3) |
| Broselow | 9.5 (9.0–10.0) | 32.9 (30.0–35.8) | 61.6 (58.9–64.6) | 91.2 (89.1–92.7) |
| Mercy | 10.5 (9.6–11.5) | 30.9 (27.0–34.0) | 56.4 (51.1–60.6) | 88.5 (85.4–91.3) |
| DWEM | 9.3 (9.0–9.8) | 30.9 (28.5–34.0) | 60.0 (56.3–63.2) | 92.5 (90.5–94.2) |
| Ballesteros | 8.9 (8.3–9.2) | 35.6 (32.3–38.5) | 62.3 (59.7–65.8) | 93.1 (91.2–94.5) |
| PAWPER XL-MAC | 7.5 (7.0–8.0) | 40.7 (36.9–45.0) | 71.6 (67.6–74.8) | 95.6 (93.6–97.5) |
| SVM-G | 7.2 (6.8–7.6) | 42.8 (39.7–45.5) | 74.8 (72.1–78.0) | 96.9 (95.5–97.8) |

Note that the Mercy and PAWPER XL-MAC methods were only tested on 516 of the 780 patients for whom measurements on humeral length (Mercy) and mid-arm circumference (MAC) (Mercy and PAWPER XL-MAC) were available (refer to section ‘Validation database’).

Figure 3 compares the performance of SVM-G with the available weight estimation methods stratified over the three levels of theoretical BH. The methods based on visual BH (DWEM, Ballesteros, SVM-G) have larger errors for underweight patients, while they perform best for normal-weight patients. Overall, BH and length-based methods present narrower IQRs than age-based formulae, indicating greater precision in their estimates. The SVM-G achieves the lowest median APE for normal-weight patients (who represent 50% of the validation data set), whereas the Ballesteros et al25 method provides the lowest median APE for the overweight group and PAWPER XL-MAC for the underweight group.

Figure 3. Comparison of the available weight estimation methods and the proposed Gaussian-kernel support vector machine (SVM-G) model in terms of the median absolute percentage error (APE) (IQR) stratified over the three values of body habitus (BH) group: under-, normal- and overweight.

Figure 4 shows the proportion of errors in the visual assessment of the BH group by nursing staff for several age groups. The 0–2 years group showed the highest error rate, and no statistically significant differences were observed between males and females. In our validation data, the BH was correctly estimated in 63.3% of the cases, with 36.7% classified incorrectly: 20.2% underestimated and 16.4% overestimated. As figure 5 shows, the bias of BH estimation was higher close to the limits of the groups. Nurses generally tended to classify children with BMIs close to Q1 and Q4 into the ‘normal BH’ group. In particular, 57.5% of the children belonging to Q1 (theoretically underweight) and 61.9% of those in Q4 (theoretically overweight) were classified as ‘normal BH’. Finally, 13.3% of the children in Q2 and Q3 (theoretically normal-weight) were incorrectly classified, 10.2% as underweight and 3% as overweight.
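The concordance figures above come from comparing each child's visually assessed BH with the theoretical BH. A minimal Python sketch of that tally follows; the label names and the function name are illustrative assumptions.

```python
# Ordering of the three BH levels, used to tell under- from overestimation.
BH_ORDER = {"underweight": 0, "normal": 1, "overweight": 2}


def bh_concordance(theoretical, visual):
    """Percentages of correct, underestimated and overestimated BH assessments."""
    counts = {"correct": 0, "underestimated": 0, "overestimated": 0}
    for t, v in zip(theoretical, visual):
        if BH_ORDER[v] == BH_ORDER[t]:
            counts["correct"] += 1
        elif BH_ORDER[v] < BH_ORDER[t]:
            counts["underestimated"] += 1   # assessed as lighter than BMI suggests
        else:
            counts["overestimated"] += 1    # assessed as heavier than BMI suggests
    total = len(theoretical)
    return {name: 100.0 * c / total for name, c in counts.items()}


# Toy example with four children.
result = bh_concordance(
    ["underweight", "normal", "normal", "overweight"],
    ["normal", "normal", "underweight", "overweight"],
)
```

In the toy example, two assessments agree, one child is assessed as lighter and one as heavier than the BMI-based group, giving 50%, 25% and 25%.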

Figure 4. Proportion of errors in the visual estimation of the body habitus (BH) group by nursing staff. The results are shown in terms of age and sex. P-values correspond to the comparison between sexes within each age interval.

Figure 5. Distribution of the observations based on the height and body mass index (BMI). Black circles show the BMI thresholds used in each height range (ranges of 5 cm) for the assignment of the different body habitus (BH) levels. The white section corresponds to the BMI interval where 75% of the visually estimated BH errors concentrate in each height interval.

Discussion

This study proposes a new weight estimation algorithm on a Spanish paediatric cohort with promising results. Three mathematical models based on machine learning algorithms were compared, and the one with the highest accuracy (SVM-G) was selected as the final algorithm. This new version improved the accuracy of the weight prediction algorithms proposed to date by using two indirect anthropometric variables: height and BH. The improved accuracy of the algorithm could have the potential to enhance clinical safety, mainly for two reasons: the variability derived from visual weight estimation is reduced and the medication dosage is better adjusted to the child’s needs.

There is no consensus on the maximum admissible error for a weight estimation strategy, although performance limits of 70% for APE10 and 95% for APE20 have been suggested.12,14 With an APE10 of 74.6% and an APE20 of 96.7% using visual BH, our algorithm exceeds both limits, so we can assume that it has sufficient capacity to replace other similar tools used in routine practice. However, the relative difference between the estimated and actual weight has different consequences according to the age of the child since an error in the dosage has more severe effects on younger children. Research in this field should be oriented to achieve the minimum possible error within each geographic area.

When sex and age were included in the SVM-G, the global performance did not improve: APE10 decreased by 1.3% and APE20 increased by 0.3%. This result is partly aligned with the approach of state-of-the-art algorithms, which do not discriminate by sex. In addition, age was not a discriminative parameter in our model given its high correlation with height (Pearson’s correlation coefficient ρ=0.9976), although several of the available weight estimation methods are solely based on age.

In terms of applicability, the proposed SVM-G algorithm is more complex than other state-of-the-art methods and requires computer support. Nevertheless, the model is light enough to be implemented in any mobile app and can be used to adapt a Broselow-type tape to the Spanish context. Figure 6 shows an implementation of a colour-based tape according to the SVM-G algorithm (European Union Intellectual Property N 008739189-0001).

Figure 6. An implementation of a colour-based tape according to the Gaussian-kernel support vector machine algorithm.

This work presents some limitations that should be mentioned. First, some of the weight estimation methods, namely the original PAWPER3 and Yamamoto,20 could not be directly evaluated in our cohort due to the lack of the required anthropometric parameters. The PAWPER method considers five levels of the visual BH, and the Yamamoto method considers up to six levels of BH instead of the three of our data set. However, we considered it essential to compare the performance of these methods with our method in some way, and therefore, an additional experiment was conducted. The performance of the Yamamoto and PAWPER methods was estimated by implementing our own approximations of them, using the theoretical BH rather than the visual BH. For this purpose, patients were grouped according to height in 5 cm intervals, from 55 to 140 cm, and each interval was stratified into five levels according to the 5th, 25th, 75th and 95th percentiles of the BMI, or into six levels according to the 5th, 15th, 65th, 85th and 95th percentiles of the BMI. In the case of the PAWPER tape, a baseline weight estimation was done in the validation set using the weight that corresponds with the 50th centile of the WHO weight-for-length growth charts. Then, this weight baseline was adjusted up, down or left unchanged depending on which level of theoretical BH each child belongs to. Regarding the Yamamoto method, for children under 3 years old, five linear regression lines were fitted on the training set (one for each level of theoretical BH), and for children over 3 years old, six linear regression lines were fitted (one for each level of theoretical BH). These linear regression models were then assessed on the validation data set. Since the theoretical BH is calculated using BMI, which is based on the variable we are trying to predict (the weight), the SVM-G was also implemented using the theoretical BH stratified in five levels to make a fair comparison.

In this way, the PAWPER/Yamamoto/SVM-G methods obtained an APE10 of 89.1%/84.2%/96.0% and an APE20 of 99.4%/96.9%/100%, respectively. Of course, these results do not represent reality, as the primary source of error in these methods arises from the clinicians’ visual estimation of BH. Thus, the actual results would be significantly worse for all of them. In fact, for the SVM-G, the APE10/APE20 dropped from 89.6%/98.7% to 74.6%/96.7% when the three levels of visual BH were used for weight estimation rather than the theoretical BH (see figure 1). However, what this experiment clearly shows is that the predictive capability of the mathematical model itself is greater in the SVM-G. The validation of the Wozniak method in our cohort was not possible as we lack measurements of tibial and ulna length.

The second limitation refers to the high error rate committed by healthcare personnel in the visual assessment of BH at three levels (under-, normal-, overweight), which negatively affects the performance of our algorithm. Our study revealed that 63.3% of children were correctly classified in terms of BH, with 36.7% classified incorrectly, 20.2% underestimated and 16.4% overestimated. These misclassification rates align closely with the findings of Schmidt et al,27 where 33.3% of children were incorrectly classified, with 17% underestimated and 16.3% overestimated. The limited experience of assessors could explain the approximately 33% misclassification in both studies. Wells et al28 demonstrated that using BH reference images improved the ability of novices to accurately estimate children’s weight when using the PAWPER tape system due to a more accurate visual categorisation of the BH. This suggests that visual aids could potentially improve the performance of healthcare providers in visually assessing the BH, mitigating the error rates observed and, therefore, indirectly improving the performance of our algorithm.

The third limitation concerns the external validity of the tool. Since weight estimation strategies should be used within the populations on which they were validated, previously reported performance results could differ if the proposed models were applied in other socio-geographic contexts. For instance, the APE10 for the PAWPER tape dropped from 89.2% to 75% when the population changed from African3 to European29 and to 71.5% when it changed to Asian.30 It is important to emphasise that performance can vary significantly not only across continents but also between countries or regions. For instance, the original PAWPER tape achieved significantly different APE10 values in two studies conducted in South Africa, 89.2%3 and 62.7%,31 highlighting the influence of local factors on the effectiveness of weight estimation tools. Adapting the SVM-G to other populations would be straightforward if sufficient demographic and anthropometric data were available to ensure that the model accurately reflects the characteristics of the target population.

It can be concluded that the proposed algorithm shows promising potential for paediatric weight estimation, outperforming existing methods commonly used in the emergency department. While the results suggest that it could assist clinicians in making critical decisions regarding drug dosages, intravenous fluid volumes and electrical energy for defibrillation, further clinical trials are necessary to validate its reliability and applicability across diverse patient populations and clinical settings.

Acknowledgements

We express our deepest gratitude to the late Prof Unai Irusta for his contribution in the early development of this study.

Footnotes

Funding: This work was supported through the grant PID2021-122727OB-I00 funded by MCIN/AEI/10.13039/501100011033 and “ERDF A way of making Europe”, and by the Basque Government under grant IT-1717-22.

Patient consent for publication: Not applicable.

Ethics approval: This study involves human participants and obtained a favourable report from the Basque Country’s Clinical Research Ethics Committee (ID: 017129) and was carried out in accordance with the Declaration of Helsinki and Spanish legislation. Participants gave informed consent to participate in the study before taking part.

Provenance and peer review: Not commissioned; externally peer reviewed.

Patient and public involvement: Patients and/or the public were not involved in the design, or conduct, or reporting, or dissemination plans of this research.

Collaborators: Collaborating investigators from the WEST research group: Irrintzi Fernández Aedo, Gorka Vallejo De la Hoz, Garbiñe Pérez Llarena, Irantzu Echeandia Lastra, Inés Ercoreca Zarandona, M Inmaculada Polanco Villa, Laura Arranz Soriano, Laura Folgueira Lamas, Maider Etxeandia Santos, M Teresa Vime Manzanos, Nerea Fernández Amayuelas.

Author note: The industrial design was registered by the General Administration of the Basque Autonomous Community and the University of the Basque Country (UPV/EHU), both public entities. Any revenue generated will be reinvested into research and innovation projects by these public entities. The authors have not received, nor will they receive, any financial benefit from this registration.

Data availability statement

Data may be obtained from a third party and are not publicly available.

References

  • 1. Geduld H, Hodkinson PW, Wallis LA. Validation of weight estimation by age and length based methods in the Western Cape, South Africa population. Emerg Med J. 2011;28:856–60. doi: 10.1136/emj.2010.098640.
  • 2. Abdel-Rahman SM, Ridge A, Kearns GL. Estimation of body weight in children in the absence of scales: a necessary measurement to insure accurate drug dosing. Arch Dis Child. 2014;99:570–4. doi: 10.1136/archdischild-2013-305211.
  • 3. Wells M, Coovadia A, Kramer E, et al. The PAWPER tape: A new concept tape-based device that increases the accuracy of weight estimation in children through the inclusion of a modifier based on body habitus. Resuscitation. 2013;84:227–32. doi: 10.1016/j.resuscitation.2012.05.028.
  • 4. Luscombe M, Owens B. Weight estimation in resuscitation: is the current formula still valid? Arch Dis Child. 2007;92:412–5. doi: 10.1136/adc.2006.107284.
  • 5. Garland JS, Kishaba RG, Nelson DB, et al. A rapid and accurate method of estimating body weight. Am J Emerg Med. 1986;4:390–3. doi: 10.1016/0735-6757(86)90184-1.
  • 6. Kelly A-M, Nguyen K, Krieser D. Validation of the Luscombe weight formula for estimating children’s weight. Emerg Med Australas. 2011;23:59–62. doi: 10.1111/j.1742-6723.2010.01351.x.
  • 7. Argall JAW. A comparison of two commonly used methods of weight estimation. Arch Dis Child. 2003;88:789–90. doi: 10.1136/adc.88.9.789.
  • 8. Thompson MT, Reading MJ, Acworth JP. Best Guess method for age-based weight estimation in paediatric emergencies: validation and comparison with current methods. Emerg Med Australas. 2007;19:535–42. doi: 10.1111/j.1742-6723.2007.01031.x.
  • 9. Tinning K, Acworth J. Make your Best Guess: An updated method for paediatric weight estimation in emergencies. Emerg Med Australas. 2007;19:528–34. doi: 10.1111/j.1742-6723.2007.01026.x.
  • 10. Biarent D, Bingham R, Eich C, et al. European Resuscitation Council Guidelines for Resuscitation 2010 Section 6. Paediatric life support. Resuscitation. 2010;81:1364–88. doi: 10.1016/j.resuscitation.2010.08.012.
  • 11. Advanced Life Support Group (ALSG). Advanced paediatric life support: the practical approach. John Wiley & Sons; 2011.
  • 12. Wells M, Goldstein LN, Bentley A. It is time to abandon age-based emergency weight estimation in children! A failed validation of 20 different age-based formulas. Resuscitation. 2017;116:73–83. doi: 10.1016/j.resuscitation.2017.05.018.
  • 13. Stewart D. Accuracy of the Broselow tape for estimating paediatric weight in two Australian emergency departments. University of Sydney; 2009.
  • 14. Wells M, Goldstein LN, Bentley A. Development and validation of a method to estimate body weight in critically ill children using length and mid-arm circumference measurements: The PAWPER XL-MAC system. S Afr Med J. 2017;107:1015. doi: 10.7196/SAMJ.2017.v107i11.12505.
  • 15. Lubitz DS, Seidel JS, Chameides L, et al. A rapid method for estimating weight and resuscitation drug dosages from length in the pediatric age group. Ann Emerg Med. 1988;17:576–81. doi: 10.1016/s0196-0644(88)80396-2.
  • 16. Wells M, Goldstein LN, Bentley A, et al. The accuracy of the Broselow tape as a weight estimation tool and a drug-dosing guide - A systematic review and meta-analysis. Resuscitation. 2017;121:9–33. doi: 10.1016/j.resuscitation.2017.09.026.
  • 17. Rosenberg M, Greenberger S, Rawal A, et al. Comparison of Broselow tape measurements versus physician estimations of pediatric weights. Am J Emerg Med. 2011;29:482–8. doi: 10.1016/j.ajem.2009.12.002.
  • 18. Abdel-Rahman SM, Susan M, Anna LR. An Improved Pediatric Weight Estimation Strategy. Open Med Device J. 2012;4:87–97. doi: 10.2174/1875181401204010087.
  • 19. Wozniak R. University of British Columbia; 2012. The evaluation of potential weight-estimation methods in a primarily HIV positive cohort in Botswana for use in resource limited settings (PhD thesis)
  • 20. Yamamoto LG, Inaba AS, Young LL, et al. Improving length-based weight estimates by adding a body habitus (obesity) icon. Am J Emerg Med. 2009;27:810–5. doi: 10.1016/j.ajem.2008.06.023.
  • 21. Wells M, Goldstein LN, Bentley A. Development and validation of a method to estimate body weight in critically ill children using length and mid-arm circumference measurements: The PAWPER XL-MAC system. S Afr Med J. 2017;107:1015–21. doi: 10.7196/SAMJ.2017.v107i11.12505.
  • 22. Georgoulas VG, Wells M. The PAWPER tape and the Mercy method outperform other methods of weight estimation in children at a public hospital in South Africa. S Afr Med J. 2016;106:933. doi: 10.7196/SAMJ.2016.v106i9.10572.
  • 23. Suwezda A, Melamud A, Matamoros R. ¿Son válidas las fórmulas para estimar el peso de los niños en las urgencias? Evid Pediatr. 2007;3:65.
  • 24. Ballesteros-Peña S, Fernández-Aedo I, Vallejo-de la Hoz G, et al. Validez de las estrategias de estimación de peso en pacientes pediátricos atendidos en urgencias. Emergencias. 2019:239–44.
  • 25. Ballesteros-Peña S, Fernández-Aedo I, Vallejo-De la Hoz G, et al. Development and validation of a weight estimation tool for paediatric emergency care. Enfermería Clínica (English Edition) 2021;31:45–50. doi: 10.1016/j.enfcle.2019.12.006.
  • 26. Chang CC, Lin CJ. LIBSVM: a library for support vector machines. ACM Trans Intell Syst Technol. 2011;2:1–27. doi: 10.1145/1961189.1961199.
  • 27. Schmidt AR, Buehler PK, Meyer J, et al. Length-based body weight estimation in paediatric patients: The impact of habitus-A clinical observational trial. Acta Anaesthesiol Scand. 2018;62:1389–95. doi: 10.1111/aas.13179.
  • 28. Wells M, Goldstein LN, Bentley A. The Use of Body Habitus Reference Images Improves the Ability of Novices to Accurately Estimate Children’s Weight Using the PAWPER XL Tape System. J Emerg Med. 2018;54:165–75. doi: 10.1016/j.jemermed.2017.10.009.
  • 29. Silvagni D, Baggio L, Mazzi C, et al. The PAWPER tape as a tool for rapid weight assessment in a Paediatric Emergency Department: Validation study and comparison with parents’ estimation and Broselow tape. Resuscitation Plus. 2022;12:100301. doi: 10.1016/j.resplu.2022.100301.
  • 30. Shrestha K, Subedi P, Pandey O, et al. Estimating the weight of children in Nepal by Broselow, PAWPER XL and Mercy method. World J Emerg Med. 2018;9:276–81. doi: 10.5847/wjem.j.1920-8642.2018.04.007.
  • 31. Wu M-T. South African Hospital; 2020. Paediatric weight estimation: validation of the PAWPER XL tape mid-arm circumference method in a South African Hospital (PhD thesis)

    Articles from BMJ Paediatrics Open are provided here courtesy of BMJ Publishing Group