Skip to main content
Journal of Global Health logoLink to Journal of Global Health
. 2025 Feb 7;15:04013. doi: 10.7189/jogh.15.04013

Factors associated with underweight, overweight, and obesity in Chinese children aged 3–14 years using ensemble learning algorithms

Kening Chen 1, Fangjieyi Zheng 2, Xiaoqian Zhang 3,4, Qiong Wang 3,4, Zhixin Zhang 4,5,*, Wenquan Niu 2,*
PMCID: PMC11804908  PMID: 39913538

Abstract

Background

Factors underlying the development of childhood underweight, overweight, and obesity are not fully understood. Traditional models have drawbacks in handling large-scale, high-dimensional, and nonlinear data. In this study, we aimed to identify factors responsible for underweight, overweight, and obesity using machine learning methods among Chinese children.

Methods

Our study participants were children aged 3–14 from 30 kindergartens and 26 schools in Beijing and Tangshan. Weight status was defined per the World Health Organization criteria. We implemented three ensemble learning algorithms and compared their performance and ranked the contributing factors by importance and identified an optimal set. A user-friendly web application was developed to calculate the predicted probability of childhood underweight, overweight, and obesity.

Results

We analysed data from 18 503 children aged 3–14, including 1798 underweight, 10 579 of normal weight, 3257 overweight, and 2869 with obesity. Of all algorithms, random forest performed the best, with the area under the receiver operating characteristic reaching 0.759 for underweight, 0.806 for overweight, and 0.849 for obesity, with other metrics also reinforcing this algorithm. Further cumulative analyses showed that, for underweight, the optimal set of six factors included maternal body mass index (BMI), age, paternal BMI, maternal reproductive age, paternal reproductive age, and birth weight. The optimal set for overweight comprised of five factors: age, fast food intake, maternal BMI, paternal BMI, and sedentary time. For obesity, the optimal set included six factors: age, fast food intake, maternal BMI, paternal BMI, sedentary time, and maternal reproductive age. Further logistic regression analyses confirmed the predictive capability of individual top factors.

Conclusions

Our findings indicate that random forest is the best ensemble learning algorithm for predicting underweight, overweight, and obesity in children aged 3–14 years. We identified the optimal set of significant factors for each malnutrition status and incorporated them into a web application to support the application of this study’s findings.


Malnutrition is a global public health threat and is the cause behind nearly 50% of all deaths of children under five years [1]. Its burden revolves around either undernutrition or overnutrition. Statistics from the United Nations Children’s Fund and World Health Organization (WHO) show that 6.7% of children under five years worldwide and 3.6% in China specifically are affected by wasting [2,3]. Yet simultaneously, health systems worldwide are tackling the childhood obesity epidemic [4]. According to the WHO, the number of children aged 5–19 years with obesity has risen from 11 million in 1975 to 340 million in 2023 [5]. In China, the prevalence of overweight and obesity was estimated to be 6.8% and 3.6% in children under six years, and 11.1% and 7.9% in children aged 6–17 years, respectively [6]. Evidence shows that underweight, overweight, and obesity negatively impact children’s health, and a deeper understanding of potential risk factors behind malnutrition can enhance our knowledge of weight management and support the development of effective prevention strategies.

The development of childhood underweight, overweight, and obesity is a complex process to which inherited and non-inherited factors contribute individually and interactively, including parental obesity [7], gestational weight gain [8], and lifestyle habits [9]. However, findings from such studies are often not reproducible, with no consensus on their implications. For example, some studies have reported a significant association between late bedtime and obesity in children [1012], while others reported contrasting findings [13,14]. These inconsistencies could be due to observations being multifaceted, possibly because of different body builds, miscellaneous nutritional habits, various diagnostic criteria, and diverse analytical methodologies. To date, most studies on this subject have been cross-sectional. Although case-control studies cannot replace cohort studies in identifying causal factors associated with abnormal weight in children, they are an important alternative strategy, especially in the context of sufficient sample sizes, ethnically homogeneous populations, internationally accepted diagnostic criteria, and advanced statistical methodologies.

With this in mind, we aimed to test the hypothesis that factors selected and compiled by machine learning algorithms can significantly predict the risk for underweight, overweight, and obesity.

METHODS

Study design and participants

We conducted two large cross-sectional surveys in 2020 and 2022 using random sampling techniques, collecting demographic and health-relevant data from children aged 3–14 from schools in Beijing and Tangshan.

Specifically, we collected our data in two waves using stratified cluster sampling methods. The first collection wave lasted from September to December 2020, with preschool-aged children being sampled from four out of sixteen districts in Beijing and two out of seven districts in Tangshan. Five kindergartens were selected within each district for a total of 30. The second wave occurred in January 2022 in Pinggu District, Beijing, among eight primary and 18 junior high schools.

Data collection and variable definition

At baseline, physical assessments included standardised measurements of height (in centimeters, rounded to the nearest single decimal place) and weight (in kilograms, rounded to the nearest single decimal place) carried out by health practitioners in selected kindergartens and schools. The same measuring equipment was distributed to each kindergarten or school, with the health practitioners and teachers in charge being trained to ensure that measurement methods, procedures, readings, and biases were as consistent as possible.

We used structured questionnaires to collect information on five dimensions: demographic characteristics (date of birth, age, sex, height, and weight), fetal and early life factors (gestational week, mode of delivery, pregnancy and delivery order, twin birth, length and weight at birth, infancy feeding, breastfeeding duration, and time to add solid food), lifestyle-related factors (sedentary time, screen time, time of outdoor activities, bedtime, eating speed and weekly intake frequencies of sweet food, night meals and fast food), health status (food or drug allergies, number of dental caries, and chronic illnesses), and family information (family income, parental height and weight, parental education level, and parental reproductive age) (Table S1 in the Online Supplementary Document). We selected all variables based on expert knowledge and available literature on potential factors responsible for childhood underweight, overweight and obesity, to identify suspected or established factors associated with malnutrition.

We assessed the reliability and validity of self-designed questionnaires by initially distributing 200 samples before formal distributions. The reliability coefficient (α) was over 0.85 for both questionnaires.

Quality control

Prior to data collection, health practitioners and teachers-in-charge from selected kindergartens and schools were trained about survey procedures and items in questionnaires, and they assisted parents or guardians of participating children during surveys. The teachers contacted the parents or guardians in case of missing data or abnormal values.

Definition of weight metrics

Body mass index (BMI) was calculated as weight in kilograms divided by height in meters squared (kg/m2). We used children’s age- and sex-standardised BMI z-scores to classify the different weight status categories, which we had calculated as the deviation of an individual’s BMI from the population mean, expressed in standard deviations (SDs). Based on the 2006 WHO growth standards for children aged 0–5 years [15] and the 2007 WHO growth reference for school-aged children aged 5–19 years (61–228 months) [16], we defined underweight as a BMI z-score<−2 SD, overweight as a BMI z-score>+1 SD, and obesity as a BMI z-score>+2 SD among children.

Statistical analyses

We used R, version 4.3.2 (R Core Team, Vienna, Austria) for our statistical analyses. We removed variables with over 30% missing data from analyses and used the multiple imputation method for variables with less than 30% missing data. After filling in missing data using the ‘R-MICE’ package, we applied propensity score matching to balance sex between children with normal weight and those classified as underweight, overweight, or obese. We randomly divided the data into the training set (70%) and testing set (30%) for cross-validation. We then conducted a between-group comparison using the t-test for normally distributed continuous variables (expressed as means and standard deviations (SDs)), the rank sum test for skewed continuous variables (expressed as medians and interquartile ranges (IQR)), and the χ22) test for categorical variables (expressed as counts and percentages).

We employed logistic regression and three ensemble learning algorithms – decision tree, random forest, and gradient boosting machine – to model variables under investigation associated with childhood underweight, overweight, and obesity. There were advantages and disadvantages to all three ensemble learning algorithms, and the impact of hyperparameter tuning on the performance of these algorithms varied (Tables S2 and S3 in the Online Supplementary Document). We assessed the optimal algorithm with the best performance by accuracy (i.e. prediction of correct outcomes as a percentage of the total sample), Brier score (i.e. the mean of the squared difference between the observed and predicted event rates), area under the receiver operating characteristic curve (AUROC) (derived from plotting sensitivity vs. 1-specificity), and decision curve analysis (for estimating the net benefit of a predictive model), as well as the mean absolute error, mean squared error, area over the regression error characteristic curve, and area over the regression receiver operating characteristic. We selected the algorithm with the best overall performance across these metrics. Additionally, under the optimal algorithm, we assessed the importance of factors under investigation and ranked them in descending order for the prediction of underweight, overweight, and obesity. We determined the optimal number of top factors by evaluating cumulative performance across AUROC, accuracy, and precision. Generally, with the addition of important factors, these metrics gradually improve until plateauing, which then forms the optimal set of features. Furthermore, to facilitate the interpretation of factors finally determined via the optimal algorithm, we implemented a logistic regression model for underweight, overweight, and obesity, and expressed effect-size estimates as odds ratio (OR) and 95% confidence interval (CI). Finally, we developed a user-friendly Shiny web application to calculate the predicted probability of childhood underweight, overweight, and obesity. We analysed the data between January and March 2024.

Ethical considerations

This study included two waves, and we separately obtained ethical approval from the Ethics Committee of the China-Japan Friendship Hospital and Beijing University of Chinese Medicine. The parents or guardians of the enrolled children provided written informed consent. Data collection procedures adhered to local data protection regulations. We securely stored all data on an access-restricted server, accessible only to our research team members, and we anonymised personally identifiable information by assigning unique identifiers before analyses, ensuring the confidentiality of participants’ identities. We followed the STROBE guidelines in reporting our findings (Table S4 in the Online Supplementary Document).

RESULTS

Baseline characteristics

We enrolled 18 503 children aged 3–14 years from 30 kindergartens and 26 primary and junior high schools in Beijing and Tangshan, of whom 10 579 were of normal weight, 3257 were overweight, 1798 were underweight, and 2869 had obesity (Figure 1, Table 1; Figure S1 in the Online Supplementary Document).

Figure 1.

Figure 1

BMI categories subdivided into age groups of study children. Weight status was defined according to the criteria recommended by the WHO [15,16]. BMI – body mass index.

Table 1.

Baseline characteristics of study children by weight status

Weight status*

All participants, (n = 18 503)
Underweight, (n = 1798)
Normal weight, (n = 10 579)
Overweight, (n = 3257)
Obesity, (n = 2869)
Demographic characteristics





Age (months)†
90 (58–131)
66 (52–85)
83 (56–127)
108 (66–145)
110 (77–139)
Boys‡
9522 (51.5)
938 (52.2)
4835 (45.7)
1734 (53.2)
2015 (70.2)
Foetal and early life factors‡





Gestational age in weeks
39 (38–40)
39 (38–40)
39 (38–40)
39 (38–40)
39 (38–40)
Full-term birth‡





Preterm delivery
1757 (9.5)
172 (9.6)
975 (9.2)
300 (9.2)
310 (10.8)
Normal delivery
16 181 (87.5)
1562 (86.9)
9276 (87.7)
2870 (88.1)
2473 (86.2)
Post-term pregnancy
565 (3.1)
64 (3.6)
328 (3.1)
87 (2.7)
86 (3.0)
Delivery mode‡





Vaginal delivery
9130 (49.3)
939 (52.2)
5533 (52.3)
1451 (44.6)
1207 (42.1)
Caesarean section
9329 (50.4)
853 (47.4)
5024 (47.5)
1794 (55.1)
1658 (57.8)
Forceps delivery
44 (0.2)
6 (0.3)
22 (0.2)
12 (0.4)
4 (0.1)
Pregnancy order†
1 (1–2)
2 (1–2)
1 (1–2)
1 (1–2)
1 (1–2)
Delivery order†
1 (1–2)
1 (1–2)
1 (1–2)
1 (1–1)
1 (1–1)
Twin birth‡
445 (2.4)
39 (2.2)
274 (2.6)
71 (2.2)
61 (2.1)
Birth length in cm†
50 (50–52)
50 (50–52)
50 (50–52)
50 (50–52)
50 (50–52)
Birth weight in kg†
3.35 (3.00–3.60)
3.30 (3.00–3.60)
3.30 (3.00–3.60)
3.40 (3.10–3.70)
3.50 (3.10–3.75)
Infancy feeding‡





Pure breastfeeding
10 501 (56.8)
1060 (59.0)
6046 (57.2)
1813 (55.7)
1582 (55.1)
Partial breastfeeding
6353 (34.3)
599 (33.3)
3624 (34.3)
1163 (35.7)
967 (33.7)
Non-breastfeeding
1649 (8.9)
139 (7.7)
909 (8.6)
281 (8.6)
320 (11.2)
Breastfeeding duration‡





<6 mo
5475 (29.6)
372 (20.7)
2970 (28.1)
1067 (32.8)
1066 (37.2)
6–24 mo
11 248 (60.8)
1182 (65.7)
6546 (61.9)
1962 (60.2)
1558 (54.3)
≥24 mo
1780 (9.6)
244 (13.6)
1063 (10.0)
228 (7.0)
245 (8.5)
Time to add solid food‡





<6 mo
3260 (31.6)
321 (34.5)
1830 (32.2)
598 (31.0)
511 (28.8)
6–9 mo
5327 (51.7)
433 (46.5)
2930 (51.6)
1013 (52.5)
951 (53.5)
≥9 mo
1725 (16.7)
177 (19.0)
916 (16.1)
317 (16.4)
315 (17.7)
Lifestyle-related factors†





Sedentary time (hours per day)
3.86 (2.00–6.29)
2.57 (1.29–4.86)
3.57 (1.71–6.00)
4.43 (2.00–6.71)
4.86 (2.29–6.86)
Screen time (hours per day)
1.00 (0.64–1.57)
1.00 (0.64–1.57)
1.00 (0.64–1.57)
1.29 (0.71–1.71)
1.29 (0.93–2.00)
Outdoor activities (hours per day)
1.29 (1.00–2.00)
1.29 (1.00–2.00)
1.29 (1.00–2.00)
1.29 (1.00–2.00)
1.29 (1.00–2.00)
Bedtime (o’clock PM)
9.50 (9.00–10.00)
9.00 (9.00–10.00)
9.50 (9.00–10.00)
10.00 (9.00–10.00)
10.00 (9.00–10.00)
Eating speed (minutes per meal)
16.67 (13.33–23.33)
18.33 (15.00–25.00)
18.33 (13.33–23.33)
16.67 (13.33–20.00)
16.67 (13.33–20.00)
Fast food intake frequency‡





Every day
5636 (30.5)
885 (49.2)
3524 (33.3)
760 (23.3)
467 (16.3)
3–5 times weekly
3230 (17.5)
481 (26.8)
1902 (18.0)
447 (13.7)
400 (13.9)
1–2 times weekly
5038 (27.2)
220 (12.2)
2653 (25.1)
1065 (32.7)
1100 (38.3)
None or once in a while
4599 (24.9)
212 (11.8)
2500 (23.6)
985 (30.2)
902 (31.4)
Sweet food intake frequency‡





Every day
1699 (9.2)
207 (11.5)
1036 (9.8)
249 (7.6)
207 (7.2)
3–5 times weekly
6313 (34.1)
846 (47.1)
3744 (35.4)
964 (29.6)
759 (26.5)
1–2 times weekly
7941 (42.9)
571 (31.8)
4443 (42.0)
1521 (46.7)
1406 (49.0)
None or once in a while
2550 (13.8)
174 (9.7)
1356 (12.8)
523 (16.1)
497 (17.3)
Night meal intake frequency‡





Every day
6047 (32.7)
312 (17.4)
3231 (30.5)
1270 (39.0)
1234 (43.0)
3–5 times weekly
3702 (20.0)
268 (14.9)
2033 (19.2)
698 (21.4)
703 (24.5)
1–2 times weekly
2952 (16.0)
368 (20.5)
1753 (16.6)
479 (14.7)
352 (12.3)
None or once in a while
5802 (31.4)
850 (47.3)
3562 (33.7)
810 (24.9)
580 (20.2)
Health status





Food allergy‡
1897 (10.3)
213 (11.8)
1045 (9.9)
312 (9.6)
327 (11.4)
Drug allergy‡
775 (4.2)
74 (4.1)
436 (4.1)
123 (3.8)
142 (4.9)
Dental caries†
0 (0–2)
0 (0–2)
1 (0–2)
0 (0–2)
0 (0–2)
Family information†





Maternal reproductive age in years
27.58 (25.25–30.33)
28.75 (26.25–32.06)
27.67 (25.42–30.42)
27.17 (24.92–29.75)
27.00 (24.67–29.50)
Paternal reproductive age in years
28.58 (26.25–31.75)
30.09 (27.25–33.67)
28.59 (26.42–31.83)
28.17 (26.01–31.17)
28.17 (25.84–30.75)
Maternal BMI in kg/m2
22.66 (20.51–25.39)
20.04 (18.20–22.53)
22.46 (20.43–24.97)
23.44 (21.48–26.37)
24.22 (21.88–28.04)
Paternal BMI in kg/m2
25.74 (23.39–28.41)
25.09 (22.86–27.77)
25.25 (23.05–27.76)
26.12 (24.05–29.38)
27.04 (24.62–30.42)
Maternal education level‡





High school degree or below
7366 (39.8)
552 (30.7)
4191 (39.6)
1364 (41.9)
1259 (43.9)
Bachelor’s degree
10 058 (54.4)
937 (52.1)
5812 (54.9)
1763 (54.1)
1546 (53.9)
Master’s degree or above
1079 (5.8)
309 (17.2)
576 (5.4)
130 (4.0)
64 (2.2)
Paternal education level‡





High school degree or below
8447 (45.7)
635 (35.3)
4793 (45.3)
1562 (48.0)
1457 (50.8)
Bachelor’s degree
8832 (47.7)
816 (45.4)
5142 (48.6)
1546 (47.5)
1328 (46.3)
Master’s degree or above
1224 (6.6)
347 (19.3)
644 (6.1)
149 (4.6)
84 (2.9)
Family income in CNY per year‡





<100 000
7695 (41.6)
597 (33.2)
4436 (41.9)
1388 (42.6)
1274 (44.4)
100 000–300 000
7909 (42.7)
679 (37.8)
4521 (42.7)
1425 (43.8)
1284 (44.8)
≥300 000 2899 (15.7) 522 (29.0) 1622 (15.3) 444 (13.6) 311 (10.8)

BMI – body mass index, mo – months

*Weight status was defined according to the criteria recommended by the World Health Organization [15,16].

†Continuous variables presented as median (interquartile range).

‡Categorical variables presented as numbers (%).

Selection of optimal algorithm

Of all algorithms, random forest performed the best, with the AUROC reaching 0.759 for underweight, 0.806 for overweight, and 0.849 for obesity (Figure 2). This was also confirmed statistically by other assessment metrics and visually by the decision curve analysis (Figure S2 in the Online Supplementary Document). We therefore selected the random forest as the optimal algorithm for the prediction of underweight, overweight, and obesity in children aged 3–14 years.

Figure 2.

Figure 2

Predictive performance of 3 ensemble learning algorithms annexed with Logistic regression for childhood weight status. Weight status was defined according to the criteria recommended by the WHO [15,16]. GBM – gradient boosting machine, ROC – the receiver operating characteristic curve.

Determination of optimal important factors

Under the use of the optimal random forest algorithm, we explored the importance of the top 15 factors for childhood underweight, overweight, and obesity (Figure 3). To determine the optimal set of important factors, we assessed the cumulative performance of top factors using AUROC, accuracy, and precision (Table 2). After inspecting the changes in these indicators with the increment of important factors, we found that the cumulative performance of all three outcomes tended to increase and then decrease.

Figure 3.

Figure 3

The ranking of the 15 most important variables related to childhood weight status based on the random forest. Weight status was defined according to the criteria recommended by the WHO [15,16].

Table 2.

AUROC, accuracy, and precision with cumulating number of top 15 factors using random forest algorithm for childhood weight status*

Cumulating number of top 15 factors AUROC Accuracy Precision
Underweight




Maternal BMI
1
0.692
0.644
0.703
Age
2
0.711
0.651
0.675
Paternal BMI
3
0.749
0.688
0.701
Paternal reproductive age
4
0.740
0.675
0.695
Maternal reproductive age
5
0.747
0.673
0.692
Birth weight
6
0.753
0.690
0.705
Sedentary time
7
0.749
0.693
0.710
Outdoor activities
8
0.737
0.680
0.687
Screen time
9
0.739
0.677
0.688
Eating speed
10
0.736
0.666
0.672
Breastfeeding
11
0.741
0.679
0.685
Birth length
12
0.735
0.670
0.675
Bedtime
13
0.750
0.690
0.700
Gestational week
14
0.754
0.693
0.706
Time to add solid food
15
0.747
0.684
0.694
Overweight




Age
1
0.753
0.741
0.785
Fast food intake frequency
2
0.749
0.736
0.776
Maternal BMI
3
0.757
0.727
0.765
Paternal BMI
4
0.764
0.725
0.749
Sedentary time
5
0.775
0.726
0.749
Maternal reproductive age
6
0.774
0.722
0.749
Paternal reproductive age
7
0.777
0.729
0.759
Birth weight
8
0.788
0.724
0.751
Outdoor activities
9
0.788
0.732
0.762
Bedtime
10
0.795
0.735
0.776
Breastfeeding
11
0.799
0.734
0.778
Eating speed
12
0.805
0.736
0.780
Screen time
13
0.803
0.737
0.780
Night meal intake frequency
14
0.799
0.734
0.778
Birth length
15
0.804
0.744
0.789
Obesity




Age
1
0.773
0.769
0.786
Fast food intake frequency
2
0.784
0.772
0.776
Maternal BMI
3
0.808
0.755
0.761
Paternal BMI
4
0.828
0.771
0.779
Sedentary time
5
0.829
0.771
0.781
Maternal reproductive age
6
0.832
0.771
0.779
Bedtime
7
0.835
0.763
0.779
Outdoor activities
8
0.835
0.770
0.790
Paternal reproductive age
9
0.837
0.776
0.795
Birth weight
10
0.836
0.779
0.789
Eating speed
11
0.844
0.780
0.793
Breastfeeding
12
0.842
0.782
0.797
Screen time
13
0.845
0.779
0.792
Night meal intake frequency
14
0.843
0.778
0.791
Chronic diseases 15 0.844 0.779 0.794

AUROC – area under the receiver operating characteristic curve, BMI – body mass index

*Weight status was defined according to the criteria recommended by the World Health Organization [15,16].

For underweight, the optimal set of six factors included maternal BMI, age, paternal BMI, maternal reproductive age, paternal reproductive age, and birth weight; for overweight, the optimal set comprised five factors: age, fast food intake frequency, maternal BMI, paternal BMI, and sedentary time; for obesity, the optimal set included six factors: age, fast food intake frequency, maternal BMI, paternal BMI, sedentary time, and maternal reproductive age.

Risk quantification

We used logistic regression to improve the clinical applicability of the optimal sets of important factors. Each factor in the three optimal sets was significantly associated with the risk of underweight, overweight, or obesity in children aged 3–14 years, at a significance level of 0.1% (Table 3).

Table 3.

The risk prediction of top factors for childhood weight status using the logistic regression model*

OR (95% CI)† P-value
Underweight (six factors)


Maternal BMI


Underweight (BMI < 18.5 kg/m2)
5.084 (4.459–5.796)
<0.001
Normal (BMI 18.5–25 kg/m2)
ref

Overweight or obesity (BMI ≥ 25 kg/m2)
0.579 (0.485–0.691)
<0.001
Age


Pre-school children (3–6 years)
ref

School-age children (7–14 years)
0.340 (0.304–0.381)
<0.001
Paternal BMI


Underweight (BMI < 18.5 kg/m2)
3.047 (2.358–3.936)
<0.001
Normal (BMI 18.5–25 kg/m2)
ref

Overweight or obesity (BMI ≥ 25 kg/m2)
1.016 (0.909–1.135)
0.781
Paternal reproductive age in years‡


24–28
ref

<24 or ≥28
1.692 (1.499–1.886)
<0.001
Maternal reproductive age in years‡


21–27
ref

<21 or ≥27
1.607 (1.441–1.792)
<0.001
Birth weight


Normal (≥2.5 kg)
ref

Low birth weight (<2.5 kg)
1.522 (1.188–1.950)
0.001
Overweight (five factors)


Age


Pre-school children (3–6 years)
ref

School-age children (7–14 years)
1.848 (1.703–2.004)
<0.001
Fast food intake frequency


None or once in a while
ref

1–2 times weekly
1.090 (0.957–1.240)
0.193
3–5 times weekly
1.861 (1.674–1.240)
<0.001
Every day
1.827 (1.641–2.034)
<0.001
Maternal BMI


Underweight (BMI < 18.5 kg/m2)
0.471 (0.377–0.589)
<0.001
Normal (BMI 18.5–25 kg/m2)
ref

Overweight or obesity (BMI ≥ 25 kg/m2)
1.507 (1.368–1.661)
<0.001
Paternal BMI


Underweight (BMI < 18.5 kg/m2)
1.191 (0.877–1.618)
0.263
Normal (BMI 18.5–25 kg/m2)
ref

Overweight or obesity (BMI ≥ 25 kg/m2)
1.400 (1.281–1.529)
<0.001
Sedentary time


<2 h per day
ref

≥2 h per day
1.223 (1.114–1.344)
<0.001
Obesity (six factors)


Age


Pre-school children (3–6 years)
ref

School-age children (7–14 years)
2.745 (2.506–3.006)
<0.001
Fast food intake frequency


None or once in a while
ref

1–2 times weekly
1.587 (1.373–1.834)
<0.001
3–5 times weekly
3.129 (2.777–3.526)
<0.001
Every day
2.723 (2.408–3.079)
<0.001
Maternal BMI


Underweight (BMI < 18.5 kg/m2)
0.557 (0.439–0.706)
<0.001
Normal (BMI 18.5–25 kg/m2)
ref

Overweight or obesity (BMI ≥ 25 kg/m2)
2.013 (1.819–0.706)
<0.001
Paternal BMI


Underweight (BMI < 18.5 kg/m2)
1.057 (0.725–1.540)
0.774
Normal (BMI 18.5–25 kg/m2)
ref

Overweight or obesity (BMI ≥ 25 kg/m2)
1.894 (1.716–1.540)
<0.001
Sedentary time


<2 h per day
ref

≥2 h per day
1.415 (1.278–1.567)
<0.001
Maternal reproductive age in years‡


27–40
ref

<27 or ≥40 1.389 (1.279–1.509) <0.001

BMI – body mass index, CI – confidence interval, OR – odds ratio, ref – reference

*Weight status was defined according to the criteria recommended by the World Health Organization [15,16].

†ORs were adjusted for age, sex, education, and family income.

‡Subgroup thresholds for parental reproductive age are derived from the restricted cubic spline curves.

Convenient application for clinical utility

We implemented the final prediction model into a web application to enhance its usability in clinical settings (Figure 4). By entering the actual values required for the optimal algorithm, the application automatically calculates the probability of being underweight, overweight, or obese [17].

Figure 4.

Figure 4

The web application [17] for the probability of childhood underweight, overweight, and obesity.

DISCUSSION

We aimed to identify factors significantly associated with childhood underweight, overweight, and obesity by using machine learning algorithms among children aged 3–14 years from Beijing and Tangshan. Through comprehensive exploration, random forest consistently outperformed the other algorithms in predicting the three weight metrics, achieving an accuracy of over 75%. Moreover, the optimal set of important factors demonstrated a comparable performance to all factors assessed. To our knowledge, this is the first study to use artificial intelligence techniques to identify risk profiles for underweight, overweight, and obesity in Chinese children.

Abnormal body weight is highly prevalent in both adults and children. In most cases, childhood obesity often persists into adulthood [18,19], which has led to increased attention on identifying the factors responsible for overweight and obesity in children. At the same time, childhood underweight remains a major public health issue, with our results showing that approximately one in ten children are underweight. Despite over two decades of intensive research aimed at identifying factors contributing to abnormal weight status, there is no clear consensus on the number and specific factors involved in the predisposition to underweight, overweight, or obesity in children. A major challenge in defining the risk profiles for abnormal weight is statistical methodology, especially with the rise of artificial intelligence, which is set to revolutionise medicinal practice, improving the efficiency and accuracy of disease diagnosis [20,21]. To gain deeper insights, we employed three ensemble learning algorithms, alongside the traditional logistic regression model, to determine which algorithm performed best and to identify the minimal set of factors sufficient to predict childhood underweight, overweight, and obesity.

After a comprehensive comparison of various performance metrics, random forest consistently outperformed the others in predicting childhood underweight, overweight, and obesity. For instance, in terms of accuracy and AUROC, random forest showed superior performance compared to decision tree and gradient boosting machine algorithms, while decision tree ranked highest for Brier’s score across all three weight categories. While the AUROC is a commonly used metric for evaluating the performance of classification models, providing a thorough assessment of a model’s sensitivity and specificity trade-offs at different thresholds, evidence suggests that Brier score may be misleading in case of imbalanced data sets, as it might reflect good overall performance, but fail to capture poor performance in the minority class [22]. Moreover, random forest has been widely adopted in the literature due to its ability to improve prediction accuracy by combining a pre-specified number of decision trees and its effectiveness at handling high-dimensional data with numerous features and reducing the risk of overfitting by constructing multiple decision trees and averaging their results [2327]. During the training process, each decision tree in a random forest considers only a random subset of the data, allowing the algorithm to capture various combinations of features, which improves generalisation ability, stability, and robustness. Additionally, random forest can be easily parallelised, meaning that multi-core processors or distributed computing resources can be used to speed up the training process when working with large data sets. These advantages have led to random forest being widely adopted in clinical settings.

Using random forest, we identified a minimum set of contributing factors for childhood underweight, overweight, and obesity, with comparable prediction performance to models that included all factors under investigation. Notably, three factors – child’s age, parental BMI, and maternal reproductive age – were commonly associated with the risk of underweight, overweight, and obesity in children aged 3–14 years. Additionally, birthweight was exclusively associated with underweight, while fast food intake frequency and sedentary time were specifically linked to overweight and obesity. Maternal reproductive age was found to be exclusively associated with childhood obesity compared to overweight. Consistent with previous studies, parental weight status appears to influence that of their children [2831]. Research shows that a child with one obese parent is three times more likely to become obese in adulthood, and if both parents are obese, the child’s risk increases 10-fold [32]. Conversely, the risk of being underweight is higher in children with thin parents [33]. Regarding parental reproductive age, we used restricted cubic spline curves in our logistic regression analysis to determine cut-off values, showing that both very high or very low maternal age at childbearing were associated with suboptimal weight status in children. Similarly, inappropriate childbearing age was reported to increase the risk of overweight and obesity in children and interact with parental weight status [34]. Our findings suggest that the risk of overweight and obesity in offspring generally increases and then decreases with parental reproductive age, with the highest risk observed in fathers aged 24–30 years and mothers aged 24–28 years, as confirmed by our logistic regression results. In addition, variations in risk profiles for different weight statuses were found, particularly for childhood underweight, and the prevalence of malnutrition in children with low birth weight was significantly higher than in those with normal birth weight [35]. For childhood overweight and obesity, frequent fast food consumption [36] and long sedentary hours [37,38] are well-established risk factors.

In short, the most important risk factors identified in this study aligned with those found in prior studies, demonstrating the interpretability, accuracy, and robustness of our findings. Furthermore, by employing machine learning techniques, we were able to synthesise data on early childhood factors, lifestyles, and family conditions. We also ranked the contribution of relevant factors and developed a prediction tool based on a minimal set of contributing factors. This extends previous research and offers valuable insights for clinical decision-making and individualised prevention and intervention strategies for children at risk for unhealthy weight status, which are of critical importance for public health.

Limitations

This study has several limitations. First, while BMI is an important indicator recommended by the WHO for assessing nutrition status, other body size indices, such as waist-to-hip ratio, waist-to-height ratio, and body roundness index [39], may better reflect fat distributions. However, data on waist and hip circumference were not available for this study. Second, due to the lack of uniform standards for weight status in Chinese children under five years, we applied the WHO-recommended BMI z-score thresholds to define underweight, overweight, and obesity in children. Whether this criterion is appropriate for Chinese children remains largely unknown. Third, the causes of suboptimal weight status in children are multifactorial, but only 31 factors were included in the machine learning algorithms. Most factors in this study were based on self-reported questionnaires, without the inclusion of biochemical markers, limiting the possibility for further analyses of mediating factors or mechanisms. Finally, all participants were sourced from Chinese children in Beijing and Tangshan, so the generalisability of our findings to other countries, regions, and ethnicities is limited due to the absence of external validation.

CONCLUSIONS

Our findings suggest that random forest is the best ensemble learning algorithm for predicting underweight, overweight, and obesity in children aged 3–14 years. We identified the optimal set of significant factors for each weight status and compiled these into a web application to facilitate the broader application of this study. Our findings provide insights into the risk profiles associated with suboptimal weight status in Chinese children and highlight the potential clinical applicability of the developed model in identifying high-risk children with abnormal weight status.

Additional material

jogh-15-04013-s001.pdf (675.9KB, pdf)

Acknowledgements

We are grateful to all participating children and their parents or guardians for their positive cooperation, to the kindergarten or school teachers and health practitioners for their generous help, and to all the researchers for their hard work. We thank the anonymous reviewers whose comments and suggestions helped to improve and clarify this manuscript.

Disclaimer: The views expressed in the submitted article are the author’s own and not an official position of the institution or funder.

Ethics statement: This study was approved by the Ethics Committee of China-Japan Friendship Hospital (2018-93-K67) and Beijing University of Chinese Medicine (2022BZYLL0906). Written informed consent was provided by the parents or guardians of children enrolled.

Data availability: The dataset employed in this study contains identifiable minor data and private data therefore it cannot be fully disclosed. The de-identified dataset is available upon request from the corresponding author.

Footnotes

Funding: This work was supported by the Public Service Development and Reform Pilot Project of Beijing Medical Research Institute, the Capital’s Funds for Health Improvement and Research (grant number: 2024-2-1133), and the National Natural Science Foundation of China (grant number: 81970042).

Authorship contributions: ZZ and WN designed the study. XZ, QW, and ZZ obtained statutory and ethics approvals and contributed to data acquisition. XZ and QW trained the teachers and doctors responsible for the data collection. KC, FZ, and WN performed the statistical analysis. KC wrote the first draft. ZZ and WN are the study guarantors. All authors contributed to the article and approved the submitted version.

Disclosure of interest: The authors completed the ICMJE Disclosure of Interest Form (available upon request from the corresponding author) and disclose no relevant interest.

REFERENCES

  • 1.Concern Worldwide. Malnutrition now 'a major global public health threat'. 2019. Available: https://www.concern.net/press-releases/malnutrition-now-major-global-public-health-threat. Accessed: 23 December 2024.
  • 2.United Nations Children’s Fund, World Health Organization. Child Malnutrition. 2023. Available: https://data.unicef.org/topic/nutrition/malnutrition/. Accessed: 23 December 2024.
  • 3.National Health Commission of the People’s Republic of China. [Report on Chinese 0–6 years old child nutrition development]. 2012. Available: https://www.chinanutri.cn/yyjkzxpt/yyjkkpzx/xcclk/xinxi/201501/t20150115_109818.html. Accessed: 23 December 2024. Chinese.
  • 4.Lister NB, Baur LA, Felix JF, Hill AJ, Marcus C, Reinehr T, et al. Child and adolescent obesity. Nat Rev Dis Primers. 2023;9:24. 10.1038/s41572-023-00435-4 [DOI] [PubMed] [Google Scholar]
  • 5.GBD 2021 Diabetes Collaborators Global, regional, and national burden of diabetes from 1990 to 2021, with projections of prevalence to 2050: a systematic analysis for the Global Burden of Disease Study 2021. Lancet. 2023;402:203–34. 10.1016/S0140-6736(23)01301-6 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6.Pan XF, Wang L, Pan A.Epidemiology and determinants of obesity in China. Lancet Diabetes Endocrinol. 2021;9:373–92. 10.1016/S2213-8587(21)00045-0 [DOI] [PubMed] [Google Scholar]
  • 7.González-Domínguez Á, Jurado-Sumariva L, Domínguez-Riscart J, Saez-Benito A, González-Domínguez R.Parental obesity predisposes to exacerbated metabolic and inflammatory disturbances in childhood obesity within the framework of an altered profile of trace elements. Nutr Diabetes. 2024;14:2. 10.1038/s41387-024-00258-6 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 8.Lecorguillé M, Schipper M, O’Donnell A, Aubert AM, Tafflet M, Gassama M, et al. Parental lifestyle patterns around pregnancy and risk of childhood obesity in four European birth cohort studies. Lancet Glob Health. 2023;11:S5. 10.1016/S2214-109X(23)00090-6 [DOI] [PubMed] [Google Scholar]
  • 9.Bertrand-Protat S, Chen J, Jonquoy A, Frayon S, Thu Win Tin S, Ravuvu A, et al. Prevalence, causes and contexts of childhood overweight and obesity in the Pacific region: a scoping review. Open Res Eur. 2023;3:52. 10.12688/openreseurope.15361.2 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10.Anderson SE, Andridge R, Whitaker RC.Bedtime in Preschool-Aged Children and Risk for Adolescent Obesity. J Pediatr. 2016;176:17–22. 10.1016/j.jpeds.2016.06.005 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 11.Scharf RJ, DeBoer MD.Sleep timing and longitudinal weight gain in 4- and 5-year-old children. Pediatr Obes. 2015;10:141–8. 10.1111/ijpo.229 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 12.Yang L, Han S, Miao C, Lou H, Gao G, Lou X, et al. Associations of multiple sleep dimensions with overall and abdominal obesity among children and adolescents: a population-based cross-sectional study. Int J Obes (Lond). 2023;47:817–24. 10.1038/s41366-023-01324-2 [DOI] [PubMed] [Google Scholar]
  • 13.Ferranti R, Marventano S, Castellano S, Giogianni G, Nolfo F, Rametta S, et al. Sleep quality and duration is related with diet and obesity in young adolescent living in Sicily, Southern Italy. Sleep Sci. 2016;9:117–22. 10.1016/j.slsci.2016.04.003 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 14.Venkatapoorna CMK, Ayine P, Selvaraju V, Parra EP, Koenigs T, Babu JR, et al. The relationship between obesity and sleep timing behavior, television exposure, and dinnertime among elementary school-age children. J Clin Sleep Med. 2020;16:129–36. 10.5664/jcsm.8080 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15.World Health Organization. Body mass index-for-age (BMI-for-age). 2006. Available: https://www.who.int/toolkits/child-growth-standards/standards/body-mass-index-for-age-bmi-for-age. Accessed: 23 December 2024.
  • 16.World Health Organization. BMI-for-age (5–19 years). 2007. Available: https://www.who.int/tools/growth-reference-data-for-5to19-years/indicators/bmi-for-age. Accessed: 23 December 2024.
  • 17.Capital Institute of Pediatrics. Weight Status Predictor. 2024. Available: https://niuwenquan.shinyapps.io/weightcal/. Accessed: 23 December 2024.
  • 18.The Lancet Diabetes Endocrinology Childhood obesity: a growing pandemic. Lancet Diabetes Endocrinol. 2022;10:1. 10.1016/S2213-8587(21)00314-4 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 19.Yuan C, Dong Y, Chen H, Ma L, Jia L, Luo J, et al. Determinants of childhood obesity in China. Lancet Public Health. 2024;9:e1105–14. 10.1016/S2468-2667(24)00246-9 [DOI] [PubMed] [Google Scholar]
  • 20.Xiao N, Ding Y, Cui B, Li R, Qu X, Zhou H, et al. Navigating obesity: A comprehensive review of epidemiology, pathophysiology, complications and management strategies. Inn Med. 2024;2:100090. 10.59717/j.xinn-med.2024.100090 [DOI] [Google Scholar]
  • 21.Torres-Martos Á, Anguita-Ruiz A, Bustos-Aibar M, Ramírez-Mena A, Arteaga M, Bueno G, et al. Multiomics and eXplainable artificial intelligence for decision support in insulin resistance early diagnosis: A pediatric population-based longitudinal study. Artif Intell Med. 2024;156:102962. 10.1016/j.artmed.2024.102962 [DOI] [PubMed] [Google Scholar]
  • 22.Yang W, Jiang J, Schnellinger EM, Kimmel SE, Guo W.Modified Brier score for evaluating prediction accuracy for binary outcomes. Stat Methods Med Res. 2022;31:2287–96. 10.1177/09622802221122391 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 23.Attanasi ED, Coburn TC. Random Forest. In: Sagar BSD, Cheng Q, McKinley J, Agterberg A, editors. Encyclopedia of Mathematical Geosciences. Berlin, Germany: Springer; 2021. Available: https://link.springer.com/referenceworkentry/10.1007/978-3-030-26050-7_265-1. Accessed: 23 December 2024. [Google Scholar]
  • 24.Fawagreh K, Gaber MM, Elyan E.Random forests: from early developments to recent advancements. Syst Sci Control Eng. 2014;2:602–9. 10.1080/21642583.2014.956265 [DOI] [Google Scholar]
  • 25.Iranzad R, Liu X.A review of random forest-based feature selection methods for data science education and applications. Int J Data Sci Anal. 2024;24:1–15. 10.1007/s41060-024-00509-w [DOI] [Google Scholar]
  • 26.Parmar A, Katariya R, Patel V. A Review on Random Forest: An Ensemble Classifier. In: Hemanth J, Fernando X, Lafata P, Baig Z, editors. International Conference on Intelligent Data Communication Technologies and Internet of Things. Lecture Notes on Data Engineering and Communications Technologies. Cham, Berlin, Germany: Springer, Cham; 2018. p.758–753. [Google Scholar]
  • 27.Schonlau M, Zou RY.The random forest algorithm for statistical learning. Stata J. 2020;20:3–29. 10.1177/1536867X20909688 [DOI] [Google Scholar]
  • 28.Wang Y, Min J, Khuri J, Li M.A Systematic Examination of the Association between Parental and Child Obesity across Countries. Adv Nutr. 2017;8:436–48. 10.3945/an.116.013235 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 29.Olden K.A bad start for socioeconomically disadvantaged children. Environ Health Perspect. 1996;104:462–3. 10.1289/ehp.96104462 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 30.Andriani H, Liao C-Y, Kuo H-W.Parental weight changes as key predictors of child weight changes. BMC Public Health. 2015;15:645. 10.1186/s12889-015-2005-x [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 31.Lindholm A, Almquist-Tangen G, Alm B, Bremander A, Dahlgren J, Roswall J, et al. Early rapid weight gain, parental body mass index and the association with an increased waist-to-height ratio at 5 years of age. PLoS One. 2022;17:e0273442. 10.1371/journal.pone.0273442 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 32.Lin X, Li H.Obesity: Epidemiology, Pathophysiology, and Therapeutics. Front Endocrinol (Lausanne). 2021;12:706978. 10.3389/fendo.2021.706978 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 33.Wells JC, Sawaya AL, Wibaek R, Mwangome M, Poullas MS, Yajnik CS, et al. The double burden of malnutrition: aetiological pathways and consequences for health. Lancet. 2020;395:75–88. 10.1016/S0140-6736(19)32472-9 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 34.Deng R, Lou K, Zhou SL, Li XX, Zou ZY, Ma YH, et al. [Relationship between parental reproductive age and the risk of overweight and obesity in offspring]. Zhonghua Yu Fang Yi Xue Za Zhi. 2022;56:583–9. Chinese. [DOI] [PubMed] [Google Scholar]
  • 35.Rahman MS, Howlader T, Masud MS, Rahman ML.Association of Low-Birth Weight with Malnutrition in Children under Five Years in Bangladesh: Do Mother’s Education, Socio-Economic Status, and Birth Interval Matter? PLoS One. 2016;11:e0157814. 10.1371/journal.pone.0157814 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 36.Jia P, Luo M, Li Y, Zheng JS, Xiao Q, Luo J.Fast-food restaurant, unhealthy eating, and childhood obesity: A systematic review and meta-analysis. Obes Rev. 2021;22:e12944. 10.1111/obr.12944 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 37.Barnett TA, Kelly AS, Young DR, Perry CK, Pratt CA, Edwards NM, et al. Sedentary Behaviors in Today’s Youth: Approaches to the Prevention and Management of Childhood Obesity: A Scientific Statement From the American Heart Association. Circulation. 2018;138:e142–59. 10.1161/CIR.0000000000000591 [DOI] [PubMed] [Google Scholar]
  • 38.Russell CG, Taki S, Laws R, Azadi L, Campbell KJ, Elliott R, et al. Effects of parent and child behaviours on overweight and obesity in infants and young children from disadvantaged backgrounds: systematic review with narrative synthesis. BMC Public Health. 2016;16:151. 10.1186/s12889-016-2801-y [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 39.Zhang X, Ma N, Lin Q, Chen K, Zheng F, Wu J, et al. Body Roundness Index and All-Cause Mortality Among US Adults. JAMA Netw Open. 2024;7:e2415051. 10.1001/jamanetworkopen.2024.15051 [DOI] [PMC free article] [PubMed] [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

jogh-15-04013-s001.pdf (675.9KB, pdf)

Articles from Journal of Global Health are provided here courtesy of International Society for Global Health

RESOURCES