Abstract
Objective
To develop an accurate and interpretable height estimation model for children and adolescents using body composition variables and explainable artificial intelligence approaches.
Methods
A light gradient boosting machine (LightGBM) model was trained on a dataset of 278,301 measurements from 54,374 children and adolescents aged 6–18 years. The model incorporated anthropometric and body composition measures. Model interpretability was enhanced through feature importance analysis, Shapley additive explanations, partial dependence plots, and accumulated local effects.
Results
The models achieved high accuracy with mean absolute percentage errors of 1.64% and 1.63% for boys and girls, respectively. Soft lean mass (SLM), body fat mass percentage (BFMP), skeletal muscle mass, and skeletal muscle mass percentage were consistently identified as key factors influencing height estimation. Analysis revealed a positive correlation between SLM and estimated height, while BFMP exhibited an inverse relationship with height projections.
Conclusion
These findings provide valuable insights into the relationship between body composition and height, underlining the potential of body composition variables as accurate height predictors in children and adolescents. The model's interpretability and accuracy make it a promising tool for pediatric growth assessment and monitoring.
Keywords: Body composition big data, children and adolescents, explainable AI, height estimation, machine-learning
Introduction
During childhood and adolescence, the relationship between body composition and biological growth is a significant aspect of pediatric health and development. Strong correlations between various body composition variables and height growth have been established.1,2 For instance, children and adolescents with obesity tend to have a shorter final adult height than those with normal weight.3,4,5,6 This association has been attributed to the early onset of puberty in children with high body fat, which is generally linked to a shorter final adult height.7,8,9,10,11,12 Furthermore, adequate protein intake is essential for normal growth and development.13,14,15,16
Despite the well-established relationship between body composition and height growth, no comprehensive model has integrated multiple body composition variables to predict and analyze height in children and adolescents. This gap can be attributed to the lack of large-scale anthropometric and body composition data and of statistical approaches capable of processing and analyzing such data effectively. Thus, how body composition variables can collectively explain height growth remains an open question.
This study addressed this question by developing a robust height estimation model using biometric big data from children and adolescents, with body composition variables as predictors. We used an extensive dataset of 278,301 measurements (145,292 for boys and 133,009 for girls) from 54,374 children and adolescents (28,592 boys and 25,782 girls). The predictor variables included an anthropometric measurement (weight), body composition variables (protein mass, soft lean mass (SLM), body fat mass (BFM), and skeletal muscle mass (SMM)), and the ratio of each body composition variable to total body weight.
To provide a systematic explanation of the estimated model, explainable artificial intelligence (XAI) approaches were applied.17,18,19,20,21,22,23 Specifically, feature importance, SHapley Additive exPlanations (SHAP), partial dependence plots (PDPs), and accumulated local effect (ALE) plots were used to facilitate an in-depth investigation of the relationship between body composition and height. These methods helped us identify the most influential body composition variables in height estimation models, understand the magnitude and direction of each variable's effect on the estimated height, and visualize the marginal effect of each variable on the estimated height while considering the interactions and nonlinear relationships between variables.
The results of this study advance our understanding of the complex relationships between body composition and growth in children and adolescents. The large-scale dataset provides a unique opportunity to build accurate and robust AI models for height estimation in this population, and interpretable models of this kind can reveal the factors that influence height growth during this critical period of development. These findings have implications for pediatric health assessment, growth monitoring, and interventions aimed at promoting optimal growth and development in children and adolescents.
Materials and methods
Dataset
The dataset analyzed in this study was obtained from the GP Cohort Study, a mixed longitudinal investigation conducted by Global Prediction Co., Ltd (GP) in Gwangmyeong City, Republic of Korea from January 2013 to January 2024. GP specializes in growth research utilizing children's biometric data and has received clinical Good Manufacturing Practices accreditation from the Ministry of Food and Drug Safety for its growth testing software. The GP Cohort Study encompasses elementary, middle, and high school students (aged 7–18 years) in Gyeonggi Province, with approximately 35 schools participating each year and data collection occurring biannually through school visits.
Anthropometric and body composition measurements were conducted using an octopolar multifrequency bioelectrical impedance analyzer (InBody J10 and J30; InBody Co., Ltd., Seoul, Korea). Height measurements adhered to CDC guidelines, and body composition examinations were carried out following the manual provided by InBody Co., Ltd. Highly trained personnel conducted all measurements in accordance with established standard operating procedures.
The dataset analyzed in this study consisted of 278,301 measurements collected from 54,374 children and adolescents (28,592 boys and 25,782 girls). Each measurement record contains 12 attributes: basic information (age and sex), anthropometric measurements (height and weight), and body composition variables (protein mass, SLM, BFM, and SMM, and their respective ratios to total body weight). The target variable is height at the time of measurement, and the remaining variables serve as input features for the machine-learning algorithms. The variables, their descriptions, units, and data types are presented in Table 1.
Table 1.
Variable descriptions.
Variables | Description | Unit | Type |
---|---|---|---|
Height | Height measurement | cm | numeric |
Sex | Male or female | - | binary |
Age | Age in months | months | numeric |
Weight | Weight measurement | kg | numeric |
Protein | Protein mass | kg | numeric |
SLM | Soft lean mass | kg | numeric |
BFM | Body fat mass | kg | numeric |
SMM | Skeletal muscle mass | kg | numeric |
PMP | Protein mass / Weight * 100 | % | numeric |
SLMP | SLM / Weight * 100 | % | numeric |
BFMP | BFM / Weight * 100 | % | numeric |
SMMP | SMM / Weight * 100 | % | numeric |
Notes. SLM: soft lean mass; BFM: body fat mass; SMM: skeletal muscle mass; PMP: protein mass percentage; SLMP: soft lean mass percentage; BFMP: body fat mass percentage; SMMP: skeletal muscle mass percentage.
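The four percentage variables in Table 1 are simple ratios of each body composition mass to body weight. A minimal sketch of that derivation in Python, assuming one record is held in a small dataclass; the record values below are hypothetical and not drawn from the study data:

```python
from dataclasses import dataclass


@dataclass
class Measurement:
    """One hypothetical measurement record; all masses in kg (Table 1 units)."""
    weight: float
    protein: float
    slm: float
    bfm: float
    smm: float


def percentage_features(m: Measurement) -> dict:
    """Return PMP, SLMP, BFMP, and SMMP as percentages of body weight."""
    if m.weight <= 0:
        raise ValueError("weight must be positive")
    return {
        "PMP": m.protein / m.weight * 100,
        "SLMP": m.slm / m.weight * 100,
        "BFMP": m.bfm / m.weight * 100,
        "SMMP": m.smm / m.weight * 100,
    }


# Hypothetical record: a 40 kg child with 10 kg of body fat.
record = Measurement(weight=40.0, protein=6.0, slm=28.0, bfm=10.0, smm=16.0)
print({name: round(value, 1) for name, value in percentage_features(record).items()})
```

For this hypothetical record, BFMP is 10 / 40 × 100 = 25%.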
Preprocessing
Based on preliminary data analysis and clinical considerations, data preprocessing included several key steps to ensure data quality and consistency. After converting data types and unifying formats, we removed records with missing values and calculated age in months, in line with clinical practice in growth assessment. Weight measurements were validated against body composition components using age-specific tolerance levels, and records showing height decreases were filtered using defined thresholds. Finally, outliers were detected separately for each sex and month-of-age group using a rolling window approach, with window sizes varying according to sample density. Detailed preprocessing criteria and thresholds are provided in Appendix A.
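The actual criteria are given in Appendix A and are not reproduced here. The sketch below only illustrates the general shape of two of the steps: computing completed months of age, and flagging outliers within each sex and month group by pooling neighboring months until a minimum sample size is reached. The Tukey-fence rule and the parameters min_count, max_half_width, and k are hypothetical choices, not the study's thresholds.

```python
from datetime import date
from statistics import quantiles


def age_in_months(birth: date, measured: date) -> int:
    """Completed months of age on the measurement date."""
    months = (measured.year - birth.year) * 12 + (measured.month - birth.month)
    if measured.day < birth.day:
        months -= 1  # the current month is not yet complete
    return months


def flag_outliers(records, min_count=30, max_half_width=6, k=1.5):
    """Flag outlying values within sex-and-month groups (hypothetical rule).

    records: list of (sex, age_months, value) tuples.
    For each (sex, month) group, same-sex values from months within +/- w are
    pooled, widening w from 0 until at least min_count values are available
    or w reaches max_half_width (a wider window where data are sparse).
    Values outside the Tukey fences [Q1 - k*IQR, Q3 + k*IQR] are flagged.
    """
    by_group = {}
    for sex, month, value in records:
        by_group.setdefault((sex, month), []).append(value)

    fences = {}
    for sex, month in by_group:
        for w in range(max_half_width + 1):
            pool = [v for m in range(month - w, month + w + 1)
                    for v in by_group.get((sex, m), [])]
            if len(pool) >= min_count:
                break
        if len(pool) < 4:
            fences[(sex, month)] = None  # too few values to estimate quartiles
            continue
        q1, _, q3 = quantiles(pool, n=4)
        iqr = q3 - q1
        fences[(sex, month)] = (q1 - k * iqr, q3 + k * iqr)

    flags = []
    for sex, month, value in records:
        fence = fences[(sex, month)]
        flags.append(fence is not None and not fence[0] <= value <= fence[1])
    return flags


# Hypothetical heights (cm) for boys aged 120 months, plus one implausible entry.
records = [("M", 120, 140.0 + i % 10) for i in range(30)] + [("M", 120, 200.0)]
flags = flag_outliers(records)
print(sum(flags), flags[-1])
```

Widening the window only where a month group is sparse keeps the fences local in age while still giving the quartiles enough data.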
Training
In this study, the light gradient boosting machine (LightGBM) was employed to estimate the height of children and adolescents aged 6–18 years. LightGBM, a gradient boosting framework based on tree learning algorithms, was selected for its efficiency in processing large-scale datasets and its proven stability and accuracy on structured tabular data, which made it well suited to our anthropometric and body composition measurements. For model evaluation, we implemented a data splitting strategy based on individual identification numbers. Using stratified sampling, 20% of individuals were randomly selected for the test set. The remaining 80% were split into training (80%) and validation (20%) sets for hyperparameter optimization. Because all measurements from the same individual were assigned to a single set, this approach prevented data leakage between sets. Separate models were developed for boys and girls to capture sex-specific growth patterns. Hyperparameter optimization was performed by grid search, training on the training set and evaluating on the fixed validation set. The key hyperparameters (maximum tree depth, maximum number of leaves, number of boosting rounds, and learning rate) were systematically tuned. The optimal configuration was used to train the final model on the complete training set, and performance was evaluated on the independent test set.
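The individual-level split can be sketched as follows. The stratification variable is not named in the text, so the strata argument (for example, an age band at first visit) is a hypothetical placeholder; the point of the sketch is that whole individuals, not single measurements, are assigned to the training, validation, or test set.

```python
import random
from collections import defaultdict


def split_by_individual(strata, test_frac=0.2, val_frac=0.2, seed=0):
    """Split individual IDs into train/validation/test sets within each stratum.

    strata: dict mapping individual ID -> stratum label (hypothetical).
    test_frac of each stratum goes to test; val_frac of the remainder goes
    to validation; the rest goes to training.
    """
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for pid, label in sorted(strata.items()):
        by_stratum[label].append(pid)

    train, val, test = set(), set(), set()
    for label in sorted(by_stratum):
        pids = by_stratum[label]
        rng.shuffle(pids)
        n_test = round(len(pids) * test_frac)
        n_val = round((len(pids) - n_test) * val_frac)
        test.update(pids[:n_test])
        val.update(pids[n_test:n_test + n_val])
        train.update(pids[n_test + n_val:])
    return train, val, test


# Hypothetical example: 100 individuals in two strata, three visits each.
strata = {pid: "younger" if pid % 2 else "older" for pid in range(100)}
rows = [(pid, visit) for pid in range(100) for visit in range(3)]
train_ids, val_ids, test_ids = split_by_individual(strata)
test_rows = [r for r in rows if r[0] in test_ids]
print(len(train_ids), len(val_ids), len(test_ids), len(test_rows))
```

Measurement rows are routed by the membership of their ID, so every visit of one child lands in the same set.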
Explainable AI
Several XAI techniques were employed to identify the most important features, characterize their effects on height estimation, and make the models transparent and trustworthy. All feature-specific analyses were conducted independently for the male and female models, yielding separate sets of results for each sex.
Feature importance: The built-in feature importance functionality of LightGBM was used to identify the most influential variables in the models. This measure counts the number of times a feature is used to split the data across all trees in the model. By ranking the features according to their importance scores, the variables with the greatest effect on a model's predictions were determined. This information helps clarify the key drivers of height estimation and can guide future research and data collection efforts.
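To show what the split-based measure counts, the sketch below tallies split features over a list of toy trees and converts the tallies to percentages, as in Figure 2. The nested-dict tree format is a hypothetical stand-in; with a trained LightGBM model, per-feature split counts are available from Booster.feature_importance(importance_type="split"), and only the percentage step would remain.

```python
from collections import Counter


def count_splits(node, counter):
    """Add one count per internal node to counter; leaves are plain numbers."""
    if isinstance(node, dict):
        counter[node["feature"]] += 1
        count_splits(node["left"], counter)
        count_splits(node["right"], counter)
    return counter


def split_importance_pct(trees):
    """Rank features by their share (%) of all splits across the ensemble."""
    counter = Counter()
    for tree in trees:
        count_splits(tree, counter)
    total = sum(counter.values())
    shares = [(feature, 100 * n / total) for feature, n in counter.items()]
    return sorted(shares, key=lambda item: item[1], reverse=True)


# Three hypothetical trees (leaf values are height adjustments in cm).
trees = [
    {"feature": "SLM", "left": {"feature": "BFMP", "left": -1.2, "right": 0.4}, "right": 1.5},
    {"feature": "Age", "left": -0.8, "right": {"feature": "SLM", "left": 0.2, "right": 0.9}},
    {"feature": "SLM", "left": -0.3, "right": 0.3},
]
for feature, share in split_importance_pct(trees):
    print(f"{feature}: {share:.0f}%")
```

Split counts say how often a feature is used, not how much each split reduces the error, which is one reason the SHAP-based ranking below can order features differently.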
SHAP: This game-theoretic approach assigns each feature a contribution to an individual estimation, defined as the feature's Shapley value, that is, its average marginal contribution to the model output across all possible subsets of the other features. The influence of each feature on the estimated height was assessed by computing SHAP values for each instance in the dataset. In this study, the SHAP values were visualized using summary plots, which show the magnitude and direction of each feature's effect on the estimated height. A positive SHAP value indicates that the feature's value increased the estimated height relative to the model's average prediction, whereas a negative SHAP value indicates that it decreased the estimate. In addition, mean absolute SHAP values were calculated to quantify the overall importance of each feature, allowing the variables to be ranked by their influence on the model's estimations.
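The Shapley value behind SHAP can be computed exactly for a model with only a few features by enumerating every subset of the other features. The sketch below does this for a hypothetical toy function of three features (SLM in kg, BFMP in %, and age in years), using an interventional value function that fills absent features from background rows. The brute-force form is exponential in the number of features; tree models are usually explained with the much faster TreeSHAP algorithm (the shap package's TreeExplainer), so this illustrates the definition rather than the study's computation.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, background):
    """Exact Shapley values of f at x; absent features are taken from background rows."""
    n = len(x)

    def value(subset):
        # Mean prediction when features in `subset` come from x and the rest from b.
        total = 0.0
        for b in background:
            z = [x[j] if j in subset else b[j] for j in range(n)]
            total += f(z)
        return total / len(background)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for combo in combinations(others, size):
                subset = set(combo)
                phi[i] += weight * (value(subset | {i}) - value(subset))
    return phi


def mean_abs_shap(f, X, background):
    """Global importance: mean |SHAP value| of each feature over the rows of X."""
    all_phi = [shapley_values(f, x, background) for x in X]
    return [sum(abs(p[j]) for p in all_phi) / len(X) for j in range(len(X[0]))]


def toy_model(z):
    # Hypothetical height model (cm): z = [SLM kg, BFMP %, age years].
    return 60.0 + 2.0 * z[0] - 0.5 * z[1] + 0.05 * z[0] * z[2]


background = [[22.0, 18.0, 9.0], [26.0, 22.0, 11.0], [30.0, 15.0, 13.0]]
x = [28.0, 25.0, 12.0]
phi = shapley_values(toy_model, x, background)
print([round(p, 2) for p in phi])
```

The three values sum to the toy model's prediction at x minus its mean prediction over the background rows (the efficiency property), which is why SHAP values can be read as additive contributions to the estimated height.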
PDPs: These plots show how the model's estimations change, on average, as the value of a single feature varies, with predictions averaged over the observed values of all other features. By visualizing the marginal effect of each feature on the estimated height, PDPs provide a clear picture of the influence of each feature. PDPs are useful for identifying nonlinear relationships between features and estimated height and can help reveal interactions between features.
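The averaging behind a one-feature PDP can be written in a few lines. In the sketch below, the model and data are hypothetical toys: each grid value is substituted into every row, and the resulting predictions are averaged.

```python
def partial_dependence(f, X, feature, grid):
    """Average prediction of f over rows of X with column `feature` set to each grid value."""
    curve = []
    for value in grid:
        total = 0.0
        for row in X:
            z = list(row)
            z[feature] = value  # overwrite the feature; other columns keep observed values
            total += f(z)
        curve.append(total / len(X))
    return curve


def toy_model(z):
    # Hypothetical height model (cm): z = [SLM kg, BFMP %].
    return 80.0 + 2.0 * z[0] - 0.5 * z[1]


X = [[22.0, 18.0], [26.0, 22.0], [30.0, 15.0], [34.0, 28.0]]
grid = [20.0, 25.0, 30.0, 35.0]
curve = partial_dependence(toy_model, X, feature=0, grid=grid)
print([round(v, 2) for v in curve])
```

Because every grid value is paired with every row, a PDP also evaluates combinations, such as a very high SLM alongside a body fat percentage typical of a much smaller child, that may never occur in the data. ALE plots address this limitation.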
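ALE plots: PDPs implicitly assume that features are independent, because they evaluate the model at feature combinations that may rarely or never occur in the data. ALE plots instead use the actual distribution of the features and thus account for their correlations: the feature's range is divided into intervals, prediction differences are computed only for observations within each interval, and these local effects are accumulated across intervals. The resulting plots display the cumulative effect of a feature on the model's estimations over the range of values that the feature takes in the dataset.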
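A first-order ALE curve for one feature can be sketched as below, following the interval-based procedure just described. The quantile binning, the count-weighted centering, and the toy model are illustrative assumptions; ALE implementations differ in these details.

```python
from bisect import bisect_left


def ale_1d(f, X, feature, n_bins=10):
    """First-order ALE of `feature`; returns (bin edges, centered ALE at each edge)."""
    values = sorted(row[feature] for row in X)
    # Quantile-based edges taken from observed values, so every bin holds data.
    idx = [round(q * (len(values) - 1) / n_bins) for q in range(n_bins + 1)]
    edges = sorted({values[i] for i in idx})
    if len(edges) < 2:
        raise ValueError("feature needs at least two distinct values")

    sums = [0.0] * (len(edges) - 1)
    counts = [0] * (len(edges) - 1)
    for row in X:
        # Interval (edges[k], edges[k+1]]; the minimum value joins the first interval.
        k = max(bisect_left(edges, row[feature]) - 1, 0)
        lower, upper = list(row), list(row)
        lower[feature], upper[feature] = edges[k], edges[k + 1]
        sums[k] += f(upper) - f(lower)  # local effect, evaluated near this row's own values
        counts[k] += 1

    # Accumulate mean local effects from the lowest edge upward.
    ale = [0.0]
    for s, c in zip(sums, counts):
        ale.append(ale[-1] + (s / c if c else 0.0))

    # Center so that the count-weighted average effect over the data is zero.
    n = sum(counts)
    offset = sum(c * (ale[k] + ale[k + 1]) / 2 for k, c in enumerate(counts)) / n
    return edges, [a - offset for a in ale]


def toy_model(z):
    # Hypothetical height model (cm): z = [SLM kg, BFMP %].
    return 80.0 + 2.0 * z[0] - 0.5 * z[1]


X = [[20.0 + 0.5 * i, 15.0 + (i % 7)] for i in range(40)]
edges, ale = ale_1d(toy_model, X, feature=0, n_bins=8)
print([round(a, 2) for a in ale])
```

For this linear toy model, the ALE curve rises by 2 cm per kg of SLM, matching its coefficient; with correlated real features, ALE and PDP curves can diverge because ALE never moves a row far from its observed values.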
Ethics statement
This study was approved by the Institutional Review Board of Korea University Ansan Hospital, Gyeonggi-do, Korea (IRB No. 2024AS0248). Prior to study initiation, written informed consent was obtained from all participants and their legal guardians, as all subjects were minors. The consent process included detailed explanations of the study purpose, procedures, potential risks and benefits, and the voluntary nature of participation. Students who did not provide consent or whose guardians did not provide consent were excluded from the study. Additionally, students who were unable to comply with the standard measurement procedures were excluded.
Statistics
The distributions of key variables in the training and test datasets were analyzed to assess their representativeness and comparability; for each variable, we calculated the median and interquartile range (IQR). The trained models were evaluated on a separate test set that was not used during training. To compare the estimated and actual heights, two widely used performance metrics were calculated: root mean squared error (RMSE) and mean absolute percentage error (MAPE). RMSE measures the typical magnitude of the estimation errors in centimeters, giving greater weight to large errors, and indicates the overall accuracy of the model. MAPE expresses the average absolute difference between the estimated and actual values as a percentage of the actual value, offering a more intuitive view of the error relative to the target. A bootstrapping procedure was employed to analyze the model's stability and reliability: the test set was randomly divided into 50 subsets, and the performance metrics (RMSE and MAPE) were calculated for each subset. The standard deviation (SD) and 95% confidence interval (CI) of each metric were then estimated, stratified by age and sex.
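The two error metrics and the subset-based summary can be sketched as follows. The text does not state whether the 50 subsets were formed from measurements or from individuals; this sketch partitions measurements at random. The 95% interval is computed as mean ± 1.96 × SD, an assumption that appears approximately consistent with Table 3 (for example, 2.98 ± 1.96 × 0.07 gives about 2.84 to 3.12 cm, against the reported 2.83 to 3.12 cm).

```python
import math
import random
import statistics


def rmse(y_true, y_pred):
    """Root mean squared error, in the units of y (cm for height)."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mape(y_true, y_pred):
    """Mean absolute percentage error (%); assumes all true values are positive."""
    return 100 * sum(abs(t - p) / t for t, p in zip(y_true, y_pred)) / len(y_true)


def subset_summary(y_true, y_pred, n_subsets=50, seed=0, z=1.96):
    """Mean, SD, and mean +/- z*SD interval of RMSE and MAPE over random disjoint subsets."""
    if len(y_true) < 2 * n_subsets:
        raise ValueError("need at least two observations per subset")
    order = list(range(len(y_true)))
    random.Random(seed).shuffle(order)
    parts = [order[k::n_subsets] for k in range(n_subsets)]  # disjoint, near-equal sizes
    summary = {}
    for name, metric in (("RMSE", rmse), ("MAPE", mape)):
        scores = [metric([y_true[i] for i in part], [y_pred[i] for i in part]) for part in parts]
        mean, sd = statistics.mean(scores), statistics.stdev(scores)
        summary[name] = (mean, sd, mean - z * sd, mean + z * sd)
    return summary


# Worked example: errors of +2 cm and -3 cm on heights of 100 cm and 150 cm.
print(rmse([100.0, 150.0], [102.0, 147.0]))  # sqrt((4 + 9) / 2), about 2.55
print(mape([100.0, 150.0], [102.0, 147.0]))  # (2% + 2%) / 2, about 2.0
```

The worked example shows why the two metrics diverge: the 3 cm error dominates RMSE, but relative to a 150 cm child it is the same 2% as the 2 cm error on a 100 cm child.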
All statistical analyses were conducted using Python version 3.9.7, developed by the Python Software Foundation, and R version 4.2.2, maintained by the R Foundation for Statistical Computing.
Results
Baseline characteristics
Table 2 presents the baseline characteristics of the study population, stratified by sex and dataset (training plus validation vs. test). The entire dataset of 278,301 measurements, each with 12 attributes, was split separately for each sex and at the individual level into two parts: 80% for training plus validation (further split into 80% training and 20% validation) and 20% for testing. For males, the training plus validation set had 114,121 measurements (22,873 individuals) and the test set 29,171 measurements (5719 individuals). For females, the training plus validation set contained 106,072 measurements (20,627 individuals) and the test set 26,937 measurements (5157 individuals). IQRs are presented in square brackets.
Table 2.
Baseline characteristics.
Panel (a) Male
Variables | Training + validation set (n = 114,121) | Test set (n = 29,171) |
---|---|---|
Number of individuals | 22,873 | 5719 |
Age (months), median [IQR] | 118.00 [98.00–141.00] | 118.00 [99.00–141.00] |
Weight (kg), median [IQR] | 36.20 [28.10–49.00] | 36.20 [28.30–48.90] |
Protein (kg), median [IQR] | 5.40 [4.50–7.10] | 5.40 [4.50–7.00] |
SLM (kg), median [IQR] | 25.80 [21.50–33.70] | 25.90 [21.60–33.50] |
BFM (kg), median [IQR] | 7.40 [4.50–13.00] | 7.20 [4.50–12.90] |
SMM (kg), median [IQR] | 14.30 [11.50–19.20] | 14.30 [11.60–19.10] |
Height (cm), median [IQR] | 139.30 [129.70–152.60] | 139.50 [129.90–152.30] |
Notes. The table presents the median and interquartile range [1st quartile–3rd quartile] for variables in the training plus validation and test sets. Panel (a) shows data for male subjects, while panel (b) displays data for female subjects. The number of measurements (n) and of unique individuals (Number of individuals) are also reported for each set. IQR: interquartile range; SLM: soft lean mass; BFM: body fat mass; SMM: skeletal muscle mass.
In the male training plus validation set, the median age was 118.00 months [98.00–141.00], height 139.30 cm [129.70–152.60], and weight 36.20 kg [28.10–49.00]. The median body composition values were protein mass 5.40 kg [4.50–7.10], SLM 25.80 kg [21.50–33.70], BFM 7.40 kg [4.50–13.00], and SMM 14.30 kg [11.50–19.20]. For females in the training plus validation set, the median age was 116.00 months [97.00–138.00], height 137.90 cm [128.00–150.90], and weight 33.30 kg [26.40–43.50]. The median body composition values were protein mass 5.00 kg [4.10–6.30], SLM 23.80 kg [19.80–30.20], BFM 7.70 kg [5.00–12.00], and SMM 13.00 kg [10.50–16.90]. The test sets showed similar values for both sexes, indicating an adequate balance between the training and test sets.
Estimation accuracy
The estimation accuracy of the developed models was evaluated overall and by age (Table 3 and Figure 1). Overall, the male model showed an RMSE of 2.98 ± 0.07 cm and a MAPE of 1.64 ± 0.04%, while the female model achieved an RMSE of 2.89 ± 0.08 cm and a MAPE of 1.63 ± 0.05%. Additional performance metrics and detailed bootstrap analysis results are provided in Appendix Table B1. Given the average height difference between boys and girls, the two models appear to have comparable prediction accuracy. By age, estimation accuracy generally decreased with increasing age, with a notable inflection point at age 12 for boys and age 11 for girls. For instance, the MAPE for boys was 1.60 ± 0.20% at age 6 and 1.67 ± 0.16% at age 11, but it increased to 1.83 ± 0.24% for boys aged ≥15 years, indicating a marked reduction in estimation accuracy. Similarly, for girls, the MAPE was 1.60 ± 0.20% at age 6 and 1.60 ± 0.16% at age 10 but increased substantially to 1.90 ± 0.33% for those aged ≥15 years. This decline is likely attributable to substantial individual variation in the timing, velocity, and duration of growth during the pubertal growth period.
Table 3.
Estimation results by age.
Age (years) | Male RMSE (cm) | Male MAPE (%) | Female RMSE (cm) | Female MAPE (%) |
---|---|---|---|---|
Total | 2.98 ± 0.07 (2.83 to 3.12) | 1.64 ± 0.04 (1.56 to 1.72) | 2.89 ± 0.08 (2.73 to 3.05) | 1.63 ± 0.05 (1.54 to 1.72) |
6 | 2.38 ± 0.28 (1.82 to 2.94) | 1.60 ± 0.20 (1.21 to 1.98) | 2.40 ± 0.28 (1.85 to 2.94) | 1.60 ± 0.20 (1.21 to 1.99) |
7 | 2.41 ± 0.19 (2.03 to 2.79) | 1.53 ± 0.12 (1.28 to 1.77) | 2.43 ± 0.22 (2.01 to 2.86) | 1.56 ± 0.14 (1.28 to 1.83) |
8 | 2.61 ± 0.18 (2.26 to 2.97) | 1.59 ± 0.12 (1.35 to 1.83) | 2.57 ± 0.18 (2.21 to 2.93) | 1.57 ± 0.11 (1.35 to 1.80) |
9 | 2.79 ± 0.24 (2.32 to 3.25) | 1.62 ± 0.14 (1.35 to 1.89) | 2.74 ± 0.20 (2.35 to 3.12) | 1.60 ± 0.13 (1.34 to 1.86) |
10 | 2.94 ± 0.23 (2.48 to 3.40) | 1.63 ± 0.14 (1.36 to 1.90) | 2.86 ± 0.26 (2.35 to 3.38) | 1.60 ± 0.16 (1.28 to 1.92) |
11 | 3.12 ± 0.28 (2.57 to 3.67) | 1.67 ± 0.16 (1.37 to 1.98) | 3.21 ± 0.31 (2.60 to 3.82) | 1.70 ± 0.18 (1.35 to 2.05) |
12 | 3.46 ± 0.31 (2.84 to 4.07) | 1.76 ± 0.17 (1.43 to 2.08) | 3.30 ± 0.30 (2.72 to 3.89) | 1.71 ± 0.16 (1.39 to 2.02) |
13 | 3.68 ± 0.53 (2.63 to 4.73) | 1.81 ± 0.24 (1.33 to 2.29) | 3.39 ± 0.56 (2.28 to 4.49) | 1.71 ± 0.28 (1.16 to 2.25) |
14 | 3.55 ± 0.56 (2.45 to 4.65) | 1.69 ± 0.26 (1.19 to 2.19) | 3.52 ± 0.58 (2.37 to 4.66) | 1.78 ± 0.34 (1.12 to 2.44) |
15+ | 3.93 ± 0.51 (2.92 to 4.93) | 1.83 ± 0.24 (1.37 to 2.30) | 3.78 ± 0.59 (2.62 to 4.95) | 1.90 ± 0.33 (1.26 to 2.54) |
Notes. This table presents the RMSE in centimeters and MAPE for male and female height estimation models by age group. Values are reported as mean ± standard deviation (95% CI). RMSE: root mean square error; MAPE: mean absolute percentage error; CI: confidence interval.
Figure 1.
Estimation results by age. The figure shows the RMSE and MAPE of the height estimation models by age and sex. Blue lines represent males and red lines females. Shaded areas indicate 95% confidence intervals for each metric. RMSE, root mean squared error; MAPE, mean absolute percentage error.
Model explanation
Figure 2 presents the feature importance values derived from the LightGBM models, converted into percentages for both male and female models. The results indicate that the main variables for predicting height are similar between sexes, with BFMP, age (in months), SLM, SLMP, SMMP, and SMM being identified as the key factors.
Figure 2.
Feature importance in the height estimation models. The importance of each predictor variable for the (a) male and (b) female models is presented as a percentage, and the variables are ordered from most to least important.
To further investigate the effect of these variables on the model's predictions, SHAP analysis was performed. Figure 3 illustrates the distribution of SHAP values for each variable, along with the mean absolute SHAP values. Consistent with the feature importance results, SLM, BFMP, SMM, SMMP, and age (in months) were the main contributors to height estimation for both boys and girls. The SHAP values indicated that higher SLM values were associated with higher estimated heights, whereas higher BFMP values were associated with lower estimated heights. These findings align with those of existing studies demonstrating a positive association between lean body mass and height growth and a negative association between body fat and height growth.
Figure 3.
SHAP summary plots and mean absolute SHAP values. The left side shows the effect of each feature on the model output, with colors indicating high (pink) or low (blue) feature values. The right side shows the mean absolute SHAP value for each feature. The variables are ordered from most to least important. Results are shown for the male (top) and female (bottom) height estimation models. SHAP, Shapley additive explanations.
PDPs were then used to clarify the relationship between body composition and estimated height. The PDPs for each variable in the male and female models are presented in Figures 4 and 5, respectively. The relationships between estimated height and SLM, BFMP, SMM, and SMMP were monotonic: SLM exhibited a monotonically increasing relationship with estimated height, whereas BFMP, SMM, and SMMP showed monotonically decreasing relationships. However, these relationships were not perfectly linear, with slight nonlinearities and changes in slope across different ranges of feature values.
Figure 4.
PDPs for males. The x-axis represents the range of values for the selected feature, while the y-axis shows the corresponding change in the estimated height. The blue line represents the average change in estimated height as the feature value varies, and the shaded area indicates the confidence interval. PDPs, partial dependence plots.
Figure 5.
PDPs for females. The x-axis represents the range of values for the selected feature, while the y-axis shows the corresponding change in the estimated height. The blue line represents the average change in estimated height as the feature value varies, and the shaded area indicates the confidence interval. PDPs, partial dependence plots.
Finally, ALE plots were calculated to account for the correlations among body composition variables. The ALE plots for each variable in the male and female models are shown in Figures 6 and 7, respectively. SLM had the largest positive influence on the estimated height, whereas BFMP had a substantial negative effect. These findings are partially consistent with existing data showing a positive relationship between lean body mass and height growth and a negative relationship between body fat and height growth in children and adolescents. The relative influence of these features differed between the male and female models, with the male model showing larger effects for SLM, BFMP, and SMM. The ALE plots also highlighted nonlinear relationships between the features and estimated height, particularly in the higher range of SLM values and the lower ranges of BFMP and SMM values.
Figure 6.
ALE plots for males. The x-axis represents the range of values for the selected feature, while the y-axis shows the corresponding change in the estimated height relative to the average prediction. The blue line represents the ALE of the feature on the estimated height. The y-axis was standardized for the comparison of the effect sizes. ALE, accumulated local effects.
Figure 7.
ALE plots for females. The x-axis represents the range of values for the selected feature, while the y-axis shows the corresponding change in the estimated height relative to the average prediction. The blue line represents the ALE of the feature on the estimated height. The y-axis was standardized for the comparison of the effect sizes. ALE, accumulated local effects.
Discussion
In this study, we developed an accurate and interpretable height estimation model for children and adolescents using body composition variables with machine learning (ML) and XAI approaches. Our model achieved mean absolute percentage errors of 1.64% and 1.63% for boys and girls, respectively. These results highlight the potential of combining comprehensive body composition data with ML techniques for pediatric height estimation. SLM, BFMP, SMM, and SMMP were consistently identified as key factors influencing height estimation.
Our study employs ML algorithms to improve pediatric growth assessment through comprehensive analysis of body composition measures. This approach aligns with recent trends in digital healthcare, where ML applications have shown promise in various clinical assessments.24,25 Prior research has explored ML algorithms for height prediction in children, such as the work of Shmoish et al. on predicting adult height from early childhood measurements26 and the development of a novel growth curve comparison method by Mlakar et al.27 However, these studies were limited in their ability to integrate various physical measurements comprehensively, despite the well-established connections between body composition factors and growth. Recent studies have demonstrated the potential of collecting and analyzing large-scale body composition data in children and adolescents through non-invasive, cost-effective, and efficient methods.28,29 By applying ML algorithms to such comprehensive body composition data, this study aimed to address this gap in the literature.
XAI techniques are increasingly used for model interpretation in healthcare settings. For example, Caterson et al. demonstrated the effectiveness of XAI approaches in electronic health record analysis,30 while Javidi et al. showed the value of interpretable deep learning for understanding pediatric health outcomes.31 In our study, XAI techniques including SHAP, feature importance analysis, PDPs, and ALE plots provided insights into the relationship between body composition and height. Through these analyses, we observed a positive correlation between SLM and estimated height and an inverse relationship between BFMP and height projections. Body fat affects the hypothalamic-pituitary-gonadal axis through adipokines such as leptin, other metabolic signals such as ghrelin, insulin, and ceramides, and signaling pathways connecting peripheral metabolism to central circuits. These mechanisms influence pubertal timing, and earlier pubertal onset is associated with shorter adult height.32 We also observed that prediction accuracy decreased with age, particularly during the pubertal period. This likely reflects the fact that growth during the pubertal spurt is more closely related to bone age than to chronological age. Future studies incorporating bone age assessment alongside chronological age could improve prediction accuracy during pubertal development.33,34
From a clinical perspective, our findings contribute to pediatric health assessment and growth monitoring. While height and weight measurements remain fundamental, our model could offer two practical advantages in clinical settings. First, integrating body composition data could help clinicians identify growth patterns that might be missed by traditional anthropometric measurements alone, particularly when body composition changes precede visible changes in height. Second, the model could enhance routine growth monitoring by providing early indicators of potential growth abnormalities through body composition analysis. Comprehensive growth charts incorporating these metrics could also give clinicians additional reference tools for more precise growth assessment.
Since 1985, recombinant human growth hormone (rhGH) therapy has been used to treat children with short stature. Growth prediction models that reflect individual patient characteristics help predict rhGH therapy response. While several growth prediction models have been developed over the past decades,35,36,37,38 their clinical application remains limited. Our model's integration of comprehensive body composition data suggests a potential new approach to predicting treatment responses in rhGH therapy patients. To move toward clinical implementation, comparative validation studies with existing prediction models will be necessary across different types of short stature conditions to evaluate prediction accuracy and assess the model's practical utility in monitoring treatment outcomes.
However, it is important to acknowledge the limitations of our study. The dataset, while large, is limited to Korean children and adolescents from the Gyeonggi Province region, which may limit the generalizability of our findings to other populations. While we expect similar biological mechanisms of growth and puberty across populations, significant variations exist in growth patterns, timing of puberty, and body composition across different ethnicities and socioeconomic contexts. For instance, studies have shown differences in body composition and growth trajectories between Asian and Western populations, with Asian children typically showing different fat distribution patterns and earlier onset of puberty. Additionally, socioeconomic disparities within and between countries can significantly impact growth through factors such as nutrition, healthcare access, and environmental conditions. These population-specific variations could affect the model's performance when applied to different ethnic groups or socioeconomic contexts. Future validation studies across diverse populations are needed to assess and adapt the model's applicability beyond Korean children and adolescents.
Furthermore, because each measurement was analyzed cross-sectionally, our study can identify correlations but not causal relationships between body composition and height. While our findings suggest potential associations, longitudinal analyses are needed to confirm these relationships and establish their temporal sequence. Our model is also limited to body composition variables and excludes several factors known to influence height development, such as genetic information,39 lifestyle habits including dietary and sleep patterns,40 socioeconomic factors,41 and health information including obesity treatment.42 The absence of these factors likely limits the model's predictive capacity and our understanding of the complex interactions involved in height development. Although working with such a large-scale population made it challenging to obtain detailed individual-level information, future studies incorporating these additional variables could yield more comprehensive height estimation models and better capture the multifaceted nature of growth in children and adolescents.
In conclusion, our study demonstrates the potential of using body composition variables and AI techniques for accurate and interpretable height estimations in children and adolescents. The findings provide a foundation for future research in pediatric growth assessment and could contribute to more personalized approaches in child health and development monitoring. As we continue to refine these models and integrate them into clinical practice, they have the potential to significantly enhance our ability to evaluate and optimize children's growth and development.
Conclusion
This study developed an ML model for height prediction in children and adolescents using body composition data. Our findings revealed that SLM, BFMP, SMM, and SMMP are key predictors of height, with the model achieving high accuracy. The XAI approach provided valuable insights into body composition-height relationships, showing positive correlations with lean mass and negative correlations with body fat percentage. While our study is limited by its population specificity and cross-sectional nature, it establishes a foundation for incorporating body composition analysis into pediatric growth assessment. The model's accuracy and interpretability make it a promising tool for clinical practice, potentially supporting growth monitoring and early detection of growth disorders.
Supplemental Material
Supplemental material, sj-docx-1-dhj-10.1177_20552076251331879 for Height estimation in children and adolescents using body composition big data: Machine-learning and explainable artificial intelligence approach by Dohyun Chun, Taesung Chung, Jongho Kang, Taehoon Ko, Young-Jun Rhie and Jihun Kim in DIGITAL HEALTH
ORCID iDs: Dohyun Chun https://orcid.org/0000-0003-3031-4011
Young-Jun Rhie https://orcid.org/0000-0002-1250-6469
Jihun Kim https://orcid.org/0000-0002-2957-8776
Statements and declarations
Funding: The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was supported by “Regional Innovation Strategy (RIS)” through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (MOE) in 2024 (2022RIS-005).
Conflict of interest: The author(s) declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: Dohyun Chun, Jongho Kang, and Jihun Kim are employees of and hold stocks in Global Prediction Co., Ltd. Taesung Chung is an employee of Global Prediction Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Data availability: The materials used in this study were provided by Global Prediction Co., Ltd. Due to privacy and proprietary considerations, they can be made available through appropriate data sharing agreements upon reasonable request to the corresponding author.
Supplemental material: Supplemental material for this article is available online.
References
1. Johnson W, Stovitz SD, Choh AC, et al. Patterns of linear growth and skeletal maturation from birth to 18 years of age in overweight young adults. Int J Obes 2012; 36: 535–541.
2. Dalskov S, Müller M, Ritz C, et al. Effects of dietary protein and glycaemic index on biomarkers of bone turnover in children. Br J Nutr 2013; 109: 1253–1262.
3. Costello AMDL. Growth velocity and stunting in rural Nepal. Arch Dis Child 1989; 64: 1478–1482.
4. Tse WY, Hindmarsh PC, Brook CGD. The infancy-childhood puberty model of growth: clinical aspects. Acta Paediatr 1989; 78: 38–43.
5. He Q, Karlberg J. BMI in childhood and its association with height gain, timing of puberty, and final height. Pediatr Res 2001; 49: 244–251.
6. Brener A, Bello R, Lebenthal Y, et al. The impact of adolescent obesity on adult height. Horm Res Paediatr 2017; 88: 237–243.
7. Morrison JA, Barton B, Biro FM, et al. Sexual maturation and obesity in 9- and 10-year-old black and white girls: the National Heart, Lung, and Blood Institute Growth and Health Study. J Pediatr 1994; 124: 889–895.
8. Kaplowitz PB, Slora EJ, Wasserman RC, et al. Earlier onset of puberty in girls: relation to increased body mass index and race. Pediatrics 2001; 108: 347–353.
9. Davison KK, Susman EJ, Birch LL. Percent body fat at age 5 predicts earlier pubertal development among girls at age 9. Pediatrics 2003; 111: 815–821.
10. Lee JM, Appugliese D, Kaciroti N, et al. Weight status in young girls and the onset of puberty. Pediatrics 2007; 119: e624–e630.
11. Lee JM, Kaciroti N, Appugliese D, et al. Body mass index and timing of pubertal initiation in boys. Arch Pediatr Adolesc Med 2010; 164: 139–144.
12. Lorentzon M, Norjavaara E, Kindblom JM. Pubertal timing predicts leg length and childhood body mass index predicts sitting height in young adult men. J Pediatr 2011; 158: 452–457.
13. Jamison DT, Breman JG, Measham AR, et al. Disease control priorities in developing countries. 2nd edition. Washington, DC: The World Bank, 2006.
14. Akachi Y, Canning D. The height of women in Sub-Saharan Africa: the role of health, nutrition, and income in childhood. Ann Hum Biol 2007; 34: 397–410.
15. Berkey CS, Colditz GA, Rockett HR, et al. Dairy consumption and female height growth: prospective cohort study. Cancer Epidemiol Biomarkers Prev 2009; 18: 1881–1887.
16. Alimujiang A, Colditz GA, Gardner JD, et al. Childhood diet and growth in boys in relation to timing of puberty and adult height: the Longitudinal Studies of Child Health and Development. Cancer Causes Control 2018; 29: 915–926.
17. Karim MR, Islam T, Shajalal M, et al. Explainable AI for bioinformatics: methods, tools and applications. Brief Bioinform 2023; 24: 1–22.
18. Liu Y, Herrin J, Huang C, et al. Nonexercise machine learning models for maximal oxygen uptake prediction in national population surveys. J Am Med Inform Assoc 2023; 30: 943–952.
19. Maouche I, Terrissa LS, Benmohammed K, et al. An explainable AI approach for breast cancer metastasis prediction based on clinicopathological data. IEEE Trans Biomed Eng 2023; 70: 3321–3329.
20. Park A, Shim JE, Shin W, et al. A comprehensive evaluation of regression-based drug responsiveness prediction models, using cell viability inhibitory concentrations (IC50 values). Bioinformatics 2022; 38: 2810–2817.
21. Moncada-Torres A, van Maaren MC, Hendriks MP, et al. Explainable machine learning can outperform Cox regression predictions and provide insights in breast cancer survival. Sci Rep 2021; 11: 6968.
22. Baik SM, Hong KS, Park DJ. Deep learning approach for early prediction of COVID-19 mortality using chest X-ray and electronic health records. BMC Bioinform 2023; 24: 190.
23. Bottino L, Cannataro M. Explanation of machine learning models for predicting obesity level using Shapley values. In: 2023 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). IEEE, 2023, pp.3288–3291.
24. Luo X, Ding H, Broyles A, et al. Using machine learning to detect sarcopenia from electronic health records. Digital Health 2023; 9: 20552076231197098.
25. Kustiawan TC, Nadhiroh SR, Ramli R, et al. Use of mobile app to monitoring growth outcome of children: a systematic literature review. Digital Health 2022; 8: 20552076221138641.
26. Shmoish M, German A, Devir N, et al. Prediction of adult height by machine learning technique. J Clin Endocrinol Metab 2021; 106: e2700–e2710.
27. Mlakar M, Gradišek A, Luštrek M, et al. Adult height prediction using the growth curve comparison method. PLoS One 2023; 18: e0281960.
28. Chun D, Kim SJ, Suh J, et al. Big data-based reference centiles for body composition in Korean children and adolescents: a cross-sectional study. BMC Pediatr 2024; 24: 692.
29. Chun D, Kim SJ, Suh J, et al. Timing, velocity, and magnitude of pubertal changes in body composition: a longitudinal study. Pediatr Res 2024; 97: 293–300.
30. Caterson J, Lewin A, Williamson E. The application of explainable artificial intelligence (XAI) in electronic health record research: a scoping review. Digital Health 2024; 10: 20552076241272657.
31. Javidi H, Mariam A, Alkhaled L, et al. An interpretable predictive deep learning platform for pediatric metabolic diseases. J Am Med Inform Assoc 2024; 31: 1227–1238.
32. Chung YL, Rhie YJ. Severe obesity in children and adolescents: metabolic effects, assessment, and treatment. J Obes Metab Syndr 2021; 30: 326–335.
33. Nam HK, Lea WWI, Yang Z, et al. Clinical validation of a deep-learning-based bone age software in healthy Korean children. Ann Pediatr Endocrinol Metab 2024; 29: 102–108.
34. Park KH, Gwag SH, Kim YJ, et al. Long-term efficacy of triptorelin 3-month depot in girls with central precocious puberty. J Korean Soc Pediatr Endocrinol 2024; 29: 161–166.
35. Ranke MB, Lindberg A, Chatelain P, et al. Derivation and validation of a mathematical model for predicting the response to exogenous recombinant human growth hormone (GH) in prepubertal children with idiopathic GH deficiency. J Clin Endocrinol Metab 1999; 84: 1174–1183.
36. Ranke MB, Cutfield WS, Lindberg A, et al. A growth prediction model for short children born small for gestational age. J Pediatr Endocrinol Metab 2002; 15: 1273.
37. Wikland KA, Kriström B, Rosberg S, et al. Validated multivariate models predicting the growth response to GH treatment in individual short children with a broad range in GH secretion capacities. Pediatr Res 2000; 48: 475–484.
38. Loftus J, Lindberg A, Aydin F, et al. Individualised growth response optimisation (iGRO) tool: an accessible and easy-to-use growth prediction system to enable treatment optimisation for children treated with growth hormone. J Pediatr Endocrinol Metab 2017; 30: 1019–1026.
39. Cousminer DL, Berry DJ, Timpson NJ, et al. Genome-wide association and longitudinal analyses reveal genetic loci linking pubertal height growth, pubertal timing and childhood adiposity. Hum Mol Genet 2013; 22: 2735–2747.
40. Esfarjani SV, Zamani M, Ashrafizadeh SS, et al. Association between lifestyle and height growth in high school students. J Fam Med Prim Care 2023; 12: 3279–3284.
41. Gao M, Wells JC, Johnson W, et al. Socio-economic disparities in child-to-adolescent growth trajectories in China: findings from the China Health and Nutrition Survey 1991–2015. Lancet Reg Health West Pac 2022; 21: 100399.
42. Putri RR, Danielsson P, Marcus C, et al. Height and growth velocity in children and adolescents undergoing obesity treatment: a prospective cohort study. J Clin Endocrinol Metab 2024; 109: e314–e320.