Abstract
Background
Height gain in children with growth disorders undergoing recombinant human growth hormone (rhGH) therapy shows considerable variability. Predicting treatment outcomes is essential for optimizing individualized treatment strategies.
Objective
To develop and evaluate a predictive model using clinical data to assess early height growth response in children with growth disorders undergoing rhGH therapy.
Methods
A total of 786 children were included, randomly split into a derivation cohort (N = 551) and a test cohort (N = 235). Multiple machine learning models were built in the derivation cohort, including logistic regression, decision tree, random forest, XGBoost, LightGBM, and multilayer perceptron (MLP). Model performance was evaluated in the test cohort using area under the receiver operating characteristic curve (AUROC), area under the precision-recall curve (AUPRC), and accuracy metrics. Input variables included chronological age, height standard deviation score (HSDS), body mass index standard deviation score (BSDS), IGF-1, and the difference between bone age and chronological age (BA-CA).
Results
The random forest and MLP models performed best. The random forest model achieved an AUROC of 0.9114 and an AUPRC of 0.8825. The MLP model showed accuracy, precision, specificity, and F1 scores of 0.8468, 0.8208, 0.8538, and 0.8246, respectively. Chronological age, BA-CA, HSDS, and BSDS were the most influential variables. The decision tree identified HSDS ≥ -0.72 as the primary split point.
Conclusion
Machine learning models, especially random forest and MLP, predict height gain effectively in children receiving rhGH therapy, aiding personalized treatment. Despite MLP’s strong performance, its “black-box” nature may limit clinical adoption. Future work should focus on enhancing model interpretability.
Supplementary Information
The online version contains supplementary material available at 10.1186/s12902-025-01991-4.
Keywords: Recombinant human growth hormone, Growth disorders, Short-term response, Predictive model, Machine learning
Introduction
Recombinant human growth hormone (rhGH) was initially approved in 1985 for the treatment of children with growth hormone deficiency (GHD). Since then, it has received approval for various other pediatric conditions linked to growth disorders, including idiopathic short stature, Turner syndrome, being small for gestational age, chronic renal insufficiency, Prader-Willi syndrome, Noonan syndrome, and SHOX gene defects. In children with GHD, rhGH therapy has become a standard treatment, widely used to promote height gain and improve quality of life [1–6].
Despite the proven efficacy of rhGH therapy, there is considerable variability in treatment response among patients [7]. Multiple factors influence the outcome, including age, sex, bone age, baseline height, and growth hormone levels [8, 9]. The complex interactions among these factors make it challenging for clinicians to predict individual responses to rhGH therapy, particularly during the early stages of treatment. Therefore, there is an urgent need for an accurate and practical predictive model to assess short-term response to rhGH therapy, aiding optimised treatment decisions and avoiding unnecessary long-term treatment.
In recent years, machine learning methods have gained significant attention in medical research due to their ability to handle complex, multidimensional data [10]. Compared to traditional statistical methods, machine learning can identify nonlinear relationships in high-dimensional data and offers strong predictive and generalisation capabilities. This potential makes machine learning an attractive approach for constructing predictive models in medicine [11, 12].
In this context, we aim to develop and validate a predictive model using machine learning to evaluate the short-term height response in children with growth disorders undergoing rhGH therapy. Our goal is to create a user-friendly and accurate model that helps clinicians predict individual treatment outcomes early, facilitating personalized therapy. To achieve this, we analysed extensive clinical data from a large cohort of paediatric patients. We also assessed the accuracy and robustness of the model, and further explored the clinical significance of key predictive factors within the model.
Methods
Study design and patient selection
This study is a retrospective cohort study conducted in China. Paediatric patients who initiated rhGH treatment in the pediatric department of a tertiary hospital from January 2010 onwards were retrospectively selected from hospital records. The index date was defined as the date when rhGH treatment was initiated. The follow-up period was 12 months. The inclusion criteria were as follows: (1) age between 3 and 15 years, the age range where rhGH is commonly administered; (2) a minimum treatment duration of 180 days to ensure a sufficient period for assessing the therapeutic response; and (3) the availability of height measurements at both baseline and 12 months post-treatment initiation. Patients were excluded if they had chronic diseases affecting growth, such as chronic kidney disease or diabetes.
After applying these criteria, 786 patients remained and were randomly allocated in a 7:3 ratio into two groups: the derivation cohort, consisting of 551 patients, was used for model development, while the test cohort, comprising 235 patients, was utilized for model performance evaluation. A schematic representation of the patient flow is shown in Fig. 1.
Fig. 1.
Patient flow
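The 7:3 random allocation described above can be sketched as follows. This is a minimal illustration rather than the authors' procedure: the function name `split_cohort`, the fixed seed, and the choice to round the derivation cohort size up are assumptions, although rounding up does reproduce the reported 551/235 split for 786 patients.

```python
import random


def split_cohort(patient_ids, num=7, den=10, seed=0):
    """Randomly split IDs into derivation and test cohorts (num:den-num ratio)."""
    ids = list(patient_ids)
    rng = random.Random(seed)  # fixed seed (an assumption) keeps the split reproducible
    rng.shuffle(ids)
    # Ceiling division in integer arithmetic avoids floating-point rounding issues.
    n_derivation = (len(ids) * num + den - 1) // den
    return ids[:n_derivation], ids[n_derivation:]


derivation, test = split_cohort(range(786))
print(len(derivation), len(test))  # 551 235
```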
Study outcome
The study outcome was the efficacy of rhGH therapy over a 12-month period. The change in height standard deviation score (ΔHSDS) was used as the outcome measure, calculated as the HSDS at 12 months after treatment initiation minus the baseline HSDS, so that a positive value indicates a gain in height relative to age- and sex-matched peers. A threshold of ΔHSDS ≥ 0.5 was chosen to define a good response, as this value has been previously established in the literature as a clinically significant change in growth velocity following rhGH therapy [13]. Conversely, a poor response was defined as ΔHSDS < 0.5.
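The outcome definition can be expressed in a few lines of code. The sketch below assumes that the change is taken as the 12-month HSDS minus the baseline HSDS, so that a gain is positive, which is the reading consistent with the ≥ 0.5 threshold for a good response. The function names and example values are hypothetical.

```python
GOOD_RESPONSE_THRESHOLD = 0.5  # change-in-HSDS cut-off stated in the text


def delta_hsds(baseline_hsds, hsds_12m):
    """Change in height SDS over 12 months (positive means catch-up growth)."""
    return hsds_12m - baseline_hsds


def classify_response(baseline_hsds, hsds_12m):
    """Label a patient as a 'good' or 'poor' responder."""
    change = delta_hsds(baseline_hsds, hsds_12m)
    return "good" if change >= GOOD_RESPONSE_THRESHOLD else "poor"


# Hypothetical patients, for illustration only.
print(classify_response(-1.6, -1.0))  # gain of about 0.6 SDS
print(classify_response(-0.5, -0.2))  # gain of about 0.3 SDS
```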
Data collection
Data were collected from the electronic health record system. Baseline data were collected within a 90-day window before or after the initiation of rhGH therapy, including variables such as sex, chronological age, weight, height, bone age, baseline growth hormone levels (GH stimulation test results), insulin-like growth factor-1 (IGF-1) levels, thyroid function, and parental heights. Height growth data were also collected at 12 months (± 90 days) after treatment initiation. When multiple records were available, the measurement closest to the designated data collection timepoint was utilized.
Height standard deviation score (HSDS), weight standard deviation score (WSDS), and body mass index standard deviation score (BSDS) were calculated by comparing individual measurements to age- and sex-specific reference values in China [14, 15]. The sex-adjusted mid-parental height (MPH) was calculated as follows: (father’s height + mother’s height)/2 ± 6.5 cm (- for girls, + for boys). MPH was transformed into SDS using the aforementioned reference standard.
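The mid-parental height formula above can be written out as a short sketch. The helper `to_sds` assumes a simple (value - mean) / SD transformation, and the reference mean and SD in the example are hypothetical placeholders. The published Chinese reference data may be tabulated differently, so this part in particular should be read as an assumption.

```python
SEX_ADJUSTMENT_CM = 6.5  # added for boys, subtracted for girls, as stated in the text


def mid_parental_height(father_cm, mother_cm, sex):
    """Sex-adjusted mid-parental height in cm."""
    mean_parent = (father_cm + mother_cm) / 2
    if sex == "male":
        return mean_parent + SEX_ADJUSTMENT_CM
    if sex == "female":
        return mean_parent - SEX_ADJUSTMENT_CM
    raise ValueError("sex must be 'male' or 'female'")


def to_sds(value, ref_mean, ref_sd):
    """Standard deviation score, assuming a plain z-score against reference values."""
    return (value - ref_mean) / ref_sd


mph_girl = mid_parental_height(170.0, 158.0, "female")  # 157.5
mph_boy = mid_parental_height(170.0, 158.0, "male")     # 170.5
# Hypothetical reference mean and SD, not taken from the cited standards.
mph_sds_girl = to_sds(mph_girl, ref_mean=160.0, ref_sd=6.0)
```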
Variable selection and processing
During model construction, we initially collected variables identified in previous research as being closely related to growth hormone treatment response. Further evaluation of data completeness was conducted, and only variables with a missing rate < 20% were included in the analysis. Ultimately, the following 11 variables were selected for modelling: sex, chronological age, MPH SDS, HSDS, WSDS, BSDS, IGF-1, difference between bone age and chronological age (BA-CA), use of long-acting growth hormone (long-acting), medication possession ratio (MPR), and initial dose (initiating dose). Missing data were imputed using multiple imputation methods based on fully conditional specification, applied separately to the derivation cohort and the test cohort.
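The completeness filter (missing rate < 20%) can be illustrated with a small sketch. The variable names and values below are hypothetical toy data, and missing values are assumed to be encoded as `None`.

```python
MAX_MISSING_RATE = 0.20  # variables must have a missing rate strictly below 20%


def missing_rate(values):
    """Fraction of entries that are missing (encoded here as None)."""
    return sum(v is None for v in values) / len(values)


def select_variables(columns, max_missing=MAX_MISSING_RATE):
    """Keep the names of variables whose missing rate is below the limit."""
    return [name for name, values in columns.items()
            if missing_rate(values) < max_missing]


# Hypothetical toy data: 10%, 20%, and 50% missing, respectively.
columns = {
    "var_a": [-1.2, -0.8, None, -1.5, -2.0, -0.3, -1.1, -0.9, -1.7, -1.0],
    "var_b": [210.0, None, 180.0, 250.0, 305.0, None, 190.0, 220.0, 240.0, 260.0],
    "var_c": [None, None, None, 8.1, 6.4, None, 9.0, 7.7, None, 5.2],
}
kept = select_variables(columns)
print(kept)  # ['var_a']
```

Because the criterion is a strict inequality, a variable with exactly 20% missing values (`var_b` here) is excluded.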
Model development
Six modelling methods were used to construct the predictive models in the derivation cohort: logistic regression, decision tree, random forest, XGBoost, LightGBM, and multilayer perceptron (MLP). For logistic regression, Lasso feature selection was employed. The optimal regularization parameter was determined using 10-fold cross-validation to minimize the cross-validation error. A logistic regression model was subsequently constructed with the selected features. Hyperparameters for decision tree, random forest, XGBoost, LightGBM, and MLP models were optimized through a grid search approach, using 10-fold cross-validation to maximize the area under the receiver operating characteristic curve (AUROC). The decision tree model was constructed with a complexity parameter of 0.01 and required a minimum of 50 observations for leaf nodes. The random forest model was configured with 3 variables for potential splitting at each node, an extra-trees splitting rule, and a minimal node size of 10, consisting of 500 trees. The key parameters for the XGBoost model included a maximum of 500 boosting iterations, a maximum tree depth of 3, a learning rate of 0.1, no minimum loss reduction for further partitioning, a subsample ratio of columns set at 0.5, a minimum hessian of 1 for child nodes, and a full subsample ratio of 1 for training instances. The LightGBM model was configured with a learning rate of 0.1 and trained for 500 rounds. For the MLP model, the optimal number of units in the hidden layer was set to 3, and the weight decay was set to 0.5. The model was trained for a maximum of 500 iterations.
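The 10-fold cross-validation used for tuning can be pictured with the following sketch, which only shows how the derivation cohort might be partitioned into folds. Model fitting and AUROC scoring for each hyperparameter candidate are omitted, and the function name, shuffling scheme, and seed are assumptions rather than the authors' implementation.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (training, validation) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds whose sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, validation


# 551 is the size of the derivation cohort reported in the text.
splits = list(k_fold_indices(551))
fold_sizes = sorted(len(validation) for _, validation in splits)
```

In a grid search, each candidate setting would be fitted on every training split and scored on the matching validation split, and the setting with the highest mean AUROC would be retained.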
We subsequently determined the optimal cut-off points for binary prediction for each constructed model. The R package cutpointr was utilized to optimize these cut-off values by maximizing the Youden Index [16].
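To make the cut-off selection concrete, the sketch below maximizes the Youden index (sensitivity + specificity - 1) over candidate thresholds. It is a plain-Python stand-in for illustration, not the cutpointr implementation, and the labels and predicted probabilities are hypothetical.

```python
def youden_cutoff(y_true, y_prob):
    """Return (cutoff, J) maximizing Youden's J = sensitivity + specificity - 1.

    A case is predicted positive when its probability is >= cutoff.
    Ties are resolved in favour of the lowest cutoff.
    """
    positives = sum(y_true)
    negatives = len(y_true) - positives
    best_cutoff, best_j = None, -1.0
    for cutoff in sorted(set(y_prob)):
        tp = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p >= cutoff)
        tn = sum(1 for y, p in zip(y_true, y_prob) if y == 0 and p < cutoff)
        j = tp / positives + tn / negatives - 1
        if j > best_j:
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j


# Hypothetical labels (1 = good response) and predicted probabilities.
y_true = [0, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1]
y_prob = [0.10, 0.20, 0.30, 0.40, 0.45, 0.50, 0.60, 0.70, 0.80, 0.90, 0.95]
cutoff, j = youden_cutoff(y_true, y_prob)
```

For this toy data the selected cut-off is 0.45, where sensitivity is 1.0 and specificity is 0.8.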
Model performance evaluation
The performance of the models, constructed using each of the six modelling methods, was evaluated on an independent test cohort. Evaluation metrics included AUROC, area under the precision-recall curve (AUPRC), accuracy, precision, recall (sensitivity), specificity, and F1 score. Accuracy was calculated as the proportion of total cases correctly identified by the model. Precision was determined by dividing the number of correct positive predictions by the total predicted positives. Recall, or sensitivity, was measured by dividing the number of actual positive cases correctly identified by the model by the total actual positives. Specificity was calculated as the proportion of actual negative cases correctly identified relative to the total actual negatives. The F1 score was computed as the harmonic mean of precision and recall, providing a balance between these two metrics.
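The threshold-based metrics defined above follow directly from the four cells of a confusion matrix, as the sketch below shows. The counts in the example are hypothetical and are not taken from the study results.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute the threshold-based metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)            # also called sensitivity
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
    }


# Hypothetical counts for 100 patients (45 actual positives, 55 actual negatives).
metrics = classification_metrics(tp=40, fp=10, tn=45, fn=5)
```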
As a sensitivity analysis to assess the impact of imputation, all performance evaluation metrics were recalculated using only the complete cases from the test cohort.
Statistical analysis
All statistical analyses were conducted using R software (version 4.0.5). Continuous variables are expressed as mean ± standard deviation or median (interquartile range), while categorical variables are presented as frequencies and percentages. Comparisons between good and poor responders were made using t-tests or Mann-Whitney U tests for continuous variables and chi-square tests for categorical variables. P < 0.05 was considered statistically significant.
Ethical statement
The study was approved by the hospital’s ethics committee (approval number: KY-2023-236), and informed consent was obtained from all participants or their legal guardians prior to study initiation. This study adhered to the ethical principles outlined in the Declaration of Helsinki, ensuring the protection of patient privacy and data security.
Results
In total, 786 eligible children with growth disorders were included in the study. Out of the 551 patients in the derivation cohort, 298 (54.08%) exhibited a poor response (ΔHSDS < 0.5), while 253 patients (45.92%) showed a good response (ΔHSDS ≥ 0.5). Out of 235 patients in the test cohort, 130 (55.32%) had a poor response, and 105 patients (44.68%) demonstrated a good response.
Baseline characteristics
The cohort primarily comprised ISS (91.1%), followed by GHD (6.6%), Turner syndrome (1.4%), and rare conditions including SGA (0.4%), hypothyroidism (0.1%), and achondroplasia (0.3%). In the derivation cohort, 47.01% (259 patients) were male and 52.99% (292 patients) were female. Similarly, the test cohort included 47.23% (111 patients) male and 52.77% (124 patients) female participants. The average age of the patients was 10.28 ± 2.54 years (range: 3.84–15.99 years). In the derivation cohort, the mean age was significantly higher in the ΔHSDS < 0.5 group (11.20 ± 1.87 years) compared to the ΔHSDS ≥ 0.5 group (9.18 ± 2.79 years, P < 0.001). A similar trend was observed in the test cohort, with mean ages of 11.29 ± 2.00 years in the ΔHSDS < 0.5 group and 8.38 ± 2.67 years in the ΔHSDS ≥ 0.5 group (P < 0.001) (Supplementary Material 1).
The MPH in the derivation cohort averaged 161.81 ± 7.20 cm, with the ΔHSDS < 0.5 group having an MPH of 161.47 ± 7.01 cm and the ΔHSDS ≥ 0.5 group having an MPH of 162.21 ± 7.41 cm. In the test cohort, the average MPH was 161.69 ± 6.87 cm. Baseline HSDS in the derivation cohort was -1.01 ± 1.12, with the ΔHSDS < 0.5 group at -0.50 ± 1.03 and the ΔHSDS ≥ 0.5 group at -1.61 ± 0.92 (P < 0.001). Similar differences were noted in the test cohort (P < 0.001) (Supplementary Material 1).
Significant differences were also observed in baseline WSDS and BSDS between the two response groups. The ΔHSDS ≥ 0.5 group in the derivation cohort had a significantly lower WSDS compared to the ΔHSDS < 0.5 group (P < 0.001), and similar differences were found in BSDS. Details on IGF-1 levels, bone age, and bone density distributions are presented in Supplementary Material 1, all showing significant differences.
The variables were selected based on their established relevance to growth hormone treatment response and their completeness in the dataset, including only those with a missing rate below 20% in the modeling process.
Model performance evaluation
Table 1 shows the performance of the six models in the derivation cohort, test cohort, and complete cases from the test cohort. Figure 2 shows the ROC curves in the test cohort. In the test cohort, the random forest model had the best performance with an AUROC of 0.9114 and an AUPRC of 0.8825. The logistic regression and MLP models also demonstrated good predictive capabilities, with AUROCs of 0.9012 and 0.9010, and AUPRCs of 0.8510 and 0.8358, respectively. Similar results were observed in complete cases from the test cohort, where the random forest model had the best performance. The AUROC values were similar, while the AUPRC values were compromised in the sensitivity analysis.
Table 1.
Model performance evaluation: AUROC and AUPRC
| Method | Derivation cohort AUROC | Derivation cohort AUPRC | Test cohort AUROC | Test cohort AUPRC | Sensitivity analysis AUROC | Sensitivity analysis AUPRC |
|---|---|---|---|---|---|---|
| Logistic regression | 0.8374 | 0.7969 | 0.9012 | 0.8510 | 0.9000 | 0.8241 |
| Decision tree | 0.8510 | 0.8041 | 0.8642 | 0.8128 | 0.8445 | 0.7571 |
| Random forest | 0.9802 | 0.9776 | 0.9114 | 0.8825 | 0.9115 | 0.8616 |
| XGBoost | 1.0000 | 1.0000 | 0.8717 | 0.8496 | 0.8683 | 0.8250 |
| LightGBM | 1.0000 | 1.0000 | 0.8813 | 0.8587 | 0.8847 | 0.8460 |
| Multi-layer perceptron | 0.8442 | 0.8010 | 0.9010 | 0.8358 | 0.9017 | 0.7962 |
AUROC, area under the receiver operating characteristic curve; AUPRC, area under the precision-recall curve
Fig. 2.
Model evaluation: ROC curves
The optimal cut-off points for the models are presented in Table 2. Utilizing these cut-off values, we proceeded to assess the classification accuracy of each model. The prediction accuracies are detailed in Table 3. Among the models, the MLP model had the best performance across various metrics in the test cohort, with an accuracy of 0.8468, precision of 0.8208, recall of 0.8286, and an F1 score of 0.8246. The logistic regression and random forest models followed closely, with accuracies of 0.8426 and 0.8340, respectively. Similar results were observed in complete cases from the test cohort.
Table 2.
Optimal cut-off values selected
| Method | Cut-off |
|---|---|
| Logistic regression | 0.4965 |
| Decision tree | 0.6774 |
| Random forest | 0.5156 |
| XGBoost | 0.3553 |
| LightGBM | 0.2050 |
| Multi-layer perceptron | 0.5631 |
Table 3.
Model performance evaluation: accuracy
| Method | Accuracy | Precision | Recall (Sensitivity) | Specificity | F1 score |
|---|---|---|---|---|---|
| Derivation cohort | |||||
| Logistic regression | 0.7677 | 0.7637 | 0.7154 | 0.8121 | 0.7388 |
| Decision tree | 0.7931 | 0.8368 | 0.6285 | 0.8624 | 0.7178 |
| Random forest | 0.9165 | 0.9224 | 0.8933 | 0.9362 | 0.9076 |
| XGBoost | 0.9946 | 0.9883 | 1.0000 | 0.9899 | 0.9941 |
| LightGBM | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
| Multi-layer perceptron | 0.7495 | 0.7578 | 0.6680 | 0.8188 | 0.7101 |
| Test cohort | |||||
| Logistic regression | 0.8426 | 0.8018 | 0.8476 | 0.8308 | 0.8241 |
| Decision tree | 0.7957 | 0.7957 | 0.7048 | 0.8154 | 0.7475 |
| Random forest | 0.8340 | 0.7876 | 0.8476 | 0.8154 | 0.8165 |
| XGBoost | 0.7787 | 0.6970 | 0.8762 | 0.6923 | 0.7764 |
| LightGBM | 0.8085 | 0.7438 | 0.8571 | 0.7615 | 0.7965 |
| Multi-layer perceptron | 0.8468 | 0.8208 | 0.8286 | 0.8538 | 0.8246 |
| Sensitivity analysis | |||||
| Logistic regression | 0.8314 | 0.7662 | 0.8429 | 0.8235 | 0.8027 |
| Decision tree | 0.7849 | 0.7460 | 0.6714 | 0.8235 | 0.7068 |
| Random forest | 0.8198 | 0.7468 | 0.8429 | 0.8039 | 0.7919 |
| XGBoost | 0.7674 | 0.6593 | 0.8571 | 0.6961 | 0.7453 |
| LightGBM | 0.7965 | 0.7073 | 0.8286 | 0.7647 | 0.7632 |
| Multi-layer perceptron | 0.8372 | 0.7763 | 0.8429 | 0.8333 | 0.8082 |
In summary, all machine learning models performed well in predicting the efficacy of growth hormone therapy, with the random forest and MLP models showing the best overall performance.
Feature selection and feature importance
For the logistic regression analysis, the Lasso method was used to select variables, which resulted in the inclusion of age, HSDS, BSDS, IGF-1, BA-CA, and MPR in the final model. The regression analysis showed that age (odds ratio [OR] = 0.77, P < 0.001), HSDS (OR = 0.471, P < 0.001), BSDS (OR = 1.401, P = 0.006), and BA-CA (OR = 0.709, P < 0.001) were significantly associated with the response to rhGH therapy (Table 4).
Table 4.
Summary table of the logistic regression model
| Feature | OR (95% CI) | P-value |
|---|---|---|
| (Intercept) | 0.936 (0.049, 15.728) | 0.964 |
| Sex | 0.742 (0.471, 1.165) | 0.195 |
| Age | 0.77 (0.681, 0.866) | < 0.001 |
| MPH SDS | 1.252 (0.874, 1.804) | 0.223 |
| Height SDS | 0.471 (0.338, 0.646) | < 0.001 |
| BMI SDS | 1.401 (1.103, 1.788) | 0.006 |
| IGF-1 | 1 (0.998, 1.002) | 0.824 |
| BA - CA | 0.709 (0.581, 0.863) | < 0.001 |
| Long-acting | 0.552 (0.195, 1.654) | 0.271 |
| MPR | 7.83 (0.692, 108.293) | 0.109 |
OR, odds ratio; CI, confidence interval; MPH, mid-parental height; SDS, standard deviation score; BMI, body mass index; IGF-1, insulin-like growth factor 1; BA, bone age; CA, chronological age; MPR, medication possession ratio
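The odds ratios in Table 4 are expressed per one-unit increase in each predictor. Under the log-linear assumption of logistic regression, the odds ratio for a larger change is the per-unit value raised to the size of the change, with other variables held fixed. The sketch below illustrates this with the reported values for age and BA-CA; the function name is hypothetical.

```python
import math


def odds_ratio_for_change(or_per_unit, delta):
    """Odds ratio for a change of `delta` units, given the per-unit odds ratio."""
    # exp(delta * log(OR)) is equivalent to OR ** delta.
    return math.exp(math.log(or_per_unit) * delta)


# Per-unit odds ratios as reported in Table 4.
or_age_3_years = odds_ratio_for_change(0.77, 3)    # roughly 0.46
or_ba_ca_2_years = odds_ratio_for_change(0.709, 2)  # roughly 0.50
```

Read this way, a child three years older at treatment start would have roughly 0.46 times the odds of a good response, assuming all other predictors are equal.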
For the decision tree model (Fig. 3), important splitting variables included HSDS, BA-CA, and chronological age. HSDS ≥ -0.72 was the most critical node, followed by further subdivisions based on BA-CA and chronological age. This model classified treatment outcomes for different patient groups, highlighting the impact of baseline characteristics on growth hormone therapy outcomes.
Fig. 3.
Constructed Decision Tree Model. The top number in each node represents the percentage of patients responding positively to treatment; darker colors indicate a higher likelihood of effectiveness. The bottom number shows the proportion of patients meeting the node criteria. BA: bone age; CA: chronological age; SDS: standard deviation score
Feature importances for the random forest model are shown in Table 5. Age (importance = 32.92), BA-CA (importance = 29.24), and HSDS (importance = 29.12) were identified as the most significant. IGF-1, WSDS, and BSDS also demonstrated high importance, indicating their significant impact on predicting treatment outcomes.
Table 5.
Feature importance for the random forest model
| Feature | Importance |
|---|---|
| Age | 32.92 |
| BA - CA | 29.24 |
| Height SDS | 29.12 |
| IGF-1 | 21.19 |
| Weight SDS | 15.96 |
| BMI SDS | 11.57 |
| MPR | 11.42 |
| Initiating dose | 9.94 |
| MPH SDS | 9.88 |
| Sex | 6.07 |
| Long-acting | 2.35 |
BA, bone age; CA, chronological age; SDS, standard deviation score; IGF-1, insulin-like growth factor 1; BMI, body mass index; MPR, medication possession ratio; MPH, mid-parental height
Tables 6 and 7 show the feature importance in the XGBoost and LightGBM models, respectively. In the XGBoost model, chronological age and BA-CA had the highest gain values, 0.2125 and 0.1806, respectively, indicating their substantial contribution to the model. In the LightGBM model, BA-CA and HSDS were the two most important variables, with gain values of 0.2005 and 0.1933, respectively.
Table 6.
Feature importance for the XGBoost model
| Feature | Gain | Cover | Frequency |
|---|---|---|---|
| Age | 0.2125 | 0.1943 | 0.1811 |
| BA - CA | 0.1806 | 0.1152 | 0.1050 |
| Height SDS | 0.1536 | 0.0870 | 0.0909 |
| IGF-1 | 0.0962 | 0.1184 | 0.1229 |
| Weight SDS | 0.0910 | 0.1139 | 0.1166 |
| Initiating dose | 0.0856 | 0.1071 | 0.1143 |
| BMI SDS | 0.0829 | 0.1206 | 0.1166 |
| MPH SDS | 0.0494 | 0.0616 | 0.0861 |
| MPR | 0.0403 | 0.0765 | 0.0531 |
| Sex | 0.0078 | 0.0054 | 0.0134 |
BA, bone age; CA, chronological age; SDS, standard deviation score; IGF-1, insulin-like growth factor 1; BMI, body mass index; MPH, mid-parental height; MPR, medication possession ratio
Table 7.
Feature importance for the LightGBM model
| Feature | Gain | Cover | Frequency |
|---|---|---|---|
| BA - CA | 0.2005 | 0.1294 | 0.1283 |
| Height SDS | 0.1933 | 0.0969 | 0.1067 |
| Age | 0.1809 | 0.1826 | 0.1608 |
| BMI SDS | 0.0927 | 0.1201 | 0.1180 |
| Initiating dose | 0.0902 | 0.1163 | 0.1366 |
| IGF-1 | 0.0902 | 0.1003 | 0.1241 |
| Weight SDS | 0.0678 | 0.1163 | 0.1025 |
| MPH SDS | 0.0487 | 0.0722 | 0.0798 |
| MPR | 0.0248 | 0.0597 | 0.0305 |
| Sex | 0.0108 | 0.0061 | 0.0126 |
BA, bone age; CA, chronological age; SDS, standard deviation score; BMI, body mass index; IGF-1, insulin-like growth factor 1; MPH, mid-parental height; MPR, medication possession ratio
The importance of individual features varied across models, with chronological age, BA-CA, and HSDS being the most influential factors affecting treatment outcomes.
Discussion
This study developed and validated various machine learning models for predicting height changes in children with growth disorders receiving rhGH therapy. We conducted a detailed analysis of baseline characteristics and assessed the performance of different models in predicting treatment response [17–22]. The results demonstrated that machine learning models effectively predict height growth in these children based on baseline characteristics, with random forest and MLP models showing the best overall performance.
Despite the established efficacy of rhGH therapy, there is significant heterogeneity in treatment response among patients. Predicting treatment response is crucial for making informed clinical decisions. Firstly, predictive models provide a systematic approach to patient stratification, enabling clinicians to identify those more likely to respond positively to rhGH therapy. This capability can lead to the development of more personalized treatment plans. Secondly, by offering early predictions of treatment response, these models can guide decisions on the appropriate dosage of therapy. Consequently, there is an urgent need for a reliable and practical predictive model capable of assessing short-term responses to rhGH therapy. Such a model would facilitate optimized treatment decisions and help prevent the prolongation of unnecessary long-term treatments.
The development and comparison of six predictive models in our study offer a step towards meeting this need. These models, encompassing logistic regression, decision tree, random forest, XGBoost, LightGBM, and MLP, provide a spectrum of approaches from traditional to advanced machine learning techniques. Among the models compared, the random forest model achieved the highest AUROC (0.9114) and AUPRC (0.8825) scores, indicating its superior ability to capture variability in treatment response. Conversely, the MLP model excelled in terms of accuracy (0.8468), precision (0.8208), and F1 score (0.8246), highlighting its strength in balancing precision and recall. These findings suggest that multifactorial predictive models have significant potential for personalised treatment, particularly for patients with complex clinical profiles.
Key variables influencing the efficacy of rhGH therapy were identified through Lasso regression and feature importance analysis. Variables such as age, HSDS, BSDS, and BA-CA were consistently important across multiple models, aligning with previous literature [23]. In the random forest model, age (importance = 32.92), BA-CA (importance = 29.24), and HSDS (importance = 29.12) were the most critical predictors. This is consistent with clinical experience: younger age, delayed bone age, and lower baseline height are associated with a more pronounced response to growth hormone therapy [7]. In addition, the split nodes of the decision tree model further supported the significance of these variables. HSDS ≥ -0.72 was the first major split point, highlighting the primary role of baseline height in predicting treatment response. Subsequent branches, such as BA-CA and age, further refined the classification of patient subgroups, illustrating the complex influence of individual characteristics on treatment outcomes.
Our results align with the literature but offer methodological innovations. Traditional predictive models often use linear regression or simple statistical methods [24–26], which, while providing some predictive capacity, are limited in handling complex nonlinear relationships and high-dimensional data. Our study uses machine learning models to offer a personalised predictive model that can help clinicians anticipate treatment response before initiating rhGH therapy. This model not only optimises treatment plans but also provides clearer expectations for parents and patients, potentially improving treatment adherence. In resource-limited settings or where economic burdens are significant, predictive models can help identify patients who are most likely to benefit from treatment, avoiding unnecessary delays or overtreatment. Furthermore, the model highlights the significant impact of clinical features such as BMI and bone age, suggesting that factors beyond traditional height and age assessments should be considered when evaluating growth hormone treatment efficacy.
When comparing different models, the decision tree model, although simple and interpretable, underperformed relative to more complex models such as random forest and MLP. The random forest model, by aggregating multiple decision trees, reduces overfitting risk and performs well in high-dimensional data settings, while the MLP model, as a neural network approach, captures nonlinear relationships in the data, thereby enhancing predictive accuracy [18, 27]. Although XGBoost and LightGBM also showed good predictive performance [21, 22], they are limited by their model complexity and challenges in interpretation. Specifically, these models involve more complex hyperparameter tuning, which can complicate the modelling process. Furthermore, the ensemble nature of these algorithms can make them less transparent and harder to interpret than simpler models such as logistic regression, where the relationship between predictors and outcomes is more straightforward. Despite their efficiency advantages in certain scenarios, XGBoost and LightGBM did not outperform random forest and MLP in this study. Moreover, although the random forest and MLP models demonstrated marginally better performance metrics than the other models, their improvements over logistic regression were not substantial. This suggests that, despite the added complexity of ensemble methods, the logistic regression model remains a competitive option due to its simplicity, interpretability, and efficiency. An additional benefit of logistic regression is its inherent ability to produce well-calibrated probabilities through direct modeling with a logistic function. In contrast, other models are not naturally calibrated and often require post-hoc calibration techniques to ensure accurate probability estimates.
However, logistic regression does not automatically account for complex interactions or nonlinear relationships unless explicitly specified, whereas advanced models like random forest and MLP inherently capture such patterns. Thus, the choice of model should consider the specific nature of the data.
In the realm of predictive modeling for rhGH therapy, a critical challenge revolves around the balance between model complexity and practicality within clinical settings. Logistic regression stands out as a model that can be easily translated into user-friendly scorecards or nomograms, allowing for its use without reliance on computer or mobile applications. This feature renders it particularly advantageous in clinical environments where technological resources may be limited. In contrast, advanced machine learning techniques such as decision trees, random forests, XGBoost, LightGBM, and MLP may offer enhanced predictive capabilities but require computational resources for their operation. These models, while more complex, can be integrated into settings with robust technological infrastructure, where the potential for higher accuracy justifies the need for such resources. The suitability of each model, therefore, hinges on the specific context of use, with the choice informed by a consideration of both the available technology and the desired level of predictive performance. This highlights the importance of selecting models that not only perform well but are also aligned with the practical constraints and capabilities of the clinical environment in which they will be applied.
Despite these encouraging results, the study had several limitations. First, the use of retrospective data may introduce selection bias, which could affect the representativeness of our findings. Second, the dataset was relatively small, particularly in the test cohort, which may limit the generalizability of the model. Future research should consider multicentre collaborations to expand the dataset, improving model stability and applicability. Third, although we evaluated multiple baseline characteristics, some potential influencing factors, such as lifestyle and nutritional status, were not included and may impact treatment outcomes. Last, we did not directly compare our models with previously published prediction models; such a comparison should be considered in future work.
Moving forward, the next phase of our work will focus on the external validation of our best-performing predictive models to ensure their generalizability and accuracy across different patient populations. Additionally, future studies could incorporate more biomarker data, such as genomic information and biochemical parameters, to further improve predictive accuracy. As data availability and computational power continue to grow, deep learning-based, personalised prediction models could become integral to growth hormone treatment decision-making. Further model optimisation and clinical validation will enhance the applicability of machine learning models in paediatric endocrinology.
Conclusion
This study demonstrates the potential of various machine learning models in predicting the response to rhGH therapy in children with growth disorders. Random forest and MLP models showed superior performance and identified key variables such as age, HSDS, BSDS, and bone age, which are important for guiding personalised treatment decisions. Future research should focus on further optimising these models and validating their clinical use to advance precision medicine in growth hormone therapy.
Acknowledgements
We thank the doctors, nurses, and laboratory staff for their work, and all patients who participated in this study.
Author contributions
F.Z. designed the study. F.Z., A.W., L.C., Y.X., and L.H. collected the data. X.L. and J.Z. conducted the data analysis and interpretation. F.Z. drafted the initial manuscript. A.W. and Y.Z. revised the manuscript for important intellectual content. All authors reviewed and approved the final manuscript.
Funding
This work was supported by a grant from the Wenzhou Municipal Science and Technology Bureau Project (No. Y2023002 to F.Z.).
Data availability
The data used in this study are available upon reasonable request and are subject to ethical restrictions concerning participant privacy. For access to the data, please contact the corresponding author.
Declarations
Ethics approval and consent to participate
This study adhered to the Declaration of Helsinki (2013 revision) and was approved by the Ethics Committees of Wenzhou People’s Hospital and Wenzhou Maternal and Child Health Hospital (approval number: KY-2023-236). Written informed consent was obtained from all participating adults and from legal guardians of minors prior to their inclusion in the study.
Consent for publication
All participants (or guardians of minors) consented to the publication of anonymized data in compliance with ethical and legal requirements for data privacy.
Competing interests
The authors declare no competing interests.
Clinical trial number
This study was registered on ClinicalTrials.gov (Identifier: PID-238207).
Footnotes
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.