2025 Nov 26;49(1):170. doi: 10.1007/s10916-025-02306-9

Automated Bone Age Assessment and Adult Height Prediction from Pediatric Hand Radiographs via a Cascaded Deep Learning Framework

Nihui Pei 1, Yijiang Zhuang 1, Zhe Su 2, Fangjing Wang 3, Yansong Liu 3, Xianglei Li 2, Huiping Su 2, Hongwu Zeng 1,
PMCID: PMC12657579  PMID: 41296212

Abstract


Bone age assessment and adult height prediction are essential for evaluating pediatric growth. Traditional methods rely on manual radiographic interpretation, which is subjective, time-consuming, and prone to inter-observer variability. This study presents an automated approach using a cascaded deep learning model to assess bone age and predict adult height from pediatric hand radiographs, aiming to improve diagnostic objectivity and efficiency. A total of 8,242 left-hand radiographs from Chinese children were retrospectively collected. Bone age was annotated by experienced pediatric endocrinologists using the China-05 standard. The model employed Yolact for instance segmentation to detect and classify bone structures, followed by parallel ResNet-18 subnetworks to grade ossification centers in the radius, ulna, and metacarpal/phalangeal bones. Predicted grades were integrated using a standardized scoring system to estimate bone age. A regression model then predicted adult height based on these features. The model achieved a Pearson correlation of 0.98 for bone age and 0.94 for adult height predictions. Bland-Altman analysis showed minimal bias and narrow limits of agreement. Mean absolute errors were 0.25 years for bone age and 1.75 cm for adult height. Average inference time was 7.8 seconds, significantly enhancing clinical efficiency. The proposed cascaded deep learning model delivers accurate, efficient, and reliable bone age assessment and adult height prediction, offering strong potential for clinical integration in pediatric growth evaluation.

Graphical abstract


Keywords: Pediatric growth and development, Bone age, Adult height prediction, Deep learning

Research Highlights

  1. Innovative Dual-Model Architecture This study employs a dual-model architecture that integrates a YOLACT-based instance segmentation model (with a ResNet-101 backbone) with parallel multi-level ResNet-18 classifiers, achieving intelligent bone age assessment and precise adult height prediction in children.

  2. Full Age Range Dataset Construction We collected hand anteroposterior X-ray images and clinical data from children aged 3–16 years, establishing the first dataset covering the full age range of Chinese children’s height development characteristics.

  3. Clinically Validated High Accuracy Results demonstrate statistically significant correlations and consistency in bone age assessment and adult height prediction for the medium age group (6–12 years), achieving clinically validated high accuracy.

  4. High-Efficiency Intelligent Analysis The intelligent analysis system significantly improved processing efficiency, reducing the time per case from 10±1.2 minutes (manual) to 7±0.8 seconds (AI), thereby optimizing diagnostic workflows.

  5. Clinical Application Value The AI-driven bone age and height prediction system achieves clinically acceptable accuracy, demonstrating substantial value in supporting clinical decision-making for pediatric growth monitoring.

Introduction

Issues related to children’s growth and development have garnered increasing attention in society, particularly those manifesting as short stature or a slow growth rate [1]. Such concerns are mainly reflected in a child’s bone age and pubertal timing lagging behind those of their peers [2]. Bone age, as a key indicator of skeletal maturity, provides an accurate and objective measure of an individual’s growth status and serves as a predictive tool for future growth potential [3, 4]. In clinical practice, hand X-rays are used to assess the rate of bone maturation and calculate bone age, helping doctors quickly understand the developmental status of the skeleton [5]. This is of great significance for the scientific evaluation of children’s growth and development and for formulating treatment strategies for children with growth disorders [6–8].

Currently, the most commonly used methods for bone age assessment worldwide are the Greulich-Pyle (GP) atlas method and the Tanner-Whitehouse (TW) scoring system [9, 10]. Although widely used, these methods are not well suited to the developmental trends of contemporary Chinese children [11]. In the early 21st century, Zhang et al. developed the China-05 method based on the TW3 scoring principle, specifically tailored to Chinese pediatric populations. This method refines the classification of ossification center development, offering enhanced suitability for clinical use in China. The core of the "China-05" standard is the RUS-CHN method, whose name represents the skeletal areas assessed: Radius, Ulna, and the short bones of the hand (S), with CHN indicating its localization for China. The full assessment relies on a hand X-ray: a doctor analyzes the bone ossification centers, assigns developmental scores using the RUS-CHN scoring system, and then calculates the bone age from these scores. Based on the difference between bone age and chronological age, children are categorized into three groups: typically developing, advanced, or delayed. The average height for each age group within these categories is then calculated and divided by the average adult height for that developmental type to determine the percentage of adult height attained. Using RUS-CHN bone age, this method can accurately predict the adult height of males aged 3-16 and females aged 3-15 years [12, 13]. Nevertheless, bone age assessment still requires evaluating each ossification center individually, which is both time-consuming and labor-intensive. Furthermore, the process is influenced by the clinician’s level of experience, leading to potential subjectivity in interpretation [14].

With the development of artificial intelligence, deep learning has been introduced into many areas of medicine [15]. The feasibility and effectiveness of convolutional neural network-based deep learning models for bone age assessment have been widely confirmed [16, 17], for example in the 2017 Pediatric Bone Age Challenge organized by the Radiological Society of North America (RSNA) [18, 19]. However, the majority of models have been based on a single classification network, which limits their robustness and generalizability. To enhance the precision of bone age assessment, we aimed to use deep learning to build a diagnostic tool that quantifies the developmental grade of the ossification centres in the radius, ulna, and metacarpal bones, while concurrently predicting adult height during the bone age evaluation process.

The objective of this study is to employ deep learning algorithms to construct a cascaded network model for both bone age assessment and adult height prediction, thereby enhancing accuracy and reducing subjectivity, while also verifying its feasibility for clinical practice.

Materials and Methods

Data Collection and Partitioning

This study is a retrospective investigation approved by the Institutional Review Board of Shenzhen Children’s Hospital. A total of 8,242 left-hand radiographs were collected from the radiology department of Shenzhen Children’s Hospital between July 2020 and September 2024. Among these, 4,807 were from males and 3,435 from females, with the detailed distribution presented in Table 1. The inclusion criterion was children aged 3 to 16 years; exclusion criteria included hand fractures, interference from foreign objects, and images of poor quality. Subsequently, a group of pediatric healthcare and endocrinology experts, each with more than 10 years of clinical experience, annotated the ossification center stages as well as the masks (pixel-level regions of interest, ROIs, covering the radius, ulna, and metacarpal bones) in the anteroposterior radiographs of the 8,242 children according to the China-05 method, using the Pair annotation software, to obtain the reference bone ages. These reference bone ages were used to train and validate the deep learning model. The 8,242 images were randomly divided into training and test sets in an 8:2 ratio, stratified by age. Finally, 6,594 images were used to train the deep learning model, and 1,648 images were used to validate the method’s effectiveness.

Table 1.

Data Distribution of Children’s Actual Age and Gender Based on Inclusion Criteria

Age (years) Male (n) Female (n) Male Average Height (cm) Female Average Height (cm) Total (n)
3 85 63 84 82 148
4 234 155 91 91 389
5 345 241 103 102 586
6 528 445 115 114 973
7 614 504 126 125 1118
8 600 597 134 134 1197
9 670 539 142 142 1209
10 661 446 149 148 1107
11 581 263 153 150 844
12 275 100 155 154 375
13 116 44 162 162 160
14 60 18 166 164 78
15 21 15 170 165 36
16 17 5 172 168 22
Total 4807 3435 - - 8242

Methods

Figure 1 illustrates the algorithmic workflow for bone age assessment and adult height prediction, which primarily comprises three stages: preprocessing (standardizing image data), deep neural network processing (object detection and classification), and postprocessing (quantifying the model outputs). The detailed implementation of each stage is described as follows.

Fig. 1.


The overall workflow of bone age assessment and adult height prediction is illustrated. (A) represents data preprocessing. (B) depicts the structure and details of the bone age assessment and adult height prediction model. (C) demonstrates the visualization of metacarpal bone segmentation results and the percentile curves of bone age and height.

Preprocessing

Due to inconsistencies in focal distance during the acquisition of anteroposterior hand X-rays, the foreground size often varies. To ensure that the hand is centered within the image and to eliminate irrelevant background information, we applied a corner detection method to extract the hand and corresponding mask regions. As illustrated in Figure 1(a), the process begins with binarization of the raw image to isolate the mask region. Next, corner points along the image edges are identified, and their coordinates are used to calculate the minimum enclosing rectangle, thereby extracting the region of interest corresponding to the hand.
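As a rough sketch of this step, the binarize-then-enclose logic can be written as follows. This is a pure-Python illustration on a nested-list image; the threshold value and the toy image are illustrative assumptions, not the study's actual parameters.

```python
# Sketch of the preprocessing step: binarize the image, locate the extreme
# foreground coordinates, and crop the minimum enclosing axis-aligned
# rectangle around the hand region.

def crop_hand_region(image, threshold=30):
    """Return the tight axis-aligned crop around all pixels > threshold."""
    rows = [r for r, row in enumerate(image) if any(p > threshold for p in row)]
    cols = [c for c in range(len(image[0]))
            if any(row[c] > threshold for row in image)]
    if not rows or not cols:          # blank image: nothing to crop
        return image
    r0, r1 = min(rows), max(rows)
    c0, c1 = min(cols), max(cols)
    return [row[c0:c1 + 1] for row in image[r0:r1 + 1]]

# Toy 5x6 "radiograph": zeros are background, larger values are the hand.
img = [
    [0, 0, 0,   0,   0,  0],
    [0, 0, 120, 200, 0,  0],
    [0, 0, 180, 255, 90, 0],
    [0, 0, 0,   140, 0,  0],
    [0, 0, 0,   0,   0,  0],
]
crop = crop_hand_region(img)
print(len(crop), len(crop[0]))  # 3 3
```

In practice this crop would be computed on the binarized mask and then applied to the original radiograph, so the hand is centered and background is discarded before resizing.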

To improve the robustness of the model, data augmentation was applied to the training dataset. Each image was randomly processed with transformations including rotation within a fixed angular range, scaling with a factor in the range [0.9, 1.1], contrast enhancement, and the addition of Gaussian noise. Finally, all images were resized to a uniform size using edge padding, and pixel values were normalized to the range [0, 1] for model training.
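A minimal sketch of the parameter sampling and the pad-and-normalize step is shown below. The ±10° rotation range is an illustrative assumption (the paper's exact range is not recoverable here); the [0.9, 1.1] scale range and [0, 1] normalization follow the text. Images are nested lists of 8-bit values, and the "edge padding" here simply replicates border pixels.

```python
import random

def sample_augmentation(rng):
    """Draw one random rotation angle and scale factor per training image."""
    return {"angle": rng.uniform(-10, 10),      # assumed rotation range
            "scale": rng.uniform(0.9, 1.1)}     # scale range from the text

def pad_and_normalize(image, size):
    """Edge-pad to size x size (replicating border pixels), then map to [0, 1]."""
    h, w = len(image), len(image[0])
    out = []
    for r in range(size):
        src = image[min(r, h - 1)]              # replicate last row past the edge
        row = [src[min(c, w - 1)] / 255.0 for c in range(size)]
        out.append(row)
    return out

rng = random.Random(0)
params = sample_augmentation(rng)
x = pad_and_normalize([[255, 0], [0, 255]], size=3)
print(params["scale"], x[0])
```

A production pipeline would additionally apply the sampled rotation/scale, contrast jitter, and Gaussian noise with an image library; only the geometry-free parts are sketched here.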

Figure 2 shows representative radiographs at three chronological ages. (4y) The maximum epiphyseal diameter of the radius is half the width of its shaft; a clear epiphysis is visible on the ulna. (10y) The epiphysis of the radius covers one side of the shaft; the ulnar epiphysis is as wide as its shaft. (14y) The epiphysis of the radius covers both sides of the shaft, and the ulnar epiphysis has fused with approximately half of the shaft. In the magnified view, both sides of the epiphysis of the proximal phalanx of the first finger have covered the shaft.

Fig. 2.


(4y), (10y), and (14y) indicate chronological ages of 4, 10, and 14 years, respectively.

X-ray Acquisition Technique and Quality Criteria. Positioning: the patient is positioned erect or seated adjacent to the detector. The hand under examination is pronated and fully extended, with fingers equally separated and apposed to the cassette. The head of the third metacarpal is aligned to the centre of the field of view. For bilateral acquisitions, both hands are placed symmetrically in identical pronation; a supine or upright posture is selected according to patient cooperation. Field limitation and beam centre: the collimation field encompasses the distal 2 cm of the radius and ulna and the soft-tissue tips of the distal phalanges. For single-hand acquisition, the central ray is directed perpendicular to the head of the third metacarpal; for simultaneous bilateral acquisition, the beam is centred perpendicular to the midpoint of an imaginary line connecting the heads of both third metacarpals.

Expert Panel Composition and Annotation Quality Control. The expert panel consists of 12 clinicians: six junior physicians with 3 years of clinical experience, three mid-level physicians with 5 years of clinical experience, and three senior physicians with over 10 years of clinical experience. They are organized into three collaborative teams, each with four members: two junior physicians, one mid-level physician, and one senior physician. This tiered structure ensures a streamlined workflow. The workflow follows a strict quality control system with three steps: Step 1: Two junior physicians independently perform the initial segmentation and annotation of the image data and classify the ossification center levels. Step 2: The mid-level physician conducts a quality check of the annotated results, verifying segmentation accuracy and level consistency. They provide feedback for corrections to ensure the data quality meets the intermediate standard. Step 3: The senior physician performs the final review, auditing the quality-checked data. They resolve any conflicting annotations, standardize the criteria, and create a unified dataset suitable for model training.

Cascaded Network Model

We developed a cascaded network model based on convolutional neural networks, integrating instance segmentation and image classification algorithms to automatically localize the radius, ulna, metacarpals, and phalanges, segment regions of interest, and grade each ossification center. The overall structure of the model is illustrated in Figure 1(b). On the left, the Yolact instance segmentation framework is used [20], with ResNet-101 as the backbone network for feature extraction [21] and the prototype network and prediction head decoding modules for skeleton segmentation and category prediction [22]. On the right, parallel ResNet-18 networks [23] are designed for the ulna, radius, and metacarpals to classify the ossification stage of each bone category.

The specific workflow is as follows: the input image is first processed for region of interest detection and mask generation, resulting in a bounding box (Bbox) for each target bone. These Bboxes are then used to crop the corresponding ROIs from the input image through spatial coordinate transformation, which are subsequently fed into the next stage of the model. To ensure the effectiveness of the cropped regions, an adaptive scaling strategy is applied to the Bboxes based on the size of the detected bone contours within the generated masks. Next, a set of parallel classification networks independently classifies the ossification stage of each bone. Finally, the model outputs are quantified to match the corresponding maturity levels defined by the China-05 standard [24].
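The adaptive Bbox scaling and ROI cropping described above can be sketched as follows. The 10% margin is an illustrative assumption standing in for the paper's contour-size-based scaling strategy.

```python
# Sketch of the Bbox handling: each detected box is expanded by an adaptive
# margin relative to its own size, clipped to the image bounds, and then used
# to crop the ROI that feeds the downstream classification networks.

def expand_bbox(bbox, img_w, img_h, margin=0.10):
    """bbox = (x0, y0, x1, y1); grow by `margin` of its own width/height."""
    x0, y0, x1, y1 = bbox
    dx = (x1 - x0) * margin
    dy = (y1 - y0) * margin
    return (max(0, int(x0 - dx)), max(0, int(y0 - dy)),
            min(img_w, int(x1 + dx)), min(img_h, int(y1 + dy)))

def crop_roi(image, bbox):
    """Crop rows y0..y1 and columns x0..x1 from a nested-list image."""
    x0, y0, x1, y1 = bbox
    return [row[x0:x1] for row in image[y0:y1]]

# Toy 20x20 image; a detected epiphysis box near the upper-left corner.
img = [[r * 20 + c for c in range(20)] for r in range(20)]
box = expand_bbox((2, 2, 12, 12), img_w=20, img_h=20)
roi = crop_roi(img, box)
print(box, len(roi), len(roi[0]))
```

Clipping to the image bounds matters for bones near the radiograph edge, where an unclipped expansion would index outside the array.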

The Yolact network was trained using the Adam optimizer with a batch size of 16 for 300 epochs, with early stopping applied based on the validation loss to prevent overfitting; all input images were resized to a uniform resolution and normalized. The ResNet classification model was likewise trained using the Adam optimizer with a batch size of 8 for 100 epochs; its inputs were cropped or padded from the ROIs generated by Yolact to a uniform size. The detailed training configuration is summarized in Table 2.

Table 2.

Training Configuration of Yolact and ResNet Models

Model Optimizer Learning Rate Batch Size Epochs
Yolact Adam Inline graphic 16 300
ResNet Adam Inline graphic 8 100

Post-processing

To ensure the clinical applicability of the model outputs, we employed the China-05 standard for wrist bone age assessment in Chinese children, along with the corresponding standardized growth charts, to derive bone age estimates and predict adult height [25]. The post-processing pipeline consists of three main steps: bone age value mapping, chronological age consistency adjustment, and height prediction.

First, for each ossification center classification predicted by the network, we constructed a mapping table to convert the predicted ossification grade into bone age scores according to the China-05 standard. Each bone’s classification level is encoded using its corresponding bone age score, and gender-specific scoring templates are applied. The detailed calculation is as follows:

$$ S \;=\; \sum_{i=1}^{n} W_i \, s_{i,j} \;+\; b \cdot g \tag{1} $$

Where $i$ represents the type of bone, $j$ denotes the ossification stage corresponding to the $i$-th bone, $n$ indicates the total number of bones assessed, $W_i$ represents the weighting coefficient for each bone, and $s_{i,j}$ corresponds to the bone development score derived from the RUS-CHN scoring system in the China-05 standard. The term $b$ is a constant used for sex adjustment, with $g$ as the sex indicator, where $g = 0$ represents female and $g = 1$ represents male.
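The score-mapping step above (a weighted sum of per-bone development scores plus a sex offset) can be sketched in Python as follows. All weights, per-stage scores, and the offset are hypothetical toy values; the real ones come from the China-05 RUS-CHN scoring tables.

```python
# Sketch of the bone-age scoring step: look up each bone's per-stage score,
# weight it, sum over all assessed bones, and add a sex-adjustment offset.

def bone_maturity_score(grades, scores, weights, b, g):
    """grades[i] = predicted stage j of bone i; scores[i][j] = RUS-CHN-style score."""
    total = sum(weights[i] * scores[i][j] for i, j in enumerate(grades))
    return total + b * g   # g: 0 = female, 1 = male

scores = [[0, 5, 12], [0, 4, 10], [0, 6, 14]]   # per-bone, per-stage (toy values)
weights = [1.0, 0.8, 1.2]                       # per-bone weights (toy values)
total = bone_maturity_score([2, 1, 2], scores, weights, b=3.0, g=1)
print(total)  # 35.0
```

In the actual pipeline this total score would then be mapped to a bone age via the standard's score-to-age table.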

In addition, to enhance the robustness of the model, we introduce an age-consistency correction mechanism to address occasional abnormal predictions. Specifically, we first construct a reference mapping set based on the *China-05 Standard*, which contains the expected grade ranges of different ossification centers at various ages. After the model generates predictions, the predicted grade of each bone is compared with the reference set, and bones showing a significant deviation from the actual age are marked as candidates for correction. Subsequently, guided by clinicians’ prior knowledge, we replace the abnormal predictions with the median grade derived from anatomically and developmentally related neighboring bones. Finally, the corrected bone grades are used to recalculate the bone age, ensuring that the overall prediction remains consistent with both the individual’s chronological age and local anatomical development.

First, we determine whether correction is needed:

$$ \left| \, G_i^{g} \;-\; E_i^{g}(A) \, \right| \;>\; \tau \tag{2} $$

where $G_i^{g}$ denotes the predicted grade of the $i$-th bone for gender $g$, $E_i^{g}(A)$ represents the expected grade of the $i$-th bone at chronological age $A$ in the reference set, and $\tau$ is the threshold for significant deviation. If the inequality holds, the bone is marked as a correction target.

For the marked bones, their corrected score is obtained by replacing the original prediction with the median grade of the neighboring bones:

$$ \hat{G}_i \;=\; \operatorname{median}\bigl(\{\, G_k : k \in N(i) \,\}\bigr) \tag{3} $$

where $\hat{G}_i$ denotes the corrected score of the $i$-th bone, and $N(i)$ represents the set of neighboring bones with similar developmental stages and features (defined according to clinical expertise).
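The correction mechanism described above (flag bones whose predicted grade deviates from the age-expected grade by more than a threshold, then replace each flagged grade with the median grade of its neighbors) can be sketched as follows. The expected-grade table, neighbor map, and deviation threshold are toy assumptions, not values from the China-05 standard.

```python
from statistics import median

# Sketch of the age-consistency correction: deviation check, then
# median-of-neighbors replacement for the flagged bones.

def correct_grades(pred, expected, neighbors, tau=2):
    corrected = dict(pred)
    for bone, grade in pred.items():
        if abs(grade - expected[bone]) > tau:                          # deviation check
            corrected[bone] = median(pred[n] for n in neighbors[bone])  # neighbor median
    return corrected

pred = {"radius": 9, "ulna": 4, "mc1": 5, "mc3": 5}       # radius grade is an outlier
expected = {"radius": 5, "ulna": 5, "mc1": 5, "mc3": 5}   # grades expected at age A
neighbors = {"radius": ["ulna", "mc1"], "ulna": ["radius", "mc1"],
             "mc1": ["mc3"], "mc3": ["mc1"]}
out = correct_grades(pred, expected, neighbors)
print(out)
```

Only the radius exceeds the threshold here, so its grade is replaced by the median of its neighbors' grades while all other predictions pass through unchanged; the corrected grades would then be re-scored with the weighted sum above.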

Finally, the corrected bone grades are used to recalculate bone age:

$$ S' \;=\; \sum_{i=1}^{n} W_i \, \hat{s}_{i} \;+\; b \cdot g \tag{4} $$

where $n$ is the total number of bones involved in the assessment, $W_i$ denotes the weight coefficient of the $i$-th bone, $\hat{s}_{i}$ is the development score of the corrected grade $\hat{G}_i$, $b$ is a gender correction constant, and $g = 0$ indicates female while $g = 1$ indicates male.

Finally, based on the quantified bone age values and considering the subject’s gender, age, and current height, we predict the adult height. We use a polynomial fitting model to model the standard growth curve and perform interpolation for height prediction under continuous bone age values, ensuring the smoothness and accuracy of the height assessment. The specific implementation process is as follows:

$$ H_{\text{adult}} \;=\; \frac{h}{P(a,\, g,\, \Delta)} \tag{5} $$

Where $h$ represents the current height, $a$ is the predicted bone age, $P$ is the corresponding percentage table of bone age and adult height from the RUS-CHN method in the China-05 standard, and $\Delta$ represents the difference between the predicted bone age and the actual age, which determines the developmental category used when reading the table.
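A minimal sketch of this height-prediction step is shown below, using linear interpolation over a hypothetical percentage-table fragment (one sex, one developmental category, toy values) in place of the paper's polynomial fit. The principle is the same: adult height is current height divided by the fraction of adult height attained at the estimated bone age.

```python
from bisect import bisect_left

# Sketch of the height-prediction step: interpolate the bone-age-to-percentage
# table at a continuous bone age, then divide current height by that fraction.

AGES = [8.0, 9.0, 10.0, 11.0]        # bone ages (toy grid)
PCT  = [0.78, 0.81, 0.84, 0.87]      # fraction of adult height attained (toy values)

def pct_at(bone_age):
    """Linearly interpolate the percentage table at a continuous bone age."""
    i = bisect_left(AGES, bone_age)
    i = min(max(i, 1), len(AGES) - 1)           # clamp to a valid segment
    a0, a1, p0, p1 = AGES[i - 1], AGES[i], PCT[i - 1], PCT[i]
    t = (bone_age - a0) / (a1 - a0)
    return p0 + t * (p1 - p0)

def predict_adult_height(height_cm, bone_age):
    return height_cm / pct_at(bone_age)

print(round(predict_adult_height(135.0, 9.5), 1))  # 163.6
```

A polynomial fit over the full table, as the text describes, gives a smoother curve than piecewise-linear interpolation but follows the same lookup-and-divide structure.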

Statistical analysis

To evaluate the discriminative power and clinical consistency of the proposed cascaded network model in bone age assessment, we conducted statistical analysis on the discrepancies between the model’s output and expert evaluations. All statistical analyses were performed in a Python 3.12 environment using libraries such as NumPy and Matplotlib. All statistical tests were considered statistically significant at P < 0.05.

Specifically, the Pearson correlation coefficient and the Bland-Altman method were first used to compare the model outputs with expert annotations, evaluating the correlation and consistency between model predictions and manual evaluation. Second, box plots were used to analyze the within-group error distributions to verify the generalization ability of the model. Finally, a comparative experiment against manual annotation was conducted to verify the accuracy and practicality of the model.
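The two core analyses can be sketched in plain Python as follows: the Pearson correlation between model and expert values, and the Bland-Altman mean bias with 95% limits of agreement (mean ± 1.96 SD of the paired differences). The sample values are illustrative, not study data.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)

def bland_altman(x, y):
    """Return (mean bias, lower 95% limit, upper 95% limit) of paired differences."""
    d = [a - b for a, b in zip(x, y)]
    bias = sum(d) / len(d)
    sd = sqrt(sum((v - bias) ** 2 for v in d) / (len(d) - 1))   # sample SD
    return bias, bias - 1.96 * sd, bias + 1.96 * sd

model  = [6.1, 8.0, 9.9, 12.2, 14.1]   # illustrative predicted bone ages
expert = [6.0, 8.2, 10.0, 12.0, 14.0]  # illustrative expert annotations
print(round(pearson_r(model, expert), 3), bland_altman(model, expert)[0])
```

In practice these would be computed with NumPy over the full test set, but the definitions are identical.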

Results

This study included a total of 1,648 left-hand X-ray images as the test set, with all samples containing expert-annotated bone age and adult height predictions, as well as concurrent height measurement data. To ensure robustness, all experiments were independently repeated five times with different random seeds, and the mean results were reported.

As shown in Figure 3A and C, the predicted bone age and adult height exhibited strong and statistically significant positive correlations with the expert annotations, with Pearson correlation coefficients of 0.98 and 0.94, respectively. Approximately 90% of the scatter points were distributed close to the fitted regression lines, indicating small overall prediction errors and a good model fit. Notably, the regression line for height predictions closely aligned with the identity line y = x, demonstrating the model’s stable predictive performance across different age groups.

Fig. 3.


Correlation and Consistency Analysis of Bone Age and Height Predictions. (A, C) Scatter plots show the correlation between the model’s predictions and expert annotations for bone age (A) and height (C). The solid line represents the linear regression fit, while the dashed line represents the identity line (y = x). (B, D) Bland-Altman plots assess the consistency between the model’s predictions and expert annotations for bone age (B) and height (D). The central line represents the mean bias, and the dashed lines represent the 95% limits of agreement (mean ± 1.96 standard deviations).

Meanwhile, Bland-Altman analysis was performed to visualize the paired differences between the model predictions and expert annotations, in order to assess the level of agreement and identify any systematic bias. As shown in Figure 3B, the mean difference in bone age between the model and expert assessments was 0.19 years, indicating minimal systematic bias, and the 95% limits of agreement were narrow, suggesting a high level of consistency. Figure 3D illustrates the agreement analysis for height prediction, where the mean difference was -0.11 cm. The vast majority of data points fell within the limits of agreement, indicating that the discrepancies in height prediction were within an acceptable range. Overall, the model’s predictions for both bone age and height demonstrated strong agreement with the expert references. Despite the presence of a few outliers, most data points were located within the 95% limits of agreement, and the mean differences approached zero, indicating good consistency without significant systematic bias.

To further evaluate the generalizability of the model, the test set was stratified by age groups (3-6 years, 6-8 years, 9-11 years, and 12-16 years) and gender. The distribution of prediction errors between the model outputs and expert annotations was analyzed accordingly. As shown in Figure 4A, bone age prediction errors in the younger age groups tended to have median values greater than zero, indicating a mild overestimation by the model in early childhood. However, with increasing age, the prediction errors gradually converged toward zero, reflecting improved stability and consistency. In addition, gender-related differences were minimal, suggesting the model is relatively robust to gender factors in bone age assessment. Figure 4B illustrates the distribution of height prediction errors across different age groups and genders. Notably, although greater variability was observed in the 3-6 years group, the errors in the 6-12 years range were generally centered around zero, with the most accurate predictions observed in the 9-11 years group.

Fig. 4.


Error Analysis of Bone Age and Height Predictions Across Different Age Groups and Genders. Box plots of prediction errors (model minus expert prediction) for bone age (A) and height (B), stratified by age group and gender. Green and orange boxes represent males and females, respectively.

To further evaluate the generalizability of the proposed model, we conducted experiments on the 2017 RSNA public dataset and compared the results with several state-of-the-art bone age prediction models, including DenseNet, BAE-ViT, and an attention-based multi-scale network. As shown in Table 3, our cascaded Yolact + ResNet model achieved a mean absolute error (MAE) of 0.28 years, outperforming DenseNet (0.38 years), BAE-ViT (0.383 years), and the attention-based multi-scale network (0.293 years). These results indicate that the proposed model not only maintains high prediction accuracy on unseen data but also demonstrates superior generalization capability compared to existing single-network methods.

Table 5.

Comparison of Predictive Performance Between the Model and Radiologists

Method Bone Age MAE (years) Bone Age RMSE (years) Height MAE (cm) Height RMSE (cm) Time per Case
Model 0.25 0.38 1.75 2.56 7.8 s
Radiologist 0.51 0.64 3.75 4.82 11.2 min

Table 3.

Comparison of Bone Age Prediction Models on the 2017 RSNA Dataset

Method MAE (years)
Ours 0.28
DenseNet 0.38
BAE-ViT 0.383
Attention-based Multi-scale Net 0.293

To further validate the superiority of the proposed cascaded architecture, we conducted comparative experiments against previously reported methods. As shown in Table 4, Han et al. achieved a mean absolute error (MAE) of 0.42 years using a spatial transformer network combined with a 5×ResNet architecture. Suh et al. obtained an MAE of 0.48 years by extracting the wrist and specific fingers with ROI-based preprocessing followed by parallel VGG16 networks. Other CNN-based approaches reported an MAE of 0.56 years. In contrast, our model achieved a substantially lower MAE of 0.25 years, highlighting its improved accuracy. These results demonstrate that integrating Yolact for precise ossification center localization with ResNet for skeletal maturity classification is effective across diverse pediatric age groups.

Table 4.

Comparison of Bone Age Assessment Performance with Existing Methods

Method MAE (years)
Ours 0.25
ref[26] 0.42
ref[29] 0.48
ref[27] 0.56

Finally, to comprehensively evaluate the clinical applicability of the proposed model, a random sample of 100 hand radiographs was selected from the test set. The model’s predictions were compared with assessments made by radiologists with over three years of experience, as shown in Table 5. The model achieved a mean absolute error (MAE) of 0.25 years and a root mean square error (RMSE) of 0.38 years in bone age assessment, demonstrating performance comparable to that of experienced radiologists. In adult height prediction, the model outperformed the radiologists, with a 53.3% reduction in error, indicating superior predictive accuracy. Moreover, the model exhibited significantly higher efficiency, with an average processing time of 7.8 seconds per case, compared to more than 10 minutes for manual evaluation, suggesting that the model can substantially improve workflow efficiency without compromising accuracy.

Discussion

In this study, we proposed a cascaded deep learning network for bone age assessment and adult height prediction, based on the China-05 grading system and standard growth charts. The model also enables visualization of bone segmentation masks and growth curves. Compared with traditional manual assessment methods, the proposed approach significantly improves both reproducibility and consistency. Experimental results demonstrate that the model exhibits strong generalizability on the dataset from Shenzhen Children’s Hospital, with prediction accuracy notably surpassing that of expert radiologists. Furthermore, the model achieved substantially faster evaluation times compared to manual assessment. These findings suggest that the proposed method offers an effective alternative to conventional bone age evaluation, with the potential to alleviate clinical workload and enhance diagnostic efficiency.

With the rapid advancement of deep learning technologies, automated bone age assessment has emerged as a highly active area of research. Existing studies have demonstrated that convolutional neural network (CNN)-based approaches are effective for bone age estimation from X-ray images [26, 27]. For instance, Jeong et al. developed a deep learning network based on the Greulich-Pyle method for automated bone age assessment [28]. Han et al. applied a spatial Transformer network to localize five regions of interest on hand radiographs and employed a 5×ResNet architecture for classification-based training [29]. To address the limitations of whole-hand inputs, Suh et al. implemented image preprocessing techniques to extract the wrist and the first, third, and fifth fingers, then used parallel VGG16 networks followed by a regression layer to estimate bone age, and employed standard growth charts for adult height prediction [31]. However, most existing methods rely on single-network models trained on full-hand images, leading to unstable performance across complex age ranges. While these models perform well in narrow age groups with minor skeletal differences, their generalization ability declines as the age range and skeletal variation increase. Furthermore, using the entire hand introduces redundant background information that can interfere with feature learning. Our cascaded model (Yolact and ResNet) significantly outperforms these approaches in terms of mean absolute error (MAE): it achieved an MAE of 0.25 years, superior to the 0.45 years reported for the spatial-Transformer-based method and the 0.51 years for the image-preprocessing-based method [16, 30].
In contrast, our study introduces a cascaded architecture combining Yolact and ResNet, enabling precise detection of ossification centers and refined classification of skeletal maturity levels. This structure enhances the model’s ability to handle diverse pediatric age groups effectively. Finally, by integrating the China-05 standard growth charts into the regression module, we established a fully automated pipeline from image recognition to clinical index prediction, encompassing both bone age and adult height estimation. Our approach addresses the limitations of traditional deep learning models in terms of adaptability to data heterogeneity, and demonstrates superior performance in accuracy, robustness, and clinical feasibility, highlighting its broad potential for real-world application in pediatric radiology.

Despite the promising results achieved in bone age assessment and adult height prediction, this study has several limitations. First, the dataset was collected exclusively from Shenzhen Children’s Hospital, introducing geographic and single-center bias that may limit the model's generalizability to populations from other regions or ethnic backgrounds [32]. Given potential variations in nutritional status and growth environments across regions, future work should incorporate multicenter, cross-regional datasets to enhance sample diversity and improve the model's robustness and generalizability. Second, the current dataset has an imbalanced age distribution, with relatively few samples in early childhood (3-5 years) and late adolescence (12-16 years). This imbalance may affect the model's stability and prediction performance in these age ranges, potentially limiting its discriminative power in clinical applications. Future studies should therefore aim for a more balanced sample distribution across age groups to ensure reliable performance and broader applicability in real-world pediatric practice.

The cascaded network model proposed in this study demonstrates significant clinical potential, enabling not only automated bone age assessment but also accurate prediction of adult height based on skeletal maturity. In routine clinical practice, bone age evaluation traditionally relies on atlas-based comparison and subjective judgment of ossification, a process that is time-consuming and prone to observer variability. By contrast, our deep learning-based approach provides a fast, objective, and accurate solution for bone age and height prediction, offering a reliable foundation for evaluating pediatric growth and development. Moreover, the model supports quantitative visualization of its outputs, including clear segmentation maps of metacarpal and phalangeal bones as well as percentile curves of age and height, thereby enhancing clinical interpretability and improving communication with patients and caregivers. These visual tools support better physician-parent dialogue and shared decision-making. With continued technological advancement, the model could be implemented as a practical diagnostic aid in pediatric endocrinology and radiology, supporting large-scale screening programs and enabling early intervention in child growth disorders [33].

While our deep learning-based approach has achieved notable progress in bone age assessment and adult height prediction from hand radiographs, there remains considerable scope for improvement and extension. Future research should collect more diverse datasets spanning different regions, ethnicities, and age groups to further enhance the model's generalizability and cross-population robustness [34]. Additionally, incorporating structural bone features, such as the length and width of tubular bones, as supplementary inputs could improve the model's discriminative power. Integrating longitudinal follow-up data would also allow dynamic modeling of ossification center changes over time, supporting temporal learning frameworks that capture the intricate relationship between ossification progression and height growth trajectories, thereby improving the accuracy, reliability, and clinical relevance of adult height prediction.

Conclusion

In this study, we developed a cascaded deep learning model for automated bone age assessment and adult height prediction from hand radiographs, aiming to evaluate pediatric growth and development. Experimental results demonstrated strong correlation and consistency between model predictions and expert assessments. This application highlights the potential of deep learning in pediatric growth evaluation, offering more accurate and intelligent support for clinical decision-making.

Acknowledgements

The authors sincerely thank the staff of the Department of Radiology and the Department of Endocrinology at Shenzhen Children’s Hospital, and ZhenData Intelligent Technology Co., Ltd, Shenzhen, for their dedicated support and assistance in the collection and processing of data for this project. We also thank the Shenzhen Children’s Hospital Ethics Committee for their thorough review and approval of this study. Special thanks are extended to Hongwu Zeng and Zhe Su for their invaluable theoretical guidance throughout the study and their critical work in the language refinement and final review of the manuscript.

Authors’ Contributions

Nihui Pei: Attending Radiologist working in the Radiology Department, specializing in the field of musculoskeletal disease imaging diagnosis, responsible for the collection and quality control of imaging data for this project, as well as the overall coordination and management of the entire project. Yijiang Zhuang: Attending Radiologist, working in the Radiology Department, proficient in various imaging processing software, responsible for the collection of imaging data for this project, and providing technical support in software engineering and algorithm development for the project. Zhe Su: Chief Physician, Doctoral Supervisor, working in the Endocrinology Department, with extensive clinical experience and numerous scientific achievements in the field of children’s growth and physical development, providing theoretical guidance for this project. Fangjing Wang: Overseeing medical imaging data processing and algorithm development, responsible for managing the project’s progress, quality, and coordination of personnel organization. Yansong Liu: The research focus is on medical image preprocessing and the development of algorithms related to medical imaging. Responsible for the segmentation of metacarpal, phalangeal, and carpal bones in X-ray images for this project, as well as the measurement of metacarpal and phalangeal bone lengths. Xianglei Li: Associate Chief Nurse, working in the Endocrinology Department, responsible for bone age assessment and height prediction in this project. Huiping Su: Attending Physician, working in the Endocrinology Department, with extensive clinical research experience in the fields of bone age assessment and height prediction, responsible for the collection of clinical data for this project. 
Hongwu Zeng: Radiologist, Professor, Chief, Doctoral Supervisor, with extensive clinical experience and numerous scientific achievements in the fields of children’s neurodevelopment and physical development, providing theoretical guidance for this project.

Funding

This work was supported by the Guangdong High-level Hospital Construction Fund, the Sanming Project of Medicine in Shenzhen (No. SZSM202011005), and the Shenzhen Municipal Science and Technology Plan Project (No. JCYJ20230807093815031). Apart from this, no other commercial or non-commercial organizations provided financial support for this research.

Data Availability

No datasets were generated or analysed during the current study.

Declarations

Ethical Approval and Consent to Participate

Shenzhen Children’s Hospital Ethics Committee Review (Scientific Research) Opinion No. 202501702. The study utilized retrospective, de-identified, and anonymized imaging data, and the committee granted a waiver of individual informed consent.

Consent for Publication

The manuscript does not contain any personally identifiable features, images, or information. All authors have read and approved the final version of this manuscript for publication in the journal.

Competing Interests

The authors declare no competing interests.

Clinical Trial Number

This study is an investigator-initiated, observational clinical study, designed as a retrospective cohort study. It has been filed with the relevant institutional body under Filing Number MR-44-25-051722. The implementing institution is Shenzhen Children’s Hospital, and the registration authority is the Health Commission of Shenzhen Municipality, Guangdong Province.

Respected Editors and Readers

To ensure the fairness and transparency of this research, the research team declares that no conflicts of interest affected the impartiality or objectivity of the manuscript during its preparation and submission.

Interest Relationships

No special interest relationships exist between any of the authors and the funding institutions that could influence the interpretation of the research results or the writing of the paper. All authors conducted their research independently, ensuring the objectivity and fairness of the study, and all authors and related institutions confirm that there are no conflicts of interest regarding the publication of this paper. We will strictly adhere to academic ethical standards to ensure the objectivity and reliability of the research results.

Footnotes

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

  1. Nelson, C.A., Sullivan, E., Engelstad, A.-M.: Annual research review: Early intervention viewed through the lens of developmental neuroscience. Journal of Child Psychology and Psychiatry 65(4), 435–455 (2024). 10.1111/jcpp.13858
  2. Cao, L., et al.: Hand skeletal features of children and adolescents with different growth statuses and periods. Quant Imaging Med Surg 14(3), 2528–2538 (2024). 10.21037/qims-23-26
  3. Lee, M., et al.: Retrospective clinical trial to evaluate the effectiveness of a new tanner–whitehouse-based bone age assessment algorithm trained with a deep neural network system. Diagnostics 15(8), 993 (2025). 10.3390/diagnostics15080993
  4. Nivedita, Solanki, S.: Enhancing the accuracy of automatic bone age estimation using optimized cnn model on x-ray images. In: Khurana, M., Thakur, A., Kantha, P., Shieh, C.-S., Shukla, R.K. (eds.) Machine Learning Algorithms, pp. 329–340. Springer, Cham (2025). 10.1007/978-3-031-75861-4_29
  5. Leeuwen, K.G., et al.: Comparison of commercial ai software performance for radiograph lung nodule detection and bone age prediction. Radiology 310(1), 230981 (2024). 10.1148/radiol.230981
  6. Kim, J.K., Park, D., Chang, M.C.: Assessment of bone age based on hand radiographs using regression-based multi-modal deep learning. Life 14(6), 774 (2024). 10.3390/life14060774
  7. Alrawi, R.M.S., Basheer, N.M.: Pediatric radiology: An analysis of ai-powered bone age determination methods. NTU Journal of Engineering and Technology 4(1), 1 (2025). 10.56286/a82tjh48
  8. Alzubaidi, L., et al.: Comprehensive review of deep learning in orthopaedics: Applications, challenges, trustworthiness, and fusion. Artificial Intelligence in Medicine 155, 102935 (2024). 10.1016/j.artmed.2024.102935
  9. Pape, J., Hirsch, F.W., Deffaa, O.J., DiFranco, M.D., Rosolowski, M., Gräfe, D.: Applicability and robustness of an artificial intelligence-based assessment for greulich and pyle bone age in a german cohort. RöFo 196, 600–606 (2023). 10.1055/a-2203-2997
  10. Martín Pérez, S.E., Martín Pérez, I.M., Molina Suárez, R., Vega González, J.M., García Hernández, A.M.: The validation of the tanner–whitehouse 3 method for radiological bone assessments in a pediatric population from the canary islands. Osteology 5(1), 6 (2025). 10.3390/osteology5010006
  11. Liang, Y., et al.: Validation of an ai-powered automated x-ray bone age analyzer in chinese children and adolescents: A comparison with the tanner–whitehouse 3 method. Adv Ther 41(9), 3664–3677 (2024). 10.1007/s12325-024-02944-4
  12. Jung, H.W., et al.: Comparison of adult height prediction using bone age and body composition for growth assessment in korean children. Sci Rep 15(1), 10581 (2025). 10.1038/s41598-025-94685-9
  13. Tanwar, V.: Transfer learning models for automated bone age prediction: Evaluating efficiency and clinical applicability. In: 2024 4th International Conference on Technological Advancements in Computational Sciences (ICTACS), pp. 185–190 (2024). 10.1109/ICTACS62700.2024.10840640
  14. Hamd, Z.Y., et al.: Deep learning-based automated bone age estimation for saudi patients on hand radiograph images: a retrospective study. BMC Med Imaging 24(1), 199 (2024). 10.1186/s12880-024-01378-2
  15. Lei, L., Qile, P., Guang, C., Zhipeng, L.: Consistency between artificial intelligence and expert greulich-pyle atlas method for bone age assessment. Chinese Journal of Tissue Engineering Research 28(28), 4436 (2024). 10.12307/2024.466
  16. Yuan, W., Fan, P., Zhang, L., Pan, W., Zhang, L.: Bone age assessment using various medical imaging techniques enhanced by artificial intelligence. Diagnostics 15(3), 257 (2025). 10.3390/diagnostics15030257
  17. Wang, S., et al.: A pediatric bone age assessment method for hand bone x-ray images based on dual-path network. Neural Comput and Applic 36(17), 9737–9752 (2024). 10.1007/s00521-023-09098-4
  18. Rassmann, S., et al.: Deeplasia: deep learning for bone age assessment validated on skeletal dysplasias. Pediatr Radiol 54(1), 82–95 (2024). 10.1007/s00247-023-05789-1
  19. Xiao, J., Huang, Y., Guan, J., Ma, S., Zhang, D.: Boncc: A lightweight tw3-based bone age assessment coordinate classification model. In: 2024 International Joint Conference on Neural Networks (IJCNN), pp. 1–6 (2024). 10.1109/IJCNN60899.2024.10650982
  20. Bolya, D., Zhou, C., Xiao, F., Lee, Y.J.: Yolact: Real-time instance segmentation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pp. 9157–9166 (2019). 10.1109/ICCV.2019.00926
  21. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016). 10.1109/CVPR.2016.90
  22. Bu, J., et al.: A multi-element identification system based on deep learning for the visual field of percutaneous endoscopic spine surgery. JOIO 58(5), 587–597 (2024). 10.1007/s43465-024-01134-2
  23. Wang, R., Zhou, X., Liu, Y., Liu, D., Lu, Y., Su, M.: Identification of the surface cracks of concrete based on resnet-18 depth residual network. Applied Sciences 14(8), 3142 (2024). 10.3390/app14083142
  24. Devi, S., Thopalli, K., Dayana, R., Malarvezhi, P., Thiagarajan, J.J.: Improving object detectors by exploiting bounding boxes for augmentation design. IEEE Access 11, 108356–108364 (2023). 10.1109/ACCESS.2023.3320638
  25. Sun, Y., Guo, X., Zhou, X., Shi, C., Yang, H., Pan, H.: Application status and research progress of imaging assessment of skeletal maturity in adolescents and children. xhyxzz 15(6), 1456–1462 (2024). 10.12290/xhyxzz.2024-0217
  26. Alzyoud, J.A.M., Rababah, E., Almuhaisen, M.H.O., Al-Qtaitat, A.I.: Bone age determination of epiphyseal fusion at knee joint and its correlation with chronological age. Medicina 60(5), 779 (2024). 10.3390/medicina60050779
  27. Gräfe, D., Beeskow, A.B., Pfäffle, R., Rosolowski, M., Chung, T.S., DiFranco, M.D.: Automated bone age assessment in a german pediatric cohort: agreement between an artificial intelligence software and the manual greulich and pyle method. Eur Radiol 34(7), 4407–4413 (2024). 10.1007/s00330-023-10543-0
  28. Suh, J., et al.: Bone age estimation and prediction of final adult height using deep learning. Yonsei Medical Journal 64(11), 679–686 (2023). 10.3349/ymj.2023.0244
  29. Kim, J.R., et al.: Computerized bone age estimation using deep learning based program: Evaluation of the accuracy and efficiency. American Journal of Roentgenology 209(6), 1374–1380 (2017). 10.2214/AJR.17.18224
  30. Pape, J., Rosolowski, M., Pfäffle, R., Beeskow, A.B., Gräfe, D.: A critical comparative study of the performance of three ai-assisted programs for bone age determination. Eur Radiol 35(3), 1190–1196 (2025). 10.1007/s00330-024-11169-6
  31. Han, Y., Wang, G.: Skeletal bone age prediction based on a deep residual network with spatial transformer. Computer Methods and Programs in Biomedicine 197, 105754 (2020). 10.1016/j.cmpb.2020.105754
  32. Gitto, S., et al.: Ai applications in musculoskeletal imaging: a narrative review. Eur Radiol Exp 8(1), 22 (2024). 10.1186/s41747-024-00422-8
  33. Athar, M.: Potentials of artificial intelligence in familial hypercholesterolemia: Advances in screening, diagnosis, and risk stratification for early intervention and treatment. International Journal of Cardiology 412, 132315 (2024). 10.1016/j.ijcard.2024.132315
  34. Qiao, Y., Lv, P., Hong, K., Zhao, Y., Feng, Q., Zhang, C.: Use of the ultrasound bone maturity indexes to assess whether children have reached their final height. Ultrasound in Medicine and Biology 51(5), 903–908 (2025). 10.1016/j.ultrasmedbio.2025.02.004



Articles from Journal of Medical Systems are provided here courtesy of Springer
