Abstract
Bone Age Assessment is essential for monitoring physiologic and abnormal growth of children and adolescents, allowing the identification of skeletal maturation. Traditional methods, such as Greulich and Pyle, are widely used but have limitations due to the variability among evaluators, a consequence of the inherent subjectivity of visual analysis. This paper presents a comprehensive end-to-end AI-based mobile prototype as a solution that combines methodological innovation with practical application. To achieve this, we developed a new deep learning architecture that integrates VGG16, Feature Pyramid Network, a convolutional block attention module, and metadata to enhance model performance. The model was trained using a novel dataset curated exclusively from healthy individuals within the Brazilian population, ensuring data reliability through a rigorous selection process. Subsequently, the model is embedded in a fully functional mobile application, enabling real-time and standardized bone age estimation. The system automatically implements the Greulich and Pyle method, demonstrating the potential of the proposed AI-based mobile app prototype to substantially reduce diagnostic variability. Experimental results show a mean absolute difference of 8.7 months overall and 7.2 months for individuals aged 6 to 18 years, highlighting the potential of our approach to enhance accuracy, efficiency, and reproducibility in clinical practice.
Keywords: Bone age assessment, Radiology, Deep learning, Artificial intelligence, Mobile application
Subject terms: Computational biology and bioinformatics, Health care, Mathematics and computing, Medical research
Introduction
Bone age assessment (BAA) is crucial for pediatric medicine, endocrinology, and forensic science to monitor patients’ health1–4. Unlike chronological age, which is determined by birth date, bone age reflects skeletal development and can reveal abnormal growth patterns as well as predict final adult height. The two most widely used BAA methods are Greulich-Pyle (GP)5 and Tanner-Whitehouse (TW)6. The GP method compares hand and wrist radiographs to a reference atlas to identify the best-matching representation, offering a global approach. In contrast, the TW method employs a more detailed scoring system that assesses individual bones. Both approaches are practical, but GP is more commonly adopted in clinical practice due to its simplicity and ease of use compared to the labor-intensive TW system.
The GP method is prone to subjectivity and variability, as it relies heavily on the radiologist’s expertise and subjective interpretations7,8. Studies with clinical specialists have shown that estimates by different professionals vary by up to 9.84 months for the same cases under analysis9. Additionally, the manual atlas comparison is time-consuming and inconsistent10. These issues highlight the need for automated, objective methods to improve BAA precision and efficiency11. While commercial software aims to address these challenges, its performance is often limited by image quality and poor generalization10,12,13.
Over the past two decades, Artificial Intelligence (AI) has significantly impacted healthcare14–16, including Bone Age Assessment (BAA)17–19. Early AI BAA studies used traditional machine learning models11,17, but major advancements came with convolutional neural networks (CNNs)18,20, such as VGG21,22, ResNet10,23,24, Inception10,25, and MobileNets21. Frameworks like Faster R-CNN and YOLO improved data preprocessing for BAA25–27, with models trained on private datasets achieving mean absolute differences (MAD) between 6.43 and 9.48 months. The Radiological Society of North America (RSNA)28 advanced AI-based BAA by launching a competition and releasing a large pediatric hand radiograph dataset. Recent studies using the RSNA dataset have achieved MAD values ranging from 9.99 down to 3.99 months, as shown in Table 3.
Table 3.
Comparison of results with other methods across different datasets.
| Dataset | Source | MAD (months) |
|---|---|---|
| RSNA2017 | Gao et al.43 | 9.99 |
| | Zulkifley et al.44 | 7.70 |
| | Wu et al.45 | 7.38 |
| | Human Expert46 | 7.32 |
| | Steenkiste et al.47 | 6.80 |
| | Li et al.48 | 6.20 |
| | Ren et al.49 | 5.20 |
| | Iglovikov et al.27 | 4.97 |
| | Koitka et al.26 | 4.56 |
| | Chen et al.50 | 4.40 |
| | Su et al.51 | 4.23 |
| | Wang et al.10 | 4.17 |
| | Liu et al.52 | 3.99 |
| Private | Zhou et al.20 | 9.48 |
| | Spampinato et al.18 | 9.12 |
| | Kim et al.53 | 8.20 |
| | Serpa et al.29 | 7.68 |
| | **Our approach** | **7.22** |
| | Mutasa et al.42 | 6.43 |
Note: adapted from Wang et al.10. Significant values are in bold.
Despite advances in AI-driven BAA, manual assessment remains the standard in clinical practice. This is primarily due to concerns within the medical community regarding the development and reliability of current AI tools. A major issue is the lack of transparency in the datasets used. Often, it is unclear whether only healthy individuals were strictly included, raising concerns about potential biases if non-healthy patients were part of the dataset. Furthermore, model generalization remains a significant challenge, as most datasets fail to reflect global variations in skeletal development. Consequently, models trained on datasets such as the RSNA’s often underperform when applied to other populations, such as the Brazilian cohort29. This underscores the growing need for population-specific models to ensure more accurate and equitable assessments.
In this work, we present a significant contribution to BAA by developing a novel dataset tailored explicitly to the Brazilian population, an underrepresented group in existing research. To ensure the reliability and validity of our results, the dataset was curated through a rigorous selection process that exclusively included healthy individuals, setting our methodology apart from prior studies that often lack strict inclusion criteria. Building on this foundation, we designed and compared several deep learning architectures to find the most suitable one for our problem. Crucially, our work extends beyond model development by delivering a comprehensive, end-to-end prototype: we integrate our best-performing model into a fully functional mobile application prototype, enabling real-time, user-friendly deployment in clinical environments. This comprehensive solution represents a substantial advancement in BAA by reducing the subjectivity of traditional methods and minimizing inter-evaluator variability. While our approach introduces significant methodological innovations and offers clear benefits in diagnostic accuracy, efficiency, and reproducibility, it remains a proof of concept: it paves the way for clinical adoption but requires further validation and refinement before full implementation.
Methods
This section describes the materials and methods used in this study. We first present the mobile app prototype developed to integrate the AI models and support clinicians in bone age assessment. Next, we outline the dataset and labeling procedures, followed by the data augmentation strategies applied. Finally, we detail the model architectures, along with the training and validation methodologies adopted for BAA.
Mobile app prototype
The application architecture is composed of three primary components: Frontend, Backend, and machine learning service (ML Service).
The Frontend serves as the interface through which users interact with the system. It is designed to be intuitive and responsive, with careful attention to user needs and workflow efficiency. The interface was developed in close collaboration with domain specialists to ensure its alignment with real-world requirements, and was built using the Flutter framework with the Dart programming language.
The Backend forms the core of the application, managing communication between the user interface, the database, and the ML Service. It is responsible for handling user requests and coordinating data flow throughout the system. The Backend utilizes SQLAlchemy to interact with a PostgreSQL database, which provides reliable and scalable storage for structured data. This architecture ensures that both user information and the results of bone age analyses are handled securely and efficiently.
The ML Service is an independent module responsible for running the bone age estimation model. It does not interact with the database, maintaining clear functional separation. Both the Backend and the ML Service are developed in Python with FastAPI and run within Docker containers, ensuring standardized environments for predictable execution and simplifying deployment and scaling.
The diagram in Fig. 1 illustrates the interaction between the Backend and the ML Service during the system’s operation. When the ML Service starts, it automatically loads the most recent version of the trained machine learning model from a secure online storage location (referred to as a bucket). Only the ML Service has permission to access this location. Once the model is loaded and ready for inference, the service remains available to handle incoming analysis requests.
Fig. 1.
Medical system prototype for bone age estimation.
The bone age estimation process begins when a user uploads a hand and wrist X-ray image to the system (step 1). This action requires authentication and a patient record to be already registered. The image is first received by the Backend (step 2), which saves it in a secure online storage space and records a reference to it in the database (step 3). The Backend then forwards the image to the ML Service for analysis (step 4).
In the ML Service, once both the ML model and image are available, image processing commences (step 5). This may include pre-processing tasks such as normalization (e.g. image resizing and format standardization), followed by inference using the trained model. The resulting bone age estimation is returned to the backend (step 6) and is also stored in the database (step 7), from which it can be retrieved and viewed by the user. In the current prototype implementation, the final result is not returned synchronously. Instead, users retrieve the estimation on demand via a separate request to the system, which then fetches the result from the database.
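The asynchronous retrieval pattern described in steps 1 to 7 can be sketched in plain Python. This is an illustrative stand-in only: the function and variable names (`upload_xray`, `get_estimation`, `DB`) are hypothetical and do not come from the actual FastAPI/PostgreSQL implementation.

```python
# Minimal sketch of the prototype's request flow: the upload request stores
# the result, and the user fetches it later via a separate request.
# All names here are illustrative, not the application's real code.

DB = {}  # stands in for the PostgreSQL results table (step 3 / step 7)


def run_ml_service(image_bytes):
    """Step 5: pre-processing + model inference (dummy value here)."""
    return 10.5  # estimated bone age in years


def upload_xray(patient_id, image_bytes):
    """Steps 1-4: receive the image, store a reference, forward for analysis."""
    DB[patient_id] = {"image": image_bytes, "bone_age": None}
    estimate = run_ml_service(image_bytes)       # step 5
    DB[patient_id]["bone_age"] = estimate        # steps 6-7: persist result


def get_estimation(patient_id):
    """Separate on-demand request: fetch the stored result from the database."""
    record = DB.get(patient_id)
    return None if record is None else record["bone_age"]


upload_xray("patient-001", b"\x89PNG...")
result = get_estimation("patient-001")  # retrieved later, not synchronously
```

The key design point mirrored here is the decoupling: the upload call never returns the estimation directly, so a slow inference does not block the user-facing request.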
Data
The dataset is original and was developed by the pediatric endocrinology unit of Irmandade da Santa Casa de Misericórdia de São Paulo, approved by its Research Ethics Committee (Comitê de Ética em Pesquisa – CEP) under CAAE number 39288520.5.0000.5479. All procedures were conducted in accordance with relevant guidelines and regulations, with informed consent obtained from all participants and their legal guardians.
Data curation was conducted to include only healthy individuals, with careful selection based on clinical records, parental surveys, and on-site clinical evaluations. The data curation criteria were as follows: First, clinical history was reviewed, including gestational age, delivery conditions, birth weight, and height. Individuals born prematurely (before 37 weeks of gestation) or those classified as small for gestational age (below the 3rd percentile) were excluded30. Second, a parental survey identified individuals with chronic diseases or long-term medication use, such as glucocorticoids or steroids, which could affect growth; these individuals were also excluded. Third, on-site measurements of height and weight were obtained. Individuals with a BMI outside the normal range were excluded to prevent confounding effects on growth. Fourth, individuals whose chronological age differed from their bone age (according to the Greulich-Pyle method) by more than 2 standard deviations were excluded, as this could indicate abnormal development. Finally, individuals with bone dysplasia or other structural abnormalities were excluded based on hand and wrist radiograph analysis. This comprehensive selection process ensured that the dataset accurately reflects a healthy pediatric population, providing a reliable foundation for the model’s training and subsequent bone age estimation.
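The five exclusion criteria above can be expressed as a single eligibility filter. The sketch below is hedged: the record field names (`gestational_weeks`, `sga_percentile`, etc.) are hypothetical placeholders, and the standard-deviation check is simplified; only the thresholds (37 weeks, 3rd percentile, 2 SD) come from the text.

```python
def is_eligible(rec, sd_limit=2.0):
    """Apply the study's five exclusion criteria (field names are
    illustrative); returns True if the individual is retained."""
    if rec["gestational_weeks"] < 37:        # criterion 1: prematurity
        return False
    if rec["sga_percentile"] < 3:            # criterion 1: small for gestational age
        return False
    if rec["chronic_disease"] or rec["growth_affecting_medication"]:
        return False                         # criterion 2: parental survey
    if not rec["bmi_in_normal_range"]:       # criterion 3: on-site measurements
        return False
    if abs(rec["chronological_age"] - rec["gp_bone_age"]) > sd_limit * rec["age_sd"]:
        return False                         # criterion 4: age discrepancy > 2 SD
    if rec["bone_dysplasia"]:                # criterion 5: radiograph analysis
        return False
    return True


healthy = {
    "gestational_weeks": 39, "sga_percentile": 50,
    "chronic_disease": False, "growth_affecting_medication": False,
    "bmi_in_normal_range": True,
    "chronological_age": 10.0, "gp_bone_age": 10.5, "age_sd": 1.0,
    "bone_dysplasia": False,
}
keep = is_eligible(healthy)  # True: passes all five criteria
```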
The dataset is organized as a data frame, with attributes including patient ID, sex, chronological age, GP age, and corresponding X-ray images. Each patient ID is uniquely associated with a single image. In total, the dataset contains 434 X-ray images from 423 patients, with varying image resolutions. The dataset is composed of 48% male participants, with chronological ages ranging from 6 to 17 years and GP ages ranging from 4 to 18 years, as shown in Fig. 2a and b.
Fig. 2.
Frequency distribution of the data by age, showing the percentage of males in each range.
The dataset was prepared for two distinct tasks: (1) detection and classification of Regions of Interest (ROIs), and (2) BAA. For the detection and classification task, ROIs were annotated as labels using the RectLabel software. Each label corresponds to an Oriented Bounding Box (OBB) belonging to one of 19 classes: eighteen classes correspond to the ossification regions of the hand and wrist, and one class delimits the hand and wrist region globally, as shown in Fig. 3.
Fig. 3.
Labeled images (a-d) with the 19 different OBBs distinguished by color.
For training the BAA models, images went through a standardization process so that all were vertically oriented, rescaled, and padded to a fixed input shape. To obtain a more reliable estimate of the model’s generalization ability than a single train-test split provides, we implemented the k-fold cross-validation method31. The dataset was divided into multiple folds to ensure robust evaluation. Specifically, the data was sorted by GP labels and distributed sequentially across five folds, with each fold containing an approximately equal representation of patients. The assignment rule, in which indices are dealt out in strides of 5, can be written as:

i = 5j + k,  (1)

where i represents the index of a patient in the sorted dataset, k denotes the fold index, which can take values from the set {0, 1, 2, 3, 4}, and j belongs to the set {0, 1, 2, …}. Thus, each fold F_k consists of the patients whose indices i satisfy i = 5j + k, where 5j iterates over all multiples of 5 within the range [0, N].
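The fold-assignment rule above amounts to sorting by GP label and then dealing patients round-robin into five folds. A minimal sketch, assuming a list of patient records with a `gp_age` field (the field name is illustrative):

```python
def assign_folds(patients, n_folds=5):
    """Sort patients by GP label, then deal them sequentially into folds:
    the i-th sorted patient goes to fold k = i mod n_folds (i = n_folds*j + k)."""
    ordered = sorted(patients, key=lambda p: p["gp_age"])
    folds = [[] for _ in range(n_folds)]
    for i, patient in enumerate(ordered):
        folds[i % n_folds].append(patient)
    return folds


patients = [{"id": n, "gp_age": age}
            for n, age in enumerate([12, 7, 9, 15, 5, 11, 8])]
folds = assign_folds(patients)  # fold sizes differ by at most one patient
```

Because the list is sorted by GP label before dealing, each fold receives a comparable spread of bone ages, which is the property the sequential distribution is meant to guarantee.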
An augmentation pipeline was developed to enhance the dataset by generating modified versions of the original images. To prevent cross-contamination between folds, augmentation was performed separately within each fold, ensuring that augmented images from one fold remained exclusive to that fold. The pipeline incorporates various image processing techniques, including zoom, horizontal and vertical flips, rotation, and brightness adjustments. These operations are applied stochastically, meaning each transformation is randomly selected with its parameters drawn from predefined ranges, allowing for diverse and controlled variations of the original images. Figure 4 represents a consolidated flowchart of the entire process, from data acquisition to image preparation, implemented while preparing data.
Fig. 4.
Consolidated flowchart of the acquisition and preparation of the dataset.
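The stochastic augmentation pipeline can be sketched with NumPy alone. This is a simplified stand-in: the paper's pipeline also includes zoom and arbitrary-angle rotation, whereas this sketch restricts rotation to 90° multiples to stay dependency-free; the probabilities and brightness range are assumed values, not the paper's parameters.

```python
import numpy as np

rng = np.random.default_rng(42)


def augment(image):
    """Apply each transform at random, with parameters drawn from
    predefined ranges (simplified version of the paper's pipeline)."""
    out = image.astype(np.float32)
    if rng.random() < 0.5:                        # horizontal flip
        out = out[:, ::-1]
    if rng.random() < 0.5:                        # vertical flip
        out = out[::-1, :]
    if rng.random() < 0.5:                        # brightness adjustment
        out = np.clip(out * rng.uniform(0.8, 1.2), 0, 255)
    k = int(rng.integers(0, 4))                   # rotation (90° steps only here)
    out = np.rot90(out, k)
    return out


image = rng.integers(0, 256, size=(64, 64)).astype(np.uint8)
augmented = augment(image)  # same 64x64 shape, pixel values still in [0, 255]
```

Applying this function separately within each fold, as the text describes, keeps augmented variants of an image confined to the fold that contains the original.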
As a result of the data preparation process, we generated two distinct types of labels and corresponding strategies to train models for different objectives, as detailed in the Deep learning models subsection.
Deep learning models
The experimental process employed several deep learning architectures, which can be categorized into the ROIs detection and classification model and the BAA model. The ROIs detection and classification model identifies 18 critical hand ossification areas commonly used in clinical practice to estimate bone age, along with a consolidated “hand + wrist” class. Specifically, one class is dedicated to detecting the entire hand and wrist, while the others focus on individual bones, including the ulna, radius, and each finger’s distal, middle, and proximal phalanges, along with the metacarpus, as previously shown in Fig. 3. The model’s output is used to crop the ROIs from the image, followed by histogram equalization, reshaping, and padding. These processed images are then used as input to the bone age estimation model, which, alongside additional patient information, predicts the bone age.
You Only Look Once (YOLO) was selected to perform ROIs detection and classification. YOLO is a real-time object detection model that formulates detection as a single regression problem, directly predicting bounding box coordinates and class probabilities from full images in a single network evaluation32,33. It is widely recognized for its computational efficiency and high inference speed, making it suitable for time-sensitive applications. YOLOv8 was selected for its availability of pretrained models, which facilitate transfer learning, as well as its demonstrated accuracy, efficiency, and ability to detect and classify objects using OBB. This is made possible by its native support for rotation-aware object localization33,34.
For the BAA task, the Visual Geometry Group 16-layer network (VGG16) and ResNet50 were selected because pre-trained models are available through TensorFlow 2.18.035 and Keras 3.18.036. These pretrained Convolutional Neural Networks (CNNs) have learned key hierarchical features from the ImageNet dataset and can be efficiently transferred and fine-tuned for specialized tasks such as BAA. VGG1622 is known for its relatively simple architecture of repeated stacks of 3x3 convolutional layers and provides a strong baseline for feature extraction. Its consistent architecture makes it computationally efficient, while its deep layers capture increasingly complex features. The literature reports strong performance in medical imaging tasks: VGG16 converges effectively with conservative training schedules and standard data augmentation, requires less computational power, and is well suited to fine-tuning on datasets with limited samples37. In contrast, ResNet50’s deep residual connections enable the training of very deep networks and address issues such as vanishing gradients, making it ideal for more complex tasks that demand deeper architectures24.
Following the initial comparative analysis, VGG16 was selected as the backbone of our models due to its superior baseline performance as a feature extractor compared to ResNet50. To address the complexity of BAA, several architectural modifications were implemented on the VGG16 backbone:
Meta-information Fusion: Sex information was integrated into the model, reflecting its clinical relevance in bone maturation. As sex-specific differences in bone age development are well-documented, we assigned values of 100 for females and 200 for males, enabling the model to distinguish between the sexes and improve prediction accuracy.
Feature Pyramid Network (FPN): We incorporated FPN38–40, a multi-scale feature processing architecture, to enhance the model’s ability to capture both fine-grained local structures (e.g. individual bones) and broader anatomical patterns (e.g. overall skeletal growth). This modification improved the model’s representation of both localized and global image features, crucial for BAA.
Convolutional Block Attention Module (CBAM): To refine feature extraction, we added CBAM41, a lightweight attention mechanism that recalibrates both channel-wise and spatial-wise feature responses. This allowed the model to focus on the most relevant features while reducing the impact of noise or irrelevant information.
Parallel ROIs Branch: A parallel branch was introduced specifically to process ROIs. This branch focused on clinically significant regions, ensuring the model prioritized key anatomical areas for bone age estimation, while the main VGG16 branch processed the entire image. The outputs of both branches were concatenated and passed through dense layers for final bone age prediction.
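The meta-information fusion step, in particular, is simple enough to sketch independently of the full architecture. The snippet below shows only the fusion logic in NumPy, assuming a pooled feature vector from the CNN backbone; the feature size (512) and function name are illustrative, and the sex encoding (100 = female, 200 = male) follows the text.

```python
import numpy as np


def fuse_meta(image_features, sex):
    """Concatenate pooled CNN image features with the encoded sex scalar
    (100 = female, 200 = male), mirroring the meta-information fusion step
    before the dense prediction layers."""
    code = 100.0 if sex == "F" else 200.0
    return np.concatenate([np.asarray(image_features, dtype=float), [code]])


features = np.ones(512)           # stand-in for pooled backbone features
fused = fuse_meta(features, "M")  # 513-dimensional vector, last entry = 200
```

In the actual model this concatenated vector feeds the dense regression head, letting the network condition its bone age estimate on sex-specific maturation patterns.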
The study was conducted systematically, maintaining consistent data and parameters across all experiments. Each model, as shown in Fig. 5, consisted of a single output neuron with a linear activation function to estimate bone age in years. The Adam optimizer was used, with one learning rate for adjusting only the dense layers and a smaller learning rate for fine-tuning both convolutional and dense layers. To evaluate and optimize the models during training, we used the Mean Absolute Difference (MAD) and its standard deviation, with MAD calculated as:

MAD = (1/N) Σ_{i=1}^{N} |y_i − ŷ_i|,  (2)

where y_i and ŷ_i are the true and predicted bone ages, respectively, and N is the number of samples in the dataset.
Fig. 5.
Experimental architectures explored using VGG16’s backbone: (a) Meta-information Fusion; (b) Meta-information fusion and FPN; (c) Meta-information fusion and CBAM; (d) Meta-information fusion, FPN, and CBAM; (e) Meta-information fusion, FPN, and parallel ROIs branch.
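The MAD metric of Eq. (2) translates directly into a few lines of NumPy. Since the paper reports MAD in months while the model outputs years, the sketch below includes the unit conversion (the function name is ours):

```python
import numpy as np


def mad_months(y_true_years, y_pred_years):
    """Mean Absolute Difference (Eq. 2) between true and predicted bone
    ages given in years, scaled to months for reporting."""
    y_true = np.asarray(y_true_years, dtype=float)
    y_pred = np.asarray(y_pred_years, dtype=float)
    return 12.0 * float(np.mean(np.abs(y_true - y_pred)))


# Two predictions, each off by half a year -> MAD of 6.0 months
error = mad_months([10.0, 12.5], [10.5, 12.0])  # 6.0
```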
Results
ROIs detection and classification model
The YOLO training process yielded a robust and consistent method for identifying regions of interest. As shown in Fig. 6, the confusion matrix and the F1-confidence curve indicate that the model achieved a high overall confidence level of 99%. The confusion matrix reveals minor misclassifications between the “phalange_proximal_II” and “phalange_proximal_IV” classes across the five fingers, although these regions were never confused with the background. Additionally, some background regions were mistakenly classified as anatomical structures. To address this issue, we incorporated a post-processing step that filters predictions based on inference confidence, ensuring that only the most confident detection is retained for each class. This strategy improves the reliability of region selection and reduces the impact of false positives.
Fig. 6.
YOLO performance.
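The confidence-based post-processing described above reduces to keeping the single highest-confidence detection per class. A minimal sketch, assuming detections are plain dictionaries (the field names `class`, `confidence`, and `box` are illustrative, not YOLOv8's output schema):

```python
def best_per_class(detections):
    """Post-processing step: keep only the most confident detection per
    class, discarding lower-confidence duplicates and likely false positives."""
    best = {}
    for det in detections:
        cls, conf = det["class"], det["confidence"]
        if cls not in best or conf > best[cls]["confidence"]:
            best[cls] = det
    return list(best.values())


detections = [
    {"class": "ulna",   "confidence": 0.97, "box": (10, 20, 40, 60)},
    {"class": "ulna",   "confidence": 0.41, "box": (200, 20, 40, 60)},  # false positive
    {"class": "radius", "confidence": 0.93, "box": (60, 25, 38, 58)},
]
filtered = best_per_class(detections)  # one 'ulna' (conf 0.97) and one 'radius'
```

This works because each of the 19 anatomical classes appears at most once per radiograph, so any duplicate detection of a class is necessarily spurious.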
Another noteworthy aspect, especially when compared to other approaches in the literature that rely on ROIs detection methods, is the effectiveness of YOLOv8 in accurately orienting bounding boxes. This orientation capability allowed for clearer separation between the ROIs and other parts of the hand, resulting in more precise localization of the relevant anatomical structures for age estimation. In tasks such as ours, using non-oriented bounding boxes often leads to the inclusion of irrelevant or redundant hand regions, due to the fixed coordinate-based approach required to encompass the entire target area. By contrast, the orientation-aware bounding boxes generated by YOLOv8 help minimize noise and improve the quality of the data used in the estimation process.
Bone age assessment models
The initial experiments focused on comparing the performance of ResNet50 and VGG16 to establish a baseline for further development. ResNet50 achieved a MAD of 10.6 months and a loss of 1.1 months, while VGG16 outperformed it with a MAD of 9.7 months and a loss of 0.89 months. Due to these results, VGG16 was selected as the backbone and modified. The outcomes of these experiments are summarized in Table 1.
Table 1.
Table presenting the average performance and standard deviation of the models trained using the k-fold cross-validation method.
| Experiment | Metric | Average (months) | Std (months) |
|---|---|---|---|
| VGG | MAD | 15.16 | 0.41 |
| Loss | 2.05 | 0.24 | |
| VGG+SEX | MAD | 9.63 | 0.45 |
| Loss | 1.18 | 0.27 | |
| VGG+SEX+FPN | MAD | 8.94 | 0.39 |
| Loss | 1.03 | 0.24 | |
| VGG+SEX+CBAM | MAD | 8.99 | 0.29 |
| Loss | 1.13 | 0.32 | |
| VGG+SEX+CBAM+FPN | MAD | 8.84 | 0.52 |
| Loss | 1.02 | 0.18 | |
| VGG+SEX+FPN+ROIs | MAD | 8.94 | 0.33 |
| Loss | 1.08 | 0.15 |
Significant values are in bold.
The architecture that achieved the lowest MAD in the k-fold cross-validation was based on VGG, enhanced with the inclusion of sex as an input feature, along with the integration of CBAM and FPN modules. The best-performing model achieved a MAD of 8.42 months. Figure 7 and Table 2 present the performance of the five trained models, highlighting a noticeable increase in prediction variability within the 4 to 6-year bone age range.
Fig. 7.
Performance of the bone age estimation model.
Table 2.
MAD per range of age.
| Age range (years) | Male (months) | Female (months) | Consolidated (months) |
|---|---|---|---|
| 4–5 | 22.54 | 4.66 | 13.60 |
| 5–6 | 12.26 | 10.77 | 11.99 |
| 6–7 | 10.29 | 7.34 | 8.52 |
| 7–8 | 7.25 | 6.19 | 6.62 |
| 8–9 | 9.25 | 6.32 | 7.67 |
| 9–10 | 7.44 | n/a | 7.44 |
| 10–11 | 7.89 | 8.04 | 7.97 |
| 11–12 | 7.40 | 6.70 | 7.05 |
| 12–13 | 10.30 | 8.13 | 8.69 |
| 13–14 | 8.28 | 6.88 | 7.68 |
| 14–15 | 7.97 | 6.95 | 7.64 |
| 15–16 | 5.71 | 7.68 | 6.80 |
| 16–17 | 5.58 | 8.85 | 8.17 |
| 17–18 | 5.48 | 5.98 | 5.74 |
App system
The system was developed to support user needs in managing patient records and X-ray examinations, while also enabling clinical use of the trained bone age estimation model. When executed locally, the average processing time of the model is approximately 0.7 seconds. With the model integrated into the system and deep learning services deployed on a cloud server, the end-to-end processing time for estimating bone age from an image, including pre-processing, averages around 4.34 seconds. It is worth noting that processing time may vary by approximately one second depending on image quality, with higher-quality images (e.g. those with less noise, greater sharpness, and optimal brightness) requiring less time to process.
Discussion
This study presented the development of a comprehensive AI-assisted system prototype for BAA. The system integrates a custom dataset, a deep learning experimental pipeline, and an accessible mobile-based prototype tailored specifically for BAA. The backbone of the system is a YOLOv8-based region detection model combined with various CNN architectures to estimate bone age from hand and wrist X-rays.
YOLOv8 demonstrated strong performance in detecting and classifying ROIs. By delimiting the “hand + wrist” region, the system effectively removes irrelevant areas such as written text and noise, ensuring that only the key anatomical regions are considered. This process not only enhances the consistency of the analysis but also standardizes the portion of the image corresponding to the hand and wrist within the model’s fixed-size input, impacting the quality of the image used in the BAA models.
The ablation study conducted on the proposed BAA models revealed that incorporating sex information positively impacted the model’s performance, confirming the value of meta-information fusion. The addition of the Feature Pyramid Network and the Convolutional Block Attention Module improved performance only slightly, but even modest gains are valuable given the clinical significance of bone age assessment. Interestingly, the parallel ROIs branch did not reduce the MAD as initially expected. This outcome may be due to the limited role of the ROI branch in extracting relevant image features, possibly stemming from insufficient activation or relevance of this branch in influencing the final prediction.
Among the models evaluated, the best-performing BAA model was a VGG16-based architecture enhanced with meta-information fusion, FPN, and CBAM, achieving an overall MAD of 8.84 months for ages between 4 and 18 years. Due to data imbalance and the small number of samples in the younger age groups (4-6 years; see Fig. 2), the overall MAD is inflated by 1.62 months. Given the high level of uncertainty in this age range, our method is considered more appropriate for the 6- to 18-year range, where it achieves a Mean Absolute Difference of 7.2 months.
To contextualize these results, we compared our method against a range of existing methodologies in the literature, as shown in Table 3. This comparison highlights studies using both the RSNA dataset and those based on authorial datasets, which often focus on specific ethnic populations and smaller sample sizes. Although cross-study comparisons are inherently difficult due to differing training and testing conditions, our method consistently demonstrates performance that is either comparable to or superior to studies using private datasets. For instance, Mutasa et al.42 report slightly better performance using a private dataset, likely due to the smaller size of our dataset. Nevertheless, the fact that we achieved comparable results despite a smaller dataset and imbalances in age and sex distribution underscores the efficiency, robustness, and competitive nature of our approach.
In future work, while the model’s current performance on the existing dataset is promising, enhancing its robustness and applicability will require additional data collection. A key strategy is to target a more balanced acquisition of samples across different age ranges to address existing age- and sex-related imbalances, particularly to extend the model’s capabilities to children under 6 years old. However, acquiring data from younger patients poses challenges, as it depends heavily on parents bringing their children in for routine checkups, which typically occur only when medical issues are suspected. This may lead to the exclusion of certain patients based on the study’s selection criteria. To mitigate these challenges, targeted data augmentation techniques can be employed to balance the dataset and improve model performance across all age groups.
The AI-based prototype has demonstrated significant potential for clinical applications, showcasing the feasibility of an end-to-end pipeline that performs bone age assessments in a matter of seconds. This proof of concept paves the way for future clinical trials, where the system can be validated against expert assessments to evaluate concordance and refine its clinical predictions. As more data becomes available, newer architectures such as EfficientNet and Vision Transformers (ViT), which are better suited for larger datasets, could be explored. These models, tested alongside the current approach as a baseline, may further enhance system performance and generalization. Another avenue for future work includes the integration of explainable AI techniques, such as Grad-CAM, to improve model transparency. This could provide valuable insights into the most relevant areas influencing the final prediction and offer visual explanations for end users, thereby enhancing the method’s reliability and interpretability.
Ultimately, the AI-based application was designed to augment, not replace, human expertise in clinical workflows. By providing clinicians with a reliable, efficient decision-support tool, the system helps standardize bone age assessments, leading to more consistent, accurate results and reducing potential errors. As the system evolves with real-world clinical feedback and new data, it promises to become an increasingly valuable asset in healthcare, improving the speed, consistency, and accuracy of pediatric bone age evaluations.
Author contributions
R.A.T prepared the data, conceived the deep learning experimentation, and analyzed the results. M.P. contributed to the novel deep learning design. F.V.L. conducted the mobile application prototype. C.K. and C.A.L. acquired and labeled the data. L.A.S. managed and oriented the project. All authors reviewed the manuscript.
Data availability
The datasets generated and/or analyzed during the current study are available in the BAA_data GitHub repository at https://github.com/TeodoroRafael/BAA_data.
Declarations
Competing interests
The authors declare no competing interests.
Footnotes
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Filipe Verrone de Lima, Mirela Popa, Gustavo Scalabrini Sampaio, Cristiane Kochi, Carlos Alberto Longui and Leandro A Silva contributed equally to this work.
References
- 1. Creo, A. L. & Schwenk, W. F. Bone age: A handy tool for pediatric providers. Pediatrics 140, 6 (2017).
- 2. Zerin, J. & Hernandez, R. Approach to skeletal maturation. Hand Clin. 7, 53–62 (1991).
- 3. Marshall, W. Interrelationships of skeletal maturation, sexual development and somatic growth in man. Ann. Hum. Biol. 11, 29–40 (1974).
- 4. Poznanski, A., Hernandez, R., Guire, K., Bereza, U. & Garn, S. Carpal length in children: A useful measurement in the diagnosis of rheumatoid arthritis and some congenital malformation syndromes. Radiology 129, 661–668 (1978).
- 5. Greulich, W. W. & Pyle, S. I. Radiographic Atlas of Skeletal Development of the Hand and Wrist (Stanford University Press, 1959).
- 6. Tanner, J., Healy, M., Cameron, N. & Goldstein, H. Assessment of Skeletal Maturity and Prediction of Adult Height (TW3 Method) (Saunders, 2001).
- 7. Zerin, J. M. & Hernandez, R. J. Approach to skeletal maturation. Hand Clin. 7, 53–62 (1991).
- 8. Moseley, C. F. Assessment and prediction in leg-length discrepancy. Instr. Course Lect. 38, 325–330 (1989).
- 9. Bull, R. K., Edwards, P. D., Kemp, P. M., Fry, S. & Hughes, I. A. Bone age assessment: A large scale comparison of the Greulich and Pyle, and Tanner and Whitehouse (TW2) methods. Arch. Dis. Child. 81, 172–173 (1999).
- 10. Wang, C. et al. Attention-based multiple-instance learning for pediatric bone age assessment with efficient and interpretable. Biomed. Signal Process. Control 79, 104028 (2023).
- 11. Zhang, Z. & Sejdić, E. Radiological images and machine learning: Trends, perspectives, and prospects. Comput. Biol. Med. 108, 354–370 (2019).
- 12. Michael, D. & Nelson, A. HANDX: A model-based system for automatic segmentation of bones from digital hand radiographs. IEEE Trans. Med. Imaging 8, 64–69 (1989).
- 13. Thodberg, H., Kreiborg, S., Juul, A. & Pedersen, K. The BoneXpert method for automated determination of skeletal maturity. IEEE Trans. Med. Imaging 28, 52–66 (2009).
- 14. Patel, V. L. et al. The coming of age of artificial intelligence in medicine. Artif. Intell. Med. 46, 5–17 (2009).
- 15. Miller, D. D. & Brown, E. W. Artificial intelligence in medical practice: The question to the answer? Am. J. Med. 131, 129–133 (2018).
- 16. Castro, F. C., Flores, E. L., Carrijo, G. A., Veiga, A. C. P. & Carneiro, M. P. B. Localização, segmentação e classificação automática de regiões de interesse para a avaliação da maturidade esquelética utilizando o método de Tanner-Whitehouse. Res. Biomed. Eng. 28, 53–67 (2012).
- 17. Giordano, D., Kavasidis, I. & Spampinato, C. Modeling skeletal bone development with hidden Markov models. Comput. Methods Progr. Biomed. 124, 138–147 (2016).
- 18. Spampinato, C., Palazzo, S., Giordano, D., Aldinucci, M. & Leonardi, R. Deep learning for automated skeletal bone age assessment in X-ray images. Med. Image Anal. 36, 41–51 (2017).
- 19. Choy, G. et al. Current applications and future impact of machine learning in radiology. Radiology 288, 318–328 (2018).
- 20. Zhou, J. et al. Using convolutional neural networks and transfer learning for bone age classification. In 2017 International Conference on Digital Image Computing: Techniques and Applications (DICTA) 1–6 (IEEE, 2017).
- 21. Wibisono, A. et al. Deep learning and classic machine learning approach for automatic bone age assessment. In 2019 4th Asia-Pacific Conference on Intelligent Robot Systems 235–240 (2019).
- 22. Simonyan, K. & Zisserman, A. Very deep convolutional networks for large-scale image recognition. Preprint at arXiv:1409.1556 (2015).
- 23. Han, Y. & Wang, G. Skeletal bone age prediction based on a deep residual network with spatial transformer. Comput. Methods Progr. Biomed. 197, 105754 (2020).
- 24. Koonce, B. ResNet 50. In Convolutional Neural Networks with Swift for TensorFlow (Apress, 2021).
- 25. Bui, T. D., Lee, J.-J. & Shin, J. Incorporated region detection and classification using deep convolutional networks for bone age assessment. Artif. Intell. Med. 97, 1–8 (2019).
- 26. Koitka, S. et al. Mimicking the radiologists' workflow: Estimating pediatric hand bone age with stacked deep neural networks. Med. Image Anal. 64, 101743 (2020).
- 27. Iglovikov, V., Rakhlin, A., Kalinin, A. & Shvets, A. Paediatric bone age assessment using deep convolutional neural networks. In Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support 300–308 (2018).
- 28. Halabi, S. S. et al. The RSNA pediatric bone age machine learning challenge. Radiology 290, 498–503 (2019).
- 29. Serpa, A. S. et al. Validation of a deep learning algorithm for bone age estimation among patients in the city of São Paulo, Brazil. Radiol. Bras. 56, 263–268. 10.1590/0100-3984.2023.0056-en (2023).
- 30. Fenton, T. A systematic review and meta-analysis to revise the Fenton growth chart for preterm infants. BMC Pediatr. 13, 59. 10.1186/1471-2431-13-59 (2013).
- 31. Fushiki, T. Estimation of prediction error by using k-fold cross-validation. Stat. Comput. 21, 137–146. 10.1007/s11222-009-9153-8 (2011).
- 32. Redmon, J., Divvala, S., Girshick, R. & Farhadi, A. You only look once: Unified, real-time object detection. In Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 779–788 (2016).
- 33. Hussain, M. YOLOv1 to v8: Unveiling each variant – a comprehensive review of YOLO. IEEE Access 12, 42816–42833. 10.1109/ACCESS.2024.3378568 (2024).
- 34. Feng, S., Huang, Y. & Zhang, N. An improved YOLOv8 OBB model for ship detection through stable diffusion data augmentation. Sensors 24, 5850. 10.3390/s24175850 (2024).
- 35. Abadi, M. et al. TensorFlow: Large-scale machine learning on heterogeneous systems (2015). Software available from tensorflow.org.
- 36. Chollet, F. Keras. https://github.com/fchollet/keras (2015).
- 37. Simonyan, K. & Zisserman, A. Very deep convolutional networks for large-scale image recognition. Preprint at 10.48550/arXiv.1409.1556 (2014).
- 38. Lin, T.-Y. et al. Feature pyramid networks for object detection. In Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2117–2125 (2017).
- 39. Kim, S.-W., Kook, H.-K., Sun, J.-Y., Kang, M.-C. & Ko, S.-J. Parallel feature pyramid network for object detection. In Proc. of the European Conference on Computer Vision (ECCV) 234–250 (2018).
- 40. Kirillov, A., Girshick, R., He, K. & Dollár, P. Panoptic feature pyramid networks. In Proc. of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 6399–6408 (2019).
- 41. Woo, S., Park, J., Lee, J.-Y. & Kweon, I. S. CBAM: Convolutional block attention module. In Proc. of the European Conference on Computer Vision (ECCV) 3–19 (2018).
- 42. Mutasa, S., Chang, P., Ruzal-Shapiro, C. & Ayyala, R. MABAL: A novel deep-learning architecture for machine-assisted bone age labeling. J. Digit. Imaging 31, 513–519 (2018).
- 43. Gao, Y., Zhu, T. & Xu, X. Bone age assessment based on deep convolution neural network incorporated with segmentation. Int. J. Comput. Assist. Radiol. Surg. 15, 1951–1962 (2020).
- 44. Zulkifley, M. et al. Intelligent bone age assessment: An automated system to detect a bone growth problem using convolutional neural networks with attention mechanism. Diagnostics 11, 765 (2021).
- 45. Wu, E. et al. Residual attention based network for hand bone age assessment. In IEEE 16th International Symposium on Biomedical Imaging (ISBI) 1158–1161 (2019).
- 46. Larson, D. et al. Performance of a deep-learning neural network model in assessing skeletal maturity on pediatric hand radiographs. Radiology 287, 313–322 (2018).
- 47. Van Steenkiste, T. et al. Automated assessment of bone age using deep learning and Gaussian process regression. In 40th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC) 674–677 (2018).
- 48. Li, S. et al. A deep learning-based computer-aided diagnosis method of X-ray images for bone age assessment. Complex Intell. Syst. 8, 1929–1939 (2021).
- 49. Ren, X. et al. Regression convolutional neural network for automated pediatric bone age assessment from hand radiograph. IEEE J. Biomed. Health Inform. 23, 2030–2038 (2019).
- 50. Chen, C. et al. Attention-guided discriminative region localization and label distribution learning for bone age assessment. IEEE J. Biomed. Health Inform. 26, 1208–1218 (2022).
- 51. Su, L., Fu, X. & Hu, Q. Generative adversarial network based data augmentation and gender-last training strategy with application to bone age assessment. Comput. Methods Progr. Biomed. 212, 106456 (2021).
- 52. Liu, C., Xie, H. & Zhang, Y. Self-supervised attention mechanism for pediatric bone age assessment with efficient weak annotation. IEEE Trans. Med. Imaging 40, 2685–2697 (2021).
- 53. Kim, P. et al. Bone age assessment using artificial intelligence in Korean pediatric population: A comparison of deep-learning models trained with healthy chronological and Greulich-Pyle ages as labels. Korean J. Radiol. 24, 1151–1163. 10.3348/kjr.2023.0092 (2023).