Development of a Deep-Learning Model for Estimating Newborn Gestational Age via Lumbar Vertebral Segmentation on Plain Radiography

Sungwon Ham; Gayoung Choi; Bo-Kyung Je; Saelin Oh

doi:10.3348/kjr.2025.0172

2025 Aug 18;26(9):867–876. doi: 10.3348/kjr.2025.0172

PMCID: PMC12394821 PMID: 40873377

Abstract

Objective

To develop a deep learning model for estimating newborn gestational age (GA) based on the shape of the lumbar vertebral bodies on cross-table lateral radiographs obtained on the first day after birth.

Materials and Methods

This retrospective study included 423 cross-table lateral radiographs of 423 newborns (242 boys and 181 girls) taken within 24 hours after birth at two hospitals. Of these, 256 radiographs (157 boys and 99 girls) obtained from one institution were used for model development, and 167 radiographs (85 boys and 82 girls) from the other institution were used for external testing. Clinical data, including medical history of underlying disorders, GA determined by ultrasound parameters, birth date, birth weight, sex, examination date, and reason for requesting radiographs, were obtained. The radiographs underwent manual labeling of the five lumbar vertebral bodies, followed by preprocessing steps such as normalization, resizing, denoising, cropping, and augmentation via horizontal flipping and rotation. Subsequently, we trained a deep learning model using a DeepLabv3+ network with a ResNet50 backbone for lumbar segmentation and used a customized AgeClassifier model with two parallel ResNet18 backbones for GA estimation. Model performance was evaluated using an external test dataset after image cropping.

Results

Neither GA nor birth weight differed significantly between boys and girls. In the segmentation model, the mean dice similarity coefficient ± standard deviation (SD) was 0.801 ± 0.031. For GA estimation, the mean absolute error ± SD was 5.2 ± 0.5 days. The Bland-Altman bias (AI-estimated GA - ground truth GA) and 95% limits of agreement were -0.4 days and -13.0 to 12.3 days, respectively.

Conclusion

Our deep learning model showed promising performance in lumbar vertebral body segmentation and GA estimation using plain radiographs, suggesting its potential utility as a supportive tool for neonatal maturity assessment in clinical practice.

Keywords: Infant, newborn; Gestational age; Radiography, abdominal; Age determination by skeleton; Deep learning

INTRODUCTION

Gestational age (GA) is the duration of pregnancy from the beginning of the mother’s last menstrual period to the time of delivery, and it is calculated in weeks [1]. Estimating GA is crucial to assess the maturity of a baby, and thus provide prompt care for each newborn [2]. Appropriate postnatal treatment can reduce medical risks and prevent potential complications in neonates [3].

In a fetus or newborn, GA is usually estimated using clinical data from the last menstrual period, fetal ultrasound (US), or postnatal physical examination [1,4]. Calculation using the last menstrual period is the most suitable definition of GA. However, its validity relies on maternal recall, which is often variable or unreliable, particularly in populations with high rates of maternal illiteracy or low socioeconomic status [4,5,6]. Prenatal US-based dating is achieved by measuring fetal skeletal and organ maturation and is the most accurate method in early pregnancy; however, it can only be performed in an environment equipped with a US machine [5,7]. Postnatal physical examination includes both physical and neurological assessments, and as many as 18 different clinical methods are used to assess GA in newborns, with Dubowitz and Ballard scores being the most validated [5,8,9,10]. Their reliability depends on the training and experience of the physicians performing the assessments [2,4]. Currently, no postnatal radiological method exists for estimating GA in newborns; therefore, we designed a retrospective study introducing the concept of bone age to estimate postnatal GA. Bone age is an assessment of skeletal maturity based on sequential maturation of the ossification center shown on radiographs of the hands, wrists, elbows, knees and pelvis [11].

A child’s bone age provides useful information in various clinical settings such as endocrine disorders, skeletal development, and growth [12]. We focused on the spine when considering bone age in neonates. The spine is an axial skeleton that begins to mineralize in the 8th week of gestation and can be measured from the 10th week of gestation using US [13,14]. Spine length shows a strong positive correlation with GA and is considered a reliable indicator of fetal growth along with other US-based parameters [14]. During daily practice, we noticed that the lumbar vertebral bodies were well demonstrated in the cross-table lateral view, with almost no obscured part, unlike in the anteroposterior view. The cross-table lateral view is an additional projection of abdominal radiographs obtained using a horizontal X-ray beam in the supine position. It is the most sensitive method for detecting a small pneumoperitoneum in neonates and aids in diagnosing pneumomediastinum, pneumothorax, and bowel obstruction [15,16,17,18]. In this study, we aimed to develop a deep learning-based model for GA estimation in newborns based on the shape of their lumbar vertebral bodies, segmented from cross-table lateral views taken on the first day after birth.

MATERIALS AND METHODS

This retrospective study was approved by the Institutional Review Board of Korea University Ansan Hospital (IRB No. 2019AS0148), and the requirement of informed consent was waived owing to its retrospective design. The overall workflow for data collection, preprocessing, and deep learning model training is illustrated in Figure 1.

Data Collection

Three pediatric radiologists collected cross-table lateral radiographs of newborns obtained within 24 hours after birth between November 2007 and March 2023 at two tertiary institutions. Maternal and neonatal medical records were reviewed to collect clinical data, including history of underlying disorders, GA determined by US parameters, birth date, birth weight, sex, and date of cross-table lateral radiographs with the reason they were requested. Newborns with metabolic disorders, endocrine disorders, chromosomal abnormalities, and congenital anomalies as well as stillbirths were excluded. Two pediatric radiologists excluded images with unacceptable quality, blurred vertebral body margins, vertebral bodies obscured by medical devices (e.g., tubes or lines), or improper lateral positioning. A total of 423 cross-table lateral radiographs of 423 newborns (242 boys and 181 girls) were collected: 256 radiographs (157 boys and 99 girls) from Korea University Ansan Hospital, the primary study site, were used for model development, while 167 radiographs (85 boys and 82 girls) from Korea University Anam Hospital, a branch hospital within the same Korea University Hospital network, were used for external testing. The dataset from the primary site was randomly divided into 200 radiographs for training, 30 for tuning, and 26 for internal testing.
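The random 200/30/26 partition of the primary-site radiographs can be illustrated with a minimal sketch. The function name `split_indices` and the fixed seed are assumptions made for illustration; the paper does not describe how the randomization was implemented.

```python
import random

def split_indices(n_total, n_train, n_tune, n_test, seed=0):
    """Randomly partition range(n_total) into three disjoint index lists."""
    if n_train + n_tune + n_test != n_total:
        raise ValueError("subset sizes must sum to n_total")
    indices = list(range(n_total))
    # A seeded generator makes the split reproducible (the seed is an assumption).
    random.Random(seed).shuffle(indices)
    train = indices[:n_train]
    tune = indices[n_train:n_train + n_tune]
    test = indices[n_train + n_tune:]
    return train, tune, test

# Sizes taken from the text: 256 radiographs split into 200 / 30 / 26.
train_idx, tune_idx, test_idx = split_indices(256, 200, 30, 26)
```

Splitting by index keeps each radiograph in exactly one subset, which matters here because each newborn contributed a single image.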

Manual Labeling of Lumbar Vertebral Bodies

A single pediatric radiologist manually labeled the lumbar vertebral bodies to ensure consistent and accurate results. After identifying the five lumbar vertebral bodies on each cross-table lateral radiograph, the radiologist meticulously labeled their margins using the Labelme software (https://github.com/labelmeai/labelme). This manual labeling produced the datasets for ground truth segmentation.

Data Preprocessing and Deep Learning Model Training

A computer scientist designed the data preprocessing and deep learning model training processes (Fig. 1). Several preprocessing steps were used to optimize the datasets for deep learning model training [19]. Initially, Z-score normalization was applied to standardize the pixel intensity values across all images, followed by resizing to ensure uniform input dimensions and noise reduction to improve clarity, thereby making the training process more efficient. Next, a bounding box was automatically generated around the manually annotated ground truth masks of the lumbar vertebral bodies. Subsequently, the bounding box was expanded by five pixels through morphological dilation to include the surrounding anatomical context, and the resulting region was automatically cropped and resized to 256 × 256 pixels, enabling the model to focus on relevant anatomical structures. Clipping was then performed to remove intensity outliers and improve the quality of the images supplied to the neural network. Additional preprocessing steps included image flipping and rotation to augment the dataset and enhance the generalizability of the model across different orientations of the lumbar vertebral bodies. Finally, the images were converted from DICOM to JPG format to simplify processing within the deep learning framework.
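The normalization, mask-based cropping, resizing, and clipping steps can be sketched in dependency-free Python as follows. This is an illustration under stated assumptions, not the authors' code: images are nested lists, the five-pixel expansion is done by simple padding of the box (equivalent to dilating a rectangle with a square element), the resize uses nearest-neighbour sampling, and the clipping range of ±3 standard deviations is a made-up value. All function names are hypothetical.

```python
from statistics import mean, pstdev

def zscore(image):
    """Standardize pixel intensities to zero mean and unit variance."""
    pixels = [p for row in image for p in row]
    mu, sigma = mean(pixels), pstdev(pixels)
    sigma = sigma or 1.0  # guard against a constant image
    return [[(p - mu) / sigma for p in row] for row in image]

def mask_bounding_box(mask, margin=5):
    """Box (top, bottom, left, right; end-exclusive) around nonzero mask
    pixels, grown by `margin` pixels and clamped to the image borders."""
    height, width = len(mask), len(mask[0])
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for c in range(width) if any(row[c] for row in mask)]
    if not rows:
        raise ValueError("mask is empty")
    top = max(rows[0] - margin, 0)
    bottom = min(rows[-1] + 1 + margin, height)
    left = max(cols[0] - margin, 0)
    right = min(cols[-1] + 1 + margin, width)
    return top, bottom, left, right

def crop(image, box):
    top, bottom, left, right = box
    return [row[left:right] for row in image[top:bottom]]

def resize_nearest(image, size=256):
    """Nearest-neighbour resize to size x size (interpolation is an assumption)."""
    h, w = len(image), len(image[0])
    return [[image[r * h // size][c * w // size] for c in range(size)]
            for r in range(size)]

def clip(image, low=-3.0, high=3.0):
    """Limit intensities to [low, high] (thresholds are assumptions)."""
    return [[min(max(p, low), high) for p in row] for row in image]

# Toy 40 x 60 "radiograph" with a 10 x 20 labelled region.
image = [[float(r + c) for c in range(60)] for r in range(40)]
mask = [[1 if 15 <= r < 25 and 20 <= c < 40 else 0 for c in range(60)]
        for r in range(40)]

box = mask_bounding_box(mask, margin=5)
patch = clip(resize_nearest(crop(zscore(image), box)))
```

Deriving the crop from the mask rather than from fixed coordinates keeps the vertebral bodies centred regardless of where they fall in the original radiograph.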

The deep learning models consisted of segmentation and GA estimation tasks that were automatically executed in sequence. First, the segmentation model received input radiographs and generated vertebral region masks. These predicted segmentation masks were fed into the GA estimation model as inputs, along with the original radiographs. This end-to-end framework was designed to incorporate raw image features and anatomical localization information extracted from the segmentation model as inputs to the GA estimation model.

The segmentation task for the five lumbar vertebral bodies used for GA estimation utilized a DeepLabv3+ network with a ResNet50 backbone designed for semantic segmentation [20]. DeepLabv3+ was selected for its proven effectiveness in medical image segmentation, particularly in capturing fine structural details through multiscale feature extraction. The model architecture included a ResNet50 backbone for feature extraction, an Atrous Spatial Pyramid Pooling (ASPP) module to capture multiscale contextual information, and a decoder to integrate high- and low-level features for detailed segmentation [21,22]. Although the DeepLabv3+ model was selected before conducting comparative experiments in this study, its performance was subsequently validated against three alternative architectures, thereby confirming its suitability for the task. The segmentation model was trained using the AdamW optimizer with a learning rate of 1e-3, balanced cross-entropy loss, a batch size of 16, and a decay factor of 0.1, over a total of 500 epochs [23]. The segmentation model automatically generated vertebral region masks from input radiographs. These predicted masks were then used as inputs for the GA estimation model, along with the original radiographs.
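The text names a balanced cross-entropy loss without giving its formula. As an illustration only, the sketch below shows one common class-balanced binary cross-entropy, in which the weight of each class is the pixel fraction of the other class; this is an assumed formulation and may differ from the loss actually used in the study. The function name `balanced_bce` is hypothetical.

```python
import math

def balanced_bce(probs, targets, eps=1e-7):
    """Class-balanced binary cross-entropy over flat lists of pixels.

    beta is the fraction of background pixels, so the rarer foreground
    class (vertebral body) receives the larger weight.
    """
    n = len(targets)
    beta = sum(1 for t in targets if t == 0) / n
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        if t == 1:
            total -= beta * math.log(p)
        else:
            total -= (1.0 - beta) * math.log(1.0 - p)
    return total / n

# One foreground pixel among four: beta = 0.75.
targets = [1, 0, 0, 0]
good = balanced_bce([0.9, 0.1, 0.1, 0.1], targets)  # confident and correct
bad = balanced_bce([0.1, 0.9, 0.9, 0.9], targets)   # confident and wrong
```

With this weighting, a missed foreground pixel costs as much as all background pixels combined, which counteracts the tendency to predict background everywhere when vertebral bodies occupy a small part of the image.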

The ablation study was designed to evaluate the impact of various segmentation models and backbone architectures on model performance. For segmentation, 2D-UNet was selected for its capacity to maintain spatial continuity, which is critical to accurately capture the anatomical structure of the lumbar vertebrae across contiguous slices [24]. UNet++ was chosen because of its refined skip connection structure, which improves multiscale feature aggregation and boundary delineation and is particularly beneficial in segmenting the fine contours of vertebral bodies [25]. DenseNet was used to assess the effects of feature reuse and enhanced gradient flow, which support efficient learning on relatively small-scale medical datasets [26].

For the GA estimation task, we developed a custom AgeClassifier model that regresses GA based on the original radiographs and the corresponding segmentation masks. The model architecture consisted of two parallel ResNet18 backbones: one processed raw images, and the other processed the segmentation masks. Each branch included convolutional layers, batch normalization, and ReLU activation, enabling the model to learn complementary visual and anatomical features. The network was trained using the AdamW optimizer with a learning rate of 1e-3, batch size of 16, and mean squared error as the loss function, for 1000 epochs. The code is publicly available on GitHub: https://github.com/KUAH-rad/GA_Estimation_LumbarSeg.

Evaluation of Segmentation Performance

During training, the model learned to predict lumbar vertebral boundaries using manually labeled ground truth segmentation masks. During inference, the predicted segmentation masks were generated automatically without ground truth annotations. The performance of the segmentation model for the five lumbar vertebral bodies was assessed using the dice similarity coefficient (DSC) and Jaccard similarity coefficient (JSC), which are statistical metrics that quantify similarity by analyzing the overlap between predicted and ground truth segmentations [27].
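For two binary masks A (predicted) and B (ground truth), the standard definitions are DSC = 2|A ∩ B| / (|A| + |B|) and JSC = |A ∩ B| / |A ∪ B|. The following minimal sketch computes both on toy masks; the function name `overlap_metrics` and the convention for two empty masks are assumptions, and the example masks are made up.

```python
def overlap_metrics(pred, truth):
    """Return (DSC, JSC) for two binary masks given as nested lists of 0/1."""
    p = [bool(v) for row in pred for v in row]
    t = [bool(v) for row in truth for v in row]
    if len(p) != len(t):
        raise ValueError("masks must have the same shape")
    intersection = sum(a and b for a, b in zip(p, t))
    p_sum, t_sum = sum(p), sum(t)
    union = p_sum + t_sum - intersection
    if union == 0:
        # Both masks empty: treated as perfect agreement (a convention).
        return 1.0, 1.0
    return 2 * intersection / (p_sum + t_sum), intersection / union

truth = [[0, 1, 1, 0],
         [0, 1, 1, 0]]
pred = [[0, 1, 1, 1],
        [0, 1, 0, 0]]
dsc, jsc = overlap_metrics(pred, truth)  # 3 shared pixels, 4 per mask
```

In this toy case the masks share 3 of their 4 pixels each, giving a DSC of 0.75 and a JSC of 0.6.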

Evaluation of GA Estimation Performance

The performance of the GA estimation model was evaluated using the mean absolute error (MAE), which is the average of the absolute differences between predicted and actual values, providing a measure of the average error [28]. In addition, we used Bland-Altman analysis to evaluate the agreement between ground truth and AI-estimated GAs.
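Both measures can be computed in a few lines. The sketch below assumes the conventional Bland-Altman limits of agreement (bias ± 1.96 × SD of the differences, with the sample SD) and takes the difference as AI-estimated GA minus ground truth GA, as in this study. The GA values in days are made up for illustration and are not study data.

```python
from statistics import mean, stdev

def mae(pred, truth):
    """Mean absolute error between paired predictions and reference values."""
    return mean([abs(p - t) for p, t in zip(pred, truth)])

def bland_altman(pred, truth):
    """Return (bias, lower limit, upper limit) of agreement."""
    diffs = [p - t for p, t in zip(pred, truth)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample SD (n - 1 denominator)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd

# Hypothetical GAs in days.
truth = [200, 230, 250, 270, 280]
pred = [205, 226, 251, 262, 286]

error = mae(pred, truth)
bias, lower, upper = bland_altman(pred, truth)
```

The MAE summarizes the typical size of the error, whereas the bias and limits of agreement show whether the model systematically over- or underestimates GA and how widely individual errors scatter.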

Statistical Analysis

In addition to the aforementioned statistical methods used in this study, independent t-tests were used to compare GA and birth weight between sexes across the combined data of two institutions. The performances of the deep learning model with and without image cropping were compared using paired t-tests. Statistical analysis was conducted with IBM SPSS Statistics (version 25, IBM Corp., Armonk, NY, USA), MedCalc® Statistical Software (version 20.215, Ostend, Belgium), and Python (version 3.11, Python Software Foundation, Wilmington, DE, USA), with significance set at P < 0.05.

RESULTS

Study Participants

The GA distribution of the 423 newborns is illustrated as a histogram (Fig. 2). Seventy-three newborns (17.3%) were delivered in the first and second trimesters of pregnancy (GA <28 weeks, i.e., ≤195 days), and 350 (82.7%) were delivered in the third trimester. The most premature newborn had a GA of 155 days (22⁺¹ weeks), whereas the most mature newborn had a GA of 292 days (41⁺⁵ weeks). Neither GA nor birth weight showed a statistically significant difference between male and female newborns, either within individual institutions or across the entire cohort (Table 1). A review of medical records indicated that cross-table lateral radiographs were primarily requested to evaluate abdominal distention or barotrauma, including pneumothorax, pneumomediastinum, and pneumoperitoneum.
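The GA values above are given in days alongside the weeks⁺days notation, where 22⁺¹ weeks means 22 completed weeks plus 1 day. The conversion is integer division by 7, sketched below with hypothetical helper names.

```python
def ga_weeks_days(ga_days):
    """Split a gestational age in days into completed weeks and extra days."""
    return divmod(ga_days, 7)

def format_ga(ga_days):
    weeks, days = ga_weeks_days(ga_days)
    return f"{weeks}+{days} weeks"

youngest = format_ga(155)  # most premature newborn in the cohort
oldest = format_ga(292)    # most mature newborn in the cohort
third_trimester_start = 28 * 7  # first day of week 28, in days
```

Because 28 weeks corresponds to 196 days, a GA below 28 weeks means 195 days or fewer.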

Table 1. Gestational age and birth weight of 423 participants and their comparison between sexes.

Institutions	Sex	No.	GA, day	Birth weight, g
Internal (n = 256)	Male	157	250.5 ± 35.7	2677.5 ± 994.5
	Female	99	242.8 ± 39.4	2410.0 ± 1043.4
	P-value		0.052	0.371
External (n = 167)	Male	85	230.2 ± 38.4	2026.9 ± 1137.1
	Female	82	236.2 ± 37.5	2024.6 ± 1067.9
	P-value		0.646	0.152
Total (n = 423)	Male	242	243.3 ± 37.9	2449.0 ± 1089.9
	Female	181	239.8 ± 38.6	2235.4 ± 1069.1
	P-value		0.422	0.899

Data are mean ± standard deviation.

GA = gestational age

Segmentation Performance

The performance metrics of the segmentation model (DeepLabv3+) for the lumbar vertebral bodies on both the internal and external datasets are listed in Table 2. In the dataset for the external institution, the mean DSC ± standard deviation (SD) was 0.801 ± 0.031 when image cropping was applied, which showed a statistically significant improvement (P = 0.046) compared to the results without image cropping.

Table 2. Model’s performance in segmentation of lumbar vertebral bodies and GA estimation.

Institutions	Input data	Segmentation, DSC	GA estimation, MAE (day)
Internal (n = 256)	w/o crop	0.804 ± 0.045	6.2 ± 0.6
	w/ crop	0.849 ± 0.022	4.9 ± 0.5
	P-value	0.031	0.058
External (n = 167)	w/o crop	0.781 ± 0.054	7.2 ± 0.7
	w/ crop	0.801 ± 0.031	5.2 ± 0.5
	P-value	0.046	0.068

Data are mean ± standard deviation.

GA = gestational age, DSC = dice similarity coefficient, MAE = mean absolute error, w/o = without, w/ = with

The results of the ablation study using DenseNet, 2D-UNet, and UNet++ are presented in Table 3 and Figure 3. DeepLabv3+ showed a higher mean DSC and mean JSC on both the internal and external datasets than DenseNet, 2D-UNet, and UNet++.

Table 3. Ablation study of model’s performance in various notable network architectures for segmentation of lumbar vertebral bodies with image cropping.

Network architectures	Institutions	Segmentation, DSC	Segmentation, JSC
DeepLabv3+	Internal	0.849 ± 0.022	0.837 ± 0.057
DeepLabv3+	External	0.801 ± 0.031	0.778 ± 0.037
DenseNet	Internal	0.802 ± 0.079	0.781 ± 0.085
DenseNet	External	0.776 ± 0.089	0.759 ± 0.086
2D-UNet	Internal	0.820 ± 0.067	0.796 ± 0.077
2D-UNet	External	0.788 ± 0.085	0.762 ± 0.084
UNet++	Internal	0.819 ± 0.063	0.798 ± 0.069
UNet++	External	0.786 ± 0.084	0.760 ± 0.088

Data are mean ± standard deviation.

DSC = dice similarity coefficient, JSC = Jaccard similarity coefficient

GA Estimation Performance

The performance metrics of the GA estimation model for the internal and external datasets are presented in Table 2. In both the internal and external datasets, the MAE of GA estimation decreased with image cropping compared to that without cropping, although the differences did not reach statistical significance (P = 0.058 and P = 0.068, respectively). In the dataset of the external institution, the MAE ± SD was 5.2 ± 0.5 days when image cropping was applied. Bland-Altman results on ground truth and AI-estimated GAs for both internal and external institutional test datasets are shown in Figure 4. In the external dataset, the mean difference (AI-estimated GA - ground truth GA) moved closer to zero and the 95% limits of agreement narrowed when image cropping was applied: 0.8 days and -16.3 to 18.0 days, respectively, without cropping and -0.4 days and -13.0 to 12.3 days, respectively, with cropping.

DISCUSSION

Physicians use several methods to estimate GA both prenatally and postnatally to ensure appropriate care of fetuses and newborns. Postnatal GA estimation, represented by the Dubowitz and Ballard scores, is based on assessing maturation by a number of external physical characteristics and is widely used because of its affordability and ease of application [5,8,9,10,29,30,31]. However, its accuracy is limited by subjectivity, as it depends on the examiner’s experience. Moreover, its predictive value decreases when applied to preterm, post-term, or critically ill infants because it relies on physical size and neurological muscle tone [32,33].

Prenatal radiological methods for GA estimation appear to be more objective. Among these, prenatal US is the most accurate technique for determining GA [5,7]. Kjar’s postmortem study of fetal hand and foot radiographs demonstrated a consistent correlation between ossification of the hand and foot bones and the crown-rump length of the fetus [34]. Other studies have demonstrated close correlations between dental maturation, crown-rump length, and fetal GA [35,36]. In contrast, postnatal radiological assessments of GA remain limited, with only two studies reporting the use of postnatal brain US to evaluate cerebral gyri and sulci in preterm infants younger than 34 weeks GA [37,38].

We focused on the spine for GA estimation because its length shows a strong positive correlation with GA and is considered a reliable US parameter for fetal growth [14]. Upon review of various radiographs of newborns, we noticed that the cross-table lateral view provided the clearest visualization of the neonatal spine. On the first day after birth, this view distinctly displayed the lumbar vertebrae, as retroperitoneal colon gas was still minimal. In contrast, the anteroposterior abdominal view was less effective because of obscuration caused by abundant bowel gases. The thoracic vertebrae were difficult to delineate in both anteroposterior and lateral views of the chest radiographs because of interference from the lungs, heart, vessels, and airways. Based on these observations, cross-table lateral radiographs were selected for this study.

This study aimed to develop an objective and straightforward radiological method for estimating GA in newborns using deep learning techniques. As shown in Figure 1, our approach leverages a comprehensive deep learning pipeline that performs both segmentation and GA estimation tasks. First, the segmentation task utilized a DeepLabv3+ network with a ResNet50 backbone, a model architecture well-suited for semantic segmentation. This helped us accurately identify and isolate the lumbar vertebral bodies. The ResNet50 backbone facilitated robust feature extraction, whereas the ASPP module and decoder improved segmentation detail and accuracy. In addition to the DeepLabv3+ network, we performed an ablation study using other notable architectures, including DenseNet, 2D-UNet, and UNet++ (Table 3). The additional networks exhibited lower accuracy than the baseline DeepLabv3+ model, probably because of their limited ability to capture multiscale contextual features. Second, for GA estimation, we employed a custom AgeClassifier model that integrated two parallel ResNet18 backbones, one for processing images and the other for processing segmentation masks. This dual-backbone approach allowed the model to effectively correlate segmented anatomical features with GA.

Among the preprocessing techniques, image cropping improved model performance compared with training on uncropped images: the gain in DSC was statistically significant, whereas the reduction in MAE did not reach significance (Table 2). Cropping reduced background interference, allowing the model to focus on the relevant anatomical area, thereby enhancing learning efficiency. These results underscore the importance of optimizing image analysis workflows in medical applications.

In 2018, Torres et al. introduced a multi-input deep learning AI model for postnatal GA estimation using photographs of a newborn’s face, ear, foot, and birth weight [2]. Their system, based on the CVL17 architecture, improved upon the Ballard score by 21.8%, resulting in a GA estimation with a root mean squared error of 7.98 days and an expected error of 6 days. By comparison, our AI model requires only a single cross-table lateral abdominal radiograph, uses ResNet-based backbones (ResNet50 for segmentation and ResNet18 for GA estimation), and achieved an MAE of 4.9 days on the internal dataset and 5.2 days on the external dataset, both under 1 week. In addition, our model is expected to be applicable across a broad age range, thereby addressing the limitations of the Ballard score, which include the low predictive validity for GA in preterm, post-term, and critically ill infants.

Moreover, our AI model has the potential to improve outcomes for babies born with an uncertain GA in low-to-middle income countries, where healthcare is often limited and basic medical imaging is scarce [39,40]. Advancements in digital radiography detectors have made image storage, post-processing, transmission, and remote interpretation more efficient. These compact systems can fit in a backpack, operate on battery power, integrate with AI, and enable image transmission via the internet [40]. They are suitable for use in areas with limited healthcare infrastructure and personnel in low-to-middle income countries.

This study had some limitations. First, the datasets may not represent all populations, particularly those with diverse genetics and healthcare environments. Second, the cohort included newborns whose birth weights fell outside the standard range, specifically those classified as large for the GA, small for the GA, and those with intrauterine growth restrictions of unknown etiology. Third, the radiographs in this study may not have fully captured lumbar vertebral development, as most participants (82.7%) were born in the third trimester. Fourth, the clinical utility of cross-table lateral radiographs in routine neonatal care appears to be limited, rendering their use as a sole method for GA estimation uncertain. Fifth, the reliance on convenience sampling in our study may have introduced bias into the results, particularly considering the substantial influence of image quality on the visibility of the lumbar vertebrae. Sixth, further elucidation may be necessary regarding the performance of GA estimation, encompassing a comprehensive analysis of the influencing factors, further comparative assessments, and a systematic investigation of the potential sources of GA estimation error.

In conclusion, our deep learning model showed promising performance in the segmentation of lumbar vertebral bodies and GA estimation using plain radiographs, suggesting its potential utility as a supportive tool for neonatal maturity assessment in clinical practice. This technique could be particularly useful for preterm and post-term infants by providing a noninvasive and reliable method for GA estimation. Future research should focus on validating this method across various populations and refining its accuracy using advanced deep learning models.

Footnotes

Conflicts of Interest: The authors have no potential conflicts of interest to disclose.

Author Contributions:

Conceptualization: Bo-Kyung Je.
Data curation: all authors.
Formal analysis: Sungwon Ham, Gayoung Choi.
Funding acquisition: Bo-Kyung Je.
Investigation: Sungwon Ham, Gayoung Choi, Bo-Kyung Je.
Methodology: Gayoung Choi, Bo-Kyung Je.
Project administration: Bo-Kyung Je.
Resources: Bo-Kyung Je.
Software: Sungwon Ham.
Supervision: Bo-Kyung Je.
Validation: Sungwon Ham, Gayoung Choi, Saelin Oh.
Visualization: Sungwon Ham, Bo-Kyung Je.
Writing—original draft: Sungwon Ham, Gayoung Choi, Bo-Kyung Je.
Writing—review & editing: Bo-Kyung Je.

Funding Statement: This study was supported by the Korea University College of Medicine (K2011001).

Availability of Data and Material

The datasets generated or analyzed during the study are available from the corresponding author on reasonable request.

References

1. Aminoff MJ, Daroff RB. Encyclopedia of the neurological sciences. 2nd ed. Amsterdam: Elsevier; 2014. pp. 477–480.
2. Torres Torres M, Valstar M, Henry C, Ward C, Sharkey D. Postnatal gestational age estimation of newborns using small sample deep learning. Image Vis Comput. 2019;83-84:87–99. doi: 10.1016/j.imavis.2018.09.003.
3. Dodd V. Gestational age assessment. Neonatal Netw. 1996;15:27–36.
4. DiPietro JA, Allen MC. Estimation of gestational age: implications for developmental research. Child Dev. 1991;62:1184–1199.
5. Lee AC, Panchal P, Folger L, Whelan H, Whelan R, Rosner B, et al. Diagnostic accuracy of neonatal assessment for gestational age determination: a systematic review. Pediatrics. 2017;140:e20171423. doi: 10.1542/peds.2017-1423.
6. Aliyu LD, Kurjak A, Wataganara T, de Sá RA, Pooh R, Sen C, et al. Ultrasound in Africa: what can really be done? J Perinat Med. 2016;44:119–123. doi: 10.1515/jpm-2015-0224.
7. Campbell S, Warsof SL, Little D, Cooper DJ. Routine ultrasound screening for the prediction of gestational age. Obstet Gynecol. 1985;65:613–620.
8. Dubowitz LM, Dubowitz V, Goldberg C. Clinical assessment of gestational age in the newborn infant. J Pediatr. 1970;77:1–10. doi: 10.1016/s0022-3476(70)80038-5.
9. Ballard JL, Novak KK, Driver M. A simplified score for assessment of fetal maturation of newly born infants. J Pediatr. 1979;95(5 Pt 1):769–774. doi: 10.1016/s0022-3476(79)80734-9.
10. Ballard JL, Khoury JC, Wedig K, Wang L, Eilers-Walsman BL, Lipp R. New Ballard score, expanded to include extremely premature infants. J Pediatr. 1991;119:417–423. doi: 10.1016/s0022-3476(05)82056-6.
11. Creo AL, Schwenk WF 2nd. Bone age: a handy tool for pediatric providers. Pediatrics. 2017;140:e20171486. doi: 10.1542/peds.2017-1486.
12. Cavallo F, Mohn A, Chiarelli F, Giannini C. Evaluation of bone age in children: a mini-review. Front Pediatr. 2021;9:580314. doi: 10.3389/fped.2021.580314.
13. Bagnall KM, Harris PF, Jones PR. A radiographic study of the human fetal spine. 2. The sequence of development of ossification centres in the vertebral column. J Anat. 1977;124(Pt 3):791–802.
14. De Biasio P, Ginocchio G, Vignolo M, Ravera G, Venturini PL, Aicardi G. Spine length measurement in the first trimester of pregnancy. Prenat Diagn. 2002;22:818–822. doi: 10.1002/pd.428.
15. Seibert JJ, Parvey LS. The telltale triangle: use of the supine cross table lateral radiograph of the abdomen in early detection of pneumoperitoneum. Pediatr Radiol. 1977;5:209–210. doi: 10.1007/BF00972178.
16. Lorenzo RL, Harolds JA. The use of prone films for suspected bowel obstruction in infants and children. AJR Am J Roentgenol. 1977;129:617–622. doi: 10.2214/ajr.129.4.617.
17. MacEwan DW, Dunbar JS, Smith RD, Brown BS. Pneumothorax in young infants--recognition and evaluation. J Can Assoc Radiol. 1971;22:264–269.
18. Hoffer FA, Ablow RC. The cross-table lateral view in neonatal pneumothorax. AJR Am J Roentgenol. 1984;142:1283–1286. doi: 10.2214/ajr.142.6.1283.
19. Xu R, Zi Y, Dai L, Yu H, Zhu M. Advancing medical diagnostics with deep learning and data preprocessing. Int J Innov Res Comput Sci Technol. 2024;12:143–147.
20. Yurtkulu SC, Şahin YH, Unal G. Semantic segmentation with extended DeepLabv3 architecture. [accessed on May 19, 2025].
21. Mascarenhas S, Agarwal M. A comparison between VGG16, VGG19 and ResNet50 architecture frameworks for image classification. [accessed on May 19, 2025].
22. Lian X, Pang Y, Han J, Pan J. Cascaded hierarchical atrous spatial pyramid pooling module for semantic segmentation. Pattern Recognit. 2021;110:107622.
23. Huang F, Li J, Zhu X. Balanced symmetric cross entropy for large scale imbalanced and noisy data. arXiv [Preprint] 2020 [accessed on May 19, 2025]. doi: 10.48550/arXiv.2007.01618.
24. Ronneberger O, Fischer P, Brox T. U-Net: convolutional networks for biomedical image segmentation. arXiv [Preprint] 2015 [accessed on May 19, 2025]. doi: 10.48550/arXiv.1505.04597.
25. Zhou Z, Siddiquee MMR, Tajbakhsh N, Liang J. UNet++: a nested U-Net architecture for medical image segmentation. arXiv [Preprint] 2018 [accessed on May 19, 2025]. doi: 10.48550/arXiv.1807.10165.
26. Iandola F, Moskewicz M, Karayev S, Girshick R, Darrell T, Keutzer K. DenseNet: implementing efficient ConvNet descriptor pyramids. arXiv [Preprint] 2014 [accessed on May 19, 2025]. doi: 10.48550/arXiv.1404.1869.
27. Aljabri M, AlGhamdi M. A review on the use of deep learning for medical images segmentation. Neurocomputing. 2022;506:311–335.
28. Chai T, Draxler RR. Root mean square error (RMSE) or mean absolute error (MAE)? – arguments against avoiding RMSE in the literature. Geosci Model Dev. 2014;7:1247–1250.
29. Allen MC. Assessment of gestational age and neuromaturation. Ment Retard Dev Disabil Res Rev. 2005;11:21–33. doi: 10.1002/mrdd.20059.
30. Farr V, Kerridge DF, Mitchell RG. The value of some external characteristics in the assessment of gestational age at birth. Dev Med Child Neurol. 1966;8:657–660. doi: 10.1111/j.1469-8749.1966.tb01823.x.
31.Behrman RE, Butler AS. Preterm birth: causes, consequences, and prevention. Washington, DC: National Academies Press; 2007. pp. 55–83. [PubMed] [Google Scholar]
32.Sanders M, Allen M, Alexander GR, Yankowitz J, Graeber J, Johnson TR, et al. Gestational age assessment in preterm neonates weighing less than 1500 grams. Pediatrics. 1991;88:542–546. [PubMed] [Google Scholar]
33.Taylor RA, Denison FC, Beyai S, Owens S. The external Ballard examination does not accurately assess the gestational age of infants born at home in a rural community of the Gambia. Ann Trop Paediatr. 2010;30:197–204. doi: 10.1179/146532810X12786388978526. [DOI] [PMC free article] [PubMed] [Google Scholar]
34.Kjar I. Skeletal maturation of the human fetus assessed radiographically on the basis of ossification sequences in the hand and foot. Am J Phys Anthropol. 1974;40:257–275. doi: 10.1002/ajpa.1330400211. [DOI] [PubMed] [Google Scholar]
35.Garn SM, Burdi AR, Miller RL, Nagy JM. Prenatal dental development as a reference standard for embryologic status. J Dent Res. 1970;49:894. doi: 10.1177/00220345700490043701. [DOI] [PubMed] [Google Scholar]
36.Sakurai T, Michiue T, Ishikawa T, Yoshida C, Sakoda S, Kano T, et al. Postmortem CT investigation of skeletal and dental maturation of the fetuses and newborn infants: a serial case study. Forensic Sci Med Pathol. 2012;8:351–357. doi: 10.1007/s12024-012-9327-0. [DOI] [PubMed] [Google Scholar]
37.Murphy NP, Rennie J, Cooke RW. Cranial ultrasound assessment of gestational age in low birthweight infants. Arch Dis Child. 1989;64:569–572. doi: 10.1136/adc.64.4.569. [DOI] [PMC free article] [PubMed] [Google Scholar]
38.Baumann C, Hüppi P, Amato M. [Prenatal and postnatal determination of gestational age of small newborn infants] Z Geburtshilfe Perinatol. 1993;197:135–140. German. [PubMed] [Google Scholar]
39.World Health Organization. Tracking universal health coverage: 2023 global monitoring report. [accessed on October 1, 2024]. Available at: https://www.who.int/publications/i/item/9789240080379.
40.Frija G, Salama DH, Kawooya MG, Allen B. A paradigm shift in point-of-care imaging in low-income and middle-income countries. EClinicalMedicine. 2023;62:102114. doi: 10.1016/j.eclinm.2023.102114. [DOI] [PMC free article] [PubMed] [Google Scholar]

Data Availability Statement

The datasets generated or analyzed during the study are available from the corresponding author on reasonable request.
